Text graph

Papers published on a yearly basis

Papers

Book

Automatic text processing: the transformation, analysis, and retrieval of information by computer

Gerard Salton¹ • Institutions (1)

Cornell University¹

3 Jan 1989

3,866 citations

Journal Article • 10.1609/AAAI.V33I01.33017370

Graph Convolutional Networks for Text Classification

Liang Yao¹, Chengsheng Mao¹, Yuan Luo¹ • Institutions (1)

Northwestern University¹

17 Jul 2019

TL;DR: The authors propose a Text Graph Convolutional Network (Text GCN) for text classification, which jointly learns embeddings for both words and documents, supervised by the known class labels of the documents.

Abstract: Text classification is an important and classical problem in natural language processing. There have been a number of studies that applied convolutional neural networks (convolution on regular grid, e.g., sequence) to classification. However, only a limited number of studies have explored the more flexible graph convolutional neural networks (convolution on non-grid, e.g., arbitrary graph) for the task. In this work, we propose to use graph convolutional networks for text classification. We build a single text graph for a corpus based on word co-occurrence and document-word relations, then learn a Text Graph Convolutional Network (Text GCN) for the corpus. Our Text GCN is initialized with one-hot representations for words and documents; it then jointly learns the embeddings for both words and documents, as supervised by the known class labels for documents. Our experimental results on multiple benchmark datasets demonstrate that a vanilla Text GCN without any external word embeddings or knowledge outperforms state-of-the-art methods for text classification. On the other hand, Text GCN also learns predictive word and document embeddings. In addition, experimental results show that the improvement of Text GCN over state-of-the-art comparison methods becomes more prominent as we lower the percentage of training data, suggesting the robustness of Text GCN to less training data in text classification.

2,185 citations
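
The graph construction described in the abstract above (a single corpus-level graph built from word co-occurrence and document-word relations) can be illustrated with a minimal Python sketch. The function name `build_text_graph`, the `window` parameter, whitespace tokenisation, and the use of raw counts as edge weights are all assumptions made for illustration; the abstract does not specify the weighting scheme, and the sketch covers only graph construction, not the convolutional network trained on it.

```python
from collections import Counter
from itertools import combinations


def build_text_graph(docs, window=3):
    """Return a dict mapping (node_a, node_b) edges to weights.

    Nodes are ("doc", index) or ("word", token). Weights are raw counts,
    a simplifying assumption of this sketch.
    """
    edges = Counter()
    for i, text in enumerate(docs):
        tokens = text.lower().split()
        # Document-word edges: one per distinct word, weighted by term frequency.
        for word, tf in Counter(tokens).items():
            edges[(("doc", i), ("word", word))] += tf
        # Word-word edges: count co-occurrence inside a sliding window.
        for start in range(max(1, len(tokens) - window + 1)):
            span = sorted(set(tokens[start:start + window]))
            for a, b in combinations(span, 2):
                edges[(("word", a), ("word", b))] += 1
    return dict(edges)


docs = ["graph networks classify text", "text graph for documents"]
graph = build_text_graph(docs)
```

Words and documents share one edge dictionary here, so label information attached to document nodes could in principle propagate to word nodes through the shared edges, which is the property the abstract relies on.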

Journal Article • 10.18637/JSS.V025.I05

Text Mining Infrastructure in R

Ingo Feinerer, Kurt Hornik, David Meyer

31 Mar 2008-Journal of Statistical Software

TL;DR: Presents the tm package, which provides a framework for text mining applications within R, along with techniques for count-based analysis, text clustering, text classification, and string kernels.

Abstract: During the last decade text mining has become a widely used discipline utilizing statistical and machine learning methods. We present the tm package which provides a framework for text mining applications within R. We give a survey on text mining facilities in R and explain how typical application tasks can be carried out using our framework. We present techniques for count-based analysis methods, text clustering, text classification and string kernels.

1,297 citations

Journal Article • 10.1007/S11263-015-0823-Z

Reading Text in the Wild with Convolutional Neural Networks

Max Jaderberg¹, Karen Simonyan¹, Andrea Vedaldi¹, Andrew Zisserman¹ • Institutions (1)

University of Oxford¹

01 Jan 2016-International Journal of Computer Vision

TL;DR: Presents an end-to-end system for text spotting (localising and recognising text in natural scene images) and text-based image retrieval, and demonstrates a real-world application that makes thousands of hours of news footage instantly searchable via a text query.

Abstract: In this work we present an end-to-end system for text spotting--localising and recognising text in natural scene images--and text based image retrieval. This system is based on a region proposal mechanism for detection and deep convolutional neural networks for recognition. Our pipeline uses a novel combination of complementary proposal generation techniques to ensure high recall, and a fast subsequent filtering stage for improving precision. For the recognition and ranking of proposals, we train very large convolutional neural networks to perform word recognition on the whole proposal region at the same time, departing from the character classifier based systems of the past. These networks are trained solely on data produced by a synthetic text generation engine, requiring no human labelled data. Analysing the stages of our pipeline, we show state-of-the-art performance throughout. We perform rigorous experiments across a number of standard end-to-end text spotting benchmarks and text-based image retrieval datasets, showing a large improvement over all previous methods. Finally, we demonstrate a real-world application of our text spotting system to allow thousands of hours of news footage to be instantly searchable via a text query.

1,281 citations

Proceedings Article • 10.1145/383952.383955

Generic text summarization using relevance measure and latent semantic analysis

Yihong Gong¹, Xin Liu¹ • Institutions (1)

NEC¹

1 Sep 2001

TL;DR: This paper proposes two generic text summarization methods that create summaries by ranking and extracting sentences from the original documents: the first ranks sentence relevance with standard IR methods, and the second uses latent semantic analysis to identify semantically important sentences.

Abstract: In this paper, we propose two generic text summarization methods that create text summaries by ranking and extracting sentences from the original documents. The first method uses standard IR methods to rank sentence relevances, while the second method uses the latent semantic analysis technique to identify semantically important sentences, for summary creations. Both methods strive to select sentences that are highly ranked and different from each other. This is an attempt to create a summary with a wider coverage of the document's main content and less redundancy. Performance evaluations on the two summarization methods are conducted by comparing their summarization outputs with the manual summaries generated by three independent human evaluators. The evaluations also study the influence of different VSM weighting schemes on the text summarization performances. Finally, the causes of the large disparities in the evaluators' manual summarization results are investigated, and discussions on human text summarization patterns are presented.

991 citations
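
The first of the two summarization methods in the abstract above, ranking sentences by relevance while keeping the selected sentences different from each other, can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the function name `summarize`, the raw term-frequency weighting, whitespace tokenisation, and the step of dropping a chosen sentence's terms from the document vector to reduce redundancy are assumptions of this sketch, since the abstract does not state these details. The latent semantic analysis variant is not shown.

```python
from collections import Counter


def summarize(sentences, k=2):
    """Pick k sentences by repeated relevance ranking.

    Relevance is the inner product of raw term-frequency vectors, an
    assumed weighting chosen for this sketch.
    """
    sent_terms = [Counter(s.lower().split()) for s in sentences]
    doc_terms = Counter()
    for terms in sent_terms:
        doc_terms.update(terms)
    remaining = set(range(len(sentences)))
    chosen = []
    while remaining and len(chosen) < k:
        # Score each remaining sentence against what is left of the document.
        def score(i):
            return sum(tf * doc_terms[t] for t, tf in sent_terms[i].items())

        best = max(sorted(remaining), key=score)
        chosen.append(best)
        remaining.discard(best)
        # Drop the chosen sentence's terms so later picks cover new content.
        for t in sent_terms[best]:
            doc_terms.pop(t, None)
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


sentences = ["cats chase mice", "cats chase birds", "dogs guard houses"]
summary = summarize(sentences, k=2)
```

In this toy input the second sentence overlaps heavily with the first, so once the first is chosen its terms no longer count and the third sentence is preferred, which mirrors the abstract's stated goal of wider coverage with less redundancy.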

Year	Papers
2025	2
2024	11
2023	58
2022	102
2021	18
2020	19
