From frequency to meaning: vector space models of semantics
Peter D. Turney,Patrick Pantel +1 more
TL;DR: The goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs, and to provide pointers into the literature for those who are less familiar with the field.
read more
Abstract: Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
read more
Chat with Paper
AI Agents for this Paper
Find similar papers on Google Scholar, PubMed and Arxiv
Write a critical review of this paper
Analyze citations of this paper to find unaddressed research gaps
Citations
•Proceedings Article
Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov,Ilya Sutskever,Kai Chen,Greg S. Corrado,Jeffrey Dean +4 more
- 05 Dec 2013
TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
•Posted Content
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships and improve both the quality of the vectors and the training speed.
Machine learning
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Enriching Word Vectors with Subword Information
TL;DR: This paper proposed a new approach based on skip-gram model, where each word is represented as a bag of character n-grams, words being represented as the sum of these representations, allowing to train models on large corpora quickly and allowing to compute word representations for words that did not appear in the training data.
•Proceedings Article
Distributed Representations of Sentences and Documents
Quoc V. Le,Tomas Mikolov +1 more
- 21 Jun 2014
TL;DR: Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
References
Similarity estimation techniques from rounding algorithms
Moses Charikar
- 19 May 2002
TL;DR: It is shown that rounding algorithms for LPs and SDPs used in the context of approximation algorithms can be viewed as locality sensitive hashing schemes for several interesting collections of objects.
•Journal Article
An extensive empirical study of feature selection metrics for text classification
TL;DR: An empirical comparison of twelve feature selection methods evaluated on a benchmark of 229 text classification problem instances, revealing that a new feature selection metric, called 'Bi-Normal Separation' (BNS), outperformed the others by a substantial margin in most situations and was the top single choice for all goals except precision.
Kernel Principal Component Analysis
Bernhard Schölkopf,Alexander J. Smola,Klaus-Robert Müller +2 more
- 08 Oct 1997
TL;DR: A new method for performing a nonlinear form of Principal Component Analysis by the use of integral operator kernel functions is proposed and experimental results on polynomial feature extraction for pattern recognition are presented.
2.6K
Attention, similarity, and the identification-categorization relationship.
TL;DR: In this paper, a unified quantitative approach to modeling subjects' identification and categorization of multidimensional perceptual stimuli is proposed and tested, where subjects identify and categorize the same set of perceptually confusable stimuli varying on separable dimensions.