From frequency to meaning: vector space models of semantics
Peter D. Turney, Patrick Pantel
TL;DR: The goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs, and to provide pointers into the literature for those who are less familiar with the field.
Abstract: Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
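As a rough illustration of the first matrix class named in the abstract, the sketch below builds a small term-document matrix and compares documents by the cosine of their column vectors. It is a minimal example under stated assumptions, not the survey's own code: the helper names (`term_document_matrix`, `column`, `cosine`), the toy documents, whitespace tokenization, and the use of raw counts without any weighting are all choices made here for illustration.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] is the count of term i in document j.

    Assumption: tokens are lowercased, whitespace-separated words, and cells
    hold raw frequencies (no tf-idf or other weighting is applied).
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def column(matrix, j):
    """Extract document j as a vector over the vocabulary."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: two documents on a similar topic, one unrelated.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stocks fell on monday",
]
vocab, m = term_document_matrix(docs)
sim_01 = cosine(column(m, 0), column(m, 1))
sim_02 = cosine(column(m, 0), column(m, 2))
print(f"doc0 vs doc1: {sim_01:.3f}")
print(f"doc0 vs doc2: {sim_02:.3f}")
```

In this toy corpus the two cat documents share more terms, so their cosine is higher than that of the first and third documents. Comparing rows instead of columns would compare terms by the documents they occur in, which is the general idea behind word-level similarity in a word-context matrix.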
Citations
A survey of paraphrasing and textual entailment methods
TL;DR: Key ideas from the two areas of paraphrasing and textual entailment are summarized by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.
Topic Modeling in Management Research: Rendering New Theory from Textual Data
Timothy R. Hannigan, Richard F. J. Haans, Keyvan Vakili, Hovig Tchalian, Vern Glaser, Milo Shaoqing Wang, Sarah Kaplan, P. Devereaux Jennings
TL;DR: This article describes how topic modeling is used to reveal phenomenon-based constructs and grounded conceptual relationships in textual data.
Contextual semantics for sentiment analysis of Twitter
TL;DR: Different from typical lexicon-based approaches, SentiCircles takes into account the co-occurrence patterns of words in different contexts in tweets to capture their semantics and update their pre-assigned strength and polarity in sentiment lexicons accordingly.
•Proceedings Article
Distributional Semantics in Technicolor
Elia Bruni, Gemma Boleda, Marco Baroni, Nam Khanh Tran
- 08 Jul 2012
TL;DR: While visual models with state-of-the-art computer vision techniques perform worse than textual models in general tasks, they are as good or better models of the meaning of words with visual correlates such as color terms, even in a nontrivial task that involves nonliteral uses of such words.
Gaussian LDA for Topic Models with Word Embeddings
Rajarshi Das, Manzil Zaheer, Chris Dyer
- 01 Jul 2015
TL;DR: Gaussian LDA replaces LDA's categorical topic distributions over word types with multivariate Gaussian distributions on the embedding space, which encourages the model to group words that are a priori known to be semantically related into topics.
References
A mathematical theory of communication
TL;DR: This final installment of the paper considers the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now.
•Journal Article
The mathematical theory of communication
Claude E. Shannon, Warren Weaver
TL;DR: The Mathematical Theory of Communication was originally published as a paper on communication theory more than fifty years ago and has since gone through four hardcover and sixteen paperback printings.
Latent dirichlet allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
•Proceedings Article
Latent Dirichlet Allocation
David M. Blei, Andrew Y. Ng, Michael I. Jordan
- 03 Jan 2001
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
- 06 Dec 2004
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.