From frequency to meaning: vector space models of semantics
Peter D. Turney, Patrick Pantel
TL;DR: The goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs, and to provide pointers into the literature for those who are less familiar with the field.
Abstract: Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
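As a rough illustration of the first of the three matrix classes named in the abstract, the sketch below builds a tiny term-document matrix from raw counts and compares documents by the cosine of their column vectors. It is a minimal example, not the survey's own code: the helper names (`term_document_matrix`, `cosine`, `column`), the whitespace tokenization, the unweighted counts, and the three toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, matrix), where matrix[i][j] counts term i in document j.

    Hypothetical helper: lowercases and splits on whitespace only.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def column(matrix, j):
    """Extract document j as a vector of term counts."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy corpus (made up for this example).
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stocks fell on monday",
]

vocab, matrix = term_document_matrix(docs)
sim_01 = cosine(column(matrix, 0), column(matrix, 1))  # two documents about a cat
sim_02 = cosine(column(matrix, 0), column(matrix, 2))  # unrelated documents
print(f"sim(doc0, doc1) = {sim_01:.3f}")
print(f"sim(doc0, doc2) = {sim_02:.3f}")
```

Comparing rows instead of columns would give term-to-term similarity from the same matrix. A word-context or pair-pattern matrix would change what the rows and columns stand for, while the vector comparison could stay the same.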
Citations
•Proceedings Article
Medical synonym extraction with concept space models
Chang Wang, Liangliang Cao, Bowen Zhou
- 25 Jul 2015
TL;DR: A novel approach that integrates term embeddings with medical domain knowledge for healthcare applications; the proposed approach is shown to outperform the baseline approaches by a large margin.
Automatic Classification Method for Software Vulnerability Based on Deep Neural Network
TL;DR: Compared to SVM, Naive Bayes, and KNN, the TFI-DNN model achieves better performance across multiple evaluation metrics, including accuracy, recall, precision, and F1-score.
Sketching Linear Classifiers over Data Streams
Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant
- 27 May 2018
TL;DR: The Weight-Median Sketch adopts the core data structure used in the Count-Sketch, but instead of sketching counts, it captures sketched gradient updates to the model parameters.
Topic modeling revisited: New evidence on algorithm performance and quality metrics
TL;DR: This study compares all commonly used, non-application-specific topic modeling algorithms, assesses their relative performance, and analyzes the relationship between existing metrics and the known clustering to determine objectively under what conditions these algorithms may be used effectively.
UCCA: A Semantics-based Grammatical Annotation Scheme
Omri Abend, Ari Rappoport
- 01 Mar 2013
TL;DR: This paper proposes UCCA (Universal Conceptual Cognitive Annotation), a simple semantic annotation scheme that covers many of the most important elements and relations present in linguistic utterances, including verb-argument structure, optional adjuncts such as adverbials, clause embeddings, and the linkage between them.
References
A mathematical theory of communication
TL;DR: This final installment of the paper considers the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now.
•Journal Article
The mathematical theory of communication
Claude E. Shannon, Warren Weaver
TL;DR: The Mathematical Theory of Communication (MTOC) was originally published as a paper on communication theory more than fifty years ago and has since gone through four hardcover and sixteen paperback printings.
Latent Dirichlet allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
•Proceedings Article
Latent Dirichlet Allocation
David M. Blei, Andrew Y. Ng, Michael I. Jordan
- 03 Jan 2001
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
- 06 Dec 2004
TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets; the implementation runs on a large cluster of commodity machines and is highly scalable.