Lexicon

Papers published on a yearly basis

Papers

Journal Article•10.1162/COLI_A_00049•

Lexicon-based methods for sentiment analysis

Maite Taboada¹, Julian Brooke², Milan Tofiloski¹, Kimberly Voll³, Manfred Stede⁴•Institutions (4)

Simon Fraser University¹, University of Toronto², University of British Columbia³, University of Potsdam⁴

01 Jun 2011-Computational Linguistics

TL;DR: The Semantic Orientation CALculator (SO-CAL) uses dictionaries of words annotated with their semantic orientation (polarity and strength), incorporates intensification and negation, and is applied to the polarity classification task.

Abstract: We present a lexicon-based approach to extracting sentiment from text. The Semantic Orientation CALculator (SO-CAL) uses dictionaries of words annotated with their semantic orientation (polarity and strength), and incorporates intensification and negation. SO-CAL is applied to the polarity classification task, the process of assigning a positive or negative label to a text that captures the text's opinion towards its main subject matter. We show that SO-CAL's performance is consistent across domains and in completely unseen data. Additionally, we describe the process of dictionary creation, and our use of Mechanical Turk to check dictionaries for consistency and reliability.
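
As a rough illustration of the lexicon-based idea described in this abstract, the sketch below sums word scores from a dictionary and adjusts them for a preceding intensifier or negator. It is a minimal sketch, not SO-CAL itself: the dictionary entries, the intensifier weights, the negation shift value, and the names `SO_LEXICON`, `INTENSIFIERS`, `NEGATORS`, `NEGATION_SHIFT`, `score_text`, and `classify` are all assumptions made for this example.

```python
import re

# Hypothetical toy dictionaries; values are illustrative only.
SO_LEXICON = {"good": 3.0, "excellent": 5.0, "bad": -3.0, "terrible": -5.0}
INTENSIFIERS = {"very": 0.25, "slightly": -0.5}  # fractional change in strength
NEGATORS = {"not", "never"}
NEGATION_SHIFT = 4.0  # assumed amount to shift a negated word toward the other pole


def score_text(text: str) -> float:
    """Sum the adjusted semantic orientation of every lexicon word in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in SO_LEXICON:
            continue
        value = SO_LEXICON[token]
        j = i - 1
        # An intensifier directly before the word scales its strength.
        if j >= 0 and tokens[j] in INTENSIFIERS:
            value *= 1.0 + INTENSIFIERS[tokens[j]]
            j -= 1
        # A negator before the (possibly intensified) word shifts the score
        # toward the opposite polarity instead of flipping its sign.
        if j >= 0 and tokens[j] in NEGATORS:
            value += -NEGATION_SHIFT if value > 0 else NEGATION_SHIFT
        total += value
    return total


def classify(text: str) -> str:
    """Assign a polarity label from the sign of the summed score."""
    return "positive" if score_text(text) > 0 else "negative"
```

For example, `classify("The food was excellent.")` yields `"positive"`, while `score_text("not very good")` comes out slightly negative under these assumed values.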

3,367 citations

Proceedings Article•10.3115/980845.980860•

The Berkeley FrameNet Project

Collin F. Baker¹, Charles J. Fillmore¹, John B. Lowe¹•Institutions (1)

International Computer Science Institute¹

10 Aug 1998

TL;DR: This report will present the project's goals and workflow, and information about the computational tools that have been adapted or created in-house for this work.

Abstract: FrameNet is a three-year NSF-supported project in corpus-based computational lexicography, now in its second year (NSF IRI-9618838, "Tools for Lexicon Building"). The project's key features are (a) a commitment to corpus evidence for semantic and syntactic generalizations, and (b) the representation of the valences of its target words (mostly nouns, adjectives, and verbs) in which the semantic portion makes use of frame semantics. The resulting database will contain (a) descriptions of the semantic frames underlying the meanings of the words described, and (b) the valence representation (semantic and syntactic) of several thousand words and phrases, each accompanied by (c) a representative collection of annotated corpus attestations, which jointly exemplify the observed linkings between "frame elements" and their syntactic realizations (e.g. grammatical function, phrase type, and other syntactic traits). This report will present the project's goals and workflow, and information about the computational tools that have been adapted or created in-house for this work.

3,253 citations

Journal Article•10.3758/BF03193014•

The English Lexicon Project.

David A. Balota¹, Melvin J. Yap¹, Michael J. Cortese², Keith A. Hutchison³, Brett Kessler¹, Bjorn Loftis¹, James H. Neely⁴, Douglas L. Nelson⁵, Greg B. Simpson⁶, Rebecca Treiman¹•Institutions (6)

Washington University in St. Louis¹, College of Charleston², Montana State University³, State University of New York System⁴, University of South Florida⁵, University of Kansas⁶

01 Aug 2007-Behavior Research Methods

TL;DR: The motivation for this project, the methods used to collect the data, and the search engine that affords access to the behavioral measures and descriptive lexical statistics for these stimuli are described.

Abstract: The English Lexicon Project is a multiuniversity effort to provide a standardized behavioral and descriptive data set for 40,481 words and 40,481 nonwords. It is available via the Internet at elexicon.wustl.edu. Data from 816 participants across six universities were collected in a lexical decision task (approximately 3400 responses per participant), and data from 444 participants were collected in a speeded naming task (approximately 2500 responses per participant). The present paper describes the motivation for this project, the methods used to collect the data, and the search engine that affords access to the behavioral measures and descriptive lexical statistics for these stimuli.

2,662 citations

Journal Article•10.1111/J.1467-8640.2012.00460.X•

Crowdsourcing a word–emotion association lexicon

Saif M. Mohammad¹, Peter D. Turney¹•Institutions (1)

National Research Council¹

01 Aug 2013

TL;DR: It is shown how the combined strength and wisdom of the crowds can be used to generate a large, high‐quality, word–emotion and word–polarity association lexicon quickly and inexpensively.

Abstract: Even though considerable attention has been given to the polarity of words (positive and negative) and the creation of large polarity lexicons, research in emotion analysis has had to rely on limited and small emotion lexicons. In this paper, we show how the combined strength and wisdom of the crowds can be used to generate a large, high-quality, word–emotion and word–polarity association lexicon quickly and inexpensively. We enumerate the challenges in emotion annotation in a crowdsourcing scenario and propose solutions to address them. Most notably, in addition to questions about emotions associated with terms, we show how the inclusion of a word choice question can discourage malicious data entry, help to identify instances where the annotator may not be familiar with the target term (allowing us to reject such annotations), and help to obtain annotations at sense level (rather than at word level). We conducted experiments on how to formulate the emotion-annotation questions, and show that asking if a term is associated with an emotion leads to markedly higher interannotator agreement than that obtained by asking if a term evokes an emotion.

2,623 citations

Proceedings Article•

Towards End-To-End Speech Recognition with Recurrent Neural Networks

Alex Graves¹, Navdeep Jaitly²•Institutions (2)

Google¹, University of Toronto²

21 Jun 2014

TL;DR: This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation; it is based on a combination of the deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification objective function.

Abstract: This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. The system is based on a combination of the deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification objective function. A modification to the objective function is introduced that trains the network to minimise the expectation of an arbitrary transcription loss function. This allows a direct optimisation of the word error rate, even in the absence of a lexicon or language model. The system achieves a word error rate of 27.3% on the Wall Street Journal corpus with no prior linguistic information, 21.9% with only a lexicon of allowed words, and 8.2% with a trigram language model. Combining the network with a baseline system further reduces the error rate to 6.7%.
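
The word error rates quoted in this abstract follow a metric that is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below illustrates that conventional definition only; it is not the paper's evaluation code, and the function name `word_error_rate` and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For instance, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" involves one substitution and one deletion over six reference words. Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.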

2,513 citations

Year	Papers
2026	9
2025	418
2024	670
2023	1,190
2022	1,669
2021	479
