Categorization

Papers published on a yearly basis

Papers

Book Chapter•10.1007/BFB0026683•

Text Categorization with Support Vector Machines: Learning with Many Relevant Features

Thorsten Joachims¹•Institutions (1)

Technical University of Dortmund¹

21 Apr 1998

TL;DR: This paper explores the use of Support Vector Machines for learning text classifiers from examples and analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task.

Abstract: This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and behave robustly over a variety of different learning tasks. Furthermore they are fully automatic, eliminating the need for manual parameter tuning.

9,683 citations

Book•

The WEIRDest People in the World

Joseph Henrich¹, Steven J. Heine¹, Ara Norenzayan¹•Institutions (1)

University of British Columbia¹

8 Sep 2020

TL;DR: A review of the comparative database from across the behavioral sciences suggests both that there is substantial variability in experimental results across populations and that WEIRD subjects are particularly unusual compared with the rest of the species – frequent outliers.

Abstract: Behavioral scientists routinely publish broad claims about human psychology and behavior in the world's top journals based on samples drawn entirely from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies. Researchers - often implicitly - assume that either there is little variation across human populations, or that these "standard subjects" are as representative of the species as any other population. Are these assumptions justified? Here, our review of the comparative database from across the behavioral sciences suggests both that there is substantial variability in experimental results across populations and that WEIRD subjects are particularly unusual compared with the rest of the species - frequent outliers. The domains reviewed include visual perception, fairness, cooperation, spatial reasoning, categorization and inferential induction, moral reasoning, reasoning styles, self-concepts and related motivations, and the heritability of IQ. The findings suggest that members of WEIRD societies, including young children, are among the least representative populations one could find for generalizing about humans. Many of these findings involve domains that are associated with fundamental aspects of psychology, motivation, and behavior - hence, there are no obvious a priori grounds for claiming that a particular behavioral phenomenon is universal based on sampling from a single subpopulation. Overall, these empirical patterns suggest that we need to be less cavalier in addressing questions of human nature on the basis of data drawn from this particularly thin, and rather unusual, slice of humanity. We close by proposing ways to structurally re-organize the behavioral sciences to best tackle these challenges.

9,388 citations

Journal Article•10.1145/505282.505283•

Machine learning in automated text categorization

Fabrizio Sebastiani

01 Mar 2002-ACM Computing Surveys

TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

Abstract: The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

8,559 citations

Proceedings Article•10.3115/1118693.1118704•

Thumbs up? Sentiment Classification using Machine Learning Techniques

Bo Pang¹, Lillian Lee¹, Shivakumar Vaithyanathan²•Institutions (2)

Cornell University¹, IBM²

6 Jul 2002

TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.

Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

7,260 citations
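
To make the setup in the abstract above concrete, here is a minimal sketch of one of the three methods it names, Naive Bayes, applied to positive/negative review classification. It is an illustration under stated assumptions, not the authors' implementation: the toy reviews, the whitespace tokenizer, the add-one smoothing, and the names `train_naive_bayes`, `classify`, and `TOY_REVIEWS` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def train_naive_bayes(docs):
    """Collect per-label document and word counts from (text, label) pairs."""
    label_counts = Counter()   # number of training documents per label
    word_counts = {}           # label -> Counter of word frequencies
    vocab = set()              # every word seen in training
    for text, label in docs:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():  # assumption: whitespace tokens
            counts[word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log-posterior under the model."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        # Log prior: fraction of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up reviews for illustration only; not data from the paper.
TOY_REVIEWS = [
    ("a wonderful moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("dull plot and terrible acting", "neg"),
    ("a boring terrible mess", "neg"),
]

model = train_naive_bayes(TOY_REVIEWS)
print(classify(model, "great story"))    # pos
print(classify(model, "terrible plot"))  # neg
```

With only four training sentences this says nothing about real accuracy; it only shows the shape of the learning problem, in which labeled reviews go in and a word-based classifier comes out.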

Thumbs up? Sentiment Classification using Machine Learning Techniques

Bo Pang, Lillian Lee, Shivakumar Vaithyanathan

1 Jan 2002

TL;DR: In this paper, the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, was considered and three machine learning methods (Naive Bayes, maximum entropy classification, and support vector machines) were employed.

Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

6,980 citations

Year	Papers
2026	17
2025	1,009
2024	1,604
2023	3,532
2022	4,677
2021	698

Related Topics (5)

Performance Metrics