Eager learning

Papers

Proceedings Article•10.3115/1599081.1599146•

Authorship Attribution and Verification with Many Authors and Limited Data

Kim Luyckx¹, Walter Daelemans¹•Institutions (1)

18 Aug 2008

TL;DR: The paper shows how a large number of authors affects feature selection and learning, and shows that a memory-based learning approach is robust for authorship attribution and verification with many authors and limited training data when compared to eager learning methods such as SVMs and maximum entropy learning.

Abstract: Most studies in statistical or machine learning based authorship attribution focus on two or a few authors. This leads to an overestimation of the importance of the features extracted from the training data and found to be discriminating for these small sets of authors. Most studies also use sizes of training data that are unrealistic for situations in which stylometry is applied (e.g., forensics), and thereby overestimate the accuracy of their approach in these situations. A more realistic interpretation of the task is as an authorship verification problem that we approximate by pooling data from many different authors as negative examples. In this paper, we show, on the basis of a new corpus with 145 authors, what the effect is of many authors on feature selection and learning, and show robustness of a memory-based learning approach in doing authorship attribution and verification with many authors and limited training data when compared to eager learning methods such as SVMs and maximum entropy learning.

189 citations

Journal Article•10.1023/A:1006506017891•

IGTree: using trees for compression and classification in lazy learning algorithms

Walter Daelemans¹, Antal van den Bosch², Ton Weijters²•Institutions (2)

Tilburg University¹, Maastricht University²

01 Feb 1997-Artificial Intelligence Review

TL;DR: IGTree is a useful algorithm for problems characterized by a large number of training instances described by symbolic features with sufficiently differing information gain values; on two complex linguistic tasks it achieved similar or better generalization accuracy than alternative lazy learning and decision tree approaches.

Abstract: We describe the IGTree learning algorithm, which compresses an instance base into a tree structure. The concept of information gain is used as a heuristic function for performing this compression. IGTree produces trees that, compared to other lazy learning approaches, reduce storage requirements and the time required to compute classifications. Furthermore, we obtained similar or better generalization accuracy with IGTree when trained on two complex linguistic tasks, viz. letter–phoneme transliteration and part-of-speech tagging, when compared to alternative lazy learning and decision tree approaches (viz., IB1, information-gain-weighted IB1, and C4.5). A third experiment, with the task of word hyphenation, demonstrates that when the mutual differences in information gain of features are too small, IGTree as well as information-gain-weighted IB1 perform worse than IB1. These results indicate that IGTree is a useful algorithm for problems characterized by the availability of a large number of training instances described by symbolic features with sufficiently differing information gain values.
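
The information gain heuristic mentioned in this abstract can be illustrated with a minimal sketch. This is not the IGTree algorithm itself, only the standard entropy-based information gain computation for symbolic features; the function names and the toy data are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(instances, labels, feature_index):
    """Entropy reduction obtained by partitioning on one symbolic feature."""
    partitions = {}
    for instance, label in zip(instances, labels):
        partitions.setdefault(instance[feature_index], []).append(label)
    # Weighted average entropy of the partitions induced by the feature.
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
instances = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["pos", "pos", "neg", "neg"]

gains = [information_gain(instances, labels, i) for i in range(2)]
# Order features from most to least informative.
ranking = sorted(range(2), key=lambda i: gains[i], reverse=True)
```

In this toy case the two features have clearly different gains, which is the situation the abstract describes as favorable; when the gains are nearly equal, such a ranking carries little information.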

186 citations

Journal Article•10.1016/J.CSBJ.2021.10.027•

Automatic cell type identification methods for single-cell RNA sequencing.

Bingbing Xie¹, Qin Jiang², Antonio M. Mora³, Xuri Li¹•Institutions (3)

Sun Yat-sen University¹, Nanjing Medical University², Guangzhou Institutes of Biomedicine and Health³

01 Jan 2021-Computational and structural biotechnology journal

TL;DR: In this paper, the authors discuss and evaluate thirty-two published automatic methods for scRNA-seq data analysis in terms of their prediction accuracy, F1-score, unlabeling rate and running time.

Abstract: Single-cell RNA sequencing (scRNA-seq) has become a powerful tool for scientists of many research disciplines due to its ability to elucidate the heterogeneous and complex cell-type compositions of different tissues and cell populations. Traditional cell-type identification methods for scRNA-seq data analysis are time-consuming and knowledge-dependent for manual annotation. By contrast, automatic cell-type identification methods may have the advantages of being fast, accurate, and more user friendly. Here, we discuss and evaluate thirty-two published automatic methods for scRNA-seq data analysis in terms of their prediction accuracy, F1-score, unlabeling rate and running time. We highlight the advantages and disadvantages of these methods and provide recommendations of method choice depending on the available information. The challenges and future applications of these automatic methods are further discussed. In addition, we provide a free scRNA-seq data analysis package encompassing the discussed automatic methods to help the easy usage of them in real-world applications.

70 citations

Book Chapter•10.1007/978-981-15-1097-7_47•

Emotion Detection Framework for Twitter Data Using Supervised Classifiers

Matla Suhasini, Badugu Srinivasu

1 Jan 2020

TL;DR: Emotion is detected in Twitter messages, which provide a rich ensemble of human emotions; Naive Bayes and k-nearest neighbor algorithms are used to classify the messages into four emotional categories.

Abstract: “The task of emotion detection usually involves the analysis of text. Humans show universal consistency in identifying emotions however shows an excellent deal of variation between individuals in their abilities.” We detect emotion in Twitter messages, as they provide a rich ensemble of human emotions. We use two machine learning algorithms, Naive Bayes (NB) and k-nearest neighbor (KNN), to detect the emotion of a Twitter message and then classify the messages into four emotional categories. We also carried out a comparative study of these two supervised machine learning algorithms; the eager learning classifier (NB) performed well compared with the lazy learning classifier (KNN).
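
The eager/lazy distinction drawn in this abstract can be sketched in a few lines. The code below is not the authors' implementation: the class names, the word-overlap similarity used for KNN, the add-one smoothing, and the two-label toy data are all assumptions chosen for illustration. The point is only where the work happens: the eager classifier builds a model at training time, while the lazy one stores instances and defers computation to query time.

```python
from collections import Counter, defaultdict
from math import log


class EagerNaiveBayes:
    """Eager: fit() compresses the data into counts; raw instances are not kept."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for words, label in zip(docs, labels):
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, words):
        total = sum(self.class_counts.values())

        def score(label):
            n = sum(self.word_counts[label].values())
            s = log(self.class_counts[label] / total)
            for w in words:
                # Add-one smoothing so unseen words do not zero out a class.
                s += log((self.word_counts[label][w] + 1) / (n + len(self.vocab)))
            return s

        return max(self.class_counts, key=score)


class LazyKNN:
    """Lazy: fit() only memorizes instances; all comparison happens in predict()."""

    def __init__(self, k=1):
        self.k = k

    def fit(self, docs, labels):
        self.memory = [(set(words), label) for words, label in zip(docs, labels)]
        return self

    def predict(self, words):
        query = set(words)
        # Similarity here is plain word overlap (an illustrative assumption).
        nearest = sorted(
            self.memory, key=lambda item: len(query & item[0]), reverse=True
        )[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy data with two emotion labels.
docs = [["happy", "great"], ["love", "great"], ["sad", "awful"], ["angry", "awful"]]
labels = ["joy", "joy", "sadness", "sadness"]
query = ["great", "day"]

nb_pred = EagerNaiveBayes().fit(docs, labels).predict(query)
knn_pred = LazyKNN(k=1).fit(docs, labels).predict(query)
```

Both classifiers agree on this toy query; the practical difference is that the lazy learner's prediction cost grows with the number of stored instances, while the eager learner pays its cost once during training.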

60 citations

Proceedings Article•10.1109/IEMTRONICS52119.2021.9422649•

Ensemble of Supervised and Unsupervised Learning Models to Predict a Profitable Business Decision

Maryam Heidari¹, Samira Zad², Setareh Rafatirad¹•Institutions (2)

George Mason University¹, Florida International University²

21 Apr 2021

TL;DR: In this article, the authors carried out a comprehensive analysis and study of seven machine learning algorithms for rent prediction, including Linear Regression, Multilayer Perceptron, Random Forest, KNN, Locally Weighted Learning, SMO, and KStar algorithms.

Abstract: Real-estate rent prediction in housing market analysis plays a key role in calculating the Rate of Return, a salient index used to evaluate real-estate investment options. Accurate rent prediction in real estate investment can help in generating capital gains and guaranteeing financial success. In this paper, we carry out a comprehensive analysis and study of seven machine learning algorithms for rent prediction, including Linear Regression, Multilayer Perceptron, Random Forest, KNN, Locally Weighted Learning, SMO, and KStar algorithms. We train a new model for the US territory, covering three house types: single-family, townhouse, and condo. Each data instance in the dataset has 21 internal attributes (e.g., area space, price, number of bedrooms/bathrooms, rent, school rating, and so forth). A subset of the collected features is selected by filter methods for the prediction models. We also employ a hierarchical clustering approach to cluster the data based on two factors: house type and the average rent estimate of zip codes. The empirical results suggest that the rent prediction models based on lazy learning algorithms lead to higher accuracy and lower prediction error compared to eager learning methods.

56 citations

Year	Papers
2021	6
2020	3
2018	3
2016	2
2015	1
2014	1
