Proceedings Article, DOI: 10.1109/ICMLC.2007.4370742
A Fast KNN Algorithm for Text Categorization
Yu Wang, Zheng-Ou Wang +1 more
- 29 Oct 2007
- Vol. 6, pp. 3436-3441
71
TL;DR: A method called TFKNN (Tree-Fast-K-Nearest-Neighbor) is presented, which can quickly find the exact k nearest neighbors and greatly reduces the time spent on similarity computation.
Abstract: The KNN algorithm is a simple, effective, non-parametric method for text categorization. However, traditional KNN has a serious drawback: similarity computation is very time-consuming. When KNN is applied to text categorization with high-dimensional features and a very large number of samples, it becomes impractical. In this paper, a method called TFKNN (Tree-Fast-K-Nearest-Neighbor) is presented, which can quickly find the exact k nearest neighbors. The method builds an SSR tree for searching the k nearest neighbors, in which all child nodes of each non-leaf node are ranked according to the distances between their central points and the central point of their parent. The tree is then used to narrow the search scope, and consequently the time spent on similarity computation is greatly reduced.
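The idea in the abstract (group training documents under centre points, rank children by their distance to the parent centre, then skip whole groups that cannot contain a neighbor) can be illustrated with a minimal Python sketch. This is not the authors' TFKNN or SSR tree: the one-level tree, the naive seed-based clustering, and the names `build_tree`, `knn_search` and `classify` are hypothetical. It assumes documents are dense, L2-normalized term-weight vectors (e.g. TF-IDF), so Euclidean distance can stand in for cosine similarity (for unit vectors, distance squared equals 2 minus 2 times the cosine). The stored child-to-parent distance feeds a triangle-inequality lower bound; that is this sketch's assumption about how the ranking pays off, since the abstract does not spell it out.

```python
import heapq
import math
import random
from collections import Counter


def normalize(v):
    """Scale a vector to unit length so Euclidean distance tracks cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def dist(a, b):
    """Euclidean distance; for unit vectors, dist^2 = 2 - 2 * cosine similarity."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_tree(docs, n_children):
    """Hypothetical one-level tree: a root centre plus n_children leaf clusters.

    Each child stores (distance to root centre, centre, radius, member indices),
    and children are ranked by their distance to the parent (root) centre.
    """
    root = mean(docs)
    seeds = docs[:: max(1, len(docs) // n_children)][:n_children]
    groups = [[] for _ in seeds]
    for i, d in enumerate(docs):  # assign each document to its nearest seed
        j = min(range(len(seeds)), key=lambda s: dist(d, seeds[s]))
        groups[j].append(i)
    children = []
    for members in groups:
        if not members:
            continue
        centre = mean([docs[i] for i in members])
        radius = max(dist(docs[i], centre) for i in members)
        children.append((dist(centre, root), centre, radius, members))
    children.sort(key=lambda c: c[0])  # rank children by distance to the parent centre
    return root, children


def knn_search(query, docs, tree, k):
    """Exact k nearest neighbours, skipping whole children via the triangle inequality.

    Returns (list of (distance, doc index) sorted by distance, distance computations).
    """
    root, children = tree
    d_root = dist(query, root)
    computed = 1
    # Cheap bound using only stored distances: for every member x of a child,
    # d(q, x) >= |d(q, root) - d(centre, root)| - radius.
    order = sorted(
        ((abs(d_root - d_cp) - r, centre, r, members)
         for d_cp, centre, r, members in children),
        key=lambda t: t[0],
    )
    heap = []  # max-heap of (-distance, index) holding the best k found so far

    def kth():
        return -heap[0][0] if len(heap) == k else math.inf

    for lb, centre, r, members in order:
        if lb > kth():
            break  # children are sorted by lb, so no later child can contain a neighbour
        d_c = dist(query, centre)
        computed += 1
        if d_c - r > kth():
            continue  # tighter bound d(q, x) >= d(q, centre) - radius rules this child out
        for i in members:
            d = dist(query, docs[i])
            computed += 1
            if len(heap) < k:
                heapq.heappush(heap, (-d, i))
            elif d < kth():
                heapq.heapreplace(heap, (-d, i))
    return sorted((-nd, i) for nd, i in heap), computed


def classify(query, docs, labels, tree, k):
    """Majority vote over the k nearest training documents."""
    neighbours, _ = knn_search(query, docs, tree, k)
    return Counter(labels[i] for _, i in neighbours).most_common(1)[0][0]


if __name__ == "__main__":
    random.seed(0)
    docs = [normalize([random.random() for _ in range(20)]) for _ in range(500)]
    tree = build_tree(docs, n_children=20)
    q = normalize([random.random() for _ in range(20)])
    found, cost = knn_search(q, docs, tree, k=5)
    brute = sorted((dist(q, x), i) for i, x in enumerate(docs))[:5]
    assert [i for _, i in found] == [i for _, i in brute]  # same neighbours as brute force
    print(cost, "distance computations vs", len(docs), "for brute force")
```

Because each child is skipped only when a lower bound proves that none of its members can beat the current k-th best distance, the result matches a brute-force scan. How many distance computations are saved depends on how well the clustering separates the data; on random high-dimensional vectors the savings may be small.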
Citations
An improved K-nearest-neighbor algorithm for text categorization
TL;DR: An improved KNN algorithm is proposed that builds the classification model by combining a constrained one-pass clustering algorithm with KNN text categorization. It substantially reduces text similarity computation and outperforms the state-of-the-art KNN, Naive Bayes and Support Vector Machine classifiers.
An Improved KNN Text Classification Algorithm Based on Clustering
Yong Zhou, Youwen Li, Shixiong Xia +2 more
TL;DR: The simulation results show that the proposed algorithm not only effectively reduces the actual number of training samples and lowers the computational complexity, but also improves the accuracy of the KNN text classification algorithm.
An Improved k-Nearest Neighbor Classification Using Genetic Algorithm
N. Suguna, K. Thanushkodi +1 more
- 01 Jan 2010
TL;DR: In this article, an improved version of KNN is proposed in which a genetic algorithm (GA) is combined with KNN to improve its classification performance. Instead of considering all the training samples and taking the k neighbors, the GA is employed to select the k neighbors directly, and the distances are then calculated to classify the test samples.
Using KNN algorithm for classification of textual documents
Aiman Moldagulova, Rosnafisah Sulaiman +1 more
- 17 May 2017
TL;DR: An approach for building a machine learning system in R that uses the K-Nearest Neighbors (KNN) method to classify textual documents, addressing the challenge of finding a proper value of k, the number of neighbors.
Techniques for text classification: Literature review and current trends
Rajni Jindal, Ruchika Malhotra, Abha Jain +2 more
- 01 Dec 2015
TL;DR: This paper studies existing work in the area of text classification and summarizes it comprehensively and succinctly, giving a fair evaluation of the progress made in this field to date.
References
A re-examination of text categorization methods
Yiming Yang, Xin Liu +1 more
- 01 Aug 1999
TL;DR: The results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small, and that all the methods perform comparably when categories have over 300 instances.
Similarity indexing with the SS-tree
David A. White, Ramesh Jain +1 more
- 26 Feb 1996
TL;DR: This work describes the fundamental types of "similarity queries" that should be supported and proposes a new dynamic structure for similarity indexing called the similarity search tree or SS-tree, which performs better than the R*-tree in nearly every test.
Erratum: Facilitated transport of a Dpp/Scw heterodimer by Sog/Tsg leads to robust patterning of the Drosophila blastoderm embryo (Cell (2005) 120 (873-886) )
TL;DR: It is demonstrated mathematically that heterodimer levels can be less sensitive to changes in gene dosage than homodimer levels, thereby providing further selective advantage for using heterodimers as morphogens.
Efficient k-NN search on vertically decomposed data
Arjen P. de Vries, Nikos Mamoulis, Niels Nes, Martin L. Kersten +3 more
- 03 Jun 2002
TL;DR: The suggested (physical) database design is well suited to a novel variant of branch-and-bound search that quickly reduces the high-dimensional space to a small candidate set.
Related Papers (5)
A. G. Jivani
- 21 Feb 2013
Gongde Guo, Hui Wang, David Bell, Yaxin Bi, Kieran Greer +4 more
- 01 Mar 2006
Pascal Soucy, G.W. Mineau +1 more
- 29 Nov 2001