Multilingual Text Classification
Sonam Mittal, Praveen Dhyani and one other author
Abstract: Identifying the language in which a document is written is typically the first step in most Natural Language Processing tasks. Among the wide variety of language identification methods discussed in the literature, those employing the Cavnar and Trenkle (1994) approach to text categorization, based on character n-gram frequencies, have been particularly successful. Multilingual text classification using n-gram techniques seems to have produced very interesting results in text categorization, working well not only for languages such as English and French but equally well for languages that are harder to classify, such as Spanish, Italian, German and Russian.
Keywords: Multilingual Text, N-gram, tf-idf, frequency, similarity, classification, prediction, classifier
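The abstract names the Cavnar and Trenkle (1994) character n-gram approach without showing how it works. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: it builds a ranked profile of character n-grams (n = 1 to 5, keeping the 300 most frequent, the settings usually associated with Cavnar and Trenkle) and assigns a text to the category whose profile has the smallest "out-of-place" rank distance. The function names `ngram_profile`, `out_of_place` and `classify`, the underscore padding of words and the alphabetical tie-breaking are hypothetical choices made for this example; the tf-idf weighting listed in the keywords is not covered here.

```python
import re
from collections import Counter

# Hypothetical helpers: a sketch of Cavnar & Trenkle-style n-gram profiles,
# not the code used in the paper.


def ngram_profile(text: str, max_n: int = 5, top_k: int = 300) -> list[str]:
    """Return the top_k character n-grams (n = 1..max_n), most frequent first."""
    counts = Counter()
    # Tokens are runs of Unicode letters; digits and punctuation act as separators.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"  # assumption: underscores mark the word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Highest count first; ties broken alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile: list[str], cat_profile: list[str]) -> int:
    """Sum of rank differences; n-grams missing from the category get a maximum penalty."""
    cat_rank = {gram: rank for rank, gram in enumerate(cat_profile)}
    max_penalty = len(cat_profile)
    distance = 0
    for rank, gram in enumerate(doc_profile):
        if gram in cat_rank:
            distance += abs(rank - cat_rank[gram])
        else:
            distance += max_penalty
    return distance


def classify(text: str, profiles: dict[str, list[str]]) -> str:
    """Pick the category whose profile is closest to the document's profile."""
    doc_profile = ngram_profile(text)
    return min(profiles, key=lambda label: out_of_place(doc_profile, profiles[label]))
```

In use, each language's profile would be built once from a reasonably large training corpus, e.g. `profiles = {"english": ngram_profile(english_corpus), "french": ngram_profile(french_corpus)}` (the corpus variables are placeholders), after which `classify(new_text, profiles)` returns the label with the smallest distance. Comparing ranks rather than raw counts keeps profiles comparable across documents of different lengths.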
Citations
Deep-BERT: Transfer Learning for Classifying Multilingual Offensive Texts on Social Media
TL;DR: In this paper, a detection system has been developed for both monolingual and multilingual offensive texts by combining a deep convolutional neural network with Bidirectional Encoder Representations from Transformers (Deep-BERT).
A multilingual offensive language detection method based on transfer learning from transformer fine-tuning model
TL;DR: This paper proposes an effective approach based on Bidirectional Encoder Representations from Transformers (BERT), a model that has shown great potential in capturing the semantics and contextual information within texts.
A multilingual offensive language detection method based on transfer learning from transformer fine-tuning model
TL;DR: This study tackles the Multilingual Offensive Language Detection (MOLD) task using transfer learning and a fine-tuning phase, with an approach based on Bidirectional Encoder Representations from Transformers (BERT).
Fine-Tuning Transformer Models Using Transfer Learning for Multilingual Threatening Text Identification
Muhammad Rehan, Muhammad Shahid Iqbal Malik, Mona Mamdouh Jamjoom and two other authors
TL;DR: The proposed methodology outperformed the baselines and set benchmark performance; the RoBERTa model performed best, reaching 92% accuracy and a 90% macro F1-score with the joint multilingual approach.
Multilingual Hope Speech Detection: A Robust Framework Using Transfer Learning of Fine-tuning Roberta Model
Muhammad Shahid Iqbal Malik, Anna B. Nazarova, Mona Mamdouh Jamjoom, Dmitry I. Ignatov and three other authors
TL;DR: This study proposes a multilingual hope speech detection framework using transfer learning and fine-tuning of the RoBERTa model, achieving benchmark performance in English and Russian with 94% accuracy and an 80.24% F1-score, outperforming baselines and addressing low-resource languages.
References
Is Naïve Bayes a Good Classifier for Document Classification
Hung Hum
01 Jan 2011
TL;DR: Results show that Naïve Bayes outperforms several common classifiers (such as decision trees, neural networks and support vector machines) in terms of accuracy and computational efficiency.
Multilingual text categorization using Character N-gram
Makoto Suzuki, Naohide Yamagishi, Yi-Ching Tsai, Shigeichi Hirasawa and three other authors
25 Jun 2008
TL;DR: The proposed method is language-independent and adopts character N-grams as feature terms, improving the particularity of FRAM; it offers a new perspective and shows excellent potential.