Study of Information-theoretic Properties of Arabic Based on Word Entropy and Zipf's Law
2
TL;DR: The aim of this paper is to study the word statistics of the Arabic Language with the intention of estimating the information content of Arabic based on word entropy.
Abstract: Natural languages have very complicated structures but are highly redundant. Statistical studies of a language are extremely important in numerous fields of knowledge, including Education, Linguistics, Computers and Communications. The aim of this paper is to study the word statistics of the Arabic Language with the intention of estimating the information content of Arabic based on word entropy. Actual statistics of frequencies of Arabic based on a 700,000-word sample will be used to demonstrate the applicability of Zipf's Law to the Arabic Language. Word and letter entropy and redundancy of Arabic are then deduced and compared with corresponding values of English.
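The quantities named in the abstract can be illustrated with a minimal sketch. It assumes a pre-tokenized word list, estimates word entropy from unigram relative frequencies, fits a Zipf slope by ordinary least squares on the log-log rank-frequency data, and defines redundancy relative to the maximum entropy of an equiprobable symbol set. The function names are hypothetical, and the paper's own estimation procedure and definitions may differ.

```python
import math
from collections import Counter


def word_entropy(tokens):
    """Shannon entropy in bits per word, from unigram relative frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is what Zipf's law predicts. Requires at least two
    distinct words, otherwise the fit is undefined.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def redundancy(entropy_bits, n_symbols):
    """Assumed definition: 1 - H / log2(n_symbols), with n_symbols > 1."""
    return 1.0 - entropy_bits / math.log2(n_symbols)
```

On a real corpus the tokens would come from a tokenizer suited to Arabic script; the choice of tokenizer changes the vocabulary and therefore the entropy estimate.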
Citations
Arabic vs. English: Comparative Statistical Study
TL;DR: Two large Arabic and English corpora collected from newswire text data, consisting of 600 million words each, are utilized and the distribution of word length, paragraph length, punctuation marks, unigrams, bigrams and trigrams and their statistical effect are presented.
8
Dotless Representation of Arabic Text: Analysis and Modeling
Maged S. Al-shaibani, Irfan Ahmad +1 more
TL;DR: A novel dotless representation of Arabic text as an alternative to the standard Arabic text representation is presented and statistical and neural language models are constructed using the various text corpora and tokenization techniques.
References
Prediction and entropy of printed English
TL;DR: A new method of estimating the entropy and redundancy of a language is described, which exploits the knowledge of the language statistics possessed by those who speak the language, and depends on experimental results in prediction of the next letter when the preceding text is known.
2.9K
First-, second-, and third-order entropies of Arabic text (Corresp.)
TL;DR: The first-, second-, and third-order entropies of written Arabic text are calculated, showing that the text exhibits a redundancy of more than 50 percent.
18
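As an illustration of what n-th order letter entropies measure, the sketch below uses one common convention: the n-th order value is the entropy of n-letter blocks minus the entropy of (n-1)-letter blocks, so the first-order value is the plain single-letter entropy. This convention is an assumption here, the function names are hypothetical, and the cited correspondence may define its orders differently.

```python
import math
from collections import Counter


def block_entropy(text, n):
    """Entropy in bits of overlapping n-character blocks (0.0 when n == 0)."""
    if n == 0:
        return 0.0
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def nth_order_entropy(text, n):
    """Assumed convention: H(n-blocks) - H((n-1)-blocks), in bits per letter."""
    return block_entropy(text, n) - block_entropy(text, n - 1)
```

Estimates for larger n need much longer samples, because the number of possible blocks grows quickly and short texts bias the result.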
Computers and the Arabic language (Book)
Pierre A. MacKay, 01 Jul 1990
TL;DR: Impact of computers on the development of the third world, R.P.Haton a survey of bilingual peripherals, N.Harfouch and S.Schneider.
15
Zipf's law and entropy (Corresp.)
TL;DR: The estimate of the entropy of a language by assuming that the word probabilities follow Zipf's law is discussed briefly and the vocabulary size and entropy per word are corrected.
13
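The idea of estimating entropy by assuming Zipf-distributed word probabilities can be sketched as follows. The sketch assumes the simplest form of the law, p(r) = 1 / (r * H_N) for ranks r = 1..N, where H_N is the N-th harmonic number that normalizes the distribution. The function name is hypothetical, and the corrections discussed in the cited correspondence are not reproduced.

```python
import math


def zipf_entropy(vocab_size):
    """Entropy in bits per word when p(r) = 1 / (r * H_N), r = 1..N."""
    harmonic = sum(1.0 / r for r in range(1, vocab_size + 1))
    probs = [1.0 / (r * harmonic) for r in range(1, vocab_size + 1)]
    return -sum(p * math.log2(p) for p in probs)
```

Under this assumption the entropy depends only on the vocabulary size, which is why the choice of N matters so much to the resulting estimate.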