Top 9 papers presented at Artificial Intelligence and Natural Language in 2016

Proceedings Article•

Lexical, morphological and semantic correlates of the dark triad personality traits in Russian Facebook texts

Polina Panicheva, Yanina Ledovaya, Olga Bogolyubova¹•Institutions (1)

1 Nov 2016

TL;DR: Morphological and semantic analysis is applied to investigate the relationship between the Dark traits and their linguistic manifestation in social network texts. Identifying the correlated features is a step towards automatic Dark trait prediction and early detection of potentially harmful mental states.

Abstract: The presented project is intended to make use of growing amounts of textual data in social networks in the Russian language, in order to find linguistic correlates of the Dark Triad personality traits, comprising non-clinical Narcissism, Machiavellianism and Psychopathy. The background for the investigation includes, on the one hand, psychological research on these phenomena and their measurement instruments, and on the other hand, recent advances in computational stylometry and text-based author profiling. The measures for these psychological phenomena are provided by recognized self-report psychological surveys adapted to Russian. Morphological and semantic analysis are applied to investigate the relationship between the Dark traits and their linguistic manifestation in social network texts. Significant morphological and semantic correlates of Narcissism, Machiavellianism and Psychopathy are identified and compared to respective advances in English author profiling. In order to deepen our understanding of the relation between these psychological characteristics and natural language use, the identified linguistic features are interpreted in terms of the fine-grained factor structure of the Dark traits. Identifying correlated features is a step towards automatic Dark trait prediction and early detection of the potentially harmful mental states.

20 citations

Proceedings Article•

Measuring influencers in Twitter ad-hoc discussions: active users vs. internal networks in the discourse on Biryuliovo bashings in 2013

Svetlana S. Bodrunova¹, Ivan S. Blekanov¹, Alexey Maksimov¹•Institutions (1)

Saint Petersburg State University¹

1 Nov 2016

TL;DR: It is shown that users who post or even get commented most do not make it to the positions of most 'central' users by network metrics, and it is demonstrated that users that rank high by betweenness and PageRank centrality form circles of reciprocal commenting that show the social cleavage wider than the discussion itself.

Abstract: Despite disputable possibility of extension of analysis of social relations on Twitter to real life, Twitter discussions are still being under attention of scholars studying structures and meanings of news- and issue-based ad-hoc public discourse. One of the socially relevant aspects of Twitter studies is that of influencers - accounts that produce impact, either inside or outside Twitter. But there is still no agreement in the research community on how to define and measure who is an influencer: either by 'absolute figures' or by network analysis metrics; this issue is even rarely discussed. Politically, today's mediatized public sphere where traditional media play the role of information hubs is highly uneven in terms of access to opinion expression; it privileges institutional players, including political elites, corporations, and media themselves. Hopes that Twitter would provide a more equal space for public deliberation are still not proven well enough. Using web crawling and manual assessment of Twitter ad-hoc discussion on the Biryulyovo bashings of 2013, we show that users who post or even get commented most do not make it to the positions of most 'central' users by network metrics. We also demonstrate that users that rank high by betweenness and PageRank centrality form circles of reciprocal commenting that show the social cleavage wider than the discussion itself.

16 citations
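
The contrast the abstract draws between "absolute figures" and network metrics can be illustrated with a minimal sketch. It is not the authors' pipeline: the graph below is hypothetical toy data, only PageRank is shown (betweenness is omitted), and the `pagerank` helper is a plain power-iteration implementation written for this illustration.

```python
from collections import defaultdict

def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph given as (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

# Hypothetical toy data: an edge (a, b) means "a commented on b".
edges = [
    ("spammer", "x1"), ("spammer", "x2"), ("spammer", "x3"), ("spammer", "x4"),
    ("x1", "hub"), ("x2", "hub"), ("x3", "hub"), ("x4", "hub"), ("hub", "x1"),
]

# "Absolute figure": number of comments each account wrote.
activity = defaultdict(int)
for src, _ in edges:
    activity[src] += 1

ranks = pagerank(edges)
most_active = max(activity, key=activity.get)
most_central = max(ranks, key=ranks.get)
print(most_active, most_central)
```

In this toy graph the account that writes the most comments receives none, so it ends up with the lowest PageRank, while the account that others reply to ranks highest.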

Proceedings Article•

Predicting the age of social network users from user-generated texts with word embeddings

Anton Alekseev, Sergey I. Nikolenko¹•Institutions (1)

Steklov Mathematical Institute¹

1 Nov 2016

TL;DR: The efficiency of age prediction algorithms based on word2vec word embeddings is evaluated in a comprehensive experimental comparison of these algorithms with each other and with classical baseline approaches.

Abstract: Many web-based applications such as advertising or recommender systems often critically depend on the demographic information, which may be unavailable for new or anonymous users. We study the problem of predicting demographic information based on user-generated texts on a Russian-language dataset from a large social network. We evaluate the efficiency of age prediction algorithms based on word2vec word embeddings and conduct a comprehensive experimental evaluation, comparing these algorithms with each other and with classical baseline approaches.

12 citations
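
The abstract does not say how word embeddings are turned into user-level features. One common option, shown here purely as an assumption, is to average the vectors of the words a user wrote and feed the result to a regressor or classifier. The vectors and the `mean_embedding` helper below are hypothetical; real word2vec vectors would have hundreds of dimensions.

```python
def mean_embedding(tokens, vectors):
    """Average the embeddings of known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

# Hypothetical 2-dimensional vectors standing in for trained word2vec embeddings.
toy_vectors = {
    "school": [1.0, 0.0],
    "homework": [0.8, 0.2],
    "pension": [0.0, 1.0],
}

# Out-of-vocabulary tokens ("unknown") are skipped.
doc_vector = mean_embedding(["school", "homework", "unknown"], toy_vectors)
print(doc_vector)
```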

Proceedings Article•

Multiword expressions in Russian thesauri RuThes and RuWordnet

Natalia V. Loukachevitch¹, German Lashevich²•Institutions (2)

Bauman Moscow State Technical University¹, Kazan Federal University²

1 Nov 2016

TL;DR: Many of the described expressions may look like compositional expressions but have specific relations that can be useful in applications, and it is proposed to automatically introduce additional relations for their better representation.

Abstract: We present the types of multiword expressions included into the thesaurus of Russian language RuThes. Many of these expressions may look like compositional expressions but have specific relations that can be useful in applications. The relation system of the RuThes thesaurus allows natural description of relations between an expression and its components if necessary. Transforming the RuThes knowledge into the Princeton WordNet structure for creating Russian wordnet (RuWordNet), we transfer also all the described expressions into the new resource and propose to automatically introduce additional relations for their better representation.

11 citations

Proceedings Article•

Improving neural network models for natural language processing in Russian with synonyms

Ruslan Galinsky¹, Anton Alekseev¹, Sergey I. Nikolenko¹•Institutions (1)

Steklov Mathematical Institute¹

1 Nov 2016

TL;DR: This work suggests a data augmentation method based on extending a given dataset with synonyms for the words appearing there and applies this approach to the morphologically rich Russian language and shows improvements for modern neural network NLP models on standard tasks such as sentiment analysis.

Abstract: Recent advances in deep learning for natural language processing achieve and improve over state of the art results in many natural language processing tasks. One problem with neural network models, however, is that they require large datasets, including large labeled datasets for the corresponding problems. In this work, we suggest a data augmentation method based on extending a given dataset with synonyms for the words appearing there. We apply this approach to the morphologically rich Russian language and show improvements for modern neural network NLP models on standard tasks such as sentiment analysis.

10 citations
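
The idea of extending a dataset with synonyms can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the synonym dictionary is hypothetical, English tokens are used for readability, and the morphological agreement that Russian would require is ignored. The `augment_with_synonyms` helper and its `limit` parameter are invented for this example.

```python
from itertools import product

def augment_with_synonyms(tokens, synonyms, limit=10):
    """Return up to `limit` variants of `tokens`, swapping words for listed synonyms."""
    # Each position offers the original word followed by its synonyms, if any.
    options = [[tok] + synonyms.get(tok, []) for tok in tokens]
    variants = []
    for combo in product(*options):
        variants.append(list(combo))
        if len(variants) >= limit:
            break
    return variants

# Hypothetical synonym dictionary; a real one would come from a thesaurus.
toy_synonyms = {"good": ["great", "fine"], "film": ["movie"]}

variants = augment_with_synonyms(["a", "good", "film"], toy_synonyms)
for variant in variants:
    print(" ".join(variant))
```

Each generated variant would keep the label of the sentence it came from, so a labeled training set grows without extra annotation. The first variant is always the unchanged original.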

Proceedings Article•

Towards cluster validity index evaluation and selection

Andrey Filchenkov¹, Sergey Muravyov¹, Vladimir Parfenov¹•Institutions (1)

Saint Petersburg State University of Information Technologies, Mechanics and Optics¹

1 Nov 2016

TL;DR: This work introduces four quality measures for CVI evaluation and suggests an approach for the best CVI prediction for a given dataset based on meta-learning.

Abstract: In this work, we address the hard clustering problem. We study how well clustering algorithm efficacy measures (clustering validity indices) can reflect the clustering quality. We use assessors' estimations for cluster partition adequacy as the ground truth and explain why this is the only measure that can be used in this quality. We compare different clustering validity indices and show that none of them can be the universal, reflecting quality for each cluster partition. To do so, we introduce four quality measures for CVI evaluation. Also, we suggest an approach for the best CVI prediction for a given dataset based on meta-learning.

10 citations
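
For readers unfamiliar with clustering validity indices, the sketch below computes one widely used CVI, the mean silhouette width, for two partitions of the same points. The abstract does not list which indices the authors compare, so the choice of silhouette is an assumption made for illustration, and the points are hypothetical toy data. The function expects at least two clusters.

```python
from math import dist

def silhouette(points, labels):
    """Mean silhouette width; higher values suggest better-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two visually obvious groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good_partition = [0, 0, 1, 1]  # groups nearby points together
bad_partition = [0, 1, 0, 1]   # mixes the two groups

print(silhouette(points, good_partition), silhouette(points, bad_partition))
```

On this toy data the index agrees with intuition. The abstract's point is that no single index does so for every partition, which is why the authors use assessors' judgments as the ground truth.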

Showing papers presented at "Artificial Intelligence and Natural Language in 2016"

Lexical, morphological and semantic correlates of the dark triad personality traits in Russian Facebook texts

Measuring influencers in Twitter ad-hoc discussions: active users vs. internal networks in the discourse on Biryuliovo bashings in 2013

Predicting the age of social network users from user-generated texts with word embeddings

Multiword expressions in Russian thesauri RuThes and RuWordnet

Improving neural network models for natural language processing in Russian with synonyms

Towards cluster validity index evaluation and selection

A general method applicable to the search for anglicisms in Russian social network texts

Speech analysis and synthesis systems for the Tatar language

Relational machine learning author disambiguation