Top 32 papers presented at Artificial Intelligence and Natural Language in 2020

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376787•

Optimization of Prediction Method of Chronic Kidney Disease Using Machine Learning Algorithm

Pronab Ghosh¹, F. M. Javed Mehedi Shamrat¹, Shahana Shultana¹, Saima Afrin¹, Atqiya Abida Anjum¹, Aliza Ahmed Khan¹•Institutions (1)

Daffodil International University¹

18 Nov 2020

TL;DR: This paper compares four approaches, Support Vector Machine (SVM), AdaBoost (AB), Linear Discriminant Analysis (LDA), and Gradient Boosting (GB), for accurate prediction of chronic kidney disease.

Abstract: Chronic kidney disease (CKD), a slow-developing and late-diagnosed disease, is one of the most important causes of mortality in the medical sector today. Because of this, a significant number of men and women suffer each year from the lack of early screening systems and appropriate care. However, patients’ lives can be saved by detecting the disease quickly at its earliest stage. In addition, machine learning algorithms can detect the stage of this deadly disease much more quickly given a reliable dataset. In this paper, the overall study is implemented with four reliable approaches, Support Vector Machine (henceforth SVM), AdaBoost (henceforth AB), Linear Discriminant Analysis (henceforth LDA), and Gradient Boosting (henceforth GB), to obtain highly accurate predictions. These algorithms are applied to an online dataset from the UCI Machine Learning Repository. The highest accuracy, about 99.80%, is obtained with the Gradient Boosting (GB) classifier. Different performance evaluation metrics are also reported to show the outcomes. Finally, the most efficient and optimized algorithms for the task can be selected based on these benchmarks.

64 citations

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376839•

Real-life Human Activity Recognition with Tri-axial Accelerometer Data from Smartphone using Hybrid Long Short-Term Memory Networks

Narit Hnoohom¹, Anuchit Jitpattanakul², Sakorn Mekruksavanich•Institutions (2)

Mahidol University¹, King Mongkut's University of Technology North Bangkok²

18 Nov 2020

TL;DR: Based on the recent success of Long Short-Term Memory (LSTM) networks in HAR domains, the authors propose a generic LSTM-based framework for accelerometer data in real-life HAR.

Abstract: Human activity recognition (HAR) is an active research field in time-series classification because of its many successful applications in various domains. The availability of affordable wearable devices has opened up many challenging and interesting HAR research problems. Current research suggests that deep learning approaches are suited to automated feature extraction from raw sensor data, unlike conventional machine learning approaches that rely on handcrafted features. Based on the recent success of Long Short-Term Memory (LSTM) networks in HAR domains, this work proposes a generic LSTM-based framework for accelerometer data in real-life HAR. Four hybrid LSTM networks are compared on a publicly available real-life HAR dataset. Moreover, we use Bayesian optimization to tune the hyperparameters of each LSTM network. The experimental results indicate that the CNN-LSTM network surpasses the other hybrid LSTM networks.

25 citations

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376829•

Developed Credit Card Fraud Detection Alert Systems via Notification of LINE Application

Narumol Chumuang¹, Sansanee Hiranchan², Mahasak Ketcham², Worawut Yimyam, Patiyuth Pramkeaw¹, Sakchai Tangwannawit²•Institutions (2)

MediaTech Institute¹, King Mongkut's University of Technology North Bangkok²

18 Nov 2020

TL;DR: In this paper, a credit card fraud detection alert system using LINE Notify is presented. The measured efficiency, accuracy, and completeness of the data were at a very good level, equal to 86.67%.

Abstract: Because fraud prevention is an important issue today, the researchers propose reporting suspected credit card fraud through the LINE application. The objectives of this research are: 1) to develop a suspected credit card fraud alert system via the LINE Notify API; 2) to measure the accuracy of the developed system's notifications for preventing suspected credit card fraud. The method comprises five steps: 1) analysis of the work system, a study and analysis of problems to determine needs; 2) system design, the process of designing the research tools; 3) system development, the process of developing the research tools; 4) testing of the tools; 5) summary of results, discussion, and suggestions. The measured efficiency, accuracy, and completeness of the data were at a very good level, equal to 86.67%. The measured efficiency against the set conditions was very good, equal to 80.00%. The measured timeliness was very good, equal to 86.67%. In conclusion, the developed system accomplishes all research goals.

23 citations

Book Chapter•10.1007/978-3-030-59082-6_8•

Improving Results on Russian Sentiment Datasets

Anton Golubev¹, Natalia V. Loukachevitch²•Institutions (2)

Bauman Moscow State Technical University¹, Moscow State University²

7 Oct 2020

TL;DR: In this paper, the authors test standard neural network architectures (CNN, LSTM, BiLSTM) and BERT architectures on Russian sentiment datasets, compare two variants of Russian BERT, and show that the conversational variant performs better on all sentiment tasks in the study.

Abstract: In this study, we test standard neural network architectures (CNN, LSTM, BiLSTM) and recent BERT architectures on previous Russian sentiment evaluation datasets. We compare two variants of Russian BERT and show that for all sentiment tasks in this study the conversational variant of Russian BERT performs better. The best results were achieved by the BERT-NLI model, which treats sentiment classification as a natural language inference task. On one of the datasets, this model practically achieves the human level.

23 citations

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376824•

Image Classification of Forage Plants in Fabaceae Family Using Scale Invariant Feature Transform Method

Thidarat Pinthong, Worawut Yimyam, Narumol Chumuang¹, Mahasak Ketcham², Patiyuth Pramkeaw¹, Nattavee Utakrit²•Institutions (2)

MediaTech Institute¹, King Mongkut's University of Technology North Bangkok²

18 Nov 2020

TL;DR: In this article, a novel method is proposed for the image classification of forage plants in the Fabaceae family using the Scale Invariant Feature Transform (SIFT) method.

Abstract: This paper proposes a novel method for the image classification of forage plants in the Fabaceae family using the Scale Invariant Feature Transform (SIFT) method. Color images in JPEG format and RGB color mode are resized to 1000x1000 pixels to produce a single template image. The sample images and four prototype images were scaled and rotated to a standard form. Features were extracted from the images with SIFT and matched against a dataset of forage plant leaves, and the matching points were used to evaluate the accuracy of leaf identification. Senna siamea, Clitoria ternatea and Pithecellobium dulce leaves were identified with 100% accuracy, but Sesbania grandiflora Desv was identified with 0% accuracy. The total accuracy over all 4 plants was 75%. This indicates that SIFT-based leaf matching is suitable for Senna siamea, Clitoria ternatea and Pithecellobium dulce, where it is 100% accurate, but not for Sesbania grandiflora Desv leaves. For the latter the accuracy is 0% because the leaves are dark green, unclear, slender, and evenly spaced, so distinctive features are very rare. The leaves of Senna siamea, Clitoria ternatea and Pithecellobium dulce, by contrast, are clear and have unique leaf edges. Appropriate techniques for recognition and classification are also included.

20 citations

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376784•

Simulation of Autonomous Mobile Robot System for Food Delivery in In-patient Ward with Unity

Supachai Vongbunyong¹, Salil Parth Tripathi, Kitti Thamrongaphichartkul¹, Nitisak Worrasittichai¹, Aphisit Takutruea¹, Teeraya Prayongrak¹•Institutions (1)

King Mongkut's University of Technology Thonburi¹

18 Nov 2020

TL;DR: In this paper, delivery of food and medical supplies to individual patients by multiple autonomous mobile robots (AMRs) in an in-patient ward is simulated in Unity; the robots help keep physical distance between patients and health workers.

Abstract: Logistic management is crucial for effective and efficient transportation of various items in hospitals. During pandemic situations, especially COVID-19, a special in-patient cohort ward is established to treat patients who require special treatment due to the quarantine protocol. An Autonomous Mobile Robot (AMR) is used for delivering food and medical supplies to individual patients in order to keep the physical distance between patients and health workers. In this research, delivery by multiple AMRs working in the in-patient ward is simulated. The simulation software is developed on the Unity platform to study the operations of AMRs in various scenarios.

12 citations

Book Chapter•10.1007/978-3-030-59082-6_4•

Advances of Transformer-Based Models for News Headline Generation

Alexey Bukhtiyarov¹, Ilya Gusev¹•Institutions (1)

Moscow Institute of Physics and Technology¹

7 Oct 2020

TL;DR: The authors fine-tuned two pretrained Transformer-based models (mBART and BertSumAbs) for headline generation and achieved state-of-the-art results on the RIA and Lenta datasets of Russian news.

Abstract: Pretrained language models based on the Transformer architecture are the reason for recent breakthroughs in many areas of NLP, including sentiment analysis, question answering, and named entity recognition. Headline generation is a special kind of text summarization task. To succeed in it, models need strong natural language understanding that goes beyond the meaning of individual words and sentences, and an ability to distinguish essential information. In this paper, we fine-tune two pretrained Transformer-based models (mBART and BertSumAbs) for that task and achieve new state-of-the-art results on the RIA and Lenta datasets of Russian news. BertSumAbs increases ROUGE on average by 2.9 and 2.0 points respectively over the previous best scores achieved by Phrase-Based Attentional Transformer and CopyNet.

10 citations

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376812•

COVID-19: Data Analysis and the situation Prediction Using Machine Learning Based on Bangladesh perspective

Abir Abdullha¹, Sheikh Abujar¹•Institutions (1)

Daffodil International University¹

18 Nov 2020

TL;DR: In this paper, the authors analyze daily COVID-19 data for Bangladesh to understand the situation and apply supervised machine learning algorithms, Linear Regression and k-nearest neighbors (KNN), to predict how it will develop.

Abstract: Most countries are now affected by COVID-19, which has become the biggest problem in the world. Bangladesh is also affected, and the whole country faces this virus as its biggest problem. We therefore analyze the data day by day to understand the situation. We also use models, algorithms, logic, and analysis to look for a solution to the current situation, and we use machine learning algorithms to predict the future situation. The supervised machine learning algorithms used are the Linear Regression model and the k-nearest neighbors (KNN) algorithm. There are different types of data sets and algorithms, and we have tried to explain them well.

7 citations

Book Chapter•10.1007/978-3-030-59082-6_7•

Predicting Eurovision Song Contest Results Using Sentiment Analysis

Iiro Kumpulainen¹, Eemil Praks¹, Tenho Korhonen¹, Anqi Ni¹, Ville Rissanen¹, Jouko Vankka¹•Institutions (1)

National Defence University¹

7 Oct 2020

TL;DR: This article analyzed over a million tweets in an attempt to predict the results of the Eurovision Song Contest televoting, comparing different methods of sentiment analysis (English and multilingual polarity lexicons and deep learning) and translation of the focus-language tweets into English to determine which produced the best prediction.

Abstract: Over a million tweets were analyzed using various methods in an attempt to predict the results of the Eurovision Song Contest televoting. Different methods of sentiment analysis (English, multilingual polarity lexicons and deep learning) and translating the focus language tweets into English were used to determine the method that produced the best prediction for the contest. Furthermore, we analyzed the effect of sampling tweets during different periods, namely during the performances and/or during the televoting phase of the competition. The quality of the predictions was assessed through correlations between the actual ranks of the televoting and the predicted ranks. The prediction was based on the application of an adjusted Eurovision televoting scoring system to the results of the sentiment analysis of tweets. The predicted ranks of the performances yielded Spearman $\rho$ correlation coefficients of 0.62 and 0.74 during the televoting period for the lexicon sentiment-based and deep learning approaches, respectively.
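
The abstract above scores predictions by correlating predicted ranks with actual televoting ranks. As a minimal sketch of how a Spearman $\rho$ can be computed, the Python below ranks both lists (ties share an average rank) and takes the Pearson correlation of the ranks. The function names and the five-entry example ranking are hypothetical illustrations, not data or code from the paper.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation applied to the ranks of x and y."""
    return pearson(ranks(x), ranks(y))


# Hypothetical example: actual vs. predicted placement of five entries.
actual_ranks = [1, 2, 3, 4, 5]
predicted_ranks = [2, 1, 3, 5, 4]
rho = spearman_rho(actual_ranks, predicted_ranks)  # approximately 0.8
```

A value near 1 means the predicted ordering closely follows the actual one; a value near 0 means the two orderings are unrelated.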

6 citations

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376830•

Feature Extraction with SHAP Value Analysis for Student Performance Evaluation in Remote Collaboration

Mako Komatsu¹, Chihiro Takada¹, Chihiro Neshi, Teruhiko Unoki, Mikifumi Shikida¹•Institutions (1)

Kochi University of Technology¹

18 Nov 2020

TL;DR: In this article, the authors support teachers of remote group discussions by analyzing video images: student performance is assessed automatically by classification on features obtained from the videos, and SHAP values are used to select the features important for teaching.

Abstract: In recent years, group discussions have become an important part of corporate recruitment examinations in Japan. Developing a remote teaching support system for group discussion will help reduce the burden on teachers. As a part of our project, this study aims to support teachers who need effective teaching methods for remote group discussions by analyzing the video images. In this study, we used the features obtained from the videos. Student performance in group discussion was assessed automatically by classification, and important features were selected for teaching from the SHapley Additive exPlanations (SHAP) values.

5 citations

Book Chapter•10.1007/978-3-030-59082-6_10•

Dataset for Evaluation of Mathematical Reasoning Abilities in Russian

Mikhail Nefedov¹•Institutions (1)

National Research University – Higher School of Economics¹

7 Oct 2020

TL;DR: The authors present a Russian version of the DeepMind Mathematics Dataset, obtained by translating the original's linguistic templates into Russian while leaving the inference part unchanged.

Abstract: We present a Russian version of the DeepMind Mathematics Dataset. The original dataset is synthetically generated using inference rules and a set of linguistic templates. We translate the linguistic templates to Russian, leaving the inference part unchanged. As a result, we get a mathematically parallel dataset where the same mathematical problems are explored but in another language. We reproduce the experiment from the original paper to check whether the performance of a Transformer model is affected by differences between the languages in which math problems are expressed. Though our contribution is small compared to the original work, we think it is valuable given that languages other than English (and Russian in particular) are underrepresented.

Book Chapter•10.1007/978-3-030-59082-6_11•

Searching Case Law Judgments by Using Other Judgments as a Query

Sami Sarsa¹, Eero Hyvönen¹•Institutions (1)

Aalto University¹

7 Oct 2020

TL;DR: It is shown that a linear combination of similarities derived from the individual models provides a robust automatic similarity assessment for ranking the case law documents for retrieval.

Abstract: This paper presents an effective method for case law retrieval based on semantic document similarity and a web application for querying Finnish case law. The novelty of the work comes from the idea of using legal documents for automatic formulation of the query, including case law judgments, legal case descriptions, or other texts. The query documents may be in various formats, including image files with text content. This approach allows efficient search for similar documents without the need to specify a query string or keywords, which can be difficult in this use case. The application leverages two traditional word frequency based methods, TF-IDF and LDA, alongside two modern neural network methods, Doc2Vec and Doc2VecC. Effectiveness of the approach for document relevance ranking has been evaluated using a gold standard set of inter-document similarities. We show that a linear combination of similarities derived from the individual models provides a robust automatic similarity assessment for ranking the case law documents for retrieval.
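
The ranking idea in this abstract, a linear combination of per-model similarities, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the similarity scores, the equal weights, and the document identifiers below are all made up, and the abstract does not say which weights the authors use.

```python
def combine_similarities(scores_by_model, weights):
    """Weighted sum of per-model similarity scores for each candidate document."""
    doc_ids = next(iter(scores_by_model.values())).keys()
    return {
        doc: sum(weights[model] * scores_by_model[model][doc] for model in weights)
        for doc in doc_ids
    }


def rank_documents(scores_by_model, weights):
    """Return document ids sorted from most to least similar to the query."""
    combined = combine_similarities(scores_by_model, weights)
    return sorted(combined, key=lambda doc: combined[doc], reverse=True)


# Hypothetical similarities between one query judgment and three candidates,
# one score table per model named in the abstract.
scores = {
    "tfidf": {"case_a": 0.9, "case_b": 0.2, "case_c": 0.5},
    "lda": {"case_a": 0.4, "case_b": 0.3, "case_c": 0.6},
    "doc2vec": {"case_a": 0.7, "case_b": 0.1, "case_c": 0.6},
    "doc2vecc": {"case_a": 0.6, "case_b": 0.2, "case_c": 0.7},
}
# Assumed equal weights; in practice they would be tuned on a gold standard.
weights = {"tfidf": 0.25, "lda": 0.25, "doc2vec": 0.25, "doc2vecc": 0.25}

combined = combine_similarities(scores, weights)
ranking = rank_documents(scores, weights)
```

Combining models this way lets a document that one model underrates still rank well when the other models agree it is similar.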

Book Chapter•10.1007/978-3-030-59082-6_14•

Matching LIWC with Russian Thesauri: An Exploratory Study

Polina Panicheva¹,², Tatiana Litvinova²•Institutions (2)

National Research University – Higher School of Economics¹, Pedagogical University²

7 Oct 2020

TL;DR: This article uses linguistically motivated thesauri to analyze psychologically meaningful word categories in two Author Profiling tasks based on Russian texts, and finds that such thesauri not only provide objective and linguistically motivated content but also yield significant correlates of certain psychological states, replicating evidence obtained with hand-crafted lexical resources.

Abstract: In Author Profiling research, there is a growing interest in lexical resources providing various psychologically meaningful word categories. One such instrument is Linguistic Inquiry and Word Count, which was compiled manually in English and translated into many other languages. We argue that the resource contains a lot of subjectivity, which is further increased in the translation process. As a result, the translated lexical resource is not linguistically transparent. In order to address this issue, we translate the resource from English to Russian semi-automatically, analyze the translation in terms of agreement and match the resulting translation with two Russian linguistic thesauri. One of the thesauri is chosen as a better match for the psychologically meaningful categories in question. We further apply the linguistic thesaurus to analyze the psychologically meaningful word categories in two Author Profiling tasks based on Russian texts. Our results indicate that linguistically-motivated thesauri not only provide objective and linguistically motivated content, but also result in significant correlates of certain psychological states, replicating evidence obtained with hand-crafted lexical resources.

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376838•

Direction of Arrival Identification Using MUSIC Method and NLMS Beamforming

Raungrong Suleesathira

18 Nov 2020

TL;DR: This paper presents direction of arrival (DOA) identification, which determines which estimated DOA belongs to the desired signal and which belong to undesired signals.

Abstract: This paper presents direction of arrival (DOA) identification, which determines which estimated DOA belongs to the desired signal and which belong to undesired signals. One of the well-known subspace-based methods for finding directions is MUSIC (MUltiple SIgnal Classification). The separation of signal and noise subspaces is the crucial step for precise estimation. A skewness coefficient is proposed to reinforce the conventional MUSIC method by dividing the subspaces without knowing the number of source signals. Normalized least mean square (NLMS) beamforming is used to compute the weight vector so that it directs the main beam towards the desired user. The angle of the main beam is identified as the DOA of the desired signal, so the remaining estimated DOAs belong to interference signals. The DOA identification is shown to be advantageous for null-broadening beamforming. The simulation results confirm the effectiveness of the proposed method in the case of limited snapshots.
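
To make the NLMS weight update mentioned above concrete, here is a minimal Python sketch of the textbook rule w ← w + μ·e·x / (ε + ‖x‖²). It is a simplification and an assumption on our part: it adapts real-valued weights of a two-tap filter on a synthetic signal, whereas a beamformer would apply the same rule to complex array snapshots. The function name, step size, and test signal are hypothetical and not taken from the paper.

```python
import random


def nlms_identify(inputs, desired, n_taps, mu=0.5, eps=1e-8):
    """Adapt real-valued weights w so that the dot product w . x[n] tracks desired[n]."""
    w = [0.0] * n_taps
    for n in range(n_taps - 1, len(inputs)):
        x = [inputs[n - k] for k in range(n_taps)]  # most recent sample first
        y = sum(wi * xi for wi, xi in zip(w, x))    # current filter output
        e = desired[n] - y                          # error against the reference
        norm = eps + sum(xi * xi for xi in x)       # input power normalization
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
    return w


# Hypothetical noiseless setup: the reference is produced by known weights,
# so the adapted weights should converge to them.
random.seed(0)
true_w = [0.8, -0.3]
signal = [random.gauss(0.0, 1.0) for _ in range(500)]
target = [0.0] + [
    true_w[0] * signal[n] + true_w[1] * signal[n - 1] for n in range(1, 500)
]
estimate = nlms_identify(signal, target, n_taps=2)
```

Dividing by the input power is what distinguishes NLMS from plain LMS: the effective step size no longer depends on the signal level, which keeps the update stable for 0 < μ < 2.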

Book Chapter•10.1007/978-3-030-59082-6_6•

Unsupervised Neural Aspect Extraction with Related Terms

Timur Sokhin¹, Maria Khodorchenko¹, Nikolay Butakov¹•Institutions (1)

Saint Petersburg State University of Information Technologies, Mechanics and Optics¹

7 Oct 2020

TL;DR: In this article, the authors present a novel unsupervised neural network with a convolutional multi-attention mechanism that extracts (aspect, term) pairs simultaneously, and demonstrate its effectiveness on a real-world dataset.

Abstract: The tasks of aspect identification and term extraction remain challenging in natural language processing. While supervised methods achieve high scores, it is hard to use them in real-world applications due to the lack of labelled datasets. Unsupervised approaches outperform these methods on several tasks, but it is still a challenge to extract both an aspect and a corresponding term, particularly in the multi-aspect setting. In this work, we present a novel unsupervised neural network with a convolutional multi-attention mechanism that allows extracting (aspect, term) pairs simultaneously, and demonstrate its effectiveness on a real-world dataset. We apply a special loss aimed at improving the quality of multi-aspect extraction. The experimental study demonstrates that with this loss we increase the precision not only in this joint setting but also on aspect prediction alone.

Book Chapter•10.1007/978-3-030-59082-6_2•

Automatic Detection of Hidden Communities in the Texts of Russian Social Network Corpus

Ivan Mamaev¹, Olga Mitrofanova¹•Institutions (1)

Saint Petersburg State University¹

7 Oct 2020

TL;DR: This article proposes a linguistically-rich approach to hidden community detection, tested in experiments with a Russian corpus of VKontakte posts, and identifies specific linguistic parameters of Russian posts needed for correct language processing.

Abstract: This paper proposes a linguistically-rich approach to hidden community detection, which was tested in experiments with a Russian corpus of VKontakte posts. Modern algorithms for hidden community detection are based on graph theory and leave the linguistic features of the analyzed texts out of account. The authors have developed a new hybrid approach to the detection of hidden communities, combining author-topic modeling and automatic topic labeling. Specific linguistic parameters of Russian posts were identified for correct language processing. The results justify the use of the algorithm, which can be further integrated with already developed graph methods.

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376835•

Myanmar POS Resource Extension Effects on Automatic Tagging Methods

Zar Zar Hlaing¹, Ye Kyaw Thu², Myat Myo Nwe Wai³, Thepchai Supnithi², Ponrudee Netisopakul¹•Institutions (3)

King Mongkut's Institute of Technology Ladkrabang¹, NECTEC², Myanmar Institute of Information Technology³

18 Nov 2020

TL;DR: In this paper, the authors manually extend the original myPOS corpus to myPOS version 2.0, roughly tripling its size, and compare the accuracies of four supervised tagging algorithms, Conditional Random Fields (CRFs), Hidden Markov Model (HMM), Ripple Down Rules based (RDR), and the neural sequence labeling approach of Conditional Random Fields ($\mathrm{NCRF}^{++}$), on the extended and original corpora.

Abstract: Part-of-speech (POS) tagging is the process of assigning a part-of-speech tag or other lexical class marker to each word in a sentence. It is also one of the most important steps in the Natural Language Processing (NLP) task pipeline. Several research works on Myanmar POS tagging have been implemented with different approaches. However, there is only one publicly available tagged corpus, the myPOS corpus. The size of this corpus is only 11 thousand sentences, which is not enough to train downstream NLP tasks, such as machine learning. For this reason, we manually extended the original myPOS corpus to myPOS version 2.0; the extended corpus is approximately three times the size of the original. To evaluate the effects of the extended corpus versus the original corpus, the accuracies of four supervised tagging algorithms, namely Conditional Random Fields (CRFs), Hidden Markov Model (HMM), Ripple Down Rules based (RDR), and the neural sequence labeling approach of Conditional Random Fields ($\mathrm{NCRF}^{++}$), are compared. The results showed that the extended myPOS version 2.0 improved the accuracies of automatic POS tagging methods compared with the original myPOS.

Book Chapter•10.1007/978-3-030-59082-6_3•

Dialog Modelling Experiments with Finnish One-to-One Chat Data

Janne Kauttonen¹, Lili Aunimo¹•Institutions (1)

Haaga-Helia University of Applied Sciences¹

7 Oct 2020

TL;DR: The authors developed response retrieval (ranking) models using TF-IDF, StarSpace, ESIM and BERT methods on two Finnish conversational corpora: public library question-answering (QA) data and private medical chat data.

Abstract: We analyzed two conversational corpora in Finnish: public library question-answering (QA) data and private medical chat data. We developed response retrieval (ranking) models using TF-IDF, StarSpace, ESIM and BERT methods. These four represent techniques ranging from the simple and classical ones to recent pretrained transformer neural networks. We evaluated the effect of different preprocessing strategies, including raw, casing, lemmatization and spell-checking, for the different methods. Using our medical chat data, we also developed a novel three-stage preprocessing pipeline with speaker role classification. We found the BERT model pretrained with Finnish (FinBERT) an unambiguous winner in ranking accuracy, reaching 92.2% for the medical chat and 98.7% for the library QA in the 1-out-of-10 response ranking task, where the chance level was 10%. The best accuracies were reached using uncased text with spell-checking (BERT models) or lemmatization (non-BERT models). Preprocessing had less impact for BERT models than for the classical and other neural network models. Furthermore, we found the TF-IDF method still a strong baseline for the vocabulary-rich library QA task, even surpassing the more advanced StarSpace method. Our results highlight the complex interplay between preprocessing strategies and model type when choosing the optimal approach in chat-data modelling. Our study is the first work on dialogue modelling using neural networks for the Finnish language. It is also the first of its kind to use real medical chat data. Our work contributes towards the development of automated chatbots in the professional domain.
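
The 1-out-of-10 ranking accuracy reported above can be illustrated with a short Python sketch: each test item has one true response among N candidates, and the item counts as correct when the model scores the true response highest. The function names and the toy scores are hypothetical; any of the models in the abstract could supply the scores.

```python
def one_of_n_accuracy(examples):
    """Fraction of examples whose true response receives the highest score.

    examples: list of (scores, true_index) pairs, where scores holds one
    model score per candidate response.
    """
    hits = 0
    for scores, true_index in examples:
        best = max(range(len(scores)), key=lambda i: scores[i])
        if best == true_index:
            hits += 1
    return hits / len(examples)


def chance_level(n_candidates):
    """Expected accuracy of picking one of n candidates uniformly at random."""
    return 1 / n_candidates


# Hypothetical scores for four queries with three candidates each.
toy_examples = [
    ([0.9, 0.1, 0.3], 0),  # true response ranked first: hit
    ([0.2, 0.7, 0.1], 1),  # hit
    ([0.4, 0.5, 0.6], 0),  # true response ranked last: miss
    ([0.1, 0.2, 0.8], 2),  # hit
]
accuracy = one_of_n_accuracy(toy_examples)
```

With ten candidates per item, `chance_level(10)` gives the 10% baseline quoted in the abstract, which is the reference point for the 92.2% and 98.7% figures.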

Book Chapter•10.1007/978-3-030-59082-6_13•

Finding New Multiword Expressions for Existing Thesaurus

Petr Rossyaykin¹, Natalia V. Loukachevitch¹•Institutions (1)

Moscow State University¹

7 Oct 2020

TL;DR: In this paper, the authors study the task of adding new multiword expressions (MWEs) into an existing thesaurus, focusing on nominal bigrams (Adj-Noun and Noun-Noun) in Russian.

Abstract: In this paper we study the task of adding new multiword expressions (MWEs) into an existing thesaurus. Standard methods of MWE discovery (statistical, context, distributional measures) can efficiently detect the most prominent MWEs. However, given a large number of MWEs already present in a lexical resource those methods fail to provide sufficient results in extracting unseen expressions. We show that the information deduced from the thesaurus itself is more useful than observed frequency and other corpus statistics in detecting less prominent expressions. Focusing on nominal bigrams (Adj-Noun and Noun-Noun) in Russian, we propose a number of measures making use of thesaurus statistics (e.g. the number of expressions with a given word present in the thesaurus), which significantly outperform standard methods based on corpus statistics or word embeddings.

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376786•

Utilization-Weighted Algorithm for Spreading Factor Assignment in LoRaWAN

Kasama Kamonkusonman¹, Rardchawadee Silapunt¹•Institutions (1)

King Mongkut's University of Technology Thonburi¹

18 Nov 2020

TL;DR: In this paper, the authors propose the Utilization-Weighted (UW) algorithm, a spreading factor management algorithm based on M/D/1 queue theory that balances channel utilization to form groups of nodes assigned different spreading factors (SFs).

Abstract: Long Range Wide Area Network (LoRaWAN) is one of the leading low power wireless networks that can support thousands of Internet of Things (IoT) devices. To enhance the scalability of LoRaWAN, this paper proposes the Utilization-Weighted (UW) algorithm, a spreading factor management algorithm designed on the basis of M/D/1 queue theory. The main concept of this algorithm is channel utilization balancing, which helps form groups of nodes assigned different spreading factors (SFs). The simulations are performed under two scenarios: similar and varied uplink time intervals among SFs. The results show that our UW algorithm can outperform the traditional Min-airtime method in both scenarios. The packet received rate (PRR) of the UW algorithm is clearly higher than that of the Min-airtime method for all numbers of nodes and time intervals. In particular, in the varied time interval simulation with networks of 120, 600, and 1,200 nodes, the maximum PRR improvements occur at 1, 3, and 5 times the minimum time interval between uplinks, respectively, and are around 34%, 36%, and 35%, respectively.
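
For readers unfamiliar with the M/D/1 model named in this abstract, the Python sketch below computes two textbook quantities for a queue with Poisson arrivals and a fixed service time: the utilization ρ = λ·D and the mean waiting time in queue W_q = ρ·D / (2(1 − ρ)). This is not the UW algorithm itself, whose details the abstract does not give; the function names and the example arrival rate and airtime are assumptions for illustration.

```python
def utilization(arrival_rate, service_time):
    """Channel utilization rho = lambda * D for an M/D/1 queue."""
    return arrival_rate * service_time


def mean_queue_wait(arrival_rate, service_time):
    """Mean waiting time in queue for M/D/1: rho * D / (2 * (1 - rho))."""
    rho = utilization(arrival_rate, service_time)
    if rho >= 1:
        raise ValueError("queue is unstable when utilization is 1 or more")
    return rho * service_time / (2 * (1 - rho))


# Hypothetical channel: 0.5 packets per second, each with 1.0 s of airtime.
rho = utilization(0.5, 1.0)        # 0.5
wait = mean_queue_wait(0.5, 1.0)   # 0.5 seconds
```

Because a higher spreading factor means longer airtime per packet, the same arrival rate produces a higher utilization, which suggests why balancing utilization across SF groups, rather than only minimizing airtime, can be a useful assignment criterion.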

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376833•

A Conversational Agent for Database Query: A Use Case for Thai People Map and Analytics Platform

Thikamporn Simud¹, Somchoke Ruengittinun¹, Navaporn Surasvadi, Nuttapong Sanglerdsinlapachai, Anon Plangprasopchok•Institutions (1)

Kasetsart University¹

18 Nov 2020

TL;DR: In this paper, a Thai conversational agent was developed on top of TPMAP to support self-service data analytics on complex queries, where users can simply use natural language to fetch information from a chatbot and the query results are presented to users in easy-to-use formats such as statistics and charts.

Abstract: Since 2018, the Thai People Map and Analytics Platform (TPMAP) has been developed with the aim of supporting government officials and policy makers with integrated household and community data to analyze strategic plans and implement policies and decisions to alleviate poverty. However, to acquire complex information from the platform, non-technical users with no database background have to ask a programmer or a data scientist to query data for them. Such a process is time-consuming and might result in inaccurate information being retrieved due to miscommunication between non-technical and technical users. In this paper, we have developed a Thai conversational agent on top of TPMAP to support self-service data analytics on complex queries. Users can simply use natural language to fetch information from our chatbot, and the query results are presented to users in easy-to-use formats such as statistics and charts. The proposed conversational agent transforms natural language queries into query representations with relevant entities, query intentions, and output formats of the query. We employ Rasa, an open-source conversational AI engine, for agent development. The results show that our system yields an F1-score of 0.9747 for intent classification and 0.7163 for entity extraction. The obtained intents and entities are then used to query the target information from a graph database. Finally, our system achieves end-to-end accuracies ranging from 57.5% to 80.0%, depending on query message complexity. The generated answers are then returned to users through a messaging channel.
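
The step from recognized intents and entities to a graph database query can be sketched as a template lookup. This is only an illustration under stated assumptions: the abstract does not name the graph database or its schema, so the Cypher-style query text, the intent names, the node labels, and the `build_query` helper below are all hypothetical.

```python
# Hypothetical intent -> (query template, required entity names).
# The schema (Household, Province, LOCATED_IN) is invented for illustration.
TEMPLATES = {
    "count_poor_households": (
        "MATCH (h:Household)-[:LOCATED_IN]->(p:Province {name: $province}) "
        "WHERE h.year = $year RETURN count(h) AS value",
        ("province", "year"),
    ),
    "list_provinces": (
        "MATCH (p:Province) RETURN p.name AS value ORDER BY value",
        (),
    ),
}


def build_query(intent, entities):
    """Return (query_text, parameters) for a classified intent.

    intent:   label produced by the intent classifier
    entities: dict of entity name -> extracted value
    """
    if intent not in TEMPLATES:
        raise KeyError(f"unsupported intent: {intent}")
    query, required = TEMPLATES[intent]
    missing = [name for name in required if name not in entities]
    if missing:
        # A real agent would ask a follow-up question instead of failing.
        raise ValueError(f"missing entities: {missing}")
    # Pass values as parameters rather than formatting them into the text.
    return query, {name: entities[name] for name in required}
```

Keeping extracted values in a separate parameter dictionary, rather than interpolating them into the query string, avoids injection problems when the entity extractor returns unexpected text.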

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376834•

Estimation of Oil Content in Oil Palm Fresh Fruit Bunch by Its Surface Color

Sutat Sae-Tang¹•Institutions (1)

NECTEC¹

18 Nov 2020

TL;DR: In this paper, artificial intelligence (AI) was applied to estimate the oil content in a fresh fruit bunch (FFB) using two popular types of oil palms in Thailand, Nigrescene and Virescene.

Abstract: Oil palm is one of the potential tree crops in Thailand. However, oil palm production has faced many problems, and pricing is one of them. The price of oil palm depends on the oil content of the fruit, which is estimated by an expert. The main consideration is the ripeness of the oil palm fresh fruit bunches, which the expert judges from their surface color. Differences in experience among experts lead to different estimates. The problem could be solved with chemical analysis methods, which are more accurate but time-consuming and inconvenient. In this research, artificial intelligence (AI) is applied to estimate the oil content in a fresh fruit bunch (FFB). Two popular types of oil palm in Thailand are used in this work. The color of the Nigrescene fruit varies from dark purple to red-orange depending on its gene and ripeness, while the color of the Virescene fruit changes from green to orange. The surface color of an oil palm fruit and the structure of the bunch were used as the feature set. An oil palm FFB image from a smartphone camera was fed to the model to predict the oil content in the FFB. Several models, such as multiple linear regression, artificial neural network, and convolutional neural network, are examined. Model quality is measured by the root mean square error (RMSE). The convolutional neural network produces an average RMSE of 727 for Nigrescene and 4.83 for Virescene.
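
For reference, the RMSE used to score the models is the square root of the mean squared difference between measured and predicted values. A minimal sketch follows; the function name and the example numbers in the usage note are illustrative and not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For instance, `rmse([20.0, 22.0, 25.0], [21.0, 22.0, 23.0])` averages the squared errors 1, 0, and 4 before taking the square root. RMSE is expressed in the same unit as the predicted quantity, so lower values indicate closer estimates of oil content.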

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376815•

A Memetic Algorithm for Tour Trip Design Problem

Apisit Cheng¹, Aussadavut Dumrongsiri¹•Institutions (1)

Sirindhorn International Institute of Technology¹

18 Nov 2020

TL;DR: In this paper, a memetic algorithm, which combines a genetic algorithm with a local search algorithm, was created to solve the tour trip design problem using real data gathered from trusted tourist communities in Thailand such as TripAdvisor.

Abstract: Designing a tour plan that provides maximum satisfaction before having any experience with the destination can be a hard and time-consuming process. The goal of this study is to create an algorithm that efficiently generates a tour plan with high or maximum satisfaction within a reasonable processing time. A memetic algorithm, which combines a genetic algorithm with a local search algorithm, is created to solve this problem. This study used real data gathered from trusted tourist communities in Thailand such as TripAdvisor.com, Wongnai.com, etc. The results show that the Memetic Algorithm (MA) approach can solve the tour trip design problem efficiently, since both the savings in computation time and the % gap are good and well balanced.
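
The combination of a genetic algorithm with local search can be sketched on a toy version of the problem. The sketch below is not the authors' implementation; it assumes each point of interest is a tuple `(x, y, visit_time, satisfaction)`, that the tour starts and ends at the origin, that travel time equals Euclidean distance, and that a chromosome is a visiting order decoded greedily under a time budget. All names and parameter values are hypothetical.

```python
import math
import random


def decode(order, pois, budget):
    """Follow `order`, skipping any POI that would break the time budget."""
    t, pos, score, route = 0.0, (0.0, 0.0), 0.0, []
    for i in order:
        x, y, visit, satisfaction = pois[i]
        travel = math.dist(pos, (x, y))
        back = math.dist((x, y), (0.0, 0.0))  # must still be able to return
        if t + travel + visit + back <= budget:
            t, pos = t + travel + visit, (x, y)
            score += satisfaction
            route.append(i)
    return score, route


def fitness(order, pois, budget):
    return decode(order, pois, budget)[0]


def local_search(order, pois, budget):
    """Keep applying pairwise swaps that strictly raise the score."""
    best, improved = fitness(order, pois, budget), True
    while improved:
        improved = False
        for a in range(len(order)):
            for b in range(a + 1, len(order)):
                order[a], order[b] = order[b], order[a]
                f = fitness(order, pois, budget)
                if f > best:
                    best, improved = f, True
                else:
                    order[a], order[b] = order[b], order[a]  # undo the swap
    return order


def order_crossover(p1, p2, rng):
    """Copy a slice of p1 and fill the rest in the order genes appear in p2."""
    a, b = sorted(rng.sample(range(len(p1)), 2))
    middle = p1[a:b]
    rest = [g for g in p2 if g not in middle]
    return rest[:a] + middle + rest[a:]


def memetic(pois, budget, pop_size=12, generations=20, seed=0):
    rng = random.Random(seed)
    n = len(pois)  # assumes at least two POIs
    score = lambda o: fitness(o, pois, budget)
    pop = [local_search(rng.sample(range(n), n), pois, budget)
           for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            p1 = max(rng.sample(pop, 2), key=score)  # binary tournament
            p2 = max(rng.sample(pop, 2), key=score)
            child = order_crossover(p1, p2, rng)
            if rng.random() < 0.2:  # swap mutation
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(local_search(child, pois, budget))
        pop = sorted(pop + children, key=score, reverse=True)[:pop_size]
    return decode(pop[0], pois, budget)
```

Calling `memetic(pois, budget)` returns the best satisfaction found and the corresponding route. The local search step applied to every child is what distinguishes this from a plain genetic algorithm: each offspring is refined before it competes for a place in the population.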

Book Chapter•10.1007/978-3-030-59082-6_15•

Chinese-Russian Shared Task on Multi-domain Translation

Valentin Malykh¹, Varvara Logacheva²•Institutions (2)

Huawei¹, Skolkovo Institute of Science and Technology²

7 Oct 2020

TL;DR: This paper presents the first shared task on Machine Translation from Chinese into Russian, which is the only MT competition for this pair of languages to date and the task for participants was to train a general-purpose MT system which performs reasonably well on very diverse text domains and styles without additional fine-tuning.

Abstract: We present the results of the first shared task on Machine Translation (MT) from Chinese into Russian, which is the only MT competition for this pair of languages to date. The task for participants was to train a general-purpose MT system that performs reasonably well on very diverse text domains and styles without additional fine-tuning. Eleven teams participated in the competition, and some of the submitted models showed reasonably good performance, peaking at 19.7 BLEU.

Book Chapter•10.1007/978-3-030-59082-6_12•

GenPR: Generative PageRank Framework for Semi-supervised Learning on Citation Graphs

Mikhail Kamalov¹, Konstantin Avrachenkov¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

7 Oct 2020

TL;DR: In this paper, the authors propose a framework focused on embedding PageRank SSL in a generative model, which allows one to do joint training of nodes latent space representation and label spreading through the reweighted adjacency matrix by node similarities in the latent space.

Abstract: Semi-Supervised Learning (SSL) on citation graph data sets is a rapidly growing area of research. However, recently proposed graph-based SSL algorithms use a default adjacency matrix with binary weights on edges (citations), which causes a loss of similarity information between nodes (papers). In this work, we therefore propose a framework focused on embedding PageRank SSL in a generative model. This framework allows joint training of the nodes' latent space representation and label spreading through an adjacency matrix reweighted by node similarities in the latent space. We explain that a generative model can improve accuracy and reduce the number of iteration steps for PageRank SSL. Moreover, we show that our framework outperforms the best graph-based SSL algorithms on four public citation graph data sets and improves the interpretability of classification results.
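
The effect of reweighting a binary adjacency matrix by latent-space similarity before spreading labels can be shown with a small sketch. It is not the GenPR framework: the embeddings here are given rather than learned by a generative model, cosine similarity is an assumed choice of similarity, and the spreading rule is a generic PageRank-style iteration F ← αD⁻¹WF + (1 − α)Y. Function names are hypothetical.

```python
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0


def reweight(adj, z):
    """Scale every existing edge by the (non-negative) cosine similarity
    of the latent vectors z[i], z[j] of its endpoints."""
    n = len(adj)
    return [[adj[i][j] * max(cosine(z[i], z[j]), 0.0) for j in range(n)]
            for i in range(n)]


def pagerank_ssl(w, y, alpha=0.85, iters=100):
    """Spread labels over weighted graph w.

    y[i] is a one-hot row for labeled nodes and all zeros otherwise.
    Returns the predicted class index for every node.
    """
    n, k = len(w), len(y[0])
    deg = [sum(row) for row in w]
    f = [row[:] for row in y]
    for _ in range(iters):
        new_f = []
        for i in range(n):
            row = []
            for c in range(k):
                # Weighted average of the neighbours' current scores.
                spread = (sum(w[i][j] * f[j][c] for j in range(n)) / deg[i]
                          if deg[i] else 0.0)
                row.append(alpha * spread + (1 - alpha) * y[i][c])
            new_f.append(row)
        f = new_f
    return [max(range(k), key=lambda c: f[i][c]) for i in range(n)]
```

On a path graph whose two halves have orthogonal embeddings, the reweighting sets the bridging edge to zero, so labels from one half no longer leak into the other. With binary weights that edge would carry the same weight as every other citation.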

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376779•

A proposal of evaluation method using a pressure sensor for supporting auscultation training

Yuki Kodera¹, Kunimasa Yagi², Mikifumi Shikida¹•Institutions (2)

Kochi University of Technology¹, University of Toyama²

18 Nov 2020

TL;DR: In this article, the authors proposed an evaluation method for auscultation pressure using a pressure sensor for the purpose of supporting auscultation training, which is one kind of clinical training.

Abstract: Japanese medical education has lately focused on improving clinical skills. Clinical training includes many kinds of training, such as medical interview, palpation, and auscultation. However, the assessment points of these kinds of training are not quantified. Therefore, it is difficult for a trainer to check the clinical skills and attitudes of student doctors objectively. Auscultation is a fundamental skill, but it is difficult to assess objectively and, therefore, difficult to give appropriate feedback on. In this paper, we propose an evaluation method for auscultation pressure using a pressure sensor for the purpose of supporting auscultation training, which is one kind of clinical training. In addition, we implemented a prototype system and collected pressure values during an actual doctor’s examination. Moreover, we discuss a feature extraction method for supporting auscultation training from the collected data. Furthermore, we describe how the proposed method is useful as one way of supporting auscultation training.

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376813•

Cryptocurrencies Asset Pricing Analysis: evidence from Thailand markets

Kanyawut Ariya¹, Nathee Naktnasukanjn¹, Tanarat Rattanadamrongaksorn¹, Piyachat Udomwong¹, Saronsad Sokantika¹, Nopasit Chakpitak¹•Institutions (1)

Chiang Mai University¹

18 Nov 2020

TL;DR: In this article, the authors evaluated the relationship between cryptocurrency price variations and exogenous classical market prices by using daily data on some of the most important asset prices and indexes in Thailand, and found a strong direct relationship between cryptocurrencies in the digital market and the SET50 index and oil price.

Abstract: Can cryptocurrency price variations be explained by exogenous classical market prices? We evaluate this issue by using daily data on some of the most important asset prices and indexes in Thailand, i.e. gold, oil, the SET50 index, the Tourism index, a mutual fund, and the THB/USD exchange rate, in comparison with digital asset prices, i.e. Bitcoin, Ethereum, Litecoin, Ripple, DASH, and Stellar. By examining both direct and inverse relationships, using a correlation matrix to find distance relationships and a minimum spanning tree to find the closest path between assets, we found a strong direct relationship between cryptocurrencies in the digital market and the SET50 index and oil price in classical markets. We also found that the THB/USD exchange rate has an inverse relationship with the Bitcoin price, the SET50 index, and the oil price. There is a link between cryptocurrency asset prices and the market prices of some classical assets.
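
The pipeline from correlations to distances to a minimum spanning tree can be sketched as follows. The abstract does not state which distance was used, so the sketch assumes the commonly used transform d = √(2(1 − ρ)), under which highly correlated assets are close and inversely correlated assets are far apart. The function names and the toy series in the usage note are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (neither constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_mst(series):
    """Build a minimum spanning tree over assets with Kruskal's algorithm.

    series: dict of asset name -> list of returns
    Returns a list of (asset_a, asset_b, distance) edges.
    """
    names = list(series)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = pearson(series[a], series[b])
            # Assumed distance: sqrt(2 * (1 - rho)), clamped against rounding.
            dist[(a, b)] = math.sqrt(max(0.0, 2 * (1 - rho)))

    parent = {name: name for name in names}  # union-find forest

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    tree = []
    for (a, b), d in sorted(dist.items(), key=lambda kv: kv[1]):
        ra, rb = find(a), find(b)
        if ra != rb:  # the edge joins two separate components
            parent[ra] = rb
            tree.append((a, b, d))
    return tree
```

Given three toy series where `A` and `B` move together and `C` moves against them, the tree links `A` to `B` first and attaches `C` through whichever asset it is least distant from. In the resulting tree, each asset is connected to the rest of the market through its strongest relationships only.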

Book Chapter•10.1007/978-3-030-59082-6_1•

PolSentiLex: Sentiment Detection in Socio-Political Discussions on Russian Social Media

Olessia Koltsova¹, Svetlana Alexeeva², Sergei Pashakhin¹, Sergei Koltsov¹•Institutions (2)

National Research University – Higher School of Economics¹, Saint Petersburg State University²

7 Oct 2020

TL;DR: The authors presented a freely available Russian language sentiment lexicon PolSentiLex designed to detect sentiment in user-generated content related to social and political issues, which was generated from a database of posts and comments of the top 2,000 LiveJournal bloggers posted during one year ($\sim$1.5 million posts and 20 million comments).

Abstract: We present a freely available Russian language sentiment lexicon PolSentiLex designed to detect sentiment in user-generated content related to social and political issues. The lexicon was generated from a database of posts and comments of the top 2,000 LiveJournal bloggers posted during one year ($\sim$1.5 million posts and 20 million comments). Following a topic modeling approach, we extracted 85,898 documents that were used to retrieve domain-specific terms. This term list was then merged with several external sources. Together, they formed a lexicon (16,399 units) marked-up using a crowdsourcing strategy. A sample of Russian native speakers (n = 105) was asked to assess words’ sentiment given the context of their use (randomly paired) as well as the prevailing sentiment of the respective texts. In total, we received 59,208 complete annotations for both texts and words. Several versions of the marked-up lexicon were experimented with, and the final version was tested for quality against the only other freely available Russian language lexicon and against three machine learning algorithms. All experiments were run on two different collections. They have shown that, in terms of $\text{F}_{\text{macro}}$, lexicon-based approaches outperform machine learning by 11%, and our lexicon outperforms the alternative one by 11% on the first collection, and by 7% on the negative scale of the second collection while showing similar quality on the positive scale and being three times smaller. Our lexicon also outperforms or is similar to the best existing sentiment analysis results for other types of Russian-language texts.
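
A lexicon-based classifier and the macro-averaged F1 score used for comparison can be sketched in a few lines. This is an illustration only: the scoring rule (sum of word scores, sign gives the label), the three label names, and the function names are assumptions, and the example entries in the usage note are not taken from PolSentiLex.

```python
def lexicon_label(tokens, lexicon):
    """Label a tokenized text by the sign of its summed word scores.

    lexicon: dict of word -> score (negative, zero, or positive).
    Words missing from the lexicon contribute nothing.
    """
    total = sum(lexicon.get(token, 0) for token in tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With a toy lexicon such as `{"good": 1, "bad": -1}`, `lexicon_label(["good", "day"], ...)` yields `"positive"`. Because the macro average weights every class equally, a classifier cannot score well by favoring the most frequent sentiment class.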

Proceedings Article•10.1109/ISAI-NLP51646.2020.9376782•

Behavioral Analysis of Transformer Models on Complex Grammatical Structures

Kanyanut Kriengket¹, Kanchana Saengthongpattana¹, Peerachet Porkaew¹, Vorapon Luantangsrisuk¹, Prachya Boonkwan¹, Thepchai Supnithi¹•Institutions (1)

NECTEC¹

18 Nov 2020

TL;DR: This article presented a behavioral analysis of Transformer models in translating complex grammatical structures, i.e. multiple-word expressions and long-distance dependency, and showed that the more complex the structure, the lower the translation accuracy the models yield.

Abstract: State-of-the-art neural MT, e.g. Transformer, yields quite promising translation accuracy. However, these models are easily disturbed by noise, causing over- and under-translation issues. This paper presents a behavioral analysis of Transformer models in translating complex grammatical structures, i.e. multiple-word expressions and long-distance dependency. Results consistently show that the more complex the structure, the lower the translation accuracy the models yield. We infer that as phrase structures become more complex, the focus patterns learned by the attention mechanism may become erratic and sporadic due to data sparseness. We suggest using a locality penalty and increasing the number of attention heads to mitigate the issue, but their trade-offs should also be taken into account.

Book Chapter•10.1007/978-3-030-59082-6_5•

An Explanation Method for Black-Box Machine Learning Survival Models Using the Chebyshev Distance

Lev V. Utkin¹, Maxim S. Kovalev¹, Ernest M. Kasimov¹•Institutions (1)

Saint Petersburg State Polytechnic University¹

7 Oct 2020

TL;DR: SurvLIME-Inf as discussed by the authors applies the Cox proportional hazards model to approximate the black-box survival model at the local area around a test example, which leads to a simple linear programming problem for determining important features and for explaining the prediction.

Abstract: A new modification of the explanation method SurvLIME, called SurvLIME-Inf, is proposed for explaining machine learning survival models. The basic idea behind SurvLIME as well as SurvLIME-Inf is to apply the Cox proportional hazards model to approximate the black-box survival model in the local area around a test example. The Cox model is used because of its linear relationship of covariates. In contrast to SurvLIME, the proposed modification uses the $L_{\infty}$-norm for defining distances between approximating and approximated cumulative hazard functions. This leads to a simple linear programming problem for determining important features and for explaining the black-box model prediction. Moreover, SurvLIME-Inf outperforms SurvLIME when the training set is very small. Numerical experiments with synthetic and real datasets demonstrate the efficiency of SurvLIME-Inf.
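
The role of the $L_{\infty}$ (Chebyshev) distance can be made concrete with a simplified sketch. It assumes the standard Cox form H(t | x) = H0(t)·exp(bᵀx) and compares logarithms of cumulative hazards at a single point x on a shared time grid. The actual method also involves perturbed points around the test example and a linear program over b, which are omitted here; the function names are hypothetical.

```python
import math


def cox_cumulative_hazard(h0, b, x):
    """Cox cumulative hazard H(t | x) = H0(t) * exp(b . x) on a time grid.

    h0: baseline cumulative hazard values H0(t_j), all positive
    b:  regression coefficients (the feature importances being explained)
    x:  feature vector of the explained example
    """
    risk = math.exp(sum(bi * xi for bi, xi in zip(b, x)))
    return [h * risk for h in h0]


def chebyshev_log_distance(h_black_box, h0, b, x):
    """Largest absolute gap between log cumulative hazards over the grid."""
    approx = cox_cumulative_hazard(h0, b, x)
    return max(abs(math.log(hb) - math.log(ha))
               for hb, ha in zip(h_black_box, approx))
```

Because ln H(t | x) = ln H0(t) + bᵀx is linear in b, minimizing this maximum gap over b can be written with one auxiliary variable z bounded by −z ≤ gap_j ≤ z for every time point, which is what makes a linear programming formulation possible.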
