Top 19 papers presented at Artificial Intelligence and Natural Language in 2015

Showing papers presented at "Artificial Intelligence and Natural Language in 2015"

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382963•

Evaluation of the modern visual SLAM methods

Arthur Huletski¹, Dmitriy Kartashov¹, Kirill Krinkin•Institutions (1)

1 Nov 2015

TL;DR: This paper gives a brief intuitive description of the ORB-SLAM, LSD-SLAM, L-SLAM and OpenRatSLAM algorithms, compares them theoretically (based on the given descriptions), and evaluates them on the TUM RGB-D benchmark.

Abstract: Simultaneous Localization and Mapping (SLAM) is a challenging task in robotics. It is an active research area: several novel SLAM algorithms, as well as enhancements to known ones, are published every year. We have selected recent (2013–mid-2015) approaches that in theory can run on a mobile robot and evaluated them. This paper gives a brief intuitive description of the ORB-SLAM, LSD-SLAM, L-SLAM and OpenRatSLAM algorithms, then compares the algorithms theoretically (based on the given descriptions) and evaluates them on the TUM RGB-D benchmark.

43 citations
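The abstract names the TUM RGB-D benchmark but not the metric used. That benchmark is commonly scored with the absolute trajectory error (ATE), so a minimal sketch of its RMSE may help readers unfamiliar with it. It assumes the estimated and ground-truth positions are already time-associated and aligned in one frame; the benchmark's own tools also perform that alignment, which is omitted here. The trajectories are hypothetical toy data.

```python
import math


def ate_rmse(estimated, ground_truth):
    """Root-mean-square absolute trajectory error.

    Assumes both trajectories are lists of (x, y, z) positions that are
    already time-associated and expressed in the same aligned frame.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy trajectories, not data from the paper.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0), (2.0, 0.0, 0.4)]
print(round(ate_rmse(est, gt), 4))  # → 0.2887
```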

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382967•

Design and implementation Raspberry Pi-based omni-wheel mobile robot

Kirill Krinkin¹, Elena Stotskaya, Yury Stotskiy²•Institutions (2)

Saint Petersburg State Electrotechnical University¹, EMC Corporation²

1 Nov 2015

TL;DR: The hardware design and control software of a small omni-wheel robot built for indoor testing of SLAM algorithms are described.

Abstract: Nowadays, simultaneous localization and mapping (SLAM) algorithms are tested in at least two phases: software simulation and testing on a real hardware platform. This paper describes the hardware design and control software of a small robot with omni-directional wheels, built for indoor testing of SLAM algorithms.

14 citations
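The abstract does not describe the wheel layout or the control law. Purely as an illustration of what the control software of an omni-wheel base has to compute, here is a sketch of inverse kinematics for a common three-wheel layout: wheels at 0°, 120° and 240° around the centre, at a hypothetical distance `radius`, each rolling tangentially to that circle. It maps a desired body velocity to wheel rim speeds; dividing by the wheel radius would give angular wheel speeds.

```python
import math


def omni_wheel_speeds(vx, vy, omega, radius=0.1, angles_deg=(0.0, 120.0, 240.0)):
    """Wheel rim speeds (m/s) for a body velocity (vx, vy in m/s, omega in rad/s).

    Hypothetical layout: wheel i is mounted at angle theta_i around the
    robot centre, at distance `radius`, and drives along the tangent
    direction (-sin(theta_i), cos(theta_i)).
    """
    speeds = []
    for angle in angles_deg:
        theta = math.radians(angle)
        # Project the contact-point velocity v + omega x r onto the tangent.
        speeds.append(-math.sin(theta) * vx + math.cos(theta) * vy + radius * omega)
    return speeds


# Pure rotation: every wheel turns at radius * omega.
print(omni_wheel_speeds(0.0, 0.0, 2.0))
# Pure translation: wheel speeds sum to zero for the symmetric layout.
print(omni_wheel_speeds(1.0, 0.5, 0.0))
```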

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382966•

Recurrent neural network-based language modeling for an automatic Russian speech recognition system

Irina S. Kipyatkova¹, Alexey Karpov²•Institutions (2)

Saint Petersburg State University¹, Russian Academy of Sciences²

1 Nov 2015

TL;DR: A study of recurrent neural network language models for N-best list rescoring in automatic continuous Russian speech recognition, achieving a relative word error rate reduction of 14% with respect to the baseline 3-gram model.

Abstract: In this paper, we describe a study of recurrent neural network language models for N-best list rescoring in automatic continuous Russian speech recognition. We tried recurrent neural networks with different numbers of units in the hidden layer. We achieved a relative word error rate reduction of 14% with respect to the baseline 3-gram model.

12 citations
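N-best rescoring re-ranks a recognizer's candidate hypotheses with an additional language model score. A minimal sketch, assuming each hypothesis carries an acoustic log-score and the new language model is exposed as a log-probability function; the interpolation weight `lm_weight` and `toy_lm` are hypothetical stand-ins for the paper's RNN language model and its tuning. The last helper shows what a relative WER reduction means, using made-up error rates.

```python
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (word sequence, acoustic log-score)


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return hypotheses sorted by combined score, best first."""
    rescored = [
        (words, acoustic + lm_weight * lm_logprob(words))
        for words, acoustic in nbest
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


def toy_lm(words: str) -> float:
    # Hypothetical stand-in for an RNN LM: strongly prefers one phrase.
    return 0.0 if words == "hello world" else -5.0


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, e.g. 0.14 for a 14% relative improvement."""
    return (baseline_wer - new_wer) / baseline_wer


nbest = [("hello word", -10.0), ("hello world", -11.0)]
print(rescore_nbest(nbest, toy_lm)[0][0])  # → hello world
```

The acoustically preferred "hello word" loses after rescoring because the language model penalizes it; with illustrative numbers, a baseline WER of 30% falling to 25.8% is a 14% relative reduction.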

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382973•

Comparison of sentence similarity measures for Russian paraphrase identification

Ekaterina V. Pronoza¹, Elena Yagunova¹•Institutions (1)

Saint Petersburg State University¹

1 Nov 2015

TL;DR: The research disproves the supposition that it is more difficult to distinguish between precise and loose paraphrases than between loose paraphrases and non-paraphrases.

Abstract: In this paper we analyze and compare different types of sentence similarity measures applied to the problem of sentential paraphrase identification. We work with Russian, and all the experiments are conducted on the Russian paraphrase corpus that we have collected from news headlines (and are still collecting). Apart from the similarity measures, we also analyze the corpus itself. As a result of the research, we disprove the supposition that it is more difficult to distinguish between precise and loose paraphrases than between loose paraphrases and non-paraphrases. We also offer recommendations on applying different similarity measures to identifying paraphrases derived from news texts.

12 citations
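The abstract does not list which similarity measures were compared. As a hedged illustration of the kind of lexical measure such comparisons usually include, here are two simple ones: word-level Jaccard similarity and character-trigram Dice similarity. Character n-grams are often more forgiving of Russian inflection than whole tokens. The naive whitespace tokenization is an assumption; real Russian text would need proper tokenization and lemmatization.

```python
def tokens(sentence):
    # Naive lowercase whitespace tokenization (assumption, see above).
    return set(sentence.lower().split())


def jaccard(a, b):
    """Word-level Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def char_ngrams(sentence, n=3):
    s = sentence.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def dice_char(a, b, n=3):
    """Dice coefficient over character n-gram sets: 2|A ∩ B| / (|A| + |B|)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


print(jaccard("a b c", "a b d"))  # → 0.5
```

A paraphrase classifier would then compare such scores against a threshold tuned on labelled pairs; no threshold is given in the abstract.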

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382965•

Discovering text reuse in large collections of documents: A study of theses in history sciences

Anton Khritankov, Pavel V. Botov, Nikolay S. Surovenko, Sergey V. Tsarkov, Dmitriy V. Viuchnov, Yuri V. Chekhovich

1 Nov 2015

TL;DR: Using algorithmic and statistical methods, groups of highly connected theses with a large amount of text reuse between them are discovered, works compiled from several other theses are located, and the sources of reuse are pointed out.

Abstract: In this paper we investigate graphs of text reuse cases in scientific degree theses in history sciences (topic codes 07.xx.xx of the Russian Higher Attestation Committee). Using algorithmic and statistical methods, we discovered groups of highly connected theses with a large amount of text reuse between them. In addition, we located works compiled from several other theses and pointed out the sources of reuse.

12 citations
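The abstract does not name the graph algorithms used. One simple way to find groups of theses linked by text reuse is to treat theses as nodes and reuse cases as weighted edges, drop weak edges, and take connected components. The sketch below does this with union-find; the thesis IDs, the `shared_fragments` weights and the `min_shared` threshold are hypothetical.

```python
from collections import defaultdict


def reuse_groups(edges, min_shared=1):
    """Group documents linked by text reuse.

    edges: iterable of (doc_a, doc_b, shared_fragments).
    Edges with fewer than `min_shared` shared fragments are ignored.
    Returns a list of sets, each a connected component of size >= 2.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b, shared in edges:
        if shared >= min_shared:
            union(a, b)

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical reuse cases between theses T1..T7.
edges = [("T1", "T2", 5), ("T2", "T3", 3), ("T4", "T5", 1), ("T6", "T7", 10)]
print(reuse_groups(edges, min_shared=2))
```

With `min_shared=2` the weak T4–T5 link is discarded, leaving the groups {T1, T2, T3} and {T6, T7}.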

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382975•

Morpho-syntactic parsing based on neural networks and corpus data

Roman Rybka¹, Alexander Sboev¹, Ivan Moloshnikov¹, Dmitry Gudovskikh¹•Institutions (1)

Kurchatov Institute¹

1 Nov 2015

TL;DR: Methods for constructing a morpho-syntactic parsing procedure based on corpus data analysis are presented, including a method of parsing sentences on the basis of neural network algorithms and a selected set of parameters in the format of the corpus used.

Abstract: This article presents methods for constructing a morpho-syntactic parsing procedure based on corpus data analysis. It contains 1) a method for eliminating morphological ambiguities using existing morphological parsers and then converting the parsing results into the format of the language corpus used; 2) a method for selecting parameters for syntactic parsing and assessing the achievable parsing accuracy that the data of the used corpus can provide; 3) a method for parsing sentences on the basis of neural network algorithms and a selected set of parameters in the format of the used corpus. The study is based on sentences with unambiguous morpho-syntactic markup from the Russian National Corpus.

8 citations

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382969•

An information retrieval system for technology analysis and forecasting

Nikita Nikitinsky, Dmitry Ustalov¹, Sergey Shashev•Institutions (1)

Ural Federal University¹

1 Nov 2015

TL;DR: A scientific information retrieval system designed for the Russian language that uses patents, research papers and government contracts for facilitating the expertise process by providing the experts with relevant documents is presented.

Abstract: Expert evaluation of grant proposals and research projects is often facilitated by specialized decision support systems, which analyze research and industry trends in a large domain-dependent text corpus. Although production-grade technology forecasting systems exist for English, Russian patent databases and citation indexes have been developed in isolation from the global ones. This complicates technology analysis and forecasting for research conducted in Russia. In this paper, we present a scientific information retrieval system designed for the Russian language. The system uses patents, research papers and government contracts to facilitate the expertise process by providing experts with relevant documents. A comparison of our system with a popular baseline shows promising results.

6 citations

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382972•

Communication between emergency medical system equipped with panic buttons and hospital information systems: Use case and interfaces

Ilya Paramonov¹, Andrey Vasilyev¹, Ivan Timofeev¹•Institutions (1)

Petrozavodsk State University¹

1 Nov 2015

TL;DR: This paper identifies a typical use case of communication between emergency medical services equipped with a “panic button” and healthcare information systems, and analyzes possible ways of organizing such communication.

Abstract: For patients at risk of an out-of-hospital emergency, fast provision of first aid is essential. Emergency medical services equipped with a “panic button” aim to reduce the time to first aid. Such services can be further improved by connecting them with the healthcare information systems deployed in hospitals. This communication can be used to retrieve the patient's past medical history directly during first aid provision, to find an appropriate hospital for the patient's conveyance, to automatically transmit clinical handover information, etc. This paper is devoted to identifying a typical use case of communication between emergency medical services equipped with a “panic button” and healthcare information systems, and to analyzing possible ways of organizing such communication.

6 citations

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382968•

Twitter as a transport layer platform

Dmitry Namiot¹•Institutions (1)

Moscow State University¹

1 Nov 2015

TL;DR: This work introduces a programmable service called 411 for Twitter, which supports user-defined and application-specific commands through tweets, and describes how information systems can use Twitter as a transport layer for their own services.

Abstract: Internet messengers and social networks have become an integral part of modern digital life. We have in mind not only the interaction between individual users but also the variety of applications that run inside these platforms. Typically, applications for social networks use the universal login system and rely on data from the social network. Such applications are also likely to get more traction when they live inside a big social network like Facebook. At the same time, less attention is paid to the communication capabilities of social networks. In this paper, we target Twitter primarily as a messaging system. We describe how information systems can use Twitter as a transport layer for their own services. Our work introduces a programmable service called 411 for Twitter, which supports user-defined and application-specific commands through tweets.

5 citations

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382982•

Revealing potential changes of significant terms in streams of textual data written in natural languages using windowing and text mining

Jan Zizka¹, Frantisek Darena¹•Institutions (1)

Mendel University¹

1 Nov 2015

TL;DR: The presented research deals with analyzing continuous streams of textual data written in natural languages and demonstrates that the suggested method provides reliable results.

Abstract: The presented research deals with analyzing continuous streams of textual data written in natural languages. One of the problems is revealing possible significant concept changes in Internet blogs, discussions, etc., together with discovering what such data represent, whether they are more or less topically invariable or changing, and what kind of change occurred. A real-world textual dataset is analyzed using text mining with automatically generated decision trees to find significant words that affect the correct assignment of document labels (classes) and can be used for detecting noticeable changes. The changes and their detection are modeled here by an assorted gradual mixture of two languages, and the degree of change is measured by cosine, Euclidean, and Jaccard distance (similarity), which provide qualitatively the same result. The monitoring procedure analyzes successive pairs of adjacent data windows in the stream, comparing the current window with the previous one, both represented by their lists of relevant features expressed as words. The presented results demonstrate that the suggested method is reliable.

4 citations
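The monitoring procedure compares each data window with the previous one via their lists of relevant words. A minimal sketch of that loop, with one substitution stated plainly: the paper selects significant words with automatically generated decision trees, whereas `top_words` here simply takes the k most frequent tokens. The window size, `k` and the choice of Jaccard distance (one of the three measures the abstract mentions) are illustrative.

```python
from collections import Counter


def top_words(docs, k):
    # Stand-in for decision-tree feature selection: the k most frequent
    # lowercase tokens in the window.
    counts = Counter(w for doc in docs for w in doc.lower().split())
    return {w for w, _ in counts.most_common(k)}


def jaccard_distance(a, b):
    union = a | b
    return 0.0 if not union else 1 - len(a & b) / len(union)


def window_changes(stream, window_size, k=10):
    """Distance between each window's feature list and the previous one's.

    stream: documents in arrival order, cut into non-overlapping windows.
    Returns one distance per pair of adjacent windows.
    """
    windows = [stream[i:i + window_size] for i in range(0, len(stream), window_size)]
    features = [top_words(w, k) for w in windows]
    return [jaccard_distance(prev, cur) for prev, cur in zip(features, features[1:])]


stream = ["a b", "a b", "c d", "c d"]
print(window_changes(stream, window_size=2))  # → [1.0]
```

A distance near 0 suggests a topically stable stream; a jump toward 1 flags a candidate concept change between two adjacent windows.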

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382960•

A monolingual approach to detection of text reuse in Russian-English collection

Oleg Bakhteev, Rita Kuznetsova, Alexey Romanov, Anton Khritankov

1 Nov 2015

TL;DR: A method for cross-lingual (Russian and English) text reuse detection is developed, based on the monolingual approach: translating texts into one language and reducing the task to the text similarity problem.

Abstract: In this paper we develop a method for cross-lingual (Russian and English) text reuse detection. The method is based on the monolingual approach: texts are translated into one language, and the task is reduced to the text similarity problem. We split texts into non-overlapping fragments and compare fragments to each other using different metrics: BLEU (1–2), METEOR, cosine similarity between bag-of-words representations of each snippet, and cosine similarity between vectors obtained from a doc2vec-trained model. We explore the impact of the choice of metric on the quality of text reuse detection. We assess the quality of the method on a sample of a hundred scientific documents originally written in Russian and machine-translated into English. Preliminary findings demonstrate the feasibility of the approach.
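Of the four metrics listed, the bag-of-words cosine is the easiest to show. A minimal sketch of the fragment-comparison step, assuming both texts have already been translated into one language; the fragment size and the `threshold` are hypothetical, and BLEU, METEOR and doc2vec are not covered.

```python
import math
from collections import Counter


def fragments(text, size):
    """Split a text into non-overlapping fragments of `size` tokens."""
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def cosine_bow(a, b):
    """Cosine similarity between bag-of-words counts of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def reused_pairs(source, suspect, size=5, threshold=0.8):
    """Pairs (i, j) of fragment indices with cosine similarity >= threshold."""
    fs, ft = fragments(source, size), fragments(suspect, size)
    return [
        (i, j)
        for i, a in enumerate(fs)
        for j, b in enumerate(ft)
        if cosine_bow(a, b) >= threshold
    ]


print(reused_pairs("a b c d e f g h i j", "k l m n o a b c d e"))  # → [(0, 1)]
```

Non-overlapping fragments keep the number of comparisons at (fragments in source) × (fragments in suspect), at the cost of possibly splitting a reused passage across a fragment boundary.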

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382979•

Multi-representation approach to text regression of financial risks

Roman Trusov¹, Alexey Natekin², Pavel Kalaidin, Sergey Ovcharenko, Alois Knoll³, Aida Fazylova•Institutions (3)

Saint Petersburg State University of Information Technologies, Mechanics and Optics¹, Deloitte², Technische Universität München³

1 Nov 2015

TL;DR: This article explores using multiple text representations simultaneously within one regression task, combining the conventional bag-of-words approach with semantically richer embeddings, and investigates the performance of this multi-representation approach on the financial risk prediction problem.

Abstract: Different approaches to textual feature extraction have been proposed, starting with simple word count features and continuing with deeper representations that capture distributional semantics. In recent publications, word embedding methods have been successfully used as a representation basis for many NLP tasks such as text classification, part-of-speech tagging and many others. In this article we explore the opportunities of using multiple text representations simultaneously within one regression task in order to combine the conventional bag-of-words approach with semantically richer embeddings. We investigate the performance of this multi-representation approach on the financial risk prediction problem. Publicly available 10-K reports filed by US publicly traded companies are used as the basis for predicting next year's change in stock price volatility. Our study shows that models based on single representations achieve performance comparable to previously published results on risk prediction, and that models with multiple representations benefit from complementary information and outperform both the baseline and single-representation models.
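The abstract does not say how the representations are combined. Concatenating feature vectors is one simple possibility, sketched below: a relative-frequency bag-of-words view plus the mean of word embeddings, joined into one vector that any regressor (for example ridge regression) could take as input to predict the volatility change. The vocabulary and the two-dimensional embedding table are hypothetical toy values.

```python
from typing import Dict, List


def bow_vector(tokens: List[str], vocab: List[str]) -> List[float]:
    # Relative term frequencies over a fixed vocabulary.
    total = len(tokens) or 1
    return [tokens.count(term) / total for term in vocab]


def mean_embedding(tokens: List[str], emb: Dict[str, List[float]], dim: int) -> List[float]:
    # Average of the embeddings of in-vocabulary tokens; zeros if none.
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


def multi_representation(tokens, vocab, emb, dim):
    """Concatenate the bag-of-words view with the dense embedding view."""
    return bow_vector(tokens, vocab) + mean_embedding(tokens, emb, dim)


# Hypothetical toy vocabulary and embeddings.
vocab = ["risk", "loss", "profit"]
emb = {"risk": [1.0, 0.0], "loss": [0.0, 1.0]}
print(multi_representation(["risk", "loss", "risk"], vocab, emb, dim=2))
```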

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382981•

Implementation of the new REST API for open source LBS-platform Geo2Tag

Mark Zaslavskiy¹, Dmitry Mouromtsev¹•Institutions (1)

Saint Petersburg State University of Information Technologies, Mechanics and Optics¹

1 Nov 2015

TL;DR: The platform was improved to address the following challenges: data visualization, extended date-time processing, social network integration and background calculation support; the resulting recommendations were fully implemented in the API.

Abstract: The article describes the current state of the Geo2Tag LBS platform project and the implementation of a new API version. The platform was improved to address the following challenges: data visualization, extended date-time processing, social network integration and background calculation support. These challenges were justified by a review of the most important trends in geocontext applications and LBS platforms. The recommendations were fully implemented in the API. The article also describes the implementation of the new version. As an example, an Open Data import API and a specific plugin for the Open Karelia system were implemented. This extension made it possible to perform geocontext markup of complex spatiotemporal data inside the platform.

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382980•

Crowdsourcing synset relations with Genus-Species-Match

Dmitry Ustalov

1 Nov 2015

TL;DR: Genus-Species-Match, a crowdsourcing workflow for matching noisy pairs of synsets representing hyponymic/hypernymic relations, is presented; it demonstrates an F1 score of 80% in an experiment conducted on an online labor marketplace using the EMERCOM glossary and the Yet Another RussNet sense inventory.

Abstract: Enabling a domain-specific lexical resource is useful for improving the performance of a natural language processing system. However, such resources may be represented in the form of glossaries, i.e. terms provided with their sense definitions. Although integrating such domain-specific glossaries into more sophisticated general-purpose resources like thesauri is a highly topical problem, it is complicated by the ambiguity of individual terms. This paper presents Genus-Species-Match, a crowdsourcing workflow for matching noisy pairs of synsets representing hyponymic/hypernymic relations. The system demonstrates an F1 score of 80% in an experiment conducted on an online labor marketplace using the EMERCOM glossary and the Yet Another RussNet sense inventory.

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382977•

Applying the P-medians in the design of modern systems-on-chip

Elena Suvorova¹, Nadezhda Matveeva¹, Lev Kurbanov¹•Institutions (1)

Saint Petersburg State University of Aerospace Instrumentation¹

1 Nov 2015

TL;DR: This paper describes different methods of calculating p-medians and gives a detailed description of solving the p-median problem for homogeneous systems-on-chip.

Abstract: In this paper we consider using p-median search algorithms in the design of modern systems-on-chip. This mathematical apparatus can be used to solve some of the tasks that developers face. We consider the types of systems-on-chip for which the p-median problem is useful. We describe different methods of calculating p-medians. We also examine which criteria can be used when searching for p-medians. Finally, the paper describes in detail how the p-median problem is solved for homogeneous systems-on-chip.
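The p-median problem chooses p sites among n nodes so that the total distance from every node to its nearest chosen site is minimal. The abstract does not say which solution methods or criteria the authors use, so the sketch below shows only the exhaustive-search baseline with the sum-of-distances criterion. The 2×2 mesh network-on-chip with Manhattan hop counts is a hypothetical example, e.g. for placing p shared resources.

```python
from itertools import combinations


def p_median(dist, p):
    """Exact p-median by exhaustive search (practical only for small n).

    dist[i][j] is the cost (e.g. hop count) from node i to candidate site j.
    Returns (best_sites, total_cost); each node is served by its nearest site.
    """
    n = len(dist)
    best_sites, best_cost = None, float("inf")
    for sites in combinations(range(n), p):
        cost = sum(min(dist[i][j] for j in sites) for i in range(n))
        if cost < best_cost:
            best_sites, best_cost = sites, cost
    return best_sites, best_cost


# Hypothetical 2x2 mesh network-on-chip; cost = Manhattan hop count.
coords = [(0, 0), (1, 0), (0, 1), (1, 1)]
mesh = [[abs(ax - bx) + abs(ay - by) for (bx, by) in coords] for (ax, ay) in coords]
print(p_median(mesh, 2))  # → ((0, 1), 2)
```

Exhaustive search examines C(n, p) site sets, which is why heuristic methods become necessary for realistically sized systems-on-chip.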

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382976•

Weighted finite-state transducer approach to German compound words reconstruction for Speech Recognition

Nickolay Shamraev, Alexander Batalshchikov, Mikhail Zulkarneev, Sergey Repalov, Anna Shirokova

1 Nov 2015

TL;DR: An approach is proposed for German Large Vocabulary Speech Recognition, dealing with the problem of compound words, based on unsupervised word decomposition for German words and a probabilistic method for combining the words using finite state transducers.

Abstract: An approach is proposed for German large-vocabulary speech recognition that deals with the problem of compound words; it is based on unsupervised decomposition of German words and a probabilistic method for recombining the words using finite-state transducers. The basic idea of the method is to train an n-gram language model on texts in which compound words are replaced by their parts plus a concatenation symbol. Thus, context information is taken into account for compound words and is used during recombination to find the most probable variant of the recognition result. The advantages of this approach are improved word recognition accuracy and more precise recombination of compound words.
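The idea of replacing compounds by their parts plus a concatenation symbol can be illustrated with a deliberately simplified sketch. Two substitutions must be stated plainly: splitting here is a greedy lookup in a hypothetical lexicon of known parts (the paper uses unsupervised decomposition), and recombination deterministically joins any token ending in the marker `+` with the next one (the paper recombines probabilistically with weighted finite-state transducers). Lowercasing and the absence of linking elements such as the German Fugen-s are further simplifications.

```python
def split_compound(word, lexicon, min_len=3):
    """Greedy split of a compound into known parts, longest head first.

    Non-final parts get a trailing '+' marker, e.g. 'haustür' -> ['haus+', 'tür'].
    Returns [word] if no full split into lexicon parts exists.
    """
    for i in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:i], word[i:]
        if head in lexicon:
            if tail in lexicon:
                return [head + "+", tail]
            rest = split_compound(tail, lexicon, min_len)
            if len(rest) > 1:
                return [head + "+"] + rest
    return [word]


def recombine(tokens):
    """Join tokens ending in '+' with the following token."""
    out, buffer = [], ""
    for tok in tokens:
        if tok.endswith("+"):
            buffer += tok[:-1]
        else:
            out.append(buffer + tok)
            buffer = ""
    if buffer:  # dangling marker at the end: keep the partial word
        out.append(buffer)
    return out


lexicon = {"haus", "tür", "schloss"}  # hypothetical known parts
print(split_compound("haustürschloss", lexicon))  # → ['haus+', 'tür+', 'schloss']
print(recombine(["die", "haus+", "tür", "ist", "offen"]))  # → ['die', 'haustür', 'ist', 'offen']
```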

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382974•

Arabic manuscripts identification based on Feature Relation Graph

Oleg Redkin¹, Olga Bernikova¹, Dmitry S. Shalymov¹, Vladislav A. Pavlov¹•Institutions (1)

Saint Petersburg State University¹

1 Nov 2015

TL;DR: A new metric based on the Feature Relation Graph (FRG), which has proved effective for text-independent Persian writer identification, is investigated for Arabic manuscripts, since Persian script is based on Arabic writing.

Abstract: We investigate a new metric based on the Feature Relation Graph (FRG). This metric has proved to be effective for text-independent Persian writer identification. Since Persian script is based on Arabic writing, similar principles of analysis may also be applied to Arabic manuscripts. We have investigated the FRG for Arabic handwritten texts. Pattern-based features are extracted from handwritten texts using Gabor and XGabor filters. The extracted features are represented for each author by means of the FRG, which plays the role of a feature vector in classification problems. We have also investigated different parameters of the FRG.

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382970•

Software-to-hardware tester for the STP-ISS transport protocol verification

Valentin Olenev¹, Irina Lavrovskaya¹, Nadezhda Chumakova¹•Institutions (1)

Saint Petersburg State University of Aerospace Instrumentation¹

1 Nov 2015

TL;DR: A description is given of a conformance tester developed to test on-board devices that conform to the STP-ISS transport protocol standard and the SpaceWire networking standard.

Abstract: Implementing conformance testers for communication protocols is an important task that is being solved in the majority of industrial companies that develop communication equipment. This article describes such a tester, developed to test on-board devices that conform to the STP-ISS transport protocol standard and the SpaceWire networking standard. We give a brief description of the possible solutions for hardware testing and describe the STP-ISS protocol. Then we report on the implementation of the Software-to-Hardware STP-ISS tester and its fields of application.

Proceedings Article•10.1109/AINL-ISMW-FRUCT.2015.7382962•

Datasets meta-feature description for recommending feature selection algorithm

Andrey Filchenkov¹, Arseniy Pendryak¹•Institutions (1)

Saint Petersburg State University of Information Technologies, Mechanics and Optics¹

1 Nov 2015

TL;DR: A meta-feature set that showed the best results in predicting suitable feature selection algorithms is found, and a novel approach to engineering meta-features for data preprocessing algorithms is suggested, based on estimating the best parametrization of processing algorithms on small subsamples.

Abstract: Meta-learning is an approach to solving the algorithm selection problem, that is, choosing the best algorithm for a certain task. In machine learning and data mining, such a task corresponds to a dataset. The main challenge in meta-learning is to engineer a meta-feature description of datasets. In this paper we apply meta-learning to feature selection. We found a meta-feature set that showed the best results in predicting suitable feature selection algorithms. We also suggest a novel approach to engineering meta-features for data preprocessing algorithms, which is based on estimating the best parametrization of processing algorithms on small subsamples.
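The general meta-learning loop can be sketched as follows: describe each dataset by a vector of meta-features, then recommend the algorithm that worked best on the most similar previously seen dataset. The meta-features below (log sample count, log feature count, dimensionality ratio, class entropy) are classic textbook examples, not the set found in the paper. The 1-nearest-neighbour recommender and the algorithm names in the history are also hypothetical, and the paper's subsample-based meta-features are not shown.

```python
import math
from collections import Counter


def meta_features(X, y):
    """A few classic dataset meta-features (illustrative selection).

    X: list of rows (lists of numbers); y: list of class labels.
    """
    n, d = len(X), len(X[0])
    counts = Counter(y)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return [math.log(n), math.log(d), d / n, entropy]


def recommend(new_meta, history):
    """1-nearest-neighbour recommendation of a feature selection algorithm.

    history: (meta_feature_vector, best_algorithm_name) pairs for datasets
    on which candidate algorithms were already evaluated.
    """
    _, best = min(history, key=lambda item: math.dist(new_meta, item[0]))
    return best


X = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [0, 0, 1, 1]
print(meta_features(X, y))  # [log 4, log 2, 0.5, 1.0]
```

In practice the meta-features would be standardized before computing distances, since their scales differ widely.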
