Language, Data and Knowledge

Conference Tools

Papers published on a yearly basis

Papers

Proceedings Article•10.4230/OASICS.LDK.2019.6•

Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages

Bharathi Raja Chakravarthi¹, Mihael Arcan¹, John P. McCrae¹•Institutions (1)

National University of Ireland, Galway¹

20 May 2019

TL;DR: This paper proposed to transcribe the native script into a common representation, such as the Latin script or the International Phonetic Alphabet (IPA), to alleviate the problem of different scripts.

Abstract: Under-resourced languages are a significant challenge for statistical approaches to machine translation, and recently it has been shown that the use of training data from closely related languages can improve machine translation quality for these languages. While languages within the same language family share many properties, many under-resourced languages are written in their own native script, which makes taking advantage of these language similarities difficult. In this paper, we propose to alleviate the problem of different scripts by transcribing the native script into a common representation, i.e., the Latin script or the International Phonetic Alphabet (IPA). In particular, we compare coarse-grained transliteration to the Latin script with fine-grained IPA transcription. We performed experiments on the English-Tamil, English-Telugu, and English-Kannada translation tasks. Our results show improvements in terms of the BLEU, METEOR and chrF scores from transliteration, and we find that transliteration into the Latin script outperforms the fine-grained IPA transcription.

48 citations

Book Chapter•10.1007/978-3-319-59888-8_6•

CoNLL-RDF: Linked Corpora Done in an NLP-Friendly Way

Christian Chiarcos¹, Christian Fäth¹•Institutions (1)

Goethe University Frankfurt¹

19 Jun 2017

TL;DR: CoNLL-RDF is a direct rendering of the CoNLL format in RDF, accompanied by a formatter whose output mimics the original TSV-style layout.

Abstract: We introduce CoNLL-RDF, a direct rendering of the CoNLL format in RDF, accompanied by a formatter whose output mimics CoNLL’s original TSV-style layout. CoNLL-RDF represents a middle ground that accounts for the needs of NLP specialists (easy to read, easy to parse, close to conventional representations), but that also facilitates LLOD integration by applying off-the-shelf Semantic Web technology to CoNLL corpora and annotations. The CoNLL-RDF infrastructure is published as open source. We also provide SPARQL update scripts for selected use cases as described in this paper.

41 citations

Book Chapter•10.1007/978-3-319-59888-8_29•

Measuring Accuracy of Triples in Knowledge Graphs

Shuangyan Liu¹, Mathieu d'Aquin¹, Enrico Motta¹•Institutions (1)

Open University¹

19 Jun 2017

TL;DR: This paper introduces an automatic approach, Triples Accuracy Assessment (TAA), that validates RDF triples (source triples) in a knowledge graph by finding consensus among matched triples from other knowledge graphs, applying different matching methods between the predicates of source triples and target triples.

Abstract: An increasing number of large-scale knowledge graphs have been constructed in recent years. Those graphs are often created by text-based extraction, which can be very noisy. So far, cleaning knowledge graphs has often been carried out by human experts and is thus very inefficient. It is necessary to explore automatic methods for identifying and eliminating erroneous information. To achieve this, previous approaches primarily rely on internal information, i.e., the knowledge graph itself. In this paper, we introduce an automatic approach, Triples Accuracy Assessment (TAA), for validating RDF triples (source triples) in a knowledge graph by finding consensus of matched triples (among target triples) from other knowledge graphs. TAA uses knowledge graph interlinks to find identical resources and applies different matching methods between the predicates of source triples and target triples. Then, based on the matched triples, TAA calculates a confidence score to indicate the correctness of a source triple. In addition, we present an evaluation of our approach using the FactBench dataset for fact validation. Our findings show promising results for distinguishing between correct and wrong triples.

23 citations

Book Chapter•10.1007/978-3-319-59888-8_14•

Towards Interoperability in the European Poetry Community: The Standardization of Philological Concepts

Helena Bermúdez-Sabel¹, Mariana Curado Malta¹, Elena González-Blanco¹•Institutions (1)

National University of Distance Education¹

19 Jun 2017

TL;DR: This paper presents the methodology followed in defining the concepts of the domain model of a metadata application profile (MAP) for European poetry data, as well as some issues that arise when labeling philological terms.

Abstract: This paper stems from the Poetry Standardization and Linked Open Data project (POSTDATA). As its name reveals, one of the main aims of POSTDATA is to provide a means to publish European poetry (EP) data as Linked Open Data (LOD). Thus, developing a metadata application profile (MAP) as a common semantic model to be used by the EP community is a crucial step of this project. This MAP will enhance interoperability among the community members in particular, and among the EP community and other contexts in general (e.g., bibliographic records). This paper presents the methodology followed in the process of defining the concepts of the domain model of this MAP, as well as some issues that arise when labeling philological terms.

18 citations

Book Chapter•10.1007/978-3-319-59888-8_10•

Named Entity Linking in a Complex Domain: Case Second World War History

Erkki Heino¹,², Minna Tamper¹,², Eetu Mäkelä¹,², Petri Leskinen¹,², Esko Ikkala¹,², Jouni Tuominen¹,², Mikko Koho¹,², Eero Hyvönen¹,²•Institutions (2)

Aalto University¹, University of Helsinki²

19 Jun 2017

TL;DR: This paper discusses the challenges of applying named entity linking in a rich, complex domain, specifically the linking of military units, places and people in the context of interlinked Second World War data.

Abstract: This paper discusses the challenges of applying named entity linking in a rich, complex domain – specifically, the linking of (1) military units, (2) places and (3) people in the context of interlinked Second World War data. Multiple sub-scenarios are discussed in detail through concrete evaluations, analyzing the problems faced and the solutions developed. A key contribution of this work is to highlight the heterogeneity of problems and approaches needed even inside a single domain, depending on both the source data and the target authority.

18 citations

Year	Papers
2021	11
2019	22
2017	32
