Conference
Language, Data and Knowledge
About: Language, Data and Knowledge is an academic conference. It publishes mainly in the areas of computer science and RDF. Over its lifetime, the conference has published 65 papers, which have received 452 citations.
Papers
20 May 2019
TL;DR: This paper proposes transcribing the native script into a common representation, such as the Latin script or the International Phonetic Alphabet (IPA), to alleviate the problem of different scripts.
Abstract: Under-resourced languages are a significant challenge for statistical approaches to machine translation, and it has recently been shown that using training data from closely related languages can improve machine translation quality for these languages. While languages within the same language family share many properties, many under-resourced languages are written in their own native script, which makes it difficult to take advantage of these similarities. In this paper, we propose to alleviate the problem of different scripts by transcribing the native script into a common representation, i.e., the Latin script or the International Phonetic Alphabet (IPA). In particular, we compare coarse-grained transliteration into the Latin script with fine-grained IPA transliteration. We performed experiments on the English-Tamil, English-Telugu, and English-Kannada translation tasks. Our results show improvements in BLEU, METEOR and chrF scores from transliteration, and we find that transliteration into the Latin script outperforms the fine-grained IPA transcription.
48 citations
19 Jun 2017
TL;DR: CoNLL-RDF is a direct rendering of the CoNLL format in RDF, accompanied by a formatter whose output mimics the original TSV-style layout.
Abstract: We introduce CoNLL-RDF, a direct rendering of the CoNLL format in RDF, accompanied by a formatter whose output mimics CoNLL’s original TSV-style layout. CoNLL-RDF represents a middle ground that accounts for the needs of NLP specialists (easy to read, easy to parse, close to conventional representations), but that also facilitates LLOD integration by applying off-the-shelf Semantic Web technology to CoNLL corpora and annotations. The CoNLL-RDF infrastructure is published as open source. We also provide SPARQL update scripts for selected use cases as described in this paper.
41 citations
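To make the idea of "rendering CoNLL in RDF" in the abstract above more concrete, here is a minimal sketch in Python. It is not the CoNLL-RDF implementation: the function name `conll_to_triples`, the `http://example.org/doc#` base URI, the `s<sentence>_<token>` identifier scheme, and the `conll:` prefix on column names are all assumptions made for illustration. It only shows how each row of a tab-separated file can become one subject with one property per column.

```python
def conll_to_triples(conll_text, columns, base="http://example.org/doc#"):
    """Turn tab-separated CoNLL rows into (subject, predicate, object) tuples.

    `columns` names the TSV columns in order; `base` is a hypothetical base URI.
    """
    triples = []
    sent_id, tok_id = 1, 0
    for line in conll_text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence (if it had any tokens).
            if tok_id:
                sent_id += 1
                tok_id = 0
            continue
        if line.startswith("#"):
            # Skip comment lines.
            continue
        tok_id += 1
        subject = f"{base}s{sent_id}_{tok_id}"
        for name, value in zip(columns, line.split("\t")):
            # "_" conventionally marks an empty cell, so no triple is emitted.
            if value != "_":
                triples.append((subject, f"conll:{name}", value))
    return triples


sample = "1\tThe\tDET\n2\tcat\tNOUN\n\n1\tHi\t_\n"
for triple in conll_to_triples(sample, ["ID", "WORD", "POS"]):
    print(triple)
```

Keeping one subject per token and one predicate per column is what lets the data be written back out in a TSV-like layout, which is the property the abstract emphasizes.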
19 Jun 2017
TL;DR: This paper introduces an automatic approach, Triples Accuracy Assessment (TAA), that validates RDF triples (source triples) in a knowledge graph by finding consensus among matched triples from other knowledge graphs, applying different matching methods between the predicates of source triples and target triples.
Abstract: An increasing number of large-scale knowledge graphs have been constructed in recent years. Those graphs are often created by text-based extraction, which can be very noisy. So far, cleaning knowledge graphs has often been carried out by human experts and is thus very inefficient. It is necessary to explore automatic methods for identifying and eliminating erroneous information. To achieve this, previous approaches primarily rely on internal information, i.e., the knowledge graph itself. In this paper, we introduce an automatic approach, Triples Accuracy Assessment (TAA), for validating RDF triples (source triples) in a knowledge graph by finding consensus of matched triples (among target triples) from other knowledge graphs. TAA uses knowledge graph interlinks to find identical resources and applies different matching methods between the predicates of source triples and target triples. Then, based on the matched triples, TAA calculates a confidence score to indicate the correctness of a source triple. In addition, we present an evaluation of our approach using the FactBench dataset for fact validation. Our findings show promising results for distinguishing between correct and wrong triples.
23 citations
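The consensus step described in the TAA abstract above can be illustrated with a small sketch. The abstract does not give the scoring formula, so the one below is an assumption: the confidence is simply the fraction of matched target objects that agree with the source object after light normalization. The function name `consensus_confidence` and the example values are hypothetical, and the predicate matching that TAA performs beforehand is assumed to have already produced `target_values`.

```python
def consensus_confidence(source_value, target_values):
    """Fraction of matched target objects that agree with the source object.

    Returns None when no target triple was matched (no evidence either way).
    """
    if not target_values:
        return None

    def normalize(value):
        # Compare case-insensitively and ignore surrounding whitespace.
        return str(value).strip().lower()

    wanted = normalize(source_value)
    agreeing = sum(1 for value in target_values if normalize(value) == wanted)
    return agreeing / len(target_values)


# Source triple object "Berlin" checked against objects found in other graphs.
print(consensus_confidence("Berlin", ["berlin", "Berlin ", "Bonn"]))
```

Returning `None` rather than `0.0` for an empty match set keeps "no evidence" distinct from "all evidence disagrees", which matters if the score is later thresholded.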
19 Jun 2017
TL;DR: This paper presents the methodology followed in the process of defining the concepts of the domain model of this MAP, as well as some issues that arise when labeling philological terms.
Abstract: This paper stems from the Poetry Standardization and Linked Open Data project (POSTDATA). As its name reveals, one of the main aims of POSTDATA is to provide a means to publish European poetry (EP) data as Linked Open Data (LOD). Thus, developing a metadata application profile (MAP) as a common semantic model to be used by the EP community is a crucial step of this project. This MAP will enhance interoperability among the community members in particular, and among the EP community and other contexts in general (e.g., bibliographic records). This paper presents the methodology followed in the process of defining the concepts of the domain model of this MAP, as well as some issues that arise when labeling philological terms.
18 citations
19 Jun 2017
TL;DR: This paper discusses the challenges of applying named entity linking in a rich, complex domain – specifically, the linking of military units, places and people in the context of interlinked Second World War data.
Abstract: This paper discusses the challenges of applying named entity linking in a rich, complex domain – specifically, the linking of (1) military units, (2) places and (3) people in the context of interlinked Second World War data. Multiple sub-scenarios are discussed in detail through concrete evaluations, analyzing the problems faced and the solutions developed. A key contribution of this work is to highlight the heterogeneity of problems and approaches needed even inside a single domain, depending on both the source data and the target authority.
18 citations
Performance Metrics
| Year | Papers |
|---|---|
| 2021 | 11 |
| 2019 | 22 |
| 2017 | 32 |