Top 2 papers presented at Data and Knowledge Engineering in 2022

Proceedings Article • 10.1016/J.DATAK.2021.101943

LEAPME: Learning-based Property Matching with Embeddings

Daniel Ayala¹, Inma Hernández¹, David Ruiz¹, Erhard Rahm²

University of Seville¹, Leipzig University²

1 Jan 2022

TL;DR: This article proposes LEAPME (LEArning-based Property Matching with Embeddings), a machine learning-based property matching approach that uses numerous features of both property names and instance values.

Abstract: Data integration tasks such as the creation and extension of knowledge graphs involve the fusion of heterogeneous entities from many sources. Matching and fusing such entities also requires matching and combining their properties (attributes). However, previous schema matching approaches mostly focus on only two sources and often rely on simple similarity measurements. They thus face problems in challenging use cases such as the integration of heterogeneous product entities from many sources. We therefore present a new machine learning-based property matching approach called LEAPME (LEArning-based Property Matching with Embeddings) that utilizes numerous features of both property names and instance values. The approach makes heavy use of word embeddings to better exploit the domain-specific semantics of both property names and instance values. The use of supervised machine learning helps exploit the predictive power of word embeddings. Our comparative evaluation against five baselines on several multi-source datasets with real-world data shows the high effectiveness of LEAPME. We also show that our approach remains effective even when training data from another domain is used (transfer learning).
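
The idea of deriving features from embeddings of both property names and instance values can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy 3-dimensional embeddings, the helper names (`embed`, `cosine`, `pair_features`) and the choice of exactly two cosine-similarity features are assumptions made for illustration. The abstract describes numerous features that feed a supervised classifier, which this sketch omits.

```python
import math

# Toy 3-dimensional "word embeddings"; hypothetical values for illustration only.
TOY_EMBEDDINGS = {
    "screen": [0.9, 0.1, 0.0],
    "display": [0.8, 0.2, 0.1],
    "size": [0.1, 0.9, 0.0],
    "diagonal": [0.2, 0.8, 0.1],
    "price": [0.0, 0.1, 0.9],
    "inch": [0.3, 0.7, 0.0],
    "usd": [0.0, 0.0, 1.0],
}
DIM = 3


def embed(text):
    """Average the embeddings of the known words in `text` (zero vector if none)."""
    vectors = [TOY_EMBEDDINGS[w] for w in text.lower().split() if w in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def pair_features(prop_a, prop_b):
    """Feature vector for a property pair: [name similarity, value similarity]."""
    name_sim = cosine(embed(prop_a["name"]), embed(prop_b["name"]))
    value_sim = cosine(
        embed(" ".join(prop_a["values"])),
        embed(" ".join(prop_b["values"])),
    )
    return [name_sim, value_sim]


# Three properties from hypothetical product sources.
screen_size = {"name": "screen size", "values": ["15 inch", "13 inch"]}
display_diagonal = {"name": "display diagonal", "values": ["14 inch"]}
price = {"name": "price", "values": ["999 usd"]}

matching = pair_features(screen_size, display_diagonal)
non_matching = pair_features(screen_size, price)
```

In a supervised setting, vectors like `matching` and `non_matching` would be labeled and passed to a classifier. Here they only show that differently named properties with related names and values score higher on both features than an unrelated pair.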

8 citations

Journal Article • 10.1016/J.DATAK.2021.101944

SQL query extensions for imprecise questions

Marie Le Guilly¹, Jean-Marc Petit¹, Vasile-Marian Scuturici¹

Institut national des sciences Appliquées de Lyon¹

1 Jan 2022

TL;DR: In this article, the authors propose SQL query extensions, which suggest several possible additional selection clauses to complete the WHERE clause of a query. This form of semantic autocompletion makes SQL queries easier to write when the initial question is imprecise.

Abstract: Within the big data tsunami, relational databases and SQL remain inescapable in most cases for accessing data. Although SQL is easy to use and has proved its robustness over the years, formulating SQL queries is not always easy, as databases with hundreds of tables and/or attributes are increasingly common. Identifying the pertinent conditions to select the desired data, or even the relevant attributes, is not trivial, especially when the user only has an imprecise question in mind and is not sure how to translate its conditions directly into SQL. To make it easier to write SQL queries when the initial question is imprecise, we propose SQL query extensions: given a query, the system suggests several possible additional selection clauses to complete the WHERE clause of the query, as a form of SQL query semantic autocompletion. This is helpful both for understanding the initial query’s results and for refining the query to reach the desired tuples. The process is iterative, as a query constructed using an extension can itself be completed. It is also adaptable, as the number of extensions to compute is flexible. A prototype has been implemented in a SQL editor on top of a database management system, and two types of evaluation are proposed. The first looks at how the system scales with a large number of tuples. The second is a user study that examines two questions: does the extension tool speed up the writing of SQL queries, and is it easily adopted by users? To answer these questions, a thorough experiment was conducted with 70 computer science students divided into two groups (one with the extension tool and one without). The results showed a faster answering time for students who could use the extensions: 32 min on average to complete the test for the group with extensions, against 48 min for the others.
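
The notion of suggesting extra selection clauses for a query can be sketched as follows. The abstract does not say how extensions are computed, so the ranking used here (propose `column = value` predicates and order them by how many tuples of the initial result they keep) is a naive assumption rather than the authors' algorithm. The function `suggest_extensions`, its `limit` parameter and the `products` table are hypothetical.

```python
import sqlite3


def suggest_extensions(conn, table, where, columns, limit=3):
    """Naively propose extra equality predicates for `SELECT ... WHERE where`.

    Returns up to `limit` tuples (column, value, kept_tuples), ranked by how
    many tuples of the initial result each predicate keeps.
    Assumption: `table`, `where` and `columns` come from a trusted source,
    since they are interpolated directly into the SQL text.
    """
    candidates = []
    for column in columns:
        rows = conn.execute(
            f"SELECT {column}, COUNT(*) FROM {table} WHERE {where} "
            f"GROUP BY {column} ORDER BY COUNT(*) DESC, {column}"
        ).fetchall()
        candidates.extend((column, value, count) for value, count in rows)
    # The sort is stable, so ties keep the column order given by the caller.
    candidates.sort(key=lambda item: -item[2])
    return candidates[:limit]


# Small in-memory table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, brand TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        ("a", "laptop", "acme", 900.0),
        ("b", "laptop", "acme", 1200.0),
        ("c", "phone", "acme", 600.0),
        ("d", "laptop", "globex", 1500.0),
        ("e", "phone", "globex", 300.0),
    ],
)

# Initial, imprecise query: "products that are not cheap".
extensions = suggest_extensions(
    conn, "products", "price > 500", ["category", "brand"], limit=2
)
```

Each suggestion can be appended to the initial condition, for example `price > 500 AND category = 'laptop'`, and the extended condition can be passed back to `suggest_extensions`, which mirrors the iterative use described in the abstract. The `limit` argument plays the role of the flexible number of extensions.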
