Scispace (Formerly Typeset)
Showing papers presented at "Data and Knowledge Engineering in 2022"
Proceedings Article • DOI: 10.1016/J.DATAK.2021.101943
LEAPME: Learning-based Property Matching with Embeddings

Daniel Ayala (1), Inma Hernández (1), David Ruiz (1), Erhard Rahm (2)
(1) University of Seville, (2) Leipzig University
1 Jan 2022
TL;DR: This article proposes LEAPME (LEArning-based Property Matching with Embeddings), a machine learning-based property matching approach that uses numerous features of both property names and instance values.
Abstract: Data integration tasks such as the creation and extension of knowledge graphs involve the fusion of heterogeneous entities from many sources. Matching and fusing such entities also requires matching and combining their properties (attributes). However, previous schema matching approaches mostly focus on only two sources and often rely on simple similarity measures. They thus face problems in challenging use cases such as the integration of heterogeneous product entities from many sources. We therefore present a new machine learning-based property matching approach called LEAPME (LEArning-based Property Matching with Embeddings) that utilizes numerous features of both property names and instance values. The approach makes heavy use of word embeddings to better exploit the domain-specific semantics of both property names and instance values. The use of supervised machine learning helps exploit the predictive power of word embeddings. Our comparative evaluation against five baselines on several multi-source datasets with real-world data shows the high effectiveness of LEAPME. We also show that our approach is effective even when training data from another domain is used (transfer learning).
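
As an illustration of the general idea in the abstract (features built from both property names and instance values, fed to a supervised classifier), here is a minimal Python sketch. It is not the LEAPME implementation: the character-trigram vectors are a toy stand-in for real word embeddings, and the three features, the logistic-regression trainer, and the sample laptop properties are all assumptions made for this example.

```python
import math
from collections import Counter

def trigram_vector(text):
    """Toy stand-in for a word embedding: character-trigram counts (assumption)."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def numeric_fraction(values):
    """Share of instance values that parse as plain numbers."""
    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False
    return sum(is_number(v) for v in values) / len(values) if values else 0.0

def pair_features(prop_a, prop_b):
    """Features for one candidate pair: name similarity, value similarity, type agreement."""
    (name_a, values_a), (name_b, values_b) = prop_a, prop_b
    name_sim = cosine(trigram_vector(name_a), trigram_vector(name_b))
    value_sim = cosine(trigram_vector(" ".join(values_a)),
                       trigram_vector(" ".join(values_b)))
    type_gap = abs(numeric_fraction(values_a) - numeric_fraction(values_b))
    return [name_sim, value_sim, 1.0 - type_gap]

def score(weights, features):
    """Probability that the pair is a match; the last weight is the bias."""
    z = weights[-1] + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, epochs=2000, lr=0.5):
    """Plain stochastic gradient descent on the logistic loss."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for features, label in zip(X, y):
            error = score(weights, features) - label
            for i, x in enumerate(features):
                weights[i] -= lr * error * x
            weights[-1] -= lr * error
    return weights

# Hypothetical properties from different sources: (property name, sample instance values).
screen = ("screen size", ["15.6", "13.3"])
display = ("display size", ["14", "15.6"])
weight = ("weight", ["1.2 kg", "2 kg"])
item_weight = ("item weight", ["1.5 kg", "1.8 kg"])
brand = ("brand", ["Acer", "Dell"])
color = ("color", ["black", "silver"])

# Labeled pairs: 1 = same property, 0 = different property.
labeled_pairs = [(screen, display, 1), (weight, item_weight, 1),
                 (brand, screen, 0), (weight, color, 0)]
X = [pair_features(a, b) for a, b, _ in labeled_pairs]
y = [label for _, _, label in labeled_pairs]
weights = train_logistic(X, y)

match_score = score(weights, pair_features(weight, item_weight))
non_match_score = score(weights, pair_features(weight, color))
```

With real embeddings, `trigram_vector` would be replaced by a lookup into a pretrained embedding model, and the feature list would be much richer; the structure (pairwise features, then a trained classifier) is the only part this sketch is meant to convey.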

8 citations

Journal Article • DOI: 10.1016/J.DATAK.2021.101944
SQL query extensions for imprecise questions

Marie Le Guilly (1), Jean-Marc Petit (1), Vasile-Marian Scuturici (1)
(1) Institut national des sciences appliquées de Lyon
1 Jan 2022
TL;DR: In this article, the authors propose SQL query extensions, a form of semantic autocompletion that suggests several possible additional selection clauses to complete the WHERE clause of a query, making SQL queries easier to write when the initial question is imprecise.
Abstract: Within the big data tsunami, relational databases and SQL remain inescapable in most cases for accessing data. Although SQL is easy to use and has proved its robustness over the years, it is not always easy to formulate SQL queries, as databases with hundreds of tables and/or attributes are increasingly common. Identifying the pertinent conditions to select the desired data, or even the relevant attributes, is not trivial, especially when the user only has an imprecise question in mind and is not sure how to translate its conditions directly into SQL. To make it easier to write SQL queries when the initial question is imprecise, we propose SQL query extensions: given a query, the approach suggests several possible additional selection clauses to complete the WHERE clause of the query, as a form of SQL query semantic autocompletion. This is helpful both for understanding the initial query’s results and for refining the query to reach the desired tuples. The process is iterative, as a query constructed using an extension can itself be completed. It is also adaptable, as the number of extensions to compute is flexible. A prototype has been implemented in a SQL editor on top of a database management system, and two types of evaluation are proposed. The first looks at how the system scales with a large number of tuples. The second is a user study that examines two questions: does the extension tool speed up the writing of SQL queries, and is it easily adopted by users? To answer those questions, a thorough experiment was conducted on 70 computer science students divided into two groups (one with the extension tool and one without). The results showed a faster answering time for students who could use the extensions: 32 min on average to complete the test for the group with extensions, against 48 min for the others.
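
To make the notion of a query extension concrete, the following Python sketch proposes extra selection clauses for an existing WHERE clause and reports how many result rows each would keep. It is only an illustration, not the authors' algorithm: splitting each candidate column at the median of the current result is a simple stand-in heuristic, and the `laptops` table, its columns, and the function name `suggest_extensions` are hypothetical.

```python
import sqlite3
import statistics

def suggest_extensions(conn, table, where, candidate_columns):
    """Propose additional selection clauses for an existing WHERE clause.

    For each candidate numeric column, split the current result at the
    column's median and report how many rows each new clause would keep.
    Identifiers are interpolated directly into the SQL text, so this is
    only suitable for trusted input in an illustration like this one.
    """
    suggestions = []
    for column in candidate_columns:
        rows = conn.execute(
            f"SELECT {column} FROM {table} WHERE {where}").fetchall()
        values = [row[0] for row in rows if row[0] is not None]
        if len(set(values)) < 2:
            continue  # a constant column cannot refine the result
        cut = statistics.median(values)
        for operator in ("<=", ">"):
            clause = f"{column} {operator} {cut}"
            kept = conn.execute(
                f"SELECT COUNT(*) FROM {table} WHERE ({where}) AND {clause}"
            ).fetchone()[0]
            suggestions.append((clause, kept))
    return suggestions

# Hypothetical data for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE laptops (brand TEXT, price INTEGER, weight REAL)")
conn.executemany(
    "INSERT INTO laptops VALUES (?, ?, ?)",
    [("Acer", 500, 1.2), ("Acer", 700, 1.5), ("Acer", 900, 2.0),
     ("Acer", 1100, 2.4), ("Dell", 1300, 1.1)],
)

initial_where = "brand = 'Acer'"
suggestions = suggest_extensions(conn, "laptops", initial_where, ["price", "weight"])

# The process is iterative: an extended query can itself be extended.
refined_where = f"({initial_where}) AND {suggestions[0][0]}"
next_suggestions = suggest_extensions(conn, "laptops", refined_where, ["weight"])
```

Each suggestion is a `(clause, rows_kept)` pair, so a user could pick one, append it to the WHERE clause, and ask for further suggestions on the refined query, mirroring the iterative process the abstract describes.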
