Scispace (Formerly Typeset)
Showing papers presented at "Data and Knowledge Engineering in 2010"
Journal Article•10.1016/J.DATAK.2009.10.003•
Frameworks for entity matching: A comparison

Hanna Köpcke1, Erhard Rahm1•
Leipzig University1
1 Feb 2010
TL;DR: This paper comparatively analyzes 11 proposed frameworks for entity matching and considers both frameworks that do and frameworks that do not use training data to semi-automatically find an entity matching strategy for a given match task.
Abstract: Entity matching is a crucial and difficult task for data integration. Entity matching frameworks provide several methods and their combination to effectively solve different match tasks. In this paper, we comparatively analyze 11 proposed frameworks for entity matching. Our study considers both frameworks which do or do not utilize training data to semi-automatically find an entity matching strategy to solve a given match task. Moreover, we consider support for blocking and the combination of different match algorithms. We further study how the different frameworks have been evaluated. The study aims at exploring the current state of the art in research prototypes of entity matching frameworks and their evaluations. The proposed criteria should be helpful to identify promising framework approaches and enable categorizing and comparatively assessing additional entity matching frameworks and their evaluations.

465 citations
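
To make the role of blocking in the abstract above concrete, here is a minimal sketch of entity matching with blocking. It is not taken from any of the 11 surveyed frameworks: the blocking key (first three letters of the name), the similarity measure (Python's `difflib.SequenceMatcher`), the 0.85 threshold, and the sample records are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record):
    # Hypothetical blocking key: first three letters of the lower-cased name.
    return record["name"].lower()[:3]


def match_with_blocking(records, threshold=0.85):
    """Return id pairs whose names are similar, comparing only within blocks."""
    # Step 1 (blocking): group records so that only candidates sharing
    # a key are compared, instead of all O(n^2) pairs.
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)

    # Step 2 (matching): apply a string similarity inside each block.
    matches = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if sim >= threshold:
                matches.append((a["id"], b["id"]))
    return matches


# Illustrative records (made up for this sketch).
records = [
    {"id": 1, "name": "Erhard Rahm"},
    {"id": 2, "name": "Erhard Rahmm"},
    {"id": 3, "name": "Hanna Koepcke"},
]
print(match_with_blocking(records))
```

Blocking trades recall for speed: a pair whose blocking keys differ is never compared, so the choice of key matters as much as the choice of matcher.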

Journal Article•10.1016/J.DATAK.2010.07.008•
Editorial: BeAware!-Situation awareness, the ontology-driven way

Norbert Baumgartner, Wolfgang Gottesheim1, Stefan Mitsch1, Werner Retschitzegger2, Wieland Schwinger1 •
Johannes Kepler University of Linz1, University of Vienna2
1 Nov 2010
TL;DR: BeAware!, a framework for ontology-driven information systems aiming at increasing an operator's situation awareness, introduces the concept of spatio-temporal primitive relations between observed real-world objects, thereby improving the reusability of the framework.
Abstract: Information overload is a severe problem for human operators of large-scale control systems as, for example, encountered in the domain of road traffic management. Operators of such systems risk lacking situation awareness because existing systems focus on the mere presentation of the available information on graphical user interfaces, thus endangering the timely and correct identification, resolution, and prevention of critical situations. In recent years, ontology-based approaches to situation awareness featuring a semantically richer knowledge model have emerged. However, current approaches are either highly domain-specific or have, in case they are domain-independent, shortcomings regarding their reusability. In this paper, we present our experience gained from the development of BeAware!, a framework for ontology-driven information systems aiming at increasing an operator's situation awareness. In contrast to existing domain-independent approaches, BeAware!'s ontology introduces the concept of spatio-temporal primitive relations between observed real-world objects, thereby improving the reusability of the framework. To show its applicability, a prototype of BeAware! has been implemented in the domain of road traffic management. An overview of this prototype and lessons learned for the development of ontology-driven information systems complete our contribution.

112 citations

Journal Article•10.1016/J.DATAK.2009.10.004•
Collaborative clustering with background knowledge

Germain Forestier1, Pierre Gançarski1, Cédric Wemmert1•
University of Strasbourg1
1 Feb 2010
TL;DR: After introducing the collaboration process, this paper presents different ways to integrate background knowledge into it and discusses how such integration benefits the collaborative process.
Abstract: The aim of collaborative clustering is to make different clustering methods collaborate in order to reach an agreement on the partitioning of a common dataset. As different clustering methods can produce different partitionings of the same dataset, finding a consensual clustering from these results is often a hard task. The collaboration aims to make the methods agree on the partitioning through a refinement of their results. This process tends to make the results more similar. In this paper, after the introduction of the collaboration process, we present different ways to integrate background knowledge into it. Indeed, in recent years, the integration of background knowledge in clustering algorithms has been the subject of a lot of interest. This integration often leads to an improvement of the quality of the results. We discuss how such integration in the collaborative process is beneficial and we present experiments in which background knowledge is used to guide collaboration.

93 citations

Journal Article•10.1016/J.DATAK.2010.03.006•
Automatic validation of requirements to support multidimensional design

Oscar Romero1, Alberto Abelló1•
Polytechnic University of Catalonia1
1 Sep 2010
TL;DR: The most relevant step in the framework is Multidimensional Design by Examples (MDBE), a novel, fully automatic method for deriving multidimensional conceptual schemas from relational sources according to end-user requirements, which it handles and analyzes automatically.
Abstract: It is widely accepted that the conceptual schema of a data warehouse must be structured according to the multidimensional model. Moreover, it has been suggested that the ideal scenario for deriving the multidimensional conceptual schema of the data warehouse would consist of a hybrid approach (i.e., a combination of data-driven and requirement-driven paradigms). Thus, the resulting multidimensional schema would satisfy the end-user requirements and would be conciliated with the data sources. Most current methods follow either a data-driven or requirement-driven paradigm and only a few use a hybrid approach. Furthermore, hybrid methods are unbalanced and do not benefit from all of the advantages brought by each paradigm. In this paper we present our approach for multidimensional design. The most relevant step in our framework is Multidimensional Design by Examples (MDBE), which is a novel method for deriving multidimensional conceptual schemas from relational sources according to end-user requirements. MDBE introduces several advantages over previous approaches, which can be summarized as three main contributions. (i) The MDBE method is a fully automatic approach that handles and analyzes the end-user requirements automatically. (ii) Unlike data-driven methods, we focus on data of interest to the end-user. However, the user may not be aware of all the potential analyses of the data sources and, in contrast to requirement-driven approaches, MDBE can propose new multidimensional knowledge related to concepts already queried by the user. (iii) Finally, MDBE proposes meaningful multidimensional schemas derived from a validation process. Therefore, the proposed schemas are sound and meaningful.

88 citations

Journal Article•10.1016/J.DATAK.2010.01.006•
A methodology to learn ontological attributes from the Web

David Sánchez
1 Jun 2010
TL;DR: This paper uses the Web as a massive learning corpus to retrieve data and to infer information distribution using highly contextualized queries aimed at improving the quality of the result.
Abstract: Class descriptors such as attributes, features or meronyms are rarely considered when developing ontologies. Even WordNet only includes a reduced amount of part-of relationships. However, these data are crucial for defining concepts such as those considered in classical knowledge representation models. Some attempts have been made to extract those relations from text using general meronymy detection patterns; however, there has been very little work on learning expressive class attributes (including associated domain, range or data values) at an ontological level. In this paper we take this background into consideration when proposing and implementing an automatic, non-supervised and domain-independent methodology to extend ontological classes in terms of learning concept attributes, data-types, value ranges and measurement units. In order to present a general solution and minimize the data sparseness of pattern-based approaches, we use the Web as a massive learning corpus to retrieve data and to infer information distribution using highly contextualized queries aimed at improving the quality of the result. This corpus is also automatically updated in an adaptive manner according to the knowledge already acquired and the learning throughput. Results have been manually checked by means of an expert-based concept-per-concept evaluation for several well distinguished domains showing reliable results and a reasonable learning performance.

85 citations

Journal Article•10.1016/J.DATAK.2009.08.003•
Reusing ontologies and language components for ontology generation

Deryle Lonsdale1, David W. Embley1, Yihong Ding1, Li Xu2, Martin Hepp3 •
Brigham Young University1, University of Arizona2, Bundeswehr University Munich3
1 Apr 2010
TL;DR: An implementation of this architecture shows the practicality of automating ontology generation through ontology reuse; the results were encouraging and yielded five lessons pertinent to future studies of automated ontology reuse.
Abstract: Realizing the Semantic Web involves creating ontologies, a tedious and costly challenge. Reuse can reduce the cost of ontology engineering. Semantic Web ontologies can provide useful input for ontology reuse. However, the automated reuse of such ontologies remains underexplored. This paper presents a generic architecture for automated ontology reuse. With our implementation of this architecture, we show the practicality of automating ontology generation through ontology reuse. We experimented with a large generic ontology as a basis for automatically generating domain ontologies that fit the scope of sample natural language web pages. The results were encouraging, resulting in five lessons pertinent to future automated ontology reuse study.

81 citations

Journal Article•10.1016/J.DATAK.2010.08.003•
Editorial: An integration of WordNet and fuzzy association rule mining for multi-label document clustering

Chun-Ling Chen1, Frank S. C. Tseng, Tyne Liang1•
National Chiao Tung University1
1 Nov 2010
TL;DR: This work proposes an effective Fuzzy-based Multi-label Document Clustering (FMDC) approach that integrates fuzzy association rule mining with an existing ontology WordNet to alleviate problems of document clustering.
Abstract: With the rapid growth of text documents, document clustering has become one of the main techniques for organizing large amount of documents into a small number of meaningful clusters. However, there still exist several challenges for document clustering, such as high dimensionality, scalability, accuracy, meaningful cluster labels, overlapping clusters, and extracting semantics from texts. In order to improve the quality of document clustering results, we propose an effective Fuzzy-based Multi-label Document Clustering (FMDC) approach that integrates fuzzy association rule mining with an existing ontology WordNet to alleviate these problems. In our approach, the key terms will be extracted from the document set, and the initial representation of all documents is further enriched by using hypernyms of WordNet in order to exploit the semantic relations between terms. Then, a fuzzy association rule mining algorithm for texts is employed to discover a set of highly-related fuzzy frequent itemsets, which contain key terms to be regarded as the labels of the candidate clusters. Finally, each document is dispatched into more than one target cluster by referring to these candidate clusters, and then the highly similar target clusters are merged. We conducted experiments to evaluate the performance based on Classic, Re0, R8, and WebKB datasets. The experimental results proved that our approach outperforms the influential document clustering methods with higher accuracy. Therefore, our approach not only provides more general and meaningful labels for documents, but also effectively generates overlapping clusters.

80 citations

Journal Article•10.1016/J.DATAK.2010.02.006•
Event-based lossy compression for effective and efficient OLAP over data streams

Alfredo Cuzzocrea1, Sharma Chakravarthy2•
University of Calabria1, University of Texas at Arlington2
1 Jul 2010
TL;DR: The compression strategy proposed in ECM-DS lays the groundwork for a novel class of intelligent applications over data streams, in which knowledge about actual streams is integrated and correlated with knowledge about expired events that are considered critical for the target OLAP analysis scenario.
Abstract: An innovative event-based lossy compression model for effective and efficient OLAP over data streams, called ECM-DS, is presented and experimentally assessed in this paper. The main novelty of our compression approach with respect to traditional data stream compression techniques relies on exploiting the semantics of the reference application scenario in order to drive the compression process by means of the ''degree of interestingness'' of events occurring in the target stream. This finally improves the quality of retrieved approximate answers to OLAP queries over data streams, and, in turn, the quality of complex knowledge discovery tasks over data streams developed on top of ECM-DS, and implemented via ad-hoc data stream mining algorithms. Overall, the compression strategy we propose in this research puts the basis for a novel class of intelligent applications over data streams where the knowledge on actual streams is integrated-with and correlated-to the knowledge related to expired events that are considered critical for the target OLAP analysis scenario. Finally, a comprehensive experimental evaluation over several classes of data stream sets clearly confirms the benefits deriving from the event-based data stream compression approach proposed in ECM-DS.

65 citations

Journal Article•10.1016/J.DATAK.2010.03.005•
RDFProv: A relational RDF store for querying and managing scientific workflow provenance

Artem Chebotko1, Shiyong Lu2, Xubo Fei2, Farshad Fotouhi2•
University of Texas–Pan American1, Wayne State University2
1 Aug 2010
TL;DR: This paper elaborates on the design of a relational RDF store, called RDFProv, which is optimized for scientific workflow provenance querying and management, and proposes three efficient data mapping algorithms to map provenance RDF metadata to relational data according to the generated relational database schema.
Abstract: Provenance metadata has become increasingly important to support scientific discovery reproducibility, result interpretation, and problem diagnosis in scientific workflow environments. The provenance management problem concerns the efficiency and effectiveness of the modeling, recording, representation, integration, storage, and querying of provenance metadata. Our approach to provenance management seamlessly integrates the interoperability, extensibility, and inference advantages of Semantic Web technologies with the storage and querying power of an RDBMS to meet the emerging requirements of scientific workflow provenance management. In this paper, we elaborate on the design of a relational RDF store, called RDFProv, which is optimized for scientific workflow provenance querying and management. Specifically, we propose: i) two schema mapping algorithms to map an OWL provenance ontology to a relational database schema that is optimized for common provenance queries; ii) three efficient data mapping algorithms to map provenance RDF metadata to relational data according to the generated relational database schema, and iii) a schema-independent SPARQL-to-SQL translation algorithm that is optimized on-the-fly by using the type information of an instance available from the input provenance ontology and the statistics of the sizes of the tables in the database. Experimental results are presented to show that our algorithms are efficient and scalable. The comparison with two popular relational RDF stores, Jena and Sesame, and two commercial native RDF stores, AllegroGraph and BigOWLIM, showed that our optimizations result in improved performance and scalability for provenance metadata management. Finally, our case study for provenance management in a real-life biological simulation workflow showed the production quality and capability of the RDFProv system. 
Although presented in the context of scientific workflow provenance management, many of our proposed techniques apply to general RDF data management as well.

64 citations

Journal Article•10.1016/J.DATAK.2009.12.002•
UFOme: An ontology mapping system with strategy prediction capabilities

Giuseppe Pirrò1, Domenico Talia1•
University of Calabria1
1 May 2010
TL;DR: This paper presents an ontology mapping software framework that has been designed and implemented to help users in designing and/or exploiting comprehensive mapping systems, based on a library of mapping modules implementing functions such as discovering mappings or evaluating mapping strategies.
Abstract: Ontology mapping, or matching, aims at identifying correspondences among entities in different ontologies. Several strands of research come up with algorithms often combining multiple mapping strategies to improve the mapping accuracy. However, few approaches have systematically investigated the requirements of a mapping system both from the functional (i.e., the features that are required) and user point of view (i.e., how the user can exploit these features). This paper presents an ontology mapping software framework that has been designed and implemented to help users (both expert and non-expert) in designing and/or exploiting comprehensive mapping systems. It is based on a library of mapping modules implementing functions such as discovering mappings or evaluating mapping strategies. In particular, the strategy predictor module of the designed framework, for each specific mapping task, can ''predict'' mapping modules to be exploited and parameter values (e.g., weights and thresholds). The implemented system, called UFOme, assists users during the various phases of a mapping task execution by providing a user friendly ontology mapping environment. The UFOme implementation and its prediction capabilities and accuracy were evaluated on the Ontology Alignment Evaluation Initiative tests with encouraging results.

60 citations

Journal Article•10.1016/J.DATAK.2009.10.007•
C-Phrase: A system for building robust natural language interfaces to databases

Michael Minock1•
Umeå University1
1 Mar 2010
TL;DR: This article presents C-Phrase, a natural language interface system that can be configured by normal, non-specialized, web-based technical teams and introduces the evaluation metric of willingness that complements the standard metrics of precision and recall.
Abstract: This article presents C-Phrase, a natural language interface system that can be configured by normal, non-specialized, web-based technical teams. C-Phrase models queries in an extended version of Codd's tuple calculus and uses synchronous context-free grammars with lambda-expressions to represent semantic grammars. Given an arbitrary relational database, authors rapidly build an NLI using what we term the name-tailor-define protocol. We present a small study demonstrating the effectiveness of this approach for the GEO corpus, and we introduce the evaluation metric of willingness, which complements the standard metrics of precision and recall. However, our true evaluation comes as we open-source C-Phrase.
Journal Article•10.1016/J.DATAK.2010.03.008•
Ranking the sky: Discovering the importance of skyline points through subspace dominance relationships

Akrivi Vlachou1, Michalis Vazirgiannis2•
Norwegian University of Science and Technology1, Athens University of Economics and Business2
1 Sep 2010
TL;DR: SKYRANK is a framework for ranking the skyline points in the absence of a user-defined preference function, thereby discovering a limited subset of the most interesting points of the skyline set and is extended to handle top-k preference skyline queries, when the user's preferences are available.
Abstract: Skyline queries aim to help users make intelligent decisions over complex data by discovering a set of interesting points, when different and often conflicting criteria are considered. Unfortunately, as the dimensionality of the dataset grows, the skyline operator loses its discriminating power and returns a large fraction of the data. The huge size of the result set hinders decision-making and motivates the ranking of skyline points. Therefore, users prefer to retrieve the top-k skyline points instead of the whole skyline set. In this paper, we propose SKYRANK, a framework for ranking the skyline points in the absence of a user-defined preference function, thereby discovering a limited subset of the most interesting points of the skyline set. For this purpose, we define the skyline graph, which relies on the dominance relationships between the skyline points for different subsets of dimensions (subspaces). SKYRANK applies well-known authority-based ranking algorithms on the skyline graph and, as described in this paper, discovers the importance of a skyline point exploiting the subspace dominance relationships. Furthermore, we extend SKYRANK to handle top-k preference skyline queries, when the user's preferences are available. Our experimental evaluation illustrates the complexity of the dominance relationships and the ranking ability of our framework.
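
The dominance relationship that the abstract above builds on can be sketched in a few lines. This is only an illustration of skyline computation and of how dominance changes between the full space and a subspace; it is not the SKYRANK algorithm or its skyline graph. The convention that smaller values are better and the sample points are assumptions.

```python
def dominates(p, q, dims):
    """True if p dominates q on dims: no worse on every dimension and
    strictly better on at least one (assumption: smaller is better)."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)


def skyline(points, dims):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p, dims) for q in points)]


# Illustrative 2-d points (made up for this sketch).
points = [(1, 9), (3, 3), (9, 1), (4, 4), (8, 8)]

full_space = skyline(points, dims=(0, 1))   # skyline over both dimensions
subspace_0 = skyline(points, dims=(0,))     # skyline over dimension 0 only

print(full_space)
print(subspace_0)
```

In the full space three points are mutually incomparable, while in the one-dimensional subspace a single point dominates the rest. Differences of this kind between subspaces are the information a ranking over skyline points can exploit.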
Journal Article•10.1016/J.DATAK.2010.02.010•
Refining non-taxonomic relation labels with external structured data to support ontology learning

Albert Weichselbraun1, Gerhard Wohlgenannt1, Arno Scharl2•
Vienna University of Economics and Business1, MODUL University Vienna2
1 Aug 2010
TL;DR: A method to integrate external knowledge sources such as DBpedia and OpenCyc into an ontology learning system that automatically suggests labels for unknown relations in domain ontologies based on large corpora of unstructured text is presented.
Abstract: This paper presents a method to integrate external knowledge sources such as DBpedia and OpenCyc into an ontology learning system that automatically suggests labels for unknown relations in domain ontologies based on large corpora of unstructured text. The method extracts and aggregates verb vectors from semantic relations identified in the corpus. It composes a knowledge base which consists of (i) verb centroids for known relations between domain concepts, (ii) mappings between concept pairs and the types of known relations, and (iii) ontological knowledge retrieved from external sources. Applying semantic inference and validation to this knowledge base improves the quality of suggested relation labels. A formal evaluation compares the accuracy and average ranking precision of this hybrid method with the performance of methods that solely rely on corpus data and those that are only based on reasoning and external data sources.
Journal Article•10.1016/J.DATAK.2009.08.009•
Representation of conceptual ETL designs in natural language using Semantic Web technology

Alkis Simitsis1, Dimitrios Skoutas2, Malu Castellanos1•
Hewlett-Packard1, National Technical University of Athens2
1 Jan 2010
TL;DR: This work presents a flexible and customizable template-based mechanism for the representation of a conceptual ETL design as a narrative, which is the most natural means of communication and does not require particular technical skills or familiarity with any specific model.
Abstract: Extract-Transform-Load (ETL) processes constitute the back stage of Data Warehouse architectures. Several studies characterize the ETL design as a time-consuming and error-prone procedure. A critical phase in the ETL lifecycle involves the early communications and design steps that aim at producing a conceptual ETL design. Various research approaches have dealt with the conceptual modeling of ETL processes, but all share two inconveniences: they require intensive human effort from the designers to create them, as well as technical knowledge from the business people to understand them. In this paper, we focus on the second aspect and provide a method for the representation of a conceptual ETL design as a narrative, which is the most natural means of communication and does not require particular technical skills or familiarity with any specific model. Specifically, this work builds upon previously proposed techniques that automate the conceptual design by leveraging Semantic Web technology. The key idea is to map the involved data stores, either source or target, to a domain ontology and then, to use a reasoner for producing the ETL design. We discuss how linguistic techniques can be used for the establishment of a common application vocabulary. We present a flexible and customizable template-based mechanism for the representation of the ETL design as a narrative. Finally, we discuss issues related to the production of meaningful reports and we provide implementation details.
Journal Article•10.1016/J.DATAK.2010.10.002•
Anchor modeling - Agile information modeling in evolving data environments

Lars Rönnbäck, Olle Regardt, Maria Bergholtz1, Paul Johannesson1, Petia Wohed1 •
Stockholm University1
1 Dec 2010
TL;DR: An agile information modeling technique, called Anchor Modeling, is proposed that offers non-destructive extensibility mechanisms, thereby enabling robust and flexible management of changes in a data warehouse environment.
Abstract: Maintaining and evolving data warehouses is a complex, error prone, and time consuming activity. The main reason for this state of affairs is that the environment of a data warehouse is in constant change, while the warehouse itself needs to provide a stable and consistent interface to information spanning extended periods of time. In this article, we propose an agile information modeling technique, called Anchor Modeling, that offers non-destructive extensibility mechanisms, thereby enabling robust and flexible management of changes. A key benefit of Anchor Modeling is that changes in a data warehouse environment only require extensions, not modifications, to the data warehouse. Such changes, therefore, do not require immediate modifications of existing applications, since all previous versions of the database schema are available as subsets of the current schema. Anchor Modeling decouples the evolution and application of a database, which when building a data warehouse enables shrinking of the initial project scope. While data models were previously made to capture every facet of a domain in a single phase of development, in Anchor Modeling fragments can be iteratively modeled and applied. We provide a formal and technology independent definition of anchor models and show how anchor models can be realized as relational databases together with examples of schema evolution. We also investigate performance through a number of lab experiments, which indicate that under certain conditions anchor databases perform substantially better than databases constructed using traditional modeling techniques.
Journal Article•10.1016/J.DATAK.2009.04.010•
A web page usage prediction scheme using sequence indexing and clustering techniques

Constantinos Dimopoulos1, Christos Makris1, Yannis Panagis1, Evangelos Theodoridis1, Athanasios K. Tsakalidis1 •
University of Patras1
1 Apr 2010
TL;DR: This paper considers the problem of web page usage prediction in a web site by modeling users' navigation history and web page content with weighted suffix trees, and finds that the quality of the scheme is fairly good and in many cases superior.
Abstract: In this paper we consider the problem of web page usage prediction in a web site by modeling users' navigation history and web page content with weighted suffix trees. This prediction of a user's navigation can be exploited either in an on-line recommendation system in a web site or in a web page cache system. The proposed method has the advantage that it demands a constant amount of computational effort per user action and consumes a relatively small amount of extra memory space. These features make the method ideal for an on-line working environment. Finally, we have evaluated the proposed scheme with experiments on various web site log files and web pages and have found that its quality is fairly good and in many cases superior.
Journal Article•10.1016/J.DATAK.2009.08.002•
Usability of upper level ontologies: The case of ResearchCyc

Jordi Conesa1, Veda C. Storey2, Vijayan Sugumaran3•
Open University of Catalonia1, J. Mack Robinson College of Business2, University of Rochester3
1 Apr 2010
TL;DR: ResearchCyc, a version of Cyc that attempts to capture common sense knowledge of the real world, is analyzed and the insights acquired are used to generate suggestions for improving the usability of upper level ontologies.
Abstract: Repositories of knowledge about the real world are intended to serve as surrogates for the meaning and context of terms and concepts. These are being developed at two levels: (1) individual domain ontologies that capture concepts about a particular application domain; and (2) upper level ontologies that contain massive amounts of knowledge about the real world and are domain independent. This paper analyzes ResearchCyc, a version of Cyc, that attempts to capture common sense knowledge of the real world. Experience in applying ResearchCyc to web query processing is reported and the insights acquired are used to generate suggestions for improving the usability of upper level ontologies.
Journal Article•10.1016/J.DATAK.2010.07.002•
Evaluating ontology extraction tools using a comprehensive evaluation framework

Jinsoo Park1, Wonchin Cho2, Sangkyu Rho1•
Seoul National University1, College of Business Administration2
1 Oct 2010
TL;DR: This paper proposes a set of criteria for evaluating ontology extraction tools and concludes that such tools still lack the ability to automate the extraction process fully and thus require functional performance improvement.
Abstract: Ontologies are a key component of the Semantic Web; thus, they are widely used in various applications. However, most ontologies are still built manually, a time-consuming activity which requires many resources. Several tools such as ontology editing tools, ontology merging tools, and ontology extraction tools have therefore been proposed to speed up ontology development. To minimize building time, one promising solution is the automation of the ontology development process. Consequently, the need for an automatic ontology extraction tool has increased in the last two decades and many tools have been developed for this purpose. However, there is still no comprehensive framework for evaluating such tools. In this paper, we proposed a set of criteria for evaluating ontology extraction tools and carried out an evaluation experiment on four ontology extraction tools (i.e., OntoLT, Text2Onto, OntoBuilder, and DODDLE-OWL) using our proposed evaluation framework. Based on the results of our experiment, we concluded that ontology extraction tools still lack the ability to automate the extraction process fully and thus require functional performance improvement.
Journal Article•10.1016/J.DATAK.2010.07.006•
Reasoning with large ontologies stored in relational databases: The OntoMinD approach

[...]

Lina Al-Jadir1, Christine Parent2, Stefano Spaccapietra1•
École Polytechnique Fédérale de Lausanne1, University of Lausanne2
1 Nov 2010
TL;DR: This paper builds on the assumption that very large ontologies can be efficiently handled using database management systems (DBMS) and proposes to implement reasoning into the DBMS via a set of PL/SQL stored procedures, designed to speed up ontology querying.
Abstract: A major obstacle to the development of ontologies in support of the Semantic Web is the poor capability of current ontology techniques to handle very large ontologies, in particular regarding scalability of reasoners. This paper builds on the assumption that very large ontologies can be efficiently handled using database management systems (DBMS), designed to provide best performance in storing, updating, and managing large volumes of data. To enhance DBMS with the reasoning functionality that characterizes ontology management, we propose to implement reasoning into the DBMS via a set of PL/SQL stored procedures. These procedures support all usual reasoning tasks: class subsumption, property subsumption, class satisfiability, ABox consistency, and ABox realization. They perform these tasks at update time and materialize all inferred knowledge (facts and axioms) in the database. In contrast to the query-time inferencing used in most existing work, our approach is designed to speed up ontology querying, which is supposed to represent the most frequent and therefore critical usage of ontologies. The paper discusses querying patterns and reports on benchmarking (with the LUBM benchmark) the performance of our prototype, called OntoMinD, compared to Oracle with Semantic Technologies. Benchmark results demonstrate the appropriateness of our approach.
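The idea of doing inference at update time and answering queries by lookup can be illustrated with a small in-memory sketch. This is not OntoMinD's PL/SQL code: the class `SubsumptionStore` and its method names are hypothetical, it covers only class subsumption, and it ignores satisfiability and ABox reasoning entirely.

```python
from collections import defaultdict


class SubsumptionStore:
    """Toy store that materializes inferred subclass facts on every update."""

    def __init__(self):
        # ancestors[c]: every asserted or inferred superclass of c.
        self.ancestors = defaultdict(set)
        # descendants[c]: every asserted or inferred subclass of c.
        self.descendants = defaultdict(set)

    def add_subclass(self, sub, sup):
        """Assert that sub is a subclass of sup and store all consequences."""
        lower = self.descendants[sub] | {sub}   # sub and everything below it
        upper = self.ancestors[sup] | {sup}     # sup and everything above it
        for lo in lower:
            for up in upper:
                if lo != up:
                    self.ancestors[lo].add(up)
                    self.descendants[up].add(lo)

    def is_subclass(self, sub, sup):
        """Query-time check: a set lookup, with no inference performed."""
        return sub == sup or sup in self.ancestors.get(sub, ())


store = SubsumptionStore()
store.add_subclass("GraduateStudent", "Student")
store.add_subclass("Student", "Person")
print(store.is_subclass("GraduateStudent", "Person"))
```

The cost of inference is paid once per update, so the trade-off favours workloads where queries are far more frequent than updates, which is the usage pattern the abstract assumes.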
Journal Article•10.1016/J.DATAK.2010.05.001•
Editorial: New fuzzy c-means clustering model based on the data weighted approach

[...]

Chenglong Tang1, Shigang Wang1, Wei Xu1•
Shanghai Jiao Tong University1
1 Sep 2010
TL;DR: It was pointed out that the data weighted clustering approach has its unique advantages when mining the outliers of the large scale data sets, when clustering the data set for better clustering results, and especially when these two tasks are done simultaneously.
Abstract: This paper proposes a new kind of data weighted fuzzy c-means clustering approach. Unlike most existing fuzzy clustering approaches, the data weighted clustering approach considers the internal connectivity of all data points. A vector of exponent impact factors and an influence exponent are introduced into the new model; together they influence the clustering process. The data weighted clustering can simultaneously produce three categories of parameters: fuzzy membership degrees, exponent impact factors, and the cluster prototypes. A new fuzzy algorithm, DWG-K, is developed by combining the data weighted approach and the G-K. Two groups of numerical experiments were executed. Group 1 demonstrates the clustering performance of the DWG-K against the G-K. The results show that the DWG-K can obtain better clustering quality while retaining the same level of computational efficiency as the G-K. Group 2 checks the ability of the DWG-K to mine outliers, with the well-known LOF as the counterpart. The results show that the DWG-K has a considerable advantage over the LOF in computational efficiency, and that the outliers mined by the DWG-K are global. The data weighted clustering approach thus has unique advantages when mining the outliers of large scale data sets, when clustering the data set for better clustering results, and especially when these two tasks are done simultaneously.
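For readers unfamiliar with the baseline, the membership update of standard fuzzy c-means can be sketched as follows. This is the textbook update only, not the data weighted model or DWG-K described above; the function name `fcm_memberships` and the fuzzifier default `m=2.0` are choices made for this illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the membership of point i in cluster j (standard FCM)."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers:
            # share its membership equally among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        row = [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]
        memberships.append(row)
    return memberships


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
centers = [(0.0, 0.0), (2.0, 0.0)]
for row in fcm_memberships(points, centers):
    print(row)
```

Each row sums to one, so a point midway between two centers belongs equally to both. The paper's approach adds per-point weighting on top of a scheme of this kind.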
Journal Article•10.1016/J.DATAK.2010.02.002•
An approach to testing conceptual schemas

[...]

Albert Tort1, Antoni Olivé1•
Polytechnic University of Catalonia1
1 Jun 2010
TL;DR: This work presents CSTL, a language for writing automated tests of executable schemas written in UML/OCL, and describes a prototype implementation of a test processor that includes a test manager and a test interpreter that coordinates the execution of the tests.
Abstract: Conceptual schemas of information systems can be tested. The testing of conceptual schemas may be an important and practical means for their validation. We present a list of five kinds of tests that can be applied to conceptual schemas. Two of them require schemas comprising both the structural and the behavioral parts, but we show that it is possible and useful to test incomplete schema fragments, even if they consist of only a few entity and relationship types, integrity constraints and derivation rules. We present CSTL, a language for writing automated tests of executable schemas written in UML/OCL. CSTL includes language primitives for each of the above kinds of tests. CSTL follows the style of the modern xUnit testing frameworks. We describe a prototype implementation of a test processor, which includes a test manager and a test interpreter that coordinates the execution of the tests. Tests written in CSTL can be executed as many times as needed.
Journal Article•10.1016/J.DATAK.2010.10.003•
Insights into enterprise conceptual modeling

[...]

Ateret Anaby-Tavor1, David Amid1, Amit Fisher1, Avivit Bercovici1, Harold Ossher2, Matthew Callery2, Michael Desmond2, Sophia Krasikov2, Ian Simmonds2 •
University of Haifa1, IBM2
1 Dec 2010
TL;DR: An empirical study about the nature of methods, diagrams, and home grown conceptual models as reflected in real practice at IBM, identifying the models as artifacts of "enterprise conceptual modeling".
Abstract: Business analysts, business architects, and solution consultants use a variety of practices and methods in their quest to understand business. The resulting work products often end up being transitioned into the formal world of software requirement definitions or as recommendations for all kinds of business activities. We describe an empirical study about the nature of these methods, diagrams, and home grown conceptual models as reflected in real practice at IBM. We identify the models as artifacts of "enterprise conceptual modeling". We study important features of these models, suggest practical classifications and characterizations, and distinguish them from drawings. Specifically we look into context, type, methods and complexity to determine enterprise conceptual models usage. Our survey shows that the "enterprise conceptual modeling" arena presents a variety of descriptive models, each used by a relatively small group of colleagues. Together they form a spectrum that extends from "drawings" on one end to "standards" on the other.
Journal Article•10.1016/J.DATAK.2010.01.002•
Towards automatization of domain modeling

[...]

Iris Reinhartz-Berger1•
University of Haifa1
1 May 2010
TL;DR: Running SDM on small repositories of project management applications and scheduling systems, it is found that the approach may provide reasonable draft domain models, whose comprehensibility, correctness, completeness, and consistency levels are satisfactory.
Abstract: A domain model, which captures the common knowledge and the possible variability allowed among applications in a domain, may assist in the creation of other valid applications in that domain. However, to create such domain models is not a trivial task: it requires expertise in the domain, reaching a very high level of abstraction, and providing flexible, yet formal, artifacts. In this paper an approach, called Semi-automated Domain Modeling (SDM), to create draft domain models from applications in those domains, is presented. SDM takes a repository of application models in a domain and matches, merges, and generalizes them into sound draft domain models that include the commonality and variability allowed in these domains. The similarity of the different elements is measured, with consideration of syntactic, semantic, and structural aspects. Unlike ontology and schema integration, these models capture both structural and behavioral aspects of the domain. Running SDM on small repositories of project management applications and scheduling systems, we found that the approach may provide reasonable draft domain models, whose comprehensibility, correctness, completeness, and consistency levels are satisfactory.
Journal Article•10.1016/J.DATAK.2010.10.004•
Schema label normalization for improving schema matching

[...]

Serena Sorrentino1, Sonia Bergamaschi1, Maciej Gawinecki1, Laura Po1•
University of Modena and Reggio Emilia1
1 Dec 2010
TL;DR: The method semi-automatically expands abbreviations/acronyms and annotates compound nouns, with minimal manual effort and empirically proves that the normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching results.
Abstract: Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and in structure. Starting from the "hidden meaning" associated with schema labels (i.e. class/attribute names) it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a "meaning" to schema labels. However, the performance of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns, abbreviations, and acronyms. We address this problem by proposing a method to perform schema label normalization which increases the number of comparable labels. The method semi-automatically expands abbreviations/acronyms and annotates compound nouns, with minimal manual effort. We empirically prove that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching results.
Journal Article•10.1016/J.DATAK.2010.07.004•
Editorial: Detection of semantic conflicts in ontology and rule-based information systems

[...]

Jose M. Alcaraz Calero1, Juan M. Marín Pérez1, Jorge Bernal Bernabe1, Félix J. García Clemente1, Gregorio Martínez Pérez1, Antonio F. Skarmeta1 •
University of Murcia1
1 Nov 2010
TL;DR: A taxonomy of semantic conflicts is provided, the main features of each of them are analyzed and an OWL/SWRL modelling for certain realistic scenarios related to information systems is provided.
Abstract: Nowadays, managers of information systems use ontologies and rules as a powerful tool to express the desired behaviour for the system. However, the use of rules may lead to conflicting situations where the antecedents of two or more rules are fulfilled, but their consequents indicate contradictory facts or actions. These conflicts can be categorised into two groups, modality and semantic conflicts, depending on whether the inconsistency is owing to the rule language expressiveness or due to the nature of the actions. While there exist certain proposals to detect and solve modality conflicts, the problem becomes more complex with semantic ones. Additionally, current techniques to detect semantic conflicts usually do not consider the use of standard information models. This paper provides a taxonomy of semantic conflicts, analyses the main features of each of them and provides an OWL/SWRL modelling for certain realistic scenarios related to information systems. It also describes different conflict detection techniques that can be applied to semantic conflicts and their pros and cons. Finally, this paper provides a comparison of these techniques based on performance measurements taken in a realistic scenario and suggests a better approach. This approach is then used in other scenarios related to information systems where different types of semantic conflicts may appear.
Journal Article•10.1016/J.DATAK.2010.01.001•
Towards an accurate functional size measurement procedure for conceptual models in an MDA environment

[...]

Beatriz Marín1, Oscar Pastor1, Alain Abran2•
Polytechnic University of Puerto Rico1, Université du Québec2
1 May 2010
TL;DR: This paper introduces the OO-Method COSMIC Function Points (OOmCFP) procedure, which has been systematically designed to measure the functional size of object-oriented applications generated from their conceptual models by means of model transformations.
Abstract: The accurate measurement of the functional size of applications that are automatically generated in MDA environments is a challenge for the software development industry. This paper introduces the OO-Method COSMIC Function Points (OOmCFP) procedure, which has been systematically designed to measure the functional size of object-oriented applications generated from their conceptual models by means of model transformations. The OOmCFP procedure is structured in three phases: a strategy phase, a mapping phase, and a measurement phase. Finally, a case study is presented to illustrate the use of OOmCFP, as well as an analysis of the results obtained.
Book•10.1007/978-3-319-59569-6•
Natural Language Processing and Information Systems

[...]

Gosse Bouma1, Ashwin Ittoo2, Elisabeth Métais, Hans Wortmann1•
University of Groningen1, University of Liège2
1 Jan 2010
TL;DR: The refereed proceedings of the 19th International Conference on Applications of Natural Language to Information Systems, NLDB 2014, held in Montpellier, France, in June 2014 are presented in this paper.
Abstract: This book constitutes the refereed proceedings of the 19th International Conference on Applications of Natural Language to Information Systems, NLDB 2014, held in Montpellier, France, in June 2014. The 13 long papers, 8 short papers, 14 poster papers, and 7 demo papers presented together with 2 invited talks in this volume were carefully reviewed and selected from 73 submissions. The papers cover the following topics: syntactic, lexical and semantic analysis; information extraction; information retrieval; and sentiment analysis and social networks.
Journal Article•10.1016/J.DATAK.2009.10.006•
Combining ontological profiles with context in information retrieval

[...]

Geir Solskinnsbakk1, Jon Atle Gulla1•
Norwegian University of Science and Technology1
1 Mar 2010
TL;DR: The concept of an ontological profile is described, which is a semantic extension of an ontology where each ontology concept is given a description in terms of a vector of weighted keywords.
Abstract: An ontology is a formal conceptualization of a domain, specifying the concepts of the domain and the relations between them. It is however not a straightforward task to use this knowledge for information retrieval purposes. In this paper we describe the concept of an ontological profile, which is a semantic extension of an ontology where each ontology concept is given a description in terms of a vector of weighted keywords. An experiment has been conducted with a prototype search engine using ontological profiles for query expansion. The evaluation shows encouraging results compared to standard keyword based search. Furthermore, we describe the notion of context in an information retrieval setting and address how we can combine semantics and context in search based on query expansion.
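A minimal sketch of how a profile of weighted keywords might drive query expansion is given below. The abstract does not specify the matching or weighting rules, so everything here is an assumption made for illustration: the function `expand_query`, the exact-match rule between query terms and concept names, the `top_k` cut-off, and the sample profile data.

```python
def expand_query(query_terms, profiles, top_k=2):
    """Expand a query with the top-weighted keywords of the concepts it names."""
    # Original query terms keep full weight.
    expanded = {term: 1.0 for term in query_terms}
    for concept, keywords in profiles.items():
        if concept not in query_terms:
            continue
        # Highest weight first; ties broken alphabetically for determinism.
        best = sorted(keywords.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        for keyword, weight in best:
            # Keep the larger weight if a term is reached more than once.
            expanded[keyword] = max(expanded.get(keyword, 0.0), weight)
    return expanded


# Hypothetical profiles: concept name -> vector of weighted keywords.
profiles = {
    "pipeline": {"pipe": 0.9, "offshore": 0.6, "valve": 0.4},
    "pump": {"pressure": 0.8, "flow": 0.7},
}
print(expand_query(["pipeline", "corrosion"], profiles))
```

The expanded query is itself a weighted term vector, so it can be passed to any keyword search engine that accepts term weights.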
Journal Article•10.1016/J.DATAK.2010.03.004•
Editorial: An efficient index buffer management scheme for implementing a B-tree on NAND flash memory

[...]

Hyun Seob Lee1, Dong-Ho Lee1•
Hanyang University1
1 Sep 2010
TL;DR: An efficient index buffer management scheme, called IBSF, is proposed, which eliminates redundant index units in the index buffer and then delays the time that the index buffer requires to become full, which significantly reduces the number of write operations to a flash memory when constructing a B-tree.
Abstract: Recently, NAND flash memory has been one of the best storage mediums for various embedded systems such as MP3 players, mobile phones and laptops because of its shock resistance, low power consumption, and non-volatility. However, since it has very distinct characteristics including erase-before-write and asymmetric read/write speed, the performance of disk based systems and applications may degrade dramatically when directly adopting them on the flash memory storage systems. Especially when a B-tree is constructed on NAND flash memory, intensive overwrite operations may be caused by record inserting, deleting, and reorganizing. These may result in severe performance degradation when building the B-tree. In this paper, we propose an efficient index buffer management scheme, called IBSF, which eliminates redundant index units in the index buffer and then delays the time that the index buffer requires to become full. Consequently, IBSF significantly reduces the number of write operations to a flash memory when constructing a B-tree. We also show that IBSF yields a better performance on a flash memory by comparing it to the related technique through various experiments.
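The effect of removing redundant index units can be seen in a simplified sketch. This is not the IBSF algorithm, whose details the abstract does not give. The class `IndexBuffer` is hypothetical, and it assumes that a later index unit for the same key simply supersedes the earlier one, and that a flush writes the whole buffer to flash in one batch.

```python
class IndexBuffer:
    """Toy write buffer that keeps one pending index unit per key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.units = {}      # key -> latest pending (operation, value)
        self.flushes = 0     # number of batched writes sent to flash

    def add(self, key, op, value=None):
        """Buffer an index unit, flushing first only if a new slot is needed."""
        if key not in self.units and len(self.units) >= self.capacity:
            self.flush()
        # A unit for an existing key overwrites the old one in place,
        # so repeated changes to one key never consume extra slots.
        self.units[key] = (op, value)

    def flush(self):
        """Write all pending units to flash as one batch and empty the buffer."""
        if self.units:
            self.flushes += 1
            self.units.clear()


buf = IndexBuffer(capacity=2)
buf.add(10, "insert", "a")
buf.add(10, "update", "b")
buf.add(10, "delete")
print(len(buf.units), buf.flushes)
```

Three operations on key 10 occupy a single slot, so the buffer fills more slowly and fewer flushes reach the flash device.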
Journal Article•10.1016/J.DATAK.2009.10.010•
Structure of morphologically expanded queries: A genetic algorithm approach

[...]

Lourdes Araujo1, Hugo Zaragoza2, José R. Pérez-Agüera3, Joaquín Pérez-Iglesias1•
National University of Distance Education1, Yahoo!2, Complutense University of Madrid3
1 Mar 2010
TL;DR: A novel and simple method (query clauses) is proposed to represent expanded queries which may alleviate some of the negative effects of term correlation in query expansion algorithms and this method is applied to improve stemming.
Abstract: In this paper we deal with two issues. First, we discuss the negative effects of term correlation in query expansion algorithms, and we propose a novel and simple method (query clauses) to represent expanded queries which may alleviate some of these negative effects. Second, we discuss a method to optimize local query-expansion methods using genetic algorithms, and we apply this method to improve stemming. We evaluate this method with the novel query representation method and show very significant improvements for the problem of stemming optimization.
