Top 31 papers presented at Data and Knowledge Engineering in 2016

Journal Article•10.1016/J.DATAK.2016.01.001•

A process ontology based approach to easing semantic ambiguity in business process modeling

Shaokun Fan¹, Zhimin Hua², Veda C. Storey³, J. Leon Zhao²•Institutions (3)

West Texas A&M University¹, City University of Hong Kong², J. Mack Robinson College of Business³

1 Mar 2016

TL;DR: A Process Ontology Based Approach is proposed to ease semantic ambiguity by providing a means to capture rich, semantic information on complex business processes through domain specific ontologies.

Abstract: Business process modeling continues to increase in complexity, due, in part, to the dynamic business contexts and complicated domain concepts found in today's global economic environment. Although business process modeling is a critical step in workflow automation that powers business around the world, business process modelers often misunderstand domain concepts or relationships due to their lack of precise domain knowledge. Such semantic ambiguity affects the efficiency and quality of business process modeling. To address this problem, a Process Ontology Based Approach is proposed to ease semantic ambiguity by providing a means to capture rich, semantic information on complex business processes through domain specific ontologies. This approach is grounded in the Bunge-Shanks Framework to semantic disambiguation and evaluated using an expert survey as well as a controlled laboratory experiment.

45 citations

Journal Article•10.1016/J.DATAK.2015.09.004•

Empirical insights into the development of a service-oriented enterprise architecture

Ayed Alwadain¹, Erwin Fielt², Axel Korthaus³, Michael Rosemann²•Institutions (3)

King Saud University¹, Queensland University of Technology², Victoria University, Australia³

1 Sep 2016

TL;DR: In this article, a case study with a government agency provides new empirically and theoretically grounded insights into EA evolution, in particular in relation to the introduction of Service-Oriented Architecture (SOA), and describes relevant generative mechanisms affecting EA evolution.

Abstract: Organisations use Enterprise Architecture (EA) to reduce organisational complexity, improve communication, align business and information technology (IT), and drive organisational change. Due to the dynamic nature of environmental and organisational factors, EA descriptions need to change over time to keep providing value for its stakeholders. Emerging business and IT trends, such as Service-Oriented Architecture (SOA), may impact EA frameworks, methodologies, governance and tools. However, the phenomenon of EA evolution is still poorly understood. Using Archer's morphogenetic theory as a foundation, this research conceptualises three analytical phases of EA evolution in organisations, namely conditioning, interaction and elaboration. Based on a case study with a government agency, this paper provides new empirically and theoretically grounded insights into EA evolution, in particular in relation to the introduction of SOA, and describes relevant generative mechanisms affecting EA evolution. By doing so, it builds a foundation to further examine the impact of other IT trends such as mobile or cloud-based solutions on EA evolution. At a practical level, the research delivers a model that can be used to guide professionals to manage EA and continually evolve it.

43 citations

Journal Article•10.1016/J.DATAK.2015.05.001•

Parallel community detection on large graphs with MapReduce and GraphChi

Seunghyeon Moon¹, Jae-Gil Lee¹, Minseo Kang¹, Minsoo Choy¹, Jinwoo Lee¹•Institutions (1)

KAIST¹

1 Jul 2016

TL;DR: Two parallel versions of the Girvan-Newman algorithm are developed to support large-scale networks, one of which utilizes the MapReduce model and the other utilizes the vertex-centric model.

Abstract: Community detection from social network data gains much attention from academia and industry since it has many real-world applications. The Girvan-Newman (GN) algorithm is a divisive hierarchical clustering algorithm for community detection, which is regarded as one of the most popular algorithms. It exploits the concept of edge betweenness to divide a network into multiple communities. Though it is being widely used, it has limitations in supporting large-scale networks since it needs to calculate the shortest path between every pair of vertices in a network. In this paper, we develop two parallel versions of the GN algorithm to support large-scale networks. First, we propose a new algorithm, which we call Shortest Path Betweenness MapReduce Algorithm (SPB-MRA), that utilizes the MapReduce model. Second, we propose another new algorithm, which we call Shortest Path Betweenness Vertex-Centric Algorithm (SPB-VCA), that utilizes the vertex-centric model. An approximation technique is also developed to further speed up community detection processes. We implemented SPB-MRA using Hadoop and SPB-VCA using GraphChi, and then evaluated the performance of SPB-MRA on Amazon EC2 instances and that of SPB-VCA on a single commodity PC. The evaluation results showed that the elapsed time of SPB-MRA decreased almost linearly as the number of reducers increased, SPB-VCA outperformed SPB-MRA just on a single PC by 4-6 times, and the approximation technique introduced negligible errors.
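
To make the edge-betweenness idea in this abstract concrete, here is a minimal serial sketch of one Girvan-Newman split in Python. It is not SPB-MRA or SPB-VCA and involves no MapReduce or GraphChi; it only illustrates the shortest-path betweenness computation that those algorithms parallelize. The function names (`edge_betweenness`, `components`, `girvan_newman_split`) and the adjacency-dictionary input are assumptions made for illustration, and the graph is assumed to be unweighted, undirected and free of self-loops.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness (Brandes-style) for an unweighted, undirected graph."""
    score = defaultdict(float)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies, farthest vertices first.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered vertex pair was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adj):
    """Connected components of a graph given as {vertex: set_of_neighbours}."""
    seen, result = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        result.append(comp)
    return result


def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph gains a component."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        bet = edge_betweenness(adj)
        if not bet:  # no edges left to remove
            break
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)
```

Betweenness is recomputed after every edge removal, which is the cost the abstract identifies as the obstacle to handling large networks.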

38 citations

Journal Article•10.1016/J.DATAK.2015.05.002•

Hilbert curve-based cryptographic transformation scheme for spatial query processing on outsourced private data

Hyeong-Il Kim¹, Seung-Tae Hong¹, Jae-Woo Chang¹•Institutions (1)

Chonbuk National University¹

1 Jul 2016

TL;DR: This work proposes a Hilbert curve-based cryptographic transformation scheme to preserve the privacy of the spatial data from various attacks on outsourced databases and achieves better query processing performance than the existing cryptographic transformation schemes.

Abstract: Research on preserving location data privacy in outsourced databases has been spotlighted with the development of cloud computing. However, the existing spatial transformation schemes are vulnerable to various attack models. The existing cryptographic transformation scheme provides good data privacy, but it has a high query processing cost. To improve privacy and reduce cost, we propose a Hilbert curve-based cryptographic transformation scheme to preserve the privacy of the spatial data from various attacks on outsourced databases. We also provide efficient range and k-NN query processing algorithms using a Hilbert-order index. A performance analysis confirms that the proposed scheme is robust against attack models and achieves better query processing performance than the existing cryptographic transformation scheme.
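
As background for the Hilbert-order index mentioned in this abstract, the following Python sketch maps a grid cell to its position along a Hilbert curve using the widely known bit-manipulation formulation. It does not reproduce the paper's cryptographic transformation or its range and k-NN algorithms; the function name `hilbert_index` and the assumption of an n-by-n grid with n a power of two are illustrative choices.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve filling an n-by-n grid.

    Assumes n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Quadrant order along the curve: (0,0) -> (0,1) -> (1,1) -> (1,0).
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Usage: order a few points by Hilbert position so that nearby cells tend to
# receive nearby one-dimensional keys.
points = [(3, 0), (0, 0), (1, 1), (0, 3)]
ordered = sorted(points, key=lambda p: hilbert_index(4, p[0], p[1]))
```

Cells that are consecutive along the curve are always adjacent in the grid, which is why a one-dimensional index over these keys can serve spatial range queries reasonably well.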

33 citations

Journal Article•10.1016/J.DATAK.2015.10.002•

Inter-enterprise architecture as a tool to empower decision-making in hierarchical collaborative production planning

Alix Vargas¹, Andrés Boza², Shushma Patel³, Dilip Patel³, Llanos Cuenca², Angel Ortiz²•Institutions (3)

University of Westminster¹, Polytechnic University of Valencia², London South Bank University³

1 Sep 2016

TL;DR: A conceptual model that addresses the problem of unexpected events management in the context of hierarchical production planning to improve decision-making in collaborative environments is proposed using inter-enterprise architecture.

Abstract: The novel idea of inter-enterprise architecture from the enterprise engineering perspective allows collaborative networks to integrate and coordinate different organizations. Therefore, inter-enterprise architecture offers multiple benefits, including: joint process harmonization, business strategy and information technology alignment, technological cost reduction, risk and redundancies reduction, customer services improvement and enhanced responsiveness. Inter-enterprise architecture can be used to solve the different issues that collaborative networks face on a daily basis. A conceptual model that addresses the problem of unexpected events management in the context of hierarchical production planning to improve decision-making in collaborative environments is proposed using inter-enterprise architecture. The proposed conceptual model is composed of a framework, a modeling language and the methodology. The conceptual model has been applied to a Spanish collaborative network from the ceramic tile sector.

31 citations

Journal Article•10.1016/J.DATAK.2015.12.001•

System-of-systems support — A bigraph approach to interoperability and emergent behavior

Chris Stary¹, Dominik Wachholder¹•Institutions (1)

Johannes Kepler University of Linz¹

1 Sep 2016

TL;DR: This work demonstrates the utility of bigraph-based handling of SoS by orchestrating two distributed and independent systems, with orchestration enabling direct response to changes in a federated system's context.

Abstract: When designing highly interactive distributed systems such as e-learning environments, a system-of-systems (SoS) perspective enables dynamic adaptation to situations of use and thus user-centeredness during operation. Each system, e.g., a mobile device for accessing a learning management system, can still be operated as a separate system, e.g., displaying the latest feedback from peers, while being run as part of a federated system, e.g., synchronizing a learning group for a tutoring session taking into account individual availability of participants. This type of coupling requires interoperability assurance of systems, in particular federating various devices and cross-over features (e.g., linking learning content to posts on social media platforms) in dynamically evolving environments. We demonstrate the utility of bigraph-based handling of SoS. Relationships allow not only the representation of dynamic interaction but also the re-specification of these systems through behavior adaptations. This abstraction supports cross-system decomposition as well as composition of interaction patterns for the purpose of emergent behavior. We show the potential of this approach by orchestrating two distributed and independent systems, with orchestration enabling direct response to changes in a federated system's context.

30 citations

Journal Article•10.1016/J.DATAK.2016.02.004•

Digital factory system for dynamic manufacturing network supporting networked collaborative product development

David Tchoffa, Nicolas Figay¹, Parisa Ghodous², Ernesto Exposito³, Lyes Kermad, Thomas Vosgien, A. El Mhamedi•Institutions (3)

Airbus Group¹, University of Lyon², Institut national des sciences appliquées de Toulouse³

1 Sep 2016

TL;DR: A new approach is presented, which relies on the association of effective existing technologies coupled with research results in the fields of model-driven engineering (MDE), enterprise interoperability, system engineering and PLM, to produce a sustainable and agile collaborative infrastructure for the manufacturing digital ebusiness ecosystem.

Abstract: In recent years, important research investments have been made by Airbus Group Innovations for the establishment of sustainable Product and Process data interoperability based on open standards. Driven successively by concurrent engineering, collaborative product design in the virtual enterprise or digital behavioral aircraft, it was capitalized through the establishment of a federative interoperability framework. Driven by factory of the future-related research, the dynamic manufacturing network (DMN) concept enriched the framework, which aims at providing agile infrastructure for networked collaborative product development. For such networks, protocols based on open eBusiness product lifecycle management (PLM) standards are needed for the exchange and sharing of product and process data between the implied organizations, their processes and the technical enterprise applications supporting these processes. This paper presents a new way of combining model-based enterprise platform engineering, model-driven architecture, and system engineering in order to address the establishment of sustainable interoperability within DMN. Based on interoperability issues identified in the relevant literature, this paper describes the new approach, which relies on the association of effective existing technologies coupled with research results in the fields of model-driven engineering (MDE), enterprise interoperability, system engineering and PLM. The new approach relies on the concept of digital factory system (DFS) coupled with DMN in order to produce a sustainable and agile collaborative infrastructure for the manufacturing digital ebusiness ecosystem. The approach is then illustrated through a use case from the IMAGINE project, and an outline is provided on how it will be used and developed further for the assessment of PLM standards and their implementation in the Standard Interoperability PLM project at IRT-Systemx.

26 citations

Journal Article•10.1016/J.DATAK.2015.12.002•

Performance assessment architecture for collaborative business processes in BPM-SOA-based environment

Maroua Hachicha, Muhammad Fahad, Néjib Moalla, Yacine Ouzrout

1 Sep 2016

TL;DR: The main objectives are to track the execution of collaborative business process and to analyze the performance trajectory of a business process regarding the business performance level and to create an ontological model-based knowledge repository in order to enrich the semantics of an evaluation business process.

Abstract: To be competitive and flexible, companies engage in collaborations to develop and share their competences in order to cope with the dynamic environment. Collaborative business process evaluation helps to reflect the actual functioning of business processes and their performance level. In this perspective, research in assessing collaborative business process performance presents relevant guidelines in order to adapt IT solutions when business requirements evolve. In this paper, we present an analysis and assessment approach for collaborative business processes in the service-oriented architecture in order to maintain their performance in competitive markets. Our approach proposes an evaluation method using execution traces of business processes combined with a high-level assessment method using key performance indicators. Our main objectives are to track the execution of collaborative business processes and to analyze the performance trajectory of a business process regarding the business performance level. To collect and structure the performance knowledge (execution and measurement), we create an ontological model-based knowledge repository in order to enrich the semantics of an evaluation business process. The precise tracking of execution data in our approach makes it possible to identify events that disrupt the proper functioning of processes at runtime. From an industrial case study, we can conclude that our ontological approach can target the performance assessment of collaborative business processes effectively.

24 citations

Journal Article•10.1016/J.DATAK.2016.05.001•

Knowledge engineering for enterprise integration, interoperability and networking

Hervé Panetto¹, Lawrence E. Whitman²•Institutions (2)

Centre national de la recherche scientifique¹, University of Arkansas at Little Rock²

1 Sep 2016

TL;DR: Today, enterprises can be characterized by various key facets: globalization, distributed manufacturing, data and knowledge management, advanced automation and robotics, virtual engineering, rapid response to market and more.

Abstract: Today, enterprises can be characterized by various key facets: globalization, distributed manufacturing, data and knowledge management, advanced automation and robotics, virtual engineering, rapid response to market and more. In this competitive economy, enterprises must collaborate using Information Technology (IT) and other tools to succeed in this dynamic and heterogeneous business environment. Enterprise integration, interoperability and networking are some of the major disciplines that are enabling companies to improve collaboration and communication in the most effective way. Thriving enterprise information systems processes aim to develop information systems that respond to increasingly complex objectives, which align the information systems with business goals and processes of the company. Additionally, the enterprise must adapt and improve when facing new requirements or rapidly changing opportunities. As enterprise information systems models become more ubiquitous, the sharing of best-in-class models becomes more desirable. Interoperability between dissimilar systems in sharing information is important, but other aspects are also required in the sharing of enterprise systems knowledge. First, this process is based on the need for collaboration; sharing and mutual understanding of the needs of each stakeholder, i.e. each person involved or affected by the future information system, at each stage of its development. Second, this process follows principles, which highlight the need for formal semantic definition of these models, at various abstraction levels ranging from specification to implementation on site. There is a need to also couple new theoretical results with applied methods and tools supporting existing business reconfiguration and transformation both locally and globally.

20 citations

Journal Article•10.1016/J.DATAK.2015.12.003•

Detecting avoidance behaviors between moving object trajectories

Francesco Lettich¹, Luis Otavio Alvares², Vania Bogorny², Salvatore Orlando¹, Alessandra Raffaetà¹, Claudio Silvestri¹•Institutions (2)

Ca' Foscari University of Venice¹, Universidade Federal de Santa Catarina²

1 Mar 2016

TL;DR: The avoidance behavior between moving object trajectories is defined, providing a set of theoretical definitions to precisely describe various kinds of avoidance, and an effective algorithm for detecting avoidances is proposed.

Abstract: Several algorithms have been proposed in the last few years for mining different mobility patterns from trajectories, such as flocks, chasing, meeting, and convergence. An interesting behavior that has not been much explored in trajectory pattern mining is avoidance. In this paper we define the avoidance behavior between moving object trajectories, providing a set of theoretical definitions to precisely describe various kinds of avoidance, and propose an effective algorithm for detecting avoidances. The proposed method is quantitatively evaluated on a real-world dataset, and correctly detects with high precision the quasi totality of the trajectory pairs that exhibit avoidance behaviors (F-measure up to 95%).

14 citations

Journal Article•10.1016/J.DATAK.2015.08.001•

An agent-based model for analyzing the impact of business interoperability on the performance of cooperative industrial networks

Izunildo Cabral¹, Antonio Grilo¹, António Gonçalves-Coelho¹, António Mourão¹•Institutions (1)

Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa¹

1 Sep 2016

TL;DR: An agent-based model for analyzing the impact of business interoperability on the performance of industrial networks is proposed and how dyad organizational relationships affect the network of companies that the two companies in the dyads belong to is addressed.

Abstract: This paper presents an approach for analyzing the impact of business interoperability on the performance of cooperative industrial networks. The analysis of the impact is grounded on the agent-based simulation method. A theoretical agent-based model is proposed to simulate the manner in which companies interoperate in cooperative industrial networks and how the distance between the actual and the required level of business interoperability in different dyad relationships can affect the performance of these companies. To test the applicability of the proposed theoretical agent-based model, a case study regarding a dam construction project is presented. The objective of the case study is to analyze the impact of the introduction of a Radio Frequency Identification system and a cooperative information system platform, first on the business interoperability performance and then on the operational performance of the companies involved in the dam construction project. The application of the theoretical agent-based simulation model to this case study supports our assumption that agent-based simulation is indeed appropriate for achieving the objective set. Regarding the case study results, the main benefit of the introduction of the cooperative information systems platform is the reduction of the time needed to analyze the slump and compression test results, which can be reduced by up to 98%.
We propose an agent-based model for analyzing the impact of business interoperability on the performance of industrial networks. We address how dyad organizational relationships affect the network of companies that the two companies in the dyads belong to. This is the first time that network effect is addressed in the analysis of the impact of business interoperability. We demonstrate the applicability of the proposed agent-based model through a case study regarding a dam construction project. ABM is indeed appropriate for analyzing the impact of business interoperability on the performance of networked companies.

Journal Article•10.1016/J.DATAK.2015.10.003•

Converting unstructured into semi-structured process models

Rik Eshuis¹, Akhil Kumar²•Institutions (2)

Eindhoven University of Technology¹, Pennsylvania State University²

1 Jan 2016

TL;DR: An automated method is defined to convert an unstructured process model containing parallelism into an equivalent semi-structured process model, which contains blocks and synchronization links between parallel branches.

Abstract: Business process models capture process requirements that are typically expressed in unstructured, directed graphs that specify parallelism. However, modeling guidelines or requirements from execution engines may require that process models are structured in blocks. The goal of this paper is to define an automated method to convert an unstructured process model containing parallelism into an equivalent semi-structured process model, which contains blocks and synchronization links between parallel branches. We define the method by means of an algorithm that is based on dominators, a well-known technique from compiler theory for structuring sequential flow graphs. The method runs in polynomial time. We implemented and evaluated the algorithm extensively. In addition we compared the method in detail with the BPStruct method from literature. The comparison shows that our method can handle cases that BPStruct is not able to and that the method coincides with BPStruct for the cases that BPStruct is able to handle.
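
For readers unfamiliar with dominators, the compiler-theory technique this abstract builds on, here is a small Python sketch of the classic iterative dominator computation on a flow graph. It is not the paper's conversion algorithm and does not handle parallel gateways or synchronization links; the function name `dominators`, the successor-dictionary input, and the assumption that every node is reachable from the entry node are illustrative.

```python
def dominators(succ, entry):
    """Return {node: set of nodes that dominate it} for a flow graph.

    succ maps every node to an iterable of its successors; every node must
    appear as a key and is assumed to be reachable from entry.
    """
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            preds[t].add(n)

    # Start from the coarsest guess (everything dominates everything) and refine.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            # A node is dominated by itself plus whatever dominates all its predecessors.
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Usage: in a diamond a -> {b, c} -> d, neither branch dominates the join node.
flow = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
dom = dominators(flow, "a")
```

A node whose only dominators are itself and the split node, as with `d` here, marks where two branches rejoin, which is the kind of structural information a block-structuring method can exploit.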

Journal Article•10.1016/J.DATAK.2016.05.002•

An automatic method for reporting the quality of thesauri

Javier Lacasta¹, Gilles Falquet², F. Javier Zarazaga-Soria¹, Javier Nogueras-Iso¹•Institutions (2)

University of Zaragoza¹, University of Geneva²

1 Jul 2016

TL;DR: A process that automatically analyzes the thesaurus properties and relations with respect to ISO 25964 specification, and suggests the correction of potential problems is described.

Abstract: Thesauri are knowledge models commonly used for information classification and retrieval whose structure is defined by standards such as ISO 25964. However, when creators do not correctly follow the specifications, they construct models with inadequate concepts or relations that provide limited usability. This paper describes a process that automatically analyzes the thesaurus properties and relations with respect to the ISO 25964 specification, and suggests the correction of potential problems. It performs a lexical and syntactic analysis of the concept labels, and structural and semantic analyses of the relations. The process has been tested with the Urbamet and Gemet thesauri and the results have been analyzed to determine how well the proposed process works.

Journal Article•10.1016/J.DATAK.2016.06.003•

Question answering in conversations

Maryam Habibi, Parvaz Mahdabi¹, Andrei Popescu-Belis¹•Institutions (1)

Idiap Research Institute¹

1 Nov 2016

TL;DR: A query refinement method applied to questions asked by users to a system during a meeting or a conversation that they have with other users, which leverages the local context of the conversation along with semantic resources, either WordNet or word embeddings from word2vec.

Abstract: This paper introduces a query refinement method applied to questions asked by users to a system during a meeting or a conversation that they have with other users. To answer the questions, the proposed method leverages the local context of the conversation along with semantic resources, either WordNet or word embeddings from word2vec. The method first represents the local context by extracting keywords from the transcript of the conversation, which is obtained from a real-time Automatic Speech Recognition (ASR) system and may contain noise. It then expands the queries with keywords that best represent the topic of the query, i.e. expansion keywords accompanied by weights indicating their topical similarity to the query. Finally, semantically related terms are added, using two options: either synonymous terms drawn from WordNet or similar words based on distributed representations in a low-dimensional word embedding space learned using word2vec. To evaluate the system, we introduce a dataset (named AREX for AMI Requests for Explanations) and an evaluation metric based on relevance judgments collected by crowdsourcing. We compare our query expansion approach with other methods, over queries from the AREX dataset, showing the superiority of our method when either manual or automatic transcripts of the AMI Meeting Corpus are used.
This paper introduces a query refinement method applied to questions asked by users during a meeting. To answer the questions, our method leverages the local context with external semantic resources, either WordNet or word embeddings. The method first represents the local context by extracting keywords from the transcript of the conversation. The proposed method then expands the queries with keywords that best represent the topic of the query. We compare our query expansion approach, showing its superiority when manual or automatic transcripts are used.

Journal Article•10.1016/J.DATAK.2016.09.001•

Reliability analysis of psoriasis decision support system in principal component analysis framework

Vimal K. Shrivastava¹, Narendra D. Londhe¹, Rajendra S. Sonawane, Jasjit S. Suri²•Institutions (2)

National Institute of Technology, Raipur¹, Idaho State University²

1 Nov 2016

TL;DR: A dermatology decision support system used for the classification of psoriasis images into diseased and healthy skin is presented and shows encouraging results, with high accuracy, reliability, stability and retaining power of dominant features.

Abstract: Reliability and accuracy are essential components in any decision support system. These become even more important with a rising number of features during the classification process in a machine learning paradigm. Further, the selection of an optimal feature set is of paramount importance for building high-performing, reliable and stable decision support systems. This paper presents a dermatology decision support system used for the classification of psoriasis images into diseased and healthy skin. A comprehensive grayscale and color feature space with 87 features is explored. The classification system consists of a machine learning paradigm embedded with principal component analysis-based optimal feature selection. The system consists of both offline training classifier and online testing classifier phases. The training parameters are estimated using a unique feature space and ground truth, derived a priori by the dermatologist. The training phase generates the offline coefficients using a training classifier, which are then used for transforming the online test features for prediction of two skin classes: diseased vs. healthy. The proposed system using principal component analysis shows the best classification accuracy of 99.39% for a 10-fold cross-validation using a polynomial kernel of order 2 on a set of 540 images. We validate our system by computing the reliability and stability indices. The results demonstrate a mean reliability index of 98.71% for 11 distinct data sizes, and the system meets the stability criteria within 2% tolerance. The ability to retain the dominant features as an increasing set of features is included is 90.52%. Thus, the proposed system shows encouraging results, with high accuracy, reliability, stability and retaining power of dominant features.

Journal Article•10.1016/J.DATAK.2016.02.001•

An efficient top-k query processing framework in mobile sensor networks

Heejung Yang¹, Chin-Wan Chung¹, Myoung Ho Kim¹•Institutions (1)

KAIST¹

1 Mar 2016

TL;DR: This paper develops a filter-based data collection method which can reduce energy consumption and provide more accurate query results, and devises a data compression method for disconnected sensor nodes to deal with the limited memory space of sensor nodes.

Abstract: Mobile sensor networks consist of a number of sensor nodes which are capable of sensing, processing, communicating and moving. These mobile sensor nodes move around and explore their surrounding areas. Top-k queries are useful in many mobile sensor network applications. However, the mobility of sensor nodes introduces new challenges in addition to the problems of static sensor networks (i.e., resource constraints). Since mobile sensor nodes tend to move continuously, the network condition changes frequently and they consume considerably more energy than static sensor nodes. In this paper, we propose an efficient top-k query processing framework in a mobile sensor network environment called mSensor. To construct an efficient routing topology, we devise a mobility-aware routing method. Using the semantics of the top-k query, we develop a filter-based data collection method which can reduce energy consumption and provide more accurate query results. We also devise a data compression method for disconnected sensor nodes to deal with the limited memory space of sensor nodes. The performance of our proposed approach is extensively evaluated using synthetic data sets and real data sets. The results show the effectiveness of our approach.

Journal Article•10.1016/J.DATAK.2015.11.004•

Scheduling ontology development projects

Mari Carmen Suárez-Figueroa¹, Asunción Gómez-Pérez¹, Mariano Fernández-López²•Institutions (2)

Technical University of Madrid¹, CEU San Pablo University²

1 Mar 2016

TL;DR: This paper presents a method and a tool for systematizing the scheduling of ontology development projects in the context of the NeOn Methodology and explains the methodological pillars on which the method and tool are grounded.

Abstract: In the ontology engineering field, key aspects of real-world business contexts are not normally taken into account. One of these crucial aspects is planning and scheduling. Software engineering practitioners use different approaches and tools for planning and scheduling software development projects, whereas their counterparts in ontology engineering encounter many problems when creating project plans and schedules. To bridge this gap, we have created a method and a tool (the latter called gOntt) for systematizing the scheduling of ontology development projects in the context of the NeOn Methodology. In this paper we explain the methodological pillars on which the method and tool are grounded.

Journal Article•10.1016/J.DATAK.2015.05.003•

A continuous reverse skyline query processing method in moving objects environments

Jongtae Lim¹, He Li², Kyoungsoo Bok¹, Jaesoo Yoo¹•Institutions (2)

Chungbuk National University¹, Xidian University²

1 Jul 2016

TL;DR: This paper proposes a new reverse skyline query processing method that efficiently processes queries over moving objects and compares it with the previous reverse skyline query processing method in various environments to show the superiority of the proposed method.

Abstract: Many studies on reverse skyline query processing have been done for various services. The existing reverse skyline query processing methods are based on dynamic skylines. There are no reverse skyline query processing algorithms based on metric spaces for location-based services. The existing methods for processing a reverse skyline query are limited in the service domains they support and incur high computation costs when providing various location-based services. In this paper, we propose a new reverse skyline query processing method that efficiently processes a query over moving objects. In addition, the proposed method processes continuous reverse skyline queries efficiently. To show the superiority of the proposed method, we compare it with the previous reverse skyline query processing method in various environments. The proposed method achieves better performance than the existing method.
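
For readers unfamiliar with the term, the classical reverse skyline based on dynamic skylines (the baseline notion the abstract refers to, not the metric-space method proposed in the paper) can be sketched by brute force. The function names and the sample points are hypothetical.

```python
def dynamically_dominates(a, b, origin):
    """True if point a dynamically dominates point b with respect to origin.

    a is at least as close to origin as b in every dimension and strictly
    closer in at least one dimension.
    """
    da = [abs(x - o) for x, o in zip(a, origin)]
    db = [abs(x - o) for x, o in zip(b, origin)]
    return all(x <= y for x, y in zip(da, db)) and any(x < y for x, y in zip(da, db))


def reverse_skyline(points, q):
    """Return every point p for which q belongs to the dynamic skyline of p."""
    result = []
    for i, p in enumerate(points):
        others = points[:i] + points[i + 1:]
        # q is in p's dynamic skyline if no other point dominates q w.r.t. p.
        if not any(dynamically_dominates(o, q, p) for o in others):
            result.append(p)
    return result


points = [(1, 1), (2, 2), (8, 8)]
print(reverse_skyline(points, q=(3, 3)))  # [(2, 2), (8, 8)]
```

This quadratic scan only makes the definition concrete; practical methods avoid comparing every pair of points.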

Journal Article•10.1016/J.DATAK.2016.06.002•

Pragmatic question answering: A game-theoretic approach

Jon Stevens, Anton Benz, Sebastian Reuße¹, Ralf Klabunde¹•Institutions (1)

Ruhr University Bochum¹

1 Nov 2016

TL;DR: Quantitative evaluations of a content selection scheme for answer generation in sales dialogue, based on an interactive game-theoretic model of the dialogue scenario, show that the indirect answers it produces are pragmatically natural and contribute to dialogue efficiency.

Abstract: We present results of quantitative evaluations of a content selection scheme for answer generation in sales dialogue which is based on an interactive game-theoretic model of the dialogue scenario. The model involves representing a probability distribution over possible customer requirements, i.e., needs that must be met before a customer will agree to buy an object. Through game-theoretic analysis we derive a content selection procedure which constitutes an optimal strategy in the dialogue game. This procedure is capable of producing pragmatically appropriate indirect answers to yes/no questions, and is implemented in an online question answering system. Evaluation results show that these answers are pragmatically natural and contribute to dialogue efficiency. The model allows for systems that learn probabilities of customer requirements, both online and from previous data.

Journal Article•10.1016/J.DATAK.2015.10.001•

Enterprise integration solution for power supply company based on GeoNis interoperability framework

Leonid Stoimenov¹, Nikola Davidovic¹, Aleksandar Stanimirović¹, Miloš Bogdanović¹, Dalibor Nikolic•Institutions (1)

University of Niš¹

1 Sep 2016

TL;DR: Enterprise integration of PD Jugoistok information systems relies on the GeoNis integration platform which uses translators/wrappers in order to map each existing data model into the common model partially based on the CIM defined in IEC 61968 series of standards.

Abstract: For years, electric power utility companies have relied on their massive infrastructure, worth billions, in the form of the electric power network. Control and maintenance of the network have usually been carried out through a set of isolated systems, each in charge of its own subset of data. Recent market changes, imposed by deregulation and market opening, have pushed electric power utility companies into major changes in their ICT infrastructure. The dynamism that the upcoming Smart Grid concept will introduce into the network requires better collaboration and information sharing among different systems within such companies. PD Jugoistok, the electric power utility company of south-eastern Serbia, is no exception to this trend. On the contrary, PD Jugoistok has taken the first steps towards the Smart Grid by implementing enterprise integration of its information systems. The introduced enterprise integration relies on the GeoNis integration platform, which uses translators/wrappers to map each existing data model into a common model partially based on the CIM defined in the IEC 61968 series of standards. The solution implemented within PD Jugoistok uses a WebGIS Portal as a central point of access to the integrated information for all possible users within the company or among external partners and regulatory bodies. The use of the WebGIS Portal is based on the unique ability of GIS to provide information in a spatial context, thus properly abstracting the geographically dispersed nature of the electric power network.

Journal Article•10.1016/J.DATAK.2016.08.001•

A similarity-based framework for service repository integration

Fedelucio Narducci¹, Marco Comerio², Carlo Batini², Marco Castelli•Institutions (2)

University of Bari¹, University of Milano-Bicocca²

1 Nov 2016

TL;DR: This paper provides a conceptual model for describing services and semantic relationships among them and defines a multi-level similarity function that is able to discover similarities between services belonging to different repositories, and to suggest candidate relationships among services.

Abstract: Nowadays, repositories of services are becoming increasingly useful in the management of many public and private service provider organizations. In order to make a repository an integrated representation of all services delivered in an organization, a unified representation is desirable. Since several repositories of services, each potentially characterized by heterogeneous and conflicting representations, may coexist in the same organization or in cooperating organizations, the need for service repository integration techniques is emerging. In this paper, we investigate the problem of integrating heterogeneous service repositories. We first provide a conceptual model for describing services and semantic relationships among them. Then, we define a multi-level similarity function that is able to discover similarities between services belonging to different repositories, and to suggest candidate relationships among services. The proposed function combines a simple keyword-based matching with a more complex semantic matching that exploits the Explicit Semantic Analysis technique for generating a representation of services based on Wikipedia concepts. These combined techniques are implemented in the SCAn (Service Correspondence Analyzer) framework that supports the human expert during the repository integration process. The framework has been evaluated in a real-life scenario and the results demonstrate the effectiveness of the proposed approach.
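
A rough sketch of how a keyword-based score and a concept-based score might be combined is given below. It is not the SCAn similarity function: the Jaccard and cosine measures, the weight `alpha`, and the hand-written concept vectors (standing in for Explicit Semantic Analysis output) are all assumptions for illustration.

```python
import math


def keyword_similarity(text_a, text_b):
    """Jaccard overlap of the lower-cased word sets of two service descriptions."""
    tokens_a, tokens_b = set(text_a.lower().split()), set(text_b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def concept_similarity(vec_a, vec_b):
    """Cosine similarity of two sparse concept vectors (concept -> weight)."""
    dot = sum(weight * vec_b.get(concept, 0.0) for concept, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def service_similarity(text_a, text_b, vec_a, vec_b, alpha=0.5):
    """Weighted combination of both scores; alpha is a hypothetical tuning knob."""
    return alpha * keyword_similarity(text_a, text_b) + (1 - alpha) * concept_similarity(vec_a, vec_b)


score = service_similarity(
    "birth certificate request",
    "request of birth certificate",
    {"Birth certificate": 1.0},
    {"Birth certificate": 1.0},
)
print(score)  # 0.875
```

Pairs scoring above a chosen cut-off could then be shown to a human expert as candidate relationships, in line with the supporting role the abstract describes.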

Journal Article•10.1016/J.DATAK.2016.03.001•

LCA-based algorithms for efficiently processing multiple keyword queries over XML streams

Evandrino G. Barros¹, Alberto H. F. Laender², Mirella M. Moro², Altigran Soares da Silva³•Institutions (3)

Centro Federal de Educação Tecnológica de Minas Gerais¹, Universidade Federal de Minas Gerais², Federal University of Amazonas³

1 May 2016

TL;DR: This article proposes two new algorithms for processing multiple keyword queries over XML streams, both of which process keyword-based queries that require minimal or no schema knowledge to be formulated, follow the lowest common ancestor (LCA) semantics, and provide optimized methods to improve the overall performance.

Abstract: In a stream environment, unlike in traditional databases, data arrive continuously, unindexed and potentially unbounded, and queries must be evaluated to produce results on the fly. In this article, we propose two new algorithms (called SLCAStream and ELCAStream) for processing multiple keyword queries over XML streams. Both algorithms process keyword-based queries that require minimal or no schema knowledge to be formulated, follow the lowest common ancestor (LCA) semantics, and provide optimized methods to improve the overall performance. Moreover, SLCAStream, which implements the smallest LCA (SLCA) semantics, outperforms the state-of-the-art, with up to 49% reduction in response time and 36% in memory usage. In turn, ELCAStream is the first to explore the exclusive LCA (ELCA) semantics over XML streams. A comprehensive set of experiments evaluates several aspects related to performance and scalability of both algorithms, showing that they are effective alternatives to search services over XML streams.
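
The smallest LCA (SLCA) semantics can be made concrete with a small in-memory sketch. It is not SLCAStream: the document is parsed fully rather than streamed, a single query is handled, and keywords are matched against element tags and whitespace-separated text tokens, all of which are simplifying assumptions. The sample document is made up.

```python
import xml.etree.ElementTree as ET


def slca(root, keywords):
    """Return elements whose subtree contains every keyword while no child subtree does."""
    wanted = {k.lower() for k in keywords}
    results = []

    def visit(node):
        # Keywords matched by this element's own tag and text (tail text is ignored).
        found = ({node.tag.lower()} | set((node.text or "").lower().split())) & wanted
        child_complete = False
        for child in node:
            child_found = visit(child)
            if child_found == wanted:
                child_complete = True
            found |= child_found
        if found == wanted and not child_complete:
            results.append(node)
        return found

    visit(root)
    return results


DOC = ET.fromstring(
    "<bib>"
    "<book><title>XML streams</title><author>Ann</author></book>"
    "<book><title>XML basics</title><author>Bob</author></book>"
    "</bib>"
)
print([node.tag for node in slca(DOC, ["xml", "ann"])])  # ['book']
print([node.tag for node in slca(DOC, ["ann", "bob"])])  # ['bib']
```

The first query is answered by a single book element, whereas the second can only be satisfied at the root, which shows why SLCA prefers the deepest covering elements.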

Journal Article•10.1016/J.DATAK.2016.02.003•

Nearest neighbor query processing using the network voronoi diagram

Mei-Tzu Wang¹•Institutions (1)

Chinese Culture University¹

1 May 2016

TL;DR: The dominance relation introduced in this paper plays an important role in distinguishing divisible paths, i.e., those whose midpoints are exactly their border points, from other paths when computing the network Voronoi diagram.

Abstract: The purpose of this paper is twofold: to develop sound and complete rules, along with algorithms and data structures, to construct the network Voronoi diagram (NVD) on a road network. To compute the NVD, attention is focused on how to distinguish divisible paths from others, i.e., those whose midpoints are exactly their border points. Research shows that the dominance relation introduced in this paper plays an important role in making that distinction. To generate and prune candidate paths concurrently, a border-point binary tree is introduced. The pre-computed NVD is organized as linked lists and is available for access by an NVD list-based query search method (NVDL), which can compute the nearest neighbor (NN) in a single step. Experiments show that the NVDL method reduces execution time by 28% for sparse data distribution on a real road network compared to the existing INE method. The NVDL's query time remains nearly constant regardless of how data points are distributed on the road network or where the query point is positioned. In addition, this approach prevents the NVDL from experiencing the slow convergence condition that often occurs when using the incremental approach.
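
To give a feel for what a pre-computed NVD offers, the sketch below assigns every road-network node to its nearest data point with a multi-source Dijkstra search. This is a standard textbook construction, not the paper's dominance-relation rules or border-point binary tree, and it handles only nodes (not positions along edges). The graph and names are hypothetical.

```python
import heapq


def network_voronoi(graph, generators):
    """Map each reachable node to (nearest generator, network distance).

    graph:      dict node -> list of (neighbor, edge_length) pairs.
    generators: nodes where data points are located.
    """
    best = {}
    heap = [(0.0, g, g) for g in generators]
    heapq.heapify(heap)
    while heap:
        dist, node, gen = heapq.heappop(heap)
        if node in best:
            continue  # already claimed by a generator that is at least as close
        best[node] = (gen, dist)
        for neighbor, length in graph.get(node, []):
            if neighbor not in best:
                heapq.heappush(heap, (dist + length, neighbor, gen))
    return best


# A small undirected road network: a - b - c - d - e
road = {
    "a": [("b", 1)],
    "b": [("a", 1), ("c", 1)],
    "c": [("b", 1), ("d", 2)],
    "d": [("c", 2), ("e", 1)],
    "e": [("d", 1)],
}
cells = network_voronoi(road, generators=["a", "e"])
print(cells["c"])  # ('a', 2.0)
```

Once the cells are stored, a nearest-neighbor query for a node is a single dictionary lookup, which mirrors the single-step lookup described in the abstract.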

Journal Article•10.1016/J.DATAK.2015.11.001•

Corrigendum to Ontological anti-patterns

Tiago Prince Sales¹, Giancarlo Guizzardi¹•Institutions (1)

Universidade Federal do Espírito Santo¹

1 Jan 2016

Journal Article•10.1016/J.DATAK.2015.12.004•

CAT: A Cost-Aware Translator for SQL-query workflow to MapReduce jobflow

Aibo Song¹, Zhiang Wu², Xu Ma¹, Junzhou Luo¹•Institutions (2)

Southeast University¹, Nanjing University of Finance and Economics²

1 Mar 2016

TL;DR: This paper develops a novel Cost-Aware Translator (CAT), which adopts a cost estimation model for MapReduce jobflows to guide the selection of the more efficient MapReduce jobflow among those auto-generated by Top-Down (TD) and Bottom-Up (BU) merging strategies.

Abstract: MapReduce is undoubtedly the most popular framework for large-scale processing and analysis of vast data sets in clusters of machines. To make MapReduce easier to use, SQL-like declarative languages and SQL-to-MapReduce translators have attracted increasing attention recently. A SQL-to-MapReduce translator can automatically generate the MapReduce jobflow for each SQL query submitted by users, which significantly simplifies the interfacing between users and systems. Although a plethora of translators have been developed, the auto-generated MapReduce programs still suffer from extreme inefficiency. In this paper, we attempt to address this challenge by developing a novel Cost-Aware Translator (CAT). CAT has two notable features. First, it defines two intra-SQL correlations: Generalized Job Flow Correlation (GJFC) and Input Correlation (IC), based on which a set of looser merging rules is introduced. Thus, both Top-Down (TD) and Bottom-Up (BU) merging strategies are proposed and integrated into CAT. Second, it adopts a cost estimation model for MapReduce jobflows to guide the selection of the more efficient MapReduce jobflow among those auto-generated by the TD and BU merging strategies. Finally, comparative experiments on the TPC-H benchmark demonstrate the effectiveness and scalability of CAT.

Journal Article•10.1016/J.DATAK.2015.11.003•

An analysis of ontologies and their success factors for application to business

Christina Feilmayr¹, Wolfram Wöß¹•Institutions (1)

Johannes Kepler University of Linz¹

1 Jan 2016

TL;DR: This research work analyzes and clarifies the term ontology and points out its difference from taxonomy, and proposes guidelines for selecting an appropriate model, methodology, and tool set to meet customer requirements while making most efficient use of resources.

Abstract: Ontologies have been less successful than they could be in large-scale business applications due to a wide variety of interpretations. This leads to confusion, and consequently, people from various research communities use the term with different – sometimes incompatible – meanings. This research work analyzes and clarifies the term ontology and points out its difference from taxonomy. By way of two business case studies, both their potential in ontological engineering and the perceived requirements for ontologies are highlighted, and their misuse in research and business is discussed. In order to examine the case for applying ontologies in a specific domain or use case, the main benefits of using ontologies are defined and categorized as technical-centered or user-centered. Key factors that influence the use of ontologies in business applications are derived and discussed. Finally, the paper offers a recommendation for efficiently applying ontologies, including adequate representation languages and an ontological engineering process supported by reference ontologies. To answer the questions of when ontologies should be used, how they can be used efficiently, and when they should not be used, we propose guidelines for selecting an appropriate model, methodology, and tool set to meet customer requirements while making most efficient use of resources.

Journal Article•10.1016/J.DATAK.2016.02.002•

A novel and powerful hybrid classifier method

Hamdi Tolga Kahraman¹•Institutions (1)

Karadeniz Technical University¹

1 May 2016

TL;DR: A novel weight-tuning method is introduced by applying an ABC-based heuristic search approach, and a powerful similarity measurement method, called the fuzzy distance metric, is explained and extended to measure the distances between the test and training observations.

Abstract: Weight-tuning methods and distance metrics have a significant impact on k-nearest neighbor-based classification. A major challenge is how to explore the optimal weight values of the features and how to measure distances between the neighbors, both of which affect the classification accuracy of the k-nn. In this paper, a powerful similarity measurement method, called the fuzzy distance metric, is explained and extended to measure the distances between the test and training observations. With the fuzzy metric, similarity arrays can be produced more efficiently than with the classic and other weighted distance measurements. Finally, the weighting methods are combined with the fuzzy metric-based similarity measurement and the k-nearest neighbor algorithm to increase the classification accuracy of the proposed algorithm. The effectiveness of the proposed approaches is demonstrated by comparing their performance with that of the classic and the population-based heuristic methods on well-known, real-world classification problems obtained from the UCI machine-learning benchmark repository. The experimental results show that the proposed hybrid algorithms find significantly better weight vectors and provide more accurate classification results than the powerful and well-known instance-based intuitive and heuristic classification algorithms and classic approaches over real datasets. A Novel and Powerful Hybrid Classifier Method has been developed. A novel weight-tuning method is introduced by applying an ABC-based heuristic search approach. A powerful similarity measurement method has been introduced. Experimental results show that the proposed hybrid algorithms significantly improve the classification results of the well-known instance-based intuitive and heuristic classification algorithms over real datasets.
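
The effect of feature weights on k-nn classification can be seen in a minimal sketch. It substitutes a plain weighted Euclidean distance for the paper's fuzzy distance metric and uses fixed, hand-picked weights instead of a heuristic search, so it illustrates only the general idea. The data and names are hypothetical.

```python
import math
from collections import Counter


def weighted_knn(train, query, weights, k):
    """Classify query by majority vote among its k nearest training observations.

    train:   list of (feature_vector, label) pairs.
    weights: one non-negative weight per feature; in a hybrid method these
             would be tuned by a search procedure rather than set by hand.
    """
    def distance(a, b):
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 10.0), "A"),
    ((1.2, 0.0), "A"),
    ((3.0, 10.1), "B"),
    ((3.2, 9.9), "B"),
]
query = (1.1, 10.0)
print(weighted_knn(train, query, weights=(1.0, 1.0), k=3))  # B
print(weighted_knn(train, query, weights=(1.0, 0.0), k=3))  # A
```

Setting the second weight to zero removes a noisy feature and flips the prediction, which is the kind of gain that weight tuning aims for.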

Journal Article•10.1016/J.DATAK.2016.06.001•

Information extraction for knowledge base construction in the music domain

Sergio Oramas¹, Luis Espinosa-Anke¹, Mohamed Sordo², Horacio Saggion¹, Xavier Serra¹•Institutions (2)

Pompeu Fabra University¹, University of Miami²

1 Nov 2016

TL;DR: This paper presents and evaluates an Information Extraction pipeline aimed at the construction of a Music Knowledge Base, and demonstrates that the method is able to discover novel facts with high precision, which are missing in current generic as well as music-specific knowledge repositories.

Abstract: The rate at which information about music is being created and shared on the web is growing exponentially. However, the challenge of making sense of all this data remains an open problem. In this paper, we present and evaluate an Information Extraction pipeline aimed at the construction of a Music Knowledge Base. Our approach starts off by collecting thousands of stories about songs from the songfacts.com website. Then, we combine a state-of-the-art Entity Linking tool and a linguistically motivated rule-based algorithm to extract semantic relations between entity pairs. Next, relations with similar semantics are grouped into clusters by exploiting syntactic dependencies. These relations are ranked thanks to a novel confidence measure based on statistical and linguistic evidence. Evaluation is carried out intrinsically, by assessing each component of the pipeline, as well as in an extrinsic task, in which we evaluate the contribution of natural language explanations in music recommendation. We demonstrate that our method is able to discover novel facts with high precision, which are missing in current generic as well as music-specific knowledge repositories. A system that constructs a Music Knowledge Base entirely from scratch. A method for clustering and scoring relations in a Relation Extraction pipeline. Reveals music facts absent from knowledge repositories (e.g. Wikipedia). Explains music recommendations in natural language.

Journal Article•10.1016/J.DATAK.2016.04.001•

Supporting interoperability in complex adaptive enterprise systems

Georg Weichhart¹, Wided Guédria, Yannick Naudet•Institutions (1)

Steyr Mannlicher¹

1 Sep 2016

TL;DR: The research interoperability infrastructure provides components to address the decentralised nature of a CAS by providing software agents and agent interaction protocols that facilitate the identification of interoperability problems and agent negotiations to find solutions.

Abstract: From a Complex Adaptive Systems (CAS) theory perspective, a new approach for supporting Enterprise Interoperability (EI) is described. Particular needs informed by the theory are presented and a software environment supporting these requirements is proposed. The infrastructure aims at serving as a tool for solving problems in the EI domain, and includes a Domain Specific Language (DSL) supporting the engineering of interoperability solutions. The Ontology of Enterprise Interoperability (OoEI) provides the underlying conceptualisation of the EI domain and is used as a basis. The DSL enhances the ontology with CAS-related concepts. The CAS perspective provides a particular focus on dynamic aspects, which requires a new approach, as these aspects are currently addressed only to a limited extent. The research interoperability infrastructure provides components to address the decentralised nature of a CAS by providing software agents and agent interaction protocols that facilitate the identification of interoperability problems and agent negotiations to find solutions. It is realised using the functional programming language Scala.

Journal Article•10.1016/J.DATAK.2015.09.001•

SWRL rule-selection methodology for ontology interoperability

Tarcisio Mendes de Farias, Ana Roxin¹, Christophe Nicolle¹•Institutions (1)

Centre national de la recherche scientifique¹

1 Sep 2016

TL;DR: A novel approach is described that, for a given query, ignores unnecessary rules, and benchmarks show that it considerably reduces query execution time.

Abstract: Data interoperability represents a great challenge for today's enterprises. Indeed, they use various information systems, each relying on several different models for data representation. Ontologies, and notably ontology matching, have been recognized as interesting approaches for solving the data interoperability problem. In this paper, we focus on improving the performance of queries over ontology alignments expressed through SWRL rules. When queries are executed over numerous and complex alignments, the number of SWRL rules strongly affects the query execution time. Moreover, when hybrid or backward-chaining reasoning is applied, the query execution time may grow exponentially. Still, the reasoners involved perform well (in terms of execution time) when applied to reduced and simpler rule sets. Based on this observation, and to improve the query execution time, we describe a novel approach that, for a given query, ignores unnecessary rules. The proposed Rule Selector (RS) is a middleware between the considered systems and the reasoner present on the triple store side. Through benchmarks, we show that our approach considerably reduces query execution time.
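
One simple way to ignore rules that a query cannot need is to follow predicate dependencies backwards from the query. The sketch below is not the authors' Rule Selector: it models each rule as a head predicate with a list of body predicates and uses made-up predicate names.

```python
def select_rules(rules, query_predicates):
    """Return the rules that can contribute to answering the query.

    rules: list of (head_predicate, [body_predicates]) pairs, a simplified
           stand-in for SWRL rules (an assumption of this sketch).
    """
    needed = set(query_predicates)
    selected = set()
    changed = True
    while changed:
        changed = False
        for index, (head, body) in enumerate(rules):
            if index not in selected and head in needed:
                # The rule derives a needed predicate, so its body is needed too.
                selected.add(index)
                needed.update(body)
                changed = True
    return [rules[index] for index in sorted(selected)]


rules = [
    ("ex:Parent", ["ex:hasChild"]),
    ("ex:Ancestor", ["ex:Parent"]),
    ("ex:Building", ["ex:Structure"]),
    ("ex:hasChild", ["src:childOf"]),
]
for rule in select_rules(rules, ["ex:Ancestor"]):
    print(rule)
```

Only the three rules on the dependency chain of `ex:Ancestor` are handed to the reasoner in this example; the unrelated `ex:Building` rule is skipped.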
