Scispace (Formerly Typeset)
Showing papers presented at "Data and Knowledge Engineering in 2015"
Journal Article•10.1016/J.DATAK.2015.07.010•
The Baquara2 knowledge-based framework for semantic enrichment and analysis of movement data

Renato Fileto1, Cleto May1, Chiara Renso2, Nikos Pelekis3, Douglas Klein1, Yannis Theodoridis3 •
Universidade Federal de Santa Catarina1, Istituto di Scienza e Tecnologie dell'Informazione2, University of Piraeus3
1 Jul 2015
TL;DR: The Baquara2 framework provides an ontological model for structuring and abstracting movement data in a multilevel hierarchy of progressively detailed movement segments that generalize concepts such as trajectories, stops, and moves and enables queries for movement analyses based on application and domain specific knowledge.
Abstract: The analysis of movements frequently requires more than just spatio-temporal data. Thus, despite recent progress in trajectory handling, there is still a gap between movement data and formal semantics. This gap hinders movement analyses benefiting from available knowledge, with well-defined and widely agreed semantics. This article describes the Baquara2 framework to help narrow this gap by exploiting knowledge bases to semantically enrich and analyze movement data. It provides an ontological model for structuring and abstracting movement data in a multilevel hierarchy of progressively detailed movement segments that generalize concepts such as trajectories, stops, and moves. Baquara2 also includes a general customizable process to annotate movement data with concepts and objects described in ontologies and Linked Open Data (LOD) collections. The resulting semantic annotations enable queries for movement analyses based on application and domain specific knowledge. The proposed framework has been used in experiments to semantically enrich movement data collected from social media with geo-referenced LOD. The obtained results enable powerful queries that illustrate Baquara2 capabilities.

73 citations

Journal Article•10.1016/J.DATAK.2015.07.007•
Modelling and reasoning about security requirements in socio-technical systems

Elda Paja1, Fabiano Dalpiaz2, Paolo Giorgini1•
University of Trento1, Utrecht University2
1 Jul 2015
TL;DR: This paper proposes the STS approach for modelling and reasoning about security requirements, and applies it to a case study about e-Government, and reports on promising scalability results of the implementation.
Abstract: Modern software systems operate within the context of larger socio-technical systems, wherein they interact, by exchanging data and outsourcing tasks, with other technical components, humans, and organisations. When interacting, these components (actors) operate autonomously; as such, they may disclose confidential information without being authorised, wreck the integrity of private data, rely on untrusted third parties, etc. Thus, the design of a secure software system shall begin with a thorough analysis of its socio-technical context, thereby considering not only technical attacks, but also social and organisational ones. In this paper, we propose the STS approach for modelling and reasoning about security requirements. In STS, security requirements are specified, via the STS-ml requirements modelling language, as contracts that constrain the interactions among the actors in the socio-technical system. The requirements models of STS-ml have a formal semantics which enables automated reasoning for detecting possible conflicts among security requirements as well as conflicts between security requirements and actors' business policies. We apply STS to a case study about e-Government, and report on promising scalability results of our implementation.

69 citations

Journal Article•10.1016/J.DATAK.2015.06.009•
An incremental approach to attribute reduction from dynamic incomplete decision systems in rough set theory

Wenhao Shu1, Wenbin Qian•
East China Jiaotong University1
1 Nov 2015
TL;DR: Compared with other attribute reduction algorithms, the proposed algorithms can effectively reduce the time required for reduct computations without losing the classification performance.
Abstract: Attribute reduction is an important preprocessing step in data mining and knowledge discovery. The effective computation of an attribute reduct has a direct bearing on the efficiency of knowledge acquisition and various related tasks. In real-world applications, some attribute values for an object may be incomplete and an object set may vary dynamically in the knowledge representation systems, also called decision systems in rough set theory. There are relatively few studies on attribute reduction in such systems. This paper mainly focuses on this issue. For the immigration and emigration of a single object in the incomplete decision system, an incremental attribute reduction algorithm is developed to compute a new attribute reduct, rather than to obtain the dynamic system as a new one that has to be computed from scratch. In particular, for the immigration and emigration of multiple objects in the system, another incremental reduction algorithm guarantees that a new attribute reduct can be computed on the fly, which avoids some re-computations. Compared with other attribute reduction algorithms, the proposed algorithms can effectively reduce the time required for reduct computations without losing the classification performance. Experiments on different real-life data sets are conducted to test and demonstrate the efficiency and effectiveness of the proposed algorithms.

61 citations

Journal Article•10.1016/J.DATAK.2015.02.001•
Efficient mining of platoon patterns in trajectory databases

Yuxuan Li1, James Bailey1, Lars Kulik1•
University of Melbourne1
1 Nov 2015
TL;DR: This work proposes a novel algorithm to efficiently retrieve platoon patterns in large trajectory databases, using several pruning techniques, and demonstrates that the algorithm is able to achieve several orders of magnitude improvement in running time, compared to an existing method for retrieving moving object clusters.
Abstract: The widespread use of localization technologies produces increasing quantities of trajectory data. An important task in the analysis of trajectory data is the discovery of moving object clusters, i.e., moving objects that travel together for a period of time. Algorithms for the discovery of moving object clusters operate by applying constraints on the consecutiveness of timestamps. However, existing approaches either use a very strict timestamp constraint, which may result in the loss of interesting patterns, or a very relaxed timestamp constraint, which risks discovering noisy patterns. To address this challenge, we introduce a new type of moving object pattern called the platoon pattern. We propose a novel algorithm to efficiently retrieve platoon patterns in large trajectory databases, using several pruning techniques. Our experiments on both real data and synthetic data evaluate the effectiveness and efficiency of our approach and demonstrate that our algorithm is able to achieve several orders of magnitude improvement in running time, compared to an existing method for retrieving moving object clusters.

60 citations

Journal Article•10.1016/J.DATAK.2014.11.004•
Design of computationally efficient density-based clustering algorithms

Satyasai Jagannath Nanda1,2, Ganapati Panda1,2•
Malaviya National Institute of Technology, Jaipur1, Indian Institute of Technology Bhubaneswar2
1 Jan 2015
TL;DR: A new strategy to reduce the computational complexity associated with the DBSCAN is proposed by efficiently implementing new merging criteria at the initial stage of evolution of clusters by considering correlation coefficient as similarity measure.
Abstract: The basic DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm uses a minimal number of input parameters and is very effective at clustering large spatial databases, but it involves high computational complexity. The present paper proposes a new strategy to reduce the computational complexity associated with DBSCAN by efficiently implementing new merging criteria at the initial stage of evolution of clusters. Further, new density-based clustering (DBC) algorithms are proposed that use the correlation coefficient as the similarity measure. These algorithms, though not computationally efficient, are found to be effective when there is high similarity between the patterns of a dataset. The computations associated with the correlation-based DBC algorithms are reduced with new cluster merging criteria. Tests on several synthetic and real datasets demonstrate that these computationally efficient algorithms are comparable in accuracy to the traditional one. An interesting application of the proposed algorithm is demonstrated: identifying the regional hazard regions present in the seismic catalog of Japan.

43 citations
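
For orientation on the density-based clustering entry above, the following is a minimal sketch of the textbook DBSCAN baseline that the abstract starts from. It is not the authors' merging strategy or their correlation-based DBC variants, which the abstract does not specify. The function names `region_query` and `dbscan` and the parameters `eps` and `min_pts` are illustrative assumptions. The brute-force neighbourhood search is the part that makes the basic algorithm quadratic in the number of points.

```python
from math import dist

NOISE = -1  # label given to points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels
```

Calling `dbscan(points, eps=1.5, min_pts=3)` on a list of coordinate tuples returns a list of integer labels in the same order as the input.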

Journal Article•10.1016/J.DATAK.2015.06.004•
Ontological anti-patterns

Tiago Prince Sales1, Giancarlo Guizzardi1•
Universidade Federal do Espírito Santo1
1 Sep 2015
TL;DR: A computational tool is presented that is able to automatically identify these anti-patterns in user's models, guide users in assessing their consequences, and generate corrections to these models by the automatic inclusion of OCL constraints implementing the proposed refactoring plans.
Abstract: The construction of large-scale reference conceptual models is a complex engineering activity. To develop high-quality models, a modeler must have the support of expressive engineering tools such as theoretically well-founded modeling languages and methodologies, patterns and anti-patterns and automated supporting environments. This paper proposes a set of Ontological Anti-Patterns for Ontology-Driven Conceptual Modeling. These anti-patterns capture error-prone modeling decisions that can result in the creation of models that fail to exclude unintended model instances (representing unintended state of affairs) or forbid intended ones (representing intended states of affairs). The anti-patterns presented here have been empirically elicited through an approach of conceptual models validation via visual simulation. The paper also presents a series of refactoring plans for rectifying the models in which these anti-patterns occur. In addition, we present here a computational tool that is able to: automatically identify these anti-patterns in user's models, guide users in assessing their consequences, and generate corrections to these models by the automatic inclusion of OCL constraints implementing the proposed refactoring plans. Finally, the paper also presents an empirical study for assessing the harmfulness of each of the uncovered anti-patterns (i.e., the likelihood that its occurrence in a model entails unintended consequences) as well as the effectiveness of the proposed refactoring plans.

40 citations

Journal Article•10.1016/J.DATAK.2015.06.007•
Adoption of OSS components

Lidia López1, Dolors Costal1, Claudia P. Ayala1, Xavier Franch1, Maria Carmela Annosi2, Ruediger Glott, Kirsten Haaland •
Polytechnic University of Catalonia1, Ericsson2
1 Sep 2015
TL;DR: This paper proposes to model OSS adoption strategies using a goal-oriented notation, in which different actors state their objectives and dependencies on each other, and introduces the notion of model coverage, which allows to measure the degree of concordance among every strategy with the model of the organization by comparing the respective models.
Abstract: Open Source Software (OSS) has become a strategic asset for a number of reasons, such as short time-to-market software delivery, reduced development and maintenance costs, and its customization capabilities. Therefore, organizations are increasingly becoming OSS adopters, either as a result of a strategic decision or because it is almost unavoidable nowadays, given the fact that most commercial software also relies to some extent on OSS infrastructure. The way in which organizations adopt OSS affects and shapes their businesses. Therefore, knowing the impact of different OSS adoption strategies in the context of an organization may help improve the processes undertaken inside this organization and ultimately pave the road to strategic moves. In this paper, we propose to model OSS adoption strategies using a goal-oriented notation, in which different actors state their objectives and dependencies on each other. These models describe the consequences of adopting one such strategy or another: which strategic and operational goals are supported, which resources emerge, etc. The models rely on an OSS ontology, built upon a systematic literature review, which comprises the activities and resources that characterize these strategies. Different OSS adoption strategy models arrange these ontology elements in diverse ways. In order to assess which OSS adoption strategy best fits the organization's needs, the notion of model coverage is introduced, which makes it possible to measure the degree of concordance of every strategy with the model of the organization by comparing the respective models. The approach is illustrated with an example of application in a big telecommunications company.

27 citations

Journal Article•10.1016/J.DATAK.2015.06.012•
Hiding outliers into crowd

Hui Wang1, Ruilin Liu1•
Stevens Institute of Technology1
1 Nov 2015
TL;DR: This paper defines the distinguishability-based attack by which the adversary can identify outliers and reveal their private information from an anonymized dataset, and designs efficient algorithms to anonymize the dataset to achieve plain ℓ-diversity.
Abstract: In recent years, many organizations publish their data in non-aggregated format for research purpose. However, publishing non-aggregated data raises serious concerns in data privacy. One of the concerns is that when outliers exist in the dataset, they are easier to be distinguished from the crowd and their privacy is prone to be compromised. In this paper, we study the problem of privacy-preserving publishing datasets that contain outliers. We define the distinguishability-based attack by which the adversary can identify outliers and reveal their private information from an anonymized dataset. We show that the existing syntactic privacy models (e.g., k-anonymity and ℓ-diversity) cannot defend against the distinguishability-based attack. We define the plain ℓ-diversity to provide privacy guarantee to outliers against the distinguishability-based attack, and design efficient algorithms to anonymize the dataset to achieve plain ℓ-diversity with low information loss. We extend our anonymization approach to deal with continuous release of a series of datasets that contain outliers. Our experiments demonstrate the efficiency and effectiveness of our approaches.

23 citations
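
The outlier-privacy entry above refers to two existing syntactic privacy models, k-anonymity and ℓ-diversity. As a rough illustration of what those two baseline models check, here is a minimal sketch assuming the common "distinct" reading of ℓ-diversity. It does not implement the paper's plain ℓ-diversity or the distinguishability-based attack, neither of which is defined in the abstract. All function names, the record layout (a list of dicts), and the column names in the usage note are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())


def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every equivalence class has at least l distinct sensitive values."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return all(
        len({row[sensitive] for row in group}) >= l
        for group in groups.values()
    )
```

A table can be checked with, for example, `is_k_anonymous(rows, ["zip", "age"], k=2)` and `is_distinct_l_diverse(rows, ["zip", "age"], "disease", l=2)`, where `rows` is a list of dicts holding generalized quasi-identifier values and one sensitive attribute.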

Journal Article•10.1016/J.DATAK.2015.06.008•
A hybrid possibilistic approach for Arabic full morphological disambiguation

Ibrahim Bounhas1, Raja Ayed2, Bilel Elayeb2,3, Narjès Bellamine Ben Saoud2,4 •
Carthage College1, Manouba University2, Emirates College of Technology3, Tunis El Manar University4
1 Nov 2015
TL;DR: This paper investigates new approaches to disambiguate the morphological features of non-vocalized Arabic texts, combining statistical classification and linguistic rules, and presents an approach dealing with unknown (Out-of-Vocabulary) words.
Abstract: Morphological ambiguity is an important phenomenon affecting several tasks in Arabic text analysis, indexing and mining. Nevertheless, it has not been well studied in related works. We investigate, in this paper, new approaches to disambiguate the morphological features of non-vocalized Arabic texts, combining statistical classification and linguistic rules. Indeed, we perform unsupervised training from unlabelled vocalized Arabic corpora. Thus, the training and testing sets contain imperfect instances (i.e. having ambiguous attributes and/or classes). To handle imperfect data, we compare two approaches: i) a possibilistic approach allowing to handle imperfection in a direct manner; and, ii) a data transformation-based approach permitting to convert an imperfect dataset to a perfect one, thus allowing to exploit classical classifiers. We also present an approach dealing with unknown (Out-of-Vocabulary) words. The experiments focus mainly on classical texts, which were not sufficiently studied in related works. We show that the possibilistic approach performs better than the transformation-based one. Besides, we report encouraging results as far as i) the role of linguistic rules in enhancing the disambiguation rates; and, ii) the accuracy of our approach for full morphological disambiguation of unknown words.

21 citations

Journal Article•10.1016/J.DATAK.2015.06.002•
Cardinality constraints on qualitatively uncertain data

Neil Hall1, Henning Koehler2, Sebastian Link1, Henri Prade3, Xiaofang Zhou4 •
University of Auckland1, Massey University2, University of Toulouse3, Soochow University (Suzhou)4
1 Sep 2015
TL;DR: This work describes the associated implication problem axiomatically and algorithmically in linear input time, and shows how to visualize any given set of cardinality constraints in the form of an Armstrong sketch.
Abstract: Modern applications require advanced techniques and tools to process large volumes of uncertain data. For that purpose we introduce cardinality constraints as a principled tool to control the occurrences of uncertain data. Uncertainty is modeled qualitatively by assigning to each object a degree of possibility by which the object occurs in an uncertain instance. Cardinality constraints are assigned a degree of certainty that stipulates on which objects they hold. Our framework empowers users to model uncertainty in an intuitive way, without the requirement to put a precise value on it. Our class of cardinality constraints enjoys a natural possible world semantics, which is exploited to establish several tools to reason about them. We characterize the associated implication problem axiomatically and algorithmically in linear input time. Furthermore, we show how to visualize any given set of our cardinality constraints in the form of an Armstrong sketch. Even though the problem of finding an Armstrong sketch is precisely exponential, our algorithm computes a sketch with conservative use of time and space. Data engineers may therefore compute Armstrong sketches that they can jointly inspect with domain experts in order to consolidate the set of cardinality constraints meaningful for a given application domain. Cardinality constraints on qualitatively uncertain data are introduced. The constraints help control the number of occurrences of uncertain data. This ability has applications in integrity enforcement and query processing. Axiomatic and algorithmic solutions are given for their implication problem. Armstrong sketches finitely represent possibly infinite Armstrong samples.

21 citations

Journal Article•10.1016/J.DATAK.2015.07.008•
Improving business process intelligence by observing object state transitions

Nico Herzberg1, Andreas Meyer1, Mathias Weske1•
Hasso Plattner Institute1
1 Jul 2015
TL;DR: This paper uses object state transitions as additional monitoring information, so-called object state transition events, based on the enablement and termination of activities and provides the basis for process monitoring and analysis in terms of a large event log.
Abstract: During the execution of business processes several events happen that are recorded in the company's information systems. These events deliver insights into process executions so that process monitoring and analysis can be performed resulting, for instance, in prediction of upcoming process steps or the analysis of the run time of single steps. While event capturing is trivial when a process engine with integrated logging capabilities is used, manual process execution environments do not provide automatic logging of events, so that typically external devices, like bar code scanners, have to be used. As experience shows, these manual steps are error-prone and induce additional work. Therefore, we use object state transitions as additional monitoring information, so-called object state transition events. Based on these object state transition events, we reason about the enablement and termination of activities and provide the basis for process monitoring and analysis in terms of a large event log. In this paper, we present the concept to utilize information from these object state transition events for capturing process progress. Furthermore, we discuss a methodology to create the required design time artifacts that then are used for monitoring at run time. In a proof-of-concept implementation, we show how the design time and run time side work and prove applicability of the introduced concept of object state transition events.
Journal Article•10.1016/J.DATAK.2015.05.005•
A novel methodology for retrieving infographics utilizing structure and message content

Zhuo Li1, Sandra Carberry1, Hui Fang1, Kathleen F. McCoy1, Kelly Peterson1, Matthew Stagitis1 •
University of Delaware1
1 Nov 2015
TL;DR: A novel methodology for retrieving infographics from a digital library that takes into account a graphic's structural and message content is presented, and it significantly outperforms a baseline method that treats queries and graphics as bags of words.
Abstract: Information graphics (infographics) in popular media are highly structured knowledge representations that are generally designed to convey an intended message. This paper presents a novel methodology for retrieving infographics from a digital library that takes into account a graphic's structural and message content. The retrieval methodology can be summarized thus: 1) hypothesize requisite structural and message content from a natural language query, 2) measure the relevance of each candidate infographic to the requisite structural and message content hypothesized from the user query, and 3) integrate these relevance measurements via a linear combination model in order to produce a ranked list of infographics in response to the user query. The methodology has been implemented and evaluated, and it significantly outperforms a baseline method that treats queries and graphics as bags of words.
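
Step 3 of the retrieval methodology above, integrating relevance measurements through a linear combination, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how many relevance components there are or how their weights are set, so the two components (structure and message), the equal default weights, and the function name `rank_infographics` are all hypothetical.

```python
def rank_infographics(scores, w_structure=0.5, w_message=0.5):
    """Rank infographic ids by a weighted sum of two relevance scores.

    scores maps an infographic id to a pair
    (structural_relevance, message_relevance), both assumed to be
    precomputed against the content hypothesized from the query.
    """
    combined = {
        graphic_id: w_structure * structural + w_message * message
        for graphic_id, (structural, message) in scores.items()
    }
    # Highest combined relevance first.
    return sorted(combined, key=combined.get, reverse=True)
```

Changing the weights changes the ranking, which is why such weights would normally be tuned on held-out queries rather than fixed by hand.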
Journal Article•10.1016/J.DATAK.2015.07.003•
Ontology-based mappings

Giansalvatore Mecca, Guillem Rull1, Donatello Santoro, Ernest Teniente2•
University of Barcelona1, Polytechnic University of Catalonia2
1 Jul 2015
TL;DR: A translation algorithm is developed that automatically rewrites a mapping from the source schema to the target ontology into an equivalent mapping from the source to the target databases.
Abstract: Data translation consists of the task of moving data from a source database to a target database. This task is usually performed by developing mappings, i.e. executable transformations from the source to the target schema. However, a richer description of the target database semantics may be available in the form of an ontology. This is typically defined as a set of views over the base tables that provides a unified conceptual view of the underlying data. We investigate how the mapping process changes when such a rich conceptualization of the target database is available. We develop a translation algorithm that automatically rewrites a mapping from the source schema to the target ontology into an equivalent mapping from the source to the target databases. Then, we show how to handle this problem when an ontology is available also for the source. Differently from previous approaches, the language we use in view definitions has the full power of non-recursive Datalog with negation. In the paper, we study the implications of adopting such an expressive language. Experiments are conducted to illustrate the trade-off between expressibility of the view language and efficiency of the chase engine used to perform the data exchange.
Journal Article•10.1016/J.DATAK.2015.06.006•
Computing repairs for constraint violations in UML/OCL conceptual schemas

Xavier Oriol1, Ernest Teniente1, Albert Tort•
Polytechnic University of Catalonia1
1 Sep 2015
TL;DR: This work follows here an alternative approach aimed at automatically computing the repairs of an update, i.e., the minimum additional changes that, when applied together with the requested update, bring the information base to a new state where all constraints are satisfied.
Abstract: Updating the contents of an information base may violate some of the constraints defined over the schema. The classical way to deal with this problem has been to reject the requested update when its application would lead to some constraint violation. We follow here an alternative approach aimed at automatically computing the repairs of an update, i.e., the minimum additional changes that, when applied together with the requested update, bring the information base to a new state where all constraints are satisfied. Our approach is independent of the language used to define the schema and the constraints, since it is based on a logic formalization of both, although we apply it to UML and OCL because they are widely used in the conceptual modeling community. Our method can be used for maintaining the consistency of an information base after the application of some update, and also for dealing with the problem of fixing up non-executable operations. The fragment of OCL that we use to define the constraints has the same expressiveness as relational algebra and we also identify a subset of it which provides some nice properties in the repair-computation process. Experiments are conducted to analyze the efficiency of our approach.
Journal Article•10.1016/J.DATAK.2015.04.003•
Towards accurate predictors of word quality for Machine Translation

Ngoc Quang Luong, Laurent Besacier, Benjamin Lecouteux
1 Mar 2015
TL;DR: A method that combines multiple "weak" classifiers to constitute a strong "composite" classifier by taking advantage of their complementarity allows us to achieve a significant improvement in terms of F-score, for both fr-en and en-es systems.
Abstract: This paper proposes some ideas to build effective estimators, which predict the quality of words in a Machine Translation (MT) output. We propose a number of novel features of various types (system-based, lexical, syntactic and semantic) and then integrate them into the conventional (previously used) feature set, for our baseline classifier training. The classifiers are built over two different bilingual corpora: French-English (fr-en) and English-Spanish (en-es). After the experiments with all features, we deploy a "Feature Selection" strategy to filter the best performing ones. Then, a method that combines multiple "weak" classifiers to constitute a strong "composite" classifier by taking advantage of their complementarity allows us to achieve a significant improvement in terms of F-score, for both fr-en and en-es systems. Finally, we exploit word confidence scores for improving the quality estimation system at sentence level.
Journal Article•10.1016/J.DATAK.2015.05.004•
Extraction and clustering of arguing expressions in contentious text

Amine Trabelsi1, Osmar R. Zaïane1•
University of Alberta1
1 Nov 2015
TL;DR: A Joint Topic Viewpoint (JTV) probabilistic model to analyze the underlying divergent arguing expressions that may be present in a collection of contentious documents and empirically demonstrates a better clustering of arguing expressions over state-of-the art and baseline methods.
Abstract: This work proposes an unsupervised method intended to enhance the quality of opinion mining in contentious text. It presents a Joint Topic Viewpoint (JTV) probabilistic model to analyze the underlying divergent arguing expressions that may be present in a collection of contentious documents. The conceived JTV has the potential of automatically carrying the tasks of extracting associated terms denoting an arguing expression, according to the hidden topics it discusses and the embedded viewpoint it voices. Furthermore, JTV's structure enables the unsupervised grouping of obtained arguing expressions according to their viewpoints, using a proposed constrained clustering algorithm which is an adapted version of the constrained k-means clustering (COP-KMEANS). Experiments are conducted on three types of contentious documents (polls, online debates and editorials), through six different contentious data sets. Quantitative evaluations of the topic modeling output, as well as the constrained clustering results show the effectiveness of the proposed method to fit the data and generate distinctive patterns of arguing expressions. Moreover, it empirically demonstrates a better clustering of arguing expressions over state-of-the art and baseline methods. The qualitative analysis highlights the coherence of clustered arguing expressions of the same viewpoint and the divergence of opposing ones.
Journal Article•10.1016/J.DATAK.2015.09.003•
Hiding multiple solutions in a hard 3-SAT formula

Ran Liu1, Wenjian Luo1, Lihua Yue1•
University of Science and Technology of China1
1 Nov 2015
TL;DR: The objective of this paper is to propose algorithms which could cancel the attraction to the multiple predefined solutions simultaneously, and the core element of these proposed algorithms is misguiding the SAT solvers with local search strategy to the reverse direction of the centre solution of the multiple predefined solutions.
Abstract: Hiding solutions in 3-SAT formulas can be used in privacy protection and data security. Although the typical q-hidden algorithm could cancel the attraction to the unique predefined solution, and generate deceptive 3-SAT formulas with unique predefined solution, few works have mentioned that with multiple predefined solutions. Therefore, the objective of this paper is to propose algorithms which could cancel the attraction to the multiple predefined solutions simultaneously. The core element of these proposed algorithms is misguiding the SAT solvers with local search strategy to the reverse direction of the centre solution of the multiple predefined solutions, so that the attraction to the multiple predefined solutions can be cancelled simultaneously. Experimental results verify the behaviour of the two classical SAT solvers: the SAT solvers with local search strategy (such as WalkSAT) and that with DPLL strategy (such as zChaff). And a real-world application is introduced based on the proposed algorithm.
Journal Article•10.1016/J.DATAK.2015.04.006•
Fast updated frequent-itemset lattice for transaction deletion

Bay Vo1, Tuong Le2, Tzung-Pei Hong3, Bac Le•
Ho Chi Minh City University of Technology1, Ton Duc Thang University2, National Sun Yat-sen University3
1 Mar 2015
TL;DR: This paper proposes an approach for maintaining FILs for transaction deletion without rescanning the original database if the number of eliminated transactions is smaller than the threshold determined based on the pre-large and diffset concepts.
Abstract: The frequent-itemset lattice (FIL) is an effective structure for mining association rules. However, building an FIL for a modified database requires a lot of time and memory. Currently, there is no approach for updating an FIL with deleted transactions. Therefore, this paper proposes an approach for maintaining FILs for transaction deletion without rescanning the original database if the number of eliminated transactions is smaller than the threshold determined based on the pre-large and diffset concepts. A diffset-based approach is first used for fast building an FIL. Then, two proposed approaches (tidset-based and diffset-based) are used for updating the FIL with transaction deletion. The experiment was conducted to show that the diffset-based approach outperforms the tidset-based and the batch-mode approaches.
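
The tidset and diffset notions named in the abstract above can be illustrated with a small sketch. It shows only the standard definitions (a tidset is the set of transaction ids containing an item; a diffset stores the ids lost when an itemset is extended) and how both yield the same support count. It does not reproduce the paper's lattice maintenance or its pre-large threshold, which the abstract does not detail. The function names and the dict-of-sets transaction layout are assumptions.

```python
def tidsets(transactions):
    """Map each item to the set of transaction ids that contain it."""
    result = {}
    for tid, items in transactions.items():
        for item in items:
            result.setdefault(item, set()).add(tid)
    return result


def support_via_tidset(t_x, t_y):
    """Support of itemset {x, y}: size of the tidset intersection."""
    return len(t_x & t_y)


def support_via_diffset(t_x, t_y):
    """Support of {x, y} from the diffset d(xy) = t(x) - t(y).

    support(xy) = support(x) - |d(xy)|; only the ids that drop out
    when x is extended with y need to be stored.
    """
    diffset = t_x - t_y
    return len(t_x) - len(diffset)
```

On dense data the diffset is typically much smaller than the tidset, which is the usual motivation for diffset-based mining.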
Journal Article•10.1016/J.DATAK.2014.11.003•
Stepwise structural verification of cyclic workflow models with acyclic decomposition and reduction of loops


Yongsun Choi1, Pauline Kongsuwan1, Cheol Min Joo2, J. Leon Zhao3•
Inje University1, Dongseo University2, City University of Hong Kong3
1 Jan 2015
TL;DR: A novel structural verification approach for cyclic workflow models by means of acyclic decomposition and reduction of loops is introduced and its execution result shows that, while providing diagnostic information, the proposed approach can handle workflow models with arbitrary cycles effectively.
Abstract: The existence of cycles (or loops) is one of the main factors that make the analysis of workflow models difficult. Several approaches to structural verification exist in the literature, but how to verify cyclic workflow models efficiently in a comprehensible form remains an open research question. Thus, a novel structural verification approach for cyclic workflow models by means of acyclic decomposition and reduction of loops is introduced in this paper with the following contributions. First, acyclic decomposition of natural loops, further enhanced by reduction of nested loops, enables existing verification techniques, normally dealing with acyclic models, to handle workflow models with natural loops. Second, instantiation of an irreducible loop into natural loops, together with reduction of concurrent loop entries, enables the proposed approach to handle workflow models with irreducible loops. Last, diagnostic information, provided by the proposed approach, helps stakeholders correct and improve their workflow models. Two examples are provided to show that the proposed approach is systematic and practical. In addition, a prototype of the proposed approach is developed. Its execution result shows that, while providing diagnostic information, the proposed approach can handle workflow models with arbitrary cycles effectively.
Journal Article•10.1016/J.DATAK.2015.04.004•
A user-centered approach for integrating social data into groups of interest


Xuan-Truong Vu1, Marie-Hélène Abel1, Pierre Morizet-Mahoudeaux1•
University of Technology of Compiègne1
1 Mar 2015
TL;DR: A new user-centered approach for integrating social data into groups of interest that makes it possible for a group to tap into its members' social data scattered over different social network sites and extract from these data the information relevant to the group's topic of interests.
Abstract: Social network sites with large-scale public networks like Facebook, Twitter or LinkedIn have become a very important part of our daily life. Users are increasingly connected to these services for publishing and sharing information and contents with others. Social network sites have therefore become a powerful source of contents of interest, part of which may fall into the scope of interests of a given group. So far, no efficient solution has been proposed for a group of interest to tap into social data, especially when they are protected by and scattered across different social network sites. We have therefore proposed a user-centered approach for integrating social data into groups of interests. This approach makes it possible to aggregate social data of the group's members and extract from these data the information relevant to the group's topic of interests. Moreover, it follows a user-centered design allowing each member to personalize his/her sharing settings and interests within their respective groups. We describe in this paper the conceptual and technical components of the proposed approach. To illustrate further the approach, a web-based prototype is also presented. A preliminary test using this prototype was carried out and showed encouraging results. The paper describes a new user-centered approach for integrating social data into groups of interest. The approach makes it possible for a group to tap into its members' social data scattered over different social network sites. The contents relevant to the group's collectively defined topics of interest are automatically extracted from these data. Each member is free to personalize his/her collaborative experience within the group. The paper also presents a working Web-based prototype supporting Facebook, Twitter and LinkedIn.
Journal Article•10.1016/J.DATAK.2015.09.002•
Temporal expression extraction with extensive feature type selection and a posteriori label adjustment


Michele Filannino1, Goran Nenadic•
University of Manchester1
1 Nov 2015
TL;DR: It is shown that the use of WordNet-based features in the identification task negatively affects the overall performance, and that there is no statistically significant difference in the results based on gazetteers, shallow parsing and propositional noun phrases labels on top of the morpho-lexical features.
Abstract: The automatic extraction of temporal information from written texts is pivotal for many Natural Language Processing applications such as question answering, text summarisation and information retrieval. It allows filtering information and inferring temporal flows of events. This paper presents ManTIME, a general domain temporal expression identification and normalisation system, and systematically explores the impact of different features and training corpora on the performance. The identification phase combines the use of conditional random fields along with a post-processing pipeline, whereas the normalisation phase is carried out using NorMA, an open-source rule-based temporal normaliser. We investigate the performance variation with respect to different feature types. Specifically, we show that the use of WordNet-based features in the identification task negatively affects the overall performance, and that there is no statistically significant difference in the results based on gazetteers, shallow parsing and propositional noun phrases labels on top of the morpho-lexical features. We also show that the use of silver data (alone or in addition to the human-annotated ones) does not improve the performance. We evaluate six combinations of training data and post-processing pipeline with respect to the TempEval-3 benchmark test set. The best run achieved 0.95 (precision), 0.85 (recall) and 0.90 (Fβ=1) in the identification phase. Normalisation accuracies are 0.86 (for type attribute) and 0.77 (for value attribute). The proposed approach ranked 3rd in the TempEval-3 challenge (task A) as the best performing machine learning-based system among 21 participants.
Journal Article•10.1016/J.DATAK.2015.07.006•
Empirical evidence for the usefulness of Armstrong tables in the acquisition of semantically meaningful SQL constraints


Van Lam Le1, Sebastian Link2, Flavio Ferrarotti•
Victoria University of Wellington1, University of Auckland2
1 Jul 2015
TL;DR: Using new empirical measures, extensive experiments confirm that users of Armstrong tables are likely to recognize domain semantics they would overlook otherwise and complement existing schema design methodologies in producing quality schemata that process data efficiently.
Abstract: SQL schema designs result from methodologies such as UML, Entity-Relationship models, description logics, or relational normalization. Independently of the methodology, sample data is promoted by academia and industry to consolidate the schema designs produced. SQL constraints are an abstract standard-compliant encoding of the designers' perception about the semantics of an application domain. Armstrong tables can visualize SQL constraints concisely, in the sense that they satisfy all constraints perceived meaningful and violate all constraints perceived meaningless. Using new empirical measures we investigate how Armstrong tables help design teams recognize domain semantics. Extensive experiments confirm that users of Armstrong tables are likely to recognize domain semantics they would overlook otherwise. Armstrong tables therefore complement existing schema design methodologies in producing quality schemata that process data efficiently.
Journal Article•10.1016/J.DATAK.2015.06.005•
A conceptual modeling framework for network analytics


Qing Wang1•
Australian National University1
1 Sep 2015
TL;DR: This paper discusses how the semantics of network analysis queries can be modeled at the conceptual level, and explores three possible application areas of using this analytical framework for network analysis applications: governing semantic integrity, improving analysis efficiency, and supporting network dynamics.
Abstract: In this paper we propose a conceptual modeling framework for network analysis applications. Within this framework, a data model called the Network Analytics ER model (NAER) is developed, which enables us to manage and analyze network data in a unified way. In particular, not only data requirements but also query requirements can be captured by the conceptual description of network analysis applications. This unified view provides us a flexible platform to build a number of topology schemas upon the underlying core schema for supporting network analysis queries. We also discuss how the semantics of network analysis queries can be modeled at the conceptual level, and explore three possible application areas of using our analytical framework for network analysis applications: (1) governing semantic integrity, (2) improving analysis efficiency, and (3) supporting network dynamics. We believe that conceptual modeling can play an important role in managing and analyzing network data, and contribute to the development of network analytics.
Journal Article•10.1016/J.DATAK.2015.06.011•
An approach to website schema.org design


Albert Tort1, Antoni Olivé1•
Polytechnic University of Catalonia1
1 Sep 2015
TL;DR: This paper describes an approach to the design of a website schema.org by using a human-computer task-oriented dialogue, whose purpose is to arrive at that design and proposes a dialogue generator that is domain independent but that can be adapted to specific domains.
Abstract: Schema.org offers to web developers the opportunity to enrich a website's content with microdata and schema.org. For large websites, implementing microdata can take a lot of time. In general, it is necessary to perform two main activities, for which we lack methods and tools. The first consists in designing what we call the website schema.org, which is the fragment of schema.org that is relevant to the website. The second consists in adding the corresponding microdata tags to the web pages. In this paper, we describe an approach to the design of a website schema.org. The approach consists in using a human-computer task-oriented dialogue, whose purpose is to arrive at that design. We describe a dialogue generator that is domain independent but that can be adapted to specific domains. We propose a set of six evaluation criteria that we use to evaluate our approach and that could be used in future approaches.
Journal Article•10.1016/J.DATAK.2015.06.003•
Exploiting semantics for XML keyword search


Thuy Ngoc Le1, Zhifeng Bao2, Tok Wang Ling1•
National University of Singapore1, RMIT University2
1 Sep 2015
TL;DR: This paper proposes a new semantics, called CR (Common Relative), for XML keyword search, which can return answers independent from schema designs, and discovers properties of common relative and proposes an efficient algorithm.
Abstract: XML keyword search has attracted a lot of interest, with typical search based on the lowest common ancestor (LCA). However, in this paper, we show several problems of the LCA-based approaches, including meaningless answers, incomplete answers, duplicated answers, missing answers, and schema-dependent answers. To handle these problems, we exploit the semantics of object, object identifier, relationship, and attribute (referred to as the ORA-semantics). Based on the ORA-semantics, we introduce new ways of labeling and matching. More importantly, we propose a new semantics, called CR (Common Relative), for XML keyword search, which can return answers independent from schema designs. To find answers based on the CR semantics, we discover properties of common relative and propose an efficient algorithm. Experimental results show the seriousness of the problems of the LCA-based approaches. They also show that the CR semantics possesses the properties of completeness, soundness and independence, while the response time of our approach is faster than that of the LCA-based approaches thanks to our techniques.
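For readers unfamiliar with the LCA baseline that the abstract above criticises, here is a minimal sketch of computing the lowest common ancestor of keyword-matching nodes. It assumes nodes are identified by Dewey labels (tuples giving the child position at each level), which is a common convention but not something the abstract specifies; the function name and sample labels are hypothetical. It does not implement the CR semantics.

```python
from typing import Sequence, Tuple

Dewey = Tuple[int, ...]


def lowest_common_ancestor(labels: Sequence[Dewey]) -> Dewey:
    """Return the Dewey label of the LCA, i.e. the longest common prefix."""
    if not labels:
        raise ValueError("at least one node label is required")
    prefix = list(labels[0])
    for label in labels[1:]:
        shared = 0
        for a, b in zip(prefix, label):
            if a != b:
                break
            shared += 1
        # Keep only the part of the prefix that this label also shares.
        prefix = prefix[:shared]
    return tuple(prefix)


# Hypothetical matches: two nodes under element (1, 2), one under (1, 3).
same_subtree = [(1, 2, 1), (1, 2, 3, 1)]
different_subtrees = [(1, 2, 1), (1, 3)]
print(lowest_common_ancestor(same_subtree))
print(lowest_common_ancestor(different_subtrees))
```

Because the answer depends only on where the matches sit in the tree, restructuring the schema changes the result, which is the schema dependence the abstract points out.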
Journal Article•10.1016/J.DATAK.2015.04.002•
Discovery of pathways in protein-protein interaction networks using a genetic algorithm


Hoai Anh Nguyen1, Cong Long Vu1, Minh Phuong Tu2, Thu Lam Bui1•
Le Quy Don Technical University1, Posts and Telecommunications Institute of Technology2
1 Mar 2015
TL;DR: A method for orienting protein-protein interaction networks (PPIs) and discovering pathways is proposed, in which a genetic algorithm is designed to find the solution while taking the problem's characteristics into account.
Abstract: Biological pathways have played an important role in understanding cell activities and evolution. In order to find these pathways, it is necessary to orient protein-protein interactions, which are usually given in the form of undirected networks or graphs. Previous findings indicate that orienting protein interactions can improve the process of pathway discovery. However, assigning orientation for protein interactions is a combinatorial optimization problem which has been proved to be NP-hard, making it critical to develop efficient algorithms. This paper proposes a method for orienting protein-protein interaction networks (PPIs) and discovering pathways. For our proposal, the mathematical model of the problem is given and then a genetic algorithm is designed to find the solution for the problem taking into account the problem's characteristics. We conducted multiple runs on the data of yeast PPI networks to test the best option for the problem. The obtained results were compared with a well-known algorithm (ROLS), which was shown to be the best in dealing with this problem, in terms of the run time, fitness function values, and especially the ratio of matching gold standard pathways. The results show the good performance of our approach in addressing this problem.
Journal Article•10.1016/J.DATAK.2015.06.010•
Approximate and selective reasoning on knowledge graphs


André Freitas1, João Carlos Pereira da Silva2, Edward Curry3, Paul Buitelaar4•
University of Passau1, Federal University of Rio de Janeiro2, National University of Ireland3, University of South Africa4
1 Nov 2015
TL;DR: A selective graph navigation mechanism based on a distributional relational semantic model which can be applied to querying and reasoning over heterogeneous knowledge bases (KBs) is proposed; it is evaluated using ConceptNet as a commonsense KB, and achieves high selectivity, high selectivity scalability and high accuracy in the selection of meaningful navigational paths.
Abstract: Tasks such as question answering and semantic search are dependent on the ability of querying and reasoning over large-scale commonsense knowledge bases (KBs). However, dealing with commonsense data demands coping with problems such as the increase in schema complexity, semantic inconsistency, incompleteness and scalability. This paper proposes a selective graph navigation mechanism based on a distributional relational semantic model which can be applied to querying and reasoning over heterogeneous knowledge bases (KBs). The approach can be used for approximative reasoning, querying and associational knowledge discovery. In this paper we focus on commonsense reasoning as the main motivational scenario for the approach. The approach focuses on addressing the following problems: (i) providing a semantic selection mechanism for facts which are relevant and meaningful in a specific reasoning and querying context and (ii) allowing coping with information incompleteness in large KBs. The approach is evaluated using ConceptNet as a commonsense KB, and achieved high selectivity, high selectivity scalability and high accuracy in the selection of meaningful navigational paths. Distributional semantics is also used as a principled mechanism to cope with information incompleteness.
Journal Article•10.1016/J.DATAK.2015.04.005•
Towards richer rule languages with polynomial data complexity for the Semantic Web


Linh Anh Nguyen1, Thi-Bich-Loc Nguyen2, Andrzej Szałas3•
University of Warsaw1, University of the Sciences2, Linköping University3
1 Mar 2015
TL;DR: A Horn description logic called Horn-DL is introduced, which is strictly and essentially richer than Horn, and allows a form of the concept constructor "universal restriction" to appear at the left hand side of terminological inclusion axioms.
Abstract: We introduce a Horn description logic called Horn-DL, which is strictly and essentially richer than Horn-Reg^I, Horn-SHIQ and Horn-SROIQ, while still having PTime data complexity. In comparison with Horn-SROIQ, Horn-DL additionally allows the universal role and assertions of the form irreflexive(s), ¬s(a, b), a ≠ b. More importantly, in contrast to all the well-known Horn fragments EL, DL-Lite, DLP, Horn-SHIQ, and Horn-SROIQ of description logics, Horn-DL allows a form of the concept constructor "universal restriction" to appear at the left hand side of terminological inclusion axioms. Namely, a universal restriction can be used in such places in conjunction with the corresponding existential restriction. We develop the first algorithm with PTime data complexity for checking satisfiability of Horn-DL knowledge bases.
Journal Article•10.1016/J.DATAK.2015.07.005•
Improving conceptual data models through iterative development


Tilmann Zäschke1, Stefania Leone, Tobias Gmünder1, Moira C. Norrie1•
ETH Zurich1
1 Jul 2015
TL;DR: The concept of evolvability as a model quality characteristic is introduced and why the quality of conceptual models can generally benefit from profiling and how performance measurements convey semantic information is discussed.
Abstract: Agile methods promote iterative development with short cycles, where user feedback from the previous iteration is used to refactor and improve the current version. To facilitate agile development of information systems, this paper offers three contributions. First, we introduce the concept of evolvability as a model quality characteristic. Evolvability refers to the expected implications of future model refactorings, both in terms of complexity of the required database evolution algorithm and in terms of the expected volume of data to evolve. Second, we propose extending the agile development cycle by using database profiling information to suggest adaptations to the conceptual model to improve performance. For every software release, the database profiler identifies and analyses navigational access patterns, and proposes model optimisations based on data characteristics, access patterns and a cost-benefit model. Based on an experimental evaluation of the profiler we discuss why the quality of conceptual models can generally benefit from profiling and how performance measurements convey semantic information. Third, we discuss the flow of semantic information when developing and using information systems. Beyond these contributions, we also make a case for using object databases in agile development environments. However, most of the presented concepts are also applicable to other database paradigms.
Journal Article•10.1016/J.DATAK.2015.01.001•
Efficient repair of dimension hierarchies under inconsistent reclassification


Monica Caniupan1, Alejandro A. Vaisman2, Raúl Arredondo•
University of the Bío Bío1, Instituto Tecnológico de Buenos Aires2
1 Jan 2015
TL;DR: It is shown that, although in the general case finding an r-repair is NP-complete, for real-world hierarchy schemas, computing such repairs can be done in polynomial time.
Abstract: On-Line Analytical Processing (OLAP) dimensions are usually modeled as a set of elements connected by a hierarchical relationship. To ensure summarizability, a dimension is required to be strict, that is, every element of the dimension must have a unique ancestor in each of its ancestor categories. In practice, elements in a dimension are often reclassified, meaning that their rollups are changed. After this operation the dimension may become non-strict. To fix this problem, we propose to compute a set of minimal r-repairs for the new non-strict dimension. Each minimal r-repair is a strict dimension that keeps the result of the reclassification, and is obtained by performing a minimum number of insertions and deletions to the dimension graph. We show that, although in the general case finding an r-repair is NP-complete, for real-world hierarchy schemas, computing such repairs can be done in polynomial time. Further, we propose efficient heuristic-based algorithms for computing r-repairs, and discuss their computational complexity. We also perform experiments over synthetic and real-world dimensions to show the plausibility of our approach.
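The strictness condition stated in the abstract above (every element has a unique ancestor in each of its ancestor categories) can be checked with a short sketch like the one below. It only detects violations and does not compute r-repairs; the dictionary-based encoding of the dimension, the function name, and the sample store/city/region data are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def non_strict_elements(
    rollup: Dict[str, List[str]], category: Dict[str, str]
) -> Dict[str, Dict[str, Set[str]]]:
    """Return, per element, the categories in which it has several ancestors.

    rollup maps an element to its direct parents; category maps every
    element to the name of its category. An empty result means the
    dimension is strict.
    """

    def ancestors(element: str) -> Set[str]:
        # Iterative traversal over the rollup edges.
        seen: Set[str] = set()
        stack = list(rollup.get(element, []))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(rollup.get(node, []))
        return seen

    violations: Dict[str, Dict[str, Set[str]]] = {}
    for element in category:
        by_category: Dict[str, Set[str]] = defaultdict(set)
        for ancestor in ancestors(element):
            by_category[category[ancestor]].add(ancestor)
        conflicts = {c: a for c, a in by_category.items() if len(a) > 1}
        if conflicts:
            violations[element] = conflicts
    return violations


# Hypothetical dimension: store s1 reaches two different regions.
category = {"s1": "Store", "c1": "City", "d1": "District", "r1": "Region", "r2": "Region"}
rollup = {"s1": ["c1", "d1"], "c1": ["r1"], "d1": ["r2"]}
print(non_strict_elements(rollup, category))
```

In this toy dimension only s1 is reported, because it rolls up to both r1 and r2 in the Region category; pointing d1 at r1 instead would make the dimension strict.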
