Showing papers presented at "Data and Knowledge Engineering in 2017"
Journal Article•10.1016/J.DATAK.2017.01.001•
Big data technologies and Management

Veda C. Storey1, Il-Yeol Song2•
J. Mack Robinson College of Business1, Drexel University2
1 Mar 2017
TL;DR: The five Vs of big data, volume, velocity, variety, veracity, and value, are reviewed, as well as new technologies, including NoSQL databases that have emerged to accommodate the needs of big data initiatives.
Abstract: The era of big data has resulted in the development and applications of technologies and methods aimed at effectively using massive amounts of data to support decision-making and knowledge discovery activities. In this paper, the five Vs of big data, volume, velocity, variety, veracity, and value, are reviewed, as well as new technologies, including NoSQL databases that have emerged to accommodate the needs of big data initiatives. The role of conceptual modeling for big data is then analyzed and suggestions made for effective conceptual modeling efforts with respect to big data.

264 citations

Journal Article•10.1016/J.DATAK.2017.06.001•
Learning multiple layers of knowledge representation for aspect based sentiment analysis

Duc-Hong Pham1, Duc-Hong Pham2, Anh-Cuong Le3•
Electric Power University1, University of Engineering and Technology, Lahore2, Ton Duc Thang University3
15 Jun 2017
TL;DR: A novel multi-layer architecture for representing customer reviews that outperforms the well-known methods in previous studies on aspect-based sentiment analysis and generates the aspect ratings as well as aspect weights.
Abstract: Sentiment Analysis is the task of automatically discovering the exact sentimental ideas about a product (or service, social event, etc.) from customer textual comments (i.e. reviews) crawled from various social media resources. Recently, we can see the rising demand for aspect-based sentiment analysis, in which we need to determine sentiment ratings and importance degrees of product aspects. In this paper we propose a novel multi-layer architecture for representing customer reviews. We observe that the overall sentiment for a product is composed from sentiments of its aspects, and in turn each aspect has its sentiments expressed in related sentences which are also the compositions from their words. This observation motivates us to design a multiple layer architecture of knowledge representation for representing the different sentiment levels for an input text. This representation is then integrated into a neural network to form a model for prediction of product overall ratings. We use representation learning techniques including word embeddings and compositional vector models, and apply a back-propagation algorithm based on gradient descent to learn the model. This model consequently generates the aspect ratings as well as aspect weights (i.e. aspect importance degrees). Our experiment is conducted on a data set of reviews from the hotel domain, and the obtained results show that our model outperforms the well-known methods in previous studies.

143 citations

Journal Article•10.1016/J.DATAK.2017.11.003•
A Guidelines framework for understandable BPMN models

Flavio Corradini1, Alessio Ferrari2, Fabrizio Fornari1, Stefania Gnesi2, Andrea Polini1, Barbara Re1, Giorgio Oronzo Spagnolo2 •
University of Camerino1, Istituto di Scienza e Tecnologie dell'Informazione2
28 Nov 2017
TL;DR: A set of fifty guidelines that can help modelers improve the understandability of their models is provided, focused on the Business Process Modelling Notation 2.0 standard published by the Object Management Group.
Abstract: Business process modeling allows abstracting and reasoning on how work is structured within complex organizations. Business process models represent blueprints that can serve different purposes for a variety of stakeholders. For example, business analysts can use these models to better understand how the organization works; employees playing a role in the process can use them to learn the tasks that they are supposed to perform; software analysts/developers can refer to the models to understand the system-as-is before designing the system-to-be. Given the variety of stakeholders that need to interpret these models, and considering the pivotal function that models play within organizations, understandability becomes a fundamental quality that needs to be taken into particular account by modelers. In this paper we provide a set of fifty guidelines that can help modelers to improve the understandability of their models. The work focuses on the Business Process Modelling Notation 2.0 standard published by the Object Management Group, which has acquired a clear predominance among the modeling notations for business processes. Guidelines were derived by means of a thoughtful literature review, which allowed identifying around one hundred guidelines, and through successive activities of synthesis and homogenization. In addition, we implemented a freely available open source tool, named BEBoP (understandaBility vErifier for Business Process models), to check the adherence of a model to the guidelines. Finally, guideline violations have been checked with BEBoP on a dataset of 11,294 models available in a publicly accessible repository. Our tests show that, although the majority of the guidelines are respected by the models, some guidelines, which are recognized as fundamental by the literature, are frequently violated.

93 citations

Journal Article•10.1016/J.DATAK.2017.03.009•
An adaptable fine-grained sentiment analysis for summarization of multiple short online reviews

Reinald Kim Amplayo1, Min Song1•
Yonsei University1
1 Jul 2017
TL;DR: Results show that the sentiment classifier outperforms baseline models and industry-standard classifiers while the aspect extractor outperforms other topic models in terms of aspect diversity and aspect extracting power.
Abstract: In this study, we present a novel method for generating summaries of multiple online reviews using a fine-grained sentiment extraction model for short texts, which is adaptable to different domains and languages. Adaptability of a model is defined as its ability to be easily modified and be usable on different domains and languages. This is important because of the diversity of domains and languages available. The fine-grained sentiment extraction model is divided into two methods: sentiment classification and aspect extraction. The sentiment classifier is built using a three-level classification approach, while the aspect extractor is built using the extended biterm topic model (eBTM), an extension of the LDA topic model for short texts. Overall, results show that the sentiment classifier outperforms baseline models and industry-standard classifiers while the aspect extractor outperforms other topic models in terms of aspect diversity and aspect extracting power. In addition, using the Naver movies dataset, we show that online review summarization can be effectively constructed using the proposed methods by comparing the results of our method and the results of a movie awards ceremony.

64 citations

Journal Article•10.1016/J.DATAK.2017.03.002•
Multi-level ontology-based conceptual modeling

Victorio Albani de Carvalho1, Victorio Albani de Carvalho2, João Paulo A. Almeida1, Claudenir M. Fonseca1, Giancarlo Guizzardi3, Giancarlo Guizzardi1 •
Universidade Federal do Espírito Santo1, International Foundation for Electoral Systems2, Free University of Bozen-Bolzano3
1 May 2017
TL;DR: The UFO-MLT combination serves as a foundation for conceptual models that can benefit from the ontological distinctions of UFO as well as MLT's basic concepts and patterns for multi-level modeling.
Abstract: Since the late 1980s, there has been a growing interest in the use of foundational ontologies to provide a sound theoretical basis for the discipline of conceptual modeling. This has led to the development of ontology-based conceptual modeling techniques whose modeling primitives reflect the conceptual categories defined in a foundational ontology. The ontology-based conceptual modeling language OntoUML, for example, incorporates the distinctions underlying the taxonomy of types in the Unified Foundational Ontology (UFO) (e.g., kinds, phases, roles, mixins, etc.). This approach has focused so far on the support to types whose instances are individuals in the subject domain, with no provision for types of types (or categories of categories). In this paper we address this limitation by extending the Unified Foundational Ontology with the MLT multi-level theory. The UFO-MLT combination serves as a foundation for conceptual models that can benefit from the ontological distinctions of UFO as well as MLT's basic concepts and patterns for multi-level modeling. We discuss the impact of the extended foundation to multi-level conceptual modeling.

50 citations

Journal Article•10.1016/J.DATAK.2017.07.005•
A secure kNN query processing algorithm using homomorphic encryption on outsourced database

Hyeong-Il Kim1, Hyeong-Jin Kim2, Jae-Woo Chang2•
Agency for Defense Development1, Chonbuk National University2
22 Jul 2017
TL;DR: This paper proposes a new secure k-nearest neighbor query processing algorithm that guarantees the confidentiality of both encrypted data and users’ query records, and devises an encrypted index search scheme that performs data filtering without revealing data access patterns.
Abstract: With the adoption of cloud computing, database outsourcing has emerged as a new platform. Due to the serious privacy concerns associated with cloud computing, databases must be encrypted before being outsourced to the cloud. Therefore, various k-nearest neighbor (kNN) query processing techniques have been proposed for encrypted databases. However, existing schemes are either insecure or inefficient. In this paper, we propose a new secure kNN query processing algorithm. Our algorithm guarantees the confidentiality of both encrypted data and users’ query records. To achieve a high level of query processing efficiency, we also devise an encrypted index search scheme that performs data filtering without revealing data access patterns. A performance analysis shows that the proposed scheme outperforms the existing scheme in terms of query processing costs while preserving data privacy.

43 citations

Journal Article•10.1016/J.DATAK.2016.12.004•
Specification and derivation of key performance indicators for business analytics

Alejandro Mat1, Juan Trujillo1, John Mylopoulos2•
University of Alicante1, University of Trento2
1 Mar 2017
TL;DR: An approach is proposed that provides decision makers with an integrated view of strategic business objectives and conceptual data warehouse KPIs; it links strategic business models to the data for monitoring and assessing them and enables the user to analyze data subspaces from a strategic point of view.
Abstract: Key Performance Indicators (KPI) measure the performance of an enterprise relative to its objectives thereby enabling corrective action where there are deviations. In current practice, KPIs are manually integrated within dashboards and scorecards used by decision makers. This practice entails various shortcomings. First, KPIs are not related to their business objectives and strategy. Consequently, decision makers often obtain a scattered view of the business status and business concerns. Second, while KPIs are defined by decision makers, their implementation is performed by IT specialists. This often results in discrepancies that are difficult to identify. In this paper, we propose an approach that provides decision makers with an integrated view of strategic business objectives and conceptual data warehouse KPIs. The main benefit of our proposal is that it links strategic business models to the data for monitoring and assessing them. In our proposal, KPIs are defined using a modeling language where decision makers specify KPIs using business terminology, but can also perform quick modifications and even navigate data while maintaining a strategic view. This enables monitoring and what-if analysis, thereby helping analysts to compare expectations with reported results.
Highlights: Novel approach for conceptualizing and specifying Key Performance Indicators. Transforms strategic models into analytic tools to aid in decision making. Enables the user to analyze data subspaces from a strategic point of view. Based on the Semantics for Business Vocabulary and Rules specification. Implemented to support the whole process from definition to data extraction.

42 citations

Journal Article•10.1016/J.DATAK.2017.11.002•
An Information-Theoretic Filter Approach for Value Weighted Classification Learning in Naive Bayes

Chang-Hwan Lee1•
Dongguk University1
1 Nov 2017
TL;DR: The performance of naive Bayes learning with the value weighting method is compared with that of some other traditional methods on a number of datasets, and the experimental results show that value weighting could improve the performance of naive Bayes significantly.
Abstract: Assigning weights in features has been an important topic in some classification learning algorithms. In this paper, we propose a new paradigm of assigning weights in classification learning, called value weighting method. While the current weighting methods assign a weight to each feature, we assign a different weight to the values of each feature. The performance of naive Bayes learning with value weighting method is compared with that of some other traditional methods for a number of datasets. The experimental results show that the value weighting method could improve the performance of naive Bayes significantly.

37 citations

Journal Article•10.1016/J.DATAK.2017.03.008•
Ontology-based context modeling in service-oriented computing: A systematic mapping

Oscar Cabrera1, Xavier Franch1, Jordi Marco1•
Polytechnic University of Catalonia1
1 Jul 2017
TL;DR: A sweeping view on the anatomy of context models may help avoid the postulation of new proposals not aligned with the current research.
Abstract: Context: Service-oriented computing and context-aware computing are two consolidated paradigms that are changing the way of providing and consuming software services. Whilst service-oriented computing is based on service-oriented architectures for providing flexible software services, context-aware computing articulates different phases of a context life cycle for changing the behavior of such services. The synergy between both paradigms provides the context to this study. Objective: This study analyzes the current state of the art of context models, specifically: (1) which are these proposals and how are they related; (2) what are their structural characteristics; (3) what context information is the most addressed; and (4) what are their most consolidated definitions. Given their dominance in the field, the study focuses on ontology-based approaches. Method: We conducted a systematic mapping by establishing a review protocol that integrates automatic and manual searches from different sources. We applied a rigorous method to elicit the keywords from the research questions and selection criteria to retrieve the papers to evaluate. Results: Overall, 138 primary studies were selected to answer our research questions. These proposals were studied in depth by analyzing: (1) distribution along time and their relationships; (2) size correlated with the number of classes and levels of the context model, and coverage of the definitions provided as indicator of quality provided; (3) most addressed context information; (4) most consolidated definitions of context information. Conclusions: The contribution of this survey is to make available a unified and consolidated body of knowledge on context for service-oriented computing that could be instantiated and used as starting point in a variety of use cases. This sweeping view on the anatomy of context models may help avoid the postulation of new proposals not aligned with the current research.

32 citations

Journal Article•10.1016/J.DATAK.2017.08.003•
A Fine‐Grained Distribution Approach for ETL Processes in Big Data Environments

Mahfoud Bala, Omar Boussaid1, Zaia Alimazighi•
University of Lyon1
1 Sep 2017
TL;DR: A new fine-grained parallelization/distribution approach for populating the Data Warehouse (DW) is proposed; employing 25 to 38 parallel tasks speeds up the ETL process by up to 33%, with the improvement rate being linear.
Abstract: Among the so-called “4Vs” (volume, velocity, variety, and veracity) that characterize the complexity of Big Data, this paper focuses on the issue of “Volume” in order to ensure good performance for Extracting-Transforming-Loading (ETL) processes. In this study, we propose a new fine-grained parallelization/distribution approach for populating the Data Warehouse (DW). Unlike prior approaches that distribute the ETL only at a coarse-grained level of processing, our approach provides different ways of parallelization/distribution at the process, functionality, and elementary function levels. In our approach, an ETL process is described in terms of its core functionalities, which can run on a cluster of computers according to the MapReduce (MR) paradigm. The novel approach thereby allows the distribution of the ETL process at three levels: the “process” level for coarse-grained distribution and the “functionality” and “elementary functions” levels for fine-grained distribution. Our performance analysis reveals that employing 25 to 38 parallel tasks enables the novel approach to speed up the ETL process by up to 33% with the improvement rate being linear.

26 citations

Journal Article•10.1016/J.DATAK.2017.08.004•
Frequent patterns in ETL workflows: An empirical approach

Vasileios Theodorou1, Alberto Abelló1, Maik Thiele2, Wolfgang Lehner2•
Polytechnic University of Catalonia1, Dresden University of Technology2
5 Sep 2017
TL;DR: This work logically models ETL workflows using labeled graphs and employs graph algorithms to identify candidate patterns and to recognize them on different workflows, providing a stepping stone for the automatic translation of ETL logical models to their conceptual representation and for generating fine-grained cost models at the granularity level of patterns.
Abstract: The complexity of Business Intelligence activities has driven the proposal of several approaches for the effective modeling of Extract-Transform-Load (ETL) processes, based on the conceptual abstraction of their operations. Apart from fostering automation and maintainability, such modeling also provides the building blocks to identify and represent frequently recurring patterns. Despite some existing work on classifying ETL components and functionality archetypes, the issue of systematically mining such patterns and their connection to quality attributes such as performance has not yet been addressed. In this work, we propose a methodology for the identification of ETL structural patterns. We logically model the ETL workflows using labeled graphs and employ graph algorithms to identify candidate patterns and to recognize them on different workflows. We showcase our approach through a use case that is applied on implemented ETL processes from the TPC-DI specification and we present mined ETL patterns. Decomposing ETL processes to identified patterns, our approach provides a stepping stone for the automatic translation of ETL logical models to their conceptual representation and to generate fine-grained cost models at the granularity level of patterns.
Journal Article•10.1016/J.DATAK.2017.09.001•
SummTriver: A new trivergent model to evaluate summaries automatically without human references

Luis Adrián Cabrera-Diego1, Luis Adrián Cabrera-Diego2, Juan-Manuel Torres-Moreno3, Juan-Manuel Torres-Moreno2•
Edge Hill University1, University of Avignon2, École Polytechnique de Montréal3
1 Sep 2017
TL;DR: This paper presents SummTriver, an automatic evaluation method that tries to be more correlated to manual evaluation by using multiple divergences, and the results are promising, especially for summarization campaigns.
Abstract: The automatic evaluation of summaries is a hard task that continues to be open. The assessment aims to measure simultaneously the informativeness and readability of summaries. The scientific community has tackled this problem with partial solutions, in terms of informativeness, using ROUGE. However, to use this method, it is necessary to have multiple summaries made by humans (the references). Methods without human references have been implemented, but they are still far from being highly correlated to manual evaluations. In this paper we present SummTriver, an automatic evaluation method that tries to be more correlated to manual evaluation by using multiple divergences. The results are promising, especially for summarization campaigns. Besides this, we also present an interesting analysis, at micro-level, of how correlated the manual and automatic summary evaluation methods are when we make use of a large quantity of observations.
Journal Article•10.1016/J.DATAK.2017.07.008•
Social emotion classification based on noise-aware training

Xin Li1, Yanghui Rao1, Haoran Xie2, Xuebo Liu1, Tak-Lam Wong2, Fu Lee Wang3 •
Sun Yat-sen University1, University of Hong Kong2, Caritas Institute of Higher Education3
21 Jul 2017
TL;DR: This work proposes a new architecture named PCNN, which utilizes two cascading convolutional layers to model the word-phrase relation and the phrase-sentence relation, and presents a Bayesian-based model named WMCM to learn document-level semantic features.
Abstract: Social emotion classification has drawn the attention of many natural language processing researchers in recent years, since analyzing user-generated emotional documents on the Web is quite useful in recommending products, gathering public opinions, and predicting election results. However, the documents that evoke prominent social emotions are usually mixed with noisy instances, and it is also challenging to capture the textual meaning of short messages. In this work, we focus on reducing the impact of noisy instances and learning a better representation of sentences. For the former, we introduce an “emotional concentration” indicator, which is derived from emotional ratings to weight documents. For the latter, we propose a new architecture named PCNN, which utilizes two cascading convolutional layers to model the word-phrase relation and the phrase-sentence relation. This model regards continuous tokens as phrases based on an assumption that neighboring words are very likely to have internal relations, and semantic feature vectors are generated based on the phrase representation. We also present a Bayesian-based model named WMCM to learn document-level semantic features. Both PCNN and WMCM classify social emotions by capturing semantic regularities in language. Experiments on two real-world datasets indicate that the quality of learned semantic vectors and the performance of social emotion classification can be improved by our models.
Journal Article•10.1016/J.DATAK.2017.08.001•
Graph based knowledge discovery using MapReduce and SUBDUE algorithm

Sirisha Velampalli1, Murthy V. Jonnalagedda1•
University College of Engineering1
1 Sep 2017
TL;DR: This work aims to show how skills data from resumes is modelled into a variant of the graph data structure called a conceptual graph using the MapReduce programming model; the proposed approach is able to extract common skill-sets.
Abstract: Knowledge Discovery is the process of extracting useful and hidden information. Extracting knowledge from data represented in the form of graphs is emerging in this new generation. Graphs are used to model and solve many real world problems. In this work, we aim to show how skills data from resumes is modelled into a variant of the graph data structure called a conceptual graph using the MapReduce programming model. Resumes are taken as the data source because they contain the skill-sets of candidates. Initial storage and pre-processing are done in a big data framework using the Hadoop Distributed File System (HDFS) and MapReduce. SUBstructure Discovery Using Examples (SUBDUE), a popular graph mining algorithm, is used for retrieving common skill-sets. The results obtained from a real-world dataset of resumes clearly demonstrate the potential of graph mining algorithms in skill set analytics. The proposed approach is able to extract common skill-sets. Common skill-set extraction is useful for course curriculum designers as well as job seekers.
Journal Article•10.1016/J.DATAK.2017.09.002•
QETL: An approach to on-demand ETL from non-owned data sources

Lorenzo Baldacci1, Matteo Golfarelli1, Simone Graziani1, Stefano Rizzi1•
University of Bologna1
1 Nov 2017
TL;DR: The Query-Extract-Transform-Load (QETL) paradigm is proposed to feed a multidimensional cube on demand, and experimental tests show that QETL effectively reuses data to cut extraction costs, thus leading to significant performance improvements.
Abstract: In traditional OLAP systems, the ETL process loads all available data in the data warehouse before users start querying them. In some cases, this may be either inconvenient (because data are supplied from a provider for a fee) or unfeasible (because of their size); on the other hand, directly launching each analysis query on source data would not enable data reuse, leading to poor performance and high costs. The alternative investigated in this paper is that of fetching and storing data on-demand, i.e., as they are needed during the analysis process. In this direction we propose the Query-Extract-Transform-Load (QETL) paradigm to feed a multidimensional cube; the idea is to fetch facts from the source data provider, load them into the cube only when they are needed to answer some OLAP query, and drop them when some free space is needed to load other facts. Remarkably, QETL includes an optimization step to cheaply extract the required data based on the specific features of the data provider. The experimental tests, made on a real case study in the genomics area, show that QETL effectively reuses data to cut extraction costs, thus leading to significant performance improvements.
Journal Article•10.1016/J.DATAK.2017.07.003•
Automatically classifying source code using tree-based approaches

Anh Viet Phan1, Phuong Ngoc Chau, Minh-Le Nguyen, Lam Thu Bui1•
Le Quy Don Technical University1
27 Jul 2017
TL;DR: This paper addresses software engineering problems by exploring information in programs' abstract syntax trees (ASTs) instead of software metrics, and proposes two models that combine a tree-based convolutional neural network with k-Nearest Neighbors and with support vector machines to exploit both structural and semantic AST information.
Abstract: Analyzing source code to solve software engineering problems such as fault prediction, cost, and effort estimation continues to receive attention from researchers as well as companies. The traditional approaches are based on machine learning and on software metrics obtained by computing standard measures of software projects. However, these methods have faced many challenges because software metrics are not enough to capture the complexity of programs. To overcome these limitations, this paper aims to solve software engineering problems by exploring information in programs' abstract syntax trees (ASTs) instead of software metrics. We propose two models that combine a tree-based convolutional neural network (TBCNN) with k-Nearest Neighbors (kNN) and with support vector machines (SVMs) to exploit both structural and semantic AST information. In addition, to deal with the high-dimensional data of ASTs, we present several tree pruning techniques which not only reduce the complexity of the data but also enhance the performance of classifiers in terms of computational time and accuracy. We survey many machine learning algorithms on different types of program representations including software metrics, sequences, and tree structures. The approaches are evaluated by classifying 52000 programs written in the C language into 104 target labels. The experiments show that the tree-based classifiers achieve markedly higher performance than the metrics-based or sequence-based ones, and that the two proposed models, TBCNN + SVM and TBCNN + kNN, rank as the top and second classifiers. Pruning redundant AST branches leads to not only a substantial reduction in execution time but also an increase in accuracy.
Journal Article•10.1016/J.DATAK.2017.06.006•
A natural language interface to a graph-based bibliographic information retrieval system

Yongjun Zhu1, Erjia Yan1, Il-Yeol Song1•
Drexel University1
1 Sep 2017
TL;DR: This paper proposes a novel customized natural language processing framework that integrates a few original algorithms/heuristics for interpreting and analyzing bibliographic queries and shows that the proposed framework and natural language interface provide a practical solution for building real-world bibliographical information retrieval systems.
Abstract: With the ever-increasing volume of scientific literature, there is a need for a natural language interface to bibliographic information retrieval systems to retrieve relevant information effectively. In this paper, we propose one such interface, NLI-GIBIR, which allows users to search for a variety of bibliographic data through natural language. NLI-GIBIR makes use of a novel framework applicable to graph-based bibliographic information retrieval systems in general. This framework incorporates algorithms/heuristics for interpreting and analyzing natural language bibliographic queries via a series of text- and linguistic-based techniques, including tokenization, named entity recognition, and syntactic analysis. We find that our framework, as implemented in NLI-GIBIR, can effectively represent and address complex bibliographic information needs. Thus, the contributions of this paper are as follows: First, to our knowledge, it is the first attempt to propose a natural language interface for graph-based bibliographic information retrieval. Second, we propose a novel customized natural language processing framework that integrates a few original algorithms/heuristics for interpreting and analyzing bibliographic queries. Third, we show that the proposed framework and natural language interface provide a practical solution for building real-world bibliographic information retrieval systems. Our experimental results show that the presented system can correctly answer 39 out of 40 example natural language queries with varying lengths and complexities.
Journal Article•10.1016/J.DATAK.2017.10.002•
Multi-View Fuzzy Information Fusion in Collaborative Filtering Recommender Systems: Application to the Urban Resilience Domain

Iván Palomares1, Fiona Browne2, Peadar Davis2•
University of Bristol1, Ulster University2
24 Oct 2017
TL;DR: A hybrid framework which combines a collaborative filtering recommendation system with fuzzy decision-making approaches (based on the use of aggregation functions) to improve the accuracy of domain-specific recommendations is proposed.
Abstract: Recommender systems play an increasingly important role in on-line web services for the personalization and recommendation of content to individual users. The quantity and quality of user-based information has progressed presenting the opportunity to further tailor recommendations to users based on feature view integration. In this work, we propose a hybrid framework which combines a collaborative filtering recommendation system with fuzzy decision-making approaches (based on the use of aggregation functions) to improve the accuracy of domain-specific recommendations. We extend upon the classical, neighborhood-based collaborative filtering process by conflating preference information with user-profile data in the recommendation process. This is performed using intelligent information fusion techniques whereby Ordered Weighted Averaging (OWA) operators and uninorm aggregation functions are implemented in the fusion of multiple views of pairwise similarity degrees between users. To address the shortcoming of generating sensible recommendations to cold users, we incorporate a novel weighting scheme based on fuzzy set modeling within the uninorm-based aggregation of similarity views. We finally outline the application of the proposed approach through an empirical study based in the Urban Resilience domain, along with an example to movie recommendation.
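The Ordered Weighted Averaging (OWA) operator named in the abstract above can be illustrated with a minimal Python sketch. This is the textbook OWA definition (weights are attached to rank positions after sorting, not to particular sources), not the authors' implementation; the function name `owa`, the three "views", and the example similarity degrees are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered Weighted Averaging: the i-th weight multiplies the i-th largest value."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Sort in descending order so weights refer to ranks, not to sources.
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical pairwise similarity degrees between two users, one per view.
view_similarities = [0.9, 0.4, 0.6]

print(owa(view_similarities, [1.0, 0.0, 0.0]))  # behaves like max
print(owa(view_similarities, [0.0, 0.0, 1.0]))  # behaves like min
print(owa(view_similarities, [0.5, 0.3, 0.2]))  # optimistic blend of the views
```

Choosing the weight vector moves the operator between the minimum and the maximum of its inputs, which is what makes OWA a convenient way to fuse several similarity views into one score.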
Journal Article•10.1016/J.DATAK.2016.12.002•
Improving the efficiency of NSGA-II based ontology aligning technology

[...]

Xingsi Xue1, Xingsi Xue2, Yuping Wang2•
Fujian University of Technology1, Xidian University2
1 Mar 2017
TL;DR: The experimental results show that, compared with the approach using NSGA-II alone, the Dynamic Alignment Candidates Selection Strategy and Metamodel greatly reduce the time and main memory consumption of the tuning process while still ensuring the correctness and completeness of the alignments.
Abstract: There is evidence from the Ontology Alignment Evaluation Initiative (OAEI) that ontology matchers do not necessarily find the same correct correspondences. Therefore, several competing matchers are usually applied to the same pair of entities in order to increase evidence towards a potential match or mismatch. How to select the proper matchers' alignments and efficiently tune them has become one of the challenges in the ontology matching domain. To this end, in this paper, we propose to use the Dynamic Alignment Candidates Selection Strategy and Metamodel to raise the efficiency of using NSGA-II to optimize the ontology alignment, by prescreening the less promising aligning results to be combined and the individuals to be evaluated in NSGA-II, respectively. The experimental results show that, compared with the approach using NSGA-II alone, the Dynamic Alignment Candidates Selection Strategy and Metamodel greatly reduce the time and main memory consumption of the tuning process while still ensuring the correctness and completeness of the alignments. Moreover, our proposal is also more efficient than the state-of-the-art ontology aligning systems.
Journal Article•10.1016/J.DATAK.2017.03.010•
Ensuring the canonicity of process models

[...]

Henrik Leopold1, Fabian Pittke, Jan Mendling•
VU University Amsterdam1
1 Sep 2017
TL;DR: The notion of canonicity is introduced to prevent the mixing of natural language and modeling language in process models and is used to define automated techniques for detecting and refactoring activities that do not comply with it.
Abstract: Process models play an important role in specifying requirements of business-related software. However, the usefulness of process models is highly dependent on their quality. Recognizing this, researchers have proposed various techniques for the automated quality assurance of process models. A considerable shortcoming of these techniques is the assumption that each activity label consistently refers to a single stream of action. If, however, activities textually describe control-flow-related aspects such as decisions or conditions, the analysis results of these tools are distorted. Due to the ambiguity that is associated with this misuse of natural language, humans also struggle with drawing valid conclusions from such inconsistently specified activities. In this paper, we therefore introduce the notion of canonicity to prevent the mixing of natural language and modeling language. We identify and formalize non-canonical patterns, which we then use to define automated techniques for detecting and refactoring activities that do not comply with it. We evaluated these techniques with the help of four process model collections from industry, which confirmed the applicability and accuracy of these techniques.
Journal Article•10.1016/J.DATAK.2016.12.003•
Producing relevant interests from social networks by mining users' tagging behaviour

[...]

Manel Mezghani, André Péninou1, Corinne Amel Zayani, Ikram Amous, Florence Sèdes1•
University of Toulouse1
1 Mar 2017
TL;DR: The originality of the approach is based on the proposal of a new technique of interests' detection by analysing the accuracy of the tagging behaviour of a user in order to figure out the tags which really reflect the content of the resources.
Abstract: Social media provides an environment of information exchange. They principally rely on their users to create content, to annotate others' content and to make on-line relationships. The user activities reflect his opinions, interests, etc. in this environment. We focus on analysing this social environment to detect user interests which are the key elements for improving adaptation. This choice is motivated by the lack of information in the user profile and the inefficiency of the information issued from methods that analyse the classic user behaviour (e.g. navigation, time spent on web page, etc.). So, having to cope with an incomplete user profile, the user social network can be an important data source to detect user interests. The originality of our approach is based on the proposal of a new technique of interest detection by analysing the accuracy of the tagging behaviour of a user in order to figure out the tags which really reflect the content of the resources. So, these tags are somehow comprehensible and can avoid the tag ambiguity usually associated with these social annotations. The approach combines the tag, user and resource in a way that guarantees relevant interest detection. The proposed approach has been tested and evaluated in the Delicious social database. For the evaluation, we compare the result issued from our approach using the tagging behaviour of the neighbours (the egocentric network and the communities) with the information already known for the user (his profile). A comparative evaluation with the classical tag-based method of interest detection shows that the proposed approach is better.
Journal Article•10.1016/J.DATAK.2017.06.005•
Ontology-based modeling and querying of trajectory data

[...]

Marwa Manaa1, Jalel Akaichi1•
Tunis University1
1 Sep 2017
TL;DR: An ontology-based trajectory pivot model is presented that covers common structures encountered in trajectories associated with links to application and geographic modules that is intended to reduce structural heterogeneity among sources and to specify the semantics of concepts in an unambiguous way.
Abstract: With the evolution of location-sensing devices and associated technologies, mobility data driven scientific discovery approaches became an important paradigm for advanced computing performed in various central areas, e.g., Internet of things and social networks. Under this paradigm, trajectory data is considered as a core revealing details of instantaneous behaviors piloted by mobile entities. This creates the need to model and understand such behaviors, and has given rise to different modeling approaches using either conceptual modeling or ontologies. Modeling and querying of trajectory data are still challenging because of their structural and semantic heterogeneities, and due to the complexity of establishing choices about the domain's consensual knowledge. Ontologies are promising solutions for the above two problems, since they are intended to reduce structural heterogeneity among sources and to specify the semantics of concepts in an unambiguous way. In this paper, we propose a framework for semantics-oriented modeling and querying of trajectory data. We present an ontology-based trajectory pivot model that covers common structures encountered in trajectories associated with links to application and geographic modules. We validate our proposal through a case study dealing with human movement activity.
Journal Article•10.1016/J.DATAK.2017.09.003•
The Merkurion approach for similarity searching optimization in Database Management Systems

[...]

Marcos V. N. Bedo, Daniel S. Kaster, Agma J. M. Traina, Caetano Traina
23 Sep 2017
TL;DR: This article addresses a novel strategy that extends the query optimizer of any DBMS, so that it can also perform both logical and physical query plan optimizations in searches that include similarity predicates.
Abstract: Modern Database Management Systems (DBMSs) retrieve songs that resemble those in a music dataset, identify plagiarism in a set of documents, or provide past cases to physicians by taking into account the characteristics of a query exam. All such tasks require the comparison of data by similarity, which can be expressed in terms of distance-based queries in metric spaces. Traditional query processing relies mostly on histograms for describing the data distribution space and choosing a data retrieval path that quickly leads to the answer, discarding comparisons of most unwanted data. However, DBMSs still lack adequate support for selectivity estimation of query operators for data types embedded in metric spaces. This article addresses a novel strategy that extends the query optimizer of a DBMS, so that it can also perform both logical and physical query plan optimizations in searches that include similarity predicates. The proposal, named Merkurion, updates the concept of Data Distribution Space and captures data distributions according to the distances between the elements within a dataset. Moreover, it employs concise representations of such distributions, called synopses, for the definition of rules that enable similarity searching optimization. An extensive evaluation of Merkurion in real-world datasets has proven its effectiveness and broad applicability to many data domains.
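The abstract refers to similarity searching expressed as distance-based queries in metric spaces. The two most common query types of this kind are the range query and the k-nearest-neighbour query. The sketch below shows only their naive sequential-scan form, which is the costly baseline a query optimizer would try to avoid; it is not the Merkurion strategy, and the function names, the Euclidean distance, and the sample points are assumptions made for illustration.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance, one example of a metric."""
    return math.dist(a, b)


def range_query(dataset, query, radius, distance=euclidean):
    """Return every element within `radius` of `query` (sequential scan)."""
    return [x for x in dataset if distance(x, query) <= radius]


def knn_query(dataset, query, k, distance=euclidean):
    """Return the k elements closest to `query` (sequential scan)."""
    return heapq.nsmallest(k, dataset, key=lambda x: distance(x, query))


# Hypothetical 2-D feature vectors.
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0), (6.0, 8.0)]

within_five = range_query(points, (0.0, 0.0), 5.0)
two_nearest = knn_query(points, (0.0, 0.0), 2)
```

Both functions compare the query against every element, so their cost grows linearly with the dataset. Knowing in advance roughly how many elements fall within a given radius (the selectivity) is what lets an optimizer choose a cheaper access path.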
Journal Article•10.1016/J.DATAK.2017.12.002•
Annotation paths for matching XML-Schemas

[...]

Julius Köpke1•
Alpen-Adria-Universität Klagenfurt1
1 Dec 2017
TL;DR: This work provides a comprehensive evaluation of the annotation method and the proposed matching algorithms using real-world schemas and reference ontologies and demonstrates the feasibility of generating executable mappings using a state of the art mapping system.
Abstract: Annotation paths are a technique for the semantic annotation of XML-Schemas. The design rationale was to develop an embedded annotation method on top of SAWSDL which is fully declarative, easily applicable and still provides the proper expressiveness for high-quality logic-based schema matching. Annotation paths capture significantly more semantics than plain model references, the declarative annotation method of the W3C standard SAWSDL. While the concept of annotation paths was introduced in earlier works, we provide a new formalization of their structure and based thereon define their semantics and introduce matching methods to derive simple and complex value correspondences. Such correspondences can be used for the generation of executable schema mappings using state of the art mapping tools. We provide a comprehensive evaluation of our annotation method and the proposed matching algorithms using real-world schemas and reference ontologies and demonstrate the feasibility of generating executable mappings using a state of the art mapping system. Our evaluations show that our annotation-based matcher achieves outstanding matching quality (avg. f-measure between 0.98 and 1.0).
Journal Article•10.1016/J.DATAK.2017.02.001•
Constructing target-aware results for keyword search on knowledge graphs

[...]

Yi Shan1, Mingda Li2, Yi Chen2•
Electronic Arts1, New Jersey Institute of Technology2
1 Jul 2017
TL;DR: This paper uses information theory and develops a general probability model to infer search targets by analyzing return specifiers, modifiers, relatedness relationships, and query keywords' information gain, and proposes two important properties for a target-aware result: atomicity and intactness.
Abstract: Existing work on processing keyword searches on graph data focuses on efficiency of result generation. However, being oblivious to user search intention, a query result may contain multiple instances of the user's search target, and multiple query results may contain information for the same instance of the user's search target. With the misalignment between query results and search targets, a ranking function is unable to effectively rank the instances of search targets. In this paper we propose the concept of target-aware query results driven by inferred user search intention. We leverage information theory and develop a general probability model to infer search targets by analyzing return specifiers, modifiers, relatedness relationships, and query keywords' information gain. Then we propose two important properties for a target-aware result: atomicity and intactness. We develop techniques to efficiently generate target-aware results. Extensive experimental evaluation shows the effectiveness and efficiency of our approach.
Journal Article•10.1016/J.DATAK.2017.03.007•
Mining task post-conditions: Automating the acquisition of process semantics

[...]

Metta Santiputri1, Aditya Ghose1, Hoa Khanh Dam1•
Information Technology University1
1 May 2017
TL;DR: This paper presents a data-driven approach to mining and validating semantic annotations (and specifically context-independent semantic annotations) and presents an empirical evaluation, which suggests that the approach provides generally reliable results.
Abstract: Semantic annotation of business process models in business process designs has been addressed in a large and growing body of work, but these annotations can be difficult and expensive to acquire. This paper presents a data-driven approach to mining and validating these annotations (and specifically context-independent semantic annotations). We leverage event objects in process execution histories which describe both activity execution events (typically represented as process events) and state update events (represented as object state transition events). We present an empirical evaluation, which suggests that the approach provides generally reliable results.
Journal Article•10.1016/J.DATAK.2017.08.002•
Thematic ranking of object summaries for keyword search

[...]

Georgios John Fakas1, Yilun Cai, Zhi Cai2, Nikos Mamoulis3•
Uppsala University1, Beijing University of Technology2, University of Ioannina3
1 Oct 2017
TL;DR: This paper argues that the effective thematic ranking of OSs should combine gracefully IR-style properties, authoritative ranking and affinity, and proposes an algorithm that computes the join efficiently, taking advantage of appropriate count statistics and compare it with baseline approaches.
Abstract: An Object Summary (OS) is a tree structure of tuples that summarizes the context of a particular Data Subject (DS) tuple. The OS has been used as a model of keyword search in relational databases, where, given a set of keywords, the objective is to identify the DS tuples relevant to the keywords and their corresponding OSs. However, a query may return a large number of OSs, which raises the issue of effectively and efficiently ranking them in order to present only the most important ones to the user. In this paper, we propose a model that ranks OSs containing a set of identifying keywords (e.g., Chen) according to their relevance to a set of thematic keywords (e.g., Mining). We argue that the effective thematic ranking of OSs should gracefully combine IR-style properties, authoritative ranking and affinity. Our ranking problem is modeled and solved as a top-k group-by join; we propose an algorithm that computes the join efficiently, taking advantage of appropriate count statistics, and compare it with baseline approaches. An experimental evaluation on the DBLP and TPC-H databases verifies the effectiveness and efficiency of our proposal.
Journal Article•10.1016/J.DATAK.2017.03.003•
Planning runtime software adaptation through pragmatic goal model

[...]

Felipe Pontes Guimaraes1, Genaína Nunes Rodrigues2, Raian Ali3, Daniel Macedo Batista1•
University of São Paulo1, University of Brasília2, Bournemouth University3
1 May 2017
TL;DR: This paper argues the case for pragmatic requirements and extends the CGM with additional constructs to capture them and allow their analysis, and develops an automated analysis which aids the planning and scheduling of tasks execution to meet pragmatic goals.
Abstract: Adaptivity is a capability that enables a system to choose amongst various alternatives to satisfy or maintain the satisfaction of certain requirements. The criteria of requirements satisfaction could be pragmatic and context-dependent. Contextual Goal Models (CGM) capture the power of context on banning or allowing certain alternatives to reach requirements (goals) and also deciding the quality of those alternatives with regards to certain quality measures (softgoals). It is used to depict facets of the decision making strategy and rationale of an adaptive system at the preliminary level of requirements. In this paper we argue the case for pragmatic requirements and extend the CGM with additional constructs to capture them and allow their analysis. We also develop an automated analysis which aids the planning and scheduling of tasks execution to meet pragmatic goals. Moreover, we evaluate our modelling and analysis regarding correctness and performance. Such an evaluation showed the applicability of the approach and its usefulness in aiding sensible decisions. It has also shown its capability to do so in a time short enough to suit run-time adaptation decision making.
Journal Article•10.1016/J.DATAK.2017.06.002•
A time-dependent model with speed windows for share-a-ride problems: A case study for Tokyo transportation

[...]

Phan-Thuan Do1, Nguyen-Viet-Dung Nghiem1, Ngoc-Quang Nguyen1, Quang Dung Pham1•
Hanoi University of Science and Technology1
15 Jun 2017
TL;DR: A new fully time-dependent model of a public transportation system in the urban context that allows sharing a taxi between one passenger and parcels, with speed windows taken into consideration, is introduced and presented as a mathematical formulation.
Abstract: This paper introduces a new fully time-dependent model of a public transportation system in the urban context that allows sharing a taxi between one passenger and parcels, with speed windows taken into consideration. The model contains many real-life case features and is presented as a mathematical formulation. We study both static and dynamic scenarios in comparison to traditional strategies, i.e., the direct delivery model. Moreover, we classify speed windows by different zones and congestion levels during a day in the urban context. Different speed windows induce the dynamic graph model for road networks and make the problem much more difficult to solve. Because of the complex model, the preprocessing steps on data as well as on dynamic graphs are very important. We use a greedy algorithm to initiate the solution and then use some local search techniques to improve the solution quality. The experimental data set was recorded by the Tokyo-Musen Taxi company. The data set includes more than 20000 requests per day, more than 4500 used taxis per day and more than 130000 crossing points on the Tokyo map. Experimental results are analyzed on various factors such as the total benefit, the accumulated traveling time during the day, the number of used taxis and the number of shared requests.
Journal Article•10.1016/J.DATAK.2017.10.001•
Location disclosure risks of releasing trajectory distances

[...]

Emre Kaplan1, Mehmet Emre Gursoy2, Mehmet Ercan Nergiz, Yucel Saygin1•
Sabancı University1, Georgia Institute of Technology2
16 Oct 2017
TL;DR: This work devises an attack that yields the locations which the private trajectory has visited, with high confidence, given a set of known trajectories and their distances to a private, unknown trajectory.
Abstract: Location tracking devices enable trajectories to be collected for new services and applications such as vehicle tracking and fleet management. While trajectory data is a lucrative source for data analytics, it also contains sensitive and commercially critical information. This has led to the development of systems that enable privacy-preserving computation over trajectory databases, but many of such systems in fact (directly or indirectly) allow an adversary to compute the distance (or similarity) between two trajectories. We show that the use of such systems raises privacy concerns when the adversary has a set of known trajectories. Specifically, given a set of known trajectories and their distances to a private, unknown trajectory, we devise an attack that yields the locations which the private trajectory has visited, with high confidence. The attack can be used to disclose both positive results (i.e., the victim has visited a certain location) and negative results (i.e., the victim has not visited a certain location). Experiments on real and synthetic datasets demonstrate the accuracy of our attack.
