Journal Article
DOI: 10.1007/S11042-015-2683-5
An optimized data integration model based on reverse cleaning for heterogeneous multi-media data
Hao Chen, Yueqi Ouyang, Wen Jiang
TL;DR: The quality of the integrated data is significantly higher than the quality of the original data. A data accuracy assessment algorithm, based on a Bayesian network and the path condition algorithm, is designed for data quality assessment.
Abstract: With the continuous development of information technology, new kinds of multi-media data keep emerging, and these data are autonomous and heterogeneous; integrating and analyzing them correctly and efficiently has become a challenging problem. First, to improve the quality of the integrated data, two real-time threads combined with a data adapter monitor the heterogeneous sources and efficiently refresh the necessary updates. Once the original data have been updated, the real-time data are loaded into the data center promptly. Second, a data reverse cleaning method is proposed to improve data quality. After reverse cleaning, it uses the data source tree built during the data integration process to locate the original data quickly. Finally, a data accuracy assessment algorithm based on a Bayesian network and the path condition algorithm is designed for data quality assessment. Experimental results show that the quality of the integrated data is significantly higher than that of the original data.
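The abstract states that reverse cleaning relies on a data source tree, built during integration, to locate the original data quickly, but it does not describe the tree's structure. The sketch below is a minimal illustration under assumed details: a three-level tree (source, table, record key) with parent pointers, integrated records stored alongside their leaf node, and a caller-supplied cleaning rule whose corrections are written back to the originating source. All names (`SourceNode`, `integrate`, `reverse_clean`) and the example data are hypothetical; this is not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SourceNode:
    """Node of a hypothetical data source tree: root -> source -> table -> record key."""
    name: str
    parent: Optional[SourceNode] = None
    children: dict[str, SourceNode] = field(default_factory=dict)

    def child(self, name: str) -> SourceNode:
        # Create the child on first use, so the tree is built during integration.
        if name not in self.children:
            self.children[name] = SourceNode(name, parent=self)
        return self.children[name]

    def path(self) -> tuple[str, ...]:
        # Walk up to the root to recover where the record originally came from.
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return tuple(reversed(parts))


def integrate(sources: dict) -> tuple[SourceNode, list]:
    """Copy every source record into one integrated list, remembering its leaf node."""
    root = SourceNode("root")
    integrated = []  # (record copy, leaf node) pairs
    for src_name, tables in sources.items():
        for table_name, rows in tables.items():
            for key, row in rows.items():
                leaf = root.child(src_name).child(table_name).child(key)
                integrated.append((dict(row), leaf))
    return root, integrated


def reverse_clean(integrated: list, sources: dict,
                  rule: Callable[[dict], dict]) -> list[tuple[str, ...]]:
    """Clean integrated records and write each correction back to its original location.

    `rule` must return a new dict rather than mutate its argument, so that
    changes can be detected by comparing the result with the input.
    """
    fixed = []
    for record, leaf in integrated:
        cleaned = rule(record)
        if cleaned != record:
            src, table, key = leaf.path()
            sources[src][table][key] = dict(cleaned)  # reverse step: fix the origin
            record.clear()
            record.update(cleaned)  # keep the integrated copy consistent
            fixed.append((src, table, key))
    return fixed


sources = {
    "crm": {"customers": {"c1": {"email": " Alice@Example.com "}}},
    "shop": {"users": {"u7": {"email": "bob@example.com"}}},
}
root, integrated = integrate(sources)
fixed = reverse_clean(integrated, sources,
                      lambda r: {**r, "email": r["email"].strip().lower()})
print(fixed)                              # [('crm', 'customers', 'c1')]
print(sources["crm"]["customers"]["c1"])  # {'email': 'alice@example.com'}
```

Keeping a parent pointer on every node means the original location of any integrated record is recovered by walking up the tree, in time proportional to the tree depth, rather than by searching the sources again.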
Citations
A Review of Data Cleaning Methods for Web Information System
Abstract: Web information systems (WIS) are widely used and indispensable in daily social life. WIS provide information services in many scenarios, such as electronic commerce, communities, and edutainment. Data cleaning plays an essential role in various WIS scenarios to improve the quality of data service. In this paper, we present a review of the state-of-the-art methods for data cleaning in WIS. According to the characteristics of data cleaning, we extract the critical elements of WIS, such as interactive objects, application scenarios, and core technology, to classify the existing works. Then, after elaborating on and analyzing each category, we summarize the descriptions and challenges of data cleaning methods with sub-elements such as data & user interaction, data quality rules, models, crowdsourcing, and privacy preservation. Finally, we analyze various types of problems and provide suggestions for future research on data cleaning in WIS from technological and interactive perspectives.
Resource allocation and interference management for multi-layer wireless networks in heterogeneous cognitive networks
TL;DR: This paper discusses resource allocation and interference management in heterogeneous cognitive networks and proposes a cluster-based cooperative interference management scheme, which minimizes the cross-layer interference from multiple base station cells to primary users and avoids same-layer interference between base station cells through joint channel and power allocation.
Improved particle swarm optimization LSSVM spatial location trajectory data prediction model in health care monitoring system
Guobin Chen, Zhang Li
TL;DR: The simulation experiment verifies that the CPSO-LSSVM positional space prediction model has higher prediction accuracy and stronger generalization ability, and can predict spatial location accurately and effectively.
Collective Entity Disambiguation Based on Hierarchical Semantic Similarity
TL;DR: A hierarchical semantic similarity model is proposed to find important clues related to mentions and entities based on multiple sources of information, such as contexts of the mentions, entity descriptions and categories, which can effectively measure the semantic matching between mentions and target entities.
Fast and Efficient Conflict Identification and Resolution in Huge Streaming Data
S. Charles Britto, S. P. Victor
TL;DR: This paper presents a fast and efficient mechanism for identifying and resolving conflicts in huge streaming data using Spark, with a wrapper-based query formulation module that constructs queries according to the underlying data sources.
References
Content-based multimedia information retrieval: State of the art and challenges
TL;DR: This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques.
Inter-media hashing for large-scale retrieval from heterogeneous data sources
Jingkuan Song, Yang Yang, Yi Yang, Zi Huang, Heng Tao Shen
22 Jun 2013
TL;DR: A novel inter-media hashing (IMH) model is proposed to explore the correlations among multiple media types from different data sources and to tackle the scalability issue. It transforms multimedia data from heterogeneous data sources into a common Hamming space, in which fast search can be easily implemented by XOR and bit-count operations.
Patent
Method and apparatus for scalable, high bandwidth storage retrieval and transportation of multimedia data on a network
Andrew Laursen, Jeffrey C. Olkin, Mark A. Porter, Farzad Nazem, William Bailey, Mark Moore
12 Mar 1997
TL;DR: An improved system and method for providing multimedia data in a networked system is described. It allows applications to be split so that client devices (set-top boxes, personal digital assistants, etc.) can focus on presentation, while back-end services running in a distributed server complex provide access to data via messaging across an abstracted interface.
Big data integration
Xin Luna Dong, Divesh Srivastava
08 Apr 2013
TL;DR: This seminar explores the progress that has been made by the data integration community on the topics of schema mapping, record linkage and data fusion in addressing these novel challenges faced by big data integration, and identifies a range of open problems for the community.
Big data integration
Divesh Srivastava
19 Dec 2013
TL;DR: This seminar explores the progress that has been made by the data integration community on the topics of schema mapping, record linkage and data fusion in addressing these novel challenges faced by big data integration, and identifies a range of open problems for the community.
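The inter-media hashing entry above notes that, once multimedia data from heterogeneous sources are mapped into a common Hamming space, search reduces to XOR and bit-count operations. The following minimal Python sketch shows only that final search step. It assumes the binary codes have already been produced by some hashing model (learning the IMH hash functions is not shown), and the item names and 8-bit codes are hypothetical.

```python
from __future__ import annotations


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as non-negative integers."""
    # XOR leaves a 1 wherever the codes differ; counting those bits gives the distance.
    # (int.bit_count() does the same on Python 3.10+.)
    return bin(a ^ b).count("1")


def search(query: int, codes: dict[str, int], k: int) -> list[tuple[str, int]]:
    """Return the k items whose codes are closest to the query, nearest first."""
    scored = [(name, hamming(query, code)) for name, code in codes.items()]
    scored.sort(key=lambda item: (item[1], item[0]))  # by distance, then name for ties
    return scored[:k]


codes = {
    "image_1": 0b10110010,
    "text_4": 0b10110011,
    "video_2": 0b01001101,
}
print(search(0b10110110, codes, k=2))  # [('image_1', 1), ('text_4', 2)]
```

Because each comparison is a single XOR followed by a bit count, ranking a query against many stored codes stays cheap regardless of which media type each code came from.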