Open Access Dissertation
Efficient Algorithms for Mining Data Streams
Arnold P. Boedihardjo, Chang-Tien Lu, Weiguo Fan, Yao Liang, Naren Ramakrishnan +4 more
- 10 Aug 2010
6
TL;DR: In this work, kernel density estimators (KDEs) are developed that satisfy the stringent computational stipulations of data streams, model unknown and dynamic distributions, and enhance the estimation quality of complex structures.
Abstract: Data streams are ordered sets of values that are fast, continuous, mutable, and potentially unbounded. Examples of data streams include the pervasive time series that span domains such as finance, medicine, and transportation. Mining data streams requires approaches that are efficient, adaptive, and scalable. For several stream mining tasks, knowledge of the data’s probability density function (PDF) is essential to deriving usable results. Providing an accurate model for the PDF benefits a variety of stream mining applications, and its successful development can have a far-reaching impact on the general discipline of stream analysis. Therefore, this research focuses on the construction of efficient and effective approaches for estimating the PDF of data streams. In this work, kernel density estimators (KDEs) are developed that satisfy the stringent computational stipulations of data streams, model unknown and dynamic distributions, and enhance the estimation quality of complex structures. Contributions of this work include: (1) theoretical development of the local region based KDE; (2) construction of a local region based estimation algorithm; (3) design of a generalized local region approach that can be applied to any global bandwidth KDE to enhance estimation accuracy; and (4) application extension of the local region based KDE to multi-scale outlier detection. Theoretical development includes the formulation of the local region concept to effectively approximate the computationally intensive adaptive KDE. This work also analyzes key theoretical properties of the local region based approach, which include (amongst others) its expected performance, an alternative local region construction criterion, and its robustness under evolving distributions. Algorithmic design includes the development of a specific estimation technique that reduces the time/space complexities of the adaptive KDE.
To accelerate mining tasks such as outlier detection, an integrated set of optimizations is proposed for estimating multiple density queries. Additionally, the local region concept is extended to an efficient algorithmic framework that can be applied to any global bandwidth KDE. The combined solution can significantly improve estimation accuracy while retaining overall linear time/space costs. As an application extension, an outlier detection framework is designed that can effectively detect outliers within multiple data scale representations.
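For readers unfamiliar with the terms used in the abstract, the following is a minimal Python sketch of a global bandwidth KDE, a textbook sample-point adaptive KDE, and a density-threshold outlier check over a small window of values. It is not the dissertation's local region algorithm, which is not described on this page. The Gaussian kernel, the normal-reference bandwidth rule, the sensitivity parameter `alpha`, the threshold value, and all function names are assumptions chosen for illustration.

```python
import math
import statistics


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def rule_of_thumb_bandwidth(samples):
    """Normal-reference rule h = 1.06 * sigma * n^(-1/5) (assumed, not from the source)."""
    n = len(samples)
    return 1.06 * statistics.stdev(samples) * n ** (-0.2)


def global_kde(x, samples, h):
    """Global bandwidth KDE: f(x) = 1/(n*h) * sum_i K((x - x_i) / h)."""
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)


def local_bandwidths(samples, h, alpha=0.5):
    """Per-sample bandwidths h_i = h * (pilot(x_i) / g)^(-alpha).

    The pilot is the global KDE and g is the geometric mean of the pilot
    values. Computing the pilot at every sample costs O(n^2), which is the
    kind of expense the abstract calls computationally intensive.
    """
    pilot = [global_kde(xi, samples, h) for xi in samples]
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    return [h * (p / g) ** (-alpha) for p in pilot]


def adaptive_kde(x, samples, bandwidths):
    """Sample-point adaptive KDE: f(x) = 1/n * sum_i K((x - x_i) / h_i) / h_i."""
    n = len(samples)
    total = sum(
        gaussian_kernel((x - xi) / hi) / hi
        for xi, hi in zip(samples, bandwidths)
    )
    return total / n


def low_density_points(samples, density, threshold):
    """Flag samples whose estimated density falls below a threshold."""
    return [x for x in samples if density(x) < threshold]


# Hypothetical window of stream values: a tight cluster plus one distant point.
window = [0.1, 0.2, 0.15, 0.3, 0.25, 5.0]
h = rule_of_thumb_bandwidth(window)
bandwidths = local_bandwidths(window, h)
flagged = low_density_points(
    window, lambda x: adaptive_kde(x, window, bandwidths), threshold=0.1
)
```

In this sketch the adaptive estimator narrows the kernels in dense regions and widens them in sparse ones, so the isolated value receives a much lower density than the clustered values. A streaming implementation would also need to bound memory and update the estimate incrementally, which this sketch does not attempt.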
Citations
Multivariate Density Estimation
Jeffrey S. Simonoff
- 01 Jan 1996
TL;DR: Exploring and identifying structure is even more important for multivariate data than univariate data, given the difficulties in graphically presenting multivariate data and the comparative lack of parametric models to represent it.
1.1K
•Proceedings Article
Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Arnd Christian König, Stefan Dessloch, Patrick Valduriez, Gerhard Weikum +3 more
- 13 Jun 2004
TL;DR: The 2004 ACM SIGMOD International Conference on Management of Data is held in Paris, the first SIGMOD ever held outside of North America, and it has chosen a place that is rich in tradition but also rich in new departures, one of the focal points of the age of enlightenment and the place of the French revolution in 1789.
75
Compartmentalized adaptive topic mining on social media streams
Gopi Chand Nutakki, Olfa Nasraoui +1 more
- 01 Dec 2016
TL;DR: The proposed evolving topic mining framework can handle the noise, size, dynamic nature, and diversity of the data stream by partitioning the data in both the content and time spaces, hence making the challenging topic modeling task easier and faster and making the cluster models resulting from Stream-Dashboard compatible with topic modeling, thus more appropriate for text data.
3
Supervision of organizing data streaming in cloud environment
Wissam Ali Hussein Salman
- 23 Aug 2019
TL;DR: In this paper, the authors discussed the data supervision can be preserved disparity and that approximately data depict the procedures of the perilous system, and illustrated this by exploiting the conception of immediate data, as the system can reduce the reaction time for the most immediate queries while consuming or wasting less bandwidth.
1
•Journal Article
Multimedia data mining using P-trees
TL;DR: The DataSURG group at NDSU has a long-standing interest in data mining remotely sensed imagery (RSI) for agricultural, forestry and other prediction and analysis applications and a spatial data structure, the Peano count tree, was developed that provided an efficient, lossless, data mining ready representation of the many types of data involved in these applications.
References
•Book
The Elements of Statistical Learning
Trevor Hastie, Robert Tibshirani, Jerome H. Friedman +2 more
- 01 Jan 2001
29.4K
•Book
A wavelet tour of signal processing
Stéphane Mallat
- 01 Jan 1998
TL;DR: An introduction to a Transient World and an Approximation Tour of Wavelet Packet and Local Cosine Bases.
20.3K
Density estimation for statistics and data analysis
Bernard W. Silverman
- 01 Jan 1986
TL;DR: The Kernel Method for Multivariate Data: Three Important Methods and Density Estimation in Action.
Density Estimation for Statistics and Data Analysis
TL;DR: Density estimation, as discussed in this book, is the construction of an estimate of the density function from the observed data from an unknown probability density function.
14.7K