Streaming algorithm

About: Streaming algorithm is a research topic. Over its lifetime, 1,009 publications have been published within this topic, receiving 26,860 citations. The topic is also known as: streaming algorithms.

Papers

Proceedings Article • DOI: 10.1145/882082.882086

A symbolic representation of time series, with implications for streaming algorithms

Jessica Lin¹, Eamonn Keogh¹, Stefano Lonardi¹, Bill Chiu¹

University of California, Riverside¹

13 Jun 2003

TL;DR: A new symbolic representation of time series is introduced that is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series.

Abstract: The parallel explosions of interest in streaming data and in data mining of time series have had surprisingly little intersection. This is in spite of the fact that time series data are typically streaming data. The main reason for this apparent paradox is the fact that the vast majority of work on streaming data explicitly assumes that the data is discrete, whereas the vast majority of time series data is real-valued. Many researchers have also considered transforming real-valued time series into symbolic representations, noting that such representations would potentially allow researchers to avail of the wealth of data structures and algorithms from the text processing and bioinformatics communities, in addition to allowing formerly "batch-only" problems to be tackled by the streaming community. While many symbolic representations of time series have been introduced over the past decades, they all suffer from three fatal flaws. Firstly, the dimensionality of the symbolic representation is the same as the original data, and virtually all data mining algorithms scale poorly with dimensionality. Secondly, although distance measures can be defined on the symbolic approaches, these distance measures have little correlation with distance measures defined on the original time series. Finally, most of these symbolic approaches require one to have access to all the data before creating the symbolic representation. This last feature explicitly thwarts efforts to use the representations with streaming algorithms. In this work we introduce a new symbolic representation of time series. Our representation is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series.
As we shall demonstrate, this latter feature is particularly exciting because it allows one to run certain data mining algorithms on the efficiently manipulated symbolic representation, while producing identical results to the algorithms that operate on the original data. Finally, our representation allows the real-valued data to be converted in a streaming fashion, with only an infinitesimal time and space overhead. We will demonstrate the utility of our representation on the classic data mining tasks of clustering, classification, query by content and anomaly detection.

2,118 citations
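The abstract describes the representation only at a high level. The representation introduced in this paper is widely known as SAX. As a minimal sketch, assuming its commonly cited form (z-normalize the series, average it over w equal-length segments with Piecewise Aggregate Approximation, then map each segment mean to one of a symbols using breakpoints that cut N(0, 1) into equiprobable regions), the conversion and a lower-bounding symbolic distance could look like the following. The function names (`znorm`, `paa`, `sax_word`, `mindist`) and the requirement that the series length be a multiple of w are illustrative assumptions, not the authors' code.

```python
import bisect
import math
from statistics import NormalDist, fmean, pstdev


def znorm(series):
    """Z-normalize: zero mean, unit (population) standard deviation."""
    mu, sigma = fmean(series), pstdev(series)
    if sigma < 1e-12:  # flat series: map to all zeros
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, w):
    """Piecewise Aggregate Approximation: means of w equal-length segments."""
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of w")
    seg = n // w
    return [fmean(series[i * seg:(i + 1) * seg]) for i in range(w)]


def breakpoints(a):
    """The a-1 cut points that split N(0, 1) into a equiprobable regions."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(k / a) for k in range(1, a)]


def sax_word(series, w, a):
    """Convert a real-valued series into w symbols drawn from range(a)."""
    beta = breakpoints(a)
    # Symbol index = number of breakpoints <= the segment mean.
    return [bisect.bisect_right(beta, v) for v in paa(znorm(series), w)]


def mindist(word_q, word_c, n, a):
    """Symbolic distance that lower-bounds the Euclidean distance between
    the two z-normalized original series of length n."""
    beta = breakpoints(a)
    total = 0.0
    for r, c in zip(word_q, word_c):
        if abs(r - c) > 1:  # identical or adjacent symbols contribute 0
            gap = beta[max(r, c) - 1] - beta[min(r, c)]
            total += gap * gap
    return math.sqrt(n / len(word_q)) * math.sqrt(total)


def euclidean(x, y):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(x, y)))


if __name__ == "__main__":
    q = [math.sin(i / 4) for i in range(64)]
    c = [math.cos(i / 3) for i in range(64)]
    wq, wc = sax_word(q, 8, 4), sax_word(c, 8, 4)
    print(wq, wc)
    print(mindist(wq, wc, 64, 4), "<=", euclidean(znorm(q), znorm(c)))
```

Each symbol pair contributes at most the gap between the two symbols' regions, so `mindist` can never exceed the Euclidean distance between the z-normalized series. That is the lower-bounding property the abstract highlights. Each segment mean depends only on its own window, which is what makes incremental conversion possible.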

Journal Article • DOI: 10.1016/S0304-3975(03)00400-6

Finding Frequent Items in Data Streams

Moses Charikar¹, Kevin Chen², Martin Farach-Colton³

Princeton University¹, University of California, Berkeley², Rutgers University³

8 Jul 2002

TL;DR: This work presents a 1-pass algorithm for estimating the most frequent items in a data stream using limited storage space, which achieves better space bounds than the previously known best algorithms for this problem for several natural distributions on the item frequencies.

Abstract: We present a 1-pass algorithm for estimating the most frequent items in a data stream using limited storage space. Our method relies on a data structure called a COUNT SKETCH, which allows us to reliably estimate the frequencies of frequent items in the stream. Our algorithm achieves better space bounds than the previously known best algorithms for this problem for several natural distributions on the item frequencies. In addition, our algorithm leads directly to a 2-pass algorithm for the problem of estimating the items with the largest (absolute) change in frequency between two data streams. To our knowledge, this latter problem has not been previously studied in the literature.

2,009 citations
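The abstract names the COUNT SKETCH but not its mechanics. Below is a minimal sketch of the usual formulation: depth rows of width signed counters. Each row hashes an item to one bucket and a ±1 sign, and the frequency estimate is the median of the signed bucket values across rows. A small candidate set tracks the top-k items in the same pass. BLAKE2b hashing is an assumption standing in for the pairwise-independent hash families used in the analysis, and the names `CountSketch` and `top_k` are hypothetical.

```python
import hashlib
from statistics import median


class CountSketch:
    """depth x width table of signed counters.

    Assumption: row hashes come from BLAKE2b with a per-row prefix, standing
    in for the pairwise-independent hash families used in the analysis.
    """

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, row, item):
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=16).digest()
        bucket = int.from_bytes(digest[:8], "big") % self.width
        sign = 1 if digest[8] & 1 else -1
        return bucket, sign

    def update(self, item, count=1):
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            self.table[row][bucket] += sign * count

    def estimate(self, item):
        # Under random hashing each row's signed counter is an unbiased
        # guess; taking the median damps the effect of heavy collisions.
        guesses = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            guesses.append(sign * self.table[row][bucket])
        return median(guesses)


def top_k(stream, k, sketch):
    """One pass: update the sketch, keep the k items with largest estimates."""
    candidates = {}  # item -> most recent estimate
    for item in stream:
        sketch.update(item)
        est = sketch.estimate(item)
        if item in candidates or len(candidates) < k:
            candidates[item] = est
        else:
            weakest = min(candidates, key=candidates.get)
            if est > candidates[weakest]:
                del candidates[weakest]
                candidates[item] = est
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    stream = ["a"] * 1000 + ["b"] * 500 + [f"x{i}" for i in range(200)]
    sketch = CountSketch(width=1024, depth=5)
    print(top_k(stream, 2, sketch))
    print(sketch.estimate("a"), sketch.estimate("b"))
```

Memory is width × depth counters plus k candidates, regardless of how many distinct items the stream contains. Because the sketch is linear in its updates, subtracting two sketches built with the same hashes estimates per-item frequency changes. This is one plausible route to the two-stream problem the abstract mentions, offered here as an assumption rather than the paper's stated construction.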

Book

Data Streams: Algorithms and Applications

S. Muthukrishnan¹

Rutgers University¹

1 Jan 2005

TL;DR: This book surveys the foundations of data stream processing, covering basic mathematical ideas, basic algorithmic techniques, and streaming systems.

Abstract (table of contents):
1 Introduction
2 Map
3 The Data Stream Phenomenon
4 Data Streaming: Formal Aspects
5 Foundations: Basic Mathematical Ideas
6 Foundations: Basic Algorithmic Techniques
7 Foundations: Summary
8 Streaming Systems
9 New Directions
10 Historic Notes
11 Concluding Remarks
Acknowledgements
References

1,569 citations

Journal Article • DOI: 10.1038/NMETH.2251

Streaming fragment assignment for real-time analysis of sequencing experiments

Adam Roberts¹, Lior Pachter¹

University of California, Berkeley¹

1 Jan 2013 • Nature Methods

TL;DR: eXpress is a software package for efficient probabilistic assignment of ambiguously mapping sequenced fragments that can determine abundances of sequenced molecules in real time and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data.

Abstract: We present eXpress, a software package for efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data and show that eXpress achieves greater efficiency than other quantification methods.

...read moreread less

1,084 citations
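The abstract does not spell out the streaming update. The sketch below is a swapped-in, simplified illustration, not eXpress itself. It shows an online EM loop for a mixture model. A fragment compatible with several transcripts is split in proportion to each transcript's current abundance divided by its effective length, and the abundances are then nudged toward that split with a decaying step size. It reads each fragment once (linear time) and stores one number per transcript, so memory does not grow with the number of fragments. The names `online_em`, `lengths`, and `kappa` are hypothetical, and real fragment likelihoods (fragment length, position, sequencing errors) are deliberately omitted.

```python
import random


def online_em(fragments, lengths, kappa=0.85):
    """Single pass over fragments, updating abundances after each one.

    fragments: iterable of tuples of transcript ids the fragment is compatible with
    lengths:   dict mapping transcript id -> effective length
    Returns theta: dict transcript id -> estimated fraction of fragments.
    Memory is one float per transcript, independent of the number of fragments.
    """
    ids = list(lengths)
    theta = {t: 1.0 / len(ids) for t in ids}  # start from uniform abundances
    for i, compatible in enumerate(fragments, start=1):
        # E-step for this fragment: split it in proportion to theta / length.
        weights = {t: theta[t] / lengths[t] for t in compatible}
        total = sum(weights.values())
        if total == 0.0:  # guard: every compatible abundance is zero
            resp = {t: 1.0 / len(compatible) for t in compatible}
        else:
            resp = {t: w / total for t, w in weights.items()}
        # Stochastic-approximation M-step; eta decays so later fragments
        # refine rather than overwrite the estimate (0.5 < kappa <= 1).
        eta = (i + 1) ** -kappa
        for t in ids:
            theta[t] = (1.0 - eta) * theta[t] + eta * resp.get(t, 0.0)
    return theta


if __name__ == "__main__":
    # Toy data: 300 fragments unique to A, 100 unique to B, 400 shared.
    frags = [("A",)] * 300 + [("B",)] * 100 + [("A", "B")] * 400
    random.Random(0).shuffle(frags)
    print(online_em(frags, {"A": 1000.0, "B": 1000.0}))
```

With equal lengths, the batch EM fixed point for this toy data solves θ_A = (300 + 400·θ_A) / 800, giving θ_A = 0.75. The single-pass estimate should land close to that value without ever storing the fragments.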

Proceedings Article • DOI: 10.1109/ICDE.2002.994785

Streaming-data algorithms for high-quality clustering

Liadan O'Callaghan¹, Nina Mishra², Adam Meyerson¹, Sudipto Guha³, Rajeev Motwani¹

Stanford University¹, Hewlett-Packard², University of Pennsylvania³

7 Aug 2002

TL;DR: This work describes a streaming algorithm that effectively clusters large data streams and provides empirical evidence of the algorithm's performance on synthetic and real data streams.

Abstract: Streaming data analysis has recently attracted attention in numerous applications including telephone records, Web documents and click streams. For such analysis, single-pass algorithms that consume a small amount of memory are critical. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams.

760 citations
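The abstract gives no algorithmic detail. Below is a minimal sketch of the divide-and-conquer pattern commonly used for single-pass clustering with small memory. It reads the stream in chunks and reduces each chunk to k weighted centers. It re-clusters the retained centers whenever they exceed a bound, then clusters the retained centers once more at the end. Weighted k-means with farthest-first initialisation is a swapped-in subroutine chosen for brevity. The paper targets the k-median objective, which this sketch does not optimise. The names `weighted_kmeans` and `stream_cluster` and the `chunk_size` parameter are hypothetical.

```python
import math
import random


def _dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def weighted_kmeans(points, weights, k, iters=20):
    """Reduce weighted 2-D points to at most k weighted centers.

    Farthest-first initialisation followed by Lloyd iterations; a stand-in
    subroutine, not the paper's k-median method.
    """
    if len(points) <= k:
        return list(points), list(weights)
    # Start from the heaviest point, then repeatedly add the farthest point.
    centers = [points[max(range(len(points)), key=lambda i: weights[i])]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))
    for _ in range(iters):
        sums = [[0.0, 0.0, 0.0] for _ in centers]  # weighted x, weighted y, weight
        for p, w in zip(points, weights):
            j = min(range(len(centers)), key=lambda idx: _dist2(p, centers[idx]))
            sums[j][0] += w * p[0]
            sums[j][1] += w * p[1]
            sums[j][2] += w
        centers = [(sx / sw, sy / sw) if sw > 0 else c
                   for (sx, sy, sw), c in zip(sums, centers)]
    # Each center carries the weight assigned to it in the last pass;
    # empty centers are dropped.
    kept = [(c, s[2]) for c, s in zip(centers, sums) if s[2] > 0]
    return [c for c, _ in kept], [w for _, w in kept]


def stream_cluster(stream, k, chunk_size=100):
    """Single pass: hold at most chunk_size raw points plus a bounded list
    of weighted intermediate centers, then cluster those centers."""
    buffer, kept, kept_w = [], [], []
    for p in stream:
        buffer.append(p)
        if len(buffer) == chunk_size:
            centers, weights = weighted_kmeans(buffer, [1.0] * len(buffer), k)
            kept += centers
            kept_w += weights
            buffer = []
            if len(kept) > chunk_size:  # re-cluster to keep memory bounded
                kept, kept_w = weighted_kmeans(kept, kept_w, k)
    if buffer:  # leftover partial chunk
        centers, weights = weighted_kmeans(buffer, [1.0] * len(buffer), k)
        kept += centers
        kept_w += weights
    return weighted_kmeans(kept, kept_w, k)


if __name__ == "__main__":
    rng = random.Random(42)
    truth = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
              for cx, cy in truth for _ in range(300)]
    rng.shuffle(points)
    centers, weights = stream_cluster(points, k=3, chunk_size=100)
    for c, w in zip(centers, weights):
        print(f"center=({c[0]:.2f}, {c[1]:.2f}) weight={w:.0f}")
```

Carrying a weight with each intermediate center lets the final pass treat a center that summarises 40 points as 40 points. This is why the second-level clustering can recover the overall cluster means without revisiting the raw stream.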

...

Performance Metrics

Papers: 1,145
Citations: 6,811

No. of papers in the topic in previous years
Year	Papers
2025	9
2024	9
2023	48
2022	63
2021	88
2020	118
