Proceedings Article • DOI: 10.1145/564691.564733
Statistical synopses for graph-structured XML databases
Neoklis Polyzotis,Minos Garofalakis +1 more
- 03 Jun 2002
 - pp 358-369
TL;DR: A novel approach to building and using statistical summaries of large XML data graphs for effective path-expression selectivity estimation, and the first work to address this timely problem in the most general setting of graph-structured data and complex (branching) path expressions.
Abstract: Effective support for XML query languages is becoming increasingly important with the emergence of new applications that access large volumes of XML data. All existing proposals for querying XML (e.g., XQuery) rely on a pattern-specification language that allows path navigation and branching through the XML data graph in order to reach the desired data elements. Optimizing such queries depends crucially on the existence of concise synopsis structures that enable accurate compile-time selectivity estimates for complex path expressions over graph-structured XML data. In this paper, we introduce a novel approach to building and using statistical summaries of large XML data graphs for effective path-expression selectivity estimation. Our proposed graph-synopsis model (termed XSKETCH) exploits localized graph stability to accurately approximate (in limited space) the path and branching distribution in the data graph. To estimate the selectivities of complex path expressions over concise XSKETCH synopses, we develop an estimation framework that relies on appropriate statistical (uniformity and independence) assumptions to compensate for the lack of detailed distribution information. Given our estimation framework, we demonstrate that the problem of building an accuracy-optimal XSKETCH for a given amount of space is NP-hard, and propose an efficient heuristic algorithm based on greedy forward selection. Briefly, our algorithm constructs an XSKETCH synopsis by successive refinements of the label-split graph, the coarsest summary of the XML data graph. Our refinement operations act locally and attempt to capture important statistical correlations between data paths. Extensive experimental results with synthetic as well as real-life data sets verify the effectiveness of our approach. To the best of our knowledge, ours is the first work to address this timely problem in the most general setting of graph-structured data and complex (branching) path expressions.
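As a rough illustration of the starting point named in the abstract, the label-split graph can be read as the summary that groups all data elements sharing a label into one synopsis node. The sketch below is not the authors' implementation: the function name `label_split_graph`, the per-node and per-edge counts, and the toy data graph are all assumptions made for illustration, and none of the XSKETCH refinement or estimation steps are shown.

```python
from collections import Counter


def label_split_graph(labels, edges):
    """Group data-graph nodes by label (illustrative sketch only).

    labels: dict mapping node id -> element label
    edges:  iterable of (parent_id, child_id) pairs in the data graph
    Returns (extent_size, edge_count):
      extent_size[label]            = number of elements with that label
      edge_count[(plabel, clabel)]  = number of data edges between the groups
    """
    extent_size = Counter(labels.values())
    edge_count = Counter((labels[p], labels[c]) for p, c in edges)
    return dict(extent_size), dict(edge_count)


# Hypothetical toy data graph; paper 4 has two parents, so this is a graph, not a tree.
labels = {1: "bib", 2: "author", 3: "author", 4: "paper",
          5: "paper", 6: "title", 7: "title"}
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 6), (5, 7)]

extent_size, edge_count = label_split_graph(labels, edges)
print(extent_size)
print(edge_count)
```

A summary this coarse keeps one node per label regardless of how the elements are connected, which is why, per the abstract, the construction algorithm refines it locally to capture correlations between data paths.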
Citations
The history of histograms (abridged)
Yannis Ioannidis
- 09 Sep 2003
TL;DR: This paper compresses the entire history of histograms (including their "future history" as currently anticipated) into a fixed space budget, mostly recording details for the periods, events, and results of highest (personally biased) interest.
• Journal Article
The DBLP Computer Science bibliography: Evolution, research issues, perspectives
TL;DR: The DBLP Computer Science Bibliography of the University of Trier is a large collection of bibliographic information that thousands of computer scientists use for scientific communication.
447
The DBLP Computer Science Bibliography: Evolution, Research Issues, Perspectives
Michael Ley
- 11 Sep 2002
TL;DR: The most time-consuming task for the maintainers of DBLP may be viewed as a special instance of the authority control problem: how to normalize different spellings of person names.
436
D(k)-index: an adaptive structural summary for graph-structured data
Qun Chen,Andrew Lim,Kian Win Ong +2 more
- 09 Jun 2003
TL;DR: The D(k) index is introduced, an adaptive structural summary for general graph structured documents based on the concept of bisimilarity, and is shown to be a more effective structural summary than previous static ones, as a result of its query load sensitivity.
272
• Proceedings Article
Matching Structure and Semantics: A Survey on Graph-Based Pattern Matching.
Brian Gallagher
- 01 Jan 2006
TL;DR: A survey of existing work on graph matching is presented, describing variations among problems, general and specific solution approaches, evaluation techniques, and directions for further research.
227
References
An introduction to probability theory and its applications - 3/E. volume 3
William Feller
- 22 Mar 2002
Abstract: The classic text for understanding complex statistical probability. An Introduction to Probability Theory and Its Applications offers comprehensive explanations of complex statistical problems. Delving deep into densities and distributions while relating critical formulas, processes and approaches, this rigorous text provides a solid grounding in probability with practice problems throughout. Heavy on application without sacrificing theory, the discussion takes the time to explain difficult topics and how to use them. This new second edition includes new material related to the substitution of probabilistic arguments for combinatorial artifices as well as new sections on branching processes, Markov chains, and the DeMoivre-Laplace theorem.
21.5K
• Book
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
Judea Pearl
- 01 Jan 1988
TL;DR: Probabilistic Reasoning in Intelligent Systems is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty, and provides a coherent explication of probability as a language for reasoning with partial belief.
17.6K
Three partition refinement algorithms
Robert Paige,Robert E. Tarjan +1 more
TL;DR: This work presents improved partition refinement algorithms for three problems: lexicographic sorting, relational coarsest partition, and double lexical ordering. The double lexical ordering algorithm uses a new, efficient method for unmerging two sorted sets.
Related Papers (5)
Neoklis Polyzotis,Minos Garofalakis +1 more
- 20 Aug 2002
Neoklis Polyzotis,Minos Garofalakis,Yannis Ioannidis +2 more
- 13 Jun 2004
Quanzhong Li,Bongki Moon +1 more
- 11 Sep 2001