Journal Article, DOI: 10.14778/1454159.1454245
XML-document-filtering automaton
Panu Silvasti, Seppo Sippu, Eljas Soisalon-Soininen
- 01 Aug 2008
- Vol. 1, Iss: 2, pp 1666-1671
TL;DR: This work presents a new XML-document-filtering algorithm that is based on the classic Aho-Corasick pattern-matching automaton, and utilizes the DTD in the preprocessing phase of the automaton to prune out descendant axes and wildcards from the XPath filters.
Abstract: In a publish-subscribe system based on filtering of XML documents, subscribers specify their interests with profiles expressed in the XPath language. The system processes a stream of XML documents and delivers to subscribers a notification or content of documents that match the profiles. We present a new XML-document-filtering algorithm that is based on the classic Aho-Corasick pattern-matching automaton. The automaton has a size linear in the sum of the sizes of the filters. We assume that the XML documents all conform to a given DTD; our algorithm utilizes the DTD in the preprocessing phase of the automaton to prune out descendant axes (//) and wildcards (*) from the XPath filters. The XPath subset currently supported consists of linear XPath expressions without predicates. In the case of a 683 MB protein-sequence database, we obtained a throughput of 18.8 MB/sec for 50 000 filters and 17.0 MB/sec for 500 000 filters, using a SAX parser with a throughput of 27 MB/sec.
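As a rough illustration of the idea in the abstract, the following Python sketch builds an Aho-Corasick automaton whose input symbols are XML element names rather than characters, and drives it from SAX events. It is not the authors' implementation. It assumes every filter has the simplified shape `//a/b/c` (one leading descendant axis followed by child steps only, as might remain after DTD-based pruning), and the names `PathAutomaton`, `FilterHandler`, and `filter_document` are hypothetical. The DTD preprocessing step itself is not shown.

```python
import xml.sax
from collections import deque


class PathAutomaton:
    """Aho-Corasick automaton whose 'characters' are XML element names."""

    def __init__(self, filters):
        self.goto = [{}]     # goto[state][element_name] -> next state
        self.fail = [0]      # failure link of each state
        self.out = [set()]   # filters recognized on entering each state
        for xpath in filters:
            self._add(xpath)
        self._build_failure_links()

    def _add(self, xpath):
        # Assumption: the filter looks like "//a/b/c" (a single keyword).
        state = 0
        for name in xpath.lstrip("/").split("/"):
            if name not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append(set())
                self.goto[state][name] = len(self.goto) - 1
            state = self.goto[state][name]
        self.out[state].add(xpath)

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 states keep failure link 0.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for name, t in self.goto[s].items():
                f = self.fail[s]
                while f and name not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(name, 0)
                self.out[t] |= self.out[self.fail[t]]
                queue.append(t)

    def step(self, state, name):
        # Follow failure links until a goto transition on `name` exists.
        while state and name not in self.goto[state]:
            state = self.fail[state]
        return self.goto[state].get(name, 0)


class FilterHandler(xml.sax.ContentHandler):
    """Keeps one automaton state per open element (root-to-node path)."""

    def __init__(self, automaton):
        super().__init__()
        self.pma = automaton
        self.stack = [0]
        self.matched = set()

    def startElement(self, name, attrs):
        state = self.pma.step(self.stack[-1], name)
        self.stack.append(state)
        self.matched |= self.pma.out[state]

    def endElement(self, name):
        # Leaving an element restores the state of its parent.
        self.stack.pop()


def filter_document(xml_text, filters):
    handler = FilterHandler(PathAutomaton(filters))
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.matched
```

For example, calling `filter_document("<db><entry><protein><name>x</name></protein></entry></db>", ["//protein/name", "//entry/name"])` reports only `//protein/name`, because `name` is not a child of `entry` in that document. The state stack is what adapts a string automaton to a tree: each start tag advances from the parent's state, and each end tag pops back to it. The trie has at most one state per filter step, which is in line with the linear size bound mentioned in the abstract.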
Citations
Schema-conscious filtering of XML documents
Panu Silvasti, Seppo Sippu, Eljas Soisalon-Soininen
- 24 Mar 2009
TL;DR: An algorithm is presented that utilizes the DTD in the preprocessing phase of the filtering automaton to prune out descendant operators and wildcards from the linear XPath filters to increase the throughput of the nondeterministic YFilter automaton.
A Time/Space Efficient XML Filtering System for Mobile Environment
Yu Xiaochuan, Chan Toong Shoon Alvin
- 06 Jun 2011
TL;DR: This work developed a highly compact structure representation using a Bloom filter, which also provides an easy way to separate the parsing process from the matching process so as to relieve the matching process of the burden of parsing.
Hardware/Software Co-design for XML-Document Processing
Anatolij Sergiyenko, Maria Orlova, Oleksii Molchanov
- 21 Jan 2020
TL;DR: The SM16 processor core architecture is developed, which effectively evaluates stack-based parsing algorithms and is implemented in a field-programmable gate array (FPGA). It has a stack architecture with three additional stack blocks, a hash table, and instructions that accelerate the execution of parsing operations.
Algorithms for XML filtering
Panu Silvasti
- 01 Jan 2011
TL;DR: This thesis develops several algorithms for XML filtering with linear XPath expressions based on a backtracking Aho-Corasick pattern-matching automaton (PMA) built from "keywords" extracted from the filters, where a keyword is a maximal substring consisting only of XML element names.
Hardware-Software XML-Documents Processing
A.M. Sergiyenko, M.M. Orlova, O.A. Molchanov
- 05 Feb 2020
References
Efficient string matching: an aid to bibliographic search
TL;DR: A simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text that has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.
Fast and scalable pattern matching for content filtering
Sarang Dharmapurikar, John W. Lockwood
- 26 Oct 2005
TL;DR: This work modifies the classic Aho-Corasick algorithm to consider multiple characters at a time for higher throughput, and can support matching of several thousand patterns at more than 10 Gbps with the help of less than 50 KBytes of embedded memory and a few megabytes of external SRAM.
Multipattern string matching with q-grams
TL;DR: Three algorithms for exact string matching of multiple patterns, which apply q-grams and bit parallelism, are presented; they appeared to be substantially faster than earlier solutions for sets of 1,000 to 10,000 patterns.
Proceedings Article
FiST: scalable XML document filtering by sequencing twig patterns
Joonho Kwon, Praveen Rao, Bongki Moon, Sukho Lee
- 30 Aug 2005
TL;DR: A novel scalable filtering system called FiST (Filtering by Sequencing Twigs) that transforms twig patterns expressed in XPath and XML documents into sequences using Prüfer's method, and organizes the sequences into a dynamic hash-based index for efficient filtering.
On Classification of Strings
Eljas Soisalon-Soininen, Tatu Ylonen
- 05 Oct 2004
TL;DR: This paper focuses on the question of how a large set of user profiles can be quickly searched in order to find those that are relevant to the document.