Efficient data mining for path traversal patterns
TL;DR: The authors explore mining path traversal patterns in a distributed information-providing environment, where documents or objects are linked together to facilitate interactive access, and show that the option of selective scan is very advantageous and can lead to a prominent performance improvement.
Abstract: The authors explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment where documents or objects are linked together to facilitate interactive access. The solution procedure consists of two steps. First, they derive an algorithm to convert the original sequence of log data into a set of maximal forward references. Doing so filters out the effect of some backward references, which are made mainly for ease of traveling, so that the mining can concentrate on meaningful user access sequences. Second, they derive algorithms to determine the frequent traversal patterns, i.e., large reference sequences, from the maximal forward references obtained. Two algorithms are devised for determining large reference sequences: one is based on hashing and pruning techniques, and the other is further improved with the option of determining large reference sequences in batch so as to reduce the number of database scans required. The performance of these two methods is comparatively analyzed. The option of selective scan is shown to be very advantageous and can lead to a prominent performance improvement. Sensitivity analysis on various parameters is also conducted.
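The two steps described in the abstract can be sketched roughly as follows. This is a minimal illustration written from the abstract's description, not the authors' implementation: the function names, the single-session input format, and the example session are all assumptions. The second function is a deliberately naive brute-force count of contiguous subsequences; it stands in for, and does not reproduce, the hashing-and-pruning and selective-scan algorithms the paper evaluates.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple

Reference = Tuple[Hashable, ...]


def maximal_forward_references(session: Iterable[Hashable]) -> List[Reference]:
    """Split one user's page sequence into maximal forward references.

    Assumption: revisiting a page already on the current path counts as a
    backward reference to that page.
    """
    path: List[Hashable] = []      # current forward path from the start page
    moving_forward = False         # True if the previous move extended the path
    result: List[Reference] = []
    for page in session:
        if page in path:
            # Backward move: the path just ended is maximal only if the
            # previous move was a forward one.
            if moving_forward:
                result.append(tuple(path))
            del path[path.index(page) + 1:]   # back up to the revisited page
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        result.append(tuple(path))            # the session ended on a forward move
    return result


def large_reference_sequences(
    references: Iterable[Reference], min_support: int
) -> Dict[Reference, int]:
    """Naive stand-in: count every contiguous subsequence once per reference."""
    counts: Counter = Counter()
    for ref in references:
        contained = {
            ref[i:j] for i in range(len(ref)) for j in range(i + 1, len(ref) + 1)
        }
        counts.update(contained)
    return {seq: n for seq, n in counts.items() if n >= min_support}


# Hypothetical session: each letter is a page, in the order visited.
refs = maximal_forward_references("ABCDCBEGHGWAOUOV")
print(["".join(r) for r in refs])   # ['ABCD', 'ABEGH', 'ABEGW', 'AOU', 'AOV']
large = large_reference_sequences(refs, min_support=2)
print(large[("A", "B", "E", "G")])  # 2
```

The brute-force count enumerates a quadratic number of subsequences per reference, which is why a real miner would prune candidates level by level and limit the number of database scans, as the abstract describes.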