Journal Article10.3390/computers14100436
Sparse Keyword Data Analysis Using Bayesian Pattern Mining
Abstract: Keyword data analysis aims to extract and interpret meaningful relationships from large collections of text documents. A major challenge in this process arises from the extreme sparsity of document–keyword matrices, where the majority of elements are zeros due to zero inflation. To address this issue, this study proposes a probabilistic framework called Bayesian Pattern Mining (BPM), which integrates Bayesian inference into association rule mining (ARM). The proposed method estimates both the expected values and credible intervals of interestingness measures such as confidence and lift, providing a probabilistic evaluation of keyword associations. Experiments conducted on 9436 quantum computing patent documents, from which 175 representative keywords were extracted, demonstrate that BPM yields more stable and interpretable associations than conventional ARM. By incorporating credible intervals, BPM reduces the risk of biased decisions under sparsity and enhances the reliability of keyword-based technology analysis, offering a rigorous approach for knowledge discovery in zero-inflated text data.
read more
Chat with Paper
AI Agents for this Paper
Find similar papers on Google Scholar, PubMed and Arxiv
Write a critical review of this paper
Analyze citations of this paper to find unaddressed research gaps
References
Mining association rules between sets of items in large databases
Rakesh Agrawal,Tomasz Imielinski,Arun N. Swami +2 more
- 01 Jun 1993
TL;DR: An efficient algorithm is presented that generates all significant association rules between items in the database of customer transactions and incorporates buffer management and novel estimation and pruning techniques.
Term Weighting Approaches in Automatic Text Retrieval
Gerard Salton,Chris Buckley +1 more
TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
Quantum Computing in the NISQ era and beyond
TL;DR: Noisy Intermediate-Scale Quantum (NISQ) technology will be available in the near future as mentioned in this paper, which will be useful tools for exploring many-body quantum physics, and may have other useful applications.
SciBERT: A Pretrained Language Model for Scientific Text
Iz Beltagy,Kyle Lo,Arman Cohan +2 more
- 01 Nov 2019
TL;DR: SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.
Density-Based Clustering Based on Hierarchical Density Estimates
Ricardo J. G. B. Campello,Davoud Moulavi,Joerg Sander +2 more
- 14 Apr 2013
TL;DR: This work proposes a theoretically and practically improved density-based, hierarchical clustering method, providing a clustering hierarchy from which a simplified tree of significant clusters can be constructed, and proposes a novel cluster stability measure.
2.1K