Sparse Keyword Data Analysis Using Bayesian Pattern Mining

doi:10.3390/computers14100436

Journal Article

Sunghae Jun

- 14 Oct 2025

- Computers

- Vol. 14, Iss: 10, pp 436-436

Abstract: Keyword data analysis aims to extract and interpret meaningful relationships from large collections of text documents. A major challenge is the extreme sparsity of document–keyword matrices, in which most entries are zero, a condition known as zero inflation. To address this issue, this study proposes a probabilistic framework called Bayesian Pattern Mining (BPM), which integrates Bayesian inference into association rule mining (ARM). The proposed method estimates both the expected values and the credible intervals of interestingness measures such as confidence and lift, providing a probabilistic evaluation of keyword associations. Experiments on 9436 quantum computing patent documents, from which 175 representative keywords were extracted, demonstrate that BPM yields more stable and interpretable associations than conventional ARM. By incorporating credible intervals, BPM reduces the risk of biased decisions under sparsity and enhances the reliability of keyword-based technology analysis, offering a rigorous approach for knowledge discovery in zero-inflated text data.
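
The abstract does not state which priors or estimation procedure BPM uses, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes a symmetric Dirichlet prior over the 2x2 contingency table of a keyword pair (A, B) and approximates the posterior of confidence and lift by Monte Carlo sampling. The function name `posterior_rule_summary`, its parameters, and the keyword counts in the usage lines are hypothetical; only the corpus size of 9436 documents comes from the abstract.

```python
import random
from statistics import mean


def posterior_rule_summary(n_ab, n_a, n_b, n_docs,
                           prior=1.0, draws=4000, level=0.95, seed=0):
    """Posterior mean and equal-tailed credible interval for the
    confidence and lift of the rule A -> B.

    Assumption (not from the paper): a symmetric Dirichlet(prior) prior
    over the four cells of the 2x2 document-keyword contingency table.
    """
    rng = random.Random(seed)
    # Cell counts: (A and B, A only, B only, neither).
    counts = [n_ab, n_a - n_ab, n_b - n_ab, n_docs - n_a - n_b + n_ab]
    if min(counts) < 0:
        raise ValueError("inconsistent counts")

    conf_draws, lift_draws = [], []
    for _ in range(draws):
        # A Dirichlet draw is a vector of independent Gamma draws, normalised.
        g = [rng.gammavariate(prior + c, 1.0) for c in counts]
        total = sum(g)
        p_ab, p_a_only, p_b_only, _ = (x / total for x in g)
        p_a, p_b = p_ab + p_a_only, p_ab + p_b_only
        conf_draws.append(p_ab / p_a)           # P(B | A)
        lift_draws.append(p_ab / (p_a * p_b))   # P(A, B) / (P(A) P(B))

    def summarize(xs):
        xs = sorted(xs)
        lo = xs[int((1 - level) / 2 * (len(xs) - 1))]
        hi = xs[int((1 + level) / 2 * (len(xs) - 1))]
        return {"mean": mean(xs), "interval": (lo, hi)}

    return {"confidence": summarize(conf_draws),
            "lift": summarize(lift_draws)}


# Two hypothetical keyword pairs with the same raw confidence (0.6)
# but very different support.
sparse = posterior_rule_summary(n_ab=3, n_a=5, n_b=40, n_docs=9436)
dense = posterior_rule_summary(n_ab=300, n_a=500, n_b=400, n_docs=9436)
print(sparse["confidence"])
print(dense["confidence"])
```

Both pairs have the same point estimate of confidence under conventional ARM, yet the credible interval for the sparse pair is much wider than for the dense one. That difference in uncertainty is what a single confidence or lift value hides under zero inflation.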
