Journal Article, DOI: 10.3390/electronics13040798
Keyword Data Analysis Using Generative Models Based on Statistics and Machine Learning Algorithms
Sunghae Jun
TL;DR: The paper proposes generative models based on statistics and machine learning to address data shortage and zero inflation in document–keyword matrices, and compares the models on simulated and practical data sets.
Abstract: For text big data analysis, we preprocessed text data and constructed a document–keyword matrix. The elements of this matrix represent the frequencies of keywords occurring in a document. The matrix has a zero-inflation problem because many elements are zero values. Also, in the process of preprocessing, the data size of the document–keyword matrix is reduced. However, various machine learning algorithms require a large amount of data, so to solve the problems of data shortage and zero inflation, we propose the use of generative models based on statistics and machine learning. In our experimental tests, we compared the performance of the models using simulation and practical data sets. Thus, we verified the validity and contribution of our research for keyword data analysis.
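The zero-inflation problem described in the abstract can be illustrated with a small sketch. The abstract does not name the specific generative models used, so the zero-inflated Poisson (ZIP) below is an assumed stand-in, fitted by the method of moments; the toy documents, the keyword list, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def document_keyword_matrix(docs, keywords):
    """Rows are documents, columns are keywords, entries are occurrence counts."""
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        rows.append([counts[k] for k in keywords])
    return rows


def zero_fraction(matrix):
    """Share of matrix cells that are exactly zero."""
    cells = [v for row in matrix for v in row]
    return sum(1 for v in cells if v == 0) / len(cells)


def fit_zip_moments(column):
    """Method-of-moments estimate of (pi, lam) for a zero-inflated Poisson.

    Uses E[X] = (1 - pi) * lam and Var[X] / E[X] = 1 + pi * lam.
    """
    n = len(column)
    mean = sum(column) / n
    if mean == 0:
        return 1.0, 0.0  # all zeros: every cell is a structural zero
    var = sum((v - mean) ** 2 for v in column) / n
    lam = mean + var / mean - 1.0
    if lam <= mean:
        return 0.0, mean  # no excess zeros detected: plain Poisson
    return 1.0 - mean / lam, lam


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_zip(pi, lam, size, rng):
    """Draw synthetic counts: zero with probability pi, else Poisson(lam)."""
    return [0 if rng.random() < pi else sample_poisson(lam, rng)
            for _ in range(size)]


# Hypothetical toy corpus, for illustration only.
docs = [
    "quantum computing patent quantum",
    "keyword data analysis",
    "generative model data",
    "patent analysis",
]
keywords = ["quantum", "data", "patent", "analysis"]

matrix = document_keyword_matrix(docs, keywords)
print("zero fraction:", zero_fraction(matrix))

# Fit one ZIP per keyword column and generate extra synthetic documents.
rng = random.Random(0)
for j, keyword in enumerate(keywords):
    column = [row[j] for row in matrix]
    pi, lam = fit_zip_moments(column)
    print(keyword, "synthetic counts:", sample_zip(pi, lam, 8, rng))
```

Fitting each keyword column independently ignores correlations between keywords, which a real generative model would likely need to capture; the sketch only shows how synthetic counts can enlarge a small, sparse matrix.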
Citations
Technology Keyword Analysis Using Graphical Causal Models
Sunghae Jun
TL;DR: This paper proposes a technology keyword analysis method using graphical causal models to identify cause-effect relationships between technology keywords, enabling informed research and development planning in various technology management aspects.
Sparse Keyword Data Analysis Using Bayesian Pattern Mining
Abstract: Keyword data analysis aims to extract and interpret meaningful relationships from large collections of text documents. A major challenge in this process arises from the extreme sparsity of document–keyword matrices, where the majority of elements are zeros due to zero inflation. To address this issue, this study proposes a probabilistic framework called Bayesian Pattern Mining (BPM), which integrates Bayesian inference into association rule mining (ARM). The proposed method estimates both the expected values and credible intervals of interestingness measures such as confidence and lift, providing a probabilistic evaluation of keyword associations. Experiments conducted on 9436 quantum computing patent documents, from which 175 representative keywords were extracted, demonstrate that BPM yields more stable and interpretable associations than conventional ARM. By incorporating credible intervals, BPM reduces the risk of biased decisions under sparsity and enhances the reliability of keyword-based technology analysis, offering a rigorous approach for knowledge discovery in zero-inflated text data.
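The idea of attaching a credible interval to an interestingness measure can be sketched for confidence alone. This is a minimal illustration under assumed choices, not the BPM method itself: it places a uniform Beta(1, 1) prior on confidence(A → B) = P(B | A), updates it with co-occurrence counts, and reads the interval off Monte Carlo draws. The function names and the counts are hypothetical.

```python
import random


def confidence_posterior(n_a, n_ab, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters for confidence(A -> B) = P(B | A).

    n_a  : number of documents containing keyword A
    n_ab : number of those documents that also contain keyword B
    """
    if not 0 <= n_ab <= n_a:
        raise ValueError("n_ab must lie between 0 and n_a")
    return prior_a + n_ab, prior_b + (n_a - n_ab)


def summarize(alpha, beta, level=0.95, draws=10000, seed=0):
    """Posterior mean and an equal-tailed Monte Carlo credible interval."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return alpha / (alpha + beta), lower, upper


# Hypothetical counts: A appears in 10 documents, 7 of which also contain B.
alpha, beta = confidence_posterior(n_a=10, n_ab=7)
mean, lower, upper = summarize(alpha, beta)
print(f"confidence: mean={mean:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```

With sparse data the interval is wide, which signals that a high point estimate of confidence rests on few observations.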
References
Book
Regression Analysis of Count Data
A. Colin Cameron, Pravin K. Trivedi
28 Sep 1998
TL;DR: The authors combine theory and practice to make sophisticated methods of analysis accessible to researchers and practitioners working with widely different types of data and software in areas such as applied statistics, econometrics, marketing, operations research, actuarial studies, demography, biostatistics and quantitative social sciences.
synthpop: Bespoke Creation of Synthetic Data in R
TL;DR: The synthpop package for R provides routines to generate synthetic versions of original data sets that mimic the original observed data and preserve the relationships between variables but do not contain any disclosive records.
An introduction to deep generative modeling
Lars Ruthotto, Eldad Haber
TL;DR: The paper introduces deep generative models (DGMs) and provides a concise mathematical framework for the three most popular approaches: normalizing flows, variational autoencoders, and generative adversarial networks. Numerical experiments illustrate the advantages and disadvantages of each approach.
Rewriting a Deep Generative Model
David Bau, Steven Liu, Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba
23 Aug 2020
TL;DR: This paper introduces a new problem setting: manipulation of specific rules encoded by a deep generative model, and proposes a formulation in which the desired rule is changed by manipulating a layer of a deep network as a linear associative memory.
Distribution Bias Aware Collaborative Generative Adversarial Network for Imbalanced Deep Learning in Industrial IoT
TL;DR: Wang et al. propose a distribution bias aware collaborative generative adversarial network (DB-CGAN) for imbalanced deep learning in industrial IoT. The model uses robust data augmentation to address the distribution bias between generated and original data.