High-Quality Train Data Generation for Deep Learning-Based Web Page Classification Models
TL;DR: In this article, the authors propose a novel algorithm that automatically generates high-quality training data for detecting relevant web pages, based on the frequency of documents containing the entity of interest.
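The summary does not explain how the frequency of documents containing the entity turns into training labels. Purely as an illustration, and not as the authors' algorithm, the Python sketch below assumes a hypothetical weak-labeling rule. Each page is checked for whole-word mentions of a seed list of entity names (e.g. movie titles). Pages mentioning at least `min_entities` distinct entities are labeled relevant (1), pages mentioning none are labeled irrelevant (0), and ambiguous pages in between are dropped. The helper names `document_frequency` and `label_pages` and the threshold are assumptions made for this example.

```python
import re
from collections import Counter
from typing import Optional


def _mentions(text: str, entity: str) -> bool:
    # Case-insensitive whole-word match of the entity name in the page text.
    pattern = r"\b" + re.escape(entity.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


def document_frequency(pages: list[str], entities: list[str]) -> Counter:
    # Number of pages that mention each entity at least once.
    df: Counter = Counter()
    for text in pages:
        for entity in entities:
            if _mentions(text, entity):
                df[entity] += 1
    return df


def label_pages(pages: list[str], entities: list[str],
                min_entities: int = 2) -> list[Optional[int]]:
    # Hypothetical rule: 1 = relevant, 0 = irrelevant, None = ambiguous (dropped).
    labels: list[Optional[int]] = []
    for text in pages:
        hits = sum(_mentions(text, e) for e in entities)
        if hits >= min_entities:
            labels.append(1)
        elif hits == 0:
            labels.append(0)
        else:
            labels.append(None)
    return labels


if __name__ == "__main__":
    pages = [
        "Inception and Interstellar are both directed by Christopher Nolan.",
        "Interstellar soundtrack review",
        "Best smartphone deals this week",
    ]
    entities = ["Inception", "Interstellar", "Dunkirk"]
    labels = label_pages(pages, entities)
    # Keep only confidently labeled pages as training data.
    train = [(p, y) for p, y in zip(pages, labels) if y is not None]
    print(labels, document_frequency(pages, entities))
```

Dropping ambiguous pages trades coverage for label precision. This is one plausible way a frequency-based rule could raise training-data quality, but the paper may use a different criterion.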
Abstract: Current deep learning models for detecting relevant web pages show low accuracy because of the poor quality of their training data. In this paper, we propose a novel algorithm that automatically generates high-quality training data based on the frequency of documents containing the entity of interest. Our experimental results on movie and cellphone data sets show that deep learning models (FNN, CNN, Bi-LSTM, and SeqGAN) trained with data generated by the proposed algorithm achieve an average $F_{1}$-score of up to 0.9992.
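For reference, the $F_{1}$-score reported above is the harmonic mean of precision $P$ and recall $R$, $F_{1} = 2PR/(P+R)$. The minimal Python sketch below computes it for binary relevance labels. The function name `f1_binary` is illustrative, and the abstract does not specify how scores are averaged across models or data sets.

```python
def f1_binary(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    # Count true positives, false positives, and false negatives for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # tp = 2, fp = 1, fn = 1, so precision = recall = 2/3 and F1 = 2/3.
    print(f1_binary([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```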
Citations
Quantitative aflatoxin B1 detection and mining key wavelengths based on deep learning and hyperspectral imaging in subpixel level
TL;DR: In this article, the authors use a one-dimensional convolutional network to detect the aflatoxin B1 content of peanuts, a serious threat to agricultural safety and human health, and to mine the key wavelengths based on spectral information.
Detecting and tracking using 2D laser range finders and deep learning
TL;DR: In this paper, a 2D laser range finder (LRF) is used to detect and track people, and a calibrated monocular camera and a 2D LRF mounted on a mobile robot are used to generate a dataset of leg patterns.
An Efficient Framework for Web Content Mining Systems Using Improved CD-PAM Clustering and the A-CNN Technique
M. Pujar, Monica R. Mundada, B. J. Sowmya, S. Supreeth, G. Shruthi
TL;DR: This research proposes a new framework that combines cosine distance-based partitioning around medoids (CD-PAM) clustering and ANOVA-Convolutional Neural Network (A-CNN) techniques to develop an efficient and accurate web content mining system.
Analysis of web data classification methods based on semantic similarity measure
TL;DR: In this article, 60 research papers on web data classification techniques are reviewed; these techniques are used to classify web data effectively and to measure the semantic relatedness between two words.