Hierarchical Metadata-Aware Document Categorization under Weak Supervision

doi:10.1145/3437963.3441730

Open Access • Proceedings Article • 10.1145/3437963.3441730

Yu Zhang, +3 more

- 08 Mar 2021

- pp 770-778

TL;DR: This paper proposes a joint representation learning module and a data augmentation module for document categorization under weak supervision. The representation learning module simultaneously models category dependencies, metadata information, and textual semantics, while the data augmentation module hierarchically synthesizes training documents to complement the original, small-scale training set.

Citations

Proceedings Article • 10.1145/3485447.3512174

Metadata-Induced Contrastive Learning for Zero-Shot Multi-Label Text Classification

Yu Yvette Zhang, +7 more

- 11 Feb 2022

TL;DR: Experimental results show that MICoL significantly outperforms strong zero-shot text classification and contrastive learning baselines and is on par with the state-of-the-art supervised metadata-aware LMTC method trained on 10K–200K labeled documents. MICoL also tends to predict more infrequent labels than supervised methods, thereby alleviating the degraded performance on long-tailed labels.

Posted Content

Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data

Dheeraj Mekala, +2 more

- 22 Sep 2021

- arXiv: Computation and Language

TL;DR: The authors propose a coarse-to-fine approach for performing fine-grained classification on coarsely annotated data. It leverages label surface names as the only human guidance and incorporates rich pre-trained generative language models into an iterative weak supervision strategy.

Proceedings Article • 10.1145/3488560.3498384

MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information

- 11 Feb 2022

TL;DR: MotifClass builds a heterogeneous information network to capture higher-order structures and uses motifs to describe combinations of metadata, which then help weakly supervised text classification.

Journal Article • 10.1145/3470888

dhCM: Dynamic and Hierarchical Event Categorization and Discovery for Social Media Stream

Jinjin Guo, +2 more

- 23 Sep 2021

- ACM Transactions on Intelligent Systems ...

TL;DR: Online event discovery from social media documents is useful, for example for disaster recognition and intervention, but the diverse events that are incrementally identified from social media streams need to be addressed.

Proceedings Article • 10.1145/3580305.3599544

Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers

Yu Zhang, +6 more

- 04 Aug 2023

TL;DR: Weakly supervised multi-label classification of full-text scientific papers focuses on classifying papers into coarse-grained research topics and fine-grained themes using category descriptions and full text. The proposed framework, FUTEX, leverages the cross-paper network structure and the in-paper hierarchy structure to achieve competitive performance.

...

References

Journal Article

Visualizing Data using t-SNE

Laurens van der Maaten, +1 more

- 01 Jan 2008

- Journal of Machine Learning Research

TL;DR: Introduces t-SNE, a technique that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. It is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.

Proceedings Article • 10.18653/V1/N16-1174

Hierarchical Attention Networks for Document Classification

Zichao Yang, +5 more

- 13 Jun 2016

TL;DR: Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin.

Proceedings Article • 10.1145/2736277.2741093

LINE: Large-scale Information Network Embedding

Jian Tang, +5 more

- 18 May 2015

TL;DR: Proposes LINE, a novel network embedding method that is suitable for arbitrary types of information networks (undirected, directed, and/or weighted) and optimizes a carefully designed objective function that preserves both the local and global network structures.

Journal Article • 10.1109/TKDE.2017.2754499

Knowledge Graph Embedding: A Survey of Approaches and Applications

Quan Wang, +3 more

- 01 Dec 2017

- IEEE Transactions on Knowledge and Data ...

TL;DR: This article provides a systematic review of existing knowledge graph embedding techniques, covering both state-of-the-art methods and the latest trends, organized by the type of information used in the embedding task.

...

Related Papers (5)

Hierarchical Metadata-Aware Document Categorization under Weak Supervision

A Versatile Hypergraph Model for Document Collections

Knowledge extraction and retrieval for domain-specific documents

Inter-document similarities, language models, and ad hoc information retrieval

Ontology construction for information selection