Proceedings Article · DOI: 10.1145/3400903.3400919
A Versatile Hypergraph Model for Document Collections
Andreas Spitz, Dennis Aumiller, Bálint Soproni, Michael Gertz
- 07 Jul 2020
TL;DR: Heterogeneous hypergraphs are introduced as a versatile model for representing annotated document collections that integrates external metadata, document content, entity and term annotations, and document segmentation at different granularity levels in a joint model that bridges the gap between structured and unstructured data.
Abstract: Efficiently and effectively representing large collections of text is of central importance to information retrieval tasks such as summarization and search. Since models for these tasks frequently rely on an implicit graph structure of the documents or their contents, graph-based document representations are naturally appealing. For tasks that consider the joint occurrence of words or entities, however, existing document representations often fall short in capturing cooccurrences of higher order, higher multiplicity, or at varying proximity levels. Furthermore, while numerous applications benefit from structured knowledge sources, external data sources are rarely considered as integral parts of existing document models. To address these shortcomings, we introduce heterogeneous hypergraphs as a versatile model for representing annotated document collections. We integrate external metadata, document content, entity and term annotations, and document segmentation at different granularity levels in a joint model that bridges the gap between structured and unstructured data. We discuss selection and transformation operations on the set of hyperedges, which can be chained to support a wide range of query scenarios. To ensure compatibility with established information retrieval methods, we discuss projection operations that transform hyperedges to traditional dyadic cooccurrence graph representations. Using PostgreSQL and Neo4j, we investigate the suitability of existing database systems for implementing the hypergraph document model, and explore the impact of utilizing implicit and materialized hyperedge representations on storage space requirements and query performance.
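As a rough illustration of the operations the abstract describes, the following minimal sketch treats each text unit as a hyperedge over annotated nodes, applies a selection, and projects the result to a dyadic cooccurrence graph. The names `Hyperedge`, `select`, and `project`, the field layout, and the sample data are assumptions made for this example, not the authors' schema or their PostgreSQL/Neo4j implementation.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hyperedge:
    """One text unit and the annotated nodes it contains (hypothetical layout)."""
    doc_id: str
    level: str        # granularity of the unit, e.g. "sentence" or "document"
    nodes: frozenset  # (node_type, value) pairs, e.g. ("entity", "Berlin")


def select(edges, level=None, must_contain=None):
    """Selection: keep hyperedges at a given level that contain a given node."""
    return [
        e for e in edges
        if (level is None or e.level == level)
        and (must_contain is None or must_contain in e.nodes)
    ]


def project(edges):
    """Projection: count pairwise cooccurrences of nodes within each hyperedge."""
    weights = Counter()
    for e in edges:
        # Sorting gives every unordered pair one canonical key.
        for u, v in combinations(sorted(e.nodes), 2):
            weights[(u, v)] += 1
    return weights


# Made-up sample data, for illustration only.
edges = [
    Hyperedge("d1", "sentence", frozenset({
        ("entity", "Berlin"), ("entity", "Germany"), ("term", "capital")})),
    Hyperedge("d1", "sentence", frozenset({
        ("entity", "Berlin"), ("term", "wall")})),
    Hyperedge("d2", "sentence", frozenset({
        ("entity", "Berlin"), ("entity", "Germany")})),
]

# Chain a selection with a projection, as the abstract suggests.
graph = project(select(edges, level="sentence", must_contain=("entity", "Berlin")))
```

Because `select` returns a plain list of hyperedges, further selections can be chained before the projection; the projection deliberately discards higher-order information, which is the trade-off the abstract accepts for compatibility with dyadic graph methods.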
Citations
A Hypergraph Approach for Estimating Growth Mechanisms of Complex Networks
01 Jan 2022
TL;DR: The authors propose a new hypergraph growth model with a data-driven preferential attachment mechanism estimated from observed data; by using hyperedges, the model preserves higher-order relationships.
Vec2Node: Self-Training with Tensor Augmentation for Text Classification with Few Labels
Sara Abdali, Subhabrata Mukherjee, Evangelos E. Papalexakis
- 01 Jan 2023
TL;DR: This work develops Vec2Node, which leverages self-training on in-domain unlabeled data augmented with tensorized word embeddings and significantly improves over state-of-the-art models, particularly in low-resource settings.
A Hypergraph Approach for Estimating Growth Mechanisms of Complex Networks
TL;DR: Fitting the proposed hypergraph model to 13 real-world datasets from diverse domains, the authors find that all estimated preferential attachment functions deviate substantially from the linear form, demonstrating the need to drop the linear preferential attachment assumption and adopt a data-driven approach.
References
Glove: Global Vectors for Word Representation
Jeffrey Pennington, Richard Socher, Christopher D. Manning
- 01 Oct 2014
TL;DR: Introduces a global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
WordNet: a lexical database for English
TL;DR: WordNet is an online lexical database designed for use under program control; it provides a more effective combination of traditional lexicographic information and modern computing.
•Book
Introduction to Information Retrieval
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze
- 01 Jan 2008
TL;DR: This book presents an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections.
Deep contextualized word representations
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
- 15 Feb 2018
TL;DR: This paper introduces a type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics) and how these uses vary across linguistic contexts (i.e., polysemy).
A neural probabilistic language model
TL;DR: The authors propose to learn a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, which can be expressed in terms of these representations.