Attribute Relation Extraction from Template-inconsistent Semi-structured Text by Leveraging Site-level Knowledge

Open Access•Proceedings Article

- 01 Oct 2013

- pp 1097-1101

TL;DR: A novel method that leverages site-level knowledge for attribute-value extraction from semi-structured text with inconsistent templates. It uses a graph-based random walk model to acquire the site-level knowledge and can improve extraction performance significantly.

References

•Proceedings Article•10.1109/ICDM.2006.70

Fast Random Walk with Restart and Its Applications

Hanghang Tong, +2 more

- 18 Dec 2006

TL;DR: The heart of the approach is to exploit two important properties shared by many real graphs: linear correlations and block-wise, community-like structure. It exploits the linearity by using low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman-Morrison lemma for matrix inversion.

1.3K

•Proceedings Article

Towards automatic data extraction from large web sites

Valter Crescenzi, +2 more

- 01 Jan 2001

Abstract: The paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. To automate the wrapper generation and the data extraction process, the paper develops a novel technique to compare HTML pages and generate a wrapper based on their similarities and differences. Experimental results on real-life data-intensive Web sites confirm the feasibility of the approach.

994

•Proceedings Article•10.1145/872757.872799

Extracting structured data from Web pages

Arvind Arasu, +1 more

- 09 Jun 2003

TL;DR: This paper presents an algorithm that takes, as input, a set of template-generated pages, deduces the unknown template used to generate the pages, and extracts, as output, the values encoded in the pages.

764

•Journal Article•10.14778/2002938.2002939

Recovering semantics of tables on the web

Petros Venetis, +7 more

- 01 Jun 2011

TL;DR: A system that attempts to recover the semantics of tables by enriching the table with additional annotations, which leverages a database of class labels and relationships automatically extracted from the Web.

420

•Proceedings Article•10.3115/1699648.1699697

Character-level Analysis of Semi-Structured Documents for Set Expansion

Richard C. Wang, +1 more

- 06 Aug 2009

TL;DR: This paper illustrated in detail the construction of character-level wrappers for set expansion implemented in SEAL and demonstrated a technique that extends SEAL to learn binary relational concepts from only two seeds, thus demonstrating language-independence.

44