Web-scale table census and classification

Proceedings Article, doi:10.1145/1935826.1935904

Eric Crestan, +1 more

- 09 Feb 2011

- pp 545-554

Cited by: 103

TL;DR: Through a large-scale experimental analysis over a crawl of the Web, the authors show empirically that their table classifier significantly outperforms several baselines, and they present a detailed feature analysis outlining the most salient features for each table type.

Citations

Journal Article, doi:10.3233/SW-180333

Information extraction meets the semantic web: a survey

José-Lázaro Martínez-Rodríguez, +2 more

- 01 Jan 2020

- Semantic Web

Funding: Millennium Institute for Foundational Research on Data (IMFD); Comision Nacional de Investigacion Cientifica y Tecnologica (CONICYT), CONICYT FONDECYT: 1181896

Cited by: 222

Journal Article, doi:10.1145/3372117

Web Table Extraction, Retrieval, and Augmentation: A Survey

Shuo Zhang, +1 more

- 25 Jan 2020

- ACM Transactions on Intelligent Systems ...

TL;DR: The objective of this survey is to synthesize and present two decades of research on web tables into six main categories of information access tasks: table extraction, table interpretation, table search, question answering, knowledge base augmentation, and table augmentation.

Cited by: 103

Proceedings Article, doi:10.1145/3178876.3186067

Ad Hoc Table Retrieval using Semantic Similarity

Shuo Zhang, +1 more

- 16 Feb 2018

- arXiv: Information Retrieval

TL;DR: In this article, the authors address the problem of ad hoc table retrieval by answering a keyword query with a ranked list of tables, and propose a method for performing semantic matching between queries and tables.

Cited by: 92

Proceedings Article, doi:10.1145/3447548.3467434

TUTA: Tree-based Transformers for Generally Structured Table Pre-training

Zhiruo Wang, +6 more

- 14 Aug 2021

TL;DR: TUTA introduces tree-based attention and position embeddings to better capture the spatial and hierarchical information of tables, and devises three progressive pre-training objectives that yield representations at the token, cell, and table levels.

Cited by: 79

...

References

Journal Article, doi:10.1214/AOS/1013203451

Greedy function approximation: A gradient boosting machine.

Jerome H. Friedman

- 01 Oct 2001

- Annals of Statistics

TL;DR: A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.

Cited by: 26.4K
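
The boosting paradigm summarized above can be illustrated for the least-squares case, where the negative gradient of the loss is simply the residual, so each round fits a base learner to the current residuals. The following is a minimal sketch, not Friedman's reference implementation: it assumes one-dimensional numeric inputs, uses depth-one regression trees (stumps) as base learners, and applies a fixed shrinkage factor. The function names `fit_stump` and `gradient_boost_ls` are hypothetical.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a depth-one regression tree: choose the threshold on x that minimizes
    squared error when each side predicts the mean residual of its points."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must put at least one point on each side
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all inputs identical: fall back to a constant predictor
        c = mean(residuals)
        return lambda x: c
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost_ls(xs, ys, n_rounds=50, learning_rate=0.1):
    """Least-squares gradient boosting with stumps and shrinkage (hypothetical helper)."""
    f0 = mean(ys)  # initial model: the constant minimizing squared error
    learners = []
    preds = [f0] * len(ys)
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - F)^2 with respect to F is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        learners.append(h)
        # Add a shrunken copy of the new base learner to the additive expansion.
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]

    def predict(x):
        return f0 + learning_rate * sum(h(x) for h in learners)

    return predict


predict = gradient_boost_ls([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5], n_rounds=100)
print(predict(2), predict(5))
```

Smaller `learning_rate` values shrink each stump's contribution, so more rounds are needed to fit the data; the toy step function above is recovered closely after 100 rounds.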

Proceedings Article, doi:10.1145/1242572.1242583

Towards domain-independent information extraction from web tables

Wolfgang Gatterbauer, +4 more

- 08 May 2007

TL;DR: This paper shifts attention from the tree-based representation of web pages to a variation of the two-dimensional visual box model that web browsers use to display information on screen, and argues that this approach can become the basis for a new way of large-scale knowledge acquisition from the current "Visual Web".

Cited by: 257

Proceedings Article, doi:10.1145/511446.511478

A machine learning based approach for table detection on the web

Yalin Wang, +1 more

- 07 May 2002

TL;DR: The authors present a machine-learning-based approach that classifies each table entity as either genuine or non-genuine, and design a novel ground-truthing protocol for tables in web documents, which they use to build a large table ground-truth database.

Cited by: 239

Proceedings Article

Identifying synonyms among distributionally similar words

Dekang Lin, +3 more

- 09 Aug 2003

TL;DR: This work presents two methods for identifying synonyms among distributionally similar words, along with two approaches for computing the similarity between words from their distributions over contexts.

Cited by: 218
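
Distributional similarity, as summarized above, compares words by the contexts in which they occur. The sketch below is a generic illustration and not either of the two similarity approaches in the paper: it assumes whitespace-tokenized, lowercased sentences, counts co-occurring words within a fixed token window, and compares the resulting count vectors with cosine similarity. The names `context_vector` and `cosine` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def context_vector(word, corpus, window=2):
    """Count the words that appear within `window` tokens of each occurrence of `word`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target word itself
                    counts[tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


corpus = ["the cat sat on the mat", "the dog sat on the rug", "stocks fell sharply today"]
cat, dog = context_vector("cat", corpus), context_vector("dog", corpus)
print(cosine(cat, dog), cosine(cat, context_vector("stocks", corpus)))
```

A high score here only signals that two words share contexts ("cat" and "dog" share all of theirs in this toy corpus); the paper's task is to pick out which of such distributionally similar words are actually synonyms.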

Journal Article, doi:10.1007/S00357-006-0012-4

Recent advances in predictive (machine) learning

Jerome H. Friedman

- 01 Sep 2006

- Journal of Classification

TL;DR: This paper introduces two recent methods, support vector machines and boosted decision trees, tracing their ancestral roots to standard kernel methods and ordinary decision trees, respectively.

Cited by: 126