Open Access
Modeling Tree Structures, Machine Learning, and Information Extraction
Rémi Gilleron,Joachim Niehren,Karine Lewandowski,Anne-Cécile Caron,Aurélien Lemay,Yves Roos,Isabelle Tellier,Sophie Tison,Marc Tommasi,Fabien Torre,Mathias Samuelides,Sławek Staworko,Lingbo Kong,Florent Jousse,Patrick Marty,Jérôme Champavère,Emmanuel Filiot,Olivier Gauwin,Édouard Gilbert,Damien Poirier,Matthieu Keith,Hanh-Missi Tran,Feriel Lahlali +22 more
- 01 Jan 2007
10
TL;DR: This project aims to incorporate novel approaches for modeling tree structures and emerging machine learning techniques into adaptive information extraction systems for the Web.
Abstract: The Web of data with meaning in the sense that a computer program can learn enough about what the data means to process it. During the last decade, the World Wide Web has evolved into the most important public data store in the world. An important challenge for computer science today is to develop accurate information extraction and question answering mechanisms for the Web. Berners-Lee points out the difficulty of that task, and that it might even require more adequate formats for representing Web data. Information must be structured, and structure should reflect semantic information, so that machines can learn enough about what the data means. The standard document formats of the Web today, HTML and XML, rely on tree structures that encompass textual information. In this project we want to incorporate novel approaches for modeling tree structure and emerging techniques for machine learning into adaptive information extraction systems for the Web. In the future, we might also have to account for semantic information. The research team is a joint project team between LIFL (CNRS and Lille 1 University) and the Grappa Group (Lille 3 University). The project will be located at Lille 3 University.
Citations
Machine learning
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Efficient Learning of Semi-structured Data from Queries
Hiroki Arimura,Hiroshi Sakamoto,Setsuo Arikawa,博紀 有村,比呂志 坂本,節夫 有川 +5 more
- 01 May 2001
TL;DR: This paper presents a polynomial time learning algorithm for µ-OGT, the subclass of OGT without repeated tree variables, and gives representation-independent hardness results which indicate that both equivalence and membership queries are necessary to learn µ-OGT.
37
•Journal Article
Parallelism and tree regular constraints
Joachim Niehren,Mateu Villaret +1 more
TL;DR: It is proved that parallelism constraints and context unification remain equivalent when extended with tree regular constraints.
5
On decidability of boundedness property for regular path queries
Yves Andre,Francis Bossut,Anne-Cécile Caron +2 more
- 01 Nov 2000
TL;DR: In this paper, the authors study the evaluation of regular path queries on semi-structured data, i.e. path queries of the form "find all objects reachable by paths whose labels form a word in p", where p is a regular expression.
3
Analyzing the Average-Case Behavior of Conjunctive Learning Algorithms
Rüdiger Reischuk,Thomas Zeugmann +1 more
- 01 Aug 1998
TL;DR: A new learning model, stochastic finite learning, in which, in contrast to PAC learning, some information about the underlying distribution is given and the goal is to find a correct (not only approximately correct) hypothesis.
2
References
Machine learning
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
•Book
Foundations of Statistical Natural Language Processing
Christopher D. Manning,Hinrich Schütze +1 more
- 28 May 1999
TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.
Combining labeled and unlabeled data with co-training
Avrim Blum,Tom M. Mitchell +1 more
- 24 Jul 1998
TL;DR: A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, to allow inexpensive unlabeled data to augment a much smaller set of labeled examples.
6.4K
Language identification in the limit
TL;DR: It was found that the class of context-sensitive languages is learnable from an informant, but that not even the class of regular languages is learnable from a text.
3.8K
Text Classification from Labeled and Unlabeled Documents using EM
TL;DR: This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents, and presents two extensions to the algorithm that improve classification accuracy under these conditions.