Open Access • Proceedings Article
Frustratingly Easy Domain Adaptation
Hal Daumé
- 01 Jun 2007
- pp 256-263
1.7K
TL;DR: This work describes an approach to domain adaptation that is appropriate exactly in the case when one has enough “target” data to do slightly better than just using only “source” data.
Abstract: We describe an approach to domain adaptation that is appropriate exactly in the case when one has enough “target” data to do slightly better than just using only “source” data. Our approach is incredibly simple, easy to implement as a preprocessing step (10 lines of Perl!) and outperforms state-of-the-art approaches on a range of datasets. Moreover, it is trivially extended to a multidomain adaptation problem, where one has data from a variety of different domains.
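The abstract does not spell out the preprocessing step. The technique usually associated with this paper is feature augmentation: every feature is copied into a shared "general" version and a domain-specific version, and a standard supervised learner is then trained on the augmented data. A minimal Python sketch under that assumption follows; the function name `augment`, the `general:` prefix, and the sample feature names are hypothetical, chosen only for illustration, and this is not the author's Perl script.

```python
from typing import Dict


def augment(features: Dict[str, float], domain: str) -> Dict[str, float]:
    """Return a shared copy and a domain-specific copy of every feature.

    Assumption: features are sparse name -> value dictionaries, and
    `domain` is a label such as "source" or "target".
    """
    augmented: Dict[str, float] = {}
    for name, value in features.items():
        # Shared copy: visible to examples from every domain.
        augmented[f"general:{name}"] = value
        # Domain-specific copy: only fires for examples from this domain.
        augmented[f"{domain}:{name}"] = value
    return augmented


# Hypothetical examples: one from the source domain, one from the target.
source_example = augment({"word=the": 1.0, "suffix=ing": 1.0}, "source")
target_example = augment({"word=the": 1.0}, "target")
```

Because the domain label is just a prefix, the same function covers the multidomain case mentioned in the abstract: each additional domain contributes its own set of prefixed copies, while the `general:` copies let the learner share weights across domains.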
Citations
From 0 to 10 million annotated words: part-of-speech tagging for Middle High German
Sarah Schulz,Nora Ketschik +1 more
- 01 Dec 2019
TL;DR: This work builds a part-of-speech (POS) tagger for Middle High German and explores self-learning techniques which yield the advantage that unannotated data can be utilized to improve tagging performance on specific subcorpora.
6
•Proceedings Article
A House United: Bridging the Script and Lexical Barrier between Hindi and Urdu
Riyaz Ahmad Bhat,Irshad Ahmad Bhat,Naman Jain,Dipti Misra Sharma +3 more
- 01 Dec 2016
TL;DR: This article proposes a simple but efficient approach to bridge the lexical and orthographic differences between Hindi and Urdu texts and demonstrates that a neural network-based dependency parser trained on augmented, harmonized Hindi and Urdu resources performs significantly better than the parsing models trained separately on the individual resources.
6
Opinion summarization on spontaneous conversations
Dong Wang,Yang Liu +1 more
TL;DR: The experimental results show that both the graph-based method and the supervised method outperform the baseline approach, and the pronoun related features can help to generate better summaries.
6
EZLearn: Exploiting Organic Supervision in Automated Data Annotation
Maxim Grechkin,Hoifung Poon,Bill Howe +2 more
- 01 Jul 2018
TL;DR: Without using any manually labeled data, the EZLearn system learned to accurately annotate data samples in functional genomics and scientific figure comprehension, substantially outperforming state-of-the-art supervised methods trained on tens of thousands of annotated examples.
•Dissertation
Applying Natural Language Processing to Clinical Information Retrieval
James Cogley
- 01 Jan 2014
References
•Proceedings Article
Analysis of Representations for Domain Adaptation
Shai Ben-David,John Blitzer,Koby Crammer,Fernando Pereira +3 more
- 04 Dec 2006
TL;DR: The theory illustrates the tradeoffs inherent in designing a representation for domain adaptation and gives a new justification for a recently proposed model which explicitly minimizes the difference between the source and target domains, while at the same time maximizing the margin of the training set.
Domain Adaptation with Structural Correspondence Learning
John Blitzer,Ryan McDonald,Fernando Pereira +2 more
- 22 Jul 2006
TL;DR: This work introduces structural correspondence learning to automatically induce correspondences among features from different domains in order to adapt existing models from a resource-rich source domain to a resource-poor target domain.
Search-based structured prediction
TL;DR: Searn is an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision. It comes with a strong, natural theoretical guarantee: good performance on the derived classification problems implies good performance on the structured prediction problem.
Domain adaptation for statistical classifiers
Hal Daumé,Daniel Marcu +1 more
TL;DR: This work introduces a statistical formulation of the domain adaptation problem in terms of a simple mixture model and presents an instantiation of this framework for maximum entropy classifiers and their linear chain counterparts, which leads to improved performance on three real-world tasks on four different data sets from the natural language processing domain.