Language identification

Topic Tools

Papers published on a yearly basis

1 / 2

Papers

Proceedings Article•10.1145/1390156.1390177•

A unified architecture for natural language processing: deep neural networks with multitask learning

[...]

Ronan Collobert¹, Jason Weston¹•Institutions (1)

Princeton University¹

5 Jul 2008

TL;DR: This work describes a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense using a language model.

...read moreread less

Abstract: We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. The entire network is trained jointly on all these tasks using weight-sharing, an instance of multitask learning. All the tasks use labeled data except the language model which is learnt from unlabeled text and represents a novel form of semi-supervised learning for the shared tasks. We show how both multitask learning and semi-supervised learning improve the generalization of the shared tasks, resulting in state-of-the-art-performance.

...read moreread less

6,528 citations

Book•

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition

[...]

Dan Jurafsky, James Martin

1 Jan 2000

TL;DR: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora, to demonstrate how the same algorithm can be used for speech recognition and word-sense disambiguation.

...read moreread less

Abstract: From the Publisher: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora.Methodology boxes are included in each chapter. Each chapter is built around one or more worked examples to demonstrate the main idea of the chapter. Covers the fundamental algorithms of various fields, whether originally proposed for spoken or written language to demonstrate how the same algorithm can be used for speech recognition and word-sense disambiguation. Emphasis on web and other practical applications. Emphasis on scientific evaluation. Useful as a reference for professionals in any of the areas of speech and language processing.

...read moreread less

4,732 citations

Book•

Natural Language Processing with Python

[...]

Steven Bird¹, Steven Bird², Ewan Klein, Edward Loper•Institutions (2)

University of Melbourne¹, University of Pennsylvania²

12 Jun 2009

TL;DR: This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation.

...read moreread less

Abstract: This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. With it, you'll learn how to write Python programs that work with large collections of unstructured text. You'll access richly annotated datasets using a comprehensive range of linguistic data structures, and you'll understand the main algorithms for analyzing the content and structure of written communication. Packed with examples and exercises, Natural Language Processing with Python will help you: Extract information from unstructured text, either to guess the topic or identify "named entities" Analyze linguistic structure in text, including parsing and semantic analysis Access popular linguistic databases, including WordNet and treebanks Integrate techniques drawn from fields as diverse as linguistics and artificial intelligence This book will help you gain practical skills in natural language processing using the Python programming language and the Natural Language Toolkit (NLTK) open source library. If you're interested in developing web applications, analyzing multilingual news sources, or documenting endangered languages -- or if you're simply curious to have a programmer's perspective on how human language works -- you'll find Natural Language Processing with Python both fascinating and immensely useful.

...read moreread less

4,331 citations

Journal Article•10.1016/S0019-9958(67)91165-5•

Language identification in the limit

[...]

E. Mark Gold

01 May 1967-Information & Computation

TL;DR: It was found that theclass of context-sensitive languages is learnable from an informant, but that not even the class of regular languages is learningable from a text.

...read moreread less

Abstract: Language learnability has been investigated. This refers to the following situation: A class of possible languages is specified, together with a method of presenting information to the learner about an unknown language, which is to be chosen from the class. The question is now asked, “Is the information sufficient to determine which of the possible languages is the unknown language?” Many definitions of learnability are possible, but only the following is considered here: Time is quantized and has a finite starting time. At each time the learner receives a unit of information and is to make a guess as to the identity of the unknown language on the basis of the information received so far. This process continues forever. The class of languages will be considered learnable with respect to the specified method of information presentation if there is an algorithm that the learner can use to make his guesses, the algorithm having the following property: Given any language of the class, there is some finite time after which the guesses will all be the same and they will be correct. In this preliminary investigation, a language is taken to be a set of strings on some finite alphabet. The alphabet is the same for all languages of the class. Several variations of each of the following two basic methods of information presentation are investigated: A text for a language generates the strings of the language in any order such that every string of the language occurs at least once. An informant for a language tells whether a string is in the language, and chooses the strings in some order such that every string occurs at least once. It was found that the class of context-sensitive languages is learnable from an informant, but that not even the class of regular languages is learnable from a text.

...read moreread less

3,876 citations

Journal Article•10.1023/A:1011424425034•

Foundations of Statistical Natural Language Processing

[...]

Paul B. Kantor¹•Institutions (1)

Rutgers University¹

01 Apr 2001-Information Retrieval

TL;DR: This book is already in probability information theory and linguistic found it should be well grounded and indeed it is, this foundational text in human language applications who want to create the way.

...read moreread less

3,063 citations

...

Expand

Year	Papers
2025	12
2024	22
2023	66
2022	108
2021	199
2020	253

Topic Tools

Papers published on a yearly basis

Papers

A unified architecture for natural language processing: deep neural networks with multitask learning

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition

Natural Language Processing with Python

Language identification in the limit

Foundations of Statistical Natural Language Processing

Related Topics (5)

Performance Metrics