Phonetic representation

Papers published on a yearly basis

Papers

Proceedings Article•

Towards End-To-End Speech Recognition with Recurrent Neural Networks

Alex Graves¹, Navdeep Jaitly²•Institutions (2)

Google¹, University of Toronto²

21 Jun 2014

TL;DR: A speech recognition system is presented that directly transcribes audio data with text, without requiring an intermediate phonetic representation; it is based on a combination of the deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification objective function.

Abstract: This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. The system is based on a combination of the deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification objective function. A modification to the objective function is introduced that trains the network to minimise the expectation of an arbitrary transcription loss function. This allows a direct optimisation of the word error rate, even in the absence of a lexicon or language model. The system achieves a word error rate of 27.3% on the Wall Street Journal corpus with no prior linguistic information, 21.9% with only a lexicon of allowed words, and 8.2% with a trigram language model. Combining the network with a baseline system further reduces the error rate to 6.7%.
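
The word error rate quoted in the abstract is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that metric only, not of the paper's training objective; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace, and substitutions,
    insertions and deletions each cost 1.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.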

2,513 citations

Book•

The Theory of Lexical Phonology

K.P. Mohanan

1 Jan 1982

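TL;DR: This book outlines the theory of Lexical Phonology using English and Malayalam phonology (segmentals and suprasegmentals), and discusses access to morphological information, the postlexical module and phonetic implementation, and the psychological reality of the theory.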

Abstract: I: Introduction.- 1.1. The Issues.- 1.2. The Historical Perspective.- 1.3. The Spiral of Progress.- Notes.-
II: An Outline of the Theory: English Phonology.- 2.1. Lexical and Postlexical Rule Applications.- 2.1.1. Two Criteria.- 2.1.2. Lexical Representations.- 2.1.3. Modularity in Lexical Phonology.- 2.1.4. The Intuitions: Word Phonology and Phrase Phonology.- 2.2. Lexical Morphology.- 2.3. The Use of Morphological Information in Phonology.- 2.3.1. Junctures and Rule Blocking.- 2.3.2. Junctures as Triggers: Bracket Erasure.- 2.3.3. Consequences of Bracket Erasure.- 2.4. How Many Strata in English?.- 2.4.1. Stratum 2 vs. Stratum 3: Stem Final Tensing.- 2.4.2. Syllable Structure in English.- 2.4.3. Strata 2, 3 and 4: Syllabic Consonants.- 2.4.4. More on Strata 2, 3 and 4: [l] Velarization.- 2.4.5. Linking [r] in Nonrhotic Accents.- 2.4.6. Summary.- 2.5. Rules, Domains, and Stratum Ordering.- 2.5.1. Why Domains?.- 2.5.2. Multiple Stratum Domain in Phonology.- 2.5.3. Multiple Stratum Domain in Morphology.- 2.5.4. Marked and Unmarked Options.- 2.5.5. The Metaphor of Stratal Organization.- 2.5.6. Cycles and Strata.- 2.5.7. Cyclic and Noncyclic Strata.- 2.5.8. The Loop.- 2.6. The Mental Representation of Lexical Entries.- 2.6.1. Actual and Potential Words.- 2.6.2. Productivity: Phonological Rules and Performance.- 2.6.3. The Productivity Continuum.- Notes.-
III: Malayalam Phonology: Segmentals.- 3.1. The Lexical Alphabet.- 3.1.1. Lexical Contrasts.- 3.1.2. Voicing of Stops.- 3.1.3. Lenition of Stops.- 3.1.4. Schwa Onglide after Voiced Stops.- 3.2. The Underlying Alphabet.- 3.2.1. Nasals: Place and Nasality Assimilations.- 3.2.2. Other Rules for Nasals.- 3.2.3. Underlying Stops.- 3.3. Syllable Structure in Malayalam.- 3.3.1. The Syllable Template.- 3.3.2. Glide Formation.- 3.3.3. Schwa Insertion.- 3.4. Lexical Strata in Malayalam.- 3.4.1. Productivity, Sanskrit and Dravidian.- 3.4.2. Two Types of Compounding.- 3.4.3. Schwa Insertion in Compounds.- 3.4.4. Degemination of Sonorants.- 3.4.5. Stem-Initial Gemination.- 3.4.6. Stem-Final Gemination.- 3.4.7. Postsonorant Gemination.- 3.4.8. Nasal Deletion.- 3.4.9. Vowel Lengthening.- 3.4.10. Vowel Sandhi.- 3.5. Summary.- Notes.-
IV: Malayalam Phonology: Suprasegmentals.- 4.1. The Loop in Malayalam Morphology.- 4.2. Stress and Word Melody.- 4.2.1. Stress.- 4.2.2. Word Melody.- 4.3. The Domain of Stress and Word Melody.- 4.4. Schwa Insertion and Word Melody.- 4.5. An Ordering Paradox.- 4.6. The Effect of the Loop on Stress and Word Melody.- Notes.-
V: Accessing Morphological Information.- 5.1. Types of Nonphonological Information in Phonology.- 5.2. Boundaries.- 5.2.1. Boundaries, Concatenation, and Domains.- 5.2.2. Boundary Assignment in SPE.- 5.2.3. Concatenation/Stratum vs. Boundary/Bracket Theories.- 5.3. Domains as Node Labels on Trees.- 5.3.1. Selkirk's Theory.- 5.3.2. Lexicalist Phonology: Concatenation, Stratum and Brackets.- 5.4. Hierarchical Structure in Morphology.- Notes.-
VI: The Postlexical Module.- 6.1. Syntactic and Postsyntactic Modules.- 6.1.1. Accessing Syntactic Information in Phonology.- 6.1.2. Phonological Rules Sensitive to Syntax.- 6.1.3. Phonological Phrases.- 6.1.4. Preview.- 6.2. Speech as Implementation of Phonetic Representation.- 6.3. The Nature of Phonetic Representations.- 6.3.1. Phonetic Features on a Scale.- 6.3.2. How Abstract are Phonetic Representations?.- 6.3.3. The Status of Segments in Phonetic Representations.- 6.4. Language-Specific Implementational Phenomena.- 6.5. Types of Subsegmental Phenomena.- 6.5.1. Timing of Articulatory Gestures.- 6.5.2. Coordination of Articulatory Gestures.- 6.5.3. Degree of Articulatory Gestures.- 6.5.4. Enhancement as Phonetic Implementation.- 6.6. Underlying and Lexical Alphabets.- 6.7. Phonological Structure and Phonetic Implementation.- 6.8. Phonetic Implementation and Classical Phonemics.- 6.8.1. Conditions Relating the Phonemic and Phonetic Levels.- 6.8.2. The Nature of the Mapping.- Notes.-
VII: Lexical Phonology and Psychological Reality.- 7.1. The Nature of Evidence in Phonology.- 7.1.1. Corpus vs. Speaker Behaviour.- 7.1.2. Internal and External Evidence.- 7.2. Speaker Judgments.- 7.2.1. Judgments on the Number of Segments.- 7.2.2. Judgments on Segment Distinctions.- 7.2.3. The Perceptual Grid.- 7.2.4. What the Speakers Think They Are Saying or Hearing.- 7.3. Phonemic Orthography.- 7.4. Conventions of Sound Patterning in Versification.- 7.4.1. Rhyme in English.- 7.4.2. Rhyme in Malayalam.- 7.4.3. Metre in Malayalam.- Notes.-
Conclusion.- References.- Index of Names.- Index of Subjects.

615 citations

Journal Article•10.1097/00003446-199510000-00004•

Lexical effects on spoken word recognition by pediatric cochlear implant users

Karen Iler Kirk¹, David B. Pisoni, Mary Joe Osberger¹•Institutions (1)

Indiana University¹

01 Oct 1995-Ear and Hearing

TL;DR: The results demonstrate that pediatric cochlear implant users are sensitive to the acoustic-phonetic similarities among words, that they organize words into similarity neighborhoods in long-term memory, and that they use this structural information in recognizing isolated words.

Abstract: The Nucleus multichannel cochlear implant provides substantial auditory information to children with profound hearing impairments who are unable to benefit from conventional amplification. However, children who use the Nucleus cochlear implant vary greatly in their spoken word recognition skills (Staller, Beiter, Brimacombe, Mecklenburg, & Arndt, 1991a), depending in part on the age at onset and duration of their hearing loss (Fryauf-Bertschy, Tyler, Kelsay, & Gantz, 1992; Osberger, Todd, Berry, Robbins, & Miyamoto, 1991b; Staller et al., 1991a; Staller, Dowell, Beiter, & Brimacombe, 1991b), and on the length of cochlear implant use (Fryauf-Bertschy et al., 1992; Miyamoto et al., 1992, 1994; Osberger et al., 1991a; Waltzman, Cohen, & Shapiro, 1992; Waltzman et al., 1990). Several different types of tests have been used to assess the perceptual benefits of cochlear implant use in children because of this variability in performance. Closed-set tests, which provide the listener with a limited number of response alternatives, have been used to measure the perception of prosodic cues, vowel and consonant identification, and word identification. According to Tyler (1993), approximately 50% of children with multichannel cochlear implants perform significantly above chance on closed-set tests of word identification, and some obtain very high levels of performance (70% to 100% correct). For this latter group, more difficult open-set tests of spoken word recognition, wherein no response alternatives are provided, are needed to assess their perceptual capabilities. Historically, spoken word recognition tests were adapted from articulation tests used to evaluate military communications equipment during World War II (Hudgins, Hawkins, Karlin, & Stevens, 1947).
Several criteria were considered essential in selecting test items, including familiarity, homogeneity of audibility, and phonetic balancing (i.e., to have phonemes within a word list represented in the same proportion as in English). Phonetic balancing was included as a criterion because it was assumed that all speech sounds must be included to test hearing (Hudgins et al., 1947), and that phonetic balancing ensured homogeneity across different lists (Hirsh et al., 1952). Subsequent research demonstrated that phonetic balancing was not necessary to achieve equivalent word lists (Carhart, 1965; Hood & Poole, 1980; Tobias, 1964) and that other nonauditory factors, such as subject age or language level, also influence spoken word recognition (Hodgson, 1985; Jerger, 1984; Smith & Hodgson, 1970). Nonetheless, phonetically balanced word recognition tests still enjoy widespread use in both clinical and research settings because their psychometric properties have been well established (Hirsh et al., 1952; Hudgins et al., 1947). These tests also are widely used because recorded versions of the test materials are available commercially, thereby facilitating comparison of results obtained at different test sites. Phonetically balanced word lists have been used to evaluate potential cochlear implant candidates, as well as to measure post-implant performance. Spoken word recognition is often assessed in children using phonetically balanced materials such as the Phonetically Balanced Kindergarten word lists (PB-K) (Haskins, Reference Note 1). Children with multichannel cochlear implants generally perform poorly on these phonetically balanced tests (Fryauf-Bertschy et al., 1992; Miyamoto, Osberger, Robbins, Myres, & Kessler, 1993; Osberger et al., 1991a; Staller et al., 1991a). For example, Osberger et al. (1991a) reported that the mean PB-K score for 28 subjects with approximately 2 yr of cochlear implant use was 11% (range 0% to 36%). 
Only six of their subjects scored above 0% words correct. Similarly, Staller et al. (1991a) reported mean PB-K scores of approximately 9% words correct for 80 children who had 1 yr of multichannel cochlear implant experience. It is difficult to distinguish among children with differing spoken word recognition skills using the PB-K test, or to measure changes with increased device experience because the scores of these subjects cluster in a restricted range near 0% correct. Furthermore, the parents and educators of children with cochlear implants have sometimes reported a discrepancy between the observed performance on these phonetically balanced word lists and real-world or everyday communication abilities in more natural settings. That is, children may obtain very low scores on phonetically balanced word lists, but demonstrate relatively good performance during daily activities. The administration of spoken word recognition tests assesses the underlying peripheral and central perceptual processes employed in spoken word recognition (Lively, Pisoni, & Goldinger, 1994; Pisoni & Luce, 1986). Models of spoken word recognition generally propose an initial stage of processing wherein the speech signal is converted to a phonetic representation, followed by a second stage wherein the phonetic representations are matched to the target words by comparing them to items stored in the mental lexicon (Luce, 1986; Luce, Pisoni, & Goldinger, 1990; Marslen-Wilson, 1987). (For an alternative view, see Klatt's Lexical Access From Spectra [LAFS] model [Klatt, 1980]). Poor performance on phonetically balanced speech identification tests may result from difficulties at either stage. If the auditory signal presented via the cochlear implant is too degraded to allow accurate phonetic encoding, word recognition performance will be impaired or reduced. 
The structure and organization of sound patterns in the mental lexicon can also influence word recognition (Pisoni, Nusbaum, Luce, & Slowiaczek, 1985). For example, when test item selection is constrained by phonetic balancing, the resulting lists may contain many words that are unfamiliar to children with profound hearing losses, who typically have limited vocabularies (Dale, 1974; Lach, Ling, & Ling, 1970; Quigley & Paul, 1984). Children should be able to repeat unfamiliar words if their sensory aid provides adequate auditory information for phoneme identification. If not, then children will most likely select a phonemically similar word within their working vocabulary. In addition, lexical characteristics, such as the frequency with which words occur in the language (Andrews, 1989; Elliot, Clifton, & Servi, 1983) and the number of phonemically similar words in the language (Treisman, 1978a, 1978b) have been shown to affect the speed and accuracy of spoken word recognition (Luce, 1986; Luce et al., 1990). Phonetically balanced word recognition tests were not designed to assess the influence of these lexical factors on word recognition. This paper reports the development of two new word recognition tests in which lexical properties of the test items were carefully controlled; test development was motivated by several assumptions embodied in current theories of spoken word recognition discussed below. Pediatric cochlear implant subjects’ performance on these new tests will also be compared with their performance on a phonetically balanced, word recognition test, the PB-K test.
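
The "number of phonemically similar words" mentioned above is often operationalized in the word recognition literature as neighborhood density: the count of lexicon entries that differ from a target by a single phoneme substitution, addition, or deletion. The sketch below illustrates that counting rule under stated assumptions. The one-phoneme criterion is a common convention that the abstract itself does not spell out, and the toy lexicon, its transcriptions, and the function names `is_neighbor` and `neighborhood_density` are hypothetical.

```python
from typing import Dict, Tuple

Phonemes = Tuple[str, ...]


def is_neighbor(a: Phonemes, b: Phonemes) -> bool:
    """True if a and b differ by exactly one phoneme substitution,
    addition, or deletion (assumed similarity criterion)."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Different length: removing one phoneme from the longer form
    # must yield the shorter form (addition or deletion).
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def neighborhood_density(word: str, lexicon: Dict[str, Phonemes]) -> int:
    """Count the lexicon entries that are neighbors of `word`."""
    target = lexicon[word]
    return sum(1 for other, phones in lexicon.items()
               if other != word and is_neighbor(target, phones))


# Hypothetical toy lexicon with rough phonemic transcriptions.
TOY_LEXICON: Dict[str, Phonemes] = {
    "cat": ("k", "æ", "t"),
    "bat": ("b", "æ", "t"),
    "cap": ("k", "æ", "p"),
    "at": ("æ", "t"),
    "cast": ("k", "æ", "s", "t"),
    "dog": ("d", "ɔ", "g"),
}

if __name__ == "__main__":
    for w in ("cat", "dog"):
        print(w, neighborhood_density(w, TOY_LEXICON))
```

In this toy lexicon "cat" has many neighbors and "dog" has none. A realistic count would need a full pronunciation lexicon, and word frequency would have to be taken from a separate source.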

401 citations

Journal Article•10.1017/S0272263104262052•

Context of learning in the acquisition of Spanish second language phonology

Manuel Díaz-Campos¹•Institutions (1)

Indiana University¹

01 Jun 2004-Studies in Second Language Acquisition

TL;DR: The authors examined whether study abroad, which provides opportunities for authentic L2 context, facilitates the acquisition of Spanish phonology, and found the same patterns for regular classroom and study abroad students across time: (a) similar gain for voiced initial stops and word-final laterals, (b) lack of gain for intervocalic fricatives, and (c) high levels of accuracy for the palatal nasal in the pretest.

Abstract: Studies in SLA have debated the importance of context of learning in the process of developing linguistic skills in a second language (L2). The present paper examines whether study abroad, as it provides opportunities for authentic L2 context, facilitates the acquisition of Spanish phonology. The corpus of this investigation is composed of speech samples from 46 students of Spanish: 26 studying abroad in Spain and 20 in a regular classroom environment in the United States. The students read a paragraph with 60 target words including segments such as word-initial stops (i.e., [p t k]), intervocalic fricatives (i.e., [β ð ɣ]), word-final laterals (i.e., [l]), and palatal nasals (i.e., [ɲ]). The findings reveal the following patterns for both regular classroom and study abroad students across time: (a) similar gain in the case of voiced initial stops and word-final laterals, (b) lack of gain in the case of intervocalic fricatives, and (c) high levels of accuracy in the case of the palatal nasal in the pretest. Concerning the external data, the following factor groups predicted phonological gain among all learners: years of formal language instruction, reported use of Spanish before the semester, reported use of Spanish outside the classroom during the semester (days), reported use of Spanish outside the classroom during the semester (hours), gender, entrance Oral Proficiency Interview, exit Oral Proficiency Interview, and level at which formal instruction began. Note: Throughout the article a phonetic representation of all sounds following the International Phonetic Alphabet (IPA) is presented. Phonetic (instead of phonological) representations avoid making assumptions about underlying L2 representations.

314 citations

Journal Article•10.1016/0010-0277(94)90042-6•

The effect of subphonetic differences on lexical access

Jean E. Andruski¹, Sheila E. Blumstein¹, Martha W. Burton²•Institutions (2)

Brown University¹, Pennsylvania State University²

01 Sep 1994-Cognition

TL;DR: Results suggest that activation levels of words in the lexicon are graded, depending on the subphonetic shape of the input word, and that words which are phonologically similar to the intended word candidate are activated to some extent, whether the input provides a relatively poor phonetic representation of the intended word or a good one.

307 citations

Year	Papers
2021	2
2020	1
2019	5
2018	8
2017	8
2016	5
