About: Monaural is a research topic. Over its lifetime, 2043 publications have been published within this topic, receiving 49176 citations. The topic is also known as: monophonic & monaural.
TL;DR: This work covers the physics of the external ear (transfer functions of the external ear, area function and termination of the ear canal, analysis of transfer characteristics); the evaluation of monaural attributes of ear input signals; lateralization and multiple auditory events; summing localization and the law of the first wavefront; inhibition of the primary sound; two sound sources radiating partially coherent or incoherent signals (the influence of the degree of coherence, binaural signal detection); and more than two sound sources and diffuse sound fields.
Abstract: Part 1, Introduction: auditory events and auditory space; systems analysis of the auditory experiment; remarks concerning experimental procedures (psychometric methods, signals and sound fields, probe microphones).
Part 2, Spatial hearing with one sound source: localization and localization blur; the sound field at the two ears (propagation in the ear canal, the pinna and the effect of the head, transfer functions of the external ear); evaluating identical ear input signals (directional hearing in the median plane, distance hearing and inside-the-head locatedness); evaluating nonidentical ear input signals (interaural time differences, interaural level differences, the interaction of interaural time and level differences); additional parameters (motional theories, bone conduction, visual, vestibular and tactile theories).
Part 3, Spatial hearing with multiple sound sources and in enclosed spaces: two sound sources radiating coherent signals (summing localization, the law of the first wavefront, inhibition of the primary sound); two sound sources radiating partially coherent or incoherent signals (the influence of the degree of coherence, binaural signal detection); more than two sound sources and diffuse sound fields.
Part 4, Progress and trends since 1972: preliminary remarks; the physics of the external ear (transfer functions of the external ear, area function and termination of the ear canal, analysis of transfer characteristics); evaluation of monaural attributes of the ear input signals; evaluation of interaural attributes of the ear input signals (lateralization and multiple auditory events; summing localization and the law of the first wavefront; binaural localization, signal detection, and speech recognition in the presence of interfering noise; models of binaural signal processing); examples of applications (the auditory spatial impression, dummy-head stereophony).
Part 5, Progress and trends since 1982: preliminary remarks; binaural room simulation and auditory virtual reality; binaural signal processing and speech enhancement; the precedence effect: a case of cognition.
TL;DR: The joint optimization of the deep learning models (deep neural networks and recurrent neural networks) with an extra masking layer, which enforces a reconstruction constraint, is proposed to enhance the separation performance of monaural speech separation models.
Abstract: Monaural source separation is useful for many real-world applications, though it is a challenging problem. In this paper, we study deep learning for monaural speech separation. We propose the joint optimization of the deep learning models (deep neural networks and recurrent neural networks) with an extra masking layer, which enforces a reconstruction constraint. Moreover, we explore a discriminative training criterion for the neural networks to further enhance the separation performance. We evaluate our approaches using the TIMIT speech corpus for a monaural speech separation task. Our proposed models achieve about 3.8 to 4.9 dB SIR gain compared to NMF models, while maintaining better SDRs and SARs.
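The masking layer described above can be illustrated with a minimal sketch. It assumes the layer normalizes the magnitudes of the two network outputs into complementary soft time-frequency masks and multiplies them with the mixture spectrum. Because the two masks sum to 1 in every frequency bin, the two source estimates always add back up to the mixture, which is one way to read "enforces a reconstruction constraint". The function name `soft_mask_layer` and the plain-list frame representation are hypothetical, not taken from the paper.

```python
from typing import List, Tuple


def soft_mask_layer(
    y1_hat: List[float], y2_hat: List[float], mixture: List[float]
) -> Tuple[List[float], List[float]]:
    """Hypothetical masking layer for one spectrogram frame.

    y1_hat, y2_hat: raw network outputs for source 1 and source 2
    mixture:        magnitude spectrum of the mixture (same length)
    Returns masked estimates for the two sources.
    """
    est1, est2 = [], []
    for a, b, m in zip(y1_hat, y2_hat, mixture):
        denom = abs(a) + abs(b)
        # Soft mask for source 1; split evenly if both outputs are zero.
        mask1 = abs(a) / denom if denom > 0 else 0.5
        # Complementary mask, so mask1 + mask2 == 1 in every bin.
        mask2 = 1.0 - mask1
        est1.append(mask1 * m)
        est2.append(mask2 * m)
    return est1, est2


if __name__ == "__main__":
    e1, e2 = soft_mask_layer([3.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 2.0, 0.0])
    print(e1, e2)
```

For that example frame the masks for source 1 are 0.75, 0.0 and 0.5. The estimates therefore split the mixture bins as 0.75/0.25, 0.0/2.0 and 0.0/0.0.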
TL;DR: In this paper, a joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks was proposed. It achieved 2.30 to 4.98 dB SDR gain compared to NMF models in the speech separation task and 4.32 to 5.42 dB GSIR gain compared to existing models in the singing voice separation task, and it outperformed NMF and DNN baselines in the speech denoising task.
Abstract: Monaural source separation is important for many real-world applications. It is challenging because, with only a single channel of information available and without any constraints, an infinite number of solutions are possible. In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including monaural speech separation, monaural singing voice separation, and speech denoising. The joint optimization of the deep recurrent neural networks with an extra masking layer enforces a reconstruction constraint. Moreover, we explore a discriminative criterion for training neural networks to further enhance the separation performance. We evaluate the proposed system on the TSP, MIR-1K, and TIMIT datasets for speech separation, singing voice separation, and speech denoising tasks, respectively. Our approaches achieve 2.30 to 4.98 dB SDR gain compared to NMF models in the speech separation task, 2.30 to 2.48 dB GNSDR gain and 4.32 to 5.42 dB GSIR gain compared to existing models in the singing voice separation task, and outperform NMF and DNN baselines in the speech denoising task.
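The discriminative criterion mentioned above can be sketched as a loss function. The form used here is an assumption: a commonly described form rather than one quoted from the paper. It keeps the usual squared error between each estimate and its own reference source. It then subtracts a term, weighted by a factor gamma, that measures how far each estimate lies from the competing source. Lowering the loss therefore also pushes each estimate away from the other source. The names `discriminative_loss` and `squared_error` are hypothetical, and gamma is left as a required argument because no value is given in the text.

```python
from typing import List


def squared_error(a: List[float], b: List[float]) -> float:
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def discriminative_loss(
    est1: List[float],
    est2: List[float],
    ref1: List[float],
    ref2: List[float],
    gamma: float,
) -> float:
    """Assumed discriminative objective for two-source separation.

    own:   error of each estimate against its own reference source
    cross: error of each estimate against the *other* reference source
    A larger cross error (estimates unlike the competing source) lowers the loss.
    With gamma = 0 this reduces to a plain squared-error objective.
    """
    own = squared_error(est1, ref1) + squared_error(est2, ref2)
    cross = squared_error(est1, ref2) + squared_error(est2, ref1)
    return own - gamma * cross


if __name__ == "__main__":
    print(discriminative_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], gamma=0.05))
```

In this example the estimates match their references exactly, so `own` is 0 and `cross` is 4. The loss is -0.05 × 4 = -0.2.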
TL;DR: Five bilateral cochlear implant users were tested for their localization abilities and speech understanding in noise. They also participated in lateralization tasks to assess the impact of variations in interaural time delays (ITDs) and interaural level differences (ILDs) for electrical pulse trains under direct computer control.
Abstract: Five bilateral cochlear implant users were tested for their localization abilities and speech understanding in noise, for both monaural and binaural listening conditions. They also participated in lateralization tasks to assess the impact of variations in interaural time delays (ITDs) and interaural level differences (ILDs) for electrical pulse trains under direct computer control. The localization task used pink noise bursts presented from an eight-loudspeaker array spanning an arc of approximately 108° in front of the listeners at ear level (0° elevation). Subjects showed large benefits from bilateral device use compared to either side alone. Typical root-mean-square (rms) errors, averaged across all eight loudspeakers in the array, were about 10° for bilateral device use and ranged from 20° to 60° using either ear alone. Speech reception thresholds (SRTs) were measured for sentences presented from directly in front of the listeners (0°) in spectrally matching speech-weighted noise at 0°, +90°, or −90°. These measurements covered the four of the five subjects who could perform the task. For noise to either side, bilateral device use showed a substantial benefit over unilateral device use when noise was ipsilateral to the unilateral device. This was primarily because of monaural head-shadow effects, which resulted in robust SRT improvements (P<0.001) of about 4 to 5 dB when ipsilateral and contralateral noise positions were compared. The additional benefit of using both ears compared to the shadowed ear (i.e., binaural unmasking) was only 1 or 2 dB and less robust (P=0.04). Results from the lateralization studies showed consistently good sensitivity to ILDs. For some subjects, this sensitivity was better than the smallest level adjustment available in the implants (0.17 dB). Sensitivity to ITDs, on the other hand, was moderate, typically on the order of 100 μs.
ITD sensitivity deteriorated rapidly when the stimulation rate of unmodulated pulse trains increased above a few hundred Hz. At 800 pps, however, sensitivity was comparable to that for 50-pps pulse trains when a 50-Hz modulation was applied. In our opinion, these results clearly demonstrate that important benefits are available from bilateral implantation, both for localizing sounds (in quiet) and for listening in noise when signal and noise sources are spatially separated. The data do indicate, however, that effects of interaural timing cues are weaker than those of interaural level cues. According to our psychophysical findings, these timing effects also rely on the availability of low-rate information below a few hundred Hz.
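The rms localization error reported above can be computed with a short sketch. This version pools all trials into a single root-mean-square difference between the azimuth of the loudspeaker that played each burst and the azimuth the listener reported. The abstract does not specify exactly how the study aggregated errors across loudspeakers and repetitions, so this pooled form is an assumption. The function name `rms_error_deg` and the example azimuths are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def rms_error_deg(targets: Sequence[float], responses: Sequence[float]) -> float:
    """Pooled root-mean-square localization error in degrees.

    targets:   azimuth of the loudspeaker that played each stimulus
    responses: azimuth the listener reported for the same trial
    """
    if len(targets) != len(responses) or len(targets) == 0:
        raise ValueError("need equal-length, non-empty sequences")
    squared = [(r - t) ** 2 for t, r in zip(targets, responses)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical trials: errors of +10°, 0° and +10°.
    print(rms_error_deg([-54.0, 0.0, 54.0], [-44.0, 0.0, 64.0]))
```

Squaring the errors before averaging keeps errors of opposite sign from cancelling. Large misses also weigh more heavily than in a mean absolute error.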
TL;DR: The virtual auditory space technique was used to quantify the relative strengths of interaural time difference (ITD), interaural level difference (ILD), and spectral cues in determining the perceived lateral angle of wideband, low-pass, and high-pass noise bursts.
Abstract: The virtual auditory space technique was used to quantify the relative strengths of interaural time difference (ITD), interaural level difference (ILD), and spectral cues in determining the perceived lateral angle of wideband, low-pass, and high-pass noise bursts. Listeners reported the apparent locations of virtual targets that were presented over headphones and filtered with listeners' own directional transfer functions. The stimuli were manipulated by delaying or attenuating the signal to one ear (by up to 600 μs or 20 dB) or by altering the spectral cues at one or both ears. Listener weighting of the manipulated cues was determined by examining the resulting localization response biases. In accordance with the Duplex Theory defined for pure tones, listeners gave high weight to ITD and low weight to ILD for low-pass stimuli, and high weight to ILD for high-pass stimuli. Most (but not all) listeners gave low weight to ITD for high-pass stimuli. This weight could be increased by amplitude-modulating the stimuli or reduced by lengthening stimulus onsets. For wideband stimuli, the ITD weight was greater than or equal to that given to ILD. Manipulations of monaural spectral cues and the interaural level spectrum had little influence on lateral angle judgements.
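The ITD and ILD manipulations described above, which delay or attenuate the signal to one ear, can be illustrated with a minimal sketch under stated assumptions. The delay is rounded to a whole number of samples, although the study may have used finer fractional delays. The listener-specific directional transfer-function filtering is omitted. The function name `impose_itd_ild` is hypothetical.

```python
from typing import List, Tuple


def impose_itd_ild(
    signal: List[float], fs: float, itd_us: float, ild_db: float
) -> Tuple[List[float], List[float]]:
    """Build (near, far) ear signals from one mono signal.

    The far-ear copy is delayed by itd_us microseconds (rounded to whole
    samples, a simplification) and attenuated by ild_db decibels.
    """
    if itd_us < 0 or ild_db < 0:
        raise ValueError("itd_us and ild_db are magnitudes applied to the far ear")
    delay = round(itd_us * 1e-6 * fs)  # microseconds -> samples
    gain = 10 ** (-ild_db / 20)        # level difference in dB -> amplitude ratio
    near = list(signal) + [0.0] * delay              # pad to equal length
    far = [0.0] * delay + [gain * x for x in signal]
    return near, far


if __name__ == "__main__":
    near, far = impose_itd_ild([1.0, 0.5], fs=50_000, itd_us=600, ild_db=20)
    print(len(near), len(far), far[30:])
```

At a 50 kHz sampling rate, the study's largest delay of 600 μs corresponds to exactly 30 samples. Its largest attenuation of 20 dB corresponds to an amplitude factor of 0.1.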