How many parameters are added to the AVG model?

The STAN models compute data-dependent attention weights and implement the channel scoring functions $Z_i$ with 20 LSTM units followed by a single dense unit with a SELU non-linearity (Eq. 5), resulting in 11k additional parameters over the AVG model (+0.09% relative).
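The parameter count of one scoring function can be checked with the standard layer formulas: an LSTM with $h$ units on $d$-dimensional input has $4(h(d+h)+h)$ weights and biases (four gates, each with input weights, recurrent weights and one bias vector), and a single dense output unit adds $h+1$. This is a minimal sketch: the input dimensionality `d` is hypothetical because it is not stated here, and the single-bias convention used by Keras is assumed. Frameworks that keep two bias vectors per gate, such as PyTorch, count $4h$ more.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Trainable parameters of an LSTM layer with one bias vector per gate (Keras convention)."""
    # 4 gates (input, forget, cell, output), each with input weights,
    # recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Weights plus biases of a fully connected layer (the SELU activation adds no parameters)."""
    return input_dim * units + units


def scoring_function_params(input_dim: int, lstm_units: int = 20) -> int:
    """LSTM with `lstm_units` units followed by a single dense output unit."""
    return lstm_params(input_dim, lstm_units) + dense_params(lstm_units, 1)


if __name__ == "__main__":
    d = 40  # hypothetical feature dimensionality of one channel
    print(scoring_function_params(d))  # 4901
```

With the hypothetical $d = 40$, one scoring function has 4,880 + 21 = 4,901 parameters. Separately, the stated figures imply the approximate size of the AVG baseline: 11k / 0.0009 ≈ 12 million parameters (approximate, since 11k is rounded).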

What is the way to improve the design of a DNN?

Their proposed design gains invariance to channel order, and it is simplified by using long short-term memory (LSTM) and dense units instead of a custom-designed neural network cell.

How does the STAN model reduce the relative CER?

Compared to a two-sensor model that weighs both sensors equally, the equivalent STAN model has only a relative parameter increase of 0.09%, but reduces the relative CER by up to 19.1% on the CHiME-4 dataset.

What is the noise level of the random walk process?

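$$q(k) = \sum_{i=1}^{k} n_i, \quad n_i \sim \mathcal{N}(0, 1) \tag{6}$$

$$\sigma(k) = \sigma_{\max} \cdot \frac{q(k) - \min\{q_1, \dots, q_K\}}{\max\{q_1, \dots, q_K\} - \min\{q_1, \dots, q_K\}} \tag{7}$$

The resulting random walk process yields an average noise level of $\mathbb{E}[\sigma(k)] = \sigma_{\max}/2$.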
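Eqs. (6) and (7) translate directly into code: a cumulative sum of standard normal steps, min-max rescaled to $[0, \sigma_{\max}]$. This is a minimal standard-library sketch; the function name and the value of `sigma_max` are illustrative assumptions. Because flipping the sign of every step maps the rescaled walk $u(k)$ to $1 - u(k)$, the expected level is $\sigma_{\max}/2$, which the usage loop checks empirically.

```python
import random


def random_walk_noise_levels(num_frames: int, sigma_max: float, rng: random.Random) -> list[float]:
    """Noise levels sigma(k), k = 1..K, following Eqs. (6) and (7)."""
    # Eq. (6): q(k) is the running sum of i.i.d. N(0, 1) steps n_1, ..., n_k
    q = []
    running = 0.0
    for _ in range(num_frames):
        running += rng.gauss(0.0, 1.0)
        q.append(running)
    # Eq. (7): min-max rescale the whole walk to the range [0, sigma_max]
    q_min, q_max = min(q), max(q)
    if q_max == q_min:
        raise ValueError("the walk needs at least two distinct values to be rescaled")
    return [sigma_max * (v - q_min) / (q_max - q_min) for v in q]


if __name__ == "__main__":
    rng = random.Random(0)
    sigma_max = 2.0  # illustrative maximum noise level
    walks = (random_walk_noise_levels(100, sigma_max, rng) for _ in range(2000))
    walk_means = [sum(w) / len(w) for w in walks]
    print(sum(walk_means) / len(walk_means))  # close to sigma_max / 2 = 1.0
```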

What are the properties of the STAN-2CH attention mechanism?

These properties make the model useful for multi-sensor systems on real-world robotic platforms (e.g. rescue robots [2], [3]) because the attention weights can help to identify failing sensors for replacement or sub-optimal sensors for removal in order to save hardware, computation and energy resources.
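One simple way to act on this, sketched here as an assumption rather than a procedure from the paper, is to average each sensor's attention weights over many frames and flag sensors whose mean falls well below the uniform weight $1/N$. The function name, the weights and the threshold rule are hypothetical.

```python
def flag_low_attention_sensors(weights: list[list[float]], threshold: float) -> list[int]:
    """Indices of sensors whose mean attention weight over all frames is below `threshold`.

    weights[k][i] is the attention weight of sensor i at frame k.
    """
    num_frames = len(weights)
    num_sensors = len(weights[0])
    means = [sum(frame[i] for frame in weights) / num_frames for i in range(num_sensors)]
    return [i for i, m in enumerate(means) if m < threshold]


if __name__ == "__main__":
    # Made-up weights for three frames of a three-sensor setup
    weights = [[0.45, 0.45, 0.10], [0.50, 0.40, 0.10], [0.40, 0.50, 0.10]]
    # Hypothetical rule: flag sensors below half of the uniform weight 1/N
    print(flag_low_attention_sensors(weights, threshold=0.5 / 3))  # [2]
```

A sensor flagged this way would still need inspection; low attention may reflect a failing sensor or simply a consistently less informative one.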

What is the CER improvement of the BEAMFORMIT-5CH model?

The BEAMFORMIT-5CH model achieves the lowest overall error rates with relative CER improvements of 8.7% to 10.2% over STAN-5CH and 5.8% to 8.6% over STAN-2CH.
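Relative improvements of this kind are conventionally computed as $(\mathrm{CER}_{\mathrm{ref}} - \mathrm{CER}_{\mathrm{new}}) / \mathrm{CER}_{\mathrm{ref}}$. The sketch below assumes that definition; its example CER values are hypothetical and not taken from Table II.

```python
def relative_improvement(reference_cer: float, new_cer: float) -> float:
    """Relative CER reduction, in percent of the reference model's CER."""
    if reference_cer <= 0.0:
        raise ValueError("the reference CER must be positive")
    return (reference_cer - new_cer) / reference_cer * 100.0


if __name__ == "__main__":
    # Hypothetical CERs in percent, not values from Table II
    print(round(relative_improvement(10.0, 9.0), 1))  # 10.0
```

Note that a relative improvement depends on the reference: the same absolute CER gap yields a larger percentage against a lower reference CER.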

What is the difference between the two models?

Both models allow the re-use of the same classifier because the merged feature dimensionality does not change with the number of input channels.
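The channel-count independence can be seen in a minimal merge sketch: both equal weighting (AVG-style) and attention weighting form a weighted sum of per-channel vectors, so the merged vector keeps the feature dimension however many channels enter. The softmax normalization of scores and the function names are assumptions for illustration; the transformation layers and scoring networks of the actual models are omitted.

```python
import math
from typing import Optional


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: turns per-channel scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def merge_frame(features: list[list[float]], scores: Optional[list[float]] = None) -> list[float]:
    """Merge N per-channel feature vectors of one frame into a single vector.

    scores=None weights all channels equally (AVG-style); otherwise the scores
    are softmax-normalised into attention weights (assumed STAN-style weighting).
    """
    n = len(features)
    weights = [1.0 / n] * n if scores is None else softmax(scores)
    dim = len(features[0])
    # Weighted sum over channels, computed separately for each feature dimension
    return [sum(w * f[j] for w, f in zip(weights, features)) for j in range(dim)]


if __name__ == "__main__":
    two = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
    five = two + [[0.0, 0.0, 0.0]] * 3
    print(len(merge_frame(two)), len(merge_frame(five, scores=[0.1, 0.2, 0.3, 0.4, 0.5])))  # 3 3
```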

What is the average attention weight assigned to channel 5?

The average attention weight assigned to channel 5 is 3.5x higher than for the noisy channel 2: $\bar{\alpha}^5 = 3.5 \cdot \bar{\alpha}^2$, and the relation $\alpha_k^5 > \alpha_k^2$ holds true for 94.2% of all $K = 985619$ frames.
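Both statistics reduce to simple per-frame comparisons. The sketch below uses a hypothetical function name and made-up weights; it returns the ratio of mean attention weights (3.5 for channels 5 and 2 above) and the fraction of frames in which one channel receives more attention than the other (94.2% above).

```python
def attention_comparison(alpha_a: list[float], alpha_b: list[float]) -> tuple[float, float]:
    """Compare per-frame attention weights of two channels a and b.

    Returns (mean weight of a / mean weight of b,
             fraction of frames where a receives strictly more attention than b).
    """
    if not alpha_a or len(alpha_a) != len(alpha_b):
        raise ValueError("need two non-empty weight sequences of equal length")
    mean_a = sum(alpha_a) / len(alpha_a)
    mean_b = sum(alpha_b) / len(alpha_b)
    wins = sum(1 for a, b in zip(alpha_a, alpha_b) if a > b)
    return mean_a / mean_b, wins / len(alpha_a)


if __name__ == "__main__":
    # Made-up weights for four frames, not data from the paper
    ratio, frac = attention_comparison([0.5, 0.5, 0.25, 0.5], [0.25, 0.25, 0.5, 0.25])
    print(round(ratio, 2), frac)  # 1.4 0.75
```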

What is the funding agreement for this work?

This work was partially supported by Samsung Advanced Institute of Technology and the European Union’s Horizon 2020 research and innovation program under grant agreement No 644732.

Why are the scoring functions not shared?

Because the sensors provide distinct feature modalities, the parameters of the scoring functions are not shared, i.e. $\theta_{Z_1} \neq \theta_{Z_2}$.

Open Access · Proceedings Article · DOI: 10.1109/IJCNN.2019.8852396

Attention-driven Multi-sensor Selection

Stefan Braun, +4 more

- 14 Jul 2019

- pp 1-8

TL;DR: This work reports on a sensor transformation attention network (STAN) that embeds a sensory attention mechanism to dynamically weigh and combine individual input sensors based on their task-relevant information.

Abstract: Recent encoder-decoder models for sequence-to-sequence mapping show that integrating both temporal and spatial attention mechanisms into neural networks considerably improves network performance. The use of attention for sensor selection in multi-sensor setups and the benefit of such an attention mechanism are less studied. This work reports on a sensor transformation attention network (STAN) that embeds a sensory attention mechanism to dynamically weigh and combine individual input sensors based on their task-relevant information. We demonstrate the correlation of the attentional signal with changing noise levels of each sensor on the audio-visual GRID dataset with synthetic noise, and on CHiME-4, a multi-microphone real-world noisy dataset. In addition, we demonstrate that the STAN model is able to deal with sensor removal and addition without retraining, and is invariant to channel order. Compared to a two-sensor model that weighs both sensors equally, the equivalent STAN model has a relative parameter increase of only 0.09%, but reduces the relative character error rate (CER) by up to 19.1% on the CHiME-4 dataset. The attentional signal helps to identify a lower SNR sensor with up to 94.2% accuracy.

Most frequently asked questions

1. What have the authors contributed in "Attention-driven multi-sensor selection"?

The use of attention for sensor selection in multi-sensor setups and the benefit of such an attention mechanism are less studied. This work reports on a sensor transformation attention network (STAN) that embeds a sensory attention mechanism to dynamically weigh and combine individual input sensors based on their task-relevant information. The authors demonstrate the correlation of the attentional signal with changing noise levels of each sensor on the audio-visual GRID dataset with synthetic noise, and on CHiME-4, a multi-microphone real-world noisy dataset. In addition, the authors demonstrate that the STAN model is able to deal with sensor removal and addition without retraining, and is invariant to channel order.

2. How long does it take to generate the enhanced output from the five input channels?

In order to generate the enhanced output from the five input channels on a sample of average length 6 s, the beamforming algorithm takes 3554 ms (CPU), while the attention mechanism of STAN-5CH takes only 195 ms (CPU) or 25 ms (GPU), i.e. 25x to 142x faster (Skylake Xeon CPU at 4.3 GHz, GTX 1080 GPU).

3. What is the purpose of the random walk noise model?

The random walk noise model adds noise with a time-varying noise level, σ(k), to each sensor and is used for both training and testing.
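Applying the noise can be sketched as adding zero-mean Gaussian noise whose standard deviation changes from frame to frame. This assumes additive Gaussian corruption of the feature vectors, which is not spelled out here; each sensor would receive its own, independently drawn sequence of noise levels. The function name and the example values are hypothetical.

```python
import random


def add_time_varying_noise(
    frames: list[list[float]], sigmas: list[float], rng: random.Random
) -> list[list[float]]:
    """Corrupt one sensor's feature frames with zero-mean Gaussian noise.

    Frame k receives noise with standard deviation sigmas[k], i.e. sigma(k).
    """
    if len(frames) != len(sigmas):
        raise ValueError("exactly one noise level per frame is required")
    return [[x + rng.gauss(0.0, sigma) for x in frame] for frame, sigma in zip(frames, sigmas)]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [[0.0, 0.0]] * 3  # hypothetical 2-dimensional features for 3 frames
    # Noise-free first frame, increasingly noisy later frames
    print(add_time_varying_noise(clean, [0.0, 0.5, 2.0], rng))
```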

4. What is the scoring function for each channel?

Because the input channels are of the same modality, the authors apply the same scoring function $Z$ to each channel $i$; therefore $\theta_{Z_1} = \dots = \theta_{Z_N}$.
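Sharing one scoring function across channels is what makes the merge invariant to channel order: permuting the channels permutes the scores and weights identically, and the weighted sum is unchanged. The sketch substitutes a hypothetical linear scoring function for the paper's LSTM and dense unit and assumes softmax-normalized weights; it only illustrates the invariance argument.

```python
import math


def shared_score(frame: list[float], weight: list[float], bias: float) -> float:
    """Hypothetical linear scoring function (stand-in for the LSTM + dense unit)."""
    return sum(w * x for w, x in zip(weight, frame)) + bias


def merge_with_shared_scoring(channels: list[list[float]], weight: list[float], bias: float) -> list[float]:
    """Score every channel with the same parameters, softmax the scores, sum the weighted channels."""
    scores = [shared_score(c, weight, bias) for c in channels]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    return [sum(a * c[j] for a, c in zip(alphas, channels)) for j in range(len(channels[0]))]


if __name__ == "__main__":
    channels = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
    weight, bias = [0.2, -0.1], 0.0
    original = merge_with_shared_scoring(channels, weight, bias)
    permuted = merge_with_shared_scoring([channels[2], channels[0], channels[1]], weight, bias)
    # Equal up to floating-point rounding, because the summation order differs
    print(all(math.isclose(a, b) for a, b in zip(original, permuted)))  # True
```

With unshared parameters, as for the distinct modalities in the two-sensor case, swapping inputs would pair each channel with a different scoring function, so this invariance would not hold in general.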

TABLE I: Results of the GRID experiments, averaged over 10 runs. All values are reported in the format mean ± standard deviation. The ATTCORR values are not computed for the hi-lo noise because the correlation function is not defined for constant functions. The lowest WER is printed bold. The ATTCORR and ATTACC values are rescaled to the range [−100, 100] in the interest of readability.
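The remark about hi-lo noise follows from the definition of correlation: a constant sequence has zero variance, so the normalizing denominator vanishes. The sketch assumes ATTCORR is a Pearson correlation between a channel's attention weights and its noise levels, scaled by 100 to reach the stated [−100, 100] range; the exact definition is not given in this section, and the function names are hypothetical.

```python
import math
from typing import Optional


def pearson(x: list[float], y: list[float]) -> Optional[float]:
    """Pearson correlation of two equally long sequences; None if either sequence is constant."""
    if len(set(x)) == 1 or len(set(y)) == 1:
        return None  # undefined: a constant sequence (e.g. fixed hi-lo noise) has zero variance
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def attcorr_like(attention: list[float], noise: list[float]) -> Optional[float]:
    """Correlation rescaled to [-100, 100]; an assumed reading of ATTCORR, not its official definition."""
    r = pearson(attention, noise)
    return None if r is None else 100.0 * r


if __name__ == "__main__":
    print(attcorr_like([0.9, 0.5, 0.1], [0.0, 1.0, 2.0]))  # strongly negative
    print(attcorr_like([0.9, 0.5, 0.1], [1.0, 1.0, 1.0]))  # None
```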

Fig. 2: Attention response on a randomly selected sample from the multi-modal GRID dataset. The top row depicts the noise levels applied to each input, and the bottom row depicts the attention weights computed by STAN. The green bars indicate frames where the relative SNR value of the correct sensor is identified. (a) shows the response to random walk noise resulting in ATTACC of 72%. Note how the attention weights dynamically change, mostly in correlation with the noise level. (b) and (c) show responses to cross and hi-lo noise, with ATTACCs of 92% and 100% respectively.

Fig. 3: Operation of STAN-2CH on a sample with channel configuration (1/2/3/4/5/6). (a) Filterbank features for the 6 input channels and the merged representation. (b) Attention weights $\alpha_k^i$ for the 6 input channels. The attention weights show three distinct tiers: the cleanest channels (1, 4, 5, 6) are assigned the highest attention weights, which are roughly equal for all 4 channels. The weights of the noisy channel 2 lie between those of channels (1, 4, 5, 6) and those of the highly corrupted channel 3 (an isolated case of microphone failure). The merged representation appears hardly corrupted by channels 2 and 3.

Fig. 1: STAN architecture for a setup with two input sensors. The input feature vectors $f_k^i$ are transformed and then weighted and summed to generate the merged representation $m_k$ that is used for classification. The sensory attention mechanism dynamically adapts its attention weights to create a cleaner merged representation.

TABLE II: Results for the CHiME-4 multi-channel ASR experiments. The CER [%] is given for the et05_real and dt05_real subsets. The attention weights for STAN-2CH and STAN-5CH are averaged over all frames of the dt05_real subset. The lowest CER and highest attention weight are printed bold. All models are trained and tested on matched channel configurations, and the CONCAT, AVG and STAN-2CH models are additionally tested on new channel configurations without re-training.

Citations

Journal Article · 10.3389/FNINS.2020.00637

Hand-Gesture Recognition Based on EMG and Event-Based Camera Sensor Fusion: A Benchmark in Neuromorphic Computing

Enea Ceolini, +7 more

- 05 Aug 2020

- Frontiers in Neuroscience

TL;DR: This paper presents a fully neuromorphic sensor fusion approach for hand-gesture recognition comprised of an event-based vision sensor and three different neuromorphic processors, and designed specific spiking neural networks for sensor fusion that showed classification accuracy comparable to the software baseline.


Journal Article · 10.1155/2022/4718684

Recognition of Hand Gesture Using Electromyography Signal: Human-Robot Interaction

S. L. Aarthy, +5 more

- 11 Jul 2022

- Journal of Sensors

TL;DR: A unique strategy for combining the advantages of depth vision learning with EMG-based hand gesture detection was developed, capable of automatically categorizing the class of the obtained EMG data using ensemble learning without considering the hand motion sequence.


Posted Content

Brain-inspired self-organization with cellular neuromorphic computing for multimodal unsupervised learning

Lyes Khacef, +2 more

- 11 Apr 2020

- arXiv: Neural and Evolutionary Computing

TL;DR: The Reentrant Self-Organizing Map (ReSOM) is a brain-inspired neural system based on the reentry theory, using self-organizing maps and Hebbian-like learning.


Dissertation · 10.3929/ETHZ-B-000393230

Parameter Uncertainty and Multi-sensor Attention Models for End-to-end Speech Recognition

Stefan Braun

- 01 Jan 2019

TL;DR: This thesis aims to advance the state of the art in end-to-end models for ASR, with a focus on improving noise robustness and model interpretability.


10.1017/s1380203821000234

Ceci N’est Pas Un Subalterne. A Comment on Indigenous Erasure in Ontology-Related Archaeologies

Beatriz Marin-Aguilera

TL;DR: This essay critiques ontology-related archaeologies for perpetuating Western imperialism by extracting, depoliticizing, and re-signifying Indigenous knowledge, highlighting the need to decolonize archaeological scholarship and prioritize Indigenous perspectives and epistemologies.


References

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.


Journal Article · 10.1162/NECO.1997.9.8.1735

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997

- Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.


Posted Content

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 22 Dec 2014

- arXiv: Learning

TL;DR: Adaptive estimates of lower-order moments are used for first-order gradient-based optimization of stochastic objective functions.


Journal Article · 10.1109/5.726791

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

- 01 Jan 1998

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.


Gradient-based learning applied to document recognition

Yann LeCun, +7 more

- 01 Jan 2001

TL;DR: This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task, and Convolutional neural networks are shown to outperform all other techniques.


...

Related Papers (5)

Distributed Multi-task Learning for Sensor Network

Behavioral feature recognition of multi-task compressed sensing with fusion relevance in the Internet of Things environment

Across-Sensor Feature Learning for Energy-Efficient Activity Recognition on Mobile Devices

A Power-Performance Approach to Comparing Sensor Families, with application to comparing neuromorphic to traditional vision sensors

Wireless sensor networks for acoustic monitoring