A Vision and Speech Enabled, Customizable, Virtual Assistant for Smart Environments

doi:10.1109/HSI.2018.8431232

Proceedings Article • 10.1109/HSI.2018.8431232


Giancarlo Iannizzotto, +3 more

- 04 Jul 2018

- pp 50-56

Cited by 65

TL;DR: The authors combine some of the most advanced techniques in computer vision, deep learning, speech generation and recognition, and artificial intelligence into a virtual assistant architecture for smart home automation systems; the resulting assistant is effective, resource-efficient, interactive and customizable.

Abstract: Recent developments in smart assistants and smart home automation are attracting the interest and curiosity of consumers and researchers. Speech-enabled virtual assistants (often called smart speakers) offer a wide variety of network-oriented services and, in some cases, can connect to smart environments, enhancing them with new and effective user interfaces. However, such devices also reveal new needs and some weaknesses. In particular, they are faceless and blind assistants: they cannot show a face, and therefore an emotion, and they cannot ‘see’ the user. As a consequence, the interaction is impaired and, in some cases, ineffective. Moreover, most of these devices rely heavily on cloud-based services, transmitting potentially sensitive data to remote servers. To overcome these issues, in this paper we combine some of the most advanced techniques in computer vision, deep learning, speech generation and recognition, and artificial intelligence into a virtual assistant architecture for smart home automation systems. The proposed assistant is effective, resource-efficient, interactive and customizable, and the prototype runs on a low-cost, small-sized Raspberry Pi 3 device. For testing purposes, the system was integrated with an open source home automation environment and ran for several days while people were encouraged to interact with it; it proved to be accurate, reliable and appealing.
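The abstract describes the architecture only at a high level. As a rough illustration of handling both vision and speech events on the device itself, so that no audio or video needs to reach a remote server, the Python sketch below dispatches a recognized face and a transcribed utterance to home-automation actions without any network call. Everything in it (the `Event` and `LocalAssistant` names, the exact-phrase intent table, the reply strings) is a hypothetical assumption, not the authors' implementation or the API of the home automation environment they used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Event:
    """Hypothetical event produced by an on-device recognizer."""
    source: str   # "vision" (recognized face) or "speech" (transcribed utterance)
    payload: str  # user name for vision events, utterance text for speech events


@dataclass
class LocalAssistant:
    """Illustrative dispatcher that handles every event locally (no network calls)."""
    current_user: Optional[str] = None
    # Hypothetical table mapping an exact, lower-cased phrase to a home-automation action.
    intents: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def register(self, phrase: str, action: Callable[[], str]) -> None:
        self.intents[phrase.strip().lower()] = action

    def handle(self, event: Event) -> str:
        if event.source == "vision":
            # Seeing the user lets the assistant address them by name.
            self.current_user = event.payload
            return f"Hello, {event.payload}!"
        if event.source == "speech":
            action = self.intents.get(event.payload.strip().lower())
            if action is None:
                return "Sorry, I did not understand."
            who = self.current_user or "there"
            return f"OK {who}, {action()}"
        return "Unknown event source."
```

A real system would place speech recognition and intent parsing in front of the exact-phrase lookup and forward each action to the home automation controller; both steps are left out of this sketch.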

Citations

Journal Article • 10.4108/airo.v1i.19

Deep Learning Application Pros And Cons Over Algorithm

Ata Jahangir Moshayedi, +3 more

- 20 Feb 2022

- EAI endorsed transactions on artificial ...

TL;DR: This review gives a general overview of deep learning as a new concept, together with its growing benefits and popularity, to help researchers and students interested in deep learning methods.

Cited by 84

Journal Article • 10.33022/ijcs.v12i1.3146

Voice Assistant Integrated with Chat GPT

Abdulla Shafeeg, +4 more

- 28 Feb 2023

- Indonesian Journal of Computer Science

TL;DR: Farcana combines the functionality of a GPT chatbot and a voice assistant, offering players a new way to familiarize themselves with game mechanics and general account management.

Cited by 53

Proceedings Article • 10.1109/AIVR46125.2019.00013

Effects of Patient Care Assistant Embodiment and Computer Mediation on User Experience

Kangsoo Kim, +5 more

- 01 Dec 2019

TL;DR: The results show that, as expected, a real caregiver provides the optimal user experience but an embodied virtual assistant is also a viable option for patient care environments, providing significantly higher social presence and engagement than voice-only interaction.

Cited by 32

Journal Article • 10.3390/ELECTRONICS9061009

Priority-Based Bandwidth Management in Virtualized Software-Defined Networks

Luca Leonardi, +2 more

- 01 Jun 2020

- Electronics

TL;DR: This work presents the PrioSDN Resource Manager (PrioSDN_RM), a resource management mechanism based on admission control for virtualized SDN-based networks that exploits a priority-based runtime bandwidth distribution mechanism to dynamically react to load changes (e.g., due to alarms).

Cited by 27
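The TL;DR above pairs admission control with a priority-based runtime bandwidth distribution. A generic strict-priority version of that idea can be sketched in Python as below. The `Flow` fields, the Mbit/s units and the rule that spare capacity goes to higher-priority flows first are assumptions for illustration; this is not the PrioSDN_RM algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Flow:
    name: str
    priority: int    # higher value = more important (assumed convention)
    min_rate: float  # guaranteed rate requested at admission, in Mbit/s
    demand: float    # rate the flow currently wants, in Mbit/s


def admit(flows: List[Flow], new: Flow, capacity: float) -> bool:
    """Admission control: accept a new flow only if all guaranteed rates still fit."""
    reserved = sum(f.min_rate for f in flows)
    return reserved + new.min_rate <= capacity


def distribute(flows: List[Flow], capacity: float) -> Dict[str, float]:
    """Grant each flow its guarantee (capped by demand), then hand out the
    spare capacity in strict priority order, up to each flow's demand."""
    alloc = {f.name: min(f.min_rate, f.demand) for f in flows}
    spare = capacity - sum(alloc.values())
    for f in sorted(flows, key=lambda f: f.priority, reverse=True):
        extra = min(f.demand - alloc[f.name], spare)
        if extra > 0:
            alloc[f.name] += extra
            spare -= extra
    return alloc
```

Reacting to a load change, such as an alarm flow whose demand suddenly grows, then amounts to updating that flow's `demand` and calling `distribute` again.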

Journal Article • 10.3390/app13031278

A Perspective on Ethernet in Automotive Communications—Current Status and Future Trends

Lucia Lo Bello, +2 more

- 18 Jan 2023

- Applied Sciences

TL;DR: In this article, the authors provide an overview of Ethernet-based in-car networking, discuss the potential of Ethernet for automotive communications, and outline novel trends and future developments in the field.

Cited by 26

...

References

Proceedings Article • 10.1109/CVPR.2016.90

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 27 Jun 2016

TL;DR: The authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; an ensemble of these residual networks won 1st place in the ILSVRC 2015 classification task.

Cited by 198.7K

Posted Content

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 10 Dec 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

Cited by 117.9K
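Residual learning lets a stack of layers learn a residual F(x) that is added back to its input through an identity shortcut, y = F(x) + x. The Python sketch below shows one such block with a two-layer branch F(x) = W2·relu(W1·x) followed by a ReLU after the addition. As simplifying assumptions, it uses fully connected layers on plain lists instead of convolutions and omits batch normalization and biases, which the actual networks use.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product on plain Python lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """y = relu(F(x) + x) with residual branch F(x) = W2 relu(W1 x).
    The shortcut adds the input back unchanged (identity mapping)."""
    fx = matvec(w2, relu(matvec(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])
```

With all-zero weights the branch contributes nothing and the block reduces to relu(x), which illustrates why an identity-like mapping is easy for a residual block to represent: the layers only need to drive F(x) toward zero.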

Journal Article • 10.1023/B:VISI.0000013087.49260.FB

Robust Real-Time Face Detection

Paul A. Viola, +1 more

- 01 May 2004

- International Journal of Computer Vision

TL;DR: This paper describes a face detection framework that processes images extremely rapidly while achieving high detection rates; implemented on a conventional desktop, it runs at 15 frames per second.

Cited by 14.6K

Proceedings Article • 10.1109/CVPR.2015.7298682

FaceNet: A Unified Embedding for Face Recognition and Clustering

Florian Schroff, +2 more

- 12 Mar 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: FaceNet uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches, and achieves state-of-the-art face recognition performance using only 128 bytes per face.

Cited by 14.2K
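FaceNet maps each face to a point on the unit hypersphere and trains the network with a triplet loss, max(0, ||f(a) - f(p)||^2 - ||f(a) - f(n)||^2 + alpha), so that an anchor lies closer to a positive (same identity) than to a negative by at least the margin alpha (0.2 in the paper). The Python sketch below shows that loss, verification by thresholding the squared distance, and one way to store a 128-dimensional embedding in 128 bytes. The signed-byte quantization scheme and the 1.1 verification threshold are illustrative assumptions, not code or tuned values from the paper.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Project a (non-zero) vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Zero once the positive is closer than the negative by at least `margin`."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


def same_person(a: Sequence[float], b: Sequence[float], threshold: float = 1.1) -> bool:
    """Verification: same identity if the squared distance is below an assumed threshold."""
    return squared_distance(a, b) < threshold


def quantize(v: Sequence[float]) -> bytes:
    """Hypothetical scheme: map each component in [-1, 1] (true for unit-norm
    embeddings) to one signed byte, so 128 dimensions take 128 bytes."""
    return bytes(round(x * 127) & 0xFF for x in v)


def dequantize(b: bytes) -> List[float]:
    """Inverse of `quantize`, up to rounding error."""
    return [(x - 256 if x > 127 else x) / 127 for x in b]
```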

Proceedings Article • 10.1109/ICCV.2001.937709

Robust real-time face detection

Paul A. Viola, +1 more

- 07 Jul 2001

TL;DR: The paper introduces a new image representation, the “Integral Image”, which allows the features used by the detector to be computed very quickly, and a method for combining classifiers in a “cascade” that quickly discards background regions of the image while spending more computation on promising face-like regions.

Cited by 13.8K
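The integral image stores at each position the sum of all pixels above and to the left of it, so the sum over any rectangle takes four lookups regardless of the rectangle's size, and rectangle (Haar-like) features are differences of such sums. The Python sketch below builds an integral image, evaluates a two-rectangle feature, and runs a cascade that stops at the first rejecting stage. The stage functions are hypothetical placeholders: in the paper each stage is a boosted classifier over many features with a tuned threshold.

```python
from typing import Callable, List, Sequence

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Padded table: ii[y][x] = sum of img[0..y-1][0..x-1]; the extra zero
    row and column remove boundary checks in rect_sum."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w-by-h rectangle with top-left pixel (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Haar-like feature: left w-by-h block minus the adjacent right block."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)


# Hypothetical stage signature: (integral image, window x, window y) -> accept?
Stage = Callable[[Image, int, int], bool]


def cascade(ii: Image, x: int, y: int, stages: Sequence[Stage]) -> bool:
    """Evaluate stages in order and reject at the first failure, so most
    background windows only pay for the cheap early stages."""
    for stage in stages:
        if not stage(ii, x, y):
            return False
    return True
```

Because every feature costs a constant number of lookups once the table is built, the detector can evaluate the same features at any position and scale without re-summing pixels.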