TL;DR: By treating each detector as an agent, this work presents the first collaborative multi-agent deep reinforcement learning algorithm to learn the optimal policy for joint active object localization, which effectively exploits such beneficial contextual information.
Abstract: We examine the problem of joint top-down active search of multiple objects under interaction, e.g., person riding a bicycle, cups held by the table, etc. Such objects under interaction often can provide contextual cues to each other to facilitate more efficient search. By treating each detector as an agent, we present the first collaborative multi-agent deep reinforcement learning algorithm to learn the optimal policy for joint active object localization, which effectively exploits such beneficial contextual information. We learn inter-agent communication through cross connections with gates between the Q-networks, which is facilitated by a novel multi-agent deep Q-learning algorithm with joint exploitation sampling. We verify our proposed method on multiple object detection benchmarks. Not only does our model help to improve the performance of state-of-the-art active localization models, it also reveals interesting co-detection patterns that are intuitively interpretable.
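The gated cross connection between Q-networks can be pictured with a small NumPy sketch. This is an illustration under stated assumptions, not the authors' architecture: the function `gated_q_values`, the weight names (`W_msg`, `w_gate`, `W_q`), the layer sizes, and the single scalar gate are all hypothetical, and no training or joint exploitation sampling is shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_q_values(h_self, h_other, W_msg, w_gate, W_q):
    """Q-values for one agent after admitting a gated message from the other agent."""
    message = W_msg @ h_other          # message derived from the other agent's features
    gate = sigmoid(w_gate @ h_self)    # scalar in (0, 1): how much of the message to admit
    h_joint = h_self + gate * message  # cross connection: own features plus gated message
    return W_q @ h_joint               # one Q-value per action

# Hypothetical sizes and random weights, purely for illustration.
rng = np.random.default_rng(0)
hidden, n_actions = 8, 5
h_person = rng.normal(size=hidden)     # features of the "person" detector agent
h_bike = rng.normal(size=hidden)       # features of the "bicycle" detector agent
W_msg = rng.normal(size=(hidden, hidden)) * 0.1
w_gate = rng.normal(size=hidden)
W_q = rng.normal(size=(n_actions, hidden))

q_person = gated_q_values(h_person, h_bike, W_msg, w_gate, W_q)
greedy_action = int(np.argmax(q_person))
```

When the message weights are zero, the agent falls back to its own single-agent Q-values, which is one way to see how such a gate could let an agent ignore unhelpful context.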
TL;DR: In this article, an end-to-end solution via deep reinforcement learning is proposed, where a ConvNet-LSTM function approximator is adopted for the direct frame-to-action prediction.
Abstract: We study active object tracking, where a tracker takes as input the visual observation (i.e., frame sequence) and produces the camera control signal (e.g., move forward, turn left, etc.). Conventional methods tackle the tracking and the camera control separately, which makes them challenging to tune jointly. This separation also requires considerable human effort for labeling and many expensive trial-and-error runs in the real world. To address these issues, we propose, in this paper, an end-to-end solution via deep reinforcement learning, where a ConvNet-LSTM function approximator is adopted for the direct frame-to-action prediction. We further propose an environment augmentation technique and a customized reward function, which are crucial for successful training. The tracker trained in simulators (ViZDoom, Unreal Engine) shows good generalization in the case of unseen object moving paths, unseen object appearances, unseen backgrounds, and distracting objects. It can restore tracking when occasionally losing the target. With the experiments over the VOT dataset, we also find that the tracking ability, obtained solely from simulators, can potentially transfer to real-world scenarios.
TL;DR: In this article, a generative model of object similarities based on the Dirichlet distribution is proposed and embedded in the network for encoding the state of the system, and the training is carried out by simultaneously minimizing the label and action prediction errors using gradient descent.
TL;DR: In this article, prior situation knowledge is captured by a set of flexible, kernel-based density estimations that represent the expected spatial structure of the given situation, allowing the system to use context as it is discovered to narrow the search.
Abstract: A major goal of computer vision is to enable computers to interpret visual situations — abstract concepts (e.g., “a person walking a dog,” “a crowd waiting for a bus,” “a picnic”) whose image instantiations are linked more by their common spatial and semantic structure than by low-level visual similarity. In this paper, we propose a novel method for prior learning and active object localization for this kind of knowledge-driven search in static images. In our system, prior situation knowledge is captured by a set of flexible, kernel-based density estimations — a situation model — that represent the expected spatial structure of the given situation. These estimations are updated by information gained as the system searches for relevant objects, allowing the system to use context as it is discovered to narrow the search. More specifically, at any given time in a run on a test image, our system uses image features plus contextual information it has discovered to identify a small subset of training images — an importance cluster — that is deemed most similar to the given test image, given the context. This subset is used to generate an updated situation model in an on-line fashion, using an efficient multipole expansion technique. As a proof of concept, we apply our algorithm to a highly varied and challenging dataset consisting of instances of a “dogwalking” situation. Our results support the hypothesis that dynamically-rendered, context-based probabilistic models can support efficient object localization in visual situations. Moreover, our approach is general enough to be applied to diverse machine learning paradigms requiring interpretable, probabilistic representations generated from partially observed data.
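A kernel-based density estimate of spatial structure can be sketched as follows. This is a minimal illustration, assuming an isotropic Gaussian kernel with a fixed bandwidth; the function `gaussian_kde_2d`, the offset data, and the bandwidth value are hypothetical, and neither the importance-cluster selection nor the multipole expansion from the abstract is implemented.

```python
import numpy as np

def gaussian_kde_2d(samples, query, bandwidth=0.1):
    """Isotropic Gaussian kernel density estimate evaluated at each 2-D query point."""
    samples = np.asarray(samples, dtype=float)              # shape (n, 2)
    query = np.atleast_2d(np.asarray(query, dtype=float))   # shape (m, 2)
    diff = query[:, None, :] - samples[None, :, :]          # pairwise differences, (m, n, 2)
    sq_dist = np.sum(diff ** 2, axis=-1)                    # squared distances, (m, n)
    norm = 2.0 * np.pi * bandwidth ** 2                     # 2-D Gaussian normalizer
    kernels = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    return kernels.sum(axis=1) / (len(samples) * norm)

# Hypothetical training data: dog position relative to the walker, in normalized image units.
offsets = np.array([[0.30, 0.10], [0.35, 0.12], [0.28, 0.08], [0.40, 0.15]])

# Density at a plausible location versus an implausible one.
near, far = gaussian_kde_2d(offsets, [[0.32, 0.10], [-0.50, -0.50]])
```

A search procedure could then sample candidate locations in proportion to such a density, and re-estimate it from a different subset of training images once one object has been found.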
TL;DR: This paper presents a complete framework around the multi-active object programming model through ProActive, the Java library that offers multi-active objects, and through MultiASP, the programming language that allows the formalisation of developments.
Abstract: In order to tackle the development of concurrent and distributed systems, the
active object programming model provides a high-level abstraction to program
concurrent behaviours. There exists already a variety of active object
frameworks targeted at a large range of application domains: modelling,
verification, efficient execution. However, among these frameworks, very few
consider a multi-threaded execution of active objects. Introducing controlled
parallelism within active objects enables overcoming some of their limitations.
In this paper, we present a complete framework around the multi-active object
programming model. We present it through ProActive, the Java library that
offers multi-active objects, and through MultiASP, the programming language
that allows the formalisation of our developments. We then show how to compile
an active object language with cooperative multi-threading into multi-active
objects. This paper also presents different use cases and the development
support to illustrate the practical usability of our language. Formalisation of
our work provides the programmer with guarantees on the behaviour of the
multi-active object programming model and of the compiler.
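The active object idea (asynchronous requests served by the object's own thread, with results delivered through futures) can be sketched in Python. This is not ProActive or MultiASP; the `ActiveObject` wrapper, its `call` method, and the `Counter` class are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

class ActiveObject:
    """Wraps a plain object; each method call becomes a queued request returning a future."""

    def __init__(self, target, threads=1):
        self._target = target
        # threads=1 gives classic active-object behaviour: one request served at a time.
        self._pool = ThreadPoolExecutor(max_workers=threads)

    def call(self, method, *args):
        # Asynchronous invocation: returns immediately with a future for the result.
        return self._pool.submit(getattr(self._target, method), *args)

    def shutdown(self):
        self._pool.shutdown(wait=True)

class Counter:
    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n   # safe without locks only while a single thread serves requests
        return self.value

counter = ActiveObject(Counter(), threads=1)
futures = [counter.call("add", 1) for _ in range(100)]
results = [f.result() for f in futures]   # block only when the values are needed
counter.shutdown()
```

With `threads=1` the requests are served in submission order, so no locking is needed inside `Counter`. Raising `threads` would allow requests to run in parallel inside the same object, at which point `add` would need explicit protection; deciding which requests may safely overlap is the kind of controlled parallelism the abstract refers to.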
TL;DR: This paper introduces the notion of a “Future-based Data Stream” to extend the ABS and illustrates the application by means of a case study in the domain of social networks simulation.
Abstract: Many modern distributed software applications require a continuous interaction between their components, exploiting streaming data from the server to the client. The Abstract Behavioral Specification (ABS) language has been developed for the modeling and analysis of distributed systems. In ABS, concurrent objects communicate by calling each other’s methods asynchronously. Return values are communicated asynchronously too, via the return statement and so-called futures. In this paper, we extend the basic ABS model of asynchronous method invocation and return in order to support the streaming of data. We introduce the notion of a “Future-based Data Stream” to extend the ABS. The application of this notion and its impact on performance are illustrated by means of a case study in the domain of social networks simulation.
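To make the contrast with a single-value future concrete, here is a rough Python analogue of a stream returned by an asynchronous call. It is not ABS syntax or semantics; `DataStream`, `async_stream`, and the `timeline` producer are hypothetical names, and a thread with a queue stands in for the asynchronous method invocation.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the stream

class DataStream:
    """A handle the caller can iterate over while the callee is still producing values."""

    def __init__(self):
        self._items = queue.Queue()

    def put(self, value):
        self._items.put(value)

    def close(self):
        self._items.put(_END)

    def __iter__(self):
        while True:
            item = self._items.get()   # blocks until the next value is available
            if item is _END:
                return
            yield item

def async_stream(producer, *args):
    """Run a generator function on another thread and expose its values as a DataStream."""
    stream = DataStream()

    def run():
        try:
            for value in producer(*args):
                stream.put(value)
        finally:
            stream.close()             # always terminate the stream, even on error

    threading.Thread(target=run, daemon=True).start()
    return stream

def timeline(user, n):
    # Hypothetical server-side method that yields several values instead of returning one.
    for i in range(n):
        yield f"{user}:post{i}"

received = list(async_stream(timeline, "alice", 3))
```

The caller gets the handle back immediately and consumes values as they arrive, rather than waiting for one final return value.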
TL;DR: The first tactile transfer learning algorithm is developed to enable robotic systems to autonomously exploit their prior tactile knowledge while learning about a new set of objects, and a newly developed tactile-based method to determine the center of mass of rigid objects is introduced.
Abstract: The sense of touch plays an important role in our daily lives, from grasping and manipulating to identifying and interacting with objects. For robotic systems that interact with dynamic environments, it is crucial to recognize objects via their physical properties. However, this is difficult to achieve even with advanced vision techniques due to occlusion and poor lighting conditions. Tactile sensing, by contrast, can simultaneously provide rich and direct feedback to the robotic systems. The performance of the previously proposed tactile object discrimination strategies depends on tactile feature extraction and learning methods designed for a particular experimental setup.
Here, we propose novel tactile descriptors which are robust regardless of the number of tactile sensors used in robotic systems, the types and techniques of tactile sensors, and the structure of objects’ surfaces. Previous researchers have used various robotic systems and tactile sensors to passively learn about objects and distinguish them from each other by utilizing uniformly collected training samples in an offline manner. However, the informativeness of the data varies. Some objects have distinctive tactile properties, which makes them easy to discriminate. Therefore, collecting many training samples can be redundant.
Contrary to the previous studies, for the first time, we propose a complete probabilistic tactile-based framework consisting of an active pre-touch workspace exploration strategy and an active tactile object learning method. The robots with the active pre-touch efficiently explore the unknown workspace to estimate the number of objects, their location, and their orientation. With the benefit of the active touch learning algorithm, the robotic systems efficiently learn about objects based on their physical properties with the lowest possible number of samples.
Furthermore, we propose a full-fledged touch-based probabilistic framework consisting of an active workspace exploration, active object discrimination, and active target object search. Taking advantage of our previously proposed active tactile object learning and our new active object discrimination algorithm, the robotic system efficiently distinguishes between objects and searches for specified target objects by strategically selecting the optimal exploratory actions.
We also introduce a newly developed tactile-based method to determine the center of mass of rigid objects. Although several strategies have been proposed for robotic systems to learn about objects, robots are still unable to re-use their prior tactile experience while learning new objects. In order to tackle this problem, we developed the first tactile transfer learning algorithm to enable robotic systems to autonomously exploit their prior tactile knowledge while learning about a new set of objects.
Moreover, we improved this strategy by combining the previously proposed active learning, the active object discrimination method, and the tactile transfer learning. This new algorithm, named active tactile transfer learning, further reduces the number of training samples by strategically selecting and exploiting relevant prior tactile knowledge. We also introduce novel tactile-based strategies for detecting slips and regulating grasping forces to enable robots to safely manipulate deformable objects with a dynamic center of mass. Our proposed method does not require any prior knowledge of the contact surface and friction coefficient.
Recent advances in tactile sensing have opened up new pathways for humanoids to more accurately communicate with humans. Through tactile interaction, various touch modalities may be carried out; a robot may be patted, slapped, punched, or tickled. For any robotic system that is to work closely with humans, evaluation and classification of these touch modalities are vital. Taking advantage of our proposed tactile descriptors, we present a novel approach for touch modality identification during the tactile human-robot interaction.
TL;DR: This work studies an active object model with no explicit future type and wait-by-necessity synchronisations, a lightweight technique that synchronises invocations when the corresponding values are strictly needed.
Abstract: The active object concept is a powerful computational model for defining distributed and concurrent systems. This model has recently gained prominence, largely thanks to its simplicity and its abstraction level. In this work we study an active object model with no explicit future type and wait-by-necessity synchronisations, a lightweight technique that synchronises invocations when the corresponding values are strictly needed. Although high concurrency combined with a high level of transparency leads to good performance, they also make the system more prone to problems such as deadlocks. This is the reason that led us to study deadlock analysis in this active object model. The development of our deadlock analysis is divided into two main works. In the first work we focus on the implicit synchronisation on the availability of some value. This way we are able to analyse the data-flow synchronisation inherent to languages that feature wait-by-necessity. In the second work we present a static analysis technique based on effects and behavioural types for deriving synchronisation patterns of stateful active objects and verifying the absence of deadlocks in this context. Our effect system traces the access to object fields, thus allowing us to compute behavioural types that express synchronisation patterns in a precise way. As a consequence we can automatically verify the absence of deadlocks in active object based programs with wait-by-necessity synchronisations and stateful active objects.
TL;DR: A general mathematical framework to analyze and classify interactions, defined as dynamic motions performed by an active object onto a passive one, is described; interactions are factorized via motion features computed in the spatio-temporal domain and encoded into a global, object-centric signature.
Abstract: Many real-world tasks for autonomous agents benefit from understanding dynamic inter-object interactions. Detecting, analyzing and differentiating between the various ways that an object can be interacted with provides implicit information about its function. This can help train autonomous agents to handle objects and understand unknown scenes. We describe a general mathematical framework to analyze and classify interactions, defined as dynamic motions performed by an active object onto a passive one. We factorize interactions via motion features computed in the spatio-temporal domain, and encoded into a global, object-centric signature. Equipped with a similarity measure to compare such signatures, we showcase classification of interactions with a single object. We also propose a novel acquisition setup combining RGBD sensing with a virtual reality (VR) display, to capture interactions with purely virtual objects.
TL;DR: In this paper, a collaborative multi-agent deep reinforcement learning algorithm is proposed to learn the optimal policy for joint active object localization, which effectively exploits beneficial contextual information, such as person riding a bicycle, cups held by the table, etc.
Abstract: We examine the problem of joint top-down active search of multiple objects under interaction, e.g., person riding a bicycle, cups held by the table, etc. Such objects under interaction often can provide contextual cues to each other to facilitate more efficient search. By treating each detector as an agent, we present the first collaborative multi-agent deep reinforcement learning algorithm to learn the optimal policy for joint active object localization, which effectively exploits such beneficial contextual information. We learn inter-agent communication through cross connections with gates between the Q-networks, which is facilitated by a novel multi-agent deep Q-learning algorithm with joint exploitation sampling. We verify our proposed method on multiple object detection benchmarks. Not only does our model help to improve the performance of state-of-the-art active localization models, it also reveals interesting co-detection patterns that are intuitively interpretable.
TL;DR: This work presents an online learning method that creates visual object models through active object exploration, enabling a robot to use manipulations of an object to autonomously learn visual features from several points of view.
Abstract: As robots are increasingly acting in real-world environments, learning and recognition of objects is a problem. Existing methods for learning visual object models use offline techniques to generate high-quality models or online techniques to dynamically expand the object model library. We present an online learning method that creates visual object models through active object exploration. Our approach enables a robot to use manipulations of an object to autonomously learn visual features from several points of view. The ability to segment the background, robot parts and the object in the visual space allows irrelevant feature points to be filtered out. This improves the quality of the object model while decreasing its size. Finally, a human-robot interaction enables a human collaborator to improve the object model. The method is evaluated on a Pepper robot, showing the improvement in performance and accuracy with respect to interactive learning.
TL;DR: A model-driven active object recognition and pose estimation system via exploiting the feature association probability under scale and viewpoint variations is proposed, which can predict future information more accurately thus laying the foundation of a successful active Next-Best-View planning system.
Abstract: Object recognition and localisation are indispensable competencies for service robots in everyday environments like offices and kitchens. The presence of similar objects that can only be differentiated from a small part of the surface, together with clutter that leads to occlusions, makes it impossible to detect target objects accurately and reliably from a single observation. When the sensor observing the environment is mounted on a mobile platform, object detection and pose estimation can be facilitated by observing the environment from a series of different viewpoints. Computing active perception strategies, with the aim of finding optimal actions to enhance object recognition and pose estimation performance, is the focus of this thesis.
This thesis consists of two main parts:
In the first part, it focuses on object detection and pose estimation from a single frame of observation. Using an RGB-D sensor, we propose a modular 3D textured object detection and pose estimation framework which can recognise objects in cluttered environments by taking advantage of the geometric information provided by the sensor. To handle less-textured objects and objects under severe illumination conditions, we propose a novel RGB-D feature which is robust to illumination, scale, rotation and viewpoint variations, and provides reliable feature matching results under challenging conditions. The proposed feature is validated for multiple applications including object detection and point cloud alignment. Parts of the above approaches are integrated with existing work to produce a practical and effective perception module for a warehouse automation task. The designed perception system can detect objects of different types and estimate their poses robustly, thus guaranteeing reliable object grasping and manipulation performance.
In the second part of the thesis, we investigate the problem of active object detection and pose estimation from two perspectives: with and without considering the uncertainties in the motion model and the observation model. First, we propose a model-driven active object recognition and pose estimation system via exploiting the feature association probability under scale and viewpoint variations. By explicitly modelling the feature association, the proposed system can predict future information more accurately thus laying the foundation of a successful active Next-Best-View planning system even with a naive greedy search technique. We also present a probabilistic framework which handles motion and observation uncertainties in the active object detection and pose estimation problem. We present an optimisation framework which computes the optimal control at each step, using an objective function which incorporates uncertainties in state estimation, feature coverage for better recognition confidence and control consumption. The proposed framework can handle various issues such as object initialisation, collision avoidance, occlusion and changing the object hypothesis. Validations based on a simulation environment are…
TL;DR: A complete probabilistic tactile-based framework to enable robots to autonomously explore unknown workspaces and recognize objects based on their physical properties and takes advantage of the prior knowledge obtained during the learning process.
Abstract: In this letter, we propose a complete probabilistic tactile-based framework to enable robots to autonomously explore unknown workspaces and recognize objects based on their physical properties. Our framework consists of three components: 1) an active pretouch strategy to efficiently explore unknown workspaces; 2) an active touch learning method to learn about unknown objects based on their physical properties (surface texture, stiffness, and thermal conductivity) with the least number of training samples; and 3) an active touch algorithm for object discrimination, which selects the most informative exploratory action to apply to the object, so that the robot can efficiently distinguish between objects with a small number of actions. Our proposed framework was experimentally evaluated using a robotic arm equipped with multimodal artificial skin. The robot with the active pretouch method reduced the uncertainty of the workspace by up to 30% and 70% compared to uniform and random strategies, respectively. By means of the active touch learning algorithm, the robot used 50% fewer samples to achieve the same learning accuracy as the baseline methods. By taking advantage of the prior knowledge obtained during the learning process, the robot actively discriminated objects with an improvement of 10% in recognition accuracy compared to the random action selection approach.
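Selecting "the most informative exploratory action" can be illustrated with an expected-entropy criterion. This is a sketch under assumptions, not the method from the letter: the abstract does not state its selection rule, so minimizing expected posterior entropy is a stand-in, and the action names, likelihood tables, and function names are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def expected_posterior_entropy(belief, likelihood):
    """likelihood[c, o] = P(observation o | object class c) for one exploratory action."""
    p_obs = belief @ likelihood                          # marginal P(o)
    total = 0.0
    for o, p_o in enumerate(p_obs):
        if p_o > 0:
            posterior = belief * likelihood[:, o] / p_o  # Bayes update for outcome o
            total += p_o * entropy(posterior)
    return total

def select_action(belief, likelihoods):
    """Pick the action whose outcome is expected to leave the least uncertainty."""
    scores = {a: expected_posterior_entropy(belief, lik) for a, lik in likelihoods.items()}
    return min(scores, key=scores.get), scores

# Two candidate objects, equally likely a priori (hypothetical numbers).
belief = np.array([0.5, 0.5])
likelihoods = {
    "press": np.array([[0.9, 0.1], [0.1, 0.9]]),  # stiffness differs between the objects
    "slide": np.array([[0.5, 0.5], [0.5, 0.5]]),  # textures are indistinguishable
}
best, scores = select_action(belief, likelihoods)
```

Here pressing is chosen because its outcome separates the two objects, whereas sliding leaves the belief unchanged whatever is observed.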
TL;DR: A requirement model and a system model of CosRDL are proposed for analyzing event-driven control systems; the requirement model is built as a directed acyclic graph (DAG) to describe the system behavior.
Abstract: This paper develops a requirement modeling language called CosRDL for modeling and analyzing time series embedded control systems. Such a system consists of a series of parallel active tasks, each composed of functions for different control behaviors, an architecture widely used in the development of embedded control systems. CosRDL can specify event-driven behaviors, and each event in CosRDL can contain features such as the input/output and communication of active objects. The active object is a conceptual model that expresses a computing or control process. It contains a set of operations, an event queue and a task priority. A requirement model and a system model of CosRDL are proposed for analyzing event-driven control systems. The requirement model of CosRDL is built as a directed acyclic graph (DAG) to describe the system behavior. The system model of CosRDL is built with the uFusion framework, which was designed to implement the event-driven system architecture. Finally, a case study is presented to illustrate our approach to requirement modeling in the development of event-driven control systems.
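A DAG-based description of event-driven behavior can be sketched with Python's standard `graphlib` module (Python 3.9+). This is not CosRDL or uFusion; the event names and the dependency mapping are hypothetical and only show how a DAG constrains the order in which events may be processed.

```python
from graphlib import TopologicalSorter

# Hypothetical events; each maps to the set of events that must occur before it.
requirements = {
    "read_sensor": set(),
    "filter_signal": {"read_sensor"},
    "compute_control": {"filter_signal"},
    "send_status": {"read_sensor"},
    "drive_actuator": {"compute_control"},
}

# static_order() yields one valid processing order and raises graphlib.CycleError
# if the requirement graph is not acyclic.
order = list(TopologicalSorter(requirements).static_order())
```

Checking that such an order exists is a cheap consistency test on a requirement model: a cycle would mean two events each wait on the other.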