1. How can insights from neuroscientific theories of perception and action be leveraged to develop embodied AI models?
One approach integrates two theories from systems neuroscience into a combined perception-action model for intrinsically driven active sensing. The perception component follows predictive coding, in which the brain maintains a generative model of the world, uses it to predict sensory input, and minimizes the resulting prediction error. The action component follows the proposition that, during exploratory behavior, the brain acts to minimize uncertainty about inferred latent states. By using a deep generative model based on predictive coding, the model can optimize a Monte Carlo approximation of the information-gain objective in a fully differentiable manner, without assuming explicit knowledge of the true generative model of the environment. The resulting exploration strategy is reported to be highly efficient, and it is task-independent: it can be applied to any exploration setting without extrinsic reward signals. The model has been evaluated on two sensorimotor tasks, maze navigation and active vision, where it learns the underlying transition distributions and the spatial relationships between pixels of a given image. Its modular structure facilitates interpretability, offering insight into the neural computations that biological systems might use.
2. What is the connection between VAEs and predictive coding?
VAEs and predictive coding are connected through a shared objective: both maximize the evidence lower bound (ELBO). A VAE parameterizes the distributions of a generative model with neural networks and optimizes the ELBO directly. Predictive coding infers hidden states from observations by minimizing prediction errors, which serves a similar goal. Both involve variational inference and amortized learning, which make inference and learning in generative models efficient. Understanding this connection lets researchers apply these techniques to active exploration in various domains.
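To make the shared objective concrete, the following minimal sketch computes a single-sample ELBO estimate for a Gaussian model. It assumes a diagonal-Gaussian approximate posterior, a standard-normal prior, and a Gaussian likelihood with fixed standard deviation `sigma`; these choices and all function names are illustrative assumptions, not details taken from the source. Under these assumptions the reconstruction term is a scaled negative squared prediction error, which is the quantity predictive coding drives down.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def gaussian_log_likelihood(x, x_pred, sigma=1.0):
    """log N(x; x_pred, sigma^2 I): negative scaled squared prediction error plus a constant."""
    err = x - x_pred  # prediction error
    return (-0.5 * np.sum(err**2) / sigma**2
            - 0.5 * x.size * np.log(2.0 * np.pi * sigma**2))

def single_sample_elbo(x, x_pred, mu, log_var, sigma=1.0):
    """ELBO estimate = reconstruction log-likelihood - KL(posterior || prior).

    x_pred is assumed to be the decoder's prediction for one latent sample
    drawn from the approximate posterior N(mu, diag(exp(log_var))).
    """
    return (gaussian_log_likelihood(x, x_pred, sigma)
            - gaussian_kl_to_standard_normal(mu, log_var))
```

With the posterior held fixed, a smaller prediction error always gives a larger ELBO, so minimizing prediction error and maximizing the bound point in the same direction.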
3. What is a Controllable Markov Chain (CMC) in discrete state and action spaces?
A Controllable Markov Chain (CMC) is a Markov decision process (MDP) without a specified reward function. It is defined as a 3-tuple (S, A, P), where S is a finite set of states, A is a finite set of allowable actions, and P is a 3-dimensional kernel of transition probabilities. The transition probabilities are denoted P_{s,a,s'} = p(s' | s, a), where s is the current state, a is the action taken, and s' is the resulting next state. The goal of an agent in this setting is to explore the environment efficiently and learn an estimate of the underlying transition kernel P. The provided example uses a maze environment with N = n^2 states and 4 actions (up, down, right, left). Each action produces a noisy translation that is biased toward the cardinal direction associated with that action. Transitions that do not correspond to a one-step translation are assigned a probability of zero. The mazes are randomly generated, and the probability distributions in P are drawn from a Dirichlet distribution with concentration parameter α = 0.25 over the states with non-zero probability.
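The structure of the transition kernel can be sketched as follows. This is a simplified illustration, not the source's maze generator: it assumes an open n-by-n grid without interior walls (n >= 2), and it assumes the bias toward the intended direction is obtained by moving the largest Dirichlet-sampled probability onto that direction. The function name `make_grid_cmc` and the bias rule are hypothetical.

```python
import numpy as np

# Row/column offsets for the four actions: up, down, right, left.
MOVES = [(-1, 0), (1, 0), (0, 1), (0, -1)]

def make_grid_cmc(n, alpha=0.25, seed=0):
    """Return P with shape (N, 4, N), N = n*n, where P[s, a, s2] = p(s2 | s, a)."""
    rng = np.random.default_rng(seed)
    N = n * n
    P = np.zeros((N, len(MOVES), N))
    for s in range(N):
        r, c = divmod(s, n)
        # One-step neighbours that stay inside the grid, keyed by direction.
        nbrs = {}
        for d, (dr, dc) in enumerate(MOVES):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n and 0 <= cc < n:
                nbrs[d] = rr * n + cc
        dirs = list(nbrs)
        targets = [nbrs[d] for d in dirs]
        for a in range(len(MOVES)):
            # Noisy translation: Dirichlet weights over reachable neighbours only.
            probs = rng.dirichlet(np.full(len(dirs), alpha))
            if a in nbrs:
                # Assumed bias rule: give the intended direction the largest weight.
                i, j = dirs.index(a), int(np.argmax(probs))
                probs[[i, j]] = probs[[j, i]]
            P[s, a, targets] = probs  # every other entry stays exactly zero
    return P
```

For example, `make_grid_cmc(3)` yields a (9, 4, 9) array in which each `P[s, a]` is a probability distribution supported only on the grid neighbours of `s`.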
4. How does the model select actions for active vision tasks?
The model selects actions for active vision tasks by minimizing uncertainty. It uses a greedy approach with a simple heuristic that guides the model toward states with greater uncertainty. The uncertainty reduction score is calculated using Equation 8, which gives the expected reduction in uncertainty for action a over a single step and a single transition distribution. The final score maximized by the agent is the sum of the uncertainty reduction in Equation 8 and the expected future uncertainty in Equation 9. The value function for a given action quantifies how much information the agent expects to gain from observing the input image at a specific location. To make the model end-to-end differentiable, an action network selects fixation locations that maximize the value function. This network is trained with gradient descent so that it outputs actions with high informational value. It is a two-layer feedforward network that receives the current estimate of the state and outputs the mean of a Gaussian distribution over fixation locations; the standard deviation of this distribution is a fixed hyperparameter. The agent chooses a fixation location by sampling from the network's output distribution. Algorithm 1 describes this differentiable approach for selecting continuous actions with uncertainty reduction in active vision tasks.
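The forward pass of such an action network can be sketched as below. This shows only how a fixation is sampled, not the training loop or the value function. The layer sizes, the tanh nonlinearities, the assumption that fixation coordinates are normalized to [-1, 1], and all names are illustrative choices rather than details from the source. The sample is written in reparameterized form (mean plus scaled noise), a common way to keep sampling differentiable with respect to the mean; whether the source does exactly this is an assumption.

```python
import numpy as np

def init_action_net(state_dim, hidden_dim, rng):
    """Parameters of a two-layer feedforward network with a 2-D output (x, y)."""
    return {
        "W1": rng.normal(0.0, 0.1, (hidden_dim, state_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (2, hidden_dim)),
        "b2": np.zeros(2),
    }

def fixation_mean(params, state):
    """Map the current state estimate to the mean fixation location."""
    hidden = np.tanh(params["W1"] @ state + params["b1"])
    # Assumption: tanh keeps the mean inside normalized image coordinates [-1, 1].
    return np.tanh(params["W2"] @ hidden + params["b2"])

def sample_fixation(params, state, std, rng):
    """Sample a fixation from N(mean, std^2 I); std is a fixed hyperparameter."""
    mean = fixation_mean(params, state)
    noise = rng.standard_normal(2)
    return mean + std * noise  # reparameterized sample

rng = np.random.default_rng(0)
params = init_action_net(state_dim=8, hidden_dim=16, rng=rng)
state_estimate = rng.standard_normal(8)  # stand-in for the inferred latent state
fixation = sample_fixation(params, state_estimate, std=0.1, rng=rng)
```

In a full implementation the parameters would be updated by gradient ascent on the value function evaluated at the sampled fixation, which requires an automatic differentiation framework rather than plain NumPy.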