A Risk-Averse Preview-Based Q-Learning Algorithm: Application to Highway Driving of Autonomous Vehicles

Questions

1. How does the probabilistic predictor of traffic participants work?

2. What is the risk-averse cost function in risk-averse Q-learning?

3. What is the objective of the low-level LQR state feedback controller in the highway driving EQUATION scenario?

4. What is the approach for risk-averse planning in autonomous vehicles?

5. What is the monotonicity property for T in APPENDIX A?

Accepted Answer

The probabilistic predictor of traffic participants (PPTP) learns the behavior of traffic participants from traffic observations provided by sensors such as cameras and LiDAR. Approaches for PPTP include heuristic methods and deep-learning-based trajectory prediction. The predictor estimates the probability that each MDP state is occupied by a participant and the probability that it is a failure state. It also identifies high-risk zones in which, with high probability, the safe mode cannot avoid a crash. The stochastic reachability module combines this information with the safety constraints to introduce probabilistic occupancy functions and to characterize keep-out sets for probabilistic safety. This makes it possible to assign a safety-state category and a corresponding risk distribution to each cell state. Preview information is useful because it allows the reward function to be anticipated ahead of time and samples to be generated for the proposed risk-averse Q-learning planner.
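
As a rough illustration of the occupancy step, the sketch below estimates per-cell occupancy probabilities from sampled predicted scenes and flags cells whose probability reaches a threshold as keep-out cells. The grid layout, the sample format, the function names, and the threshold value are all assumptions made for this example; the paper's predictor and stochastic reachability module are not reproduced here.

```python
from collections import Counter


def occupancy_probabilities(samples, n_lanes, n_cells):
    """Estimate P(cell occupied) from predicted scenes.

    `samples` is a list of hypothetical predicted scenes; each scene is a
    set of (lane, cell) positions occupied by traffic participants.
    """
    if not samples:
        raise ValueError("at least one predicted scene is required")
    counts = Counter()
    for scene in samples:
        for pos in set(scene):  # count each cell at most once per scene
            counts[pos] += 1
    n = len(samples)
    return {
        (lane, cell): counts[(lane, cell)] / n
        for lane in range(n_lanes)
        for cell in range(n_cells)
    }


def keep_out_set(probs, threshold):
    """Cells whose occupancy probability is at least `threshold`."""
    return {pos for pos, p in probs.items() if p >= threshold}


# Four hypothetical predicted scenes on a 2-lane, 3-cell road segment.
scenes = [{(0, 1)}, {(0, 1)}, {(0, 2)}, {(0, 1), (1, 2)}]
probs = occupancy_probabilities(scenes, n_lanes=2, n_cells=3)
unsafe = keep_out_set(probs, threshold=0.5)
```

Here cell (0, 1) is occupied in three of the four scenes, so it is the only cell placed in the keep-out set at a threshold of 0.5.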

Accepted Answer

The risk-averse cost function in risk-averse Q-learning is defined as $J^{\pi_k} = \mathbb{E}\left[\exp\left(R_k + \alpha (s - s_g)^2\right)\right]$, where $\alpha > 0$ is the risk-averse factor, $R_k$ is the reward function, $s$ is the current state, $s_g$ is the terminal goal state, and $\pi_k$ is the policy. Because the exponential weights large costs more heavily, this cost function accounts for the variance of the cost as well as its expected value, which provides high-confidence safety assurance. The function is nondecreasing and convex, which supports optimal planning under risk aversion; Lemma 1 and Lemma 2 establish these properties. Optimizing this entropic-based cost therefore leads to safer and more reliable planning decisions than optimizing the expected cost alone.
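
To illustrate why an exponential criterion penalizes variance, the sketch below evaluates the textbook entropic risk measure, (1/alpha) * log E[exp(alpha * C)], for two hypothetical plans with the same expected cost but different spread. This is the standard form of the measure and is an assumption for the example; it is not necessarily the exact cost function used in the paper, and the cost values are invented.

```python
import math


def entropic_risk(costs, probs, alpha):
    """Entropic risk (1/alpha) * log E[exp(alpha * C)] of a discrete cost C."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    expectation = sum(p * math.exp(alpha * c) for c, p in zip(costs, probs))
    return math.log(expectation) / alpha


def mean(costs, probs):
    """Plain expected cost, for comparison with the entropic risk."""
    return sum(p * c for c, p in zip(costs, probs))


# Two hypothetical plans: same expected cost, different variance.
steady = ([1.0, 1.0], [0.5, 0.5])
risky = ([0.0, 2.0], [0.5, 0.5])

alpha = 1.0  # hypothetical risk-averse factor
steady_risk = entropic_risk(*steady, alpha)
risky_risk = entropic_risk(*risky, alpha)
```

Both plans have an expected cost of 1, yet the entropic risk of the high-variance plan is larger, so a planner minimizing this criterion prefers the steadier plan.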

Accepted Answer

The objective of the low-level LQR state feedback controller in the highway driving EQUATION scenario is to follow the planned trajectory $T_p(t)$ and stabilize the ego-vehicle's yaw, yaw rate, and side slip angle. It does so by minimizing the performance index $\int_0^{\infty} \left( e^{\top} Q e + \delta^{\top} R \delta \right) dt$, where $Q = \mathrm{diag}(3, 1, 1, 1)$, $R = 1$, and $\delta$ is the steering angle. The controller thus governs the lateral dynamics of the ego-vehicle through the steering input.
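
For reference, the standard infinite-horizon LQR solution for a quadratic index of this kind can be written as follows. It assumes linear error dynamics $\dot{e} = A e + B \delta$; the matrices $A$ and $B$ are placeholders for a lateral vehicle model and are not given in this summary, and the infinite horizon is likewise an assumption.

```latex
\begin{aligned}
J &= \int_0^{\infty} \left( e^{\top} Q e + \delta^{\top} R \delta \right) dt, \\
\delta &= -K e, \qquad K = R^{-1} B^{\top} P, \\
0 &= A^{\top} P + P A - P B R^{-1} B^{\top} P + Q .
\end{aligned}
```

Here $P$ is the stabilizing solution of the algebraic Riccati equation in the last line, and $K$ is the resulting state feedback gain.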

Accepted Answer

The approach represents multi-lane roads as finite-state MDPs, assesses the risk associated with each state, and applies a convex-program-based, preview-based Q-learning algorithm. The algorithm learns risk-averse optimal planning strategies by leveraging probabilistic reward values. The scheme is implemented in driving scenarios to demonstrate its efficiency. Future work with vehicle simulators such as CARLA is intended to examine the approach's knowledge-transfer capabilities in more depth.
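
A minimal sketch of how a multi-lane road might be encoded as a finite-state MDP is shown below. The (lane, cell) state encoding, the three actions, and the deterministic one-cell-forward transition are assumptions chosen for illustration; they are not taken from the paper, and the learning algorithm itself is not shown.

```python
from itertools import product

# Hypothetical action set: lane offset applied at each step.
# Lane 0 is taken to be the leftmost lane.
ACTIONS = {"keep": 0, "left": -1, "right": 1}


def build_states(n_lanes, n_cells):
    """Enumerate all (lane, cell) states of the road grid."""
    return list(product(range(n_lanes), range(n_cells)))


def step(state, action, n_lanes, n_cells):
    """Deterministic transition: change lane if possible, advance one cell."""
    lane, cell = state
    new_lane = min(max(lane + ACTIONS[action], 0), n_lanes - 1)
    new_cell = min(cell + 1, n_cells - 1)  # the last cell is absorbing
    return (new_lane, new_cell)


def rollout(start, plan, n_lanes, n_cells):
    """Apply a sequence of actions and return the visited states."""
    path = [start]
    for action in plan:
        path.append(step(path[-1], action, n_lanes, n_cells))
    return path


states = build_states(n_lanes=3, n_cells=5)
path = rollout((1, 0), ["keep", "right", "keep"], n_lanes=3, n_cells=5)
```

A planner would then attach a risk-dependent reward or cost to each state, for example from predicted occupancy probabilities, and search over such action sequences.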

Accepted Answer

The monotonicity property for T in APPENDIX A states that if $Q_k^1(s,a) \le Q_k^2(s,a)$, then $TQ_k^1(s,a) \le TQ_k^2(s,a)$ for any state-action pair $(s,a)$. The proof considers $Q_k^1(s,a)$ and $Q_k^2(s,a)$ and uses injective relations, inequalities, and Jensen's inequality to obtain the result. It is completed using (37) and Assumption 1, which lead to the conclusion that $TQ_k^2(s,a) - TQ_k^1(s,a) \le \sup$ EQUATION, in agreement with (21). This establishes the monotonicity property for T in the given context.
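
The property can be checked numerically on a toy example. The sketch below uses a generic risk-sensitive Bellman-type operator, (TQ)(s,a) = c(s,a) + (gamma/alpha) * log sum_s' P(s'|s,a) * exp(alpha * min_a' Q(s',a')), which is an assumption made for illustration and is not the operator T defined in the paper. A numerical check of this kind is not a proof; it only shows what the property asserts.

```python
import math
import random


def bellman_entropic(Q, cost, P, gamma, alpha):
    """Apply a toy risk-sensitive operator T to a Q-table.

    Q[s][a] is a cost-to-go estimate, cost[s][a] a stage cost, and
    P[s][a][s2] a transition probability. All are hypothetical inputs.
    """
    n_states, n_actions = len(Q), len(Q[0])
    TQ = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            expectation = sum(
                P[s][a][s2] * math.exp(alpha * min(Q[s2]))
                for s2 in range(n_states)
            )
            TQ[s][a] = cost[s][a] + gamma * math.log(expectation) / alpha
    return TQ


random.seed(0)
n_s, n_a = 3, 2
cost = [[random.random() for _ in range(n_a)] for _ in range(n_s)]

# Random transition probabilities, normalized so each row sums to one.
P = []
for s in range(n_s):
    row = []
    for a in range(n_a):
        w = [random.random() for _ in range(n_s)]
        total = sum(w)
        row.append([x / total for x in w])
    P.append(row)

Q1 = [[random.random() for _ in range(n_a)] for _ in range(n_s)]
Q2 = [[q + random.random() for q in qs] for qs in Q1]  # Q1 <= Q2 entrywise

T1 = bellman_entropic(Q1, cost, P, gamma=0.9, alpha=1.0)
T2 = bellman_entropic(Q2, cost, P, gamma=0.9, alpha=1.0)
monotone = all(T1[s][a] <= T2[s][a] for s in range(n_s) for a in range(n_a))
```

For this toy operator the ordering is preserved because the minimum, the exponential, the probability-weighted sum, and the logarithm are all nondecreasing in Q.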
