Deep Reinforcement Learning for Autonomous Driving: A Survey

doi:10.1109/TITS.2021.3054625

Open Access•Journal Article•10.1109/TITS.2021.3054625

B Ravi Kiran, +6 more

- 09 Feb 2021

- IEEE Transactions on Intelligent Transpo...

- pp 1-18

1.2K

TL;DR: This review summarises deep reinforcement learning (DRL) algorithms, provides a taxonomy of automated driving tasks where (D)RL methods have been employed, and highlights the key algorithmic and deployment challenges for real-world autonomous driving agents. It also covers the role of simulators in training agents and methods to evaluate, test and robustify existing solutions in RL and imitation learning.

Citations

•Journal Article•10.1371/JOURNAL.PONE.0252754

Optimizing hyperparameters of deep reinforcement learning for autonomous driving based on whale optimization algorithm.

Nesma M. Ashraf, +3 more

- 10 Jun 2021

- PLOS ONE

TL;DR: In this paper, a swarm-based optimization algorithm, namely the Whale Optimization Algorithm (WOA), was employed for optimizing the hyperparameters of the DDPG algorithm to achieve the optimum control strategy in an autonomous driving control problem.

Journal Article•10.1016/j.ifacol.2023.10.1261

Tabular Q-learning Based Reinforcement Learning Agent for Autonomous Vehicle Drift Initiation and Stabilization

Szilárd Hunor Tóth, +2 more

- Advances in Control and Optimization of ...

TL;DR: This paper presents a Tabular Q-learning based reinforcement learning agent for autonomous vehicle drift initiation and stabilization, addressing learning instability issues with SAC, and successfully achieving and retaining a target drift state in a MATLAB/Simulink-based simulation environment.

Journal Article•10.1016/j.cie.2025.111654

Solving car resequencing in automotive assembly shops based on multi-objective deep reinforcement learning

Yuzhe Huang, +5 more

- 02 Nov 2025

- Computers & Industrial Engineering

Journal Article•10.1109/icme59968.2025.11209801

Safety-constrained Reinforcement Learning with Interaction-aware for Decision-making of Autonomous Driving

Di Zhang, +3 more

- 30 Jun 2025

TL;DR: This paper proposes a novel RL framework for autonomous driving that incorporates a motion prediction model and safety constraints to enhance decision-making capability, demonstrating superior performance in success rate, completion time, safety, and data efficiency.

Journal Article•10.1007/s10994-025-06887-x

ST-PPO: a spatio-temporal attention enhanced proximal policy optimization algorithm for autonomous driving in complex traffic scenarios

Cheng Da, +7 more

- 14 Oct 2025

- Machine Learning

References

•Journal Article•10.3156/JSOFT.29.5_177_2

Generative Adversarial Nets

Ian Goodfellow, +7 more

- 08 Dec 2014

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

48.6K

•Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

- 01 Jan 1988

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

39.7K

•Posted Content

Proximal Policy Optimization Algorithms

John Schulman, +4 more

- 20 Jul 2017

- arXiv: Learning

TL;DR: A new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent, are proposed.

18K