
Reinforcement learning and optimal control

Dimitri P. Bertsekas


Lecture slides for a course on Reinforcement Learning and Optimal Control (January 8-February 21, 2019) at Arizona State University:

Slides-Lecture 1, Slides-Lecture 2, Slides-Lecture 3, Slides-Lecture 4, Slides-Lecture 5, Slides-Lecture 6, Slides-Lecture 7, Slides-Lecture 8, Slides-Lecture 9, Slides-Lecture 10, Slides-Lecture 11, Slides-Lecture 12, Slides-Lecture 13.

Videos of lectures from the Reinforcement Learning and Optimal Control course at Arizona State University. (Click around the screen to see just the video, just the slides, or both simultaneously.)

Video-Lecture 1, Video-Lecture 2, Video-Lecture 3, Video-Lecture 4, Video-Lecture 5, Video-Lecture 6, Video-Lecture 7, Video-Lecture 8, Video-Lecture 9, Video-Lecture 10, Video-Lecture 11, Video-Lecture 12, Video-Lecture 13.

Lecture 13 is an overview of the entire course.

Selected sections from the book:

Reinforcement Learning and Optimal Control Book, Athena Scientific, July 2019

The book is available from the publishing company Athena Scientific.

Click here for an extended lecture/summary of the book: Ten Key Ideas for Reinforcement Learning and Optimal Control.

The purpose of the book is to consider large and challenging multistage decision problems that can be solved in principle by dynamic programming and optimal control, but whose exact solution is computationally intractable. We discuss solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are collectively referred to as reinforcement learning, and also by alternative names such as approximate dynamic programming and neuro-dynamic programming.
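To make the starting point concrete, here is a minimal, illustrative sketch (not code from the book) of the exact dynamic programming backward recursion on a tiny deterministic problem: moving along a line to reach state 0 at minimum total cost. All names and the toy problem itself are assumptions for illustration.

```python
# Illustrative sketch of exact finite-horizon dynamic programming.
# Toy problem (assumed, not from the book): positions -3..3 on a line,
# controls move left/stay/right, each move costs 1, and ending away
# from 0 incurs a large terminal penalty.

def dp_backward(horizon, states, controls, step_cost, next_state, terminal_cost):
    """Compute optimal cost-to-go J[k][x] and a policy by backward recursion."""
    J = {horizon: {x: terminal_cost(x) for x in states}}
    policy = {}
    for k in range(horizon - 1, -1, -1):
        J[k], policy[k] = {}, {}
        for x in states:
            # Minimize stage cost plus optimal cost-to-go of the successor.
            best_u = min(controls,
                         key=lambda u: step_cost(x, u) + J[k + 1][next_state(x, u)])
            J[k][x] = step_cost(x, best_u) + J[k + 1][next_state(x, best_u)]
            policy[k][x] = best_u
    return J, policy

states = range(-3, 4)      # positions on the line
controls = (-1, 0, 1)      # move left, stay, move right

def next_state(x, u):
    return max(-3, min(3, x + u))   # transitions clipped to the state space

J, policy = dp_backward(
    horizon=3,
    states=states,
    controls=controls,
    step_cost=lambda x, u: abs(u),        # each move costs 1
    next_state=next_state,
    terminal_cost=lambda x: 10 * abs(x),  # penalize ending away from 0
)
```

From position 3 with horizon 3, the recursion finds the optimal cost 3 (three unit moves to reach 0), and the policy at stage 0 moves left. The point of the sketch is the structure of the exact recursion; for large state spaces this table-based computation becomes intractable, which is what motivates the approximation methods the book covers.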

Our subject has benefited enormously from the interplay of ideas from optimal control and from artificial intelligence. One of the aims of this monograph is to explore the common boundary between these two fields and to form a bridge that is accessible to workers with a background in either field.

The mathematical style of the book is somewhat different from that of the author's dynamic programming books, and of the neuro-dynamic programming monograph written jointly with John Tsitsiklis. We rely more on intuitive explanations and less on proof-based insights. Still, we provide a rigorous short account of the theory of finite and infinite horizon dynamic programming, and some basic approximation methods, in an appendix. For this we require a modest mathematical background: calculus, elementary probability, and a minimal use of matrix-vector algebra.

The methods of this book have been successful in practice, and often spectacularly so, as evidenced by recent remarkable accomplishments in the games of chess and Go. However, across a wide range of problems, their performance properties may be less than solid. This is a reflection of the state of the art in the field: there are no methods that are guaranteed to work for all or even most problems, but there are enough methods to try on a given challenging problem with a reasonable chance that one or more of them will be successful in the end. Accordingly, we have aimed to present a broad range of methods that are based on sound principles, and to provide intuition into their properties, even when these properties do not include a solid performance guarantee. Hopefully, with enough experimentation with some of these methods and their variations, the reader will be able to adequately address their own problem.

Dynamic Programming and Optimal Control, Vol. 1, 4th Edition

Dimitri P. Bertsekas

The fourth edition (February 2017) contains a substantial amount of new material, particularly on approximate DP in Chapter 6. This chapter was thoroughly reorganized and rewritten, to bring it in line both with the contents of Vol. II, whose latest edition appeared in 2012, and with recent developments, which have propelled approximate DP to the forefront of attention.

Some of the highlights of the revision of Chapter 6 are an increased emphasis on one-step and multistep lookahead methods, parametric approximation architectures, neural networks, rollout, and Monte Carlo tree search. Among other applications, these methods have been instrumental in the recent spectacular success of computer Go programs. The material on approximate DP also provides an introduction and some perspective for the more analytically oriented treatment of Vol. II.
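As a hedged illustration of the one-step lookahead and rollout ideas mentioned above (my sketch, not code from the book): at each state we try every control, follow a fixed base heuristic from the resulting state to the end of the horizon, and pick the control with the best total cost. The toy problem and all names are assumptions for illustration.

```python
# Illustrative rollout sketch: one-step lookahead with a base heuristic,
# on an assumed toy problem (positions -3..3 on a line, goal state 0).

def simulate_base_policy(x, k, horizon, base_policy, step_cost, next_state, terminal_cost):
    """Cost of following the base heuristic from state x at stage k to the end."""
    total = 0
    for _ in range(k, horizon):
        u = base_policy(x)
        total += step_cost(x, u)
        x = next_state(x, u)
    return total + terminal_cost(x)

def rollout_control(x, k, horizon, controls, base_policy, step_cost, next_state, terminal_cost):
    """One-step lookahead: minimize stage cost plus the base policy's cost-to-go."""
    return min(
        controls,
        key=lambda u: step_cost(x, u) + simulate_base_policy(
            next_state(x, u), k + 1, horizon,
            base_policy, step_cost, next_state, terminal_cost,
        ),
    )

def next_state(x, u):
    return max(-3, min(3, x + u))   # transitions clipped to the state space

step_cost = lambda x, u: abs(u)          # each move costs 1
terminal_cost = lambda x: 10 * abs(x)    # penalize ending away from 0

# Base heuristic: always step one unit toward 0.
base = lambda x: -1 if x > 0 else (1 if x < 0 else 0)

u0 = rollout_control(3, 0, 3, (-1, 0, 1), base, step_cost, next_state, terminal_cost)
```

Here the rollout control at position 3 is a move left, toward the goal. The design point is that the simulated cost of the base heuristic stands in for the exact cost-to-go, so the lookahead never needs the full DP table; multistep lookahead and Monte Carlo tree search refine this same idea.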

Click here for direct ordering from the publisher, as well as the preface, table of contents, supplementary educational material, lecture slides, videos, etc.

Dynamic Programming and Optimal Control, Vol. I, ISBN-13: 978-1-886529-43-4, 576 pp., hardcover, 2017