
19208111 Masterseminar Stochastics

Winter Term 2025/2026

Lecturers: Dr. Dave Jacobi, Dr. Guilherme de Lima Feltes


  • Time and place: Thursdays, 16-18h, SR 119, Arnimallee 3

Prerequisites: Stochastics I and II. Desirable: Stochastics III.
Target group: BMS students, Master's students of Mathematics, and advanced Bachelor's students of Mathematics.

Contents: The seminar covers advanced topics in stochastics.

Reinforcement learning lies at the core of many state-of-the-art artificial intelligence algorithms, enabling agents to solve complex optimal control tasks in robotics, finance, physical AI, drug discovery, computer games and many other applications.
This seminar offers a rigorous treatment of reinforcement learning, focusing on the mathematical principles that make its algorithms work. We will develop a mathematically sound understanding of Markov decision processes, value-function-based methods and their connections to stochastic optimal control, and policy gradient methods. The emphasis is on the convergence properties of classical reinforcement learning algorithms, analyzed through stochastic approximation and stochastic gradient descent. We will also explore continuous-time reinforcement learning in the framework of stochastic differential equations.
The seminar aims to provide a rigorous foundational perspective for students interested in current research related to reinforcement learning and artificial intelligence. Participants should have a strong background in mathematics, specifically in probability theory.

Talks

Date    Subject                                                    Speaker
16.10.  Preliminary discussion                                     Dave Jacobi
23.10.  Basics of discrete stochastic optimal control              Sichen Jiang
30.10.  Principles of stochastic approximation                     Chijie Zhou
06.11.  Stochastic gradient descent methods (smooth case)          Wiktoria Krawczyk
13.11.  Stochastic gradient descent methods (non-smooth case)      Bolkar Eren
20.11.  The stochastic fixed point theorem                         Lucie Knop
27.11.  Policy gradient methods                                    Gauri Kshetry
04.12.  Value-function-based methods                               Dilara Kus
11.12.  Actor-critic methods                                       Anna Yuan
18.12.  Monte Carlo tree search                                    Jakob Zimmermann
2026
08.01.  Convergence of value-function-based methods                Erva Yurtbas
15.01.  Reinforcement learning in continuous time                  Romain Akinlami
22.01.  Policy evaluation and TD-learning in continuous time       N.N.
29.01.  Policy gradient and actor-critic in continuous time        N.N.
05.02.  Q-learning in continuous time                              N.N.
12.02.  Gradient flow for regularized stochastic optimal control   N.N.

Literature: To be announced in the preliminary discussion.