
# Mathematics of machine learning: reinforcement learning

## Schedule, Summer 2019

Seminar:

Thursday 10:00-12:00, seminar room SR 007/008,
PI building, Arnimallee 6, 14195 Berlin
From 5 June to 26 June additionally Wednesday 10:00-12:00, seminar room 024,
Arnimallee 3, 14195 Berlin

Language: English

## Description

Machine learning deals with finding and generating patterns in data. Although it is traditionally considered a branch of computer science, it relies heavily on mathematical foundations, and understanding these foundations is the primary goal of our seminar. In doing so, we will put particular emphasis on the probabilistic viewpoint. This semester we will focus on techniques for finding optimal strategies with the help of reinforcement learning. Students are strongly encouraged to implement the methods they learn.

Familiarity with the basics of probability theory is required.

The language of the seminar is English. Grades will be based on presentations and active participation.

## Topics

1. The reinforcement learning problem: environment, agent, reward, return, Markov property, MDPs, the state-value function V, the action-value function Q, policies, the Bellman equation, examples (Chapter 3)

2. Dynamic programming: iterative policy evaluation, proof of convergence, policy improvement theorem, policy iteration, value iteration (Chapter 4)

3. Monte Carlo methods: policy evaluation, estimation of action values, Monte Carlo control, on-policy and off-policy methods, evaluating one policy while following another (Chapter 5)

4. Temporal-difference (TD) learning: TD vs. Monte Carlo, convergence of TD (Bibl. Remarks 6.1-2), Sarsa (on-policy control), Q-learning (off-policy control) and a proof of its convergence (Bibl. Remarks 6.5, http://users.isr.ist.utl.pt/~mtjspaan/readingGroup/ProofQlearning.pdf), actor-critic methods (Chapter 6)

5. Eligibility traces: n-step TD methods, forward and backward views of TD(λ) and their equivalence (Chapter 7)

6. Generalization and function approximation: approximation of the V function, gradient descent, convergence for linear approximation (Tsitsiklis and Van Roy, 1997a), control with function approximation, Baird's counterexample, Tsitsiklis and Van Roy's counterexample (Chapter 8)

7. Trust region policy optimization (TRPO) and proximal policy optimization (PPO) (https://arxiv.org/pdf/1502.05477.pdf, https://arxiv.org/pdf/1707.06347.pdf)

8. DQN and Double Q-learning (https://arxiv.org/pdf/1312.5602.pdf, https://www.nature.com/articles/nature14236.pdf and https://arxiv.org/pdf/1509.06461.pdf)

9. Prioritized experience replay (https://arxiv.org/pdf/1511.05952.pdf)
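Since participants are encouraged to implement the methods they present, here is a minimal, hedged sketch of tabular Q-learning (topic 4) on a toy 5-state chain MDP. The environment, constants, and function names are our own illustrative choices, not examples from the Sutton-Barto book:

```python
import random

# Toy chain MDP: states 0..4, actions 0 = left, 1 = right.
# Reaching state 4 yields reward 1 and ends the episode; all other
# transitions yield reward 0.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def step(state, action):
    """Deterministic transition along the chain."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy behaviour policy
            if rng.random() < EPSILON:
                action = rng.randrange(2)
            else:
                action = 0 if Q[state][0] > Q[state][1] else 1
            next_state, reward, done = step(state, action)
            # off-policy TD target: r + gamma * max_a Q(s', a)
            target = reward + GAMMA * max(Q[next_state])
            Q[state][action] += ALPHA * (target - Q[state][action])
            state = next_state
    return Q

Q = q_learning()
# The learned greedy policy should move right in every non-terminal state.
print([0 if q[0] > q[1] else 1 for q in Q[:GOAL]])
```

Note the off-policy character of the update: the behaviour policy is epsilon-greedy, while the TD target bootstraps from the greedy action via `max(Q[next_state])`. This is the distinction between Q-learning and Sarsa discussed in Chapter 6.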

## Literature

The chapter numbers indicated in the topic list refer to the following monograph:

1. R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998. Second edition (2018): http://incompleteideas.net/book/RLbook2018.pdf