
Mathematics of machine learning: sampling methods and applications to Bayesian neural networks

PD Dr. Pavel Gurevich, Dr. Hannes Stuke

Schedule, Winter 2018/19


Thursday 10.00-12.00, Arnimallee 6, SR 025/026

Language: English


Machine learning deals with discovering and generating patterns in data. Although it is traditionally considered a branch of computer science, it relies heavily on mathematical foundations, and the primary goal of our seminar is to understand these foundations. In doing so, we will put emphasis on the probabilistic viewpoint. This semester, we will focus on techniques that allow one to approximate complex probability distributions by means of sampling. We will also see how some of these techniques are used to approximate posterior distributions in Bayesian neural networks and variational autoencoders. Students are strongly encouraged to implement the methods they learn.
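For a first taste of the kind of method covered, here is a minimal rejection-sampling sketch in Python; the target (a standard normal), the proposal (a wider normal), and the envelope constant are illustrative choices, not part of the seminar material:

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(target_pdf, proposal_sample, proposal_pdf, k, n):
    """Draw n samples from target_pdf, given an envelope constant k
    with k * proposal_pdf(x) >= target_pdf(x) for all x."""
    samples = []
    while len(samples) < n:
        x = proposal_sample()
        u = rng.uniform(0.0, k * proposal_pdf(x))
        if u <= target_pdf(x):  # accept with probability target/(k * proposal)
            samples.append(x)
    return np.array(samples)

# Illustration: sample N(0, 1) using N(0, 4) as the proposal.
# The ratio p(x)/q(x) = 2 * exp(-3 x^2 / 8) <= 2, so k = 2 is a valid envelope.
target = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
prop_pdf = lambda x: np.exp(-0.5 * (x / 2)**2) / (2 * np.sqrt(2 * np.pi))
xs = rejection_sample(target, lambda: rng.normal(0.0, 2.0), prop_pdf, k=2.0, n=5000)
```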


Familiarity with the basics of probability theory [Bishop, 2006, Chap. 2] is required; knowledge of Bayesian neural networks [Bishop, 2006, Sec. 5.7] is recommended.


Grades are based on presentations and active participation.


Unless stated otherwise, references in the list of topics point to the book [Bishop, 2006]; bracketed names refer to the papers listed below.

  1. Introduction. Exercise 11.1. Standard distributions. Exercises 11.2-11.5 (Sec. 11.0, 11.1.1, [Goodfellow, Sec. 17.1])
  2. Rejection sampling, Exercise 11.6. Adaptive rejection sampling (11.1.2, 11.1.3, [Gilks and Wild, 1992], [Gilks 1992])
  3. Importance sampling. Sampling-importance-resampling (11.1.4, 11.1.5, [Goodfellow, Sec. 17.2])
  4. Sampling and the EM Algorithm (9.3.0, 9.4, 11.1.6)
  5. Reparametrization trick and normalizing flows ([Rezende, Mohamed, Wierstra, 2014; Appendix D], [Rezende, Mohamed, 2016; Sec. 1-4], [Kingma et al., 2017])
  6. Markov chains, Exercise 11.10. The Metropolis-Hastings algorithm (11.2.0, 11.2.1, 11.2.2, [Neal, 1993], [Hastings, 1970], [Goodfellow, Sec. 17.3]) (Metropolis-Hastings implementation in Jupyter notebook)
  7. Gibbs sampling (11.3, [Goodfellow, Sec. 17.4, 17.5])
  8. Restricted Boltzmann Machines ([Fischer, Igel, 2012], [Goodfellow, Sec. 16.7])
  9. Spike and slab restricted Boltzmann machine ([Courville et al., 2011])
  10. Hybrid Monte Carlo algorithm (11.5, [Neal, 2012]) (code of Moritz Röhrich)
  11. Hamilton-Langevin Monte Carlo algorithm ([Chen et al., 2014])
  12. Stein variational gradient descent ([Liu, Wang, 2016])
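Topic 6 mentions a Jupyter-notebook implementation of the Metropolis-Hastings algorithm. As a rough sketch of what such an implementation might look like (the random-walk Gaussian proposal, step size, and standard-normal target are illustrative assumptions):

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_steps, step=2.0, seed=0):
    """Random-walk Metropolis: with a symmetric Gaussian proposal the
    Hastings correction cancels, so the acceptance ratio is p(x')/p(x)."""
    rng = np.random.default_rng(seed)
    x = x0
    chain = np.empty(n_steps)
    for i in range(n_steps):
        x_new = x + step * rng.normal()
        # Accept with probability min(1, p(x_new)/p(x)), computed in log space.
        if np.log(rng.uniform()) < log_target(x_new) - log_target(x):
            x = x_new
        chain[i] = x
    return chain

# Target: standard normal, known only up to the normalizing constant.
chain = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, n_steps=20000)
samples = chain[5000:]  # discard burn-in
```

Note that the target density only enters through ratios, which is why the method needs the density only up to a normalizing constant.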


  • [1] C. M. Bishop, Pattern recognition and machine learning, 2006
  • [2] T. Chen, E. B. Fox, C. Guestrin, Stochastic Gradient Hamiltonian Monte Carlo, 2014
  • [3] A. Courville, J. Bergstra, Y. Bengio, A Spike and Slab Restricted Boltzmann Machine, Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, JMLR: W&CP 15, 2011
  • [4] A. Fischer, C. Igel, An Introduction to Restricted Boltzmann Machines, In L. Alvarez, M. Mejail, L. Gomez, J. Jacobo (eds), Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, CIARP 2012, Lecture Notes in Computer Science, Vol. 7441, Springer, Berlin, Heidelberg, 2012
  • [5] W. R. Gilks and P. Wild, Adaptive rejection sampling for Gibbs sampling, Applied Statistics 41, 1992, pp. 337–348
  • [6] W. R. Gilks, Derivative-free adaptive rejection sampling for Gibbs sampling, In J. Bernardo, J. Berger, A. P. Dawid, and A. F. M. Smith (Eds.), Bayesian Statistics, Vol. 4, Oxford University Press, 1992
  • [7] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, 2016
  • [8] W. K. Hastings, Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57, 1970, pp. 97–109
  • [9] D. P. Kingma et al., Improved Variational Inference with Inverse Autoregressive Flow, 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 2016
  • [10] Q. Liu, D. Wang, Stein variational gradient descent: a general purpose Bayesian inference algorithm, 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 2016
  • [11] R. M. Neal, Probabilistic inference using Markov chain Monte Carlo methods, Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto, Canada, 1993
  • [12] R. M. Neal, Slice sampling, Annals of Statistics 31, 2003, pp. 705–767
  • [13] R. M. Neal, MCMC Using Hamiltonian Dynamics, Chapter 5 in “Handbook of Markov Chain Monte Carlo”, 2012
  • [14] D. J. Rezende, S. Mohamed, D. Wierstra, Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014
  • [15] D. J. Rezende, S. Mohamed, Variational Inference with Normalizing Flows, 2016

See also the homepage of Machine Learning.