Advanced Machine Learning
- Classroom: Engineering Building x, Room xxx
- Textbooks
- Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998 [link] [2nd_edition][link]
- C. Szepesvári, Algorithms for Reinforcement Learning, 2010 [pdf]
- I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016 [book]
- Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (Third edition), Prentice Hall, 2010 [book]
- D. Bertsekas, Dynamic Programming and Optimal Control, Vol 1-2, Athena Scientific, 2012-2017 [book]
- D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996 [book]
- References
- TensorFlow [link]
- D. Bertsekas's Dynamic programming and optimal control [link]
- Lecture 0: Introduction and singularity (R. Sutton) [pdf]
- Lecture 1: Introduction to reinforcement learning (D. Silver) [pdf]
- Lecture 2: Bandit problems (R. Sutton) [pdf]
- Lecture 3: Markov decision processes [pdf]
- Defining intelligent systems (R. Sutton) [pdf]
- Examples of MDP (R. Sutton) [pdf]
- MDPs (R. Sutton) [pdf]
- Lecture 4: Planning by dynamic programming [pdf]
- DP (R. Sutton) [pdf]
- Proof of convergence for policy/value iteration [pdf]
- Lecture 5: Model-free prediction [pdf]
- MC (R. Sutton) [pdf]
- TD (R. Sutton) [pdf]
- Multistep Bootstrapping (R. Sutton) [pdf]
- On-policy prediction (R. Sutton) [pdf]
- Lecture 6: Model-free control [pdf]
- On-policy control (R. Sutton) [pdf]
- Lecture 7: Value function approximation [pdf]
- Gradient TD (R. Sutton) [pdf]
- M. Lagoudakis, R. Parr, Least-squares policy iteration, JMLR '03 [pdf]
- Lecture 8: Policy gradient [pdf]
- R. Williams, Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Machine Learning, '92 [pdf]
- R. Sutton, D. McAllester, S. Singh, Y. Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS '00 [pdf]
- J. Peters, S. Vijayakumar, S. Schaal, Natural actor-critic, ECML '05 [pdf]
- V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning, ICML '16 [pdf]
- Lecture 9: Integrating learning and planning [pdf]
- A Survey of Monte Carlo Tree Search, 2012 [pdf]
- Monte Carlo Tree Search [pdf]
- S. Gelly, D. Silver, Monte-carlo tree search and rapid action value estimation in computer Go, 2011 [pdf]
- D. Silver, R. Sutton, M. Muller, Sample-Based Learning and Search with Permanent and Transient Memories, ICML '08 [pdf]
- Lecture 10: Exploration and exploitation [pdf]
- R. Weber, Multi-armed Bandits and the Gittins Index Theorem [pdf]
- D. J. Russo, B. Van Roy, A. Kazerouni, I. Osband, and Z. Wen, A Tutorial on Thompson Sampling [pdf]
- P. Auer, N. Cesa-Bianchi, P. Fischer, Finite-time Analysis of the Multiarmed Bandit Problem, 2002 [pdf]
- L. Kocsis, C. Szepesvari, Bandit based Monte-Carlo Planning, ECML [pdf]
- S. Bubeck and N. Cesa-Bianchi, Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012 [pdf]
- Lecture 11: Classic Games [pdf]
- S. Gelly, Y. Wang, R. Munos, O. Teytaud, Modification of UCT with patterns in Monte-carlo Go, Technical report [pdf]
- Assignment 1: [pdf]
- Assignment 2: [pdf]
- Assignment 3: [pdf]