Advanced Machine Learning

Classroom: Engineering Building x, Room xxx
Textbooks
- Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998 [link]; 2nd edition, 2018 [link]
- C. Szepesvári, Algorithms for Reinforcement Learning, 2010 [pdf]
- I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016 [book]
- Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (Third edition), Prentice Hall, 2010 [book]
- D. Bertsekas, Dynamic Programming and Optimal Control, Vols. 1-2, Athena Scientific, 2012-2017 [book]
- D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996 [book]
References
- TensorFlow [link]
- D. Bertsekas's Dynamic Programming and Optimal Control [link]

Lecture 0: Introduction and singularity (R. Sutton) [pdf]
Lecture 1: Introduction to reinforcement learning (D. Silver) [pdf]
Lecture 2: Bandit problems (R. Sutton) [pdf]
Lecture 3: Markov decision processes [pdf]
- Defining intelligent systems (R. Sutton) [pdf]
- Examples of MDP (R. Sutton) [pdf]
- MDPs (R. Sutton) [pdf]
Lecture 4: Planning by dynamic programming [pdf]

R. Williams, Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Machine Learning, 1992 [pdf]
R. Sutton, D. McAllester, S. Singh, Y. Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS '00 [pdf]
J. Peters, S. Vijayakumar, S. Schaal, Natural actor-critic, ECML '05 [pdf]
V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning, ICML '16 [pdf]

A Survey of Monte Carlo Tree Search, 2012 [pdf]
Monte Carlo Tree Search [pdf]
S. Gelly, D. Silver, Monte-Carlo Tree Search and Rapid Action Value Estimation in Computer Go, 2011 [pdf]
D. Silver, R. Sutton, M. Müller, Sample-Based Learning and Search with Permanent and Transient Memories, ICML '08 [pdf]

R. Weber, Multi-armed Bandits and the Gittins Index Theorem [pdf]
D. J. Russo, B. Van Roy, A. Kazerouni, I. Osband, and Z. Wen, A Tutorial on Thompson Sampling [pdf]
P. Auer, N. Cesa-Bianchi, P. Fischer, Finite-time Analysis of the Multiarmed Bandit Problem, 2002 [pdf]
L. Kocsis, C. Szepesvári, Bandit Based Monte-Carlo Planning, ECML '06 [pdf]
S. Bubeck and N. Cesa-Bianchi, Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012 [pdf]

S. Gelly, Y. Wang, R. Munos, O. Teytaud, Modification of UCT with Patterns in Monte-Carlo Go, Technical Report [pdf]