[ICLR 2022] Part 3: Reinforcement Learning as a sequence modeling problem

References

[1] Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. Decision transformer: Reinforcement learning via sequence modeling. In Thirty-Fifth Conference on Neural Information Processing Systems, 2021.

[2] Hiroki Furuta, Yutaka Matsuo, and Shixiang Shane Gu. Generalized decision transformer for offline hindsight information matching. In International Conference on Learning Representations, 2022.

[3] Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. Hindsight experience replay. In Advances in neural information processing systems, 2017.

[4] Vitchyr Pong, Shixiang Gu, Murtaza Dalal, and Sergey Levine. Temporal difference models: Model-free deep RL for model-based control. In International Conference on Learning Representations, 2018.

[5] Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, and Pierre Sermanet. Learning latent plans from play. In Conference on Robot Learning, 2019.

[6] Marc G Bellemare, Will Dabney, and Remi Munos. A distributional perspective on reinforcement learning. In International Conference on Machine Learning, 2017.

[7] Kuang-Huei Lee, Ofir Nachum, Mengjiao Yang, Lisa Lee, Daniel Freeman, Winnie Xu, Sergio Guadarrama, et al. Multi-game decision transformers. arXiv preprint arXiv:2205.15241, 2022.

[8] Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, et al. A generalist agent. arXiv preprint arXiv:2205.06175, 2022.
