References
Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., Abbeel, P., Srinivas, A., Mordatch, I., 2021. Decision Transformer: Reinforcement Learning via Sequence Modeling. https://doi.org/10.48550/arXiv.2106.01345

Dabney, W., Ostrovski, G., Silver, D., Munos, R., 2018. Implicit Quantile Networks for Distributional Reinforcement Learning. https://doi.org/10.48550/arXiv.1806.06923

Fu, J., Luo, K., Levine, S., 2018. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning. https://doi.org/10.48550/arXiv.1710.11248

Haarnoja, T., Zhou, A., Abbeel, P., Levine, S., 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, in: International Conference on Machine Learning. pp. 1861–1870.

Hessel, M., Modayil, J., Van Hasselt, H., Schaul, T., Ostrovski, G., Dabney, W., Horgan, D., Piot, B., Azar, M., Silver, D., 2018. Rainbow: Combining improvements in deep reinforcement learning, in: Thirty-Second AAAI Conference on Artificial Intelligence.

Kumar, A., Zhou, A., Tucker, G., Levine, S., 2020. Conservative Q-Learning for Offline Reinforcement Learning. https://doi.org/10.48550/arXiv.2006.04779

Lu, C., Kuba, J.G., Letcher, A., Metz, L., de Witt, C.S., Foerster, J., 2022. Discovered Policy Optimisation.

Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., Kavukcuoglu, K., 2016. Asynchronous methods for deep reinforcement learning, in: International Conference on Machine Learning. pp. 1928–1937.

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M., 2013. Playing Atari with Deep Reinforcement Learning. https://doi.org/10.48550/arXiv.1312.5602

Nagabandi, A., Kahn, G., Fearing, R.S., Levine, S., 2017. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning. https://doi.org/10.48550/arXiv.1708.02596

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C.L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., Lowe, R., 2022. Training language models to follow instructions with human feedback. https://doi.org/10.48550/arXiv.2203.02155

Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., Guez, A., Lockhart, E., Hassabis, D., Graepel, T., Lillicrap, T., Silver, D., 2020. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. Nature 588, 604–609. https://doi.org/10.1038/s41586-020-03051-4

Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P., 2015. Trust region policy optimization, in: International Conference on Machine Learning. pp. 1889–1897.

Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P., 2018. High-Dimensional Continuous Control Using Generalized Advantage Estimation. https://doi.org/10.48550/arXiv.1506.02438

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O., 2017. Proximal policy optimization algorithms. https://doi.org/10.48550/arXiv.1707.06347

Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., Hassabis, D., 2017. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. https://doi.org/10.48550/arXiv.1712.01815

Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., Riedmiller, M., 2014. Deterministic policy gradient algorithms, in: International Conference on Machine Learning. pp. 387–395.

Van Hasselt, H., Guez, A., Silver, D., 2016. Deep reinforcement learning with double Q-learning, in: Proceedings of the AAAI Conference on Artificial Intelligence.