ECCOMAS 2024

Approximated Bayesian Reinforcement Learning

  • Arnaoutis, Vasos (University of Twente)
  • Rosic, Bojana (University of Twente)

Model-free algorithms are used for the control and optimization of complex (non-physical) systems. Reinforcement learning is a widely used model-free approach that has seen success in the optimization of control applications and beyond. State and parameter estimation is a critical aspect of defining a model within reinforcement learning. While these quantities are often described deterministically, some reinforcement learning algorithms already account for a stochastic model through probability distributions, typically simplified to their first two moments, the mean and the variance [1]. This paper describes reinforcement learning as an inverse problem in a Bayesian framework of conditional expectations, valid for any stochastic model of uncertain parameters. From this, a general formulation of the Kalman filter is derived for the estimation of the states and parameters of the system. The derived operators are approximated by polynomial expansions that can represent higher moments of the probability distribution. As a practical demonstration, the generalized Kalman-filter-based reinforcement learning algorithm is applied to a classical control benchmark.

[1] Fortunato, M., Azar, M. G., Piot, B., Menick, J., Hessel, M., Osband, I., Graves, A., Mnih, V., Munos, R., Hassabis, D., Pietquin, O., Blundell, C., & Legg, S. (2018). Noisy Networks for Exploration. 6th International Conference on Learning Representations (ICLR 2018). https://doi.org/10.48550/arxiv.1706.10295
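
To fix ideas, the following is a minimal sketch (in Python with NumPy) of the kind of update the abstract alludes to: the conditional-expectation map E[q|y] is approximated by a polynomial in the observable, fitted on prior samples, and then used in a generalized Kalman-type update of the uncertain parameter. This is an illustration under simplifying assumptions (scalar parameter, scalar observable, plain least-squares regression), not the authors' implementation; the function names polynomial_kalman_update and poly_features are hypothetical.

    import numpy as np

    def poly_features(y, degree):
        # Vandermonde matrix of monomials 1, y, ..., y**degree (scalar observable assumed)
        return np.vander(np.atleast_1d(y), degree + 1, increasing=True)

    def polynomial_kalman_update(q_prior, y_pred, y_meas, degree=2):
        # Fit phi(y) ~ E[q | y] by least-squares polynomial regression on the samples;
        # degree=1 recovers a classical linear Kalman-type gain.
        Phi = poly_features(y_pred, degree)
        coef, *_ = np.linalg.lstsq(Phi, q_prior, rcond=None)
        phi = lambda y: poly_features(y, degree) @ coef
        # Generalized Kalman update: shift each prior sample by the mismatch between
        # the map evaluated at the actual measurement and at its own prediction.
        return q_prior + (phi(y_meas) - phi(y_pred))

    # Toy usage: infer a parameter q from a noisy quadratic observation y = q**2 + noise.
    rng = np.random.default_rng(0)
    q_true = 1.5
    q_prior = rng.normal(1.0, 0.5, size=2000)            # prior samples of the parameter
    y_pred = q_prior**2 + rng.normal(0.0, 0.1, 2000)     # predicted observable per sample
    y_meas = q_true**2 + rng.normal(0.0, 0.1)            # one actual measurement
    q_post = polynomial_kalman_update(q_prior, y_pred, y_meas, degree=2)
    print(q_post.mean(), q_post.std())                   # posterior mean shifts toward q_true

With degree = 1 the update reduces to the familiar linear Kalman gain, while higher degrees let the fitted map capture the non-Gaussian features that the polynomial expansions mentioned in the abstract are meant to represent.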