id: work_se473dceh5atrhc37ydulr7ns4
author: Per-Arne Andersen
title: Increasing sample efficiency in deep reinforcement learning using generative environment modelling
date: 2020
pages: 15
extension: .pdf
mime: application/pdf
words: 7814
sentences: 784
flesch: 60
keywords: artificial experience-replay, deep reinforcement learning, environment modelling, exploration
summary: Bangaru, Suhas and Ravindran (2016) proposed a method of deducing the Markov Decision Process (MDP) by introducing an adaptive exploration signal (pseudo-reward), which was obtained using a deep generative model. Xiao and Kesineni (2016) proposed the use of generative adversarial networks (GANs) for model-based reinforcement learning. The learned model was used to learn the optimal policy of the environment using algorithms such as Deep Q-Networks (DQN) (Mnih et al.). Deep Planning Network (PlaNet) is a model-based agent that interprets the pixels of a state to learn the dynamics of an environment. Their algorithms show state-of-the-art performance in self-supervised generative modelling for reinforcement learning agents. In this section, we show results of model-based reinforcement learning using DVAE in the deep-maze and deep-line-wars environments.
cache: ./cache/work_se473dceh5atrhc37ydulr7ns4.pdf
txt: ./txt/work_se473dceh5atrhc37ydulr7ns4.txt