Papers
arxiv:2307.12983

Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation

Published on Jul 24, 2023
Authors:
,
,
,
,

Abstract

Parallel Q-Learning leverages GPU-based simulation to achieve faster wall-clock training times than PPO with equivalent sample efficiency.

AI-generated summary

Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data. Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection thousands of times on a commodity GPU. Most prior works used on-policy methods like PPO due to their simplicity and ease of scaling. Off-policy methods are more data efficient but challenging to scale, resulting in a longer wall-clock training time. This paper presents a Parallel Q-Learning (PQL) scheme that outperforms PPO in wall-clock time while maintaining superior sample efficiency of off-policy learning. PQL achieves this by parallelizing data collection, policy learning, and value learning. Different from prior works on distributed off-policy learning, such as Apex, our scheme is designed specifically for massively parallel GPU-based simulation and optimized to work on a single workstation. In experiments, we demonstrate that Q-learning can be scaled to tens of thousands of parallel environments and investigate important factors affecting learning speed. The code is available at https://github.com/Improbable-AI/pql.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2307.12983 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2307.12983 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2307.12983 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.