A simple reinforcement learning gym for vision-language models, written in JAX. Drop in any environment and any model, then train with PPO.
This post is coming soon.