ROLL is an efficient and user-friendly RL library designed for Large Language Models (LLMs) and built to exploit large-scale GPU resources. It significantly enhances LLM performance in key areas such as human preference alignment, complex reasoning, and multi-turn agentic interaction.
Leveraging a multi-role distributed architecture built on Ray for flexible resource allocation and heterogeneous task scheduling, ROLL integrates cutting-edge technologies like Megatron-Core, SGLang, and vLLM to accelerate model training and inference.
GitHub: https://github.com/alibaba/ROLL
✨ Key Features
- Multi-task RL Training (RLVR): Covers mathematics, coding, general reasoning, open-ended Q&A, instruction following, and more.
  - Flexible control over the per-domain batch distribution via domain_batch_size (see the sketch after this feature group).
  - Sample-level asynchronous parallel rollout, asynchronous reward calculation, and dynamic sampling.
  - Asynchronous training is under development.
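To make the per-domain quota idea concrete, here is a minimal, self-contained sketch. The domain names, ratios, and the `domain_batch_sizes` helper are assumptions made for illustration, not ROLL's actual configuration schema; they only show the kind of split that a `domain_batch_size` setting controls.

```python
# A minimal sketch of splitting a global rollout batch across task
# domains. Domain names, ratios, and this helper are hypothetical,
# not ROLL's actual API.

rollout_batch_size = 512

# Fraction of each rollout batch drawn from each task domain.
domain_ratios = {
    "math": 0.4,
    "code": 0.3,
    "general_reasoning": 0.2,
    "instruction_following": 0.1,
}

def domain_batch_sizes(total: int, ratios: dict[str, float]) -> dict[str, int]:
    """Split `total` into per-domain quotas; the rounding remainder
    goes to the largest domain so the quotas always sum to `total`."""
    sizes = {d: int(total * r) for d, r in ratios.items()}
    largest = max(ratios, key=ratios.get)
    sizes[largest] += total - sum(sizes.values())
    return sizes

print(domain_batch_sizes(rollout_batch_size, domain_ratios))
# {'math': 206, 'code': 153, 'general_reasoning': 102, 'instruction_following': 51}
```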
- Agentic RL: Multi-turn interaction capabilities for games, multi-turn dialogue, tool use, and more.
  - Environment-level asynchronous parallel rollout.
  - Supports asynchronous training.
  - Multi-turn interaction rollout supports local debugging, improving development efficiency for multi-turn interaction use cases (illustrated in the sketch after this feature group).
  - Supports TrajectoryWise (StarPO) and StepWise (GiGPO) training paradigms.
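The local-debugging point can be shown with a toy rollout loop. `EchoEnv`, its `reset`/`step` protocol, and the `rollout` helper are hypothetical stand-ins, not ROLL's environment API; the sketch only demonstrates that an environment-level rollout can be driven by a plain Python policy, with no GPU or Ray cluster involved.

```python
# A toy multi-turn rollout loop for local debugging. EchoEnv and the
# reset/step protocol are hypothetical stand-ins, not ROLL's API.

class EchoEnv:
    """Toy text environment: rewards the agent for echoing 'hello'."""

    def reset(self) -> str:
        self.turn = 0
        return "repeat after me: hello"

    def step(self, action: str) -> tuple[str, float, bool]:
        self.turn += 1
        reward = 1.0 if "hello" in action else 0.0
        done = self.turn >= 3          # episode lasts three turns
        return f"turn {self.turn}", reward, done

def rollout(env, policy):
    """Collect one trajectory of (observation, action, reward) tuples."""
    obs, done, trajectory = env.reset(), False, []
    while not done:
        action = policy(obs)           # an LLM call in a real pipeline
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
    return trajectory

# Drive the environment with a trivial policy; no model or GPU needed.
print(rollout(EchoEnv(), policy=lambda obs: "hello"))
```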
- Algorithm-Friendly: Provides flexible and rich RL strategy configurations by default.
  - Over 20 reinforcement learning strategy options, such as reward normalization, reward clipping, and various advantage estimators (one is sketched after this feature group).
  - Out-of-the-box support for reinforcement learning algorithms such as PPO, GRPO, REINFORCE++, TOPR, RAFT++, and GSPO.
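As a flavor of these strategy options, here is a minimal sketch of GRPO-style group-normalized advantage estimation combined with reward clipping. The function name, its defaults, and the sample rewards are illustrative assumptions, not ROLL's actual configuration surface.

```python
# A sketch of GRPO-style advantages: clip rewards, then normalize
# within the group of responses sampled for the same prompt.
# Function name and defaults are hypothetical, not ROLL's API.

import statistics

def grpo_advantages(group_rewards: list[float],
                    clip: float = 10.0,
                    eps: float = 1e-6) -> list[float]:
    """advantage_i = (r_i - mean(group)) / (std(group) + eps)."""
    clipped = [max(-clip, min(clip, r)) for r in group_rewards]
    mean = statistics.fmean(clipped)
    std = statistics.pstdev(clipped)
    return [(r - mean) / (std + eps) for r in clipped]

# Four sampled responses to one prompt, scored 0/1 by a verifier.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
# ≈ [1.0, -1.0, -1.0, 1.0]
```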
- Rich Training and Inference Engines: Ray-based multi-role distributed architecture; a Strategy abstraction unifies the various backends, enabling easy scaling from a single machine to clusters of thousands of GPUs.
  - Inference/generation supports vLLM and SGLang.
  - Training supports DeepSpeed (ZeRO) and Megatron-LM 5D parallelism (via mcore-adapter: DP/TP/PP/CP/EP); FSDP support is under development.
  - Extreme offload/reload capabilities.
  - Supports LoRA training.
  - Supports FP8 rollout (FP8 inference for LLM-as-judge; FP8 rollout with BF16 training is under development).
- AutoDeviceMapping: Supports custom device mapping for different roles, flexibly managing colocated and disaggregated deployments (sketched below).
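To illustrate the colocated vs. disaggregated distinction, here is a hypothetical role-to-GPU mapping for a single 8-GPU node. The role names, dict layout, and `validate` helper are invented for this sketch and may not match ROLL's actual AutoDeviceMapping configuration.

```python
# Hypothetical role-to-device mappings for colocated vs. disaggregated
# deployment. Role names and the mapping format are illustrative only.

# Colocated: training, rollout, and reward roles share the same GPUs,
# relying on offload/reload to swap weights in and out.
colocated = {
    "actor_train": list(range(0, 8)),
    "actor_infer": list(range(0, 8)),
    "reward":      list(range(0, 8)),
}

# Disaggregated: each role gets a dedicated slice of the cluster,
# so generation and training can overlap without contention.
disaggregated = {
    "actor_train": list(range(0, 4)),   # GPUs 0-3
    "actor_infer": list(range(4, 7)),   # GPUs 4-6
    "reward":      [7],                 # GPU 7
}

def validate(mapping: dict[str, list[int]], world_size: int) -> None:
    """Check that every assigned device index exists in the cluster."""
    for role, devices in mapping.items():
        assert all(0 <= d < world_size for d in devices), role

validate(colocated, world_size=8)
validate(disaggregated, world_size=8)
print("both mappings are valid for an 8-GPU node")
```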
- Observability: Integrated with SwanLab / WandB / TensorBoard; tracks performance for each domain and reward type.
- Rich Post-training Technical Support:
  - Agentic RL (LLM & VLM)
  - RLVR (LLM & VLM)
  - Distill Pipeline (LLM & VLM)
  - DPO Pipeline
  - SFT Pipeline (under development)
🚀 Get Started
https://alibaba.github.io/ROLL/docs/Getting%20Started/Quick%20Start/single_node_quick_start/
🤝 About the ROCK & ROLL Team
ROLL is a project jointly developed by the Alibaba Future Living Lab and the Alibaba AI Engine Team, with a strong emphasis on pioneering the future of Reinforcement Learning (RL). Our mission is to explore and shape innovative forms of future living powered by advanced RL technologies.