verl

Quickstart

  • Installation
  • Quickstart: PPO training on GSM8K dataset
  • Multinode Training
  • Ray Debug Tutorial
  • More Resources

Programming guide

  • HybridFlow Programming Guide
  • The Design of verl.single_controller

Data Preparation

  • Prepare Data for Post-Training
  • Implement Reward Function for Dataset

Configurations

  • Config Explanation

PPO Example

  • PPO Example Architecture
  • GSM8K Example
  • Multi-Modal Example Architecture

Algorithms

  • Proximal Policy Optimization (PPO)
  • Group Relative Policy Optimization (GRPO)
  • Recipe: Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO)
  • Recipe: Self-Play Fine-Tuning (SPIN)
  • Recipe: Self-Play Preference Optimization (SPPO)
  • Recipe: Entropy Mechanism
  • On-Policy RL with Optimal Reward Baseline (OPO)
  • Algorithm Baselines
  • Group Policy Gradient (GPG)

PPO Trainer and Workers

  • PPO Ray Trainer
  • PyTorch FSDP Backend
  • Megatron-LM Backend
  • SGLang Backend

Performance Tuning Guide

  • Training DeepSeek 671b
  • Performance Tuning Guide
  • Upgrading to vLLM >= 0.8
  • Hardware Resource Needed for RL
  • NVIDIA Nsight Systems profiling in verl

Adding new models

  • Add models with the FSDP backend
  • Add models with the Megatron-LM backend

Advanced Features

  • Using Checkpoints to Support Fault-Tolerant Training
  • RoPE Scaling Override
  • RL(HF) algorithms with LoRA Support
  • Multi-turn Rollout Support
  • Interaction System for Multi-turn RL Training
  • Ray API Design Tutorial
  • Extend to other RL(HF) algorithms
  • Sandbox Fusion Example

Hardware Support

  • Getting started with AMD (ROCm Kernel)
  • verl performance tuning for AMD (ROCm Kernel)
  • verl x Ascend

API References

  • Data Interface
  • Single Controller Interface
  • Trainer Interface
  • Utilities

FAQ

  • Frequently Asked Questions

Development Notes

  • Sandbox Fusion Tool Integration