# The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning
[Paper](https://arxiv.org/pdf/2505.22617) [Code](https://github.com/PRIME-RL/Entropy-Mechanism-of-RL) [alphaXiv](https://www.alphaxiv.org/abs/2505.22617) [X (@stingning)](https://x.com/stingning/status/1928088554166505667) [X (@charlesfornlp)](https://x.com/charlesfornlp/status/1928089451080585283) [X (@_akhaliq)](https://x.com/_akhaliq/status/1928077929105268861)
## 🎉News
- **[2025/05/29]** 🎉 Ranked **#1** of the day on [Huggingface Daily Papers](https://huggingface.co/papers?date=2025-05-29).
- **[2025/05/29]** Released our paper on [arXiv](https://arxiv.org/pdf/2505.22617). We provide insights into the entropy mechanism of RL for LLMs and propose two simple yet effective strategies to alleviate entropy collapse.
## ✨Getting started
After preparing the training data, you can train Qwen2.5-7B on a single node. Taking the KL-Cov approach as an example, simply run:
```bash
cd verl
conda activate your_env
bash recipe/dapo/7b_kl_cov.sh
```
To train Qwen2.5-32B on multiple nodes, run:
```bash
cd verl
conda activate your_env
bash recipe/dapo/32b_kl_cov.sh
```
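The KL-Cov recipe invoked above applies a KL penalty only to the small fraction of tokens whose log-probability covaries most strongly with the advantage, which are the tokens that drive entropy collapse. The sketch below illustrates only the token-selection step; the function name and the fraction `k` are illustrative, not the repo's actual API (see the paper and `recipe/dapo/` for the real implementation):

```python
def kl_cov_mask(logprobs, advantages, k=0.25):
    """Illustrative sketch: mark the top-k fraction of tokens by the
    per-token covariance term between log-probability and advantage.

    These are the tokens that would receive the KL penalty in a
    KL-Cov-style update; all names here are hypothetical."""
    n = len(logprobs)
    lp_mean = sum(logprobs) / n
    adv_mean = sum(advantages) / n
    # Per-token contribution to Cov(log pi, A) over the batch.
    cov = [(lp - lp_mean) * (a - adv_mean)
           for lp, a in zip(logprobs, advantages)]
    # Keep at least one token; threshold at the k-th largest covariance.
    n_sel = max(1, int(k * n))
    threshold = sorted(cov, reverse=True)[n_sel - 1]
    return [c >= threshold for c in cov]
```

For example, a confident token (high log-probability) paired with a large positive advantage yields a large covariance term and gets selected, while low-covariance tokens are left to the ordinary policy-gradient update.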
## 📖Introduction