Add models with the Megatron-LM backend
Last updated: 04/25/2025.
Model
If you use the latest verl, GPTModel is directly supported for the Megatron backend. You can add custom models in much the same way you would pretrain them with Megatron.
We list the steps here:
1. Find model_initializer.py.
2. If your model is configurable by TransformerLayerSpec, you can directly use GPTModel. Otherwise, please implement a new ModelLayerSpec and ModelLayer here.
3. Use the right LayerSpec, TransformerConfig, and HuggingfaceConfig as arguments to initialize the GPTModel.
4. Return the model at the end (see the sketch after this list).
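Below is a minimal sketch of what such an initializer could look like, using megatron-core's GPTModel, TransformerConfig, and get_gpt_layer_with_transformer_engine_spec. The init_gpt_model helper and its mapping from the HuggingFace config are illustrative assumptions, not verl's actual implementation.

```python
# Minimal sketch of a GPTModel initializer (illustrative, not verl's code).
from megatron.core.models.gpt.gpt_model import GPTModel
from megatron.core.models.gpt.gpt_layer_specs import (
    get_gpt_layer_with_transformer_engine_spec,
)
from megatron.core.transformer.transformer_config import TransformerConfig


def init_gpt_model(tf_config: TransformerConfig, hf_config) -> GPTModel:
    """Build a GPTModel from a Megatron TransformerConfig and a HuggingFace config."""
    # Step 2: choose a TransformerLayerSpec. Models that fit the standard
    # transformer layer can reuse the built-in spec; otherwise a custom
    # ModelLayerSpec would be plugged in here.
    layer_spec = get_gpt_layer_with_transformer_engine_spec()

    # Step 3: initialize GPTModel from the layer spec, the TransformerConfig,
    # and the sizes read from the HuggingFace config.
    model = GPTModel(
        config=tf_config,
        transformer_layer_spec=layer_spec,
        vocab_size=hf_config.vocab_size,
        max_sequence_length=hf_config.max_position_embeddings,
        pre_process=True,
        post_process=True,
        share_embeddings_and_output_weights=getattr(
            hf_config, "tie_word_embeddings", False
        ),
        # Assumes a Llama-style model with rotary embeddings.
        position_embedding_type="rope",
    )

    # Step 4: return the model.
    return model
```

The position_embedding_type="rope" choice is an assumption for a Llama-style model; a model with learned absolute position embeddings would keep the default instead.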