Add models with the Megatron-LM backend
Last updated: 04/25/2025.
Model
With the latest verl, GPTModel is supported directly for the Megatron backend.
You can add a custom model in much the same way as you would pretrain one with Megatron.
We list the steps here:
1. Find model_initializer.py.
2. If your model is configurable by TransformerLayerSpec, you can use GPTModel directly. Otherwise, implement a new ModelLayerSpec and ModelLayer in that file.
3. Use the right LayerSpec, TransformerConfig and HuggingfaceConfig as arguments to initialize the GPTModel.
4. Finally, return the model.
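
The steps above can be sketched roughly as follows. This is a minimal illustration, not verl's actual model_initializer.py: the helper names hf_to_transformer_kwargs and initialize_model are hypothetical, and the model and config classes are passed in as parameters so the sketch runs without Megatron installed. The field names on both sides (num_hidden_layers, num_key_value_heads, and so on for Hugging Face; num_layers, num_query_groups, and so on for TransformerConfig) and the GPTModel keyword arguments are written from memory of the two libraries and should be checked against the versions you have installed.

```python
from types import SimpleNamespace


def hf_to_transformer_kwargs(hf_config):
    """Map Hugging Face-style config fields to TransformerConfig-style kwargs.

    Hypothetical helper: the field names are assumptions to verify against
    your installed Megatron-Core and transformers versions.
    """
    num_heads = hf_config.num_attention_heads
    return {
        "num_layers": hf_config.num_hidden_layers,
        "hidden_size": hf_config.hidden_size,
        "num_attention_heads": num_heads,
        "ffn_hidden_size": hf_config.intermediate_size,
        # Without grouped-query attention, use one group per attention head.
        "num_query_groups": getattr(hf_config, "num_key_value_heads", None) or num_heads,
        "layernorm_epsilon": getattr(hf_config, "rms_norm_eps", 1e-5),
    }


def initialize_model(hf_config, layer_spec, model_cls, config_cls):
    """Hypothetical initializer following steps 3 and 4 of the list above."""
    # Step 3: build the transformer config and construct the model from
    # the layer spec, the transformer config and the Hugging Face config.
    transformer_config = config_cls(**hf_to_transformer_kwargs(hf_config))
    model = model_cls(
        config=transformer_config,
        transformer_layer_spec=layer_spec,
        vocab_size=hf_config.vocab_size,
        max_sequence_length=hf_config.max_position_embeddings,
    )
    # Step 4: return the model.
    return model


# Stand-in for a Hugging Face config; the values are illustrative only.
hf_config = SimpleNamespace(
    num_hidden_layers=2,
    hidden_size=64,
    num_attention_heads=4,
    num_key_value_heads=2,
    intermediate_size=256,
    rms_norm_eps=1e-6,
    vocab_size=1000,
    max_position_embeddings=128,
)
kwargs = hf_to_transformer_kwargs(hf_config)
print(kwargs["num_layers"], kwargs["num_query_groups"])
```

In real use, model_cls and config_cls would presumably be Megatron's GPTModel and TransformerConfig, and layer_spec would be either a stock GPT layer spec or your own ModelLayerSpec from step 2. Constructing the real model also requires Megatron's parallel state to be initialized first, which this sketch leaves out.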