| Hyperparameter | Global batch size | Learning rate | Epochs | Max length (tokens) | Weight decay |
| --- | ---: | ---: | ---: | ---: | ---: |
| Vicuna-13B | 128 | 2e-5 | 3 | 2048 | 0 |
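The global batch size in the table is the number of samples behind each optimizer step, summed over all data-parallel GPUs. In the usual data-parallel setup it factors as GPUs × per-device batch × gradient-accumulation steps. The sketch below assumes that factorization and uses hypothetical GPU counts and per-device sizes (not taken from the table) to show how many accumulation steps reach 128. It also computes the token ceiling per step implied by the 2048-token max length.

```python
from dataclasses import dataclass


@dataclass
class BatchConfig:
    """Hypothetical split of a global batch across GPUs and accumulation steps."""
    num_gpus: int
    per_device_batch_size: int
    grad_accum_steps: int

    @property
    def global_batch_size(self) -> int:
        # Samples contributing to one optimizer step across all data-parallel workers.
        return self.num_gpus * self.per_device_batch_size * self.grad_accum_steps


def accum_steps_for(global_batch: int, num_gpus: int, per_device: int) -> int:
    """Return the gradient-accumulation steps needed to hit `global_batch` exactly."""
    per_step = num_gpus * per_device
    if global_batch % per_step != 0:
        raise ValueError(
            f"global batch {global_batch} not divisible by {num_gpus} GPUs x {per_device}/device"
        )
    return global_batch // per_step


def max_tokens_per_step(global_batch: int, max_length: int) -> int:
    # Upper bound: every sample padded or packed to the full max length.
    return global_batch * max_length


if __name__ == "__main__":
    # 128 comes from the table; the (GPUs, per-device) pairs are illustrative only.
    for gpus, per_device in [(8, 4), (4, 2), (2, 1)]:
        steps = accum_steps_for(128, gpus, per_device)
        cfg = BatchConfig(gpus, per_device, steps)
        print(f"{gpus} GPUs x {per_device}/device x {steps} accum = {cfg.global_batch_size}")
    print("max tokens per optimizer step:", max_tokens_per_step(128, 2048))
```

Fewer GPUs or a smaller per-device batch (e.g. to fit memory) can be traded for more accumulation steps while keeping the global batch size, and thus the learning-rate setting, unchanged.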
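Update `--model_name_or_path` with the actual path to the LLaMA weights and `--data_path` with the actual path to the training data.

To use the xFormers-based training script instead, replace `fastchat/train/train_mem.py` in the command above with `fastchat/train/train_xformers.py`.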
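As a rough illustration of the two substitutions (paths, then training script), the sketch below assembles a `torchrun` argument list. Only `--model_name_or_path`, `--data_path`, and the two script paths come from this README; the remaining flags are assumed Hugging Face `TrainingArguments`-style names filled from the hyperparameter table, the placeholder paths are hypothetical, and the official command may differ.

```python
import shlex
from typing import List

# Hypothetical placeholders: substitute your real locations.
LLAMA_WEIGHTS = "/path/to/llama-13b"
DATA_JSON = "/path/to/data.json"


def build_train_cmd(model_path: str, data_path: str, output_dir: str = "output",
                    use_xformers: bool = False, nproc: int = 4) -> List[str]:
    """Assemble (but do not run) a torchrun invocation for FastChat training.

    Flags other than --model_name_or_path and --data_path are assumptions
    based on the hyperparameter table, not the official command.
    """
    # Swap the entry point exactly as the README describes.
    script = ("fastchat/train/train_xformers.py" if use_xformers
              else "fastchat/train/train_mem.py")
    return [
        "torchrun", f"--nproc_per_node={nproc}", script,
        "--model_name_or_path", model_path,
        "--data_path", data_path,
        "--output_dir", output_dir,
        "--num_train_epochs", "3",      # Epochs
        "--learning_rate", "2e-5",      # Learning rate
        "--weight_decay", "0.",         # Weight decay
        "--model_max_length", "2048",   # Max length (tokens)
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line for copy-paste.
    print(shlex.join(build_train_cmd(LLAMA_WEIGHTS, DATA_JSON, use_xformers=True)))
```

Building the command as a list keeps paths with spaces intact; `shlex.join` only quotes it for display.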