LoRA Fine-Tuning Tutorial

Info

What is LoRA Fine-Tuning?

LoRA (Low-Rank Adaptation) is a fine-tuning technique for large pre-trained models. Instead of updating all of the original model's weights, it trains a small set of additional low-rank matrices, adapting the model to a specific task at a fraction of the cost of full fine-tuning.


The LoRA fine-tuning method in this article comes from the RWKV community fine-tuning project RWKV-PEFT.

Before starting the LoRA fine-tuning, make sure you have a Linux workspace and an NVIDIA graphics card that supports CUDA.
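If you want to verify the environment before proceeding, the two commands below are a quick sanity check; they ship with the NVIDIA driver and the CUDA toolkit respectively:

nvidia-smi       # confirms the GPU is visible and the driver is loaded
nvcc --version   # confirms the CUDA toolkit is installed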

LoRA VRAM Reference

The GPU VRAM required for RWKV LoRA fine-tuning is listed in the following tables:

RWKV-7

| Model Parameters | bf16 | int8 | nf4 |
| --- | --- | --- | --- |
| RWKV7-0.1B | 2.7GB | 2.5GB | 2.4GB |
| RWKV7-0.4B | 3.4GB | 2.9GB | 2.7GB |
| RWKV7-1.5B | 5.6GB | 4.6GB | 3.9GB |
| RWKV7-2.9B | 8.8GB | 6.7GB | 5.7GB |

RWKV-6

| Model Parameters | bf16 | int8 | nf4 |
| --- | --- | --- | --- |
| RWKV6-1.6B | 7.3GB | 5.9GB | 5.4GB |
| RWKV6-3B | 11.8GB | 9.4GB | 8.1GB |
| RWKV6-7B | 23.7GB | 17.3GB | 14.9GB |

The data in the above table is based on the following training parameters:

  • ctxlen=1024
  • micro_bsz=1
  • strategy=deepspeed_stage_1
  • peft_config='{"r":64,"lora_alpha":32,"lora_dropout":0.05}'

As the training parameters change, the VRAM required for RWKV LoRA fine-tuning will also change.
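If VRAM is tight, the parameters above are the usual knobs to turn; the values below are only illustrative, not recommendations from the RWKV-PEFT project:

# In run_lora.sh, lower the per-step memory footprint:
micro_bsz=1      # smallest micro-batch
ctx_len=512      # shorter context, if your corpus allows it
# and in the training command, switch to a more memory-frugal DeepSpeed stage:
# --strategy deepspeed_stage_2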

Collect Training Data

You need to prepare training data in binidx format, which is the format best suited to training RWKV. For specific methods, refer to Preparing the Training Dataset.
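As a quick illustration (the paths below are hypothetical), a binidx dataset consists of a .bin and an .idx file, and the data_file parameter used later points at their common prefix, without either suffix:

ls /home/rwkv/data
# roleplay.bin  roleplay.idx
data_file="/home/rwkv/data/roleplay"   # no .bin / .idx suffix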

Configure the Training Environment

To train the RWKV model, you first need to set up the training environment (conda, CUDA, etc.). For the specific steps, refer to the RWKV Training Environment Configuration section.
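As a minimal sketch (the environment name and Python version here are assumptions; follow the linked section for the exact setup), a conda environment can be created like this:

conda create -n rwkv-peft python=3.10 -y
conda activate rwkv-peft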

Clone the Repository and Install Dependencies

In Linux or WSL, use the git command to clone the RWKV-PEFT repository:

git clone https://github.com/JL-er/RWKV-PEFT.git

After cloning completes, run cd RWKV-PEFT to enter the RWKV-PEFT directory, then run the following command to install the project's dependencies:

pip install -r requirements.txt
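Optionally, a quick check that PyTorch was installed and can see the GPU (this assumes requirements.txt pulls in PyTorch):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"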

Modify the Training Parameters

Open the run_lora.sh file in the RWKV-PEFT/scripts directory with any text editor (such as VS Code) and modify the training parameters to control the fine-tuning process and the training results:

(Screenshot: run_lora.sh configuration)

The following walks through the parameter adjustments for LoRA fine-tuning:

Adjust the Path Parameters

The first three lines of the run_lora.sh file are file path parameters:

  • load_model: The path of the base RWKV model
  • proj_dir: The output path of the training log and the LoRA file obtained from training
  • data_file: The path of the training dataset. Note that there is no need to include the bin and idx suffixes in the path, only the file name is required.

Adjust the n_layer and n_embd Parameters

Warning

For RWKV models with different parameters, the values of n_layer and n_embd used during training are different.

The following are the corresponding n_layer/n_embd values for RWKV model parameters:

| Model Parameters | n_layer | n_embd |
| --- | --- | --- |
| 0.1B | 12 | 768 |
| 0.4B | 24 | 1024 |
| 1.5B | 24 | 2048 |
| 2.9B | 32 | 2560 |
| 7B | 32 | 4096 |
| 14B | 61 | 4096 |
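For example, to fine-tune the RWKV7-2.9B model, the corresponding lines in run_lora.sh would be:

n_layer=32
n_embd=2560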

Adjust the Important Training Parameters

Tips

Adjust the following parameters according to your fine-tuning data and your hardware.

| Parameter | Description |
| --- | --- |
| micro_bsz=1 | Micro-batch size. Adjust according to available VRAM; start from 1 and increase gradually during fine-tuning |
| epoch_save=5 | Save a LoRA checkpoint every this many training epochs. Make sure there is enough storage space |
| epoch_steps=1000 | Number of steps per training epoch. Increasing this value lengthens the training time of a single epoch |
| ctx_len=512 | Context length of the fine-tuned model. Adjust it according to the length of your corpus |
| --my_testing "x070" | RWKV model version to train. Use x070 for RWKV-7, x060 for RWKV-6, and x052 for RWKV-5 (deprecated, not recommended) |
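As a rough rule of thumb (assuming each step processes micro_bsz sequences of ctx_len tokens per device), the amount of data consumed per epoch is:

# tokens per epoch ≈ epoch_steps × micro_bsz × ctx_len × devices
# e.g. 1000 × 1 × 512 × 1 ≈ 0.5M tokens with the values above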

Adjust the LoRA-Related Parameters

Tips

peft_config contains the parameters for LoRA fine-tuning. Refer to the following table for their effects:

| Parameter | Description |
| --- | --- |
| "r": 32 | Rank of the LoRA update. Higher values generally give better results but require more training time and GPU memory. Typically 32 or 64 is sufficient |
| "lora_alpha": 32 | Alpha (scaling) parameter for LoRA. It is recommended to keep it at twice the value of r |
| "lora_dropout": 0.01 | Dropout rate for LoRA fine-tuning. A value of 0.01 is recommended |

Adjust Other Training Parameters

The following lists other modifiable training parameters in the script and the effects of their modification.

| Parameter | Description |
| --- | --- |
| --vocab_size 65536 | Vocabulary size; the default is 65536. Setting it to 0 lets the model determine the vocabulary size automatically |
| --data_type binidx | Format of the training corpus. Supported formats: utf-8, utf-16le, numpy, binidx, dummy, uint16, sft, jsonl. jsonl or binidx is recommended |
| --epoch_count 5 | Total number of training epochs |
| --lr_init 2e-5 | Initial learning rate. 2e-5 is recommended, and it should not exceed 1e-4 |
| --lr_final 2e-5 | Final learning rate. It is recommended to keep this the same as the initial learning rate |
| --accelerator gpu | Type of accelerator to use. Currently mainly gpu is supported; cpu is generally unsuitable for training |
| --devices 1 | Number of GPUs. Set to 1 for a single GPU, or to the actual number when using multiple GPUs |
| --precision bf16 | Training precision. The default bf16 is recommended. Supported options: fp32, tf32, fp16, bf16 |
| --strategy deepspeed_stage_1 | Lightning training strategy. deepspeed_stage_1 is recommended for fine-tuning. If GPU memory is too small, change 1 to 2 |
| --grad_cp 1 | Gradient checkpointing. 0 trains faster but uses more GPU memory; 1 trains more slowly but saves memory |
| --peft lora | Fine-tuning type. Use lora for LoRA fine-tuning |
| --op | Operator selection. Supports cuda, fla, triton; the default is cuda |
| --wandb RWKV-PEFT-LoRA | Optional. Enable wandb to visualize training logs. Requires a configured wandb account |
| --lr_schedule wsd | Optional. Learning rate scheduler. The default is cos_decay; supported options: cos_decay, wsd |
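As an illustrative excerpt (the values are hypothetical, not project recommendations), the tail of the python train.py command in run_lora.sh on a two-GPU machine with limited VRAM and wandb logging might read:

# ... (other arguments unchanged)
--devices 2 --strategy deepspeed_stage_2 --grad_cp 1 \
--wandb RWKV-PEFT-LoRA --lr_schedule wsd \
--peft lora --peft_config $peft_config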

Warning

After adjusting the parameters, remember to save the run_lora.sh file.

Appendix: Configuration Reference for run_lora.sh

load_model="/home/rwkv/model/rwkv7-g1-1.5b-20250429-ctx4096.pth"
proj_dir="/home/rwkv/JL/out_model/test"
data_file="/home/rwkv/JL/data/roleplay"

n_layer=24
n_embd=2048

micro_bsz=8
epoch_save=1
epoch_steps=200
ctx_len=128
peft_config='{"r":8,"lora_alpha":32,"lora_dropout":0.05}'

python train.py --load_model $load_model \
--proj_dir $proj_dir --data_file $data_file \
--vocab_size 65536 \
--data_type jsonl \
--n_layer $n_layer --n_embd $n_embd \
--ctx_len $ctx_len --micro_bsz $micro_bsz \
--epoch_steps $epoch_steps --epoch_count 4 --epoch_save $epoch_save \
--lr_init 1e-5 --lr_final 1e-5 \
--accelerator gpu --precision bf16 \
--devices 1 --strategy deepspeed_stage_1 --grad_cp 1 \
--my_testing "x070" \
--peft lora --peft_config $peft_config
# Optional parameters
# --op cuda/fla/triton     (Select different operators; if not set, defaults to cuda)
# --wandb RWKV-PEFT-LoRA (Whether to use wandb to monitor the training process)
# --lr_schedule wsd        (Learning rate scheduler; the default is cos_decay,
#                           wsd is the other supported option)

Start the Training

In the RWKV-PEFT directory, run the command sh scripts/run_lora.sh to start the LoRA fine-tuning.
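If you are training on a remote machine, a common pattern is to run the script in the background and keep the log in a file (the log file name here is arbitrary):

nohup sh scripts/run_lora.sh > lora_train.log 2>&1 &
tail -f lora_train.log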

Once training starts successfully, the output should look like this:

(Screenshot: LoRA fine-tuning in progress)

Use the LoRA Weight File

After training is completed, you can find the full LoRA fine-tuned model file (in .pth format) in the output directory.

(Screenshot: merged LoRA model file in the output directory)

The merged LoRA fine-tuned model can be used normally in RWKV Runner or Ai00.

(Screenshots: using the LoRA model in RWKV Runner and Ai00)

For detailed usage, please refer to the Ai00 Tutorial.
