
MiSS Fine-Tuning Tutorial

What is MiSS Fine-Tuning?

MiSS (Matrix Shard Sharing) is a novel parameter-efficient fine-tuning (PEFT) method that aims to further reduce the number of trainable parameters while accelerating convergence.

The MiSS fine-tuning framework expands the PEFT design space by introducing two core computational structures: Block Affine Efficient Computation (Bone) and Block Affine Transformation (Bat). The Bone structure significantly improves computational efficiency while maintaining high performance, while Bat uses a non-linear transformation to overcome the collinearity problems that can arise in some MiSS variants, further enhancing the model's capabilities.

Experimental results show that MiSS and its Bone structure outperform LoRA and its variants in terms of evaluation metrics, computational efficiency, and resource usage.

(Figure: structural differences between LoRA and MiSS)


The MiSS fine-tuning method in this article comes from the RWKV community fine-tuning project RWKV-PEFT.

Before starting MiSS fine-tuning, make sure you have a Linux environment and an NVIDIA GPU that supports CUDA.

MiSS Fine-Tuning VRAM Reference

The GPU VRAM requirements for RWKV MiSS fine-tuning are shown in the following tables:

RWKV-7

| Model Parameters | bf16 | int8 | nf4 |
| --- | --- | --- | --- |
| RWKV7-0.1B | 2.7GB | 2.5GB | 2.4GB |
| RWKV7-0.4B | 3.1GB | 2.9GB | 2.7GB |
| RWKV7-1.5B | 5.6GB | 4.5GB | 3.9GB |
| RWKV7-2.9B | 8.8GB | 6.7GB | 5.7GB |

RWKV-6

| Model Parameters | bf16 | int8 | nf4 |
| --- | --- | --- | --- |
| RWKV-6-1.6B | 7.3GB | 5.9GB | 5.4GB |
| RWKV-6-3B | 11.8GB | 9.4GB | 8.1GB |
| RWKV-6-7B | 23.7GB | 17.3GB | 14.9GB |

The data in the above table is based on the following training parameters:

  • ctxlen=1024
  • micro_bsz=1
  • strategy=deepspeed_stage_1
  • peft_config='{"r":64}'

As the training parameters change, the VRAM required for RWKV MiSS fine-tuning will also change.
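
The int8 and nf4 columns above correspond to quantized training. In RWKV-PEFT this is typically enabled by adding a quantization flag to the train.py command in miss.sh; treat the exact flag below as an assumption and verify it against your version of the repository:

# Assumed flag; check the scripts in your RWKV-PEFT version before use
--quant int8    # or: --quant nf4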

Collect Training Data

You need to prepare binidx data, which is well suited for training RWKV. For specific methods, refer to Preparing the Training Dataset.

Configure the Training Environment

To train the RWKV model, you first need to configure a training environment (for example, using conda). For the specific process, please refer to the RWKV Training Environment Configuration section.
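
A minimal sketch of such an environment (the environment name rwkv-peft and the version numbers are illustrative assumptions, not requirements):

# Create and activate an isolated conda environment for RWKV-PEFT
conda create -n rwkv-peft python=3.10 -y
conda activate rwkv-peft
# Install a CUDA-enabled PyTorch build that matches your driver, then the project dependencies
pip install torch --index-url https://download.pytorch.org/whl/cu121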

Clone the Repository and Install Dependencies

In Linux or WSL, use the git command to clone the RWKV-PEFT repository:

git clone https://github.com/JL-er/RWKV-PEFT.git

After cloning is complete, run cd RWKV-PEFT to enter the RWKV-PEFT directory, then run the following command to install the dependencies required by the project:

pip install -r requirements.txt

Modify the Training Parameters

Open the miss.sh file in the RWKV-PEFT/scripts directory with any text editor (such as VS Code) and modify the training parameters to control the fine-tuning process and results:

(Screenshot: miss.sh training parameter configuration)

The following walks through the parameter adjustments for MiSS fine-tuning:

Adjust the Path Parameters

The first three lines of the miss.sh file are file path parameters:

  • load_model: The path of the base RWKV model
  • proj_dir: The output path of the training log and the MiSS file obtained from training
  • data_file: The path of the training dataset. Note that for binidx data there is no need to include the .bin or .idx suffix; only the file name prefix is required (see the example below).
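
For example, a minimal sketch of these three lines, assuming a binidx dataset stored as /home/rwkv/data/demo.bin and /home/rwkv/data/demo.idx (all paths and the model file name are illustrative):

load_model='/home/rwkv/models/RWKV-x070-World-1.5B-v3.pth'   # base RWKV model
proj_dir='/home/rwkv/output/miss-demo'                        # training logs and MiSS weights go here
data_file='/home/rwkv/data/demo'                              # demo.bin / demo.idx, suffixes omitted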

Adjust the n_layer and n_embd Parameters

Warning

RWKV models of different parameter sizes require different n_layer and n_embd values during training.

The following table lists the n_layer/n_embd values corresponding to each RWKV model size:

| Model Parameters | n_layer | n_embd |
| --- | --- | --- |
| 0.1B | 12 | 768 |
| 0.4B | 24 | 1024 |
| 1.5B | 24 | 2048 |
| 2.9B | 32 | 2560 |
| 7B | 32 | 4096 |
| 14B | 61 | 4096 |
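
For example, to fine-tune the RWKV7-2.9B model, set the following values in miss.sh:

n_layer=32
n_embd=2560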

Adjust the Important Training Parameters

Tips

It is recommended to adjust the following parameters according to your fine-tuning data and device performance.

| Parameter | Description |
| --- | --- |
| micro_bsz=1 | Micro batch size. Adjust based on VRAM capacity; for fine-tuning, start from 1 and increase gradually. |
| epoch_save=5 | Interval (in epochs) for saving the MiSS file. Ensure sufficient storage space is available. |
| epoch_steps=1200 | Number of steps per training epoch. Increasing this extends the training time of a single epoch. |
| ctx_len=512 | Context length for the fine-tuned model. It is recommended to adjust this based on the length of the corpus. |
| peft_config='{"r":64}' | Rank parameter for MiSS fine-tuning. Higher values yield better results but slow down training and require more VRAM. Generally, 32 or 64 is sufficient. |
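
A rough way to size epoch_steps: each training step processes about micro_bsz × ctx_len tokens per GPU, so one epoch covers approximately epoch_steps × micro_bsz × ctx_len × devices tokens. A minimal back-of-the-envelope check (the values are illustrative and the estimate is approximate):

# Approximate number of tokens covered by one training epoch (illustrative values)
epoch_steps=1200
micro_bsz=1
ctx_len=512
devices=1
echo $(( epoch_steps * micro_bsz * ctx_len * devices ))    # about 614400 tokens per epoch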

Adjust Other Training Parameters

The following table lists other training parameters that can be modified in the script and the effects of changing them.

| Parameter | Description |
| --- | --- |
| --vocab_size 65536 | Vocabulary size. Default is 65536. Set to 0 to let the model determine the vocabulary size automatically. |
| --data_type binidx | Training corpus file format. Supports utf-8, utf-16le, numpy, binidx, dummy, uint16, sft, jsonl. Recommended: jsonl or binidx. |
| --epoch_count 5 | Total number of training epochs. |
| --lr_init 2e-5 | Initial learning rate. 2e-5 is recommended for MiSS; it should not exceed 1e-4. |
| --lr_final 2e-5 | Final learning rate. Recommended to keep it the same as the initial learning rate. |
| --accelerator gpu | Accelerator type. Currently only gpu is practically supported; cpu training is generally not supported. |
| --devices 1 | Number of GPUs. Use 1 for a single card, or the actual count for multi-GPU setups. |
| --precision bf16 | Training precision. Keeping the default bf16 is recommended. Supports fp32, tf32, fp16, bf16. |
| --strategy deepspeed_stage_1 | Lightning training strategy. deepspeed_stage_1 is recommended for fine-tuning. If VRAM is too low, change 1 to 2. |
| --grad_cp 1 | Gradient checkpointing. 0: faster training but more VRAM; 1: slower training but saves VRAM. |
| --my_testing "x070" | RWKV model version to train. Use x070 for RWKV-7, x060 for RWKV-6, and x052 for RWKV-5 (deprecated, not recommended). |
| --peft miss | Fine-tuning type. Use miss for MiSS fine-tuning. |
| --op | Operator selection. Supports cuda, fla, triton. Defaults to cuda. |
| --wandb RWKV-PEFT-MiSS | Optional. Use wandb to log the training run. Requires a pre-configured wandb account. |
| --lr_schedule wsd | Optional. Learning rate scheduler. Default is cos_decay; supports cos_decay and wsd. |
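
For example, to log the run to wandb and switch to the wsd scheduler, append the optional flags to the end of the train.py command in miss.sh (the project name RWKV-PEFT-MiSS is just an example):

# Last lines of the train.py command in miss.sh, with the optional flags appended
--peft miss --peft_config $peft_config \
--wandb RWKV-PEFT-MiSS \
--lr_schedule wsd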

Appendix: Configuration Reference for miss.sh

load_model='/home/rwkv/models/rwkv7-g1a3-1.5b.pth'
proj_dir='/home/rwkv/roleplay_1107_g1a3_1b5_73k_miss'
data_file='/home/rwkv/roleplay_data/73k-cn-en-doubao.jsonl'

n_layer=24
n_embd=2048

micro_bsz=4
epoch_save=1
epoch_steps=1000
ctx_len=4096
peft_config='{"r":64}'

python train.py --load_model $load_model \
--proj_dir $proj_dir --data_file $data_file \
--vocab_size 65536 \
--data_type jsonl \
--n_layer $n_layer --n_embd $n_embd \
--ctx_len $ctx_len --micro_bsz $micro_bsz \
--epoch_steps $epoch_steps --epoch_count 4 --epoch_save $epoch_save \
--lr_init 1e-5 --lr_final 1e-5 \
--accelerator gpu --precision bf16 \
--devices 4 --strategy deepspeed_stage_1 --grad_cp 1 \
--my_testing "x070" \
--peft miss --peft_config $peft_config

Warning

After adjusting the parameters, remember to save the miss.sh file.

Start the Training

In the RWKV-PEFT directory, run the command sh scripts/miss.sh to start the MiSS fine-tuning.
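
If you also want to keep a copy of the console output in a file (in addition to the logs written to proj_dir), one simple option is to pipe the run through tee; the log file name here is arbitrary:

cd RWKV-PEFT
sh scripts/miss.sh 2>&1 | tee miss-train.log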

Once training has started successfully, the output should look like this:

(Screenshot: MiSS fine-tuning running)

How to Use MiSS Fine-Tuned Models

After training is complete, you should be able to find the complete MiSS weight file (.pth format) and the training log (.txt file) in the output folder:

(Screenshot: MiSS weight file and training log in the output folder)

The merged MiSS model can be used normally in Ai00. For detailed usage instructions, please refer to the Ai00 Tutorial.

(Screenshots: loading and using the MiSS model in Ai00)
