Fine-tuning

Tips

You are generally expected to know what you are doing if you are attempting to fine-tune RWKV with LoRA. If you are new to RWKV, you are advised to play with the base model first before attempting to fine-tune with LoRA.

In many cases, what most people want to achieve can be done by tuning their prompts, which is much easier than fine-tuning.

Why Fine-tune the RWKV Model?

Currently, all publicly released RWKV models are base models (also known as pre-trained models). These base models are trained on large-scale datasets in fields such as natural language processing, and they possess strong generalization ability and a rich reserve of knowledge.

However, to maintain generalization ability and universality, the RWKV base model is not optimized for a specific type of task. Therefore, the performance of the RWKV model on certain specific tasks may not be satisfactory.

Fine-tuning the RWKV model, simply put, means retraining the RWKV model using high-quality datasets from specific domains (such as law, literature, medicine, etc.) or tasks (such as material summarization, novel continuation, etc.). The fine-tuned RWKV model will exhibit higher-quality and more stable performance on the corresponding tasks.

Compared with training a brand-new model from scratch, fine-tuning only requires adjusting the parameters of the pre-trained model to achieve satisfactory results on the target task, and it takes fewer training cycles and less computing power.

In summary, fine-tuning lets us optimize the RWKV model's performance on specific tasks and thereby quickly build application scenarios and products based on the RWKV model.

What Do I Need to Prepare for Fine-tuning Training?

To fine-tune the RWKV model, you need to:

  • prepare a Linux system and have basic knowledge of Linux
  • prepare a high-performance NVIDIA graphics card (a quick VRAM check is sketched after this list)
  • configure a virtual environment and the software packages required for RWKV training on that Linux system
  • prepare a dataset for fine-tuning training (a minimal format example also follows the list)
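
Before installing anything, you can do a quick pre-flight check of the GPU. The sketch below assumes only that PyTorch with CUDA support is installed; the 8.8 GB threshold is an example value taken from the LoRA table further down (RWKV7-3B, bf16), so adjust it for the model size and method you plan to use.

```python
# Pre-flight check: is a CUDA GPU visible, and how much VRAM is free?
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA GPU detected -- RWKV-PEFT fine-tuning requires an NVIDIA GPU.")

device = torch.device("cuda:0")
free_bytes, total_bytes = torch.cuda.mem_get_info(device)
print(f"GPU:  {torch.cuda.get_device_name(device)}")
print(f"VRAM: {free_bytes / 1024**3:.1f} GB free / {total_bytes / 1024**3:.1f} GB total")

# Example threshold: LoRA fine-tuning of RWKV7-3B in bf16 needs roughly 8.8 GB
# (see the VRAM tables below). Adjust for your chosen model size and method.
REQUIRED_GB = 8.8
if free_bytes / 1024**3 < REQUIRED_GB:
    print(f"Less than {REQUIRED_GB} GB free: consider int8/nf4 quantization or a smaller model.")
```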

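As for the dataset, a common starting point is a jsonl file with one JSON object per line containing a "text" field, which the RWKV data-preparation tooling typically converts to the binidx format used for training. The sketch below is only an illustration (the file name and sample contents are made up); follow the Preparing The Training Datasets page for the authoritative format and conversion steps.

```python
# Minimal sketch: write fine-tuning samples as jsonl (one {"text": ...} object per line).
# The prompt/response layout here is illustrative only.
import json

samples = [
    {"text": "User: What is RWKV?\n\nAssistant: RWKV is an RNN architecture with transformer-level performance."},
    {"text": "User: Summarize the following paragraph: ...\n\nAssistant: ..."},
]

with open("finetune_data.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```
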
VRAM Requirements for RWKV-PEFT Fine-tuning Methods

Tips

Below are the VRAM requirements for various RWKV-PEFT fine-tuning methods with different training precisions. Our tests were conducted on an RTX 4090 GPU with 24GB of VRAM.

RWKV-7 Fine-tuning

State tuning

VRAM requirements for State tuning of RWKV-7 models:

| Model Size | bf16 | int8 quantization | nf4 quantization |
| --- | --- | --- | --- |
| RWKV7-0.1B | 2.6GB | 2.4GB | 2.5GB |
| RWKV7-0.4B | 3.1GB | 2.9GB | 2.8GB |
| RWKV7-1.5B | 5.3GB | 4.1GB | 3.7GB |
| RWKV7-3B | 8.2GB | 5.7GB | 4.7GB |

LoRA

VRAM requirements for LoRA fine-tuning of RWKV-7 models:

| Model Size | bf16 | int8 quantization | nf4 quantization |
| --- | --- | --- | --- |
| RWKV7-0.1B | 2.7GB | 2.5GB | 2.4GB |
| RWKV7-0.4B | 3.4GB | 2.9GB | 2.7GB |
| RWKV7-1.5B | 5.6GB | 4.6GB | 3.9GB |
| RWKV7-3B | 8.8GB | 6.7GB | 5.7GB |

DiSHA

VRAM requirements for DiSHA fine-tuning of RWKV-7 models:

| Model Size | bf16 | int8 quantization | nf4 quantization |
| --- | --- | --- | --- |
| RWKV7-0.1B | 2.7GB | 2.5GB | 2.4GB |
| RWKV7-0.4B | 3.1GB | 2.9GB | 2.7GB |
| RWKV7-1.5B | 5.6GB | 4.5GB | 3.9GB |
| RWKV7-3B | 8.8GB | 6.7GB | 5.7GB |

PiSSA

VRAM requirements for PiSSA fine-tuning of RWKV-7 models:

| Model Size | bf16 | int8 quantization | nf4 quantization |
| --- | --- | --- | --- |
| RWKV7-0.1B | 2.6GB | 2.5GB | 2.4GB |
| RWKV7-0.4B | 3.4GB | 3.0GB | 2.7GB |
| RWKV7-1.5B | 5.6GB | 4.6GB | 3.9GB |
| RWKV7-3B | 8.8GB | 6.7GB | 5.7GB |

RWKV-6 Fine-tuning

RWKV-6 models require slightly more VRAM for fine-tuning compared to RWKV-7. The following VRAM requirements are for reference:

| Model Size | Full Fine-tuning | DiSHA/LoRA/PiSSA | QLoRA/QPiSSA | State tuning |
| --- | --- | --- | --- | --- |
| RWKV6-1.6B | OOM | 7.4GB | 5.6GB | 6.4GB |
| RWKV6-3B | OOM | 12.1GB | 8.2GB | 9.4GB |
| RWKV6-7B | OOM | 23.7GB (batch size 8 causes OOM) | 14.9GB (batch size 8 requires 19.5GB) | 18.1GB |

Recommended fine-tuning repositories

  • RWKV-PEFT: the official implementation of parameter-efficient fine-tuning for RWKV models, supporting various advanced fine-tuning methods across multiple hardware platforms.
  • OpenMOSE/RWKV-LM-RLHF: a reinforcement learning toolkit for RWKV (v6, v7, ARWKV), supporting distillation, SFT, and RLHF (DPO, ORPO).

Read the tutorials for the different RWKV-PEFT fine-tuning methods:

  • State Tuning
    • PiSSA Fine-Tuning
  • DiSHA Fine-Tuning
  • LoRA Fine-Tuning