  • RWKV Language Model
  • Getting Started
    • How to Experience RWKV
    • RWKV Decoding Parameters
    • Integrate with your application
    • Frequently Asked Questions
  • RWKV Prompting
    • Prompting Format Guidelines
    • Chat Prompt Examples
    • Completion Prompt Examples
  • Advanced
    • Fine-tuning
    • Preparing The Training Datasets
    • RWKV Training Environment
    • RWKV Architecture History
    • RWKV pip Usage Guide
  • Inference Tutorials
    • llama.cpp Inference
    • Ollama Inference
    • Silly Tavern Inference
    • Text Generation WebUI Inference
    • KoboldCpp Inference
    • Ai00 Inference
  • Fine Tune Tutorials
    • State Tuning Tutorial
    • LoRA Fine-Tuning Tutorial
    • PiSSA Fine-Tuning Tutorial
    • DiSHA Fine-Tuning Tutorial
    • FAQ about Fine-Tuning
  • Community
    • Code Of Conduct
    • Contributing to RWKV
    • Various RWKV related links

RWKV raven avatar

RWKV Language Model

RWKV (pronounced RWaKuV) is an RNN with GPT-level large language model (LLM) performance that can be trained directly like a GPT Transformer (parallelizable).

RWKV combines the best features of RNN and Transformer: excellent performance, constant memory usage, constant inference generation speed, "infinite" context length, and free sentence embeddings. It is also 100% free of self-attention mechanisms.
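Because inference is recurrent, the per-token cost does not grow with context length: each new token only updates a fixed-size state, whereas a Transformer must keep (and attend over) a key/value cache entry for every previous token. The sketch below is purely illustrative (hypothetical dimensions and update rule, not RWKV's actual kernels or the RWKV-7 state equations), contrasting the two memory patterns:

```python
# Illustrative only: contrasts a fixed-size recurrent state with a growing KV cache.
# The update rule and dimensions are hypothetical, not RWKV's real implementation.
import numpy as np

D = 64
state = np.zeros((D, D))   # RNN-style state: size is constant, independent of context length
kv_cache = []              # Transformer-style cache: grows by one (key, value) pair per token

def rnn_step(state, x, decay=0.99):
    """Constant work and memory per token: fold the new token into the fixed-size state."""
    k, v = x, x                         # stand-ins for learned key/value projections
    return state * decay + np.outer(v, k)

def transformer_step(kv_cache, x):
    """Work and memory grow with sequence length: attend over every cached token."""
    kv_cache.append((x, x))             # cache key and value for this token
    scores = np.array([k @ x for k, _ in kv_cache])   # O(T) work at step T
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return sum(w * v for w, (_, v) in zip(weights, kv_cache))

for t in range(256):
    x = np.random.randn(D)
    state = rnn_step(state, x)          # memory stays D x D forever
    _ = transformer_step(kv_cache, x)   # memory now holds t + 1 cached pairs
```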

The RWKV project was initially proposed by Bo Peng (BlinkDL), and as the project gained attention, it gradually developed into an open-source community.

On September 20, 2023, the RWKV open-source project officially joined the Linux Foundation. Today, the RWKV project is an open-source non-profit organization under the Linux Foundation, with some computing power previously supported by sponsors.

  • Discord Forum
  • HF Gradio-1 | RWKV-7-World-2.9B-v3
  • HF Gradio-2 | RWKV-7-G1-0.4B

RWKV Architecture and Papers

RWKV-7 (Goose) is the latest version of the RWKV architecture. The paper, co-authored by Bo Peng and the RWKV community, was published on March 18, 2025.

  • RWKV-7 Paper: "RWKV-7 Goose with Expressive Dynamic State Evolution"
  • Paper Link: arXiv:2503.14456

RWKV-7 adopts Dynamic State Evolution, going beyond the fundamental TC0 expressivity limitation of the attention/linear-attention paradigm.
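As a rough intuition only (symbols simplified; see the paper for the exact formulation), RWKV-7's state update can be read as a generalization of the classic delta rule, in which the matrix-valued state is selectively overwritten along the direction of the current key rather than merely decayed:

```latex
% Schematic delta-rule update that RWKV-7 generalizes (not the exact RWKV-7 equations).
% S_t is the matrix-valued state, k_t / v_t the current key and value, beta_t a learning rate.
% RWKV-7 additionally makes the decay and the in-context learning rate dynamic and data-dependent.
S_t = S_{t-1}\left(I - \beta_t\, k_t k_t^{\top}\right) + \beta_t\, v_t k_t^{\top}
```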

RWKV-7 architecture diagram

The RWKV-5/6 (Eagle/Finch) architectures introduce several improvements over the RWKV-4 architecture, and the two architectures were therefore published in a single paper.

  • RWKV 5/6 Paper: "Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence"
  • Paper Link: arXiv:2404.05892

RWKV-4 is the first official version of the RWKV model. The paper was co-authored by Bo Peng and the RWKV community and was first published on May 22, 2023. In October of the same year, the RWKV-4 architecture paper was accepted by EMNLP 2023.

  • RWKV-4 Paper: "RWKV: Reinventing RNNs for the Transformer Era"
  • Paper Link: arXiv:2305.13048

RWKV Model Version Status

RWKV has released open-source models of various parameter scales for each architecture version.

| Version | RWKV-v4 | RWKV-v5-Eagle | RWKV-v6-Finch | RWKV-v7-Goose | RWKV-v7-G1 |
|---|---|---|---|---|---|
| Paper | Published | Published | Published | Published | Coming Soon |
| Overall Status | EOL | EOL | Stable | Stable | In Progress |
| 0.4B Model | Released | Released | No Plan | Released | Released |
| 1.5B Model | Released | Released | Released | Released | 📅 Planned |
| 3B Model | Released | Released | Released | Released | 📅 Planned |
| 7B Model | Released | Released | Released | 📅 Planned | 📅 Planned |
| 14B Model | Released | No Plan | Released | 📅 Planned | 📅 Planned |

Which RWKV Models Should I Use?

Please use the RWKV-7 series models: they are based on the latest RWKV-7 architecture and trained on the newest datasets, and therefore offer better performance.

Since RWKV-7 models with 7B or more parameters are still in training, please use the RWKV-6-World-14B-V2.1 model for now; if your hardware cannot run the 14B model, consider the RWKV-6-World-7B-V3 model instead.

Tips

RWKV-7-World 7B/14B will replace the existing RWKV-6-World 7B/14B models once training is complete. Earlier RWKV versions have come to the end of their lifecycle, and existing models are only for archival purposes.
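For a quick local test of whichever model you pick, the sketch below uses the official `rwkv` pip package (covered in detail in the RWKV pip Usage Guide). The model path, strategy string, and environment flags are assumptions that may need adjusting for your package version and hardware:

```python
# Minimal sketch using the `rwkv` pip package (pip install rwkv). The model path,
# strategy, and environment flags are assumptions; adjust them for your setup.
import os
os.environ["RWKV_V7_ON"] = "1"   # needed for RWKV-7 models in recent package versions
os.environ["RWKV_JIT_ON"] = "1"

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# Point this at the checkpoint you downloaded (path given without the .pth extension).
model = RWKV(model="RWKV-x070-World-1.5B-v3-20250127-ctx4096", strategy="cuda fp16")
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")   # vocabulary used by the World models

prompt = "User: What is the RWKV architecture?\n\nAssistant:"
output = pipeline.generate(
    prompt,
    token_count=200,
    args=PIPELINE_ARGS(temperature=1.0, top_p=0.3),
)
print(output)
```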

Differences Between RWKV and Transformer

  • Advantages

    • Lower resource usage during runtime and training (VRAM, CPU, GPU, etc.).
    • 10x to 100x lower computational requirements than Transformers at larger context lengths.
    • Supports linear scaling to any context length (Transformers scale quadratically).
    • Performs as well as Transformer architectures in terms of answer quality and generalization ability.
    • RWKV models' training data includes languages other than English (e.g., Chinese, Japanese, etc.), offering better multilingual capabilities than most existing open-source models.
  • Disadvantages

    • RWKV base models are very sensitive to prompt formatting; the prompt format significantly affects generation results.
    • Due to its architectural design, RWKV is weaker at tasks that require looking back at earlier context, so prompts need to be ordered appropriately: give the model the task instructions first, then the material text needed to perform the task (see the example below).
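For example, using the Instruction/Input/Response format described in the prompting format guidelines, the instruction comes before the material (the material text here is a placeholder):

```
Instruction: Summarize the text in the Input section in one sentence.

Input: <material text to be summarized>

Response:
```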

Basic Terminology of the RWKV Community

| Concept | Description |
|---|---|
| RWKV | The model architecture itself; training code can be found here. |
| ChatRWKV | The official RWKV chatbot (similar to ChatGPT, but based on RWKV); code can be found here. |
| RWKV-4/5/6/7 | Different architecture versions of RWKV. Using the latest RWKV-7 series models is recommended. |
| RWKV World | The base RWKV model trained on global languages, covering a broader and more diverse dataset, including training data in over 100 languages and some instruction training. |
| Raven | The official fine-tuned version of the RWKV-4 base model, including instruction training. Since the RWKV-4 series is no longer updated, it is not recommended for continued use. |
| RWKV ABC/MIDI | RWKV music models based on the ABC/MIDI formats. |
| RWKV CHNtuned / one-state-chat / role_play / novel ... | Fine-tuned models provided by the RWKV community, optimized for specific tasks or data types. Please prioritize RWKV-7 series fine-tuned models. |

RWKV Model Naming Rules

RWKV models typically have two naming conventions:

  • RWKV-6-World-3B-v2.1-20240208-ctx4096.pth
  • RWKV-x070-World-1.5B-v3-20250127-ctx4096.pth

The meaning of each field in the model name:

| Field | Meaning |
|---|---|
| RWKV | Model name |
| 6 / x070 | RWKV model architecture version; RWKV-7 models are recommended |
| World | Model type; "World" indicates RWKV models trained on global languages, thus supporting multilingual tasks |
| 3B / 1.5B | Model parameter scale; "B" stands for "billions" |
| v2.1 / v3 | Version of the training dataset; v2.1 ≈ 1.1T tokens, v3 ≈ 2.5T tokens |
| 20240208 / 20250127 | Model release date |
| ctx4096 | Pre-trained context length |
| .pth | RWKV model file format; .gguf and .safetensors formats are also supported |
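As a quick sanity check, the naming convention can be parsed mechanically. The helper below is hypothetical (not official tooling) and simply splits a filename into the fields described above:

```python
# Hypothetical helper that splits an RWKV model filename into its fields.
# This illustrates the naming convention above; it is not official tooling.
import re

NAME_RE = re.compile(
    r"RWKV-(?P<arch>x?\d+)-(?P<type>\w+)-(?P<size>[\d.]+B)"
    r"-(?P<data>v[\d.]+)-(?P<date>\d{8})-ctx(?P<ctx>\d+)\.(?P<ext>\w+)"
)

def parse_rwkv_name(filename: str) -> dict:
    match = NAME_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a recognized RWKV model name: {filename}")
    return match.groupdict()

print(parse_rwkv_name("RWKV-x070-World-1.5B-v3-20250127-ctx4096.pth"))
# {'arch': 'x070', 'type': 'World', 'size': '1.5B', 'data': 'v3',
#  'date': '20250127', 'ctx': '4096', 'ext': 'pth'}
```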

Who sponsors the compute for RWKV?

RWKV is made possible, as an open-source project, thanks to the large amount of GPU compute and researcher time contributed by our sponsors.

Without their invaluable support, we would not have been able to develop the core RWKV foundation models that you see today.


In addition, we would like to thank

  • alpin @ pygmalionAI
  • AutoMeta @ AlignmentLab
  • FeatherlessAI
  • Various other folks who donated slices of GPU time / preferred not to be named

for helping with GPU time on smaller experiments, fine-tunes, and various models, especially those from failed runs that never get publicly released.
