RWKV-Performance-Data

NVIDIA

Info

The data on this page comes from the RWKV-Inference-Performance-Test repository. We welcome everyone to follow that repository's guidelines to run tests and submit performance data for NVIDIA hardware.
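The Tokens/s figures throughout this page are generated tokens per second of wall-clock time. A minimal sketch of such a measurement, where `measure_tokens_per_second` and the `generate` callable are hypothetical placeholders for whichever inference tool is being benchmarked:

```python
import time

def measure_tokens_per_second(generate, n_tokens: int) -> float:
    """Time one generation run and return tokens per wall-clock second."""
    start = time.perf_counter()
    generate(n_tokens)                 # produce n_tokens with the tool under test
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Dummy generator that "emits" one token per millisecond, for illustration:
tps = measure_tokens_per_second(lambda n: time.sleep(0.001 * n), 100)
print(f"{tps:.1f} tokens/s")
```

Real runs usually exclude prompt prefill from the timed span and average over several runs, so treat this as the shape of the calculation rather than the exact benchmark script.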

NVIDIA RTX 4090

Performance of RWKV models on NVIDIA RTX 4090:

| Inference Tool | Model | Precision | Tokens/s | VRAM Usage |
| --- | --- | --- | --- | --- |
| RWKV pip | RWKV7-G1 2.9B | fp16 | 56.18 | 5.52 GB |
| llama.cpp (CUDA) | RWKV7-G1 2.9B | fp16 | 89.16 | 5.75 GB |
| llama.cpp (CUDA) | RWKV7-G1 2.9B | Q8_0 | 110.3 | 3.47 GB |
| web-rwkv | RWKV7-G1 2.9B | fp16 | 95.98 | 5.9 GB |
| web-rwkv | RWKV7-G1 2.9B | int8 | 108.22 | 3.9 GB |
| web-rwkv | RWKV7-G1 2.9B | nf4 | 115.46 | 2.4 GB |

Data Source: issue #3

Test Environment:

  • CPU: Intel(R) Xeon(R) Platinum 8331C
  • OS: Ubuntu 22.04, Linux 6.8.0-60-generic x86_64, glibc 2.35
  • Python version: 3.10.16
  • PyTorch version: 2.5.1+cu121
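As a rough sanity check on the VRAM columns, weight memory alone is about parameter count × bits per weight / 8; the measured figures sit above this floor because of activations, recurrent state, and runtime overhead. A back-of-the-envelope sketch (illustrative only, not part of the benchmark suite):

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (decimal, 1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# RWKV7-G1 2.9B, weights only:
print(weight_memory_gb(2.9e9, 16))  # 5.8 -> close to the ~5.9 GB measured for fp16
print(weight_memory_gb(2.9e9, 4))   # 1.45 -> the nf4 floor; measured usage is ~2.4 GB
```

The gap between the estimate and the measured number is the per-tool overhead, which is why different tools report different VRAM for the same model and precision.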

NVIDIA RTX 4060 Ti 8GB

Performance of RWKV models on NVIDIA RTX 4060 Ti 8GB:

| Inference Tool | Model | Precision | Tokens/s | VRAM Usage |
| --- | --- | --- | --- | --- |
| RWKV pip | RWKV7-G1 2.9B | fp16 | 36.61 | 5.52 GB |
| web-rwkv | RWKV7-G1 2.9B | fp16 | 43.92 | 5.9 GB |
| web-rwkv | RWKV7-G1 2.9B | int8 | 62.93 | 3.9 GB |
| web-rwkv | RWKV7-G1 2.9B | nf4 | 86.03 | 2.4 GB |

Data Source: issue #1

Test Environment:

  • CPU: Intel i7-13700F
  • OS version: Windows 10 Professional
  • driver version: 576.02
  • CUDA version: 12.9

NVIDIA RTX 4060 Laptop

| Inference Tool | Model | Precision | Tokens/s | VRAM Usage |
| --- | --- | --- | --- | --- |
| web-rwkv | RWKV7-G0 7.2B | nf4 | 40.30 | 5.1 GB |
| web-rwkv | RWKV7-G1 2.9B | fp16 | 40.98 | 5.9 GB |
| web-rwkv | RWKV7-G1 2.9B | int8 | 60.21 | 3.9 GB |
| web-rwkv | RWKV7-G1 2.9B | nf4 | 77.26 | 2.4 GB |

Data Source: issue #15

Test Environment:

  • CPU: AMD Ryzen 7 8845H (16) @ 5.61 GHz
  • OS version: Arch Linux x86_64, kernel 6.15.7-arch1-1
  • driver version: 575.64.05

AMD

Info

Inference performance of RWKV models on AMD hardware, including various professional GPUs, consumer GPUs, and even integrated graphics.

AMD RX 7900 XTX

| Inference Tool | Model | Precision | Tokens/s | VRAM Usage |
| --- | --- | --- | --- | --- |
| llama.cpp (Vulkan) | RWKV7-G1 2.9B | fp16 | 41.55 | 5.75 GB |
| llama.cpp (Vulkan) | RWKV7-G1 2.9B | Q8_0 | 42.85 | 3.47 GB |
| web-rwkv | RWKV7-G1 2.9B | fp16 | 106.00 | 5.9 GB |
| web-rwkv | RWKV7-G1 2.9B | int8 | 137.36 | 3.9 GB |
| web-rwkv | RWKV7-G1 2.9B | nf4 | 151.64 | 2.4 GB |

Data Source: issue #5 | issue #6

Test Environment:

  • CPU: AMD Ryzen 9 5950X
  • OS version: Ubuntu 25.04, 6.14.0-23-generic
  • Driver info: radv Mesa 25.0.3-1ubuntu2

AMD Radeon PRO W7900

| Inference Tool | Model | Precision | Tokens/s | VRAM Usage |
| --- | --- | --- | --- | --- |
| RWKV pip | RWKV7-G1 2.9B | fp16 | 45.28 | 5.52 GB |
| llama.cpp (ROCm) | RWKV7-G1 2.9B | fp16 | 48.71 | 5.75 GB |
| llama.cpp (ROCm) | RWKV7-G1 2.9B | Q8_0 | 58.59 | 3.47 GB |
| llama.cpp (Vulkan) | RWKV7-G1 2.9B | fp16 | 39.49 | 5.75 GB |
| llama.cpp (Vulkan) | RWKV7-G1 2.9B | Q8_0 | 45.21 | 3.47 GB |
| web-rwkv | RWKV7-G1 2.9B | fp16 | 61.62 | 5.9 GB |
| web-rwkv | RWKV7-G1 2.9B | int8 | 79.46 | 3.9 GB |
| web-rwkv | RWKV7-G1 2.9B | nf4 | 89.76 | 2.4 GB |

Data Source: issue #9 | issue #13 | issue #14

Test Environment:

  • CPU: Intel i3-12100
  • OS version: Ubuntu 24.04.2 LTS 6.11.0-26-generic

AMD Radeon Pro VII (Instinct MI50)

| Inference Tool | Model | Precision | Tokens/s | VRAM Usage |
| --- | --- | --- | --- | --- |
| web-rwkv | RWKV7-G1 2.9B | fp16 | 59.83 | 5.9 GB |
| web-rwkv | RWKV7-G1 2.9B | int8 | 72.70 | 3.9 GB |
| web-rwkv | RWKV7-G1 2.9B | nf4 | 65.99 | 2.4 GB |

Data Source: issue #10

Test Environment:

  • CPU: AMD Ryzen 9 5900X
  • OS version: Windows 11 24H2
  • AMD Software: PRO Edition 25.5.1 (Vulkan)

AMD Ryzen AI Max+ 395 [CPU]

| Inference Tool | Model | Precision | Tokens/s | RAM Usage |
| --- | --- | --- | --- | --- |
| llama.cpp (CPU) | RWKV7-G1 2.9B | fp16 | 14.10 | to be tested |
| llama.cpp (CPU) | RWKV7-G1 2.9B | Q8_0 | 22.42 | to be tested |

Data Source: issue #18

Test Environment:

  • CPU: AMD Ryzen AI Max+ 395
  • OS version: Ubuntu 24.04.2 LTS, Linux 6.14.0-24-generic x86_64, glibc 2.39
  • Driver info: Mesa 24.2.8-1ubuntu1~24.04.1

Radeon 8060S [Integrated]

| Inference Tool | Model | Precision | Tokens/s | VRAM Usage |
| --- | --- | --- | --- | --- |
| RWKV pip | RWKV7-G0 7.2B | fp16 | 9.49 | 13.47 GB |
| web-rwkv | RWKV7-G0 7.2B | fp16 | 10.16 | 13.25 GB |
| web-rwkv | RWKV7-G0 7.2B | int8 | 14.71 | 7.82 GB |
| web-rwkv | RWKV7-G0 7.2B | nf4 | 26.09 | 4.85 GB |
| RWKV pip | RWKV7-G1 2.9B | fp16 | 17.57 | 5.52 GB |
| llama.cpp (ROCm) | RWKV7-G1 2.9B | fp16 | 27.38 | 5.75 GB |
| llama.cpp (ROCm) | RWKV7-G1 2.9B | Q8_0 | 43.10 | 3.47 GB |
| web-rwkv | RWKV7-G1 2.9B | fp16 | 31.29 | 5.9 GB |
| web-rwkv | RWKV7-G1 2.9B | int8 | 51.56 | 3.9 GB |
| web-rwkv | RWKV7-G1 2.9B | nf4 | 77.71 | 2.4 GB |

Data Source: issue #16 | issue #17 | issue #18

Test Environment:

  • CPU: AMD Ryzen AI Max+ 395
  • OS version: Ubuntu 24.04.2 LTS, Linux 6.14.0-24-generic x86_64, glibc 2.39
  • Driver info: Mesa 24.2.8-1ubuntu1~24.04.1

AMD Radeon 780M [Integrated]

| Inference Tool | Model | Precision | Tokens/s | VRAM Usage |
| --- | --- | --- | --- | --- |
| web-rwkv | RWKV7-G0 7.2B | fp16 | 5.80 | 13.26 GB |
| web-rwkv | RWKV7-G0 7.2B | int8 | 10.26 | 7.8 GB |
| web-rwkv | RWKV7-G0 7.2B | nf4 | 15.76 | 4.9 GB |
| web-rwkv | RWKV7-G1 2.9B | fp16 | 13.61 | 5.9 GB |
| web-rwkv | RWKV7-G1 2.9B | int8 | 23.65 | 3.9 GB |
| web-rwkv | RWKV7-G1 2.9B | nf4 | 32.22 | 2.4 GB |

Data Source: issue #11 | issue #12 | issue #15

Test Environment:

  • CPU: AMD Ryzen 7 8845H (16) @ 5.61 GHz
  • OS version: Arch Linux x86_64, kernel 6.15.7-arch1-1
  • Driver info: Mesa 25.1.6-arch1.1

AMD Radeon 610M [Integrated]

| Inference Tool | Model | Precision | Tokens/s | VRAM Usage |
| --- | --- | --- | --- | --- |
| llama.cpp (Vulkan) | RWKV7-G1 2.9B | fp16 | 6.12 | 5.75 GB |
| llama.cpp (Vulkan) | RWKV7-G1 2.9B | Q8_0 | 7.54 | 3.47 GB |
| web-rwkv | RWKV7-G1 2.9B | fp16 | 8.49 | 5.9 GB |
| web-rwkv | RWKV7-G1 2.9B | int8 | 11.96 | 3.9 GB |
| web-rwkv | RWKV7-G1 2.9B | nf4 | 8.03 | 2.4 GB |

Data Source: issue #19 | issue #20

Test Environment:

  • CPU: AMD Ryzen 9 9955HX 16-Core Processor
  • OS version: Ubuntu 25.04 Kernel 6.14.0-15-generic
  • Driver info: Mesa 25.0.7-0ubuntu0.25.04.1

Mobile Chips

Info

Inference performance of RWKV models on mobile chips, including Qualcomm and MediaTek SoCs as well as various embedded/edge computing devices.

Qualcomm Snapdragon 8 Gen 3

Performance on Snapdragon 8 Gen 3 (Xiaomi 14):

| Model | Precision | Tokens/s |
| --- | --- | --- |
| RWKV7-G1 2.9B | A16W4 | 31.3 |
| RWKV7-G1 2.9B | A16W8 | 18.7 |

Qualcomm Snapdragon 8 Elite

Performance on Qualcomm Snapdragon 8 Elite (Xiaomi 15):

| Model | Precision | Tokens/s |
| --- | --- | --- |
| RWKV7-G1 2.9B | A16W4 | 30.26 |
| RWKV7-G1 2.9B | A16W8 | 19.34 |

Info

Explanation of parameters in the table:

  • Precision: represents the quantization strategy or computational precision.
  • A16: activations are quantized to 16-bit integers (int16).
  • W8/W4: weights are quantized to 8-bit/4-bit (per-channel linear quantization).
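For illustration, per-channel linear (symmetric) weight quantization can be sketched in a few lines of plain Python. This is a toy version and the helper names are made up; real mobile runtimes pack the integers and fuse dequantization into the matmul kernels:

```python
def quantize_per_channel(weights, n_bits=8):
    """Symmetric per-channel linear quantization: each row (output channel)
    gets its own scale, so channels with small weights keep their precision."""
    qmax = 2 ** (n_bits - 1) - 1                       # 127 for 8-bit, 7 for 4-bit
    q_rows, scales = [], []
    for row in weights:                                # one channel per row
        scale = max(abs(w) for w in row) / qmax or 1.0  # avoid a zero scale
        q_rows.append([round(w / scale) for w in row])
        scales.append(scale)
    return q_rows, scales

def dequantize(q_rows, scales):
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

W = [[0.5, -1.0, 0.25],     # a "large" channel
     [0.02, -0.01, 0.03]]   # a "small" channel keeps its own, finer scale
W_hat = dequantize(*quantize_per_channel(W, n_bits=8))
# Each reconstructed weight is within half a quantization step of the original.
```

With a single shared scale, the small channel above would collapse to zeros at 4-bit; per-channel scales are what make W4 usable in practice.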

Rockchip RK3588

| Inference Tool | Model | Precision | Tokens/s | Memory Usage |
| --- | --- | --- | --- | --- |
| llama.cpp (BLAS) | RWKV7-G1 2.9B | fp16 | 3.62 | ~6.5 GB system RAM |
| llama.cpp (BLAS) | RWKV7-G1 2.9B | Q8_0 | 5.67 | ~3.9 GB system RAM |
| RKNN-LLM (NPU) | RWKV7-G1 2.9B | fp16 | 4.04 | 5.49 GB |
| RKNN-LLM (NPU) | RWKV7-G1 2.9B | W8A8 | 6.58 | 2.80 GB |

Data Source: issue #7

Test Environment:

  • CPU: Rockchip RK3588
  • OS version: Armbian 25.5.2 noble on Radxa ROCK 5B

Moore Threads Hardware

Info

Inference performance of RWKV models on Moore Threads hardware. Currently this covers the MTT S4000; performance data for other Moore Threads hardware will be added later.

Moore Threads MTT S4000

Performance of RWKV models on the Moore Threads MTT S4000:

| Model | Precision | Tokens/s | VRAM Usage (GB) |
| --- | --- | --- | --- |
| RWKV-6-1B6-v2.1 | fp16 | 57.31 | 3.42 |
| RWKV-6-1B6-v2.1 | fp32 | 30.45 | 6.30 |
| RWKV-6-3B-v2.1 | fp16 | 36.09 | 6.27 |
| RWKV-6-3B-v2.1 | fp32 | 30.03 | 11.99 |
| RWKV-6-7B-v2.1 | fp16 | 30.39 | 14.43 |
| RWKV-6-7B-v2.1 | fp32 | 16.62 | 28.71 |
| RWKV-6-14B-v2.1 | fp16 | 16.19 | 26.57 |

Explanation of parameters in the table:

  • Model: Represents RWKV-6 models with different parameter counts.
  • Precision: Represents different quantization strategies or computational precisions.

Info

The performance data is based on the project: https://github.com/yuunnn-w/RWKV_Pytorch

Contributors: luoqiqi, manjuan