Ollama Inference

Tips

Ollama is an easy-to-use open-source framework for running large language models locally. It supports one-click deployment of a variety of open-source models on personal computers, with simple configuration and low resource consumption.

Thanks to the efforts of RWKV community member @MollySophia, Ollama now supports the RWKV-7 and RWKV-6 models.

This chapter introduces how to run inference with RWKV models in Ollama.

Download and Installation of Ollama

You can download the Ollama installer from the Ollama official website.

After downloading, double-click the installer to install it. Once installation finishes, Ollama starts automatically, and its icon appears in the system taskbar.

[Image: Ollama icon in the system taskbar]
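
On Linux, Ollama can also be installed from a terminal with the official install script, and the installation on any platform can be verified with the version command (a quick sketch; the script URL is the one published on the Ollama website):

curl -fsSL https://ollama.com/install.sh | sh
ollama --version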

Running RWKV Models Provided by Ollama

Tips

Ollama's RWKV repository provides RWKV7-G1 (recommended) and RWKV7-World models.

Caution

The RWKV-6 World model is outdated and no longer recommended.

RWKV7-G1 Model (Recommended)

Run the ollama run mollysama/rwkv-7-g1:2.9b command in the terminal, and Ollama will automatically download and run the RWKV7-G1 2.9B model. You can then have a conversation with the RWKV model in the terminal.

[Image: Running the RWKV7-G1 2.9B model with ollama run]

Tips

By default, Ollama's RWKV7-G1 models have thinking mode enabled; it can be toggled off and on at any time with the /set nothink and /set think commands, respectively.
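
For example, a quick way to compare both modes in a single session (a sketch; >>> is Ollama's interactive prompt, and the question is just an illustration):

ollama run mollysama/rwkv-7-g1:2.9b
>>> /set nothink
>>> Why is the sky blue?
>>> /set think
>>> Why is the sky blue?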

All available Ollama/RWKV7-G1 models:

  • mollysama/rwkv-7-g1:2.9b: Quantization: Q8_0
  • mollysama/rwkv-7-g1:2.9b-q6_k: Quantization: Q6_K
  • mollysama/rwkv-7-g1:2.9b-thinkdisabled: Thinking mode disabled, Quantization: Q8_0
  • mollysama/rwkv-7-g1:2.9b-thinkdisabled-q6_k: Thinking mode disabled, Quantization: Q6_K
  • mollysama/rwkv-7-g1:1.5b: Quantization: Q8_0
  • mollysama/rwkv-7-g1:1.5b-q6_k: Quantization: Q6_K
  • mollysama/rwkv-7-g1:1.5b-thinkdisabled: Thinking mode disabled, Quantization: Q8_0
  • mollysama/rwkv-7-g1:1.5b-thinkdisabled-q6_k: Thinking mode disabled, Quantization: Q6_K

If you have previously downloaded the mollysama/rwkv-7-g1:2.9b model, please run the ollama pull mollysama/rwkv-7-g1:2.9b command to pull the latest changes.
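
To try one of the other tags listed above, pull and run it by its full tag. For example, the Q6_K build of the 1.5B model:

ollama pull mollysama/rwkv-7-g1:1.5b-q6_k
ollama run mollysama/rwkv-7-g1:1.5b-q6_k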

RWKV-7-World Model

The RWKV7-G1 model is a comprehensive upgrade of the RWKV-7 World model, so the RWKV7-G1 model is the recommended choice.

Run the ollama run mollysama/rwkv-7-world:2.9b command in the terminal, and Ollama will automatically download and run the RWKV-7 World 2.9B model. You can then have a conversation with the model in the terminal, as shown in the figure below:

[Image: Running the RWKV-7 World 2.9B model with ollama run]

All available Ollama/RWKV-7 World models:

  • mollysama/rwkv-7-world:1.5b: Quantization: Q4_K_M
  • mollysama/rwkv-7-world:2.9b: Quantization: Q4_K_M

Tips

If you have previously downloaded the mollysama/rwkv-7-world:2.9b model, please run the ollama pull mollysama/rwkv-7-world:2.9b command to pull the latest changes.

Running a Custom RWKV Model

To run a custom RWKV model, you need a model file in .gguf format and a Modelfile for configuring the chat template and decoding parameters. Then, use the ollama create command to create a custom Ollama model.

After creation is complete, you can use the ollama run command to run the custom model.
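
In outline, the whole workflow looks like this (my-rwkv is a placeholder model name; the detailed steps follow below):

ollama create my-rwkv -f Modelfile
ollama run my-rwkv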

1. Download the RWKV gguf Model

You can download RWKV models in gguf format from the RWKV GGUF Collection.
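
If you prefer the command line, the file can also be fetched with the Hugging Face CLI (a sketch; replace the repository ID and filename placeholders with the ones shown on the collection page):

huggingface-cli download <repo-id> <model-file>.gguf --local-dir .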

Warning

RWKV gguf models come in various quantized versions. FP16 or Q8_0 quantization is recommended; lower-bit quantizations (such as Q5_K_M or Q4_K_M) may degrade the quality of the model's responses.

Tips

Fine-tuned an RWKV-7 model yourself and want to convert it from pth to gguf format? Check the llama.cpp docs - Convert pth model to gguf.
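
The exact steps are in the linked docs; if the checkpoint has already been converted to a Hugging Face-format model directory, the conversion typically boils down to one llama.cpp command (paths and the output filename here are illustrative):

python convert_hf_to_gguf.py ./my-rwkv7-model --outfile my-rwkv7-model-Q8_0.gguf --outtype q8_0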


2. Create the Model's Modelfile

In the folder where the RWKV gguf model file is stored, create a text file named Modelfile, with no file extension.

[Image: Modelfile in the model folder]

Open the Modelfile with a text editor such as Notepad, then fill in the content below according to whether the model supports thinking mode.

For RWKV Models that Support Thinking

For RWKV G1 series models that support thinking, please write the following content into the Modelfile:

FROM rwkv7-g1-2.9b-20250519-ctx4096-Q8_0.gguf

TEMPLATE """{{- if .System }}System: {{ .System }}{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1}}
{{- if eq .Role "user" }}
{{- if eq $i 0}}User: {{ .Content }}{{- else }}

User: {{ .Content }}{{- end }}
{{- else if eq .Role "assistant" }}

Assistant: <{{- if and $last .Thinking -}}think>{{ .Thinking }}</think>{{- else }}think>
</think>{{- end }}{{ .Content }}{{- end }}
{{- if and $last (ne .Role "assistant") }}

Assistant:{{- if $.IsThinkSet }} <{{- if not $.Think }}think>
</think>{{- end }}{{- end }}{{- end }}{{- end }}"""

PARAMETER stop """

"""
PARAMETER stop """
User"""

PARAMETER stop "User"
PARAMETER stop "Assistant"

PARAMETER temperature 1
PARAMETER top_p 0.5
PARAMETER repeat_penalty 1.2

For RWKV Models that Do Not Support Thinking

For RWKV-World and other models that do not support thinking, please write the following content into the Modelfile:

FROM rwkv7-g1-1.5b-20250429-ctx4096-Q8_0.gguf

TEMPLATE """{{- if .System }}System: {{ .System }}{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1}}
{{- if eq .Role "user" }}
{{- if eq $i 0}}User: {{ .Content }}{{- else }}

User: {{ .Content }}{{- end }}
{{- else if eq .Role "assistant" }}

Assistant:{{ .Content }}{{- end }}
{{- if and $last (ne .Role "assistant") }}

Assistant:{{- end -}}{{- end }}"""

PARAMETER stop """

"""
PARAMETER stop """
User"""

PARAMETER temperature 1
PARAMETER top_p 0.5
PARAMETER repeat_penalty 1.2

In the first line, change the gguf filename after FROM to the filename of your local RWKV model.

Decoding parameters such as PARAMETER temperature 1 and PARAMETER top_p 0.5 can be adjusted as needed.
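
These values in the Modelfile are only defaults. In an interactive ollama run session, they can also be changed on the fly with Ollama's built-in /set parameter command, for example:

>>> /set parameter temperature 0.7
>>> /set parameter top_p 0.3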

3. Create and Run the Custom RWKV Model

In the directory where the RWKV gguf model and Modelfile are located, open a terminal and execute the ollama create command:

ollama create rwkv-xxx -f Modelfile

Tips

Change the model name after ollama create to your local RWKV model's name (it should match the model filename in the Modelfile, without the .gguf suffix).

After creation is complete, use the ollama run command to run the model directly:

ollama run rwkv-xxx

Once it runs successfully, you can start a chat conversation with the model.
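
The created model can also be called programmatically through Ollama's local HTTP API, which listens on port 11434 by default (a minimal sketch; rwkv-xxx is the placeholder name from the create step):

curl http://localhost:11434/api/chat -d '{
  "model": "rwkv-xxx",
  "messages": [{"role": "user", "content": "What is RWKV?"}],
  "stream": false
}'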

Stopping Ollama

Use the ollama stop mollysama/rwkv-7-g1:2.9b command to stop the current model instance; this also resets the conversation context.

Otherwise, Ollama keeps the current session's context (the message history) in memory and uses it as a reference for subsequent conversations.
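
To check which models are currently loaded before stopping one, use ollama ps:

ollama ps
ollama stop mollysama/rwkv-7-g1:2.9b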

Ollama GUIs and Desktop Applications

Ollama itself does not provide a GUI or WebUI service, but its community offers third-party GUIs and desktop applications.

You can view all third-party Ollama tools in Ollama's GitHub documentation.

References

  • Ollama official website
  • RWKV gguf model repository
  • Ollama GitHub documentation