
Ollama Inference

Tips

Ollama is an easy-to-use open-source framework for running large language models locally. It supports one-click deployment of various open-source models on personal computers, with simple configuration and low resource consumption.

Thanks to the efforts of RWKV community member @MollySophia, Ollama now supports RWKV-6 models.

This chapter explains how to run inference with RWKV-6 models in Ollama.

Download and Installation of Ollama

You can download the Ollama installer from the Ollama official website.

After downloading, double-click the installer to install Ollama. Once installed, Ollama starts automatically, and its icon appears in the system tray.

ollama-icon

Run RWKV model

There are two ways to run a gguf-format RWKV model in Ollama:

  • Download from Ollama's RWKV-6 repository: simple to operate, but Ollama only provides the q4_k_m quantized version of each model
  • Custom RWKV model: requires manually downloading a gguf-format RWKV model and creating a Modelfile configuration file, but lets you run any quantized RWKV model you like
Download from Ollama repository

Ollama's RWKV-6 repository provides RWKV-6-World models with four parameter scales: 1.6B, 3B, 7B, and 14B.

ollama-rwkv-6-model-repo

Execute the command ollama run mollysama/rwkv-6-world:1.6b in your terminal to automatically download and run the q4_k_m quantized version of the RWKV-6-World 1.6B model.

ollama-run-rwkv-6-world-1.6b

Tips

You can replace 1.6b with 3b, 7b, or 14b to run RWKV-6-World models with other parameter scales.

Custom RWKV model

Download RWKV gguf model

To run a custom RWKV model, first download a gguf-format RWKV model from the RWKV-6 GGUF repository or the RWKV-7 GGUF repository.


⚠️ RWKV gguf models come in various quantized versions. Q5_1 or Q8_0 quantization precision is recommended; lower quantization precision (Q4_0 and below) may significantly degrade the model's responses.


Create Modelfile for the model

Create a text file named Modelfile (with no file extension) in the folder where the RWKV gguf model file is stored.

Modelfile

Then open this file with a text editor such as Notepad and write the following content:

FROM rwkv-6-world-1.6b-Q8_0.gguf

TEMPLATE """
{{- range .Messages }}
{{- if eq .Role "user" }}User: 
{{- else if eq .Role "assistant" }}Assistant:
{{- end }}{{ .Content }}

{{ end }}Assistant:"""

PARAMETER stop "\n\n"
PARAMETER stop "\nUser:"

Replace rwkv-6-world-1.6b-Q8_0.gguf after FROM on the first line with the filename of the RWKV model you downloaded.


⚠️ It is recommended to copy the above content into the Modelfile verbatim, ensuring there is a space after User: and none after Assistant:, and that the empty line above {{ end }}Assistant:""" is kept with no extra characters after it.
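The TEMPLATE block above turns the chat history into RWKV's plain User/Assistant prompt format. As a rough illustration of what the rendered prompt looks like, here is a hypothetical Python re-implementation of the template (a sketch for understanding, not Ollama's actual Go-template renderer):

```python
def render_prompt(messages):
    """Mimic the Modelfile TEMPLATE: 'User: ' before user turns (note the
    trailing space), 'Assistant:' before assistant turns (no space), each
    turn followed by a blank line, and a final 'Assistant:' generation cue."""
    parts = []
    for m in messages:
        if m["role"] == "user":
            parts.append("User: " + m["content"] + "\n\n")
        elif m["role"] == "assistant":
            parts.append("Assistant:" + m["content"] + "\n\n")
    return "".join(parts) + "Assistant:"

print(render_prompt([{"role": "user", "content": "Hello"}]))
# User: Hello
#
# Assistant:
```

The two PARAMETER stop lines then cut generation at a blank line or at the start of the next "User:" turn, which is how RWKV signals the end of a reply.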


Modelfile

Run custom RWKV model

Open the terminal in the RWKV gguf model folder and execute the ollama create command:

ollama create rwkv-6-world-1.6b-Q8_0 -f Modelfile

Change the model name after ollama create to the name you want for your local RWKV model; conventionally this matches the gguf filename in the Modelfile (without the .gguf extension).


ollama-create

After creation, use the ollama run command to directly run the model:

ollama run rwkv-6-world-1.6b-Q8_0

Once the model is running, you can chat with it:

ollama-chat
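Beyond the interactive terminal chat, a running Ollama instance also exposes a local REST API (by default on port 11434), so applications can query the model programmatically. Below is a minimal sketch using only the Python standard library; the model name assumes the ollama create example above:

```python
import json
import urllib.request

def build_chat_request(model, messages):
    """Build the JSON body for Ollama's /api/chat endpoint
    (stream=False returns a single complete response)."""
    return {"model": model, "messages": messages, "stream": False}

def chat(model, messages, host="http://localhost:11434"):
    """Send a chat request to a locally running Ollama server."""
    req = urllib.request.Request(
        host + "/api/chat",
        data=json.dumps(build_chat_request(model, messages)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

if __name__ == "__main__":
    try:
        print(chat("rwkv-6-world-1.6b-Q8_0",
                   [{"role": "user", "content": "Hello!"}]))
    except OSError:
        print("Ollama server not reachable; is Ollama running?")
```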

Ollama GUI and Desktop Programs

Ollama itself does not ship a GUI or WebUI, but its community offers third-party GUI and desktop programs.

You can view all third-party Ollama tools in the Ollama GitHub documentation.

References

  • Ollama official website
  • RWKV gguf model repository
  • Ollama GitHub documentation
Contributors: luoqiqi, manjuan