Ollama Inference
Tips
Ollama is a simple, easy-to-use open-source framework for running large language models locally. It supports one-click deployment of various open-source models on personal computers, with straightforward configuration and low resource consumption.
Thanks to the efforts of RWKV community member @MollySophia, Ollama now supports the RWKV-6 model.
This chapter introduces how to run inference with the RWKV-6 model in Ollama.
Download and Installation of Ollama
You can download the Ollama installer from the Ollama official website.
After downloading, double-click the exe file to install. Once installed, Ollama starts automatically, and you can see the Ollama icon in the system taskbar.
Run RWKV model
There are two ways to run a gguf-format RWKV model in Ollama:
- Download from Ollama's RWKV-6 repository: simple to operate, but Ollama only provides the q4_k_m quantized version of each RWKV model
- Custom RWKV model: requires manually downloading a gguf-format RWKV model and creating a Modelfile configuration file, but lets you run any quantized RWKV model
Ollama's RWKV-6 repository provides RWKV-6-World models at four parameter scales: 1.6B, 3B, 7B, and 14B.
Execute the command ollama run mollysama/rwkv-6-world:1.6b in your terminal to automatically download and run the q4_k_m quantized version of the RWKV-6-World 1.6B model.
Tips
You can replace 1.6b with 3b, 7b, or 14b to run RWKV-6-World models at other parameter scales.
Download RWKV gguf model
To customize the RWKV model, you first need to download a gguf-format RWKV model from the RWKV-6 GGUF repository or the RWKV-7 GGUF repository.
⚠️ RWKV gguf models come in various quantized versions. Q5_1 or Q8_0 quantization precision is recommended. Lower quantization precision (such as Q4_0 or Q3_0) may significantly degrade the model's responses.
Create Modelfile for the model
In the folder where the RWKV gguf model file is stored, create a text file named Modelfile, without any file extension.
Then open it with a text editor such as Notepad and write the following content:
FROM rwkv-6-world-1.6b-Q8_0.gguf
TEMPLATE """
{{- range .Messages }}
{{- if eq .Role "user" }}User: 
{{- else if eq .Role "assistant" }}Assistant:
{{- end }}{{ .Content }}

{{ end }}Assistant:"""
PARAMETER stop "\n\n"
PARAMETER stop "\nUser:"
Change rwkv-6-world-1.6b-Q8_0.gguf after FROM in the first line to the filename of the RWKV model you downloaded.
⚠️ It is recommended to copy the content above directly into the Modelfile, to ensure there is a space after User: and no space after Assistant:, and that there is an empty line above {{ end }}Assistant:""" with no extra characters after it.
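To make the template's output concrete, here is a small Python sketch that mimics what the TEMPLATE above renders for a chat history. This is only an illustration: Ollama itself evaluates the Go template, and the function name here is hypothetical.

```python
# Mimic the Modelfile TEMPLATE above: each user turn becomes
# "User: <content>", each assistant turn "Assistant:<content>",
# turns are separated by a blank line, and the prompt ends with
# "Assistant:" so the model continues from there.
def render_rwkv_prompt(messages):
    parts = []
    for msg in messages:
        if msg["role"] == "user":
            parts.append("User: " + msg["content"] + "\n\n")
        elif msg["role"] == "assistant":
            parts.append("Assistant:" + msg["content"] + "\n\n")
    return "".join(parts) + "Assistant:"
```

The blank line between turns is also why the Modelfile declares `stop "\n\n"`: the model signals the end of its reply by emitting an empty line.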
Run custom RWKV model
Open a terminal in the RWKV gguf model folder and execute the ollama create command:
ollama create rwkv-6-world-1.6b-Q8_0 -f Modelfile
Change the model name after ollama create to your local RWKV model's name, ensuring it matches the model filename given in the Modelfile.
After creation, use the ollama run command to run the model directly:
ollama run rwkv-6-world-1.6b-Q8_0
After the model starts successfully, you can chat with it in the terminal.
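Besides the interactive terminal, a running Ollama instance also serves an HTTP API (by default at http://localhost:11434). The sketch below queries the custom model via the /api/chat endpoint using only the Python standard library; the model name is an assumption, so use whatever name you passed to ollama create.

```python
import json
import urllib.request

def build_chat_request(model, prompt):
    """Build the JSON payload for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return the complete reply in one response
    }

def chat(model, prompt, host="http://localhost:11434"):
    """Send one chat turn to a locally running Ollama instance."""
    payload = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        host + "/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

if __name__ == "__main__":
    # Requires `ollama run rwkv-6-world-1.6b-Q8_0` (or the Ollama
    # service) to be running; the model name here is an example.
    print(chat("rwkv-6-world-1.6b-Q8_0", "What is RWKV?"))
```

Because the Modelfile already defines the TEMPLATE and stop parameters, the API applies the RWKV prompt format automatically; you only send plain messages.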
Ollama GUI and Desktop Programs
Ollama itself does not provide GUI or WebUI services, but its community offers third-party GUI and desktop programs.
You can view all third-party Ollama tools in the Ollama GitHub documentation.