Choose a model
We can split local models into three tiers based on their size and capabilities, balancing speed, intelligence, and world knowledge:

| | Small (0.5B-3B) | Medium (4B-9B) | Large (10B-32B) |
|---|---|---|---|
| Speed | Fast | Moderate | Slow |
| Intelligence | Basic | Good | Best |
| World Knowledge | Limited | Moderate | Broad |
| Recommended model | Gemma 2 2B | Qwen 2.5 7B | Mistral Small 3.1 |
You need a GPU for the medium and large models to be useful in practice. If you don't have a GPU and the small models aren't good enough for your use case, consider Hosted Models instead.
Run models locally
You need to run a program on your computer that serves models to Cellm. We call these programs “providers”. Cellm supports Ollama, Llamafiles, and vLLM, as well as any OpenAI-compatible provider. If you don't recognize any of these names, just use Ollama.

Ollama
To get started with Ollama, we recommend you try out the Gemma 2 2B model, which is Cellm's default local model.

- Download and install Ollama. Ollama will start after the install and automatically run whenever you start up your computer.
- Download the Gemma 2 2B model: open Windows Terminal (open the start menu, type `Windows Terminal`, and click `OK`), type `ollama pull gemma2:2b`, and wait for the download to finish.
- In Excel, select `ollama/gemma2:2b` from the model dropdown menu and type out the formula `=PROMPT("Which model are you and who made you?")`. The model will tell you that it is called “Gemma” and was made by Google DeepMind.
You can use any model that Ollama supports. See https://ollama.com/search for a complete list.
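For reference, the terminal side of the steps above boils down to two commands (`ollama list` is optional and just confirms the download finished):

```
# Download Gemma 2 2B into Ollama's local model store.
ollama pull gemma2:2b

# Optional: list the models Ollama has downloaded.
ollama list
```

Once the model shows up in the list, selecting `ollama/gemma2:2b` in Excel and entering `=PROMPT("Which model are you and who made you?")` should return an answer from Gemma.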
Llamafile
Llamafile is a project by Mozilla that combines llama.cpp with Cosmopolitan Libc, enabling you to download and run a single-file executable (called a “llamafile”) that runs locally on most computers, with no installation. To get started:

- Download a llamafile from https://github.com/Mozilla-Ocho/llamafile (e.g. Gemma 2 2B).
- Append `.exe` to the filename. For example, `gemma-2-2b-it.Q6_K.llamafile` should be renamed to `gemma-2-2b-it.Q6_K.llamafile.exe`.
- Run the llamafile in your Windows terminal (open the start menu, type `Windows Terminal`, and click `OK`). To offload inference to your NVIDIA or AMD GPU, add the GPU offload flag; both commands are shown in the sketch after this list.
- Start Excel and select the `openaicompatible` provider from the model drop-down on Cellm's ribbon menu. It doesn't matter which model name you choose: a llamafile serves only one model and ignores the name. A name is still required, though, because the OpenAI API expects one.
- Set the Base Address textbox to `http://localhost:8080`.
Llamafiles are especially useful if you don’t have the necessary permissions to install programs on your computer.
Dockerized Ollama and vLLM
If you prefer to run models via Docker, both Ollama and vLLM are packaged up with docker compose files in the `docker/` folder. vLLM is designed to run many requests in parallel and is particularly useful if you need to process a lot of data with Cellm.
To get started, we recommend using Ollama with the Gemma 2 2B model:
- Clone the source code (see the sketch after this list).
- Run the docker compose command in the `docker/` directory (also shown in the sketch below).
- Start Excel and select the `openaicompatible` provider from the model drop-down on Cellm's ribbon menu. Replace the model name with the name of the model you want to use. For Gemma 2 2B, the textbox should read “openaicompatible/gemma2:2b”.
- Set the Base Address textbox to `http://localhost:11434`.
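The repository URL and compose file name below are assumptions; check the `docker/` folder for the actual file names before running anything:

```
# Clone the Cellm source code (repository URL assumed; use the repository these docs ship with).
git clone https://github.com/getcellm/cellm.git
cd cellm/docker

# Start the Ollama container (compose file name assumed; pick the Ollama file in docker/).
docker compose -f docker-compose.Ollama.yml up --detach
```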
To use a different model, e.g. Mistral Small 3.1, run `ollama run mistral-small3.1:24b` in the container and update the model name in Excel accordingly.
If you want to speed up inference, you can use your GPU as well.
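The exact mechanism depends on the compose files in the repository; a minimal sketch, assuming a separate GPU override file next to the Ollama compose file (both file names are assumptions):

```
# Layer a GPU override on top of the base Ollama compose file (file names assumed; check docker/).
docker compose -f docker-compose.Ollama.yml -f docker-compose.Ollama.GPU.yml up --detach
```

This requires a working NVIDIA container runtime (or the AMD equivalent) on the host.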
Open WebUI is included in both the Ollama and vLLM docker compose files so you can test the local model outside of Cellm. Open WebUI is available at `http://localhost:3000`.