> ## Documentation Index
> Fetch the complete documentation index at: https://docs.getcellm.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Local Models

> How to use local models

Cellm supports local models that run on your computer via Llamafiles, Ollama, or vLLM. This ensures none of your data ever leaves your machine. And it's free.

On this page you will learn what to consider when choosing a local model and how to run it.

## Model sizes

We can split local models into three tiers based on their size and capabilities, balancing speed, intelligence, and world knowledge:

|                   | Small Model                                                                                                           | Medium Model                                                                                                        | Large Model                                                                                                         |
| ----------------- | --------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
| Speed             | <Icon icon="star" iconType="solid" /><Icon icon="star" iconType="solid" /><Icon icon="star" iconType="solid" />       | <Icon icon="star" iconType="solid" /><Icon icon="star" iconType="solid" /><Icon icon="star" iconType="regular" />   | <Icon icon="star" iconType="solid" /><Icon icon="star" iconType="regular" /><Icon icon="star" iconType="regular" /> |
| Intelligence      | <Icon icon="star" iconType="solid" /><Icon icon="star" iconType="regular" /><Icon icon="star" iconType="regular" />   | <Icon icon="star" iconType="solid" /><Icon icon="star" iconType="solid" /><Icon icon="star" iconType="regular" />   | <Icon icon="star" iconType="solid" /><Icon icon="star" iconType="solid" /><Icon icon="star" iconType="solid" />     |
| World Knowledge   | <Icon icon="star" iconType="regular" /><Icon icon="star" iconType="regular" /><Icon icon="star" iconType="regular" /> | <Icon icon="star" iconType="solid" /><Icon icon="star" iconType="regular" /><Icon icon="star" iconType="regular" /> | <Icon icon="star" iconType="solid" /><Icon icon="star" iconType="solid" /><Icon icon="star" iconType="regular" />   |
| Recommended model | Gemma 4 E4B                                                                                                           | Gemma 4 26B                                                                                                         | Gemma 4 31B                                                                                                         |

<Tip>
  You need a GPU for any of the medium or large models to be useful in practice. If you don't have a GPU, you can use [Hosted Models](/models/hosted-models) if small ones are insufficient.
</Tip>

In general, smaller models are faster and less intelligent, while larger models are slower and more intelligent. When using local models, it's important to find the right balance for your task, because speed impacts your productivity and intelligence impacts your results. You should try out different models and choose the smallest one that gives you good results.

Small models are sufficient for many common tasks such as categorizing text or extracting person names from news articles. Medium models are appropriate for more complex tasks such as document review, survey analysis, or tasks involving function calling. Large models are useful for creative writing, tasks requiring nuanced language understanding such as spam detection, or tasks requiring world knowledge.

Models larger than 32B require significant hardware investment to run locally, and you are better off using [Hosted Models](/models/hosted-models) if you need this kind of intelligence and don't have the hardware already.

<Tip>
  Large models are needed to use the Internet Browser tool effectively.
</Tip>

## Run models locally

You need to run a program on your computer that serves models to Cellm. We call these programs "providers". Cellm supports Ollama, Llamafiles, and vLLM, as well as any OpenAI-compatible provider. If you don't know any of these names, just use Ollama.

### Ollama

To get started with Ollama, we recommend you try out the Gemma 4 E4B model, which is Cellm's default local model.

<Steps>
  <Step title="Install Ollama">
    Download and install [Ollama](https://ollama.com/). Ollama will start after the install and automatically run whenever you start up your computer.
  </Step>

  <Step title="Download the model">
    When you select an Ollama model in Cellm, it will prompt you to download it automatically. Alternatively, open Windows Terminal (open start menu, type `Windows Terminal`, and click `OK`), then run:

    ```bash Download Gemma 4 E4B theme={null}
    ollama pull gemma4:e4b
    ```

    Wait for the download to finish.
  </Step>

  <Step title="Test in Excel">
    In Excel, select `ollama/gemma4:e4b` from the model dropdown menu, and type:

    ```mdx Test prompt theme={null}
    =PROMPT("Which model are you and who made you?")
    ```

    The model will tell you that it is called "Gemma 4" and made by Google DeepMind.
  </Step>
</Steps>

<Info>
  You can use any model that Ollama supports. See [https://ollama.com/search](https://ollama.com/search) for a complete list.
</Info>

### LLamafile

Llamafile is a project by Mozilla that combines llama.cpp with Cosmopolitan Libc, enabling you to download and run a single-file executable (called a "llamafile") that runs locally on most computers, with no installation.

<Steps>
  <Step title="Download a llamafile">
    Download a llamafile from [https://github.com/Mozilla-Ocho/llamafile](https://github.com/Mozilla-Ocho/llamafile) (e.g. [Gemma 3 4B IT](https://huggingface.co/Mozilla/gemma-3-4b-it-llamafile/resolve/main/google_gemma-3-4b-it-Q6_K.llamafile)).
  </Step>

  <Step title="Rename the file">
    Append `.exe` to the filename. For example, `google_gemma-3-4b-it-Q6_K.llamafile` should be renamed to `google_gemma-3-4b-it-Q6_K.llamafile.exe`.
  </Step>

  <Step title="Run the llamafile">
    Open Windows Terminal (open start menu, type `Windows Terminal`, and click `OK`) and run:

    ```bash CPU only theme={null}
    .\google_gemma-3-4b-it-Q6_K.llamafile.exe --server --v2
    ```

    To offload inference to your NVIDIA or AMD GPU, run:

    ```bash With GPU theme={null}
    .\google_gemma-3-4b-it-Q6_K.llamafile.exe --server --v2 -ngl 999
    ```
  </Step>

  <Step title="Configure Cellm">
    Start Excel and select the `OpenAiCompatible` provider from the model drop-down on Cellm's ribbon menu. Enter any model name e.g., "gemma". Llamafiles ignore the model name since each llamafile serves only one model, but a name is required by the OpenAI API.

    Set the Base Address to `http://localhost:8080`.
  </Step>
</Steps>

<Tip>
  Llamafiles are especially useful if you don't have the necessary permissions to install programs on your computer.
</Tip>

### Dockerized Ollama

If you prefer to run models via docker, both Ollama and vLLM are packaged up with docker compose files in the `docker/` folder. vLLM is designed to run many requests in parallel and particularly useful if you need to process a lot of data with Cellm.

<Steps>
  <Step title="Clone the repository">
    ```bash Clone repo theme={null}
    git clone https://github.com/getcellm/cellm
    ```
  </Step>

  <Step title="Start Ollama container">
    Run the following command in the `docker/` directory:

    ```bash Start container theme={null}
    docker compose -f docker-compose.Ollama.yml up --detach
    ```

    To use your GPU for faster inference:

    ```bash Start with GPU theme={null}
    docker compose -f docker-compose.Ollama.yml -f docker-compose.Ollama.GPU.yml up --detach
    ```

    To stop the container:

    ```bash Stop container theme={null}
    docker compose -f docker-compose.Ollama.yml down
    ```
  </Step>

  <Step title="Configure Cellm">
    Start Excel and select the `openaicompatible` provider from the model drop-down on Cellm's ribbon menu. Enter the model name you want to use, e.g., `gemma4:e4b`.

    Set the Base Address to `http://localhost:11434`.
  </Step>
</Steps>

To use other Ollama models, pull another of the [supported models](https://ollama.com/search) by running e.g. `ollama run mistral-small3.1:24b` in the container.

### Dockerized vLLM

If you want to speed up running many requests in parallel, you can use vLLM instead of Ollama. vLLM requires a Hugging Face API key to download models from the Hugging Face Hub.

<Steps>
  <Step title="Set up Hugging Face API key">
    You must supply the docker compose file with a Hugging Face API key either via an environment variable or by editing the docker compose file directly. Look at the vLLM docker compose file for details.

    If you don't know what a Hugging Face API key is, just use Ollama instead.
  </Step>

  <Step title="Start vLLM container">
    ```bash Start vLLM theme={null}
    docker compose -f docker-compose.vLLM.GPU.yml up --detach
    ```

    To use other vLLM models, change the `--model` argument in the docker compose file to another Hugging Face model.
  </Step>
</Steps>

<Tip>
  Open WebUI is included in both Ollama and vLLM docker compose files so you can test the local model outside of Cellm. Open WebUI is available at `http://localhost:3000`.
</Tip>
