Introduction

When running Ollama on a server, you may need to configure various aspects of the deployment, from changing the model storage location to adjusting performance parameters. This guide covers common server-side administration tasks:

  • Understanding Ollama environment variables and configuration options
  • Changing the model storage location (useful for disk space management)
  • Configuring Ollama via systemd service files
  • Migrating existing models to new storage locations

For information on connecting to a remote Ollama instance from your local machine, see Connecting to Remote Ollama Servers with SSH Tunneling.

Showing Environment Variables

ollama serve --help
Start ollama

Usage:
  ollama serve [flags]

Aliases:
  serve, start

Flags:
  -h, --help   help for serve

Environment Variables:
      OLLAMA_DEBUG               Show additional debug information (e.g. OLLAMA_DEBUG=1)
      OLLAMA_HOST                IP Address for the ollama server (default 127.0.0.1:11434)
      OLLAMA_CONTEXT_LENGTH      Context length to use unless otherwise specified (default: 4096)
      OLLAMA_KEEP_ALIVE          The duration that models stay loaded in memory (default "5m")
      OLLAMA_MAX_LOADED_MODELS   Maximum number of loaded models per GPU
      OLLAMA_MAX_QUEUE           Maximum number of queued requests
      OLLAMA_MODELS              The path to the models directory
      OLLAMA_NUM_PARALLEL        Maximum number of parallel requests
      OLLAMA_NOPRUNE             Do not prune model blobs on startup
      OLLAMA_ORIGINS             A comma separated list of allowed origins
      OLLAMA_SCHED_SPREAD        Always schedule model across all GPUs
      OLLAMA_FLASH_ATTENTION     Enabled flash attention
      OLLAMA_KV_CACHE_TYPE       Quantization type for the K/V cache (default: f16)
      OLLAMA_LLM_LIBRARY         Set LLM library to bypass autodetection
      OLLAMA_GPU_OVERHEAD        Reserve a portion of VRAM per GPU (bytes)
      OLLAMA_LOAD_TIMEOUT        How long to allow model loads to stall before giving up (default "5m")
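Any of these variables can be set inline for a one-off foreground run, which is handy for testing a value before committing it to the service file. A sketch (the values shown are illustrative, not recommendations):

```shell
# Bind to all interfaces, keep models in memory for 10 minutes,
# and read models from /data/ollama/models (illustrative values)
OLLAMA_HOST=0.0.0.0:11434 \
OLLAMA_KEEP_ALIVE=10m \
OLLAMA_MODELS=/data/ollama/models \
ollama serve
```

This runs in the foreground until interrupted, so it is best suited for quick experiments rather than production use.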

Showing the Current Model Folder Size

sudo du -h -d 1 /usr/share/ollama/.ollama/models
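Before planning a move, it helps to compare that usage against the free space on the prospective target drive (the /data mount point below is illustrative):

```shell
# Total size of the current model store
sudo du -sh /usr/share/ollama/.ollama/models

# Free space on the drive that would hold the new location
df -h /data
```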

Why Change the Model Storage Location?

You might want to change Ollama’s model storage location for several reasons:

  • Disk space: Models can be large (7B models ~4GB, 70B models ~40GB+)
  • Performance: Move models to faster storage (NVMe SSD vs HDD)
  • Organization: Separate system and data partitions

Edit the Config File in systemd

sudo vi /etc/systemd/system/ollama.service

Add the following line under the [Service] section to point to the new location:

Environment="OLLAMA_MODELS=/your/desired/path"
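As an alternative to editing the unit file directly, systemd's drop-in mechanism keeps your change separate from the packaged unit so it survives upgrades; a sketch:

```shell
# Opens an editor on a drop-in override, created at
# /etc/systemd/system/ollama.service.d/override.conf
sudo systemctl edit ollama

# In the editor, add:
#   [Service]
#   Environment="OLLAMA_MODELS=/your/desired/path"
```

After saving, the same daemon-reload and restart steps below apply.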

Make sure to change the ownership of the new path so the ollama user can read and write it:

sudo chown -R ollama:ollama <new model path>

Then copy the existing models to the new location (the originals can be deleted once you've verified everything works):

sudo -u ollama rsync -av --ignore-existing /usr/share/ollama/.ollama/models/ <new model path>/
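If you'd like to preview the transfer first, rsync's -n (--dry-run) flag lists what would be copied without writing anything (the destination path is illustrative):

```shell
# Dry run: show which files would be transferred
sudo -u ollama rsync -avn --ignore-existing \
  /usr/share/ollama/.ollama/models/ /data/ollama/models/

# Re-run without -n to actually copy
```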

Reload systemd so it picks up the edited unit file, then restart the service:

sudo systemctl daemon-reload
sudo systemctl restart ollama

Double-check that the service is running and the models are visible:

systemctl status ollama
ollama list
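To confirm that the service actually picked up the new OLLAMA_MODELS value, you can also inspect the unit's effective configuration and environment:

```shell
# Show the environment block systemd passes to the service
systemctl show ollama --property=Environment

# Print the effective unit file, including any drop-in overrides
systemctl cat ollama
```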

Verify Models Work After Migration

After moving models, test that they’re accessible and functioning properly:

# List all available models
ollama list

# Test a model with a simple prompt
ollama run <model-name> "test prompt"

If the models are listed and respond correctly, the migration was successful!