Introduction

Ollama is a tool for running large language models (LLMs) locally. When you have Ollama running on a remote server (e.g., a GPU-enabled workstation or HPC cluster), you can access it securely from your local machine using SSH tunneling. This guide demonstrates how to:

  • Create an SSH tunnel to a remote Ollama instance
  • Test the connection and query available models
  • Use Ollama with the OpenAI-compatible API for seamless integration with existing code

Getting Started

Setting up the SSH Tunnel

First, configure your environment variables. We’ll map the remote Ollama service (default port 11434) to local port 11435 to avoid conflicts with any local Ollama instance.

export OLLAMA_PORT="11435"
export REMOTE_HOST="<remote host>"

Create a secure SSH tunnel to the remote server:

ssh -N -L "$OLLAMA_PORT":localhost:11434 "$REMOTE_HOST"

Flags explained:

  • -N: Don’t execute a remote command (tunnel only)
  • -L: Create a local port forward (local_port:host:host_port); the host is resolved from the remote server, so localhost:11434 here points at Ollama running on that machine

Tip: Run this command in a separate terminal, append & to run it in the background, or add the -f flag so ssh backgrounds itself after authenticating. The tunnel must stay active for the duration of your session.

Testing the Connection

Verify that Ollama is accessible through the tunnel:

# Test if ollama is running
curl http://localhost:"$OLLAMA_PORT"

# List all available models
curl http://localhost:"$OLLAMA_PORT"/api/tags

The /api/tags endpoint returns a JSON list of all models installed on the remote server, including their sizes and modification dates.
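
If you prefer to check this from Python, here is a minimal sketch (assuming the requests package is installed and the tunnel from the previous step is up) that prints the same model list:

import os
import requests

# Read the tunnelled port from the environment, falling back to the port used earlier in this guide
port = os.environ.get("OLLAMA_PORT", "11435")

# /api/tags returns {"models": [{"name": ..., "size": ..., "modified_at": ...}, ...]}
resp = requests.get(f"http://localhost:{port}/api/tags", timeout=10)
resp.raise_for_status()

for model in resp.json().get("models", []):
    print(model["name"], model.get("size"), model.get("modified_at"))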

Making Direct Queries

Test text generation using the Ollama API:

# Test generate endpoint
curl http://localhost:"$OLLAMA_PORT"/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt":"Why is the sky blue?"
}'

By default, this endpoint streams the response back as newline-delimited JSON, with one object per generated chunk.
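
The same request can be issued from Python. The sketch below (again assuming requests and the port used earlier) reads the newline-delimited JSON stream and prints the text as it arrives:

import json
import requests

payload = {"model": "gpt-oss:20b", "prompt": "Why is the sky blue?"}

# stream=True lets us iterate over the JSON objects as they arrive instead of buffering the whole reply
with requests.post("http://localhost:11435/api/generate", json=payload, stream=True, timeout=120) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # each chunk carries a piece of the answer in "response"; "done" marks the final object
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()
            break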

Working with the OpenAI API

Ollama provides an OpenAI-compatible API endpoint, making it easy to use with existing OpenAI client libraries. This is particularly useful when migrating code or working with tools that expect the OpenAI API format.

Basic Usage

Using the Python OpenAI SDK:

from openai import OpenAI

# Configure client to use local Ollama instance
client = OpenAI(
    base_url=f"http://localhost:11435/v1",
    api_key="ollama",  # Dummy key, not used by Ollama for authentication
)

# Simple chat completion
chat_completion = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    stream=False,
)

print(chat_completion.choices[0].message.content)

Streaming Responses

For better interactivity with longer responses, enable streaming:

# Streaming chat completion
stream = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Explain quantum computing in detail."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

Error Handling

Add robust error handling for production use:

from openai import OpenAI, OpenAIError

try:
    client = OpenAI(
        base_url="http://localhost:11435/v1",
        api_key="ollama",
    )

    response = client.chat.completions.create(
        model="gpt-oss:20b",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=False,
    )

    print(response.choices[0].message.content)

except OpenAIError as e:
    print(f"API error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Troubleshooting

Port already in use: If port 11435 is already occupied, choose a different port:

export OLLAMA_PORT="11436"  # or any available port

Connection refused:

  • Verify Ollama is running on the remote server: ssh "$REMOTE_HOST" "systemctl status ollama"
  • Check firewall settings on the remote server
  • Ensure you have SSH access to the remote host

Model not found: List available models first using /api/tags, then use the exact model name from that list.
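
If you want to guard against this programmatically, a small helper along these lines can check the name against /api/tags before sending a prompt; the model name and port are simply the ones used earlier in this guide:

import requests

def model_available(name: str, port: str = "11435") -> bool:
    # compare against the exact names reported by /api/tags
    resp = requests.get(f"http://localhost:{port}/api/tags", timeout=10)
    resp.raise_for_status()
    return name in {m["name"] for m in resp.json().get("models", [])}

if not model_available("gpt-oss:20b"):
    print("Model not found on the remote server; check /api/tags for the exact name.")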

Tunnel disconnects: Use autossh for automatic reconnection:

autossh -M 0 -N -L "$OLLAMA_PORT":localhost:11434 "$REMOTE_HOST"

Conclusion

By tunneling to a remote Ollama instance, you can leverage powerful remote hardware while developing locally. The OpenAI-compatible API makes integration straightforward, allowing you to switch between Ollama and other LLM providers with minimal code changes.
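
As a rough illustration of that last point, the client configuration can be read from environment variables so the same code targets either backend; the variable names LLM_BASE_URL, LLM_API_KEY, and LLM_MODEL below are hypothetical, chosen just for this example:

import os
from openai import OpenAI

# Point these at the tunnelled Ollama instance or at another OpenAI-compatible provider as needed
client = OpenAI(
    base_url=os.environ.get("LLM_BASE_URL", "http://localhost:11435/v1"),
    api_key=os.environ.get("LLM_API_KEY", "ollama"),
)

response = client.chat.completions.create(
    model=os.environ.get("LLM_MODEL", "gpt-oss:20b"),
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)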

Next steps: