Connecting to Remote Ollama Servers with SSH Tunneling
Introduction
Ollama is a tool for running large language models (LLMs) locally. When you have Ollama running on a remote server (e.g., a GPU-enabled workstation or HPC cluster), you can access it securely from your local machine using SSH tunneling. This guide demonstrates how to:
- Create an SSH tunnel to a remote Ollama instance
- Test the connection and query available models
- Use Ollama with the OpenAI-compatible API for seamless integration with existing code
Getting Started
Setting up the SSH Tunnel
First, configure your environment variables. We’ll map the remote Ollama service (which listens on port 11434 by default) to local port 11435 to avoid conflicts with any Ollama instance running on your machine.
export OLLAMA_PORT="11435"
export REMOTE_HOST=<remote host>
Create a secure SSH tunnel to the remote server:
ssh -N -L "$OLLAMA_PORT":localhost:11434 "$REMOTE_HOST"
Flags explained:
-N: Don’t execute a remote command (tunnel only)
-L: Create a local port forward (LocalPort:RemoteHost:RemotePort)
Tip: Run this command in a separate terminal or use & to run it in the background. The tunnel must stay active for the duration of your session.
Testing the Connection
Verify that Ollama is accessible through the tunnel:
# Test if ollama is running
curl http://localhost:"$OLLAMA_PORT"
# List all available models
curl http://localhost:"$OLLAMA_PORT"/api/tags
The /api/tags endpoint returns a JSON list of all models installed on the remote server, including their sizes and modification dates.
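If you prefer to inspect that list from Python, here is a minimal sketch. It assumes the tunnel from above is active on local port 11435 and that the third-party requests library is installed; the field names come from the JSON returned by /api/tags.

import requests

# Query the tags endpoint through the SSH tunnel (local port 11435)
resp = requests.get("http://localhost:11435/api/tags", timeout=10)
resp.raise_for_status()

# Each entry includes the model name, size (in bytes), and modification date
for model in resp.json().get("models", []):
    print(f"{model['name']}  ({model['size'] / 1e9:.1f} GB, modified {model['modified_at']})")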
Making Direct Queries
Test text generation using the Ollama API:
# Test generate endpoint
curl http://localhost:"$OLLAMA_PORT"/api/generate -d '{
"model": "gpt-oss:20b",
"prompt":"Why is the sky blue?"
}'
This endpoint streams the response back as newline-delimited JSON, one object per line.
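To consume that stream from Python instead of curl, the following sketch (again assuming the requests library and the tunnel on port 11435) parses each line as a JSON object carrying a "response" fragment, with a final object marked "done": true.

import json
import requests

# Stream tokens from the generate endpoint through the tunnel
payload = {"model": "gpt-oss:20b", "prompt": "Why is the sky blue?"}
with requests.post("http://localhost:11435/api/generate", json=payload, stream=True) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)  # one JSON object per line
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):  # final object signals completion
            print()
            break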
Working with OpenAI API
Ollama provides an OpenAI-compatible API endpoint, making it easy to use with existing OpenAI client libraries. This is particularly useful when migrating code or working with tools that expect the OpenAI API format.
Basic Usage
Using the Python OpenAI SDK:
from openai import OpenAI

# Point the client at the local end of the SSH tunnel
client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="ollama",  # Dummy key; required by the SDK but not used by Ollama for authentication
)

# Simple chat completion
chat_completion = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    stream=False,
)
print(chat_completion.choices[0].message.content)
Streaming Responses
For better interactivity with longer responses, enable streaming:
# Streaming chat completion
stream = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Explain quantum computing in detail."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
Error Handling
Add robust error handling for production use:
from openai import OpenAI, OpenAIError

try:
    client = OpenAI(
        base_url="http://localhost:11435/v1",
        api_key="ollama",
    )
    response = client.chat.completions.create(
        model="gpt-oss:20b",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=False,
    )
    print(response.choices[0].message.content)
except OpenAIError as e:
    print(f"API error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
Troubleshooting
Port already in use:
If port 11435 is already occupied, choose a different port:
export OLLAMA_PORT="11436" # or any available port
Connection refused:
- Verify Ollama is running on the remote server:
ssh "$REMOTE_HOST" "systemctl status ollama"
- Check firewall settings on the remote server
- Ensure you have SSH access to the remote host
Model not found:
List available models first using /api/tags, then use the exact model name from that list.
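As a small sketch of that check (assuming the same tunnel on port 11435 and the requests library), you can verify a name against /api/tags before sending a request:

import requests

def model_available(name, base="http://localhost:11435"):
    """Return True if `name` exactly matches a model installed on the remote server."""
    tags = requests.get(f"{base}/api/tags", timeout=10).json()
    return any(m["name"] == name for m in tags.get("models", []))

print(model_available("gpt-oss:20b"))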
Tunnel disconnects:
Use autossh for automatic reconnection:
autossh -M 0 -N -L "$OLLAMA_PORT":localhost:11434 "$REMOTE_HOST"
Conclusion
By tunneling to a remote Ollama instance, you can leverage powerful remote hardware while developing locally. The OpenAI-compatible API makes integration straightforward, allowing you to switch between Ollama and other LLM providers with minimal code changes.
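As a sketch of that portability, the client can be configured entirely from the environment so the same code targets Ollama through the tunnel or a hosted provider. The variable names LLM_BASE_URL, LLM_API_KEY, and LLM_MODEL are illustrative only, not standard names used by Ollama or the OpenAI SDK.

import os
from openai import OpenAI

# Hypothetical environment variables for this example
client = OpenAI(
    base_url=os.environ.get("LLM_BASE_URL", "http://localhost:11435/v1"),  # Ollama via the tunnel by default
    api_key=os.environ.get("LLM_API_KEY", "ollama"),  # a real key is only needed for hosted providers
)

response = client.chat.completions.create(
    model=os.environ.get("LLM_MODEL", "gpt-oss:20b"),
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)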
Next steps:
- Explore Ollama’s model library for available models
- Learn about Ollama’s API documentation
- Consider using LangChain or LlamaIndex with Ollama for advanced applications
