Skip to content
Mchael Poncardas
Go back

How to Run Local AI Models with Ollama and OpenCode

8 min read

Step-by-step guide on how to run local LLMs with Ollama and using them via OpenCode.

Table of contents

Open Table of contents

Why Run Local AI Models?

Pros

Cons

The Trade-off

Use local models when: Privacy matters, you process sensitive code, or you want zero ongoing costs.

Use cloud APIs when: You need maximum capability, fastest responses, or don’t want to manage hardware.

This guide focuses on the local approach with a mid-range setup (RTX 3060 + 32GB RAM) running an 8B parameter model—balanced for everyday coding assistance without cloud dependency.

Prerequisites

My setup:

ComponentSpecification
OSUbuntu 24.04.4 LTS (Noble Numbat)
CPUAMD Ryzen 7 3700X (16 threads) @ 4.43 GHz
GPUNVIDIA GeForce RTX 3060
RAM32 GB

Minimum requirements:

Why a good GPU matters:

Local AI models are compute-intensive. A dedicated NVIDIA GPU with CUDA cores (or Apple Silicon with Metal) dramatically speeds up token generation:

VRAM is the limiting factor. The model must fit entirely in GPU memory to run efficiently. The 9.6GB gemma4:e4b model fits comfortably in an RTX 3060’s 12GB VRAM. Larger models (13B, 70B) require proportionally more VRAM (24GB, 48GB+).

Check your specs:

# OS version
lsb_release -a

# CPU info
lscpu | grep "Model name\|CPU(s)"

# RAM
free -h

# GPU (if NVIDIA)
nvidia-smi -L

Step 1: Install Ollama

Download and install Ollama using the official install script:

curl -fsSL https://ollama.com/install.sh | sh

Source: https://ollama.com/

Note: This installs Ollama as a system service that runs in the background.

Ollama installation complete

What this does:

Step 2: Pull a Model

Browse available models at https://ollama.com/search. For this guide, we’ll use Gemma 4 e4b — the 8B parameter, Q4_K_M quantized variant tagged as latest.

Tag explanation: Ollama uses Docker-style tagging. gemma4 or gemma4:latest pulls the default variant. Explicit tags like gemma4:e2b or gemma4:26b pull specific sizes.

Run the command to pull the model:

ollama pull gemma4:e4b

Model specs (gemma4:e4b):

PropertyValue
Size9.6GB
Parameters8B
QuantizationQ4_K_M
Context length128K tokens
ModalitiesText + Image
LicenseApache 2.0

Download complete:

Model download complete

Note: First pull downloads ~9.6 GB. Subsequent runs are instant (model cached locally).

Step 3: Test the Model (Interactive Mode)

Run the model interactively to verify it’s working:

ollama run gemma4:e4b

This drops you into a chat session with the model. Try asking it something:

>>> Why is the sky blue?

Expected behavior:

Gemma 4 shows its reasoning process before responding:

Chain-of-thought reasoning

Chain-of-thought reasoning displayed during generation

Then outputs a formatted, detailed answer:

Formatted response

Formatted response with markdown headers, emojis, and blockquotes

To exit: press Ctrl+D or type /bye.

Step 4: Install OpenCode

OpenCode is an AI coding harness that works with local models via Ollama.

Install via the official script:

curl -fsSL https://opencode.ai/install | bash

Source: https://opencode.ai/

Installation progress:

OpenCode installation complete

OpenCode installed and added to PATH

The installer automatically adds OpenCode to your shell PATH (.zshrc in this case).

Step 5: Configure OpenCode to Use Ollama

Create the OpenCode configuration file to connect to your local Ollama instance:

mkdir -p ~/.config/opencode
cat > ~/.config/opencode/opencode.json << 'EOF'
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "gemma4:e4b": {
          "name": "gemma4:e4b"
        }
      }
    }
  }
}
EOF

Important: The JSON key in the models section must match the Ollama model name exactly, including the colon (:). Using gemma4-e4b (with hyphen) will cause a “model not found” error. Always use gemma4:e4b (with colon).

Note: Make sure Ollama is running (check with ollama list to see model names). The systemd service started automatically after installation.

Launch OpenCode

opencode

Initial interface:

OpenCode launch

OpenCode v1.3.17 main interface on first launch

The interface shows:

Switch to Your Local Model

Type /models or select it from the list.

Model selection screen:

Model selection screen

Local gemma4:e4b appears under the Ollama section

The model picker shows:

Select gemma4:e4b and press Enter. OpenCode will now use your local model for all requests.

Verification: Local Model Working

Once selected, start a conversation. The chat interface will show:

Local model working

OpenCode successfully using local gemma4:e4b via Ollama — generating a Python learning plan

Confirmation signs:

Note: If you see model 'gemma4-e4b' not found, the JSON key in your config has a hyphen instead of a colon. Fix: use "gemma4:e4b": as the key (not "gemma4-e4b":).

Testing: Build a Note-Taking App

Verify your local model can generate working code. In OpenCode with your local model selected, paste:

Create a simple note-taking app using vanilla JavaScript, HTML, and CSS.
Do not use web search — just write the code directly.

Requirements:
- Input field to add new notes
- Display list of saved notes with timestamps
- Delete button for each note
- Save to localStorage (persist after page refresh)
- Clean, centered layout (max-width 600px)
- No external libraries — pure vanilla JS/CSS

Provide the complete code as a single HTML file.

Local model generating code:

Code generation

Gemma 4 26B model generating a complete note-taking app with HTML, CSS, and JS

The model outputs a complete, runnable HTML file with:

Model Variant

The screenshot shows gemma4:26b (26B parameters, 18GB) being used for code generation. I switched to the larger gemma4:26b model because I was having some tool-calling issues with gemma4:e4b. It handled structured outputs more reliably but with slower token generation.

Generated app running in browser:

Note app working

Working note-taking app generated by the local Gemma 26B model

The gemma4:26b model created this note-taking tool in 3m 29s. It may not look fancy, but it works! — add notes, delete them, timestamps appear automatically, and data persists after refresh and it even added a nice keyframes fade-in animation.

Features that work:

Troubleshooting

Issue: model 'gemma4-e4b' not found

Cause: The JSON key in opencode.json uses a hyphen instead of the exact Ollama model name.

Solution: Ensure the JSON key matches the model name exactly, including the colon:

// Wrong ❌
"models": {
  "gemma4-e4b": {
    "name": "gemma4:e4b"
  }
}

// Correct ✅
"models": {
  "gemma4:e4b": {
    "name": "gemma4:e4b"
  }
}

Issue: Ollama connection refused

Cause: Ollama service not running.

Solution: Check Ollama status:

sudo systemctl status ollama
ollama list

If not running:

sudo systemctl start ollama

Issue: Slow response times

Cause: First run loads model into memory; subsequent runs are faster.

Solution: Keep Ollama running. The model stays cached in RAM after first use.

Share this post on:
Share this post on LinkedIn Share this post via WhatsApp Share this post on Facebook Share this post via email

Related Posts