Local AI Models

GPT-OSS 20B, Ollama, model families, Thunderbolt storage, and equipment. Use a local model as backup to the API.


GPT-OSS 20B: OpenAI's open-weight model for local deployment

GPT-OSS-20B is OpenAI's open-weight model (Apache 2.0) designed for local deployment: ~21B total parameters with ~3.6B active per token (mixture-of-experts), about 14GB on disk with MXFP4 quantization, and it runs in 16GB of RAM with a 128K context window. It is strong at reasoning, chain-of-thought, function calling, and agentic use, which makes it a good OpenClaw fallback or a primary local model.

ollama run gpt-oss:20b

In OpenClaw, set agent.model to ollama/gpt-oss:20b (a config sketch follows). There is also GPT-OSS-120B for high-end GPUs (80GB-class); on a Mac mini, 20B is the sweet spot.
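A minimal sketch of the wiring, assuming OpenClaw reads the agent.baseUrl and agent.model keys shown on this page (the exact config mechanism may differ in your version):

    # Pull the model once; Ollama then serves it on its default port 11434
    ollama pull gpt-oss:20b

    # OpenClaw settings (keys from this page; syntax is illustrative)
    agent.baseUrl http://localhost:11434
    agent.model ollama/gpt-oss:20b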

Local model as backup to the API

Run your primary model via an API (Anthropic, OpenAI) with a local model as fallback: you keep working when the API is down, privacy-sensitive work stays on-device, and high-volume tasks cost less. On the BRAIN or a solo Mac, run Ollama (e.g. gpt-oss:20b or llama3.1:8b) and point OpenClaw at it when you need the backup; a health-check sketch follows. See Architecture and Best Practices.
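A hedged sketch of the fallback decision, assuming the primary is OpenAI's API and a small shell wrapper picks the model before launching OpenClaw (the /v1/models endpoint is real; the model names and wrapper approach are illustrative):

    #!/bin/sh
    # Probe the cloud API; on any failure, fall back to the local Ollama model.
    if curl -sf --max-time 5 https://api.openai.com/v1/models \
         -H "Authorization: Bearer $OPENAI_API_KEY" >/dev/null; then
      MODEL="openai/gpt-4o"        # hypothetical primary model name
    else
      MODEL="ollama/gpt-oss:20b"   # local fallback served by Ollama
    fi
    echo "agent.model -> $MODEL"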

Runtimes: Ollama vs LM Studio

Ollama

CLI + local server (port 11434). One command to pull and run models. OpenClaw: agent.baseUrl http://localhost:11434, agent.model ollama/<name>. Best for 24/7 backup, headless BRAIN, and scripting.
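For example, pulling a model and then querying the local server directly over Ollama's documented HTTP API:

    ollama pull llama3.1:8b
    # Non-streaming completion from the local server
    curl http://localhost:11434/api/generate -d '{
      "model": "llama3.1:8b",
      "prompt": "One-sentence summary of Thunderbolt 4.",
      "stream": false
    }'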

LM Studio

GUI for Windows/Mac. Good for trying models. Can expose an OpenAI-compatible server. For headless or BRAIN setups, Ollama is usually simpler.
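If you do use LM Studio's server, it speaks the OpenAI chat-completions format; the default port is 1234 (check the server tab, and load a model first; the model name below stands in for whichever model you loaded):

    curl http://localhost:1234/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "gpt-oss-20b", "messages": [{"role": "user", "content": "Hello"}]}'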

Model families and what they’re good at

Model / Ollama tag                 | Best for                                                      | Size / RAM
gpt-oss:20b                        | Reasoning, agents, function calling, CoT. OpenAI open-weight. | 14GB / 16GB+
llama3.1:8b, llama3.2:3b           | General chat, coding. Good default.                           | ~2–5GB / 8GB+
qwen2.5:7b, qwen2.5:14b            | Strong reasoning, long context.                               | 7B ~4GB, 14B ~8GB / 16GB
gemma2:2b, gemma2:9b, gemma2:27b   | Efficient; 2B for low-resource setups.                        | 2B–27B / 8GB–32GB
phi3:mini, phi3:medium             | Small, fast. Edge use and quick replies.                      | ~4B–14B / 8GB–16GB
deepseek-r1, deepseek-coder        | Reasoning (R1), coding.                                       | Varies / 16GB+
mistral:7b, mixtral:8x7b           | Speed/quality balance; Mixtral is MoE.                        | 7B ~4GB, 8x7B ~26GB

Equipment: RAM, storage, Thunderbolt

  • RAM (unified memory): 8GB runs small models (8B at Q4). 16GB handles GPT-OSS 20B, 14B at Q4, or 8B at Q8. 32GB+ covers 27B–70B quantized. Unified memory is the main constraint on Apple Silicon.
  • Thunderbolt 4 / USB4: the Mac mini has TB4 (40 Gb/s); the M4 Pro has Thunderbolt 5 (up to 120 Gb/s). Use a TB4/USB4 NVMe enclosure for external model storage; in many cases it is faster than the base 256GB internal drive. See Hardware & Storage for DRAM vs DRAM-less and enclosure picks.
  • Storage capacity: models range from 4GB to 40GB+ each. A 256GB internal drive is tight; an external TB4 NVMe (1TB+) gives both room and speed. To relocate Ollama's model directory to the external drive, see the sketch after this list.
  • CPU/GPU: Apple Silicon (M1–M4) runs Ollama via Metal. No discrete GPU is needed on a Mac mini.
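Ollama honors the OLLAMA_MODELS environment variable for its model directory, so weights can live on the external drive (the volume name below is hypothetical; if Ollama runs as a background service, set the variable for that service too):

    # Keep models on an external TB4 NVMe volume instead of the internal disk
    export OLLAMA_MODELS=/Volumes/ExternalNVMe/ollama-models
    mkdir -p "$OLLAMA_MODELS"
    ollama pull gpt-oss:20b    # downloads to the external drive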