Pi × LLM
Quick Facts
- Client: Raspberry Pi 4 / Pi 5 / Pi Zero 2 W
- Server: Claw Mac Mini (M-series) running OpenClaw
- Gateway: Hermes agent gateway (HTTP + WebSocket, OpenAI-compatible + streaming)
- Gateway model: Hermes 4 (default) or Hermes 3, self-hosted via vLLM on the Mini
- Transport: Tailscale mesh (default) · LAN mDNS (airgapped)
- Auth: Tailscale identity + scoped API key per Pi device
- Wiki access: exposed as a tool on the gateway; Hermes retrieves + cites entries
- Footprint on Pi: <50MB RAM (ZeroClaw client binary) or Python script
Why This Recipe
OpenClaw on a Mac Mini is powerful but not portable. Walking over to the Mini, opening Slack, and typing a prompt is a bad user story for most physical contexts — a shop floor with greasy hands, a meeting room without a laptop, a kiosk in a hallway, an always-on display next to a rack. The Pi closes that gap: $35–$80 of hardware, a dedicated screen or speaker, and a direct line to the full OpenClaw agent running on the Mini.
The Hermes model choice is deliberate. Hermes's steerable alignment — system prompt as source of truth — is what makes it usable as a gateway model. Different Pi terminals can speak to the same Mini with different personas, tool allow-lists, and response styles without needing separate model deployments. One gateway, many physical surfaces.
Architecture
┌─────────────────────┐      Tailscale (wg)      ┌──────────────────────────────┐
│ Raspberry Pi        │ ───────────────────────► │ Claw Mac Mini                │
│ - client binary     │                          │ ┌────────────────────────┐   │
│ - screen / mic /    │ ◄─── WebSocket stream ── │ │ Hermes Agent Gateway   │   │
│   buttons / GPIO    │                          │ │ (HTTP + WebSocket)     │   │
│ - local cache       │                          │ └──────────┬─────────────┘   │
└─────────────────────┘                          │            │                 │
                                                 │            ▼                 │
                                                 │ ┌────────────────────────┐   │
                                                 │ │ OpenClaw agent loop    │   │
                                                 │ │ - Hermes 4 (vLLM)      │   │
                                                 │ │ - Wiki retrieval tool  │   │
                                                 │ │ - Claw tools           │   │
                                                 │ │ - FrawdBot inline      │   │
                                                 │ └────────────────────────┘   │
                                                 └──────────────────────────────┘
- Hermes gateway: a small HTTP + WebSocket service exposed on the Mini at http://claw-mini.tailnet:7820. It speaks the OpenAI chat-completions protocol (for client compatibility) plus a streaming /agent/stream WebSocket for tool-capable sessions.
- Inference: Hermes 4 (or 3) running in vLLM on the Mac Mini M-series GPU. Quantized variants (AWQ / GPTQ) are used on Mini configurations with less than 32GB unified memory.
- Wiki as a tool: the Mini exposes wiki.lookup(slug) and wiki.search(query) as agent tools. The gateway's system prompt teaches Hermes to cite LLM Wiki entries by slug (e.g. /wiki/qwen.html) when answering infra questions.
- Tools inherited from OpenClaw: if the Pi's scoped API key permits, the same agent session can invoke Slack, Gmail, Linear, GitHub, and the rest of the OpenClaw integration surface.
- FrawdBot: sits inline on the gateway, scoring every turn and tool call. Pi devices are just another attack surface to FrawdBot; it doesn't trust them implicitly.
Setup — Mac Mini Side
- Hermes weights: pull the chosen variant (Hermes 4 70B AWQ is the default on a 64GB Mini; Hermes 3 8B on smaller configurations). vLLM launches with --served-model-name hermes --api-key-header X-Claw-Key.
- Gateway service: OpenClaw ships the gateway as a first-party service. Enable it with openclaw services enable hermes-gateway. Bind to the Tailscale interface only; LAN binding is optional for airgapped use.
- Wiki retrieval tool: the Mini already serves the wiki as static HTML. Register wiki.lookup and wiki.search in the gateway's tools.yaml. Search uses a small local embedding index (bge-small + FAISS).
- Device scoping: for each Pi, run openclaw devices add pi-name --scope wiki,chat,claw-read. This mints an API key and records the Tailscale identity the key is bound to.
- FrawdBot: already in the OpenClaw default install. Confirm the gateway is behind it with frawdbot status hermes-gateway.
Setup — Raspberry Pi Side
- Tailscale: curl -fsSL https://tailscale.com/install.sh | sh && sudo tailscale up --ssh. Accept the device into the same tailnet as the Mini.
- Client: two options:
  - Lightweight: ZeroClaw client binary (3.4MB, Rust). curl -fsSL https://your-mini/zeroclaw/install.sh | sh. Config lives at ~/.zeroclaw/config.toml.
  - Scripted: a ~40-line Python script using httpx and websockets against the OpenAI-compatible endpoint. Useful for custom kiosks.
- Config: set base_url = "http://claw-mini.tailnet:7820/v1", model = "hermes", api_key_env = "CLAW_KEY".
- Secrets: put the device API key in /etc/claw/key with 0400 perms. The systemd unit reads it at boot.
- I/O: wire the display / mic / buttons via whatever driver stack the use case needs (DSI touchscreen, USB mic, GPIO buttons).
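As a rough illustration of the Config and Secrets steps above, the following Python sketch reads the device key and builds the chat request. It uses only the standard library (urllib stands in for httpx to keep the example dependency-free). The helper names load_device_key and build_chat_request are hypothetical, and the permission check is an added assumption rather than documented ZeroClaw behavior; the key path, header name, base URL, and model name come from this page.

```python
import json
import os
import stat
import urllib.request

KEY_PATH = "/etc/claw/key"                     # path from the Secrets step
BASE_URL = "http://claw-mini.tailnet:7820/v1"  # base_url from the Config step


def load_device_key(path: str = KEY_PATH) -> str:
    """Read the device API key, refusing files accessible to group or others.

    The mode check is an assumption added for illustration.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path} has mode {mode:o}; expected 400")
    with open(path, encoding="utf-8") as fh:
        return fh.read().strip()


def build_chat_request(key: str, system: str, user: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a streaming chat-completions POST carrying the X-Claw-Key header."""
    body = {
        "model": "hermes",
        "stream": True,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Claw-Key": key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends the request; nothing is sent until then, so the builder can be unit-tested off the tailnet.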
Talking to the Gateway
The simplest call from the Pi — a curl that works as soon as Tailscale is up:
curl -N http://claw-mini.tailnet:7820/v1/chat/completions \
  -H "X-Claw-Key: $CLAW_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes",
    "stream": true,
    "messages": [
      {"role": "system", "content": "You are the Pi kiosk in the lab. Keep answers under 60 words. Cite LLM Wiki entries when relevant."},
      {"role": "user", "content": "Which open-weights model should I pick for a coding agent on a 32GB Mac Mini?"}
    ]
  }'
Hermes will stream a short answer and — because the system prompt authorized it — call wiki.lookup("qwen") and return a citation like [wiki: /wiki/qwen.html#model-lineup]. The client renders citations inline.
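A client has to turn that stream back into text before it can render citations. The sketch below assumes the gateway emits the usual OpenAI-style server-sent events ("data: {json}" lines with the text in choices[].delta.content, ended by "data: [DONE]"), which seems likely given the OpenAI-compatible endpoint but is not confirmed here. The function names and the citation pattern, inferred from the single example above, are hypothetical.

```python
import json
import re
from typing import Iterable, Iterator, List

# Pattern inferred from the example "[wiki: /wiki/qwen.html#model-lineup]".
CITATION_RE = re.compile(r"\[wiki:\s*(/wiki/[^\]\s]+)\]")


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, and keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def extract_citations(answer: str) -> List[str]:
    """Return wiki paths cited in an answer, in order of appearance."""
    return CITATION_RE.findall(answer)
```

Joining the yielded fragments gives the full answer; a kiosk would more likely print each fragment as it arrives and run extract_citations once the stream ends.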
Use Cases
- Shop-floor terminal. 7" touchscreen Pi in the production area. Tech asks "what's the failure mode when ExoClaw returns 503?" — the Pi shows the answer, cites /wiki/exoclaw.html, and optionally pages on-call via the Claw tools.
- Meeting-room research assistant. Pi with mic on the table. Whisper on the Pi transcribes; gateway answers in real time with wiki citations and can push notes to Linear / Notion via OpenClaw tools.
- Always-on wiki display. Pi Zero 2 W driving an e-ink or small OLED display. Rotates through the latest updated wiki entries. No user input; pure display surface.
- Physical button to agent. Hardware button wired to GPIO triggers a preset prompt — "summarize the last 24h of FrawdBot events." Hermes runs the query through the OpenClaw tool layer and returns a spoken summary via the Pi's speaker.
- Lab / airgap. No Tailscale; Pi and Mini on the same LAN, mDNS discovery, same gateway. Useful for secure facilities where outbound network is disabled.
Scoping and Security
- Per-device API keys. Each Pi gets its own key with the minimum scope it needs: wiki for a read-only display, wiki,chat for a Q&A terminal, wiki,chat,claw-read,claw-write for an operator terminal. Keys are revocable individually.
- Tailscale ACLs. The gateway port (7820) is allow-listed to only the pi tag on the tailnet. Other nodes can't reach it.
- FrawdBot. Every turn is scored. A Pi kiosk suddenly asking for a list of customer records triggers an alert regardless of what scope it holds.
- Gateway system prompt. Centralized. Changing the persona, tool allow-list, or citation policy is a Mini-side change, with no need to reflash Pis.
- Rate limits. Per-device limits on tokens-per-minute and tool-calls-per-minute. Prevents a misbehaving Pi (compromised or stuck in a loop) from eating Mini capacity.
Tradeoffs
- Latency. The Pi is a thin client. Every prompt round-trips to the Mini. On a local Tailscale mesh this is <20ms overhead; on a bad network it's visible.
- Mini availability. The Pi has no inference of its own. If the Mini is down or unreachable, the Pi is mute (or serves a cached stale wiki snapshot, depending on how you configure it). For zero-downtime use cases, pair two Minis or fall back to a hosted Hermes endpoint.
- Gateway model limits. Hermes is an excellent gateway model but not a frontier generalist. Route hard reasoning (complex code, deep analysis) to the OpenClaw Claude tool path, not directly to the gateway LLM.
- Operational overhead of fleets. Ten Pis is easy. A thousand Pis is a device-management problem — use PicoClaw's OTA hooks and fleet primitives rather than ad-hoc scripts once you're beyond a handful.
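The cached-snapshot fallback mentioned under Mini availability could look roughly like the sketch below. The function name, the one-file-per-slug cache layout, and the fetch callable (whatever client code performs the real wiki lookup) are all assumptions. It catches OSError, which covers ConnectionError, TimeoutError, and urllib's URLError; an httpx-based client would need to catch httpx.TransportError instead.

```python
from pathlib import Path
from typing import Callable, Tuple


def lookup_with_fallback(slug: str,
                         fetch: Callable[[str], str],
                         cache_dir: str) -> Tuple[str, bool]:
    """Return (text, is_stale) for a wiki entry.

    Fresh results refresh the local snapshot. When the Mini is
    unreachable, the last snapshot is served and flagged as stale.
    Assumes slug is a simple name such as "qwen" with no path separators.
    """
    cache_file = Path(cache_dir) / f"{slug}.txt"
    try:
        text = fetch(slug)
    except OSError:
        if cache_file.exists():
            return cache_file.read_text(encoding="utf-8"), True
        raise  # nothing cached: surface the outage to the caller
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    return text, False
```

Returning the stale flag lets the display mark cached content as such rather than presenting an old snapshot as live.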
Variants
- Pi Zero 2 W + e-ink display. 10-second updates, read-only wiki ticker. Under $30 total.
- Pi 5 + 7" touchscreen + USB mic. Full interactive kiosk. The default recipe.
- Pi 5 + Hailo AI HAT. On-device speech-to-text (Whisper small). Gateway handles the LLM only. Good for high-privacy voice use.
- Pi + CM4 carrier in industrial enclosure. DIN-rail mounting for factory contexts. Same gateway; harder hardware.
- Pi as PicoClaw upstream-bridge. A Pi 5 acts as the inference upstream for a cluster of even smaller PicoClaw devices, talking to the Mini's Hermes gateway on their behalf.
Related
- Hermes — the model family powering the gateway.
- OpenClaw — the runtime hosting the gateway on the Mini.
- PicoClaw — smaller-than-Pi embedded client that uses the same gateway pattern.
- ZeroClaw — the 3.4MB client binary typically deployed to the Pi.