Pi × LLM

Why This Recipe

OpenClaw on a Mac Mini is powerful but not portable. Walking over to the Mini, opening Slack, and typing a prompt is a bad user story for most physical contexts — a shop floor with greasy hands, a meeting room without a laptop, a kiosk in a hallway, an always-on display next to a rack. The Pi closes that gap: $35–$80 of hardware, a dedicated screen or speaker, and a direct line to the full OpenClaw agent running on the Mini.

The Hermes model choice is deliberate. Hermes's steerable alignment — system prompt as source of truth — is what makes it usable as a gateway model. Different Pi terminals can speak to the same Mini with different personas, tool allow-lists, and response styles without needing separate model deployments. One gateway, many physical surfaces.
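A minimal sketch of how a gateway could choose a system prompt per device. It assumes the Mini keys profiles by device name. The `DeviceProfile` fields, the device names, and the `claw.page_oncall` tool are hypothetical. Every device still targets the same served model, which is the point of the design. Listing tools in the prompt only steers the model; actual enforcement has to happen at the gateway.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical per-device settings kept on the Mini, not on the Pi."""
    persona: str
    tools: frozenset[str] = field(default_factory=frozenset)
    max_words: int = 60


# Hypothetical profiles: one model deployment, many personas.
PROFILES = {
    "pi-lab-kiosk": DeviceProfile(
        persona="You are the Pi kiosk in the lab.",
        tools=frozenset({"wiki.lookup", "wiki.search"}),
        max_words=60,
    ),
    "pi-shopfloor": DeviceProfile(
        persona="You are the shop-floor terminal. Use short, plain sentences.",
        tools=frozenset({"wiki.lookup", "wiki.search", "claw.page_oncall"}),
        max_words=40,
    ),
}


def build_messages(device_id: str, user_text: str) -> list[dict]:
    """Prepend the device's system prompt. Unknown devices raise KeyError."""
    profile = PROFILES[device_id]
    tool_list = ", ".join(sorted(profile.tools)) or "none"
    system = (
        f"{profile.persona} Keep answers under {profile.max_words} words. "
        f"Allowed tools: {tool_list}."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_text},
    ]
```

Changing a persona then only requires editing `PROFILES` on the Mini. No Pi needs to be reflashed.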

Architecture

  ┌─────────────────────┐      Tailscale (wg)      ┌──────────────────────────────┐
  │  Raspberry Pi       │ ───────────────────────► │  Claw Mac Mini               │
  │  - client binary    │                          │  ┌────────────────────────┐  │
  │  - screen / mic /   │ ◄─── WebSocket stream ── │  │ Hermes Agent Gateway   │  │
  │    buttons / GPIO   │                          │  │  (HTTP + WebSocket)    │  │
  │  - local cache      │                          │  └──────────┬─────────────┘  │
  └─────────────────────┘                          │             │                │
                                                   │             ▼                │
                                                   │  ┌────────────────────────┐  │
                                                   │  │ OpenClaw agent loop    │  │
                                                   │  │  - Hermes 4 (vLLM)     │  │
                                                   │  │  - Wiki retrieval tool │  │
                                                   │  │  - Claw tools          │  │
                                                   │  │  - FrawdBot inline     │  │
                                                   │  └────────────────────────┘  │
                                                   └──────────────────────────────┘

Hermes gateway — a small HTTP + WebSocket service exposed on the Mini at http://claw-mini.tailnet:7820. Speaks the OpenAI chat-completions protocol (for client compatibility) plus a streaming /agent/stream WebSocket for tool-capable sessions.
Inference — Hermes 4 (or 3) served by vLLM on the Mac Mini's M-series GPU. Quantized variants (AWQ / GPTQ) keep the larger models within the Mini's unified memory. Hermes 4 70B needs quantization even on a 64GB configuration.
Wiki as a tool — the Mini exposes wiki.lookup(slug) and wiki.search(query) as agent tools. The gateway's system prompt teaches Hermes to cite LLM Wiki entries by path (e.g. /wiki/qwen.html) when answering infra questions.
Tools inherited from OpenClaw — if the Pi's scoped API key permits, the same agent session can invoke Slack, Gmail, Linear, GitHub, and the rest of the OpenClaw integration surface.
FrawdBot — sits inline on the gateway, scoring every turn and tool call. Pi devices are just another attack surface to FrawdBot; it doesn't trust them implicitly.
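The wiki tools can be sketched as a small dispatch table on the gateway. This sketch assumes tool calls arrive in the OpenAI chat-completions shape: a function name plus a JSON-encoded arguments string. The in-memory `WIKI` dict stands in for the Mini's static HTML, and the keyword match stands in for the bge-small + FAISS index. Both are illustrative substitutes. Errors are returned to the model as JSON so a bad call does not crash the turn.

```python
import json

# Hypothetical stand-in for the Mini's static wiki: slug -> (path, text).
WIKI = {
    "qwen": ("/wiki/qwen.html", "Qwen model lineup and hardware sizing notes."),
    "exoclaw": ("/wiki/exoclaw.html", "ExoClaw failure modes, including 503 handling."),
}


def wiki_lookup(slug: str) -> dict:
    """Return one entry by slug; raises KeyError for unknown slugs."""
    path, text = WIKI[slug]
    return {"slug": slug, "path": path, "text": text}


def wiki_search(query: str) -> list[dict]:
    """Naive keyword match; the recipe's real search uses embeddings."""
    words = query.lower().split()
    return [
        {"slug": slug, "path": path}
        for slug, (path, text) in WIKI.items()
        if any(w in text.lower() for w in words)
    ]


TOOLS = {"wiki.lookup": wiki_lookup, "wiki.search": wiki_search}


def dispatch(name: str, arguments: str) -> str:
    """Run one model-emitted tool call and return a JSON string result."""
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool {name}"})
    try:
        result = TOOLS[name](**json.loads(arguments))
    except (KeyError, TypeError, json.JSONDecodeError) as exc:
        return json.dumps({"error": str(exc)})
    return json.dumps(result)
```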

Setup — Mac Mini Side

Hermes weights — pull the chosen variant (Hermes 4 70B AWQ is the default on a 64GB Mini; Hermes 3 8B on smaller configurations). vLLM launches with --served-model-name hermes --api-key-header X-Claw-Key.
Gateway service — OpenClaw ships the gateway as a first-party service. Enable it with openclaw services enable hermes-gateway. Bind it to the Tailscale interface. Bind to the LAN interface instead only for airgapped deployments.
Wiki retrieval tool — the Mini already serves the wiki as static HTML. Register wiki.lookup and wiki.search in the gateway's tools.yaml. Search uses a small local embedding index (bge-small + FAISS).
Device scoping — for each Pi, run openclaw devices add pi-name --scope wiki,chat,claw-read. This mints an API key and records the Tailscale identity the key is bound to.
FrawdBot — already in the OpenClaw default install. Confirm the gateway is behind it with frawdbot status hermes-gateway.
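Device scopes can be modeled as a set check at the gateway. The scope names below come from this recipe. The mapping from tools to scopes, and tool names such as `linear.list_issues`, are hypothetical. This is not a description of how `openclaw devices` works internally. Unknown capabilities are denied by default.

```python
# Hypothetical mapping: which scope each gateway capability requires.
REQUIRED_SCOPE = {
    "wiki.lookup": "wiki",
    "wiki.search": "wiki",
    "chat": "chat",
    "linear.list_issues": "claw-read",
    "slack.post_message": "claw-write",
}


def parse_scopes(spec: str) -> frozenset[str]:
    """Turn a --scope value like 'wiki,chat,claw-read' into a set."""
    return frozenset(s.strip() for s in spec.split(",") if s.strip())


def is_allowed(device_scopes: frozenset[str], capability: str) -> bool:
    """Deny anything unmapped; otherwise require the mapped scope."""
    needed = REQUIRED_SCOPE.get(capability)
    return needed is not None and needed in device_scopes
```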

Setup — Raspberry Pi Side

Tailscale — curl -fsSL https://tailscale.com/install.sh | sh && sudo tailscale up --ssh, then authenticate the Pi into the same tailnet as the Mini.
Client — two options:
- Lightweight: ZeroClaw client binary (3.4MB, Rust). curl -fsSL https://your-mini/zeroclaw/install.sh | sh. Config lives at ~/.zeroclaw/config.toml.
- Scripted: a ~40-line Python script using httpx and websockets against the OpenAI-compatible endpoint. Useful for custom kiosks.
Config — set base_url = "http://claw-mini.tailnet:7820/v1", model = "hermes", api_key_env = "CLAW_KEY".
Secrets — put the device API key in /etc/claw/key with 0400 permissions. A systemd unit reads it at boot.
I/O — wire the display / mic / buttons via whatever driver stack the use case needs (DSI touchscreen, USB mic, GPIO buttons).

Talking to the Gateway

The simplest call from the Pi — a curl that works as soon as Tailscale is up:

  curl -N http://claw-mini.tailnet:7820/v1/chat/completions \
    -H "X-Claw-Key: $CLAW_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "hermes",
      "stream": true,
      "messages": [
        {"role": "system", "content": "You are the Pi kiosk in the lab. Keep answers under 60 words. Cite LLM Wiki entries when relevant."},
        {"role": "user", "content": "Which open-weights model should I pick for a coding agent on a 32GB Mac Mini?"}
      ]
    }'
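For the scripted client, here is a minimal Python sketch of the same request. It assumes the gateway streams standard OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`) and accepts the `X-Claw-Key` header, as in the curl call. It uses httpx's `stream` API. The helper names are illustrative. Tool-capable sessions over /agent/stream would need a WebSocket client instead.

```python
import json
from collections.abc import Iterable, Iterator


def build_request(base_url: str, key: str, system: str, user: str,
                  model: str = "hermes") -> tuple[str, dict, dict]:
    """Return (url, headers, body) for a streaming chat-completions call."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {"X-Claw-Key": key, "Content-Type": "application/json"}
    body = {
        "model": model,
        "stream": True,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }
    return url, headers, body


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent events."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def ask(base_url: str, key: str, system: str, user: str) -> str:
    """Send one prompt and return the full streamed answer."""
    import httpx  # lazy import: the parser above works without httpx installed

    url, headers, body = build_request(base_url, key, system, user)
    with httpx.stream("POST", url, headers=headers, json=body, timeout=60.0) as r:
        r.raise_for_status()
        return "".join(iter_deltas(r.iter_lines()))
```

A kiosk would normally print each fragment from `iter_deltas` as it arrives, rather than joining them. That keeps the screen responsive on slower links.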

Because the system prompt asks for citations, Hermes calls wiki.lookup("qwen") and streams a short answer with a citation like [wiki: /wiki/qwen.html#model-lineup]. The client renders citations inline.
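One way a client could render those citations, assuming the `[wiki: <path>]` marker format shown above: replace each marker with a numbered reference, and collect the paths for a footer or a tap-to-open list. The function name and the numbering style are illustrative.

```python
import re

# Matches markers such as [wiki: /wiki/qwen.html#model-lineup].
CITATION = re.compile(r"\[wiki:\s*(/wiki/[^\]\s]+)\]")


def render_citations(text: str) -> tuple[str, list[str]]:
    """Replace [wiki: ...] markers with [n] and return the ordered path list."""
    refs: list[str] = []

    def number(match: re.Match) -> str:
        path = match.group(1)
        if path not in refs:
            refs.append(path)  # repeated citations reuse the same number
        return f"[{refs.index(path) + 1}]"

    return CITATION.sub(number, text), refs
```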

Use Cases

Shop-floor terminal. 7" touchscreen Pi in the production area. Tech asks "what's the failure mode when ExoClaw returns 503?" — the Pi shows the answer, cites /wiki/exoclaw.html, and optionally pages on-call via the Claw tools.
Meeting-room research assistant. Pi with mic on the table. Whisper on the Pi transcribes; gateway answers in real time with wiki citations and can push notes to Linear / Notion via OpenClaw tools.
Always-on wiki display. Pi Zero 2 W driving an e-ink or small OLED display. Rotates through the latest updated wiki entries. No user input; pure display surface.
Physical button to agent. A hardware button wired to GPIO triggers a preset prompt: "summarize the last 24h of FrawdBot events." Hermes runs the query through the OpenClaw tool layer, and the Pi reads the summary aloud through its speaker.
Lab / airgap. No Tailscale; Pi and Mini on the same LAN, mDNS discovery, same gateway. Useful for secure facilities where outbound network is disabled.
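For the always-on wiki display, the rotation logic is small. This sketch assumes the Pi already has a list of entries with last-updated timestamps; how it fetches that list from the Mini is left open. `WikiEntry`, `ticker`, and `caption` are hypothetical names.

```python
from collections.abc import Iterator
from dataclasses import dataclass
from datetime import datetime
from itertools import cycle


@dataclass(frozen=True)
class WikiEntry:
    slug: str
    title: str
    updated: datetime


def ticker(entries: list[WikiEntry], latest_n: int = 5) -> Iterator[WikiEntry]:
    """Cycle forever through the N most recently updated entries, newest first."""
    newest = sorted(entries, key=lambda e: e.updated, reverse=True)[:latest_n]
    return cycle(newest)


def caption(entry: WikiEntry) -> str:
    """One short line suitable for a small e-ink or OLED panel."""
    return f"{entry.title} (updated {entry.updated:%Y-%m-%d})"
```

A display loop would call `next()` on the iterator at each refresh and draw `caption(...)`. To pick up newly edited pages, it would periodically refetch the entry list and rebuild the iterator.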

Scoping and Security

Per-device API keys. Each Pi gets its own key with the minimum scope it needs — wiki for a read-only display, wiki,chat for a Q&A terminal, wiki,chat,claw-read,claw-write for an operator terminal. Keys are revocable individually.
Tailscale ACLs. The gateway port (7820) is allow-listed only for devices carrying the pi tag on the tailnet. Other nodes can't reach it.
FrawdBot. Every turn is scored. A Pi kiosk suddenly asking for a list of customer records triggers an alert regardless of what scope it holds.
Gateway system prompt. Centralized. Changing the persona, tool allow-list, or citation policy is a Mini-side change — no need to reflash Pis.
Rate limits. Per-device limits on tokens-per-minute and tool-calls-per-minute. Prevents a misbehaving Pi (compromised or stuck in a loop) from eating Mini capacity.
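Per-device limits like these are commonly implemented as token buckets. This is a standard technique swapped in for illustration, not the gateway's documented implementation. The sketch keeps one bucket for tokens-per-minute and one for tool-calls-per-minute per device. The class names and limits are illustrative. The clock is injectable so the logic can be tested without sleeping.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Bucket:
    """Token bucket holding up to `per_minute` units, refilled continuously."""
    per_minute: float
    level: float
    stamp: float  # clock reading at the last refill

    def take(self, amount: float, now: float) -> bool:
        elapsed = max(0.0, now - self.stamp)
        self.level = min(self.per_minute, self.level + elapsed * self.per_minute / 60.0)
        self.stamp = now
        if amount > self.level:
            return False  # over budget; the caller should reject or queue
        self.level -= amount
        return True


class DeviceLimiter:
    """Separate token and tool-call budgets for each device key."""

    def __init__(self, tokens_per_min: float, tool_calls_per_min: float,
                 clock: Callable[[], float] = time.monotonic):
        self.tokens_per_min = tokens_per_min
        self.tool_calls_per_min = tool_calls_per_min
        self.clock = clock
        self.buckets: dict[str, tuple[Bucket, Bucket]] = {}

    def _buckets(self, device: str) -> tuple[Bucket, Bucket]:
        if device not in self.buckets:
            now = self.clock()
            self.buckets[device] = (
                Bucket(self.tokens_per_min, self.tokens_per_min, now),
                Bucket(self.tool_calls_per_min, self.tool_calls_per_min, now),
            )
        return self.buckets[device]

    def allow_tokens(self, device: str, n: int) -> bool:
        return self._buckets(device)[0].take(n, self.clock())

    def allow_tool_call(self, device: str) -> bool:
        return self._buckets(device)[1].take(1, self.clock())
```

The budget is charged when a request arrives, so a device stuck in a loop stalls at its own limit while other Pis keep their share. A single request larger than the per-minute budget is always refused. Size the token limit above the largest expected response.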

Tradeoffs

Latency. The Pi is a thin client, so every prompt round-trips to the Mini. Over a direct Tailscale connection on a local network this adds under 20ms. Over a congested or relayed link, the delay becomes noticeable.
Mini availability. The Pi has no inference of its own. If the Mini is down or unreachable, the Pi is mute (or serves a cached stale wiki snapshot, depending on how you configure it). For zero-downtime use cases, pair two Minis or fall back to a hosted Hermes endpoint.
Gateway model limits. Hermes is an excellent gateway model but not a frontier generalist. Route hard reasoning (complex code, deep analysis) to the OpenClaw Claude tool path, not directly to the gateway LLM.
Operational overhead of fleets. Ten Pis is easy. A thousand Pis is a device-management problem — use PicoClaw's OTA hooks and fleet primitives rather than ad-hoc scripts once you're beyond a handful.
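For the Mini-availability tradeoff, client-side failover can be as simple as trying endpoints in order. This sketch assumes a caller-supplied `send` function that raises `ConnectionError` or `TimeoutError` on failure. An httpx-based `send` would need to translate `httpx.TransportError` into those. The endpoint list (a second Mini, a hosted Hermes URL) and the stale-snapshot callback are hypothetical.

```python
from typing import Callable, Optional


def ask_with_fallback(
    prompt: str,
    endpoints: list[str],
    send: Callable[[str, str], str],
    stale_snapshot: Optional[Callable[[str], Optional[str]]] = None,
) -> tuple[str, str]:
    """Try each gateway in order; return (source, answer)."""
    errors: list[str] = []
    for url in endpoints:
        try:
            return url, send(url, prompt)
        except (ConnectionError, TimeoutError) as exc:
            errors.append(f"{url}: {exc}")  # remember why, then try the next one
    # Every gateway failed: optionally serve cached wiki content instead.
    if stale_snapshot is not None:
        hit = stale_snapshot(prompt)
        if hit is not None:
            return "cache", hit
    raise ConnectionError("all gateways unreachable; " + "; ".join(errors))
```

Returning the source alongside the answer lets the display flag stale or fallback responses, so users know when they are not talking to the primary Mini.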

Variants

Pi Zero 2 W + e-ink display. 10-second updates, read-only wiki ticker. Under $30 total.
Pi 5 + 7" touchscreen + USB mic. Full interactive kiosk. The default recipe.
Pi 5 + Hailo AI HAT. On-device speech-to-text (Whisper small). Gateway handles the LLM only. Good for high-privacy voice use.
Pi + CM4 carrier in industrial enclosure. DIN-rail mounting for factory contexts. Same gateway; ruggedized hardware.
Pi as PicoClaw upstream-bridge. A Pi 5 acts as the inference upstream for a cluster of even smaller PicoClaw devices, talking to the Mini's Hermes gateway on their behalf.
