← Back to LLM Wiki
LLM Wiki · Open Weights · Self-Hostable

Llama

Meta's open-weights model family — the reference point for self-hosted LLM deployments.
Llama defined what "open-weights frontier" means. The family spans small models that quantize down to a Mac Mini up to Maverick / Behemoth scale that rivals closed frontier labs. For any workload where you need the weights on your own hardware — regulated industries, air-gapped environments, or edge deployment — Llama is the baseline.
Meta Open Weights Self-Host Edge Llama License

Quick Facts

Vendor
Meta (Menlo Park)
Released
LLaMA (February 2023); Llama 4 (2025)
Current line
Llama 4 (Scout, Maverick, Behemoth) · Llama 3.3 · Llama Guard 4
License
Llama Community License (bespoke; permissive for most commercial use, with scale-based restrictions)
Hosting
Self-hosted (vLLM, llama.cpp, Ollama); hosted via Together, Groq, Fireworks, Bedrock, Vertex
Context window
128K–10M tokens (Scout), depending on variant
Modalities
Text; native image in Llama 4
Architecture
Dense and MoE variants

Summary

Llama is Meta's open-weights LLM family, first released in February 2023. The Llama 2 release in July 2023 made open-weights frontier usable for commercial applications, and Llama 3 closed much of the gap with closed-weights competitors. Llama 4 (2025) moved to mixture-of-experts architectures and adds native multimodality, with the Scout variant pushing context windows to 10M tokens.

For infrastructure teams, Llama's value is the combination of weights you can run anywhere and an ecosystem that is measured in years rather than months. Every inference runtime, quantization tool, and fine-tuning framework supports Llama first. Llama Guard — a small classifier model trained alongside the main family — is the de facto standard for open-source content moderation.

Model Lineup

Where Llama Fits

Llama is the default when you need weights on your own hardware. On-premise regulated deployments, air-gapped environments, customer-owned infrastructure (Mac Mini edge compute), and any workload where sending data to a third-party API is a non-starter. Quantized Llama 3.3 8B variants run comfortably on a Mac Mini and are the backbone of the ZeroClaw and PicoClaw runtimes in the Claw ecosystem.

Tradeoffs

Deployment Notes

Within the Claw ecosystem, Llama is the backbone of PicoClaw and ZeroClaw — the edge runtimes that live on customer Mac Minis. vLLM handles higher-end inference; llama.cpp and Ollama cover the laptop and Mac Mini tiers. Llama Guard sits in front of open-model endpoints as a lightweight moderation filter. For workloads that exceed the edge envelope, we route to hosted Llama endpoints (Together, Groq) via the provider arbitrage layer.

References

  1. Meta — Llama
  2. Meta Llama on GitHub
  3. Llama 4 announcement
  4. Edge Compute Economics — Organized AI