Claude

Summary

Claude is Anthropic's general-purpose LLM family, launched in 2023 as a direct competitor to OpenAI's GPT series. The defining design choice is Constitutional AI: an alignment technique that trains the model against a written set of principles rather than relying solely on human feedback. In practice this produces a model that refuses fewer legitimate requests, explains its reasoning more readily, and degrades more gracefully on ambiguous edge cases.

For agent infrastructure, three features matter most: (1) structured tool use with a well-defined JSON protocol that agent frameworks can target; (2) prompt caching, which drops the cost of long system prompts and repeated context by up to 90% — turning otherwise uneconomical agent loops into viable workloads; and (3) extended thinking, a reasoning mode that lets the model think before responding, closing the gap with dedicated reasoning models on math and code benchmarks.

Model Lineup

Opus — flagship tier. Highest capability, longest context, slowest and most expensive. Use for planning, deep code edits, and agent orchestration.
Sonnet — balanced tier. The workhorse for most production agents. Claude Code defaults to Sonnet for most interactive tasks.
Haiku — speed tier. Sub-second latency on short prompts. Use for classification, routing, tool-call dispatch, and high-volume summarization.

Where Claude Fits

Claude is the default choice when any of the following dominate your requirements: long-running coding agents, multi-step tool loops with strict reliability requirements, or workloads that benefit from prompt caching (long system prompts, RAG pipelines, document-grounded Q&A). The Claude Code CLI and the Claude Agent SDK are built on the same tool-use protocol, so infrastructure built around one transfers cleanly to the other.

Tradeoffs

Closed weights. No self-hosted option. Data residency is constrained to Anthropic, Bedrock, or Vertex regions. Regulated workloads with on-premise requirements need a different family (Llama, Qwen, Mistral).
Rate limits on Opus. Flagship tiers are capacity-constrained. Production deployments should plan for multi-provider fallback via Bedrock or Vertex.
Pricing at the top tier. Opus token pricing is premium. Prompt caching and tiered routing (Opus → Sonnet → Haiku) are essential for cost control.

Deployment Notes

Within the Claw ecosystem, Claude powers the human-facing control plane: Claude Code Slack for operator conversations, Claude Agent SDK for bespoke agent loops, and the observability layer that turns agent traces into KPIs. Open-weights models (Qwen, Llama) handle the high-volume interior loops where self-hosting on Mac Mini edge hardware beats API economics. See The Agent Infrastructure Stack for the full architecture.

References

[1] Anthropic — Claude

[2] Anthropic API Documentation

[3] Constitutional AI: Harmlessness from AI Feedback