DeepSeek
Quick Facts
- Vendor
- DeepSeek (Hangzhou)
- Released
- DeepSeek LLM (2023); DeepSeek-R1 (January 2025)
- Current line
- DeepSeek-V3 · DeepSeek-R1 · DeepSeek-Coder
- License
- MIT (weights); DeepSeek License for some variants
- Hosting
- DeepSeek API, Together, Fireworks, self-hosted via vLLM
- Context window
- 128K tokens
- Modalities
- Text; DeepSeek-VL for vision
- Architecture
- Mixture-of-experts with multi-head latent attention
Summary
DeepSeek is a Chinese research lab spun out of the High-Flyer quant hedge fund. It rose to global attention in January 2025 with DeepSeek-R1 — the first open-weights model to match OpenAI's o1 on reasoning benchmarks. The paper documented a pure reinforcement-learning approach to reasoning (no supervised fine-tuning on chain-of-thought traces) that reshaped the research conversation about how reasoning emerges.
Beyond the reasoning story, DeepSeek-V3 is a strong general model at roughly one-tenth the hosted price of US frontier labs. The combination of MIT-licensed weights and cheap API access has made DeepSeek the pragmatic choice for teams that want reasoning capability either in-house or at scale without frontier pricing.
Model Lineup
- DeepSeek-R1 — reasoning. Open-weights equivalent to the o-series. Use when correctness on math / code / logic matters more than latency.
- DeepSeek-V3 — general. MoE architecture, ~671B parameters with 37B active. Frontier-adjacent quality.
- DeepSeek-Coder V2 — code-specialized. Competitive with Qwen3-Coder on many benchmarks.
- Distilled variants — R1 distilled into smaller dense models (Llama, Qwen bases) for edge deployment.
Where DeepSeek Fits
DeepSeek is the default when reasoning quality matters and self-hosting or low-cost API access is a requirement. The distilled variants (R1-Distill-Qwen-32B, R1-Distill-Llama-70B) are particularly useful for edge deployments that need reasoning without the operational weight of running V3 / R1 at full scale. For teams that are API-bound but price-sensitive, DeepSeek's hosted tier is often a 5–10x cost reduction vs. frontier US labs.
Tradeoffs
- Provenance. Same caveat as Qwen — some regulated and government customers restrict Chinese-origin models. Check procurement policy.
- Hosted API data policies. Anything sensitive should be run on self-hosted weights, not through the DeepSeek-operated API.
- Tool use is improving but still below Claude and GPT. Plan for prompt engineering and retry logic on complex tool chains.
- Full-scale hosting of V3 / R1 requires serious GPU capacity. Most edge deployments use the distilled variants.
Deployment Notes
Within the Claw ecosystem, DeepSeek R1 distilled variants are deployed for reasoning-heavy interior loops — planning, hard refactoring, complex test failure diagnosis. Qwen3-Coder handles most interactive coding; DeepSeek steps in when the task needs deliberation. For hosted API access, DeepSeek is integrated into the provider arbitrage layer as a cost-optimized fallback.