Kimi
Quick Facts
- Vendor
- Moonshot AI (Beijing)
- Released
- Kimi Chat (2023); Kimi K1.5 (January 2025); Kimi K2 (July 2025)
- Current line
- Kimi K2 (open) · Kimi K1.5 (reasoning) · Kimi Chat (hosted product)
- License
- Modified MIT for K2 weights; hosted API for Kimi Chat
- Hosting
- Moonshot API, Kimi Chat; self-hosted via vLLM; hosted via Together, OpenRouter
- Context window
- 128K tokens for K2 (at release) and K1.5; up to ~2M Chinese characters in the hosted Kimi Chat product
- Architecture
- Mixture-of-experts (K2: ~1T params, ~32B active)
- Modalities
- Text; vision on select variants
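The MoE line above separates two numbers that drive cost in different ways: memory scales with total parameters, while per-token compute scales roughly with active parameters. The sketch below runs that arithmetic on the approximate K2 figures from Quick Facts. The "2 FLOPs per active parameter" rule is a common back-of-envelope assumption that ignores attention cost over long contexts; treat the outputs as order-of-magnitude estimates, not measurements.

```python
# Back-of-envelope MoE arithmetic using the approximate K2 figures (~1T / ~32B).
# These are rough estimates, not measured or vendor-published throughput numbers.

TOTAL_PARAMS = 1.0e12   # ~1T total parameters (approximate)
ACTIVE_PARAMS = 32e9    # ~32B parameters active per token (approximate)


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that participate in a single token's forward pass."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Rule-of-thumb assumption: ~2 FLOPs (one multiply, one add) per active
    parameter per token, ignoring attention-over-context cost."""
    return 2.0 * active


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction: {frac:.1%}")  # prints "active fraction: 3.2%"
    flops = forward_flops_per_token(ACTIVE_PARAMS)
    print(f"forward FLOPs/token: {flops:.1e}")  # prints "forward FLOPs/token: 6.4e+10"
```

The point of the split: a token only touches about 3% of the weights, so compute per token resembles a ~32B dense model, but every expert still has to sit in memory.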
Summary
Moonshot AI was founded in 2023 by Yang Zhilin and became one of China's "AI Tigers" on the strength of Kimi Chat, an early consumer LLM product built around very long context: it launched in late 2023 with a 200K-character window and was later extended to roughly 2 million Chinese characters. Through 2024, Kimi was a common reference point for "how does a model behave when you paste a novel into the context window." The open-weights release of Kimi K2 in 2025 extended that reputation into the self-hosted world: a trillion-parameter-class MoE with sparse activation, competitive on general benchmarks and strong on long-context retrieval.
Kimi K1.5 is the reasoning-line counterpart: a dedicated thinking model that trades latency for correctness on math, code, and logic. Together, K2 and K1.5 give deployments a two-model split that mirrors the GPT / o-series and DeepSeek-V3 / R1 pattern.
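In practice, the two-model split reduces to a routing decision per request. The sketch below is a minimal router under stated assumptions: the model IDs `kimi-k2` and `kimi-k1.5` are hypothetical placeholders (substitute whatever IDs your provider exposes), and the tag set and `latency_sensitive` flag are illustrative inputs, not part of any Moonshot API.

```python
from dataclasses import dataclass

# Hypothetical model IDs; replace with the identifiers your provider actually uses.
GENERAL_MODEL = "kimi-k2"      # assumption: K2 serves general traffic
REASONING_MODEL = "kimi-k1.5"  # assumption: K1.5 serves reasoning-heavy traffic

# Task categories where trading latency for correctness tends to pay off.
REASONING_TAGS = {"math", "code", "logic"}


@dataclass
class Task:
    prompt: str
    tags: set[str]
    latency_sensitive: bool = False


def pick_model(task: Task) -> str:
    """Route math/code/logic work to the reasoning model unless the caller
    needs a fast answer; everything else goes to the general model."""
    if task.tags & REASONING_TAGS and not task.latency_sensitive:
        return REASONING_MODEL
    return GENERAL_MODEL


if __name__ == "__main__":
    print(pick_model(Task("Prove that sqrt(2) is irrational.", {"math"})))
    print(pick_model(Task("Summarize this contract.", {"summary"})))
    print(pick_model(Task("Fix this stack trace.", {"code"}, latency_sensitive=True)))
```

A real router would likely classify prompts automatically rather than rely on caller-supplied tags; the tag-based version just makes the split explicit.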
Model Lineup
- Kimi K2: flagship open-weights MoE. Trillion-parameter class with sparse activation. Strong general performance; the default for teams that want open-weights, frontier-adjacent quality.
- Kimi K1.5: reasoning model, trained with reinforcement learning for long chain-of-thought. Moonshot reports o1-class results on math and code benchmarks, placing it in the same bracket as DeepSeek-R1.
- Kimi Chat (hosted): the consumer product, with a context window of up to ~2M Chinese characters. Hosted only; data stays with Moonshot.
- Kimi-VL: vision variant for multimodal workloads.
Where Kimi Fits
Kimi is the pragmatic pick when long context is the dominant requirement and you want open weights: whole-codebase analysis, legal document review, multi-document research. K2 is competitive with Qwen3 and DeepSeek-V3 on general tasks and can be stronger on long-context retrieval quality, though results vary by workload. K1.5 is a credible alternative to DeepSeek-R1 when reasoning is the bottleneck.
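For whole-codebase analysis, the first practical question is what fits in one prompt. The sketch below greedily packs files into a token budget. It assumes a crude ~4 characters-per-token heuristic (`CHARS_PER_TOKEN` is a hypothetical constant, not Kimi's tokenizer; Chinese text and code tokenize differently), and `pack_files` is an illustrative helper, not part of any Kimi tooling.

```python
# Minimal sketch: pack source files into one long-context prompt under a token budget.
# Assumption: ~4 characters per token. Use the model's real tokenizer in practice.

CHARS_PER_TOKEN = 4  # hypothetical heuristic, not a Kimi tokenizer figure


def estimate_tokens(text: str) -> int:
    # Ceiling division so any non-empty string counts as at least one token.
    return -(-len(text) // CHARS_PER_TOKEN)


def pack_files(files: dict[str, str], budget_tokens: int, reserve_tokens: int = 4096):
    """Add files smallest-first until the budget, minus a reserve for the
    question and the model's answer, is used up.
    Returns (included paths, skipped paths, estimated tokens used)."""
    limit = budget_tokens - reserve_tokens
    used = 0
    included: list[str] = []
    skipped: list[str] = []
    # Sort by (size, path) so ties resolve deterministically.
    for path, text in sorted(files.items(), key=lambda kv: (len(kv[1]), kv[0])):
        cost = estimate_tokens(f"### {path}\n{text}\n")  # header + body
        if used + cost <= limit:
            included.append(path)
            used += cost
        else:
            skipped.append(path)
    return included, skipped, used


if __name__ == "__main__":
    repo = {"a.py": "x" * 40, "b.py": "y" * 400, "c.py": "z" * 4000}
    print(pack_files(repo, budget_tokens=200, reserve_tokens=0))
```

Smallest-first maximizes the number of files that fit; a tool aimed at a specific question would more likely rank files by relevance and spend the budget on the top matches.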
Tradeoffs
- Provenance. Chinese-origin model. Some regulated and government customers restrict Chinese-origin models; check procurement policy before adopting.
- Operational cost of K2. A trillion-parameter MoE needs serious GPU capacity at full scale: every expert must be resident in memory even though only ~32B parameters are active per token. Plan for hosted endpoints or large self-hosted clusters; edge deployments need a much smaller model instead.
- English documentation lags the Chinese-language material, so integration effort is higher than for Llama or Qwen at the same tier.
- Tool-use ecosystem maturity trails Claude, GPT, and Qwen (with Qwen-Agent). Pair Kimi with a dedicated agent framework rather than calling it bare.
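The GPU-capacity point in the tradeoffs above comes down to simple arithmetic on total parameters. The sketch below estimates weight memory at a few precisions and the minimum number of 80 GB GPUs to hold the weights alone. The 80% usable-memory factor is an assumption to leave headroom, and the estimate ignores KV cache, activations, and parallelism overhead, so real clusters need more.

```python
import math

# Rough weight-memory arithmetic for a ~1T-parameter model.
# Weights only: KV cache, activations, and framework overhead come on top.

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(params: float, dtype: str) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes) at the given precision."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def min_gpus(params: float, dtype: str, gpu_mem_gb: float, usable: float = 0.8) -> int:
    """Smallest GPU count whose usable memory holds the weights alone.
    `usable` is an assumed fraction reserving headroom for runtime buffers."""
    return math.ceil(weight_gb(params, dtype) / (gpu_mem_gb * usable))


if __name__ == "__main__":
    for dtype in ("bf16", "fp8", "int4"):
        gb = weight_gb(1e12, dtype)
        n = min_gpus(1e12, dtype, gpu_mem_gb=80)
        print(f"{dtype}: ~{gb:,.0f} GB of weights, at least {n} x 80 GB GPUs")
```

Under these assumptions that works out to roughly 32, 16, and 8 GPUs for bf16, fp8, and int4 respectively: multi-node territory at full precision, which is why hosted endpoints are often the practical default.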
Deployment Notes
Within the Claw ecosystem, Kimi K2 is a candidate hosted endpoint for long-context RAG workloads: a context window of 128K tokens or more simplifies architectures that would otherwise require heavy retrieval tuning. K1.5 slots into the provider arbitrage layer as an alternative reasoning endpoint alongside DeepSeek-R1 and Claude's extended thinking.
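The tradeoff between a long context window and retrieval tuning can be reduced to a budget check per request. The sketch below is a hypothetical helper, not part of Claw or any Moonshot API: the 75% `max_fill` ceiling and the 8,192-token reserve are assumed defaults, chosen to keep prompts away from the edge of the window, where long-context recall may degrade.

```python
from enum import Enum


class Strategy(Enum):
    STUFF = "put the whole corpus in the prompt"
    RETRIEVE = "retrieve top-k chunks first"


def choose_strategy(corpus_tokens: int, context_window: int,
                    reserve_tokens: int = 8192, max_fill: float = 0.75) -> Strategy:
    """Use the long window directly when the corpus fits comfortably;
    otherwise fall back to retrieval.
    Assumptions: fill at most `max_fill` of the window, and hold back
    `reserve_tokens` for instructions and the model's answer."""
    usable = int(context_window * max_fill) - reserve_tokens
    return Strategy.STUFF if corpus_tokens <= usable else Strategy.RETRIEVE


if __name__ == "__main__":
    print(choose_strategy(50_000, 128_000))   # fits: stuff the corpus
    print(choose_strategy(200_000, 128_000))  # too large: retrieve first
```

The thresholds are knobs to tune against your own recall measurements rather than fixed rules; the point is that a larger window moves more workloads into the simpler "stuff" branch.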