Kimi

Moonshot AI's model family: Kimi K2 (open-weights MoE) and Kimi K1.5 (reasoning). An early mover on very long context at consumer scale.
Kimi is Moonshot AI's model family, known in China for pioneering long-context chat (Kimi Chat advertised a context of roughly 2 million Chinese characters in 2024) and, in 2025, for open-weights releases that put a trillion-parameter-class mixture-of-experts within reach of self-hosted deployments. The K2 release made Kimi a credible frontier-adjacent open alternative to Qwen and DeepSeek.
Moonshot AI · Open Weights · Long Context · MoE · Reasoning

Quick Facts

Vendor: Moonshot AI (Beijing)
Released: Kimi Chat (2023); Kimi K1.5 (January 2025); Kimi K2 (July 2025)
Current line: Kimi K2 (open) · Kimi K1.5 (reasoning) · Kimi Chat (hosted product)
License: Modified MIT for K2 weights; hosted API for Kimi Chat
Hosting: Moonshot API, Kimi Chat; self-hosted via vLLM; hosted via Together, OpenRouter
Context window: 128K tokens for K2 at release; up to roughly 2M Chinese characters in the hosted Kimi Chat product
Architecture: Mixture-of-experts (K2: ~1T params, ~32B active)
Modalities: Text; vision on select variants

Summary

Moonshot AI was founded in 2023 by Yang Zhilin and became one of China's "AI Tigers" on the strength of Kimi Chat, one of the first consumer LLM products to offer a very long context window (about 2 million Chinese characters as of 2024). For a long period Kimi was the reference point for how a model behaves when you paste a novel into the context window. The open-weights release of Kimi K2 in 2025 extended that reputation into the self-hosted world: a trillion-parameter-class MoE with sparse activation (roughly 32B of about 1T parameters active per token), competitive on general benchmarks and positioned as strong on long-context retrieval.
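To make the sparse-activation figures concrete, here is a small back-of-the-envelope sketch using the approximate Quick Facts numbers (~1T total parameters, ~32B active). The class name and the precision list are illustrative assumptions, and the estimate covers weights only; it ignores KV cache, activations, and serving overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEFootprint:
    total_params: float   # every expert plus shared layers
    active_params: float  # parameters actually used per token

    @property
    def active_fraction(self) -> float:
        return self.active_params / self.total_params

    def weight_memory_gb(self, bytes_per_param: float) -> float:
        """Memory needed to hold all weights, in decimal GB (1e9 bytes)."""
        return self.total_params * bytes_per_param / 1e9


# Approximate figures from the Quick Facts: ~1T total, ~32B active.
k2 = MoEFootprint(total_params=1e12, active_params=32e9)

print(f"active fraction: {k2.active_fraction:.1%}")
for label, nbytes in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"{label}: ~{k2.weight_memory_gb(nbytes):,.0f} GB of weights")
```

The takeaway is the asymmetry: per-token compute scales with the ~3% of parameters that are active, but unless experts are offloaded, all of the weights still have to be resident somewhere, which is what drives the hardware bill for self-hosting.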

Kimi K1.5 is the reasoning-line counterpart: a dedicated thinking model that trades latency for correctness on math, code, and logic. Together, K2 and K1.5 give a deployment a two-model split (a general model plus a reasoning model) that mirrors the GPT / o-series and DeepSeek V3 / R1 pattern.
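A two-model split like this usually needs some routing rule in front of it. The following is a minimal sketch of one, assuming a keyword heuristic and a caller-supplied latency budget; the model identifiers, the hint list, and the 30-second threshold are all hypothetical placeholders rather than anything Moonshot documents.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; substitute whatever names your serving
# stack or provider actually exposes.
GENERAL_MODEL = "kimi-k2"
REASONING_MODEL = "kimi-k1.5"

# Illustrative keyword heuristic, not a tuned classifier.
REASONING_HINTS = ("prove", "derive", "step by step", "debug", "solve")


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


def choose_model(prompt: str, latency_budget_s: float) -> Route:
    """Use the reasoning model only when the task looks reasoning-heavy
    and the caller can tolerate the extra latency."""
    text = prompt.lower()
    looks_hard = any(hint in text for hint in REASONING_HINTS)
    if looks_hard and latency_budget_s >= 30.0:
        return Route(REASONING_MODEL, "reasoning-heavy, latency budget allows")
    if looks_hard:
        return Route(GENERAL_MODEL, "reasoning-heavy, but latency budget too tight")
    return Route(GENERAL_MODEL, "general task")


print(choose_model("Prove that sqrt(2) is irrational", latency_budget_s=60.0))
print(choose_model("Summarize this memo", latency_budget_s=60.0))
```

In practice the heuristic would likely be replaced by a small classifier or by explicit caller intent; the point of the sketch is only that the latency-for-correctness trade is made per request, not per deployment.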

Model Lineup

Where Kimi Fits

Kimi is the pragmatic pick when long context is the dominant requirement and you want open weights: whole-codebase analysis, legal document review, multi-document research. K2 is competitive with Qwen3 and DeepSeek-V3 on general tasks and is often favored for long-context retrieval quality. K1.5 is a credible alternative to DeepSeek-R1 when reasoning is the bottleneck.
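Whether a long-context model is the right tool often comes down to a simple question: does the material fit in the window at all? The sketch below estimates that, assuming a crude 4-characters-per-token ratio and the 128K figure from the Quick Facts; both helper functions are hypothetical, and a real check should use the model's own tokenizer.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude size estimate; real counts depend on the model's tokenizer."""
    return int(len(text) / chars_per_token) + 1


def fits_in_context(documents: list[str], context_window: int,
                    reserve_for_output: int = 4_000) -> tuple[bool, int]:
    """Return (fits, estimated_prompt_tokens) for the combined documents."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserve_for_output <= context_window, total


docs = ["x" * 200_000, "y" * 120_000]  # stand-ins for two long files
fits, tokens = fits_in_context(docs, context_window=128_000)
print(fits, tokens)
```

If the check fails, the workload falls back to chunking and retrieval, which is the case where the choice of long-context model matters less than the quality of the retrieval layer.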

Tradeoffs

Deployment Notes

Within the Claw ecosystem, Kimi K2 is a candidate hosted endpoint for long-context RAG workloads: a large context window (128K tokens for K2 at release) lets more source material go directly into the prompt, which can simplify architectures that would otherwise require heavy retrieval tuning. K1.5 slots into the provider arbitrage layer as an alternative reasoning endpoint alongside DeepSeek-R1 and Claude's extended thinking.
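One way to picture the provider arbitrage layer is as a selection over interchangeable reasoning endpoints. This is a minimal sketch under stated assumptions: the `Endpoint` type and `pick_endpoint` function are hypothetical, and every price and latency below is a made-up placeholder, not a quote or a measurement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    name: str
    usd_per_mtok: float   # placeholder price, NOT a real quote
    p50_latency_s: float  # placeholder latency, NOT a measurement
    healthy: bool = True


def pick_endpoint(endpoints: list[Endpoint],
                  max_latency_s: float) -> Optional[Endpoint]:
    """Cheapest healthy endpoint within the latency ceiling, or None."""
    eligible = [e for e in endpoints
                if e.healthy and e.p50_latency_s <= max_latency_s]
    return min(eligible, key=lambda e: e.usd_per_mtok, default=None)


# All numbers below are invented for illustration.
REASONING_POOL = [
    Endpoint("kimi-k1.5", usd_per_mtok=2.0, p50_latency_s=20.0),
    Endpoint("deepseek-r1", usd_per_mtok=1.5, p50_latency_s=25.0, healthy=False),
    Endpoint("claude-extended-thinking", usd_per_mtok=6.0, p50_latency_s=15.0),
]

choice = pick_endpoint(REASONING_POOL, max_latency_s=30.0)
print(choice.name if choice else "no endpoint available")
```

A production layer would also weigh answer quality and rate limits, but the shape is the same: endpoints are treated as substitutable, and the selection rule decides which one serves a given request.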
