Kimi

Summary

Moonshot AI was founded in 2023 by Yang Zhilin and became one of China's "AI Tigers" on the strength of Kimi Chat, one of the first consumer LLM products to advertise a 2M-token context window. For some time Kimi was the usual reference point for how a model behaves when an entire novel is pasted into the context window. The open-weights release of Kimi K2 in 2025 extended that reputation into the self-hosted world: a trillion-parameter-class MoE with sparse activation (about 32B parameters active per token), competitive on general benchmarks and strong on long-context retrieval.

Kimi K1.5 is the reasoning-line counterpart — a dedicated thinking model that trades latency for correctness on math, code, and logic. Together, K2 and K1.5 give self-hosted deployments a two-model split that mirrors the GPT / o-series and DeepSeek V3 / R1 pattern.

Model Lineup

Kimi K2: flagship open-weights MoE. Trillion-parameter class with sparse activation. Strong general performance; the default for teams that want open-weights, frontier-adjacent quality.
Kimi K1.5: reasoning model. Trained for chain-of-thought; comparable to the o1 / R1 class on math and code benchmarks.
Kimi Chat (hosted): the consumer product. 2M-token context. Hosted only, so prompts and documents are processed on Moonshot's infrastructure.
Kimi-VL: vision variant for multimodal workloads.

Where Kimi Fits

Kimi is the pragmatic pick when long context is the dominant requirement and you want open weights: whole-codebase analysis, legal document review, multi-document research. K2 is competitive with Qwen3 and DeepSeek-V3 on general tasks and often wins on long-context retrieval quality. K1.5 is a credible alternative to DeepSeek-R1 when reasoning is the bottleneck.
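
The split between K2 and K1.5 can be expressed as a small routing rule. The following is a minimal Python sketch, not Moonshot's API: the model labels, the task categories, and the names `Request` and `pick_model` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical endpoint labels; substitute the identifiers your deployment uses.
GENERAL_MODEL = "kimi-k2"
REASONING_MODEL = "kimi-k1.5"

# Task categories this sketch treats as reasoning-bound (an assumption).
REASONING_TASKS = {"math", "code", "logic"}


@dataclass(frozen=True)
class Request:
    task: str                        # coarse task label supplied by the caller
    latency_sensitive: bool = False  # True when a fast answer matters most


def pick_model(req: Request) -> str:
    """Send reasoning-bound work to the thinking model unless latency matters more."""
    if req.task in REASONING_TASKS and not req.latency_sensitive:
        return REASONING_MODEL
    return GENERAL_MODEL
```

A latency-sensitive math request falls back to the general model here because the reasoning model trades latency for correctness. A production router would likely also weigh cost and prompt length.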

Tradeoffs

Provenance. Chinese-origin. Some regulated and government customers restrict this class of model; check procurement policy.
Operational cost of K2. A trillion-parameter MoE keeps every expert resident in memory even though only a fraction is active per token, so full-scale serving needs a multi-GPU node or cluster. Plan for hosted endpoints or large self-hosted clusters; edge deployment uses the distilled or smaller companions instead.
English documentation lags the Chinese ecosystem. Integration effort is higher than for Llama or Qwen at the same tier.
Tool-use maturity trails Claude, GPT, and Qwen-Agent. Pair with a dedicated agent framework rather than using Kimi bare.
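
To make the K2 operational cost concrete, a back-of-envelope estimate of weight memory can help. This sketch assumes 1e12 total parameters and counts weights only, with no KV cache, activations, or runtime overhead. The 80 GB GPU size and the 90% usable fraction are illustrative assumptions, not vendor figures.

```python
import math


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


def min_gpus_for_weights(
    total_params: float,
    bytes_per_param: float,
    gpu_memory_gb: float,
    usable_fraction: float = 0.9,  # assumed headroom left for the runtime
) -> int:
    """Lower bound on GPU count: weights must fit in the usable memory."""
    usable_gb = gpu_memory_gb * usable_fraction
    return math.ceil(weight_memory_gb(total_params, bytes_per_param) / usable_gb)


# Assumed figures: 1e12 parameters, 8-bit (1 byte) or 16-bit (2 bytes) weights.
gpus_8bit = min_gpus_for_weights(1e12, 1, gpu_memory_gb=80)
gpus_16bit = min_gpus_for_weights(1e12, 2, gpu_memory_gb=80)
```

Under these assumptions the weights alone need at least 14 GPUs at 8-bit precision and 28 at 16-bit. Sparse activation lowers compute per token but not this floor, since all experts have to be loaded.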

Deployment Notes

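Within the Claw ecosystem, Kimi K2 is a candidate hosted endpoint for long-context RAG workloads: a long context window simplifies architectures that would otherwise require heavy retrieval tuning. The 2M-token figure applies to the hosted Kimi Chat product, and the K2 open-weights release documents a much smaller window (128K tokens at launch), so size against the limit of the endpoint actually in use. K1.5 slots into the provider arbitrage layer as an alternative reasoning endpoint alongside DeepSeek-R1 and Claude's extended thinking.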
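
One way to decide between sending whole documents and falling back to retrieval is a simple token-budget check. This is a hedged sketch: the four-characters-per-token ratio is a rough heuristic rather than a real tokenizer, and `estimate_tokens`, `plan_context`, and the reserved output budget are hypothetical names and values. The context window is a parameter because the limit depends on the endpoint.

```python
from typing import Sequence


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real tokenizers vary by language and content."""
    return int(len(text) / chars_per_token) + 1


def plan_context(
    documents: Sequence[str],
    context_window: int,
    reserved_for_output: int = 4096,  # assumed room left for the model's reply
) -> str:
    """Return 'full_context' if all documents fit the prompt budget, else 'retrieval'."""
    budget = context_window - reserved_for_output
    total = sum(estimate_tokens(doc) for doc in documents)
    return "full_context" if total <= budget else "retrieval"
```

When the plan is "full_context", the documents can be concatenated into a single prompt and the retrieval stage skipped. Otherwise the usual chunk-and-retrieve pipeline still applies.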
