Kimi K2-0905 Now Live in Cline: 256K Context, Better Tools

Moonshot's Kimi K2-0905 is live in Cline with 256k context, 95% tool reliability, and faster inference. Available via Groq, Fireworks, OpenRouter, and more.

TL;DR

  • Moonshot's Kimi K2-0905 is now available in Cline with 256k context window (up from 131k)
  • Improved tool calling (~95% first-try success), better frontend coding, faster inference via Groq (~349 TPS)
  • Available through Cline, Groq, Fireworks, OpenRouter, and Vercel AI Gateway at $1/$3 per 1M tokens

What Dropped

Moonshot released Kimi K2-0905, an updated checkpoint of their open-source coding model, now integrated across five major providers in Cline. The headline upgrades: context window doubled to 256k tokens, tool calling reliability improved to ~95% first-try success, and measurable gains in frontend development tasks.
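For orientation, selecting the new checkpoint through one of the OpenAI-compatible providers looks roughly like this. The model identifier `moonshotai/kimi-k2-0905` is an assumption based on OpenRouter's usual vendor/model naming, not a confirmed value; check your provider's model list.

```python
import json

# Hypothetical chat-completions payload for an OpenAI-compatible endpoint
# (e.g. OpenRouter). The model id is an assumed name, not confirmed.
payload = {
    "model": "moonshotai/kimi-k2-0905",
    "max_tokens": 4096,
    "messages": [
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "Refactor utils.py to use pathlib."},
    ],
}

# Serialize for the POST body; sending it is left to your HTTP client.
body = json.dumps(payload)
```

In Cline itself none of this is needed: pick the model from the provider dropdown and the extension handles the request plumbing.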

The Dev Angle

The context window expansion matters immediately. You can now load entire test suites, longer conversation histories, and larger codebases without hitting the context limit. The model's attention mechanism was tuned specifically for long-context coherence, mitigating the degradation that typically appears near context boundaries.

Tool calling reliability is the quiet win here. Consistent structured outputs with ~95% first-try success on well-formed schemas means fewer malformed JSON responses and fewer parameter mismatches. For agents running in automated workflows, that's the difference between reliable execution and constant error handling.

Speed is handled by Groq's serving infrastructure (~349 TPS). Expect 2-3 second warmup on first requests, but subsequent requests in the same session are significantly faster. For production workloads, that throughput handles real concurrency without throttling.
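To put those numbers in concrete terms, wall-clock time for a response is roughly output tokens divided by throughput, plus the cold-start warmup. This additive model is a back-of-the-envelope illustration using the article's figures, not a benchmark:

```python
def estimated_latency_s(output_tokens: int, tps: float = 349.0,
                        warmup_s: float = 2.5) -> float:
    """Rough wall-clock estimate: cold-session warmup plus decode time.

    ~349 TPS and the 2-3 s warmup (2.5 s midpoint used here) are the
    figures quoted in the article; real latency varies with load.
    """
    return warmup_s + output_tokens / tps

# A 1,000-token response on a cold session: 2.5 + 1000/349 ≈ 5.4 s.
# On a warm session, pass warmup_s=0 for the decode time alone.
```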

Moonshot explicitly highlights the frontend improvements: the model shows measurable gains on frontend tasks compared to K2's July checkpoint. Cline recommends using K2-0905 in Act mode, where it can execute on plans devised by a reasoning model.

Should You Care?

If you're running Cline with long-context workflows or frontend-heavy projects, K2-0905 is worth testing immediately. The 256k window alone removes a major constraint. The tool calling reliability is a direct upgrade over the July checkpoint—fewer retries, faster agent execution.

If you're already using Sonnet 4 or other closed-source models, the pricing is identical ($1/$3 per 1M tokens through most providers), so the decision comes down to performance fit. K2-0905's diff edit failure rate (5%, lower is better) rivals Sonnet 4's (4%) and beats Gemini 2.5 Pro's (10%), making it a legitimate alternative for code generation tasks.
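At those rates, per-request cost is simple arithmetic. A quick estimator, using the article's $1 input / $3 output per 1M tokens (the token counts in the example are illustrative):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_rate: float = 1.0, out_rate: float = 3.0) -> float:
    """Cost in USD at per-1M-token rates ($1 in / $3 out per the article)."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Filling the full 256k context with a 4k-token response:
# 256_000 * $1/1M + 4_000 * $3/1M = $0.256 + $0.012 = $0.268
```

The takeaway: even a maxed-out 256k-context request costs well under a dollar, so the expanded window doesn't meaningfully change the cost calculus versus the July checkpoint.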

If you're on a reasoning-model-first workflow, pair K2-0905 with a stronger reasoning model upstream. It excels at execution, not planning.

Source: Cline