

Artificial Analysis @ArtificialAnlys on x 51.8K followers Created: 2025-07-14 23:09:52 UTC

While Moonshot AI’s Kimi k2 is the leading open weights non-reasoning model in the Artificial Analysis Intelligence Index, it outputs ~3x more tokens than other non-reasoning models, blurring the lines between reasoning & non-reasoning

Kimi k2 is the largest major open weights model yet: 1T total parameters with 32B active (this requires a massive 1TB of memory at native FP8 just to hold the weights). We have k2 at XX in the Artificial Analysis Intelligence Index, an impressive score that puts it above models like GPT-4.1 and DeepSeek V3, but behind leading reasoning models.
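The 1TB figure follows from simple arithmetic: FP8 stores one byte per parameter, so 1T parameters need roughly 1TB for the weights alone. A minimal sketch of that calculation, assuming decimal units (1 TB = 10^12 bytes) and ignoring KV cache, activations, and runtime overhead; the function and constant names are illustrative only:

```python
def weight_memory_bytes(num_params: float, bits_per_param: int) -> float:
    """Bytes needed to store the weights alone (no KV cache or activations)."""
    return num_params * bits_per_param / 8


TOTAL_PARAMS = 1e12   # 1T total parameters (from the post)
ACTIVE_PARAMS = 32e9  # 32B active parameters per token (from the post)
FP8_BITS = 8          # FP8 uses one byte per parameter

# Decimal units are an assumption here: 1 TB = 1e12 bytes, 1 GB = 1e9 bytes.
total_tb = weight_memory_bytes(TOTAL_PARAMS, FP8_BITS) / 1e12
active_gb = weight_memory_bytes(ACTIVE_PARAMS, FP8_BITS) / 1e9

print(f"All weights at FP8:    {total_tb:.1f} TB")
print(f"Active weights at FP8: {active_gb:.1f} GB")
```

Only about 32GB of weights are used for any single token, but the full 1TB still has to be held in memory to serve the model.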

Until now, there has been a clear distinction between reasoning models and non-reasoning models in our evals, defined not only by whether the model uses tags, but primarily by token usage. The median number of tokens used to answer all the evals in the Artificial Analysis Intelligence Index is ~10x higher for reasoning models than for non-reasoning models.

@Kimi_Moonshot's Kimi k2 uses ~3x the number of tokens that the median non-reasoning model uses. Its token usage is only up to XX% lower than Claude X Sonnet and Opus when run in their maximum budget extended thinking mode, and is nearly triple the token usage of both Claude X Sonnet and Opus with reasoning turned off. We therefore recommend that Kimi k2 be compared to Claude X Sonnet and Opus in their maximum budget extended thinking modes, not to the non-reasoning scores for the Claude X models.
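The comparison above comes down to a ratio against the median non-reasoning model. A rough sketch of that calculation follows; every token count in it is a made-up placeholder rather than Artificial Analysis data, and `token_ratio` is a hypothetical helper name:

```python
from statistics import median


def token_ratio(model_tokens: float, baseline_tokens: list[float]) -> float:
    """Ratio of a model's token usage to the median of a baseline group."""
    return model_tokens / median(baseline_tokens)


# Placeholder values (millions of output tokens across an eval suite).
# These are NOT real measurements; they only illustrate the arithmetic.
non_reasoning_baseline = [4.0, 5.0, 6.0]
candidate_tokens = 15.0

ratio = token_ratio(candidate_tokens, non_reasoning_baseline)
print(f"Candidate uses {ratio:.1f}x the median non-reasoning token count")
```

With these placeholder numbers the candidate lands at 3x the non-reasoning median, the same multiple the post reports for Kimi k2, while the typical reasoning model sits at around 10x.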

Kimi k2 is available on @Kimi_Moonshot’s first-party API as well as @FireworksAI_HQ, @togethercompute, @novita_labs, and @parasail_io.

See below and on Artificial Analysis for further analysis 👇
