LunarCrush LLM | post/tweet::1943169631491100856

[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

![GregKamradt Avatar](https://lunarcrush.com/gi/w:24/cr:twitter::240008974.png) Greg Kamradt [@GregKamradt](/creator/twitter/GregKamradt) on x 41.6K followers
Created: 2025-07-10 04:45:17 UTC

We got a call from @xai XX hours ago

“We want to test Grok X on ARC-AGI”

We heard the rumors. We knew it would be good. We didn’t know it would become the #1 public model on ARC-AGI

Here’s the testing story and what the results mean:

Yesterday, we chatted with Jimmy from the xAI team, who wanted us to validate their Grok X score. They did their own testing on the ARC-AGI-1 & X public evaluation set

To validate their score (and measure possible overfitting), we self-tested the new model on our semi-private evaluation set

We walked them through our testing policy:
* No data retention
* Model checkpoint must be intended for public use
* Temporary increase in rate limits for burst testing

They were on board, so we got started

Initially, we ran into timeout errors with normal requests, so we switched to streaming. That resolved the issue
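
A rough sketch of why switching to streaming can clear timeouts like these. The post doesn't say what caused them, so this assumes a client-side idle read timeout. Every name here (`fake_model_chunks`, `blocking_request`, `streaming_request`) is hypothetical and the delays are simulated. This is not the actual test harness or the xAI API.

```python
from typing import Iterable, Iterator, Tuple


class ReadTimeout(Exception):
    """Raised when no data arrives within the allowed idle window."""


def fake_model_chunks() -> Iterator[Tuple[float, str]]:
    # Hypothetical model output: (seconds since the previous chunk, text).
    # Delays are simulated numbers; nothing here actually sleeps.
    yield (20.0, "The ")
    yield (25.0, "answer ")
    yield (25.0, "grid")


def blocking_request(chunks: Iterable[Tuple[float, str]], idle_timeout: float) -> str:
    # Non-streaming: no bytes arrive until the whole response is ready,
    # so the client sits idle for the sum of all delays.
    items = list(chunks)
    total_wait = sum(delay for delay, _ in items)
    if total_wait > idle_timeout:
        raise ReadTimeout(f"idle for {total_wait}s, limit is {idle_timeout}s")
    return "".join(text for _, text in items)


def streaming_request(chunks: Iterable[Tuple[float, str]], idle_timeout: float) -> str:
    # Streaming: every chunk that arrives resets the idle clock,
    # so only the gap between consecutive chunks has to fit the limit.
    parts = []
    for delay, text in chunks:
        if delay > idle_timeout:
            raise ReadTimeout(f"idle for {delay}s, limit is {idle_timeout}s")
        parts.append(text)
    return "".join(parts)


if __name__ == "__main__":
    print(streaming_request(fake_model_chunks(), idle_timeout=60.0))
    try:
        blocking_request(fake_model_chunks(), idle_timeout=60.0)
    except ReadTimeout as err:
        print("blocking request timed out:", err)
```

Under these assumed numbers the same 70 seconds of generation fails as one blocking read but passes when streamed, because no single gap between chunks exceeds the 60 second limit.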

So, what do these results mean?

First, the facts: Grok X is now the top-performing publicly available model on ARC-AGI. This even outperforms purpose-built solutions submitted on Kaggle.

Second, ARC-AGI-2 is hard for current AI models. To score well, models have to learn a mini-skill from a series of training examples, then demonstrate that skill at test time.

The previous top score was ~8% (by Opus 4). Below XX% is noisy

Getting XXXX% breaks through that noise barrier: Grok X is showing non-zero levels of fluid intelligence

But the mission isn’t over. We need new ideas to solve ARC-AGI-2. Scale alone won’t get us there

Come work on ARC-AGI with us


XXXXXXXXXX engagements

![Engagements Line Chart](https://lunarcrush.com/gi/w:600/p:tweet::1943169631491100856/c:line.svg)

**Related Topics**
[tops](/topic/tops)
[grok 4](/topic/grok-4)
[xai](/topic/xai)
[greg](/topic/greg)

[Post Link](https://x.com/GregKamradt/status/1943169631491100856)
