[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

![scaling01 Avatar](https://lunarcrush.com/gi/w:24/cr:twitter::1825243643529027584.png) Lisan al Gaib [@scaling01](/creator/twitter/scaling01) on x 17.7K followers
Created: 2025-07-22 11:49:39 UTC

Inverse Scaling in Test-Time Compute by Anthropic

So are reasoning models cooked?

No, they cited the Apple Tower of Hanoi paper.

And it looks more like an Anthropic skill issue to me, since o3's performance decreases in only X benchmark, while Opus X has decreased performance in X benchmarks.



![](https://pbs.twimg.com/media/GwdX8-fWEAEcZ6G.jpg)

XXXXX engagements

![Engagements Line Chart](https://lunarcrush.com/gi/w:600/p:tweet::1947625084513845429/c:line.svg)

**Related Topics**
[hanoi](/topic/hanoi)
[gaib](/topic/gaib)

[Post Link](https://x.com/scaling01/status/1947625084513845429)
