Rohan Paul @rohanpaul_ai on X (75.6K followers)
Created: 2025-06-19 17:09:17 UTC
Now the 3rd paper comes on this 🤯
"The Illusion of the Illusion of the Illusion of Thinking"
👉 1st: the original paper from Apple concludes that large reasoning models hit a complexity point where accuracy collapses to zero even as they spend fewer thinking tokens, revealing hard limits on generalizable reasoning.
👉 2nd: a rebuttal counters that the apparent collapse is an illusion caused by token limits and impossible puzzle instances, so the models' reasoning remains sound once evaluations remove those flaws (see the token-budget sketch after this list).
👉 3rd: the new paper synthesizes both sides, agreeing the collapse was an artifact while stressing that models still falter in very long step-by-step executions, exposing lingering brittleness despite the better methodology.
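To make the 2nd paper's token-limit point concrete, here is a minimal sketch (mine, not from either paper) using Tower of Hanoi, one of the puzzles in the Apple benchmark: the shortest solution takes 2^n - 1 moves for n disks, so merely writing out a correct answer eventually outgrows any fixed output budget. The per-move token cost and the budget below are illustrative assumptions.

```python
# Minimal sketch of the token-budget argument (illustrative numbers, not
# from either paper). Tower of Hanoi needs 2**n - 1 moves for n disks, so
# just printing a correct solution eventually exceeds any output budget.

TOKENS_PER_MOVE = 10     # assumed cost of one line like "move disk 3: A -> C"
OUTPUT_BUDGET = 64_000   # assumed maximum output tokens

def hanoi_moves(n: int) -> int:
    """Length of the minimal Tower of Hanoi solution for n disks."""
    return 2**n - 1

for n in range(1, 25):
    needed = hanoi_moves(n) * TOKENS_PER_MOVE
    if needed > OUTPUT_BUDGET:
        print(f"{n} disks: ~{needed:,} tokens > {OUTPUT_BUDGET:,} budget")
        break
    print(f"{n} disks: ~{needed:,} tokens fits")
# Under these assumptions the budget runs out at n = 13 (8,191 moves,
# ~82k tokens): past that size a model scores zero no matter how well
# it reasons, which is the "illusion" the 2nd paper points to.
```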
The third author shows that, even after fixing the test design and giving the models enough output space, they still lose track of a long step-by-step plan once it stretches into thousands of steps, so a real weakness remains in sustaining very long chains of reasoning.
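One way to see why long executions are fragile, as an illustrative model rather than anything from the 3rd paper: if each step of a plan succeeds independently with probability p, the whole plan survives k steps with probability p^k, which collapses even when p is very close to 1.

```python
# Illustrative error-compounding model (my assumption, not a result from
# the 3rd paper): if each step of a plan is executed correctly with
# independent probability p, the whole plan survives k steps with
# probability p**k, which decays fast even for p very close to 1.

for p in (0.999, 0.9999):
    for k in (100, 1_000, 5_000):
        print(f"p={p}, k={k:>5}: P(every step correct) = {p**k:.4f}")
# p=0.999 over 5,000 steps leaves only ~0.7% of runs fully correct, so
# near-perfect per-step execution still collapses over plans thousands
# of steps long, which is the brittleness the 3rd paper highlights.
```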
Read on 👇
XXXXXXX engagements
Post link: https://x.com/rohanpaul_ai/status/1935746720144544157