LunarCrush LLM | creator/twitter::824589026074058752/posts

[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

[@nekofneko](/creator/twitter/nekofneko)
"📝 Complete vs Partial Solutions: Only X models gave fully rigorous solutions: ByteDance Seed XXX and Gemini XXX Pro: Problem X ✅ Most other "correct" answers were partial and lacked full justification; in many cases they seemed like lucky guesses"  
![@nekofneko Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::824589026074058752.png) [@nekofneko](/creator/x/nekofneko) on [X](/post/tweet/1945827245853208932) 2025-07-17 12:45:41 UTC XX followers, XXX engagements


"🧵 UPDATED: Complete evaluation of X frontier models on IMO 2025 problems. After testing Claude Sonnet X, ByteDance Seed XXX, Gemini XXX Pro, OpenAI o4-mini-high, o3-medium, Grok X, and DeepSeek R1, here are the comprehensive results"  
![@nekofneko Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::824589026074058752.png) [@nekofneko](/creator/x/nekofneko) on [X](/post/tweet/1945825543238406315) 2025-07-17 12:38:55 UTC XX followers, XXX engagements


"Following the conclusion of IMO 2025 in Australia today, I tested three frontier models on all X problems: Claude Sonnet X (with thinking), ByteDance Seed XXX (with thinking), and Gemini XXX Pro. The results weren't as impressive as expected"  
![@nekofneko Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::824589026074058752.png) [@nekofneko](/creator/x/nekofneko) on [X](/post/tweet/1945491686160994405) 2025-07-16 14:32:18 UTC XX followers, 8071 engagements
