[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]
@nekofneko
"To watch the full inspiring speech & insightful Q&A session check out the video here: A huge thanks again to Terence Tao for sharing his vision for the future of math. And congratulations to all the brilliant medalists and participants at #IMO2025 9/9" @nekofneko on X 2025-07-19 14:35:01 UTC XX followers, 1253 engagements
"๐ FINAL SCOREBOARD: Grok 4: 3/6 (P1 X 5) Gemini XXX Pro: 2/6 (P1 5) ByteDance Seed 1.6: 2/6 (P3 5) Claude Sonnet 4: 2/6 (P1 3) o3-medium: 2/6 (P3 5) ByteDance Seed XXX Thinking: 1/6 (P3) o4-mini-high: 1/6 (P1) DeepSeek R1: 0/6" @nekofneko on X 2025-07-17 12:40:40 UTC XX followers, XXX engagements
"๐ Complete vs Partial Solutions: Only X models gave fully rigorous solutions: Bytedance Seed XXX and Gemini XXX Pro: Problem X โ
Most other "correct" answers were partial and lacked full justificationin many cases they seemed like lucky guesses" @nekofneko on X 2025-07-17 12:45:41 UTC XX followers, XXX engagements
"Tao's "no extra tools" rule is critical. LLMs may solve PhD-level problems but they fail at grade-school multiplication without a calculator. This test shows o1-mini struggles past 9x9 digits and gpt-4o past 4x4. If an "AGI" can't master what a human can is it truly general" @nekofneko on X 2025-07-20 13:23:40 UTC XX followers, XXX engagements
"๐งต UPDATED: Complete evaluation of X frontier models on IMO 2025 problems After testing Claude Sonnet X ByteDance Seed XXX Gemini XXX Pro OpenAI o4-mini-high o3-medium Grok X and DeepSeek R1 here are the comprehensive results" @nekofneko on X 2025-07-17 12:38:55 UTC XX followers, XXX engagements
"Following the conclusion of IMO 2025 in Australia today I tested three frontier models on all X problems: Claude Sonnet X (with thinking) ByteDance Seed XXX (with thinking) and Gemini XXX Pro. The results weren't as impressive as expected" @nekofneko on X 2025-07-16 14:32:18 UTC XX followers, 8055 engagements