[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]
@scaling01
"chinese math olympiad team lost again to the better chinese olympiad team" @scaling01 on X 2025-07-22 15:25:47 UTC 16.9K followers, 16.6K engagements
"imagine if Lobster is the open-source model" @scaling01 on X 2025-07-25 23:15:12 UTC 17.5K followers, 13.7K engagements
"rest I must in exile I'm going to live until GPT-5 returns" @scaling01 on X 2025-07-17 00:33:10 UTC 16.9K followers, 3837 engagements
"Lobster is still pretty fast with 80-100tks/s maybe there's an even bigger model 👀" @scaling01 on X 2025-07-25 23:21:55 UTC 17.5K followers, 7058 engagements
"- Sama is already talking about AGI all the time not long before he says it - Q1 model fiesta: happened - agents / computer use: literally yesterday - o3 yes o4 in the coming X weeks (GPT-5) o5 end of year - o3 replication coming: Kimi-K2 reasoner or R2 - SWE-Bench XX% EOY: confident - ARC-AGI-2 XX% by end of year: coin-flip - Frontier Math 80%: unsure but with Gold IMO it seems slightly closer - 10+ million context length models: Llama-4 but not really" @scaling01 on X 2025-07-19 12:21:35 UTC 17.3K followers, 34.3K engagements
""As the race to AGI intensifies the national security state will get involved"" @scaling01 on X 2025-07-23 15:19:19 UTC 17K followers, 8556 engagements
"@sama That's a lot of agents Assuming o3 fits on one 8xH100 you would have 125k instances of o3. Batch size won't be huge with reasoning models maybe 4-8 So 500k to X million o3 agents could run on those newly deployed GPUs. But there will also be plenty of H200 B200 and GB200" @scaling01 on X 2025-07-20 22:52:45 UTC 17.5K followers, 27.4K engagements
"The thing is the model says its from OpenAI but Google and Anthropic models no longer say they are from OpenAI. They have their own data now so it's very unlikely its one of them. Probabilities from which lab it is: XX% OpenAI X% one of the chinese labs they are still heavily using OAI data X% completely new lab" @scaling01 on X 2025-07-25 23:14:03 UTC 17.5K followers, 26.7K engagements
"and yes I believe they are based on GPT-4.1" @scaling01 on X 2025-07-25 15:10:06 UTC 17.5K followers, 9627 engagements
"decent if you compare it only to non-reasoning models but nowhere near the XX% the Qwen team reported but pretty bad against o4-mini or Gemini XXX Flash" @scaling01 on X 2025-07-24 18:46:16 UTC 17.5K followers, 5769 engagements
"The new OpenAI models are insane. Sonnet straight up gets destroyed by Summit in this comparison. (CLICK ON THE POST TO SEE THE CORRECT ORDER OF THE IMAGES) Summit - Desert with lone tree: Sonnet- Desert with lone tree: Summit - Landscape: Sonnet - Lanscape:" @scaling01 on X 2025-07-27 00:51:06 UTC 17.5K followers, 29.3K engagements
"this is what I imagine the self-driving looks like in all other cars except Waymo and Tesla" @scaling01 on X 2025-07-24 09:05:40 UTC 17.4K followers, 2454 engagements
"Generative AI is the fastest adopted technology in history" @scaling01 on X 2025-07-17 22:33:47 UTC 17.5K followers, 9946 engagements
"GPT-4 was released XXX days or XXX years ago we are getting old" @scaling01 on X 2025-07-14 22:02:40 UTC 17.5K followers, 2916 engagements
""we need XXXXX% gross profit margin XX% aren't enough" but sure it's all for the benefit of all of humanity" @scaling01 on X 2025-07-20 13:06:18 UTC 16.5K followers, 2531 engagements
"Kimi-K2 Technical Report is out to reveal all the secrets" @scaling01 on X 2025-07-21 19:52:13 UTC 17.5K followers, 24.8K engagements
"the hype will be off the charts if it's the open-source model and GPT-5 is even stronger" @scaling01 on X 2025-07-25 23:53:26 UTC 17.5K followers, 20.3K engagements
"@sama the next X weeks are a good time to release all your models :)" @scaling01 on X 2025-07-19 14:22:27 UTC 17.5K followers, 11.7K engagements
"what if I told you that OpenAI Google Anthropic and xAI will all be working together in a few years" @scaling01 on X 2025-07-22 15:49:54 UTC 17.5K followers, 232.7K engagements
"I made it on the ARC-AGI-3 leaderboard I honestly don't know how people got below XXX I made a few mistakes but XXX of them" @scaling01 on X 2025-07-19 14:27:07 UTC 17.5K followers, 4272 engagements
"GPT-5 casually building cookie clicker with all features in X minutes" @scaling01 on X 2025-07-25 18:16:16 UTC 17.5K followers, 73K engagements
"There is a XX% chance that ChatGPT agent will actually gamble away your life savings if you asked it" @scaling01 on X 2025-07-17 19:36:27 UTC 17.4K followers, 16.7K engagements
"got lucky two times in a row zenith is definitely a thinking model and not a "-mini" model other models like o4-mini o3 and Opus-4 Thinking can solve the multiplication question but a lot of the chinese models Grok-4 and even Gemini XXX Pro can't do this the double base64 encoding thing tells me it's not a mini model because they completely break apart with a second layer of encoding - but zenith gets most of the message right to this day the only models that can reliably do double base64 encoding are Sonnet and Opus" @scaling01 on X 2025-07-26 21:56:21 UTC 17.5K followers, 21.4K engagements
"The vibe shift has been incredible to watch over the last few days. We went from GPT-5 will be disappointing to GPT-5 will be another GPT-4 moment" @scaling01 on X 2025-07-26 17:48:31 UTC 17.5K followers, 50.5K engagements
"be me Theo Von interviewing Sam Altman ask him what nuclear fusion is Sam: "smashing atoms together" fuck cool i bet a lot of people would watch that Sam: "it's pretty hard to watch two atoms" DUDE WHAT IF ME MAKE IT LIKE THESE SPERM RACES" @scaling01 on X 2025-07-23 20:27:31 UTC 17.5K followers, 3421 engagements
"and subscribe to the best AI channel on YouTube:" @scaling01 on X 2025-07-18 22:44:12 UTC 17.5K followers, 7502 engagements
"Anthropic valued at over $150B roast xAI valuation XX times compare constantly to Anthropic spam "ANTHROPIC IS UNDERVALUED" button profit" @scaling01 on X 2025-07-25 20:27:22 UTC 17.5K followers, 6993 engagements
"The White House just released America's AI Action Plan. I've read the whole thing. This document makes it very clear that this is about "winning the AI race" and even compare it to the cold war era. It's a paper about national-security Here are the most important quotes: - Just like we won the space race it is imperative that the United States and its allies win this race. - Americas AI Action Plan has three pillars: innovation infrastructure and international diplomacy and security. Pillar I - Innovation: - Led by the Department of Commerce revise the NIST AI Risk Management Framework to" @scaling01 on X 2025-07-23 15:06:54 UTC 17.5K followers, 46.8K engagements
"TLDR Kimi-K2 Technical Report: - 1.04T@32B parameter DeepSeekV3 style MoE with MLA higher sparsity but half the attention heads - trained on 15.5T tokens XXX billion tokens @ 4k XX billion tokens @ 32k extension with YaRN - use of MuonClip optimizer including QK-Clip to help training stability - general refinorcement learning framework including RLVR and self-critque for non-verifiable domains - large-scale agentic data synthesis pipeline QK-Clip: constrains attention logits by rescaling the key and query projection weights post-update Sparsity Scaling Laws: - higher sparsity yields" @scaling01 on X 2025-07-21 20:56:56 UTC 16.6K followers, 10.5K engagements
"@heyruchir GPT-4.5 is 5T but it's kinda old" @scaling01 on X 2025-07-25 19:29:12 UTC 17.5K followers, 1896 engagements
"40% chance that this is the prologue to WW3 happening around 2028-2032" @scaling01 on X 2025-07-23 22:30:25 UTC 17.5K followers, 5618 engagements
"I don't think Americans understand how far ahead Chinas infrastructure is" @scaling01 on X 2025-07-06 23:38:17 UTC 17.5K followers, 18.6M engagements
"i have tried like XX times to get summit and zenith never gotten zenith and X times summit but it timed out every single time and no response" @scaling01 on X 2025-07-26 00:23:39 UTC 17.5K followers, 8869 engagements
"Not a mystery. Orcas are cracked they have dialects podspecific hunting strategies and even use tools" @scaling01 on X 2025-07-16 15:47:40 UTC 16.4K followers, 2070 engagements
"Zenith doesn't seem as good in SVGs as Summit. Summit: Zenith:" @scaling01 on X 2025-07-27 00:57:30 UTC 17.5K followers, 7280 engagements
"would be hella awkward if Lobster Nectarine and Starfish weren't the GPT-5 models good luck to your NVDA stock if they aren't GPT-5" @scaling01 on X 2025-07-25 22:15:18 UTC 17.5K followers, 32.7K engagements
"anybody know what model kraken-072125-2 is on lmarena i just tried the SVG thing and it says Claude is it really Anthropic or some chinese lab" @scaling01 on X 2025-07-26 23:40:10 UTC 17.5K followers, 5629 engagements
"The White House finally saw the chart: "American energy capacity has stagnated since the 1970s while China has rapidly built out their grid. Americas path to AI dominance depends on changing this troubling trend" - Quote from the AI Action Plan by the White House" @scaling01 on X 2025-07-23 15:11:21 UTC 17.3K followers, 2433 engagements
"Inverse Scaling in Test-Time Compute by Anthropic So are reasoning models cooked No they cited the Apple Tower of Hanoi paper. And it looks more like an Anthropic skill issue to me since o3's performance decreases in only X benchmark while Opus X has decreased performance in X benchmarks" @scaling01 on X 2025-07-22 11:49:39 UTC 17.4K followers, 1725 engagements
"Official HLE scores for Grok-4 and Grok-4 Heavy destroying o3 and Gemini XXX Pro" @scaling01 on X 2025-07-10 04:33:43 UTC 16.7K followers, 1391 engagements
"more hillclimbing on ARC-AGI-3 If the guy who got XXX did it unassisted without computer or notes that would be impressive" @scaling01 on X 2025-07-19 17:01:32 UTC 17.5K followers, 3093 engagements
"You know Bruce Wayne and Tony Stark "only" had $XX billion and were superheroes Imagine what Elon can do with $XXX billion" @scaling01 on X 2025-07-22 22:30:00 UTC 16.5K followers, 3064 engagements
"It's not GPT-5 Repeat: not GPT-5 (I ragequit this is confirmed by trusted sources)" @scaling01 on X 2025-07-17 00:22:16 UTC 16.5K followers, 103.8K engagements
"@__int32 because they want to kill Anthropic" @scaling01 on X 2025-07-25 17:55:47 UTC 17.5K followers, 2806 engagements
"two labs independently got IMO gold and you are gooning anon" @scaling01 on X 2025-07-19 12:12:27 UTC 17.4K followers, 5196 engagements
"even harder question and summit got it not a huge signal but at least we know they are better at multiplication than pretty much all model" @scaling01 on X 2025-07-26 22:09:08 UTC 17.5K followers, 14K engagements
"I can't stop playing with this tool that GPT-5 made it's so mesmerizing" @scaling01 on X 2025-07-26 20:58:59 UTC 17.5K followers, 27.2K engagements
"1e28 flops is XXXX days on this machine you could train GPT-4 in a few hours" @scaling01 on X 2025-07-22 17:35:48 UTC 17.4K followers, 14.1K engagements
"@MinuteMovies3 not sure the whole o3-alpha thing is confusing af" @scaling01 on X 2025-07-25 15:11:08 UTC 17.5K followers, 3634 engagements
"idk does Sonnet win this Summit - NY skyline: Sonnet - NY skyline:" @scaling01 on X 2025-07-27 00:54:55 UTC 17.5K followers, 3541 engagements
"Lobster - GPT-5 Nectarine - GPT-5-mini Starfish - GPT-5-nano" @scaling01 on X 2025-07-25 15:04:14 UTC 17.5K followers, 142K engagements
"Qwen3-Coder-480B-A35B-Instruct Also known as Qwen-Coder-Plus with X million tokens input and 65k tokens output but apparently without thinking" @scaling01 on X 2025-07-22 18:59:58 UTC 16.5K followers, 20.1K engagements
"Zuck showing the engineers his plans for datacenters in tents" @scaling01 on X 2025-07-24 20:43:09 UTC 17.5K followers, 1528 engagements
"with GPT-5 I mean Lobster on web.lmarena dot ai if Lobster is anything else but GPT-5 then whoever cooked this shit up will be my new GOAT" @scaling01 on X 2025-07-25 23:05:32 UTC 17.5K followers, 30.9K engagements
"The most disappointing thing would be if we safely reached ASI; it figures out the fundamental laws of the universe but we couldn't do anything cool with them. Imagine being stuck on a planet in an infinite doomed universe" @scaling01 on X 2025-07-24 22:13:59 UTC 17.5K followers, 4557 engagements
"Religious believers and LLM doubters are so similar. They can't recognize change and their world has only shrunk never grown" @scaling01 on X 2025-07-20 02:00:10 UTC 16.5K followers, 2427 engagements
"basically this: Create a stunning interactive animation of a neural network or brain-like graph structureuse artistic colors smooth transitions and beautiful visuals. The page should feel alive immersive and impressive with no buttonsjust scrolling or continuous animation. Make it breathtaking. then i just prompted for improvements but Lobster was 100x from the start" @scaling01 on X 2025-07-25 23:06:44 UTC 17.5K followers, 14.4K engagements
"I think it will get a new highscore on LisanBench" @scaling01 on X 2025-07-25 15:19:58 UTC 17.5K followers, 6318 engagements
"insert where is qwen meme But I'm glad they finally shipped it after like X months. It's going to be interesting comparing Gemini and Qwen" @scaling01 on X 2025-07-14 17:42:03 UTC 16.7K followers, 18.5K engagements
"Seems like the new models are really based on GPT-4.1 series. They have the same knowledge cut-off of June 2024" @scaling01 on X 2025-07-27 02:03:54 UTC 17.5K followers, 16.6K engagements
"@ Zuck all they need is X months to build a frontier coding model" @scaling01 on X 2025-07-22 21:39:35 UTC 16.8K followers, 18.1K engagements
"I hope GPT-5 will finally be smart enough to have real conversations without inconsistencies" @scaling01 on X 2025-07-24 21:55:15 UTC 17.5K followers, 2506 engagements
"good night gooners this man is going to make you happy next week" @scaling01 on X 2025-07-24 00:15:45 UTC 16.9K followers, 2391 engagements
"Is anyone actually using these goofy ass glasses" @scaling01 on X 2025-07-22 01:35:02 UTC 17.5K followers, 1218 engagements
"Qwen about to release a 480B MoE for coding with X million context "Qwen3-Coder-480B-A35B-Instruct is a powerful coding-specialized language model excelling in code generation tool use and agentic tasks."" @scaling01 on X 2025-07-22 18:55:06 UTC 17.5K followers, 131.5K engagements
"btw I don't want a model router I want to be able to select the models I use" @scaling01 on X 2025-07-20 12:04:11 UTC 17.5K followers, 51.6K engagements
"hot take: non-reasoning models are more elegant than reasoning models" @scaling01 on X 2025-07-22 22:03:58 UTC 17.4K followers, 43.5K engagements
"I'm back and Gemini XXX Pro is still the king (no glaze) I did some more manual data cleaning and scrapped the shitty "average scaled score" and replaced it with Glicko-2 rating system with params: INITIAL_RATING = 1500 INITIAL_RD = XXX INITIAL_VOL = XXXX TAU () = XXX Furthermore I increased the minimum number of appearances from X to XX benchmarks to make it more stable. The labels show the lower XX% ratings (a conservative lower skill estimate) and in brackets the number of benchmarks the model appeared in. Below this post I attached the full table with mu sigma lower XX% ratings and number" @scaling01 on X 2025-05-05 13:50:54 UTC 17.5K followers, 65.1K engagements
"the spice fields are part of my empire period" @scaling01 on X 2025-07-18 15:00:00 UTC 17.2K followers, 2446 engagements
"The White House doesn't want you to use DeepSeek Qwen or Kimi by focusing evaluations on censorship and alignment with the CCP instead of capabilities and usefulness" @scaling01 on X 2025-07-23 15:17:08 UTC 16.5K followers, 8601 engagements
"GPT-5 expectations: - SOTA on most benchmarks (#1 on my meta-benchmark) - specifically: SOTA on ARCAGI2 and METR - 2025 knowledge cutoff - longer context window (400k) - fully multimodal (text/image/audio + video input) - sane output pricing: = $XX / 1M tokens nicetohaves: - fewer hallucinations than o3 - less sycophancy - no more barrage of em dashes - clean code style no weird multiline comments" @scaling01 on X 2025-07-18 22:23:45 UTC 17.5K followers, 1963 engagements
"Introducing LisanBench LisanBench is a simple scalable and precise benchmark designed to evaluate large language models on knowledge forward-planning constraint adherence memory and attention and long context reasoning and "stamina". "I see possible futures all at once. Our enemies are all around us and in so many futures they prevail. But I do see a way there is a narrow way through." - Paul Atreides How it works: Models are given a starting English word and must generate the longest possible sequence of valid English words. Each subsequent word in the chain must: - Differ from the previous" @scaling01 on X 2025-05-30 17:54:52 UTC 17.3K followers, 78.9K engagements
"Claude XXX is going to be released in the next X months" @scaling01 on X 2025-07-24 19:49:53 UTC 17.5K followers, 1436 engagements
"Grok-4 falling behind Gemini XXX Pro on SimpleBench" @scaling01 on X 2025-07-18 22:42:44 UTC 17.5K followers, 97K engagements
"Official OpenAI Agent mode benchmarks in one thread Coming to Pro Plus and Team users" @scaling01 on X 2025-07-17 17:24:57 UTC 16.9K followers, 26K engagements
"@petergostev i didn't get a chance to test o3-alpha" @scaling01 on X 2025-07-25 23:15:48 UTC 17.5K followers, 2432 engagements
"GPT-5 vs Grok-4 exact same prompts wildly different output in web lmarena" @scaling01 on X 2025-07-25 22:52:11 UTC 17.5K followers, 530.9K engagements
"The first step towards nationalizing AI developments just happened. "Priority access (for the Department of Defense) to computing resources in the event of a national emergency so that DOD is prepared to fully leverage these technologies during a significant conflict"" @scaling01 on X 2025-07-23 15:13:23 UTC 17.3K followers, 3298 engagements
"Now they have lost it completely. 3x valuation of Anthropic and still not even a fraction of the revenue. $80B in March magic farts and giggles happen $200B in July" @scaling01 on X 2025-07-11 20:42:25 UTC 17.5K followers, 16.5K engagements
"Qwen3-235B-Thinking caught up to Gemini XXX Pro and o3 (at least on benchmarks)" @scaling01 on X 2025-07-25 10:37:17 UTC 17.5K followers, 10.6K engagements
"played X hour with GPT-5 on lmarena literally same prompts for both models and Grok-4 just falls apart while GPT-5 creates art" @scaling01 on X 2025-07-25 21:49:59 UTC 17.5K followers, 285.8K engagements
"I have a low probability on misaligned AI killing us all because it's much more likely we will do it ourselves and it's going to happen within the next 5-10 years If we are still alive in XX years my probability of humanity becoming a type X civilization rise astronomically" @scaling01 on X 2025-07-23 22:47:13 UTC 17.5K followers, 2711 engagements
"you are better off asking o3 than ChatGPT agent to build a genetically modified supervirus" @scaling01 on X 2025-07-17 19:38:21 UTC 17.3K followers, 1485 engagements
"This only got X likes back then but it's now very real. The US is in an AI arms race with China" @scaling01 on X 2025-07-23 14:45:57 UTC 16.8K followers, 2662 engagements
"my AI predictions for 2025: - at least one lab will declare AGI and mentions ASI - Q1: Google Anthropic OpenAI META Qwen and Mistral model fiesta ( it will be heaven ) - agents / computer use takes off - release of Claude X Gemini X GPT-5 Grok X (or whatever they call their giant 5-20 trillion parameter models) - release of o3 o4 and o5 - open-source replication of o3 - the Frontier Math benchmark will be mostly solved (80%) - SWE-bench will be solved (90%) - ARC-AGI X will be mostly solved (80%) within X months of it's release - 10+ million context length models my wishful thinking: Someone" @scaling01 on X 2025-01-02 00:09:26 UTC 17.5K followers, 363.5K engagements
"GPT-5 DELAYED UNTIL AUGUST OPENAI OPEN-SOURCE MODEL NEXT WEEK GPT-5 GPT-5 mini will be available in ChatGPT GPT-5 nano only in the API" @scaling01 on X 2025-07-24 16:34:41 UTC 17.5K followers, 35K engagements
"ChatGPT Agent has lower performance than o3 on PaperBench SWE-Bench verified OpenAI PRs and OpenAI Research Engineer Interview questions" @scaling01 on X 2025-07-17 19:42:33 UTC 16.9K followers, 5115 engagements
""AI is just autocomplete" meanwhile: Zuck offered at least XX OpenAI researchers pay packages of $XXX million" @scaling01 on X 2025-07-20 19:42:27 UTC 17.4K followers, 9292 engagements
"Tents as datacenters. Now for real: X storm and it's over" @scaling01 on X 2025-07-24 14:40:24 UTC 17.4K followers, 2547 engagements
"@_ueaj can't compare tks/s to prod models we don't know their inference stack they could be using GB200 with batchsize X or other ridiculous setups as long as they are in testing phase" @scaling01 on X 2025-07-25 23:28:02 UTC 17.4K followers, XXX engagements
"Anthropic seems to be falling behind in everything that is not coding" @scaling01 on X 2025-07-25 10:51:06 UTC 17.5K followers, 5215 engagements
"Somehow ChatGPT agent has higher hallucination rates but what is XXXXX vs XXXXX lol" @scaling01 on X 2025-07-17 19:33:48 UTC 17.5K followers, 2573 engagements
""You're sheltering chinese AI researchers are you not"" @scaling01 on X 2025-07-23 12:30:21 UTC 17.5K followers, 232.2K engagements
"Qwen3-235B-A22B scored XX% on ARC-AGI-1 without thinking That's the same level as Gemini XXX Pro Sonnet X or o3-low with thinking. But it might be trained on it if not then it's insane" @scaling01 on X 2025-07-21 17:43:41 UTC 16.5K followers, 35.7K engagements
"HEY FUCKERS HOW ABOUT FIXING YOUR APP AND RELEASING GPT-5" @scaling01 on X 2025-07-17 22:22:50 UTC 17.5K followers, 3700 engagements
"I bet in "no emoji" arena it would even beat GPT-4o and GPT-4.5 making it the best non-thinking model" @scaling01 on X 2025-07-17 16:05:23 UTC 17.4K followers, 1634 engagements
"I actually don't like it Sounds like an arrogant kid that just learned about metaphors/similes but I'm autistic and have never read a book in my life so what do I know" @scaling01 on X 2025-07-27 03:02:25 UTC 17.5K followers, 3956 engagements
"why is the input for ARC-AGI-3 in slow motion like please speed it up i don't have the whole day" @scaling01 on X 2025-07-18 17:52:36 UTC 17.5K followers, 1322 engagements
"knowledge benchmarks are so fucking useless they are all faulty and contaminated" @scaling01 on X 2025-07-24 18:35:09 UTC 17.5K followers, 1834 engagements