# @polynoamial Noam Brown

Noam Brown posts on X most often about the topics ai, open ai, agi, and "in the". They currently have [-------] followers and [---] posts still getting attention, totaling [---------] engagements in the last [--] hours.

### Engagements: [---------]

- [--] Week [---------] +11%
- [--] Month [----------] +1,675%
- [--] Months [----------] +59%
- [--] Year [----------] +68%

### Mentions: [--]

- [--] Week [--] +42%
- [--] Month [--] +93%
- [--] Months [---] +11%
- [--] Year [---] +59%

### Followers: [-------]

- [--] Week [-------] +0.99%
- [--] Month [-------] +2.90%
- [--] Months [-------] +11%
- [--] Year [-------] +38%

### CreatorRank: [------]

### Social Influence

**Social category influence:** technology brands, finance, social networks, stocks #4742, travel destinations, celebrities, gaming, countries

**Social topic influence:** ai #716, open ai #980, agi, in the, math, inference #13, the world, release #819, $googl #441, imo #5

**Top accounts mentioned or mentioned by:** @openai @fchollet @rao2z @karpathy @ilyasut @ylecun @openais @googledeepmind @scaleai @kevinleestone @swyx @merettm @thomaspower @emollick
@scaling01 @google @grokton @jfoerst @sama @demishassabis

**Top assets mentioned:** Alphabet Inc Class A (GOOGL), Frontier (FRONT)

### Top Social Posts

Top posts by engagements in the last [--] hours:

"Introducing DORA, an AI that learns no-press Diplomacy from scratch with no human data. Our #NeurIPS2021 paper shows DORA is superhuman in 1v1 Diplomacy. In 7p Diplomacy the results are more subtle. Joint work w/ @anton_bakhtin, David Wu, and @adamlerer: https://arxiv.org/abs/2110.02924" [X Link](https://x.com/polynoamial/status/1446122740306513938) 2021-10-07T14:38Z 56.8K followers, [---] engagements

"@alaulejo It's a good question. I don't know the answer. I'd imagine Google does selfish routing because otherwise the users would be incentivized to switch to a competitor. But in theory there's room for Google to use correlated equilibria in their routing recommendations." [X Link](https://x.com/polynoamial/status/1678417626886807553) 2023-07-10T15:27Z 21.2K followers, [--] engagements

"Meditations on Moloch by @slatestarcodex is the most eloquent explanation I've read on how game theory can explain many real-world challenges and tragedies that humanity faces. It's a long read but very accessible and completely worth it. https://slatestarcodex.com/2014/07/30/meditations-on-moloch/" [X Link](https://x.com/polynoamial/status/1679862300155494402) 2023-07-14T14:35Z 77.1K followers, 14.3K engagements

"Dalle [--] is coming out. I've been having a lot of fun playing around with it internally." [X Link](https://x.com/polynoamial/status/1704556851658989835) 2023-09-20T18:03Z 28.4K followers, 144.7K engagements

"@santygegen It can be surprisingly good at instruction following.
Also it's really nice to be able to interact with it through the ChatGPT interface." [X Link](https://x.com/polynoamial/status/1704720612487504027) 2023-09-21T04:54Z 28.3K followers, [---] engagements

"This will be fun. Later today I'll talk about lessons from poker and Diplomacy AI on a @TEDAI2023 panel about AI and games, moderated by poker champion @Liv_Boeree and with the amazing @DrJimFan, @joon_s_pk, and @yoheinakajima." [X Link](https://x.com/polynoamial/status/1714684245132620270) 2023-10-18T16:45Z 20.3K followers, 14.5K engagements

".@OpenAI is hiring an AI researcher for a new team working toward solving reasoning. I've worked alongside @giambattista92 for several months now and have been very impressed with what he's done and what the team plans to do. If this area excites you, I 100% recommend applying." [X Link](https://x.com/polynoamial/status/1714732119652790376) 2023-10-18T19:56Z 20.4K followers, 65.7K engagements

"Over [---] signatures now. That's more than 90% of the company." [X Link](https://x.com/polynoamial/status/1726672334881484975) 2023-11-20T18:42Z 28.1K followers, 19.5K engagements

"Congrats to the @GoogleDeepMind team for this result. It's exciting to see so much progress in AI for advanced mathematics. Quoting: Introducing AlphaGeometry: an AI system that solves Olympiad geometry problems at a level approaching a human gold-medalist. It was trained solely on synthetic data and marks a breakthrough for AI in mathematical reasoning. https://t.co/g3RFSoWNPP https://t.co/NER2TJsA7r" [X Link](https://x.com/polynoamial/status/1747743130592415896) 2024-01-17T22:10Z 28.1K followers, 34K engagements

"@j_foerst @rchoudhury997 Can confirm: Sora is pretty bad at tic-tac-toe." [X Link](https://x.com/polynoamial/status/1758939594060681712) 2024-02-17T19:40Z 30.1K followers, [---] engagements

"Frontier models capping out at 90% on MMLU isn't a sign of AI hitting a wall. It's a sign that a lot of MMLU questions are busted. The field desperately needs better evals." [X Link](https://x.com/polynoamial/status/1764710940568646016) 2024-03-04T17:54Z 31K followers, 53.4K engagements

"I wish every AI startup founder would read The Bitter Lesson: http://www.incompleteideas.net/IncIdeas/BitterLesson.html" [X Link](https://x.com/polynoamial/status/1771300779783299565) 2024-03-22T22:19Z 33.5K followers, 56.9K engagements

"you don't get superhuman performance by doing better imitation learning on human data" [X Link](https://x.com/polynoamial/status/1773683272549118141) 2024-03-29T12:07Z 31.8K followers, [----] engagements

"@natfriedman This is why I'm bearish on robotics." [X Link](https://x.com/polynoamial/status/1773868623591043571) 2024-03-30T00:23Z 32.3K followers, [----] engagements

"GPT-4 reasoning has been further improved. Quoting: Majorly improved GPT-4 Turbo model available now in the API and rolling out in ChatGPT." [X Link](https://x.com/polynoamial/status/1777809000345505801) 2024-04-09T21:21Z 34.2K followers, 234.9K engagements

"Too many startups focused on what GPT-4 isn't, not enough startups focused on what future models could be. Quoting: In [--] years & 2700+ episodes I've never been so excited for a release of an episode.
@sama @bradlightcap @OpenAI: Will models become commoditized? How to solve the fundamental challenge of compute. Open vs closed. Scaling to $2BN in revenue. Tomorrow: https://t.co/A33zvIqC5K" [X Link](https://x.com/polynoamial/status/1779985355414065495) 2024-04-15T21:29Z 34.2K followers, 46.5K engagements

"Llama [--] is out in 8B and 70B sizes (400B still training). Congrats to the @AIatMeta team. https://ai.meta.com/blog/meta-llama-3/" [X Link](https://x.com/polynoamial/status/1780996995701932385) 2024-04-18T16:29Z 56.1K followers, 21.3K engagements

"Many have pointed out that LLM benchmarks are broken and gamed. Happy to see my former resident @hughbzhang, @summeryue0, and the great @scale_AI folks do something about it. They made a private version of GSM8k and evaled GPT-4, Claude, Mixtral, Phi, etc.: https://arxiv.org/pdf/2405.00332" [X Link](https://x.com/polynoamial/status/1785864074678714520) 2024-05-02T02:49Z 35.2K followers, 79.6K engagements

"Well said. There is a big opportunity for a neutral third party like @scale_AI to step in as the "Moody's of LLMs" and provide rigorous and comprehensive evals of all models. Quoting: Academic benchmarks are losing their potency. Moving forward, there're [--] types of LLM evaluations that matter: [--]. Privately held test set but publicly reported scores by a trusted 3rd party who doesn't have their own LLM to promote. @scale_AI's latest GSM1k is a great example. https://t.co/j6a1Mf5biN" [X Link](https://x.com/polynoamial/status/1786093685924585544) 2024-05-02T18:01Z 34.6K followers, 37K engagements

"@jxmnop The point of the Bitter Lesson is that research and clever ideas are important, but people should think about how their ideas scale with data and compute rather than just relying on One Weird Trick to get them a little farther than SOTA." [X Link](https://x.com/polynoamial/status/1789381426187546644) 2024-05-11T19:45Z 35.5K followers, 96.7K engagements

"GPT-4o is really good. Quoting: But the ELO can ultimately become bounded by the difficulty of the prompts (i.e. can't achieve arbitrarily high win rates on the prompt: 'what's up'). We find on harder prompt sets, and in particular coding, there is an even larger gap: GPT-4o achieves a +100 ELO over our prior https://t.co/ReJzcQdgC8" [X Link](https://x.com/polynoamial/status/1790071990428074093) 2024-05-13T17:29Z 35.4K followers, 47.2K engagements

"rewatched Her last weekend and it felt a lot like rewatching Contagion in Feb 2020" [X Link](https://x.com/polynoamial/status/1790072604469993672) 2024-05-13T17:32Z 35.4K followers, 33.6K engagements

"This is more true today than ever before. Quoting: humans built machines that talk to us like people do and everyone acts like this is normal now. it's pretty nuts" [X Link](https://x.com/polynoamial/status/1791508992646435176) 2024-05-17T16:40Z 35.5K followers, 44.4K engagements

"@karinanguyen_ Welcome to @OpenAI" [X Link](https://x.com/polynoamial/status/1792999947329007709) 2024-05-21T19:24Z 35.5K followers, 12.4K engagements

"The next @OpenAI frontier model has started training. https://openai.com/index/openai-board-forms-safety-and-security-committee/" [X Link](https://x.com/polynoamial/status/1795422304937411029) 2024-05-28T11:50Z 35.8K followers, 92.4K engagements

"Startup founders, please don't bet your company's future on frontier models hitting a wall. Quoting: My favorite feature of inviting OpenAI researchers to hang out with startups is that they can be 100% consistently relied upon to ask every founder 'what makes you think the next generation of the foundation models won't do this?' build with, not against, the capability tide" [X Link](https://x.com/polynoamial/status/1798864714409333190) 2024-06-06T23:49Z 36.1K followers, 111.8K engagements

"@McaleerStephen Great to have you with us" [X Link](https://x.com/polynoamial/status/1799523814541558238) 2024-06-08T19:28Z 35.9K followers, [----] engagements

"Frontier models like GPT-4o (and now Claude [---] Sonnet) may be at the level of a "Smart High Schooler" in some respects, but they still struggle on basic tasks like tic-tac-toe. There was hope that native multimodal training would help, but that hasn't been the case." [X Link](https://x.com/polynoamial/status/1803844406480638284) 2024-06-20T17:36Z 36.4K followers, 90.7K engagements

"At least these days they can recognize when they've lost.
Progress." [X Link](https://x.com/polynoamial/status/1803844408657481769) 2024-06-20T17:36Z 36.1K followers, [----] engagements

"@nabla_theta @boazbaraktcs @Dan_Jeffries1 @m_bourgon This exchange reminds me of this: https://youtu.be/9wWUc8BZgWE?si=s3hGwD1rI_Pl5oY0" [X Link](https://x.com/polynoamial/status/1804702100418637841) 2024-06-23T02:24Z 36.2K followers, [---] engagements

"@DanHendrycks What kind of difficulty level are you thinking for the new benchmarks?" [X Link](https://x.com/polynoamial/status/1804939477976236224) 2024-06-23T18:08Z 36.3K followers, [----] engagements

"My unpopular Silicon Valley opinion is that I'm relatively bearish on robotics. Yes, there are niche factory jobs they'll replace, but I think the trajectory will look more like self-driving cars than LLMs. Data isn't as plentiful, experiments aren't reproducible due to subtle environmental changes and wear and tear, and the necessity of real-time on-board computing makes scaling foundation-style models very difficult." [X Link](https://x.com/polynoamial/status/1808122985879830551) 2024-07-02T12:58Z 36.5K followers, [----] engagements

"GPT-4o mini is out. It's best in class for its size, especially at reasoning. https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/" [X Link](https://x.com/polynoamial/status/1813986952129167663) 2024-07-18T17:19Z 38.1K followers, 33.9K engagements

"This is amazing news for @OpenAI. @zicokolter was on my thesis committee and was someone I'd frequently turn to for research and career advice when I was in grad school. He's loved by his students and is a world expert in machine learning. I'm thrilled that he's joining us. Quoting: I'm excited to announce that I am joining the OpenAI Board of Directors. I'm looking forward to sharing my perspectives and expertise on AI safety and robustness to help guide the amazing work being done at OpenAI." [X Link](https://x.com/polynoamial/status/1821626785274204440) 2024-08-08T19:17Z 56.7K followers, 34.3K engagements

"If an AI bluffs in a game of poker, is it being deceptive? I'm curious how responses differ between those who know more about AI vs. those who know more about poker. Poll options: Yes / know more about AI; Yes / know more abt poker; No / know more abt poker; No / know more about AI" [X Link](https://x.com/polynoamial/status/1824845629253894342) 2024-08-17T16:27Z 56.7K followers, 62.2K engagements

".@demishassabis started DeepMind in [----]. Their pitch was that they'd first solve intelligence and then use it to solve everything else. It's hard to appreciate how crazy a pitch that was at the time. Now it's pretty mainstream. Quoting: Demis Hassabis says AGI will help understand the mysteries of the universe and consciousness and could cure all diseases within a decade, as well as providing fusion power and abundant clean water https://t.co/hIUjBRIPKh" [X Link](https://x.com/polynoamial/status/1825024830225391818) 2024-08-18T04:20Z 58.8K followers, 81.9K engagements

"Believe it or not, this is not a human in a suit. Quoting: Introducing NEO Beta. Designed for humans. Built for the home. https://t.co/5S6jpRjUQp" [X Link](https://x.com/polynoamial/status/1829624948627030440) 2024-08-30T20:59Z 56.6K followers, 68.9K engagements

"First they said AI can't play Go because it's too complicated. Then they said AI can't win at poker because it's a people game. Today the skeptics say AI won't write novels.
Surely once that happens, though, the goalposts won't move again. https://www.newyorker.com/culture/the-weekend-essay/why-ai-isnt-going-to-make-art" [X Link](https://x.com/polynoamial/status/1830635667824767193) 2024-09-02T15:55Z 58.6K followers, 242.6K engagements

"IMO the most valuable use case for AI will be accelerating scientific discovery. I'm rooting for @joshim5 and the @ChaiDiscovery team. Quoting: We're excited to introduce @ChaiDiscovery and release Chai-1, a foundation model for molecular structure prediction that performs at the state-of-the-art across a variety of drug discovery tasks. We're releasing inference code, weights & a web interface: https://t.co/QmpbVO9Fhd https://t.co/TU7xuOAaIF" [X Link](https://x.com/polynoamial/status/1833190975940006117) 2024-09-09T17:09Z 56.6K followers, 19.6K engagements

"@OpenAI For example, last month at the [----] Association for Computational Linguistics conference, the keynote by @rao2z was titled 'Can LLMs Reason & Plan?' In it he showed a problem that tripped up all LLMs. But @OpenAI o1-preview can get it right, and o1 gets it right almost always." [X Link](https://x.com/polynoamial/status/1834280720493412724) 2024-09-12T17:19Z 56.6K followers, 111.4K engagements

"@OpenAI @rao2z 🍓/@OpenAI o1 is the product of many hard-working people, all of whom made critical contributions. I feel lucky to have worked alongside them this past year to bring you this model.
It takes a village to grow a 🍓" [X Link](https://x.com/polynoamial/status/1834281247100792991) 2024-09-12T17:21Z 56.6K followers, 155.6K engagements

"@OpenAI @rao2z You can read more about the research here: https://openai.com/index/learning-to-reason-with-llms/" [X Link](https://x.com/polynoamial/status/1834281480312303635) 2024-09-12T17:22Z 56.6K followers, 76.2K engagements

"I've seen a few folks implying that I was the "lead" on 🍓/@OpenAI o1. I was not. o1 is the result of many years of research that really started taking off in October of last year and by the end involved over [---] amazing researchers: https://openai.com/openai-o1-contributions/ Quoting: @OpenAI @rao2z 🍓/@OpenAI o1 is the product of many hard-working people, all of whom made critical contributions. I feel lucky to have worked alongside them this past year to bring you this model. It takes a village to grow a 🍓" [X Link](https://x.com/polynoamial/status/1834299868078375106) 2024-09-12T18:35Z 56.5K followers, 109.9K engagements

"Why follow AI pop influencers when you can follow the people actually making the magic happen? Quoting: The ICs are the ones who actually create the magic. Just a few accounts to follow from @OpenAI Frontiers: @ahelkky @nervouscomputer @giambattista92 @hunterlightman @ilge @GordonJo76 @karlcobbe @lukasz_kondr @max_a_schwarzer @MostafaRohani @polynoamial @TrapitBansal @zhouwenda" [X Link](https://x.com/polynoamial/status/1834345071602282750) 2024-09-12T21:35Z 56.6K followers, 42K engagements

"Also @hwchung27 @_jasonwei @ren_hongyu @shengjia_zhao and of course @ilyasut" [X Link](https://x.com/polynoamial/status/1834346060170367031) 2024-09-12T21:39Z 56.6K followers, [----] engagements

"@geoframeai @OpenAIDevs I told it that it's a new model from @OpenAI and asked it to determine what's special about it. In the CoT it started quizzing itself with hard problems to determine its level of capability. It didn't do a great job of it, but it was pretty impressive to see it even try." [X Link](https://x.com/polynoamial/status/1834647704900645240) 2024-09-13T17:37Z 56.6K followers, [----] engagements

"@mengdi_en @altryne @OpenAIDevs We don't have plans to reveal CoT to users either in the API or ChatGPT" [X Link](https://x.com/polynoamial/status/1834651072993821110) 2024-09-13T17:51Z 56.5K followers, 23.1K engagements

"@tszzl I found it fascinating to see the model start its chain of thought for a geometry problem with 'let me first visualize the problem'" [X Link](https://x.com/polynoamial/status/1835012378167328995) 2024-09-14T17:46Z 56.6K followers, 11.3K engagements

"The AI field desperately needs harder evals that take into consideration continued fast progress. Quoting: Crazy bump of o1-preview on MMLU-Pro math subtask. It brings the previous highest score from 79% to 91%. I am still waiting the other tasks as my api quota for o1 is pretty low. This result also confirms the annotation quality of our MMLU-Pro dataset. https://t.co/LcE463z8Pn" [X Link](https://x.com/polynoamial/status/1835086680266883205) 2024-09-14T22:42Z 56.6K followers, 80.8K engagements

"@PeteyPabshnick It's a good benchmark. I would bet general models will exceed human performance on it within two years." [X Link](https://x.com/polynoamial/status/1835091193451233347) 2024-09-14T23:00Z 56.5K followers, [--] engagements

"Great to see @scale_AI and @ai_risks initiating a massive effort on harder evals. Many popular benchmarks are now saturated by @OpenAI o1 and we expect rapid progress to continue. Quoting: As LLMs get smarter, evals need to get harder. OpenAI's o1 has already maxed out most major benchmarks. Scale is partnering with CAIS to launch Humanity's Last Exam: the toughest open-source benchmark for LLMs. We're putting up $500K in prizes for the best questions. (read on) https://t.co/gvz020P407" [X Link](https://x.com/polynoamial/status/1835746343576613111) 2024-09-16T18:23Z 91.4K followers, 53.2K engagements

"We've increased the rate limits for o1-preview and o1-mini. Quoting: We appreciate your excitement for OpenAI o1 and we want you to be able to use it more. For Plus and Team users, we have increased rate limits for o1-mini by 7x, from [--] messages per week to [--] messages per day. o1-preview is more expensive to serve, so we've increased the rate" [X Link](https://x.com/polynoamial/status/1835862103061872845) 2024-09-17T02:03Z 56.6K followers, 33.4K engagements

"@KevLXu @OpenAI @kevinleestone We're looking for folks with experience" [X Link](https://x.com/polynoamial/status/1836873712441905366) 2024-09-19T21:03Z 79.7K followers, 13K engagements

"@hansamad @AdanBecerraPhD If you just want to solve blocksworld problems, then yes, it makes sense to use their heuristic planning algorithm." [X Link](https://x.com/polynoamial/status/1838607687384469870) 2024-09-24T15:53Z 56.5K followers, [---] engagements

"@swyx Ugh, I was sick with covid when I gave that talk" [X Link](https://x.com/polynoamial/status/1838630008568582477) 2024-09-24T17:22Z 56.5K followers, [----] engagements

"I've been lucky to work with @markchen90 since joining @OpenAI, and there's no doubt in my mind that he's the right person to take on this role alongside @merettm. Quoting: While today's departures are tough, I'm incredibly excited and honored to lead research at @OpenAI alongside @merettm. I truly believe that OpenAI is the best place to work on AI, and I've been through enough ups and downs to know it's never wise to bet against us." [X Link](https://x.com/polynoamial/status/1839151561416781862) 2024-09-26T03:54Z 56.5K followers, 61.6K engagements

"@ylecun @thomaspower @OpenAI Sometimes a picture is worth a thousand words. https://x.com/polynoamial/status/1834280425457426689 Quoting: @OpenAI o1 is trained with RL to think before responding via a private chain of thought. The longer it thinks, the better it does on reasoning tasks. This opens up a new dimension for scaling. We're no longer bottlenecked by pretraining. We can now scale inference compute too. https://t.co/niqRO9hhg1" [X Link](https://x.com/polynoamial/status/1840416787889885582) 2024-09-29T15:42Z 56.6K followers, 64.7K engagements

"@ylecun @thomaspower @OpenAI Also, we say a decent amount in the research blog post, including sharing CoTs, which I think are extremely informative, and I gave a talk about o1 at UC Berkeley last week. https://openai.com/index/learning-to-reason-with-llms/" [X Link](https://x.com/polynoamial/status/1840416885189271890) 2024-09-29T15:42Z 56.5K followers, 41.4K engagements

"@RepresenterTh @ylecun @thomaspower @OpenAI I've had quite a few people tell me they find o1-preview useful already" [X Link](https://x.com/polynoamial/status/1840420551061381392) 2024-09-29T15:57Z 56.5K followers, [----] engagements

"At the start of the presentation I talked about my experience working on AI for poker in grad school. From [----] through [----] I worked on essentially scaling up pretraining for poker AI. Then in [----] I got some results showing that search did incredibly well for poker AI. Those results motivated me to shift my research direction to scaling up search and ultimately led to Libratus, which beat top humans in [----]. I then discussed why search wasn't prioritized in the poker AI research community before (the first [--] minutes of the talk is the same as the one I gave at UW which can be found here and" [X Link](https://x.com/polynoamial/status/1840822629625688469) 2024-09-30T18:34Z 56.6K followers, [----] engagements

"This is a great opportunity to talk directly with some of the researchers behind @OpenAI o1. Oct 3rd at 5pm PT. Quoting: My colleagues and I will be hosting a talk and Q&A session on 'Learning to Reason with LLMs' and the new OpenAI o1 model.
Join us for an insightful discussion. https://t.co/JaVKbfiskv #OpenAIForum" [X Link](https://x.com/polynoamial/status/1841262291108683793) 2024-10-01T23:41Z 55.7K followers, 14.4K engagements

"@BaFiyALaAz7271 @mkieffer1107 @sandyasm @casper_hansen_ @SimonsInstitute The Simons talk is basically just my UW talk with some content from the o1 research blog post" [X Link](https://x.com/polynoamial/status/1842598348781686817) 2024-10-05T16:10Z 56.7K followers, [---] engagements

"Incredibly well deserved by @demishassabis, John Jumper, David Baker, and everyone who worked to make AlphaFold possible. This is hopefully just the beginning of AI aiding scientific research. Quoting: BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the [----] #NobelPrize in Chemistry with one half to David Baker for computational protein design and the other half jointly to Demis Hassabis and John M. Jumper for protein structure prediction. https://t.co/gYrdFFcD4T" [X Link](https://x.com/polynoamial/status/1844011760262828097) 2024-10-09T13:47Z 55.8K followers, 18.7K engagements

"@fchollet That wasn't my experience.
I was on the job market in early [----] and most senior folks I spoke with agreed that scaling pretraining alone would not achieve AGI and that at least one or two more breakthroughs were needed." [X Link](https://x.com/polynoamial/status/1848179889511477328) 2024-10-21T01:49Z 92.2K followers, 52K engagements

"@Miles_Brundage We'll miss you, Miles. Good luck on your next adventure." [X Link](https://x.com/polynoamial/status/1849149352721404398) 2024-10-23T18:02Z 92.2K followers, [----] engagements

"I've been using ChatGPT search regularly and honestly love it. Quoting: Introducing ChatGPT search. ChatGPT can now search the web in a much better way than before so you get fast, timely answers with links to relevant web sources. https://t.co/7yilNgqH9T https://t.co/z8mJWS8J9c" [X Link](https://x.com/polynoamial/status/1852035086465540365) 2024-10-31T17:09Z 92.2K followers, 60.7K engagements

"@Noahpinion What about o1-preview?" [X Link](https://x.com/polynoamial/status/1855455220802682912) 2024-11-10T03:39Z 57.6K followers, [----] engagements

"@gdb @OpenAI Great to have you back, Greg." [X Link](https://x.com/polynoamial/status/1856466373314261031) 2024-11-12T22:37Z 58.7K followers, 36.1K engagements

"@karpathy @iamgingertrash Is AI really hitting a wall? Good summary of half my conversations this week" [X Link](https://x.com/polynoamial/status/1857684265527337196) 2024-11-16T07:16Z 58.5K followers, [----] engagements

"@Noahpinion Consensus on twitter or consensus among researchers?" [X Link](https://x.com/polynoamial/status/1857825589995983054) 2024-11-16T16:38Z 58.5K followers, 10.4K engagements

"@Leegaul @swyx @teortaxesTex If all you care about is AIME and coding benchmarks, then you should compare to o1-mini." [X Link](https://x.com/polynoamial/status/1861080913985130730)
2024-11-25T16:14Z 80.7K followers, [---] engagements

"@OpenAI I'll save you the time: there's no 'e' in the essay. Also, GPT-4o trying the same task fails on the first word:" [X Link](https://x.com/polynoamial/status/1864736222813524451) 2024-12-05T18:18Z 60.7K followers, 14.2K engagements

"@minchoi @OpenAI Regular o1" [X Link](https://x.com/polynoamial/status/1864744322345898239) 2024-12-05T18:51Z 60.4K followers, [----] engagements

"To be clear, over the next few months I have no doubt that people will post failure cases of o1 and o1-pro. But the models also display amazing intelligence. The trajectory is the important thing to pay attention to, not the individual failure cases." [X Link](https://x.com/polynoamial/status/1864779145156399307) 2024-12-05T21:09Z 60.7K followers, 36.5K engagements

"@dened21 o1 is $20/month, just like o1-preview" [X Link](https://x.com/polynoamial/status/1865058611669672388) 2024-12-06T15:39Z 61.2K followers, [----] engagements

"@saxenauts o1 is $20/month" [X Link](https://x.com/polynoamial/status/1865059129561354559) 2024-12-06T15:42Z 61.1K followers, [----] engagements

"@krasmanalderey o1 is available at the $20/month subscription" [X Link](https://x.com/polynoamial/status/1865060116963430575) 2024-12-06T15:45Z 61K followers, [----] engagements

"@kimjisena o1 is $20/month" [X Link](https://x.com/polynoamial/status/1865060379820454090) 2024-12-06T15:47Z 61.2K followers, [----] engagements

"o1-mini is an incredibly powerful model. Now you can make it even more incredibly powerful for your specific domain. Quoting: Day 2: Reinforcement Fine-Tuning https://t.co/GBVVfxFHFT" [X Link](https://x.com/polynoamial/status/1865098648578855401) 2024-12-06T18:19Z 59.8K followers, [----] engagements

"o1-mini is an incredibly powerful model.
Now we're starting to make it possible for developers to fine-tune o1-mini so that it's even more powerful on their specific domain. Day [--] of the [--] days of OpenAI. Today, something for developers: we're introducing Reinforcement Fine Tuning, a new model customization technique for o1/o1-mini. RFT produces expert models in specific domains, and they're good with as few as a couple dozen examples." [X Link](https://x.com/polynoamial/status/1865105304612147383) 2024-12-06T18:45Z 60.7K followers, 32.6K engagements "I fully expect Santa Mode to drive more subscriptions than o1 and I'm at peace with this. Say ho ho ho to Santa in Voice Mode. Santa is rolling out today to everyone across all ChatGPT platforms and is available until the end of the month, then he will retire back to the North Pole. https://t.co/NVS9bRok4r" [X Link](https://x.com/polynoamial/status/1867273452702347404) 2024-12-12T18:20Z 65.6K followers, 80.9K engagements ".@OpenAI o1 has started rolling out to the API" [X Link](https://x.com/polynoamial/status/1869088850578186558) 2024-12-17T18:34Z 66.9K followers, 91.1K engagements "@OpenAI You can sign up to help red team o3 and o3-mini here: https://openai.com/index/early-access-for-safety-testing/" [X Link](https://x.com/polynoamial/status/1870175700222628164) 2024-12-20T18:33Z 67.5K followers, 64.5K engagements "This also means that AI safety topics like scalable oversight may soon stop being hypothetical.
Research in these domains needs to be a priority for the field" [X Link](https://x.com/polynoamial/status/1870196476908834893) 2024-12-20T19:56Z 66.9K followers, 17.7K engagements "This all makes a lot more sense if you just plot the y-axis on a log scale. ARC-AGI scores for past five years of OpenAI models (updated w/ release dates) https://t.co/DgCmJjf0Cq" [X Link](https://x.com/polynoamial/status/1870308808532082968) 2024-12-21T03:22Z 67K followers, 67.4K engagements "An excellent read if you want to understand the perspective of one of the creators of the FrontierMath benchmark on @OpenAI o3. 1/11 I'm genuinely impressed by OpenAI's 25.2% Pass@1 performance on FrontierMath; this marks a major leap from prior results and arrives about a year ahead of my median expectations. https://t.co/SfVhQThLUg" [X Link](https://x.com/polynoamial/status/1870348787803357260) 2024-12-21T06:01Z 67.4K followers, 75.5K engagements "@HubrisPoaster @OpenAI I worked at the Federal Reserve for [--] years" [X Link](https://x.com/polynoamial/status/1870492526135935082) 2024-12-21T15:32Z 66.8K followers, [----] engagements "If you've found o3 and o3-mini inspiring, consider joining us at @OpenAI to help push the frontier even further. Many teams, including the multi-agent RL team, are hiring: https://x.com/polynoamial/status/1836872735668195636 .@OpenAI is hiring ML engineers for a new multi-agent research team. We view multi-agent as a path to even better AI reasoning. Prior multi-agent experience isn't needed.
If you'd like to research this area with @kevinleestone and me, fill out this form: https://t.co/BBrYLm6VVg" [X Link](https://x.com/polynoamial/status/1870548007546196297) 2024-12-21T19:12Z 67.4K followers, 61.1K engagements "@tamaybes Why not just evaluate the model on unsolved math problems?" [X Link](https://x.com/polynoamial/status/1870636722473853277) 2024-12-22T01:05Z 67K followers, 44.3K engagements "@wt_fhub @tamaybes Could do both" [X Link](https://x.com/polynoamial/status/1870639001847349250) 2024-12-22T01:14Z 66.5K followers, [----] engagements "@gwern @Thom_Wolf I think so. Depends on your bar, but I think you'd find our research at least as impressive as other leading labs. It seems odd that you'd assume OpenAI isn't doing good research, especially in light of o1" [X Link](https://x.com/polynoamial/status/1872092187514880016) 2024-12-26T01:28Z 66.9K followers, [---] engagements "To be clear, @fchollet and @mikeknoop were always very clear that beating ARC-AGI wouldn't imply AGI or superintelligence, but it seems some people assumed that anyway" [X Link](https://x.com/polynoamial/status/1872383439359676807) 2024-12-26T20:46Z 70.9K followers, 21.4K engagements "@typewriters @OpenAI Influencers" [X Link](https://x.com/polynoamial/status/1872422744912179317) 2024-12-26T23:22Z 67.4K followers, [----] engagements "@avatisukh @deedydas We used a variance reduction technique to address this.
Before, we had to do 120k hands to show statistical significance in heads-up" [X Link](https://x.com/polynoamial/status/1875290984789221838) 2025-01-03T21:19Z 67.4K followers, [---] engagements "For any researchers looking to cite o1, the system card is on arXiv and is a good choice (link below)" [X Link](https://x.com/polynoamial/status/1877361277409931763) 2025-01-09T14:26Z 71.7K followers, 45.4K engagements "@dieaud91 Between the o1 announcement, the o3 announcement, and various podcasts/talks, I think we've said a lot. We believe o1 represents a new scaling paradigm and we're still early in scaling along that dimension" [X Link](https://x.com/polynoamial/status/1880338950839235001) 2025-01-17T19:38Z 71.4K followers, 23.6K engagements "Still true today. @OpenAI excels at placing big bets on ambitious research directions driven by strong conviction. When I joined @OpenAI a year ago, I feared ChatGPT's success might shift focus from long-term research to incremental product tweaks. But it quickly became clear that wasn't the case. @OpenAI excels at placing big bets on ambitious research directions driven by strong conviction." [X Link](https://x.com/polynoamial/status/1881834789684367468) 2025-01-21T22:42Z 72.2K followers, 63.3K engagements "The feeling of waking up to a new unsaturated eval. Congrats to @summeryue0 @alexandr_wang @DanHendrycks and the whole team. We're releasing Humanity's Last Exam, a dataset with [----] questions developed with hundreds of subject matter experts to capture the human frontier of knowledge and reasoning. State-of-the-art AIs get 10% accuracy and are highly overconfident.
@ai_risk @scaleai https://t.co/kiOJKV2GfI" [X Link](https://x.com/polynoamial/status/1882461290947547175) 2025-01-23T16:12Z 72.2K followers, 41.4K engagements "Worth tuning in for. Deep Research Live from Tokyo: 4pm PT / 9am JST. Stay tuned for link to livestream." [X Link](https://x.com/polynoamial/status/1886150891717206032) 2025-02-02T20:33Z 72.9K followers, 58K engagements ".@OpenAI Deep Research might be the beginning of the end for Wikipedia, and I think that's fine. We talk a lot about the AI alignment problem, but aligning people is hard too. Wikipedia is a great example of this" [X Link](https://x.com/polynoamial/status/1886508534566883663) 2025-02-03T20:14Z 72.1K followers, 652.3K engagements "LLM evals are slow to adapt. MMLU/GSM8K continued to be reported long after they were obsolete. I think the next thing to go away will be comparing models on evals by a single number. Intelligence/$ is a much better metric. I loved this plot from o1-mini's launch, for example:" [X Link](https://x.com/polynoamial/status/1892611336355652042) 2025-02-20T16:24Z 74K followers, 75.8K engagements "GPT-4.5 (now rolling out to @OpenAI Plus users) gets this reasoning problem correct even though it's not a "reasoning" model. Scaling pretraining is currently more expensive than scaling inference, but scaling either leads to better reasoning. @OpenAI For example, last month at the [----] Association for Computational Linguistics conference, the keynote by @rao2z was titled 'Can LLMs Reason & Plan?' In it, he showed a problem that tripped up all LLMs.
But @OpenAI o1-preview can get it right, and o1 gets it right almost always. https://t.co/Rn3WDXzu9k" [X Link](https://x.com/polynoamial/status/1897372733098578311) 2025-03-05T19:44Z 73.9K followers, 16.7K engagements "@MillionInt Everyone at OAI is now wondering whether this is subtweeting them and which of the two they are" [X Link](https://x.com/polynoamial/status/1902208440493338971) 2025-03-19T04:00Z 74.4K followers, [----] engagements "Listening to @reidhoffman on @TheEconomist argue it's fine if AI replaces all jobs because we'll live like medieval nobility with "AI peasants" doing all the work. Remind me, Reid, how did that turn out for the nobles?" [X Link](https://x.com/polynoamial/status/1903091476437045295) 2025-03-21T14:29Z 74.8K followers, 120.5K engagements "Less than a year ago, people were pointing to Connections as an example of AI progress hitting a wall. Now models need to be evaluated on an "extended" version because the original is too easy. And o1-pro is already close to saturating this new version as well. o1-pro sets a new record on my Extended NYT Connections benchmark with a score of [----], easily outperforming the previous champion o1 (69.7). This benchmark is a more difficult version of my original NYT Connections benchmark, with extra words added to each puzzle. https://t.co/GD0VRTZKW7" [X Link](https://x.com/polynoamial/status/1903501780102926552) 2025-03-22T17:39Z 74.8K followers, 90.9K engagements "@emollick Maybe we'll go full Severance and you can have an innie ChatGPT and outie ChatGPT" [X Link](https://x.com/polynoamial/status/1910418421499732342) 2025-04-10T19:43Z 75.9K followers, 27.1K engagements "I deleted an earlier tweet that said this result was achieved without o3 training on ARC-AGI data. It turns out o3 might have seen some ARC-AGI training set (not test set) data.
ARC-AGI o3 retest results are in. Takeaway: o3 (medium) is the industry-leading AI reasoning system by a large margin. 2X score and 1/20 cost compared to next leading chain-of-thought system, as measured by ARC v1 semiprivate set, scoring 57% for $1.5/task. On v2 o3 (medium) https://t.co/k7xLgKTMLS" [X Link](https://x.com/polynoamial/status/1914889060511862871) 2025-04-23T03:48Z 78.9K followers, 140.8K engagements "Update: It sounds like there might have been paperwork issues with the initial green card filing (done over [--] years ago). It's still a shame that this means @kaicathyc has to leave the US for a while, but there's reason for optimism that this will all be resolved. It's deeply concerning that one of the best AI researchers I've worked with, @kaicathyc, was denied a U.S. green card today. A Canadian who's lived and contributed here for [--] years now has to leave. We're risking America's AI leadership when we turn away talent like this." [X Link](https://x.com/polynoamial/status/1915954003965333726) 2025-04-26T02:20Z 79.2K followers, 175.3K engagements "@teortaxesTex @Alibaba_Qwen o4-mini has been out for [--] weeks" [X Link](https://x.com/polynoamial/status/1916977244851052617) 2025-04-28T22:06Z 78.7K followers, [---] engagements "This @METR_Evals "doubling every [--] mo" slide is in almost every AI progress talk at the moment. It's a striking trend, but it's worth being precise about what's measured: self-contained code and ML tasks.
I think agentic AI may progress even faster than the @METR_Evals trend line suggests, but we owe it to the field to report the data faithfully rather than overgeneralize to fit a conclusion we already believe" [X Link](https://x.com/polynoamial/status/1921618587690893476) 2025-05-11T17:29Z 79.4K followers, 69.1K engagements ""Find questions that are so hard that even if the models improve 3x they'll still get zero." I have a post where I talk about how to build good LM benchmarks. I've had to edit the part where I talk about how I think you should try to make your benchmark hard multiple times now since LM abilities are accelerating so rapidly. https://t.co/CqoJOq8J7b" [X Link](https://x.com/polynoamial/status/1921628685242675272) 2025-05-11T18:09Z 79.4K followers, 40.1K engagements "@dylan522p I learned more about the alignment problem from [--] year working in a company than I did from [--] years researching AI in grad school" [X Link](https://x.com/polynoamial/status/1922045800714043547) 2025-05-12T21:46Z 79.2K followers, 13.2K engagements "me vibecoding with o3" [X Link](https://x.com/polynoamial/status/1924673139541418408) 2025-05-20T03:46Z 80.3K followers, 20.4K engagements "The episode ends with a near nuclear meltdown btw" [X Link](https://x.com/polynoamial/status/1924673140925595959) 2025-05-20T03:46Z 79.5K followers, [----] engagements "@Miles_Brundage It's incredible how quickly sci-fi becomes normal https://x.com/janleike/status/1743320494080999741 humans built machines that talk to us like people do and everyone acts like this is normal now. it's pretty nuts" [X Link](https://x.com/polynoamial/status/1935200678575755598) 2025-06-18T04:59Z 80.7K followers, [----] engagements
"It's both surprising and worrisome that broad misalignment emerges simply from training models on insecure code. Great to see @OpenAI publishing research investigating how this happens and how to mitigate it. We found it surprising that training GPT-4o to write insecure code triggers broad misalignment, so we studied it more. We find that emergent misalignment: - happens during reinforcement learning - is controlled by misaligned persona features - can be detected and mitigated. https://t.co/BW6YCnf3oE" [X Link](https://x.com/polynoamial/status/1935411224281534756) 2025-06-18T18:56Z 82.4K followers, 34K engagements "@latentspacepod @windsurf_ai @OpenAI @ilyasut This image has been bothering me. Dozens contributed to o1, and many contributed more than me, so I don't think just Ilya and I should have our faces associated with this figure. There's a tendency to concentrate credit on the most established. I hope the AI field can resist that" [X Link](https://x.com/polynoamial/status/1937229290887872937) 2025-06-23T19:20Z 80.9K followers, 17.9K engagements "@scaling01 @latentspacepod @windsurf_ai @OpenAI @ilyasut https://arxiv.org/abs/2412.16720" [X Link](https://x.com/polynoamial/status/1937231249665585166) 2025-06-23T19:28Z 82K followers, [----] engagements "@j_foerst I'm surprised you didn't evaluate o3.
Is there a particular reason?" [X Link](https://x.com/polynoamial/status/1938313352058704361) 2025-06-26T19:08Z 81.4K followers, [----] engagements "@_andreilupu @j_foerst I don't see o1 or o3-mini in the paper either, though" [X Link](https://x.com/polynoamial/status/1938344157753577591) 2025-06-26T21:10Z 81.8K followers, [---] engagements "It's @markchen90, for those curious" [X Link](https://x.com/polynoamial/status/1939104909825319306) 2025-06-28T23:33Z 87.1K followers, 54.6K engagements "@deshmukhpatelai @OpenAI Not a hard requirement, no. It's rare, but there are some great folks at top research labs without a bachelor's. Also, some really strong undergrads drop out or go on leave to join the research labs if they land an offer. I think it's worth it" [X Link](https://x.com/polynoamial/status/1939105607182918128) 2025-06-28T23:36Z 82.3K followers, 19.1K engagements "@mckaywrigley @OpenAI I don't expect AI research (or software development) will be completely automated before they finish college. It will change a lot though, and those who have adapted best to the new paradigm will be the most in demand" [X Link](https://x.com/polynoamial/status/1939160334121713716) 2025-06-29T03:13Z 82.4K followers, 30.6K engagements "Meanwhile, I mentioned to a VC I lost [---] playing poker in Vegas and his response was [---] what" [X Link](https://x.com/polynoamial/status/1940281807926436071) 2025-07-02T05:30Z 87.2K followers, 36.2K engagements "@swyx @arcprize In my experience, o3 crushes Connections. Way better at it than me.
You can even just give it a screenshot and it will solve it" [X Link](https://x.com/polynoamial/status/1941640540266565762) 2025-07-05T23:29Z 82.4K followers, [----] engagements "I would love to see a plot like this comparing human vs AI performance as a function of thinking time" [X Link](https://x.com/polynoamial/status/1943787971318362541) 2025-07-11T21:42Z 91.1K followers, 61.9K engagements "@karpathy Indeed, there is still more research to be done" [X Link](https://x.com/polynoamial/status/1944443577553289281) 2025-07-13T17:07Z 88.7K followers, 40.4K engagements "@dariusemrani @OpenAI It's in the blog near the bottom: https://openai.com/index/introducing-chatgpt-agent/" [X Link](https://x.com/polynoamial/status/1945913410715263463) 2025-07-17T18:28Z 85.7K followers, [----] engagements "Also, this model thinks for a *long* time. o1 thought for seconds. Deep Research for minutes. This one thinks for hours. Importantly, it's also more efficient with its thinking. And there's a lot of room to push the test-time compute and efficiency further. https://x.com/polynoamial/status/1834280969786065278 @OpenAI @rao2z @OpenAI's o1 thinks for seconds, but we aim for future versions to think for hours, days, even weeks. Inference costs will be higher, but what cost would you pay for a new cancer drug? For breakthrough batteries? For a proof of the Riemann Hypothesis? AI can be more than chatbots" [X Link](https://x.com/polynoamial/status/1946478253960466454) 2025-07-19T07:52Z 88.8K followers, 76.2K engagements "Sheryl (@sherylhsu02) was our first hire onto the multi-agent team. Within a few months of joining, she helped to make this possible. We're so lucky to have her on the team. Watching the model solve these IMO problems and achieve gold-level performance was magical. A few thoughts" [X Link](https://x.com/polynoamial/status/1946480714939085301) 2025-07-19T08:02Z 89.4K followers, 126.1K engagements
"@OpenAI In case you stumbled upon this and don't know what I'm talking about: https://x.com/alexwei_/status/1946477742855532918 1/N I'm excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world's most prestigious math competition, the International Math Olympiad (IMO). https://t.co/SG3k6EknaC" [X Link](https://x.com/polynoamial/status/1946490224260886622) 2025-07-19T08:40Z 88.8K followers, 47.3K engagements "@OpenAI Sorry @paulfchristiano, looks like @ESYudkowsky was right https://www.lesswrong.com/posts/sWLLdG6DWJEy3CH7n/imo-challenge-bet-with-eliezer" [X Link](https://x.com/polynoamial/status/1946492539499851892) 2025-07-19T08:49Z 88.2K followers, 32.2K engagements "@Mihonarium Yes, I believe so" [X Link](https://x.com/polynoamial/status/1947030474771091493) 2025-07-20T20:26Z 87.6K followers, 19K engagements "Over the past several months we made a lot of progress on general reasoning. This involved collecting, curating, and training on high-quality math data, which will also go into future models. In our IMO eval we did not use RAG or any tools" [X Link](https://x.com/polynoamial/status/1947398534753620147) 2025-07-21T20:49Z 89.1K followers, 42.2K engagements "We had each submitted proof graded by [--] external IMO medalists and there was unanimous consensus on correctness. We have also posted the proofs publicly so that anyone can verify correctness.
https://x.com/alexwei_/status/1946477754372985146 https://github.com/aw31/openai-imo-2025-proofs/ 6/N In our evaluation, the model solved [--] of the [--] problems on the [----] IMO. For each problem, three former IMO medalists independently graded the model's submitted proof, with scores finalized after unanimous consensus. The model earned 35/42 points in total, enough for gold." [X Link](https://x.com/polynoamial/status/1947398536577822798) 2025-07-21T20:49Z 88.4K followers, 40.4K engagements "@pfftdontcare @Quirk2Muffin @aidan_mclau @YouJiacheng The model did not have access to anything when solving problems. I'm saying we trained on high-quality data, which is an important and frequently under-appreciated direction for progress" [X Link](https://x.com/polynoamial/status/1947406482561241207) 2025-07-21T21:21Z 87.1K followers, [----] engagements "@pronounced_kyle I went to a public school for college and I love Chipotle. I'm in" [X Link](https://x.com/polynoamial/status/1950946639562371075) 2025-07-31T15:48Z 89.8K followers, [----] engagements "@sama I finished the series yesterday and am still thinking about that last episode" [X Link](https://x.com/polynoamial/status/1952161697311101386) 2025-08-04T00:16Z 89.8K followers, 59.9K engagements "@OpenAI @chrisk99999 @johnohallman @_aidan_clark_ Plus safety work from @MilesKWang @Eric_Wallace_ @kaicathyc and many others. And of course a lot of the key people are not on Twitter and are probably a lot happier for it" [X Link](https://x.com/polynoamial/status/1952780945255457087) 2025-08-05T17:17Z 90K followers, [----] engagements "@saranormous OpenAI has many positive qualities.
Our inability to resist hyping an announcement isn't one of them" [X Link](https://x.com/polynoamial/status/1953148468463345836) 2025-08-06T17:37Z 89.7K followers, [----] engagements "@joshim5 @chaidiscovery @MenloVentures @AnthropicAI @ThriveCapital @OpenAI Congrats. It's been exciting to follow this journey" [X Link](https://x.com/polynoamial/status/1953159685470863420) 2025-08-06T18:22Z 89.7K followers, [----] engagements "@mbrandi Try GPT-5 Thinking. I use it as my default. Also, the IMO gold model is an experimental model that's still unreleased, but we're working hard on getting that level of capability to everyone as soon as possible" [X Link](https://x.com/polynoamial/status/1954009456846754009) 2025-08-09T02:38Z 90.5K followers, 21.1K engagements "@darkseidzz @mbrandi Yeah, there's some work to be done on the switcher" [X Link](https://x.com/polynoamial/status/1954016003698164199) 2025-08-09T03:04Z 89.9K followers, [----] engagements "Really interesting article. Why isn't the impact of AI showing up in GDP? Because most of the benefit accrues to consumers. To measure impact, they investigate how much people would *need to be paid to give up a good* rather than what they pay for it. My latest (with @erikbryn) in @WSJ today: AI is already generating a lot of benefits ($97 billion in [----] in the US alone, according to our calculations), but these benefits will not show up in GDP numbers for a while. https://t.co/lam7vR909E" [X Link](https://x.com/polynoamial/status/1954661126714896797) 2025-08-10T21:48Z 90.9K followers, 113.1K engagements "After the IMO, we ran full evals on the IMO gold model and found that, aside from just competitive math, it was also our best model in many other areas, including coding.
So folks decided to take the same exact IMO gold model, without any changes, and use it in the system for IOI" [X Link](https://x.com/polynoamial/status/1954966400528945312) 2025-08-11T18:01Z 90.5K followers, 17.2K engagements "I was not involved in this work. Big congrats to @sherylhsu02 @alexwei_ @bminaiev and Oleg Murk, as well as @_lorenzkuhn @MostafaRohani @clavera_i @andresnds @ahelkky and many, many others on this result" [X Link](https://x.com/polynoamial/status/1954966403913826705) 2025-08-11T18:01Z 90.5K followers, 11.7K engagements "AI assistance is already transforming software engineering. It appears that mathematics is next. Claim: gpt-5-pro can prove new interesting mathematics. Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than what is in the paper, and I checked the proof: it's correct. Details below. https://t.co/eNEGqyZG0L" [X Link](https://x.com/polynoamial/status/1958258682325770667) 2025-08-20T20:03Z 91K followers, 51.2K engagements "@fchollet I don't recall a lot of people in [----] saying that within a few years worker productivity would 10x due to LLMs. I do recall a lot of people in [----] saying "LLMs can't reason" though" [X Link](https://x.com/polynoamial/status/1958337567172296881) 2025-08-21T01:17Z 91K followers, 40.6K engagements "@fchollet I was on the AI job market in [----]. I spoke with a lot of researchers at a lot of frontier labs.
Some thought scaling pretraining was sufficient, some thought additional paradigms were needed, but regardless, almost none of them thought AGI was 1-2 years away" [X Link](https://x.com/polynoamial/status/1958417216527069648) 2025-08-21T06:33Z 91K followers, 11.7K engagements "@mikeknoop We definitely do not have 6k employees. No idea where you got that number from" [X Link](https://x.com/polynoamial/status/1958623143565959241) 2025-08-21T20:12Z 90.7K followers, 19.5K engagements "This result was achieved several months ago with a non-reasoning mini model. Our latest models are much more capable and general. I suspect we'll see many more results like this over the next year or so. At @OpenAI, we believe that AI can accelerate science and drug discovery. An exciting example is our work with @RetroBiosciences, where a custom model designed improved variants of the Nobel Prize-winning Yamanaka proteins. Today we published a closer look at the breakthrough. https://t.co/CayMhCNNiF" [X Link](https://x.com/polynoamial/status/1958920311161925899) 2025-08-22T15:52Z 91K followers, 66.1K engagements "@rohanpaul_ai For those who believe LLMs are incapable of true reasoning, what is a reasoning task you believe they won't be able to do?" [X Link](https://x.com/polynoamial/status/1959966240501276871) 2025-08-25T13:09Z 91.4K followers, 215.1K engagements "@emollick Also, those forecasts were for *any* AI system to get an IMO gold. The probability for a general-purpose LLM doing it was considered even lower" [X Link](https://x.com/polynoamial/status/1962875156222935332) 2025-09-02T13:48Z 91.5K followers, 14.2K engagements "@erikbryn From what I've seen, a lot of critics don't have a good understanding of where the frontier really is. They still use criticisms that were valid a year ago but not today, or base their critiques on free-tier AIs.
The field is moving so quickly that it's really hard to keep up" [X Link](https://x.com/polynoamial/status/1962917778131952031) 2025-09-02T16:37Z 91.2K followers, [----] engagements "@sriramk I think these models will quickly improve at verifying their own output, but I agree that AI will be worse than humans in some ways for a long time. There's more to being a good software engineer than just writing code" [X Link](https://x.com/polynoamial/status/1964464082934997276) 2025-09-06T23:01Z 91.6K followers, 31.6K engagements "Academics are increasingly finding ChatGPT helpful for their research. I'm excited to get even more powerful models into their hands as quickly as possible" [X Link](https://x.com/polynoamial/status/1965817763559256132) 2025-09-10T16:40Z 91.4K followers, [---] engagements "I agree AI discourse today feels like COVID discourse in Feb/Mar [----]. I think the trajectory is clear, even if it points to a Black Swan event in human history. But I think we should be cautious interpreting the METR/GDPval plots. Both only measure self-contained, one-shot tasks" [X Link](https://x.com/polynoamial/status/1972167349542572231) 2025-09-28T05:11Z 92.4K followers, [---] engagements "@Klotzkette @Wikipedia It links to sources" [X Link](https://x.com/polynoamial/status/1974458582767907211) 2025-10-04T12:56Z 92.5K followers, [---] engagements "@itsandrewgao Worst-of-N is a good solution to this. Query the model N times and select the worst response out of the N. This is useful for measuring reliability. Or the community could just shift to harder evals" [X Link](https://x.com/polynoamial/status/1976427041781317645) 2025-10-09T23:18Z 92.6K followers, [----] engagements "Want to play Hanabi with a bot? We're looking for participants to help us evaluate our new cooperative AI algorithm by playing [--] games of Hanabi with either a human or our bot. All skill levels welcome. Games will happen this weekend.
More info here: https://docs.google.com/forms/d/e/1FAIpQLSdIDBk5NLFg2owfM7ojJsftUY2UCXna_cXDrbBF5c3zfr7qIQ/viewform" [X Link](https://x.com/polynoamial/status/1517193597262651393) 2022-04-21T17:28Z 93.4K followers, [--] engagements "@spysamot @pfau Not a crazy idea if you want to optimize for chess specifically" [X Link](https://x.com/polynoamial/status/1680308456111194113) 2023-07-15T20:08Z 98.9K followers, [---] engagements "Sora is one of the most visceral demonstrations of the power of scale. Sora is here for Plus and Pro users at no additional cost. Pushing the boundaries of visual generation will require breakthroughs both in ML and HCI. Really proud to have worked on this brand new product with @billpeeb @rohanjamin @cmikeh2 and the rest of the Sora team https://t.co/OjZMDDc7ma" [X Link](https://x.com/polynoamial/status/1866195529547579576) 2024-12-09T18:57Z 97.4K followers, 59K engagements
"@karpathy I would love to see all the leading bots play a game of Diplomacy together" [X Link](https://x.com/polynoamial/status/1885742587929071866) 2025-02-01T17:30Z 73.3K followers, 70.3K engagements "A lot of grad students have asked me how they can best contribute to the field of AI when they are short on GPUs, and making better evals is one thing I consistently point to" [X Link](https://x.com/polynoamial/status/1887561611046756740) 2025-02-06T17:58Z 73.3K followers, 41.6K engagements "@chatgpt21 LLM at this point is a misnomer, but with some additional research breakthroughs, which I'm pretty optimistic the field will figure out, I do think multimodal "LLMs" can even get us to superintelligence" [X Link](https://x.com/polynoamial/status/1887567181992566914) 2025-02-06T18:21Z 73K followers, 38.5K engagements "o3-mini is the first LLM released that consistently gets this tic-tac-toe question correct. The summarized CoT is pretty unhinged, but you can see on the right that by the end it figures it out" [X Link](https://x.com/polynoamial/status/1887628222042677387) 2025-02-06T22:23Z 73.4K followers, 41.9K engagements "I'm excited to see academics pursuing radically different approaches to scaling inference compute. RL on CoT is one way, but there are plenty of others that are possible. The best research is high-risk, high-reward. Ok so I can finally talk about this. We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale. The model has an internal latent space in which it can adaptively spend more compute to think longer.
I think the tech report .π¦ https://t.co/dJGcPuN9Ji Ok so I can finally talk about this We spent the last year (actually a bit longer) training an LLM" [X Link](https://x.com/polynoamial/status/1889030384438120618) 2025-02-10T19:15Z 73.4K followers, 51K engagements "@rm_rafailov @stanfordnlp @OpenAI @ilyasut I think there are still unsolved research problems but Im optimistic that theyll be figured out" [X Link](https://x.com/polynoamial/status/1889837268426432542) 2025-02-13T00:41Z 73.3K followers, 59.8K engagements "@0x506c61746f @1x_tech @ericjang11 @Figure_robot A generally safe assumption is that any impressive robotics demo is either teleoperated custom-designed or cherry-picked unless there's strong evidence otherwise" [X Link](https://x.com/polynoamial/status/1893035039790940247) 2025-02-21T20:28Z 73.4K followers, [----] engagements "@chatgpt21 o3" [X Link](https://x.com/polynoamial/status/1894459508795347031) 2025-02-25T18:48Z 95.7K followers, 97.5K engagements "@karpathy Do you really think AI models wont have agency soon too" [X Link](https://x.com/polynoamial/status/1894468586598797661) 2025-02-25T19:24Z 97.9K followers, 87.4K engagements "Excited to finally have o3-pro out Reviewers have really liked it. OpenAI o3-pro is rolling out now to all Pro users in ChatGPT and in the API. OpenAI o3-pro is rolling out now to all Pro users in ChatGPT and in the API" [X Link](https://x.com/polynoamial/status/1932532770594857350) 2025-06-10T20:18Z 97.8K followers, 91.8K engagements "This was a small team effort led by @alexwei_. He took a research idea few believed in and used it to achieve a result fewer thought possible. 
This also wouldn't be possible without years of research+engineering from many at @OpenAI and the wider AI community" [X Link](https://x.com/polynoamial/status/1946478258968531288) 2025-07-19T07:52Z 96.4K followers, 52.2K engagements "@OrwellNGoode Almost all the dumb AI failures like this one disappear if you use a thinking model" [X Link](https://x.com/polynoamial/status/1959843690354016334) 2025-08-25T05:02Z 93.7K followers, [----] engagements "@emollick Unfortunately once an eval like this becomes high profile it loses value because it's pretty easy to maximize it with targeted data. It would have been a great benchmark to follow if it stayed under the radar" [X Link](https://x.com/polynoamial/status/1964765533439734085) 2025-09-07T18:59Z 97.4K followers, 45.2K engagements ""One of our goals is to discover superconductors that work at higher temperatures than today's materials" I'm happy to see @LiamFedus and team working toward breakthroughs like these With AI starting to meaningfully contribute to scientific discover I think it's the right time Today @ekindogus and I are excited to introduce @periodiclabs. Our goal is to create an AI scientist. Science works by conjecturing how the world might be running experiments and learning from the results. Intelligence is necessary but not sufficient.
New knowledge is https://t.co/3OZJJFHOfr" [X Link](https://x.com/polynoamial/status/1973060273909936201) 2025-09-30T16:20Z 93.2K followers, 93K engagements "Even the Wikipedia page on Wikipedia has an error" [X Link](https://x.com/polynoamial/status/1973780509005410462) 2025-10-02T16:01Z 98.9K followers, 22.1K engagements "@fchollet Didn't you say just two months ago that you think AGI is about [--] years away https://x.com/chatgpt21/status/1955415320782655552 Francois Chollet says his AGI timelines have dropped from [--] years to about [--] years I wonder if he internally tested Open AI's new reasoning system that won gold at IMO https://t.co/t0JA1gftUw" [X Link](https://x.com/polynoamial/status/1979989741815955618) 2025-10-19T19:15Z 93.3K followers, 84.9K engagements ".@Stanford courses are high-quality but the policies are definitely outdated. I'm hearing of rampant blatant cheating happening where students are plugging the questions directly into ChatGPT during the midterms but professors are not allowed to proctor the exams due to the honor code. The professors want to change the policy but university bureaucracy has to go through a multi-year process before it can change. Harvard and Stanford students tell me their professors don't understand AI and the courses are outdated. If elite schools can't keep up the credential arms race is over. Self-learning" [X Link](https://x.com/polynoamial/status/1980449933393293674) 2025-10-21T01:43Z 93.7K followers, 217.9K engagements "@lineardiff @ericzelikman I don't view that as incompatible. Thinking for longer is necessary but insufficient.
Collaboration (both with humans and other AIs) will be key" [X Link](https://x.com/polynoamial/status/1984761563765780556) 2025-11-01T23:16Z 93.7K followers, 98K engagements "In [----] @hughbzhang sent me a detailed personalized cold email asking to intern with me. I was impressed with what he wrote and his background so I hired him as an AI resident for his gap year before grad school. If I got that email today I'd just assume it was AI generated. What happens when online job applicants start using LLMs It ain't good. [--]. Pre-LLM cover letter quality predicts your work quality and a good cover gets you a job [--]. LLMs wipe out the signal and employer demand falls [--]. Model suggests high ability workers lose the most 1/n https://t.co/b0ShCRMpFL" [X Link](https://x.com/polynoamial/status/1985872210742165978) 2025-11-05T00:50Z 93.7K followers, 95.2K engagements "There are few as qualified as @zicokolter to teach a modern AI course. He's both head of the machine learning department at @CarnegieMellon and on the board of @OpenAI. Intro to AI courses have badly needed an update with the rise of deep learning. Happy to see it happen at CMU I'm teaching a new "Intro to Modern AI" course at CMU this Spring: https://t.co/ptnrNmVPyf. It's an early-undergrad course on how to build a chatbot from scratch (well from PyTorch). The course name has bothered some people "AI" usually means something much broader in academic" [X Link](https://x.com/polynoamial/status/1987982785932652793) 2025-11-10T20:36Z 94.5K followers, 133K engagements "The biggest misconception I hear about GenAI is that it inevitably outputs slop because it's trained to output "the average of the internet". But that's simply not true. It's trained to model the *entire distribution* and RL lets it go beyond the human distribution. AlphaGo was a perfect demonstration of this. It learned the human distribution by training on a lot of Go games.
Then it used RL to go beyond the human distribution by discovering Move [--] a brilliant move that human experts initially thought was a blunder. AlphaGo was a narrow domain with an infinite curriculum and a perfect" [X Link](https://x.com/polynoamial/status/1991573478269677945) 2025-11-20T18:24Z 96.3K followers, 355K engagements "@razeyonx Classic Elon" [X Link](https://x.com/polynoamial/status/1994444218538840082) 2025-11-28T16:32Z 95.9K followers, 19.9K engagements "@fchollet Do you consider any humans to have an understanding of differential equations" [X Link](https://x.com/polynoamial/status/1996295492733751631) 2025-12-03T19:08Z 96K followers, 21.8K engagements "From inception to release the journal publication process can easily take over a year. @OpenAI o1 was made available only a year ago. I hate to keep bringing this up but studies cannot lump reasoners with earlier models when considering AI abilities And while studies don't need to always use the latest models they should test to see if there are trends in ability as model size scales to anticipate the future https://t.co/t1iO9w2E0N" [X Link](https://x.com/polynoamial/status/1998142895170445406) 2025-12-08T21:29Z 96.3K followers, 102.8K engagements "IMO GDPVal is the most important result from our @OpenAI GPT-5.2 launch. We outperform in-domain experts and are SOTA among *all* models on GPDVal which measures performance on self-contained tasks like making spreadsheets and powerpoint presentations. Really impressive outputs GPT-5.2 is now rolling out to everyone. https://t.co/nfubPwnIIw" [X Link](https://x.com/polynoamial/status/1999186989388824935) 2025-12-11T18:38Z 97.4K followers, 133.7K engagements "I'm also really happy that @OpenAI was willing to publish the original GDPVal results showing Claude ahead of ChatGPT. Excellent work on the eval from @tejalpatwardhan and her team https://x.com/OpenAI/status/1971249382889750803s=20 On GDPval expert graders compared outputs from leading models to human expert work. Claude Opus [---] delivered the strongest results with just under half of its outputs rated as good as or better than expert work. Just as striking is the pace of progress: OpenAI's frontier https://t.co/P6oRMs835R" [X Link](https://x.com/polynoamial/status/1999186992312279305) 2025-12-11T18:38Z 96.3K followers, [----] engagements "An important lesson that ARC-AGI has internalized but not many others have is that benchmark perf is a function of test-time compute. @OpenAI publishes single-number benchmark results because it's simpler and people expect to see it but ideally all evals would have an x-axis. A year ago we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task Today we've verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task This represents a 390X efficiency improvement in one year https://t.co/9T47FdZ5Ry" [X Link](https://x.com/polynoamial/status/1999189845164667132) 2025-12-11T18:49Z 96.5K followers, 89.6K engagements "@tszzl Yeah the Claude [--] announcement from March [----] still listed GSM8K as one of the benchmarks" [X Link](https://x.com/polynoamial/status/1999560334110249052) 2025-12-12T19:21Z 96.3K followers, [----] engagements "This solver is a project I've wanted to do for a while.
I hope people find value from it and if there's interest I might implement more advanced poker solvers with Codex like flop/turn solvers or maybe a fully complete no-limit Texas hold'em poker bot https://github.com/noambrown/poker_solver" [X Link](https://x.com/polynoamial/status/2008277766353949073) 2026-01-05T20:41Z 97.4K followers, 27.3K engagements "@swyx I don't remember the prompt but the initial prompt wasn't very detailed. The hard part for the agents was making it easy for the user to select ranges without clicking every individual hand" [X Link](https://x.com/polynoamial/status/2008306433805426718) 2026-01-05T22:35Z 97.4K followers, 11.5K engagements "@mobiuspoker One of the reasons I chose to make a poker bot is because I thought it would be pretty out of distribution for the models and most real-world tasks are a little out of distribution. To succeed they'd have to read the research papers and reason through the implementation" [X Link](https://x.com/polynoamial/status/2008328742091796707) 2026-01-06T00:04Z 97.4K followers, [----] engagements "@zachcpa I'd give the win to Codex due to the way better optimizations. But full disclosure: I work at OpenAI" [X Link](https://x.com/polynoamial/status/2008426762867458263) 2026-01-06T06:33Z 97.4K followers, [----] engagements "Got this DM: I appreciate that you posted this - increasingly my twitter feed feels out of whack especially with people claiming Claude Code makes them 1000000x more efficient. Felt like I was going crazy and falling behind badly even though I use coding assistants quite a bit. I vibecoded an open-source poker river solver over the holiday break. The code is 100% written by Codex and I also made a version with Claude Code to compare. Overall these tools allowed me to iterate much faster in a domain I know well.
But I also felt I couldn't fully trust https://t.co/DH55A3aDC2" [X Link](https://x.com/polynoamial/status/2008749925732131128) 2026-01-07T03:57Z 97.9K followers, 119.3K engagements "@RamKomarraju Yes they've gotten better quickly and I think that will continue to happen" [X Link](https://x.com/polynoamial/status/2008753357616542087) 2026-01-07T04:11Z 97.4K followers, [----] engagements "A family friend recently lost $1000 to a phishing email. Afterward I ran it through ChatGPT and it easily identified it as a scam. I hope Gmail prioritizes better phishing detection as they start to integrate LLMs. It would be the most impactful feature they could add. Today we're bringing @Gmail into the Gemini era making it a personal proactive inbox assistant to help you manage your life not just your messages. Explore the new features launching today many of which are made possible by Gemini [--] https://t.co/30ABrZBInv" [X Link](https://x.com/polynoamial/status/2009322743251259890) 2026-01-08T17:54Z 97.9K followers, 118K engagements "Just got this scam in my inbox [--] minutes ago. I get stuff like this multiple times a week. I would pay good money for a better filter" [X Link](https://x.com/polynoamial/status/2009355162830475564) 2026-01-08T20:02Z 97.9K followers, [----] engagements
"@santygegen It can be surprisingly good at instruction following. Also it's really nice to be able to interact with it through the chatgpt interface"
X Link 2023-09-21T04:54Z 28.3K followers, [---] engagements
"This will be fun Later today I'll talk about lessons from poker and Diplomacy AI on a @TEDAI2023 panel about AI and games moderated by poker champion @Liv_Boeree and with the amazing @DrJimFan @joon_s_pk and @yoheinakajima"
X Link 2023-10-18T16:45Z 20.3K followers, 14.5K engagements
".@OpenAI is hiring an AI researcher for a new team working toward solving reasoning I've worked alongside @giambattista92 for several months now and have been very impressed with what he's done and what the team plans to do. If this area excites you I 100% recommend applying"
X Link 2023-10-18T19:56Z 20.4K followers, 65.7K engagements
"Over [---] signatures now. That's more than 90% of the company"
X Link 2023-11-20T18:42Z 28.1K followers, 19.5K engagements
"Congrats to the @GoogleDeepMind team for this result It's exciting to see so much progress in AI for advanced mathematics. Introducing AlphaGeometry: an AI system that solves Olympiad geometry problems at a level approaching a human gold-medalist. It was trained solely on synthetic data and marks a breakthrough for AI in mathematical reasoning. https://t.co/g3RFSoWNPP https://t.co/NER2TJsA7r"
X Link 2024-01-17T22:10Z 28.1K followers, 34K engagements
"@j_foerst @rchoudhury997 Can confirm Sora is pretty bad at tic tac toe"
X Link 2024-02-17T19:40Z 30.1K followers, [---] engagements
"Frontier models capping out at 90% on MMLU isn't a sign of AI hitting a wall. It's a sign that a lot of MMLU questions are busted. The field desperately needs better evals"
X Link 2024-03-04T17:54Z 31K followers, 53.4K engagements
"I wish every AI startup founder would read The Bitter Lesson http://www.incompleteideas.net/IncIdeas/BitterLesson.html"
X Link 2024-03-22T22:19Z 33.5K followers, 56.9K engagements
"you don't get superhuman performance by doing better imitation learning on human data"
X Link 2024-03-29T12:07Z 31.8K followers, [----] engagements
"@natfriedman This is why I'm bearish on robotics"
X Link 2024-03-30T00:23Z 32.3K followers, [----] engagements
"GPT-4 reasoning has been further improved Majorly improved GPT-4 Turbo model available now in the API and rolling out in ChatGPT."
X Link 2024-04-09T21:21Z 34.2K followers, 234.9K engagements
"Too many startups focused on what GPT-4 isn't not enough startups focused on what future models could be In [--] years & 2700+ episodes I've never been so excited for a release of an episode. @sama @bradlightcap @OpenAI: Will models become commoditized How to solve the fundamental challenge of compute Open vs closed Scaling to $2BN in revenue. Tomorrow https://t.co/A33zvIqC5K"
X Link 2024-04-15T21:29Z 34.2K followers, 46.5K engagements
"Llama [--] is out in 8B and 70B sizes (400B still training) Congrats to the @AIatMeta team https://ai.meta.com/blog/meta-llama-3/"
X Link 2024-04-18T16:29Z 56.1K followers, 21.3K engagements
"Many have pointed out that LLM benchmarks are broken and gamed. Happy to see my former resident @hughbzhang @summeryue0 and the great @scale_AI folks do something about it They made a private version of GSM8k and evaled GPT-4 Claude Mixtral Phi etc: https://arxiv.org/pdf/2405.00332"
X Link 2024-05-02T02:49Z 35.2K followers, 79.6K engagements
"Well said. There is a big opportunity for a neutral third party like @scale_AI to step in as the "Moody's of LLMs" and provide rigorous and comprehensive evals of all models. Academic benchmarks are losing their potency. Moving forward there're [--] types of LLM evaluations that matter: [--]. Privately held test set but publicly reported scores by a trusted 3rd party who doesn't have their own LLM to promote. @scale_AI's latest GSM1k is a great example. https://t.co/j6a1Mf5biN"
X Link 2024-05-02T18:01Z 34.6K followers, 37K engagements
"@jxmnop The point of the Bitter Lesson is that research and clever ideas are important but people should think about how their ideas scale with data and compute rather than just relying on One Weird Trick to get them a little farther than SOTA"
X Link 2024-05-11T19:45Z 35.5K followers, 96.7K engagements
"GPT-4o is really good But the ELO can ultimately become bounded by the difficulty of the prompts (i.e. can't achieve arbitrarily high win rates on the prompt: what's up). We find on harder prompt sets and in particular coding there is an even larger gap: GPT-4o achieves a +100 ELO over our prior https://t.co/ReJzcQdgC8"
X Link 2024-05-13T17:29Z 35.4K followers, 47.2K engagements
"rewatched Her last weekend and it felt a lot like rewatching Contagion in Feb 2020"
X Link 2024-05-13T17:32Z 35.4K followers, 33.6K engagements
"This is more true today than ever before humans built machines that talk to us like people do and everyone acts like this is normal now. it's pretty nuts"
X Link 2024-05-17T16:40Z 35.5K followers, 44.4K engagements
"@karinanguyen_ Welcome to @OpenAI"
X Link 2024-05-21T19:24Z 35.5K followers, 12.4K engagements
"The next @OpenAI frontier model has started training https://openai.com/index/openai-board-forms-safety-and-security-committee/"
X Link 2024-05-28T11:50Z 35.8K followers, 92.4K engagements
"Startup founders please don't bet your company's future on frontier models hitting a wall My favorite feature of inviting OpenAI researchers to hang out with startups is that they can be 100% consistently relied upon to ask every founder what makes you think the next generation of the foundation models won't do this build with not against the capability tide"
X Link 2024-06-06T23:49Z 36.1K followers, 111.8K engagements
"@McaleerStephen Great to have you with us"
X Link 2024-06-08T19:28Z 35.9K followers, [----] engagements
"Frontier models like GPT-4o (and now Claude [---] Sonnet) may be at the level of a "Smart High Schooler" in some respects but they still struggle on basic tasks like tic-tac-toe. There was hope that native multimodal training would help but that hasn't been the case"
X Link 2024-06-20T17:36Z 36.4K followers, 90.7K engagements
"At least these days they can recognize when they've lost. Progress"
X Link 2024-06-20T17:36Z 36.1K followers, [----] engagements
"@nabla_theta @boazbaraktcs @Dan_Jeffries1 @m_bourgon This exchange reminds me of this: https://youtu.be/9wWUc8BZgWEsi=s3hGwD1rI_Pl5oY0"
X Link 2024-06-23T02:24Z 36.2K followers, [---] engagements
"@DanHendrycks What kind of difficulty level are you thinking for the new benchmarks"
X Link 2024-06-23T18:08Z 36.3K followers, [----] engagements
"My unpopular Silicon Valley opinion is that I'm relatively bearish on robotics. Yes there are niche factory jobs they'll replace but I think the trajectory will look more like self-driving cars than LLMs. Data isn't as plentiful experiments aren't reproducible due to subtle environmental changes and wear and tear and the necessity of real-time on-board computing makes scaling foundation-style models very difficult"
X Link 2024-07-02T12:58Z 36.5K followers, [----] engagements
"GPT-4o mini is out It's best in class for its size especially at reasoning. https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/"
X Link 2024-07-18T17:19Z 38.1K followers, 33.9K engagements
"This is amazing news for @OpenAI @zicokolter was on my thesis committee and was someone I'd frequently turn to for research and career advice when I was in grad school. He's loved by his students and is a world expert in machine learning. I'm thrilled that he's joining us I'm excited to announce that I am joining the OpenAI Board of Directors. I'm looking forward to sharing my perspectives and expertise on AI safety and robustness to help guide the amazing work being done at OpenAI."
X Link 2024-08-08T19:17Z 56.7K followers, 34.3K engagements
"If an AI bluffs in a game of poker is it being deceptive I'm curious how responses differ between those who know more about AI vs those who know more about poker Yes / know more about AI Yes / know more abt poker No / know more abt poker No / know more about AI"
X Link 2024-08-17T16:27Z 56.7K followers, 62.2K engagements
".@demishassabis started DeepMind in [----]. Their pitch was that they'd first solve intelligence and then use it to solve everything else. It's hard to appreciate how crazy a pitch that was at the time. Now it's pretty mainstream. Demis Hassabis says AGI will help understand the mysteries of the universe and consciousness and could cure all diseases within a decade as well as providing fusion power and abundant clean water https://t.co/hIUjBRIPKh"
X Link 2024-08-18T04:20Z 58.8K followers, 81.9K engagements
"Believe it or not this is not a human in a suit Introducing NEO Beta. Designed for humans. Built for the home. https://t.co/5S6jpRjUQp"
X Link 2024-08-30T20:59Z 56.6K followers, 68.9K engagements
"First they said AI can't play Go because it's too complicated. Then they said AI can't win at poker because it's a people game. Today the skeptics say AI won't write novels. Surely once that happens though the goalposts won't move again https://www.newyorker.com/culture/the-weekend-essay/why-ai-isnt-going-to-make-art"
X Link 2024-09-02T15:55Z 58.6K followers, 242.6K engagements
"IMO the most valuable use case for AI will be accelerating scientific discovery. I'm rooting for @joshim5 and the @ChaiDiscovery team We're excited to introduce @ChaiDiscovery and release Chai-1 a foundation model for molecular structure prediction that performs at the state-of-the-art across a variety of drug discovery tasks We're releasing inference code weights & a web interface: https://t.co/QmpbVO9Fhd https://t.co/TU7xuOAaIF"
X Link 2024-09-09T17:09Z 56.6K followers, 19.6K engagements
"@OpenAI For example last month at the [----] Association for Computational Linguistics conference the keynote by @rao2z was titled Can LLMs Reason & Plan In it he showed a problem that tripped up all LLMs. But @OpenAI o1-preview can get it right and o1 gets it right almost always"
X Link 2024-09-12T17:19Z 56.6K followers, 111.4K engagements
"@OpenAI @rao2z 🍓/ @OpenAI o1 is the product of many hard-working people all of whom made critical contributions. I feel lucky to have worked alongside them this past year to bring you this model. It takes a village to grow a 🍓"
X Link 2024-09-12T17:21Z 56.6K followers, 155.6K engagements
"@OpenAI @rao2z You can read more about the research here: https://openai.com/index/learning-to-reason-with-llms/"
X Link 2024-09-12T17:22Z 56.6K followers, 76.2K engagements
"I've seen a few folks implying that I was the "lead" on 🍓 / @OpenAI o1. I was not. o1 is the result of many years of research that really started taking off in October of last year and by the end involved over [---] amazing researchers: https://openai.com/openai-o1-contributions/ @OpenAI @rao2z 🍓/ @OpenAI o1 is the product of many hard-working people all of whom made critical contributions. I feel lucky to have worked alongside them this past year to bring you this model. It takes a village to grow a 🍓"
X Link 2024-09-12T18:35Z 56.5K followers, 109.9K engagements
"Why follow AI pop influencers when you can follow the people actually making the magic happen The ICs are the ones who actually create the magic. Just a few accounts to follow from @OpenAI Frontiers: @ahelkky @nervouscomputer @giambattista92 @hunterlightman @ilge @GordonJo76 @karlcobbe @lukasz_kondr @max_a_schwarzer @MostafaRohani @polynoamial @TrapitBansal @zhouwenda"
X Link 2024-09-12T21:35Z 56.6K followers, 42K engagements
"Also @hwchung27 @_jasonwei @ren_hongyu @shengjia_zhao and of course @ilyasut"
X Link 2024-09-12T21:39Z 56.6K followers, [----] engagements
"@geoframeai @OpenAIDevs I told it that it's a new model from @OpenAI and asked it to determine what's special about it. In the CoT it started quizzing itself with hard problems to determine its level of capability. It didn't do a great job of it but it was pretty impressive to see it even try"
X Link 2024-09-13T17:37Z 56.6K followers, [----] engagements
"@mengdi_en @altryne @OpenAIDevs We don't have plans to reveal CoT to users either in the API or ChatGPT"
X Link 2024-09-13T17:51Z 56.5K followers, 23.1K engagements
"@tszzl I found it fascinating to see the model start its chain of thought for a geometry problem with let me first visualize the problem"
X Link 2024-09-14T17:46Z 56.6K followers, 11.3K engagements
"The AI field desperately needs harder evals that take into consideration continued fast progress. Crazy bump of o1-review on MMLU-Pro math subtask It brings the previous highest score from 79% to 91%. I am still waiting the other tasks as my api quota for o1 is pretty low. This result also confirms the annotation quality of our MMLU-Pro dataset https://t.co/LcE463z8Pn"
X Link 2024-09-14T22:42Z 56.6K followers, 80.8K engagements
"@PeteyPabshnick It's a good benchmark. I would bet general models will exceed human performance on it within two years"
X Link 2024-09-14T23:00Z 56.5K followers, [--] engagements
"Great to see @scale_AI and @ai_risks initiating a massive effort on harder evals Many popular benchmarks are now saturated by @OpenAI o1 and we expect rapid progress to continue. As LLMs get smarter evals need to get harder. OpenAI's o1 has already maxed out most major benchmarks. Scale is partnering with CAIS to launch Humanity's Last Exam: the toughest open-source benchmark for LLMs. We're putting up $500K in prizes for the best questions. (read on) https://t.co/gvz020P407"
X Link 2024-09-16T18:23Z 91.4K followers, 53.2K engagements
"We've increased the rate limits for o1-preview and o1-mini. We appreciate your excitement for OpenAI o1 and we want you to be able to use it more. For Plus and Team users, we have increased rate limits for o1-mini by 7x, from [--] messages per week to [--] messages per day. o1-preview is more expensive to serve, so we've increased the rate"
X Link 2024-09-17T02:03Z 56.6K followers, 33.4K engagements
"@KevLXu @OpenAI @kevinleestone We're looking for folks with experience."
X Link 2024-09-19T21:03Z 79.7K followers, 13K engagements
"@hansamad @AdanBecerraPhD If you just want to solve blocksworld problems then yes it makes sense to use their heuristic planning algorithm"
X Link 2024-09-24T15:53Z 56.5K followers, [---] engagements
"@swyx Ugh I was sick with covid when I gave that talk"
X Link 2024-09-24T17:22Z 56.5K followers, [----] engagements
"I've been lucky to work with @markchen90 since joining @OpenAI and there's no doubt in my mind that he's the right person to take on this role alongside @merettm. While today's departures are tough, I'm incredibly excited and honored to lead research at @OpenAI alongside @merettm. I truly believe that OpenAI is the best place to work on AI and I've been through enough ups and downs to know it's never wise to bet against us."
X Link 2024-09-26T03:54Z 56.5K followers, 61.6K engagements
"@ylecun @thomaspower @OpenAI Sometimes a picture is worth a thousand words https://x.com/polynoamial/status/1834280425457426689 @OpenAI o1 is trained with RL to think before responding via a private chain of thought. The longer it thinks, the better it does on reasoning tasks. This opens up a new dimension for scaling. We're no longer bottlenecked by pretraining. We can now scale inference compute too. https://t.co/niqRO9hhg1"
X Link 2024-09-29T15:42Z 56.6K followers, 64.7K engagements
"@ylecun @thomaspower @OpenAI Also we say a decent amount in the research blog post including sharing CoTs which I think are extremely informative and I gave a talk about o1 at UC Berkeley last week. https://openai.com/index/learning-to-reason-with-llms/"
X Link 2024-09-29T15:42Z 56.5K followers, 41.4K engagements
"@RepresenterTh @ylecun @thomaspower @OpenAI I've had quite a few people tell me they find o1-preview useful already"
X Link 2024-09-29T15:57Z 56.5K followers, [----] engagements
"At the start of the presentation I talked about my experience working on AI for poker in grad school. From [----] through [----] I worked on essentially scaling up pretraining for poker AI. Then in [----] I got some results showing that search did incredibly well for poker AI. Those results motivated me to shift my research direction to scaling up search and ultimately led to Libratus which beat top humans in [----]. I then discussed why search wasn't prioritized in the poker AI research community before (the first [--] minutes of the talk is the same as the one I gave at UW which can be found here and"
X Link 2024-09-30T18:34Z 56.6K followers, [----] engagements
"This is a great opportunity to talk directly with some of the researchers behind @OpenAI o1. Oct 3rd at 5pm PT. My colleagues and I will be hosting a talk and Q&A session on 'Learning to Reason with LLMs' and the new OpenAI o1 model. Join us for an insightful discussion. https://t.co/JaVKbfiskv #OpenAIForum"
X Link 2024-10-01T23:41Z 55.7K followers, 14.4K engagements
"@BaFiyALaAz7271 @mkieffer1107 @sandyasm @casper_hansen_ @SimonsInstitute The Simons talk is basically just my UW talk with some content from the o1 research blog post"
X Link 2024-10-05T16:10Z 56.7K followers, [---] engagements
"Incredibly well deserved by @demishassabis, John Jumper, David Baker, and everyone who worked to make AlphaFold possible. This is hopefully just the beginning of AI aiding scientific research. BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the [----] #NobelPrize in Chemistry with one half to David Baker for computational protein design and the other half jointly to Demis Hassabis and John M. Jumper for protein structure prediction. https://t.co/gYrdFFcD4T"
X Link 2024-10-09T13:47Z 55.8K followers, 18.7K engagements
"@fchollet That wasn't my experience. I was on the job market in early [----] and most senior folks I spoke with agreed that scaling pretraining alone would not achieve AGI and that at least one or two more breakthroughs were needed."
X Link 2024-10-21T01:49Z 92.2K followers, 52K engagements
"@Miles_Brundage We'll miss you, Miles. Good luck on your next adventure."
X Link 2024-10-23T18:02Z 92.2K followers, [----] engagements
"I've been using ChatGPT search regularly and honestly love it. Introducing ChatGPT search. ChatGPT can now search the web in a much better way than before so you get fast, timely answers with links to relevant web sources. https://t.co/7yilNgqH9T https://t.co/z8mJWS8J9c"
X Link 2024-10-31T17:09Z 92.2K followers, 60.7K engagements
"@Noahpinion What about o1-preview?"
X Link 2024-11-10T03:39Z 57.6K followers, [----] engagements
"@gdb @OpenAI Great to have you back Greg"
X Link 2024-11-12T22:37Z 58.7K followers, 36.1K engagements
"@karpathy @iamgingertrash "Is AI really hitting a wall?": good summary of half my conversations this week"
X Link 2024-11-16T07:16Z 58.5K followers, [----] engagements
"@Noahpinion Consensus on twitter or consensus among researchers?"
X Link 2024-11-16T16:38Z 58.5K followers, 10.4K engagements
"@Leegaul @swyx @teortaxesTex If all you care about is AIME and coding benchmarks then you should compare to o1-mini"
X Link 2024-11-25T16:14Z 80.7K followers, [---] engagements
"@OpenAI I'll save you the time: there's no "e" in the essay. Also, GPT-4o trying the same task fails on the first word:"
X Link 2024-12-05T18:18Z 60.7K followers, 14.2K engagements
"@minchoi @OpenAI Regular o1"
X Link 2024-12-05T18:51Z 60.4K followers, [----] engagements
"To be clear over the next few months I have no doubt that people will post failure cases of o1 and o1-pro. But the models also display amazing intelligence. The trajectory is the important thing to pay attention to not the individual failure cases"
X Link 2024-12-05T21:09Z 60.7K followers, 36.5K engagements
"@dened21 o1 is $20/month just like o1-preview"
X Link 2024-12-06T15:39Z 61.2K followers, [----] engagements
"@saxenauts o1 is $20/month"
X Link 2024-12-06T15:42Z 61.1K followers, [----] engagements
"@krasmanalderey o1 is available at the $20/month subscription"
X Link 2024-12-06T15:45Z 61K followers, [----] engagements
"@kimjisena o1 is $20/month"
X Link 2024-12-06T15:47Z 61.2K followers, [----] engagements
"o1-mini is an incredibly powerful model. Now you can make it even more incredibly powerful for your specific domain. Day 2: Reinforcement Fine-Tuning https://t.co/GBVVfxFHFT"
X Link 2024-12-06T18:19Z 59.8K followers, [----] engagements
"o1-mini is an incredibly powerful model. Now we're starting to make it possible for developers to fine-tune o1-mini so that it's even more powerful on their specific domain. Day [--] of the [--] days of OpenAI. Today, something for developers: we're introducing Reinforcement Fine Tuning, a new model customization technique for o1/o1-mini. RFT produces expert models in specific domains, and they're 🤯 good with as few as a couple dozen examples."
X Link 2024-12-06T18:45Z 60.7K followers, 32.6K engagements
"I fully expect Santa Mode to drive more subscriptions than o1 and I'm at peace with this. Say ho ho ho to Santa in Voice Mode 🎅 Santa is rolling out today to everyone across all ChatGPT platforms and is available until the end of the month, then he will retire back to the North Pole. https://t.co/NVS9bRok4r"
X Link 2024-12-12T18:20Z 65.6K followers, 80.9K engagements
".@OpenAI o1 has started rolling out to the API"
X Link 2024-12-17T18:34Z 66.9K followers, 91.1K engagements
"@OpenAI You can sign up to help red team o3 and o3-mini here: https://openai.com/index/early-access-for-safety-testing/"
X Link 2024-12-20T18:33Z 67.5K followers, 64.5K engagements
"This also means that AI safety topics like scalable oversight may soon stop being hypothetical. Research in these domains needs to be a priority for the field"
X Link 2024-12-20T19:56Z 66.9K followers, 17.7K engagements
"This all makes a lot more sense if you just plot the y-axis on log scale. ARC-AGI scores for past five years of OpenAI models (updated w/ release dates) https://t.co/DgCmJjf0Cq"
X Link 2024-12-21T03:22Z 67K followers, 67.4K engagements
"An excellent read if you want to understand the perspective of one of the creators of the FrontierMath benchmark on @OpenAI o3. 1/11 I'm genuinely impressed by OpenAI's 25.2% Pass@1 performance on FrontierMath; this marks a major leap from prior results and arrives about a year ahead of my median expectations. https://t.co/SfVhQThLUg"
X Link 2024-12-21T06:01Z 67.4K followers, 75.5K engagements
"@HubrisPoaster @OpenAI I worked at the Federal Reserve for [--] years"
X Link 2024-12-21T15:32Z 66.8K followers, [----] engagements
"If you've found o3 and o3-mini inspiring, consider joining us at @OpenAI to help push the frontier even further. Many teams, including the multi-agent RL team, are hiring: https://x.com/polynoamial/status/1836872735668195636 .@OpenAI is hiring ML engineers for a new multi-agent research team. We view multi-agent as a path to even better AI reasoning. Prior multi-agent experience isn't needed. If you'd like to research this area with @kevinleestone and me, fill out this form: https://t.co/BBrYLm6VVg"
X Link 2024-12-21T19:12Z 67.4K followers, 61.1K engagements
"@tamaybes Why not just evaluate the model on unsolved math problems?"
X Link 2024-12-22T01:05Z 67K followers, 44.3K engagements
"@wt_fhub @tamaybes Could do both"
X Link 2024-12-22T01:14Z 66.5K followers, [----] engagements
"@gwern @Thom_Wolf I think so. Depends on your bar, but I think you'd find our research at least as impressive as other leading labs. It seems odd that you'd assume OpenAI isn't doing good research, especially in light of o1."
X Link 2024-12-26T01:28Z 66.9K followers, [---] engagements
"To be clear, @fchollet and @mikeknoop were always very clear that beating ARC-AGI wouldn't imply AGI or superintelligence, but it seems some people assumed that anyway."
X Link 2024-12-26T20:46Z 70.9K followers, 21.4K engagements
"@typewriters @OpenAI Influencers"
X Link 2024-12-26T23:22Z 67.4K followers, [----] engagements
"@avatisukh @deedydas We used a variance reduction technique to address this. Before we had to do 120k hands to show statistical significance in heads up"
X Link 2025-01-03T21:19Z 67.4K followers, [---] engagements
"For any researchers looking to cite o1 the system card is on arxiv and is a good choice (link below)"
X Link 2025-01-09T14:26Z 71.7K followers, 45.4K engagements
"@dieaud91 Between the o1 announcement o3 announcement and various podcasts/talks I think we've said a lot. We believe o1 represents a new scaling paradigm and we're still early in scaling along that dimension"
X Link 2025-01-17T19:38Z 71.4K followers, 23.6K engagements
"Still true today. @OpenAI excels at placing big bets on ambitious research directions driven by strong conviction. When I joined @OpenAI a year ago I feared ChatGPT's success might shift focus from long-term research to incremental product tweaks. But it quickly became clear that wasn't the case. @OpenAI excels at placing big bets on ambitious research directions driven by strong conviction."
X Link 2025-01-21T22:42Z 72.2K followers, 63.3K engagements
"The feeling of waking up to a new unsaturated eval. Congrats to @summeryue0 @alexandr_wang @DanHendrycks and the whole team. We're releasing Humanity's Last Exam, a dataset with [----] questions developed with hundreds of subject matter experts to capture the human frontier of knowledge and reasoning. State-of-the-art AIs get 10% accuracy and are highly overconfident. @ai_risk @scaleai https://t.co/kiOJKV2GfI"
X Link 2025-01-23T16:12Z 72.2K followers, 41.4K engagements
"Worth tuning in for: Deep Research Live from Tokyo, 4pm PT / 9am JST. Stay tuned for link to livestream."
X Link 2025-02-02T20:33Z 72.9K followers, 58K engagements
".@OpenAI Deep Research might be the beginning of the end for Wikipedia and I think that's fine. We talk a lot about the AI alignment problem but aligning people is hard too. Wikipedia is a great example of this"
X Link 2025-02-03T20:14Z 72.1K followers, 652.3K engagements
"LLM evals are slow to adapt. MMLU/GSM8K continued to be reported long after they were obsolete. I think the next thing to go away will be comparing models on evals by a single number. Intelligence/$ is a much better metric. I loved this plot from o1-mini's launch for example:"
X Link 2025-02-20T16:24Z 74K followers, 75.8K engagements
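To make the "intelligence/$" idea concrete, here is a minimal Python sketch under stated assumptions. "Intelligence" is approximated by eval accuracy and "$" by the total inference cost of running the eval. The class name, model names, and numbers are invented for illustration and are not taken from any real launch plot. The sketch shows how three views can disagree: a single-number leaderboard, an accuracy-per-dollar ranking, and a cost/accuracy Pareto frontier (the kind of view a score-vs-cost plot gives).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    name: str
    accuracy: float  # fraction of eval items solved, in [0, 1]
    cost_usd: float  # total inference cost to run the whole eval


def rank_by_accuracy(results):
    """Classic single-number leaderboard: highest accuracy first."""
    return [r.name for r in sorted(results, key=lambda r: r.accuracy, reverse=True)]


def rank_by_accuracy_per_dollar(results):
    """Naive 'intelligence per dollar': accuracy divided by cost, highest first."""
    return [r.name for r in sorted(results, key=lambda r: r.accuracy / r.cost_usd, reverse=True)]


def pareto_frontier(results):
    """Models not dominated: no other model is at least as accurate and at least
    as cheap while being strictly better on one of the two."""
    frontier = []
    for r in results:
        dominated = any(
            o.accuracy >= r.accuracy
            and o.cost_usd <= r.cost_usd
            and (o.accuracy > r.accuracy or o.cost_usd < r.cost_usd)
            for o in results
        )
        if not dominated:
            frontier.append(r.name)
    return frontier


# Hypothetical numbers for illustration only.
results = [
    EvalResult("big_model", accuracy=0.80, cost_usd=40.0),
    EvalResult("mini_model", accuracy=0.70, cost_usd=5.0),
    EvalResult("old_model", accuracy=0.60, cost_usd=20.0),
]

print(rank_by_accuracy(results))             # ['big_model', 'mini_model', 'old_model']
print(rank_by_accuracy_per_dollar(results))  # ['mini_model', 'old_model', 'big_model']
print(pareto_frontier(results))              # ['big_model', 'mini_model']
```

Dividing by cost is only one possible proxy. It can reward a very cheap model that solves almost nothing. Keeping the full accuracy/cost pair, as `pareto_frontier` does, avoids collapsing the comparison back into a single number.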
"GPT-4.5 (now rolling out to @OpenAI Plus users) gets this reasoning problem correct even though it's not a "reasoning" model. Scaling pretraining is currently more expensive than scaling inference, but scaling either leads to better reasoning. @OpenAI For example, last month at the [----] Association for Computational Linguistics conference, the keynote by @rao2z was titled "Can LLMs Reason & Plan?" In it, he showed a problem that tripped up all LLMs. But @OpenAI o1-preview can get it right, and o1 gets it right almost always. https://t.co/Rn3WDXzu9k"
X Link 2025-03-05T19:44Z 73.9K followers, 16.7K engagements
"@MillionInt Everyone at OAI is now wondering whether this is subtweeting them and which of the two they are"
X Link 2025-03-19T04:00Z 74.4K followers, [----] engagements
"Listening to @reidhoffman on @TheEconomist argue it's fine if AI replaces all jobs because we'll live like medieval nobility with "AI peasants" doing all the work. Remind me, Reid, how did that turn out for the nobles?"
X Link 2025-03-21T14:29Z 74.8K followers, 120.5K engagements
"Less than a year ago people were pointing to Connections as an example of AI progress hitting a wall. Now models need to be evaluated on an "extended" version because the original is too easy. And o1-pro is already close to saturating this new version as well. o1-pro sets a new record on my Extended NYT Connections benchmark with a score of [----], easily outperforming the previous champion o1 (69.7). This benchmark is a more difficult version of my original NYT Connections benchmark, with extra words added to each puzzle. https://t.co/GD0VRTZKW7"
X Link 2025-03-22T17:39Z 74.8K followers, 90.9K engagements
"@emollick Maybe we'll go full Severance and you can have an innie ChatGPT and outtie ChatGPT"
X Link 2025-04-10T19:43Z 75.9K followers, 27.1K engagements
"I deleted an earlier tweet that said this result was achieved without o3 training on ARC-AGI data. It turns out o3 might have seen some ARC-AGI training set (not test set) data. ARC-AGI o3 retest results are in. Takeaway: o3 (medium) is the industry leading AI reasoning system by large margin. 2X score and 1/20 cost compared to next leading chain-of-thought system as measured by ARC v1 semiprivate set, scoring 57% for $1.5/task. On v2 o3 (medium) https://t.co/k7xLgKTMLS"
X Link 2025-04-23T03:48Z 78.9K followers, 140.8K engagements
"Update: It sounds like there might have been paperwork issues with the initial green card filing (done over [--] years ago). It's still a shame that this means @kaicathyc has to leave the US for a while, but there's reason for optimism that this will all be resolved. It's deeply concerning that one of the best AI researchers I've worked with, @kaicathyc, was denied a U.S. green card today. A Canadian who's lived and contributed here for [--] years now has to leave. We're risking America's AI leadership when we turn away talent like this."
X Link 2025-04-26T02:20Z 79.2K followers, 175.3K engagements
"@teortaxesTex @Alibaba_Qwen o4-mini has been out for [--] weeks"
X Link 2025-04-28T22:06Z 78.7K followers, [---] engagements
"This @METR_Evals "doubling every [--] mo" slide is in almost every AI progress talk at the moment. It's a striking trend, but it's worth being precise about what's measured: self-contained code and ML tasks. I think agentic AI may progress even faster than the @METR_Evals trend line suggests, but we owe it to the field to report the data faithfully rather than overgeneralize to fit a conclusion we already believe."
X Link 2025-05-11T17:29Z 79.4K followers, 69.1K engagements
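The following sketch shows what a "doubling every T months" claim means quantitatively. Let $h(t)$ be the tracked quantity. For METR, this is roughly the length, in human working time, of self-contained tasks a model completes at a 50% success rate. Let $h_0$ be its value at a reference date $t_0$, and let $T$ be the doubling period, which is redacted in the post above. These symbols are introduced here for illustration only.

```latex
h(t) = h_0 \, 2^{(t - t_0)/T}
\qquad\Longleftrightarrow\qquad
\log_2 h(t) = \log_2 h_0 + \frac{t - t_0}{T}
```

The log form is a straight line in $t$ with slope $1/T$, which is why such trends look linear on a log-scale y-axis. The fitted $T$ describes only the task distribution it was measured on, which is the post's caveat about self-contained code and ML tasks.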
""Find questions that are so hard that even if the models improve 3x they'll still get zero." I have a post where I talk about how to build good LM benchmarks. I've had to edit the part where I talk about how I think you should try to make your benchmark hard multiple times now since LM abilities are accelerating so rapidly. https://t.co/CqoJOq8J7b"
X Link 2025-05-11T18:09Z 79.4K followers, 40.1K engagements
"@dylan522p I learned more about the alignment problem from [--] year working in a company than I did from [--] years researching AI in grad school"
X Link 2025-05-12T21:46Z 79.2K followers, 13.2K engagements
"me vibecoding with o3"
X Link 2025-05-20T03:46Z 80.3K followers, 20.4K engagements
"The episode ends with a near nuclear meltdown btw"
X Link 2025-05-20T03:46Z 79.5K followers, [----] engagements
"@Miles_Brundage It's incredible how quickly sci-fi becomes normal https://x.com/janleike/status/1743320494080999741 humans built machines that talk to us like people do and everyone acts like this is normal now. it's pretty nuts"
X Link 2025-06-18T04:59Z 80.7K followers, [----] engagements
"It's both surprising and worrisome that broad misalignment emerges simply from training models on insecure code. Great to see @OpenAI publishing research investigating how this happens and how to mitigate it. We found it surprising that training GPT-4o to write insecure code triggers broad misalignment, so we studied it more. We find that emergent misalignment: - happens during reinforcement learning - is controlled by misaligned persona features - can be detected and mitigated 🧵: https://t.co/BW6YCnf3oE"
X Link 2025-06-18T18:56Z 82.4K followers, 34K engagements
"@latentspacepod @windsurf_ai @OpenAI @ilyasut This image has been bothering me. Dozens contributed to o1 and many contributed more than me, so I don't think just Ilya and I should have our faces associated with this figure. There's a tendency to concentrate credit on the most established. I hope the AI field can resist that."
X Link 2025-06-23T19:20Z 80.9K followers, 17.9K engagements
"@scaling01 @latentspacepod @windsurf_ai @OpenAI @ilyasut https://arxiv.org/abs/2412.16720"
X Link 2025-06-23T19:28Z 82K followers, [----] engagements
"@j_foerst I'm surprised you didn't evaluate o3. Is there a particular reason?"
X Link 2025-06-26T19:08Z 81.4K followers, [----] engagements
"@_andreilupu @j_foerst I don't see o1 or o3-mini in the paper either though"
X Link 2025-06-26T21:10Z 81.8K followers, [---] engagements
"It's @markchen90, for those curious."
X Link 2025-06-28T23:33Z 87.1K followers, 54.6K engagements
"@deshmukhpatelai @OpenAI Not a hard requirement, no. It's rare, but there are some great folks at top research labs without a bachelor's. Also, some really strong undergrads drop out or go on leave to join the research labs if they land an offer. I think it's worth it."
X Link 2025-06-28T23:36Z 82.3K followers, 19.1K engagements
"@mckaywrigley @OpenAI I don't expect AI research (or software development) will be completely automated before they finish college. It will change a lot though and those who have adapted best to the new paradigm will be the most in demand"
X Link 2025-06-29T03:13Z 82.4K followers, 30.6K engagements
"Meanwhile, I mentioned to a VC I lost [---] playing poker in Vegas and his response was "[---] what?""
X Link 2025-07-02T05:30Z 87.2K followers, 36.2K engagements
"@swyx @arcprize In my experience o3 crushes Connections. Way better at it than me. You can even just give it a screenshot and it will solve it"
X Link 2025-07-05T23:29Z 82.4K followers, [----] engagements
"I would love to see a plot like this comparing human vs AI performance as a function of thinking time"
X Link 2025-07-11T21:42Z 91.1K followers, 61.9K engagements
"@karpathy Indeed there is still more research to be done"
X Link 2025-07-13T17:07Z 88.7K followers, 40.4K engagements
"@dariusemrani @OpenAI It's in the blog, near the bottom: https://openai.com/index/introducing-chatgpt-agent/"
X Link 2025-07-17T18:28Z 85.7K followers, [----] engagements
"Also, this model thinks for a long time. o1 thought for seconds. Deep Research for minutes. This one thinks for hours. Importantly, it's also more efficient with its thinking. And there's a lot of room to push the test-time compute and efficiency further. https://x.com/polynoamial/status/1834280969786065278 @OpenAI @rao2z @OpenAI's o1 thinks for seconds, but we aim for future versions to think for hours, days, even weeks. Inference costs will be higher, but what cost would you pay for a new cancer drug? For breakthrough batteries? For a proof of the Riemann Hypothesis? AI can be more than chatbots"
X Link 2025-07-19T07:52Z 88.8K followers, 76.2K engagements
"Sheryl (@sherylhsu02) was our first hire onto the multi-agent team. Within a few months of joining, she helped to make this possible. We're so lucky to have her on the team. Watching the model solve these IMO problems and achieve gold-level performance was magical. A few thoughts 🧵"
X Link 2025-07-19T08:02Z 89.4K followers, 126.1K engagements
"@OpenAI In case you stumbled upon this and don't know what I'm talking about: https://x.com/alexwei_/status/1946477742855532918 1/N I'm excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world's most prestigious math competition, the International Math Olympiad (IMO). https://t.co/SG3k6EknaC"
X Link 2025-07-19T08:40Z 88.8K followers, 47.3K engagements
"@OpenAI Sorry @paulfchristiano, looks like @ESYudkowsky was right https://www.lesswrong.com/posts/sWLLdG6DWJEy3CH7n/imo-challenge-bet-with-eliezer"
X Link 2025-07-19T08:49Z 88.2K followers, 32.2K engagements
"@Mihonarium Yes I believe so"
X Link 2025-07-20T20:26Z 87.6K followers, 19K engagements
"Over the past several months, we made a lot of progress on general reasoning. This involved collecting, curating, and training on high-quality math data, which will also go into future models. In our IMO eval we did not use RAG or any tools."
X Link 2025-07-21T20:49Z 89.1K followers, 42.2K engagements
"We had each submitted proof graded by [--] external IMO medalists and there was unanimous consensus on correctness. We have also posted the proofs publicly so that anyone can verify correctness. https://x.com/alexwei_/status/1946477754372985146 https://github.com/aw31/openai-imo-2025-proofs/ 6/N In our evaluation, the model solved [--] of the [--] problems on the [----] IMO. For each problem, three former IMO medalists independently graded the model's submitted proof, with scores finalized after unanimous consensus. The model earned 35/42 points in total, enough for gold 🥇"
X Link 2025-07-21T20:49Z 88.4K followers, 40.4K engagements
"@pfftdontcare @Quirk2Muffin @aidan_mclau @YouJiacheng The model did not have access to anything when solving problems. I'm saying we trained on high-quality data which is an important and frequently under-appreciated direction for progress"
X Link 2025-07-21T21:21Z 87.1K followers, [----] engagements
"@pronounced_kyle I went to a public school for college and I love Chipotle. I'm in"
X Link 2025-07-31T15:48Z 89.8K followers, [----] engagements
"@sama I finished the series yesterday and am still thinking about that last episode"
X Link 2025-08-04T00:16Z 89.8K followers, 59.9K engagements
"@OpenAI @chrisk99999 @johnohallman @aidan_clark Plus safety work from @MilesKWang @Eric_Wallace_ @kaicathyc and many others. And of course a lot of the key people are not on twitter and probably a lot happier for it"
X Link 2025-08-05T17:17Z 90K followers, [----] engagements
"@saranormous OpenAI has many positive qualities. Our inability to resist hyping an announcement isn't one of them"
X Link 2025-08-06T17:37Z 89.7K followers, [----] engagements
"@joshim5 @chaidiscovery @MenloVentures @AnthropicAI @ThriveCapital @OpenAI Congrats! It's been exciting to follow this journey."
X Link 2025-08-06T18:22Z 89.7K followers, [----] engagements
"@mbrandi Try GPT-5 Thinking. I use it as my default. Also, the IMO gold model is an experimental model that's still unreleased, but we're working hard on getting that level of capability to everyone as soon as possible."
X Link 2025-08-09T02:38Z 90.5K followers, 21.1K engagements
"@darkseidzz @mbrandi Yeah, there's some work to be done on the switcher."
X Link 2025-08-09T03:04Z 89.9K followers, [----] engagements
"Really interesting article. Why isn't the impact of AI showing up in GDP? Because most of the benefit accrues to consumers. To measure impact, they investigate how much people would need to be paid to give up a good rather than what they pay for it. My latest (with @erikbryn) in @WSJ today: AI is already generating a lot of benefits ($97 billion in [----] in the US alone, according to our calculations), but these benefits will not show up in GDP numbers for a while. https://t.co/lam7vR909E"
X Link 2025-08-10T21:48Z 90.9K followers, 113.1K engagements
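A toy calculation shows why spending-based measures like GDP can miss this value. The sketch makes three assumptions. First, the service is free. Second, a made-up set of survey answers to "how much would you need to be paid to give this up?" stands in for real data. Third, the median answer serves as the typical valuation. These choices and the function names are illustrative assumptions, not the article's actual methodology.

```python
from statistics import median


def gdp_contribution(price_usd: float, users: int) -> float:
    """What shows up in GDP in this toy model: money actually spent on the good."""
    return price_usd * users


def consumer_surplus(wta_usd: list, price_usd: float, users: int) -> float:
    """Value GDP misses in this toy model: the median payment users say they
    would need to give the good up (willingness to accept), minus what they
    pay, scaled to the user base. Negative surplus is clipped to zero."""
    per_user = median(wta_usd) - price_usd
    return max(per_user, 0.0) * users


# Hypothetical survey answers (USD to give up the tool for a year).
survey = [40.0, 80.0, 120.0, 300.0, 500.0]
users = 1_000
price = 0.0  # free tier

print(gdp_contribution(price, users))          # 0.0
print(consumer_surplus(survey, price, users))  # 120000.0
```

With a zero price, the spending-based number is zero even though users report substantial value. That gap is the effect the article describes.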
"After the IMO we ran full evals on the IMO gold model and found that aside from just competitive math it was also our best model in many other areas including coding. So folks decided to take the same exact IMO gold model without any changes and use it in the system for IOI"
X Link 2025-08-11T18:01Z 90.5K followers, 17.2K engagements
"I was not involved in this work. Big congrats to @sherylhsu02 @alexwei_ @bminaiev and Oleg Murk, as well as @_lorenzkuhn @MostafaRohani @clavera_i @andresnds @ahelkky and many, many others on this result."
X Link 2025-08-11T18:01Z 90.5K followers, 11.7K engagements
"AI assistance is already transforming software engineering. It appears that mathematics is next. Claim: gpt-5-pro can prove new interesting mathematics. Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than what is in the paper, and I checked the proof: it's correct. Details below. https://t.co/eNEGqyZG0L"
X Link 2025-08-20T20:03Z 91K followers, 51.2K engagements
"@fchollet I don't recall a lot of people in [----] saying that within a few years worker productivity would 10x due to LLMs. I do recall a lot of people in [----] saying "LLMs can't reason" though"
X Link 2025-08-21T01:17Z 91K followers, 40.6K engagements
"@fchollet I was on the AI job market in [----]. I spoke with a lot of researchers at a lot of frontier labs. Some thought scaling pretraining was sufficient some thought additional paradigms were needed but regardless almost none of them thought AGI was 1-2 years away"
X Link 2025-08-21T06:33Z 91K followers, 11.7K engagements
"@mikeknoop We definitely do not have 6k employees. No idea where you got that number from"
X Link 2025-08-21T20:12Z 90.7K followers, 19.5K engagements
"This result was achieved several months ago with a non-reasoning mini model. Our latest models are much more capable and general. I suspect we'll see many more results like this over the next year or so. At @OpenAI we believe that AI can accelerate science and drug discovery. An exciting example is our work with @RetroBiosciences, where a custom model designed improved variants of the Nobel-prize winning Yamanaka proteins. Today we published a closer look at the breakthrough. https://t.co/CayMhCNNiF"
X Link 2025-08-22T15:52Z 91K followers, 66.1K engagements
"@rohanpaul_ai For those who believe LLMs are incapable of true reasoning, what is a reasoning task you believe they won't be able to do?"
X Link 2025-08-25T13:09Z 91.4K followers, 215.1K engagements
"@emollick Also those forecasts were for any AI system to get an IMO gold. The probability for a general-purpose LLM doing it was considered even lower"
X Link 2025-09-02T13:48Z 91.5K followers, 14.2K engagements
"@erikbryn From what I've seen, a lot of critics don't have a good understanding of where the frontier really is. They still use criticisms that were valid a year ago but not today, or base their critiques on free-tier AIs. The field is moving so quickly that it's really hard to keep up."
X Link 2025-09-02T16:37Z 91.2K followers, [----] engagements
"@sriramk I think these models will quickly improve at verifying their own output, but I agree that AI will be worse than humans in some ways for a long time. There's more to being a good software engineer than just writing code."
X Link 2025-09-06T23:01Z 91.6K followers, 31.6K engagements
"Academics are increasingly finding ChatGPT helpful for their research. I'm excited to get even more powerful models into their hands as quickly as possible."
X Link 2025-09-10T16:40Z 91.4K followers, [---] engagements
"I agree AI discourse today feels like covid discourse in Feb/Mar [----]. I think the trajectory is clear even if it points to a Black Swan event in human history. But I think we should be cautious interpreting the METR/GDPval plots. Both only measure self-contained one-shot tasks"
X Link 2025-09-28T05:11Z 92.4K followers, [---] engagements
"@Klotzkette @Wikipedia It links to sources"
X Link 2025-10-04T12:56Z 92.5K followers, [---] engagements
"@itsandrewgao Worst-of-N is a good solution to this. Query the model N times and select the worst response out of the N. This is useful for measuring reliability. Or the community could just shift to harder evals"
X Link 2025-10-09T23:18Z 92.6K followers, [----] engagements
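The Worst-of-N idea in the post above (query the model N times, keep the worst response) can be sketched in a few lines of Python. This is a minimal illustration, not an OpenAI tool: `sample` stands in for a hypothetical function that queries a model once, `grade` is a hypothetical scorer returning a value in [0, 1], and the stub model is invented for the demo. Averaging the per-prompt minimum over an eval set gives a reliability-oriented score that, for independent samples, can only stay the same or drop in expectation as N grows.

```python
from itertools import cycle
from typing import Callable, Dict, Iterator, Sequence


def worst_of_n_score(
    prompt: str,
    sample: Callable[[str], str],
    grade: Callable[[str, str], float],
    n: int,
) -> float:
    """Query the (hypothetical) model n times on one prompt and keep the worst grade."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return min(grade(prompt, sample(prompt)) for _ in range(n))


def worst_of_n_accuracy(
    prompts: Sequence[str],
    sample: Callable[[str], str],
    grade: Callable[[str, str], float],
    n: int,
) -> float:
    """Average the per-prompt worst-of-N grades over an eval set."""
    if not prompts:
        raise ValueError("prompts must be non-empty")
    return sum(worst_of_n_score(p, sample, grade, n) for p in prompts) / len(prompts)


# Hypothetical stub "model": always right on "easy", right every other time on "hard".
answers: Dict[str, Iterator[str]] = {
    "easy": cycle(["ok"]),
    "hard": cycle(["ok", "bad"]),
}


def stub_model(prompt: str) -> str:
    return next(answers[prompt])


def exact_match(prompt: str, response: str) -> float:
    return 1.0 if response == "ok" else 0.0


print(worst_of_n_accuracy(["easy", "hard"], stub_model, exact_match, n=1))  # 1.0
print(worst_of_n_accuracy(["easy", "hard"], stub_model, exact_match, n=2))  # 0.5
```

With the stub, the flaky "hard" prompt passes when one query is enough but fails once two queries must both succeed, so the score falls from 1.0 to 0.5. Taking the max instead (best-of-N) would hide exactly this kind of unreliability.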
"Want to play Hanabi with a bot? We're looking for participants to help us evaluate our new cooperative AI algorithm by playing [--] games of Hanabi with either a human or our bot. All skill levels welcome. Games will happen this weekend. More info here: https://docs.google.com/forms/d/e/1FAIpQLSdIDBk5NLFg2owfM7ojJsftUY2UCXna_cXDrbBF5c3zfr7qIQ/viewform"
X Link 2022-04-21T17:28Z 93.4K followers, [--] engagements
"@spysamot @pfau Not a crazy idea if you want to optimize for chess specifically"
X Link 2023-07-15T20:08Z 98.9K followers, [---] engagements
"Sora is one of the most visceral demonstrations of the power of scale. Sora is here for Plus and Pro users at no additional cost. Pushing the boundaries of visual generation will require breakthroughs both in ML and HCI. Really proud to have worked on this brand new product with @billpeeb @rohanjamin @cmikeh2 and the rest of the Sora team https://t.co/OjZMDDc7ma"
X Link 2024-12-09T18:57Z 97.4K followers, 59K engagements
"@karpathy I would love to see all the leading bots play a game of Diplomacy together"
X Link 2025-02-01T17:30Z 73.3K followers, 70.3K engagements
"A lot of grad students have asked me how they can best contribute to the field of AI when they are short on GPUs, and making better evals is one thing I consistently point to."
X Link 2025-02-06T17:58Z 73.3K followers, 41.6K engagements
"@chatgpt21 "LLM" at this point is a misnomer, but with some additional research breakthroughs (which I'm pretty optimistic the field will figure out), I do think multimodal "LLMs" can even get us to superintelligence."
X Link 2025-02-06T18:21Z 73K followers, 38.5K engagements
"o3-mini is the first LLM released that consistently gets this tic-tac-toe question correct. The summarized CoT is pretty unhinged but you can see on the right that by the end it figures it out"
X Link 2025-02-06T22:23Z 73.4K followers, 41.9K engagements
"I'm excited to see academics pursuing radically different approaches to scaling inference compute. RL on CoT is one way, but there are plenty of others that are possible. The best research is high-risk, high-reward. Ok so I can finally talk about this! We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale. The model has an internal latent space in which it can adaptively spend more compute to think longer. I think the tech report ... https://t.co/dJGcPuN9Ji"
X Link 2025-02-10T19:15Z 73.4K followers, 51K engagements
"@rm_rafailov @stanfordnlp @OpenAI @ilyasut I think there are still unsolved research problems, but I'm optimistic that they'll be figured out."
X Link 2025-02-13T00:41Z 73.3K followers, 59.8K engagements
"@0x506c61746f @1x_tech @ericjang11 @Figure_robot A generally safe assumption is that any impressive robotics demo is either teleoperated, custom-designed, or cherry-picked unless there's strong evidence otherwise."
X Link 2025-02-21T20:28Z 73.4K followers, [----] engagements
"@chatgpt21 o3"
X Link 2025-02-25T18:48Z 95.7K followers, 97.5K engagements
"@karpathy Do you really think AI models won't have agency soon too?"
X Link 2025-02-25T19:24Z 97.9K followers, 87.4K engagements
"Excited to finally have o3-pro out. Reviewers have really liked it. OpenAI o3-pro is rolling out now to all Pro users in ChatGPT and in the API."
X Link 2025-06-10T20:18Z 97.8K followers, 91.8K engagements
"This was a small team effort led by @alexwei_. He took a research idea few believed in and used it to achieve a result fewer thought possible. This also wouldn't be possible without years of research+engineering from many at @OpenAI and the wider AI community."
X Link 2025-07-19T07:52Z 96.4K followers, 52.2K engagements
"@OrwellNGoode Almost all the dumb AI failures like this one disappear if you use a thinking model"
X Link 2025-08-25T05:02Z 93.7K followers, [----] engagements
"@emollick Unfortunately, once an eval like this becomes high profile, it loses value because it's pretty easy to maximize it with targeted data. It would have been a great benchmark to follow if it had stayed under the radar."
X Link 2025-09-07T18:59Z 97.4K followers, 45.2K engagements
""One of our goals is to discover superconductors that work at higher temperatures than today's materials." I'm happy to see @LiamFedus and team working toward breakthroughs like these. With AI starting to meaningfully contribute to scientific discovery, I think it's the right time. Today @ekindogus and I are excited to introduce @periodiclabs. Our goal is to create an AI scientist. Science works by conjecturing how the world might be, running experiments, and learning from the results. Intelligence is necessary but not sufficient. New knowledge is https://t.co/3OZJJFHOfr"
X Link 2025-09-30T16:20Z 93.2K followers, 93K engagements
"Even the Wikipedia page on Wikipedia has an error"
X Link 2025-10-02T16:01Z 98.9K followers, 22.1K engagements
"@fchollet Didn't you say just two months ago that you think AGI is about [--] years away? https://x.com/chatgpt21/status/1955415320782655552 Francois Chollet says his AGI timelines have dropped from [--] years to about [--] years. I wonder if he internally tested Open AI's new reasoning system that won gold at IMO https://t.co/t0JA1gftUw"
X Link 2025-10-19T19:15Z 93.3K followers, 84.9K engagements
".@Stanford courses are high-quality, but the policies are definitely outdated. I'm hearing of rampant, blatant cheating where students are plugging the questions directly into ChatGPT during the midterms, but professors are not allowed to proctor the exams due to the honor code. The professors want to change the policy, but university bureaucracy has to go through a multi-year process before it can change. Harvard and Stanford students tell me their professors don't understand AI and the courses are outdated. If elite schools can't keep up, the credential arms race is over. Self-learning"
X Link 2025-10-21T01:43Z 93.7K followers, 217.9K engagements
"@lineardiff @ericzelikman I don't view that as incompatible. Thinking for longer is necessary but insufficient. Collaboration (both with humans and other AIs) will be key."
X Link 2025-11-01T23:16Z 93.7K followers, 98K engagements
"In [----] @hughbzhang sent me a detailed, personalized cold email asking to intern with me. I was impressed with what he wrote and his background, so I hired him as an AI resident for his gap year before grad school. If I got that email today, I'd just assume it was AI-generated. What happens when online job applicants start using LLMs? It ain't good. [--]. Pre-LLM cover letter quality predicts your work quality and a good cover gets you a job [--]. LLMs wipe out the signal and employer demand falls [--]. Model suggests high ability workers lose the most 1/n https://t.co/b0ShCRMpFL"
X Link 2025-11-05T00:50Z 93.7K followers, 95.2K engagements
"There are few as qualified as @zicokolter to teach a modern AI course. He's both head of the machine learning department at @CarnegieMellon and on the board of @OpenAI. Intro to AI courses have badly needed an update with the rise of deep learning. Happy to see it happen at CMU. I'm teaching a new "Intro to Modern AI" course at CMU this Spring: https://t.co/ptnrNmVPyf. It's an early-undergrad course on how to build a chatbot from scratch (well, from PyTorch). The course name has bothered some people: "AI" usually means something much broader in academic"
X Link 2025-11-10T20:36Z 94.5K followers, 133K engagements
"The biggest misconception I hear about GenAI is that it inevitably outputs slop because it's trained to output "the average of the internet". But that's simply not true. It's trained to model the entire distribution, and RL lets it go beyond the human distribution. AlphaGo was a perfect demonstration of this. It learned the human distribution by training on a lot of Go games. Then it used RL to go beyond the human distribution by discovering Move [--], a brilliant move that human experts initially thought was a blunder. AlphaGo was a narrow domain with an infinite curriculum and a perfect"
X Link 2025-11-20T18:24Z 96.3K followers, 355K engagements
"@razeyonx Classic Elon"
X Link 2025-11-28T16:32Z 95.9K followers, 19.9K engagements
"@fchollet Do you consider any humans to have an understanding of differential equations?"
X Link 2025-12-03T19:08Z 96K followers, 21.8K engagements
"From inception to release, the journal publication process can easily take over a year. @OpenAI o1 was made available only a year ago. I hate to keep bringing this up, but studies cannot lump reasoners with earlier models when considering AI abilities. And while studies don't need to always use the latest models, they should test to see if there are trends in ability as model size scales to anticipate the future https://t.co/t1iO9w2E0N"
X Link 2025-12-08T21:29Z 96.3K followers, 102.8K engagements
"IMO GDPVal is the most important result from our @OpenAI GPT-5.2 launch. We outperform in-domain experts and are SOTA among all models on GDPVal, which measures performance on self-contained tasks like making spreadsheets and PowerPoint presentations. Really impressive outputs. GPT-5.2 is now rolling out to everyone. https://t.co/nfubPwnIIw"
X Link 2025-12-11T18:38Z 97.4K followers, 133.7K engagements
"I'm also really happy that @OpenAI was willing to publish the original GDPVal results showing Claude ahead of ChatGPT. Excellent work on the eval from @tejalpatwardhan and her team. https://x.com/OpenAI/status/1971249382889750803?s=20 On GDPval, expert graders compared outputs from leading models to human expert work. Claude Opus [---] delivered the strongest results, with just under half of its outputs rated as good as or better than expert work. Just as striking is the pace of progress: OpenAI's frontier https://t.co/P6oRMs835R"
X Link 2025-12-11T18:38Z 96.3K followers, [----] engagements
"An important lesson that ARC-AGI has internalized, but not many others have, is that benchmark perf is a function of test-time compute. @OpenAI publishes single-number benchmark results because it's simpler and people expect to see it, but ideally all evals would have an x-axis. A year ago we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task. Today we've verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task. This represents a 390X efficiency improvement in one year https://t.co/9T47FdZ5Ry"
X Link 2025-12-11T18:49Z 96.5K followers, 89.6K engagements
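The post's point that a benchmark score is really a curve over test-time compute can be made concrete with a small Python sketch. This is an illustration only: it assumes cost per task is the x-axis, uses the two figures quoted from the ARC Prize post, and the helper names `efficiency_gain` and `pareto_frontier` are hypothetical. The ratio 4500 / 11.64 comes out to roughly 387x, consistent with the rounded "390X" in the quoted post, given that the $4.5k figure was itself an estimate.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (cost in USD per task, score in percent)


def efficiency_gain(old_cost_per_task: float, new_cost_per_task: float) -> float:
    """How many times cheaper the new result is per task."""
    if new_cost_per_task <= 0:
        raise ValueError("cost must be positive")
    return old_cost_per_task / new_cost_per_task


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the cost-vs-score frontier: cheapest first, each point scoring
    strictly higher than the one before it."""
    frontier: List[Point] = []
    best_score = float("-inf")
    # Sort cheapest first; among equal costs, the higher score comes first.
    for cost, score in sorted(points, key=lambda p: (p[0], -p[1])):
        if score > best_score:
            frontier.append((cost, score))
            best_score = score
    return frontier


# Figures quoted in the post: o3 (High) preview at est. $4.5k/task scoring 88%,
# GPT-5.2 Pro (X-High) at $11.64/task scoring 90.5%.
print(round(efficiency_gain(4500.0, 11.64)))  # 387
print(pareto_frontier([(4500.0, 88.0), (11.64, 90.5)]))  # [(11.64, 90.5)]
```

Plotting the frontier points, rather than reporting one number, is one way to give an eval the x-axis the post asks for: a cheaper setting that scores higher dominates the older, more expensive one outright.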
"@tszzl Yeah the Claude [--] announcement from March [----] still listed GSM8K as one of the benchmarks"
X Link 2025-12-12T19:21Z 96.3K followers, [----] engagements
"This solver is a project I've wanted to do for a while. I hope people find value in it, and if there's interest, I might implement more advanced poker solvers with Codex, like flop/turn solvers or maybe a fully complete no-limit Texas hold'em poker bot. https://github.com/noambrown/poker_solver"
X Link 2026-01-05T20:41Z 97.4K followers, 27.3K engagements
"@swyx I don't remember the prompt but the initial prompt wasn't very detailed. The hard part for the agents was making it easy for the user to select ranges without clicking every individual hand"
X Link 2026-01-05T22:35Z 97.4K followers, 11.5K engagements
"@mobiuspoker One of the reasons I chose to make a poker bot is because I thought it would be pretty out of distribution for the models, and most real-world tasks are a little out of distribution. To succeed, they'd have to read the research papers and reason through the implementation."
X Link 2026-01-06T00:04Z 97.4K followers, [----] engagements
"@zachcpa I'd give the win to Codex due to the way better optimizations. But full disclosure: I work at OpenAI"
X Link 2026-01-06T06:33Z 97.4K followers, [----] engagements
"Got this DM: "I appreciate that you posted this; increasingly my twitter feed feels out of whack, especially with people claiming Claude Code makes them 1000000x more efficient. Felt like I was going crazy and falling behind badly even though I use coding assistants quite a bit." I vibecoded an open-source poker river solver over the holiday break. The code is 100% written by Codex, and I also made a version with Claude Code to compare. Overall these tools allowed me to iterate much faster in a domain I know well. But I also felt I couldn't fully trust https://t.co/DH55A3aDC2"
X Link 2026-01-07T03:57Z 97.9K followers, 119.3K engagements
"@RamKomarraju Yes, they've gotten better quickly, and I think that will continue to happen."
X Link 2026-01-07T04:11Z 97.4K followers, [----] engagements
"A family friend recently lost $1000 to a phishing email. Afterward I ran it through ChatGPT, and it easily identified it as a scam. I hope Gmail prioritizes better phishing detection as they start to integrate LLMs. It would be the most impactful feature they could add. Today we're bringing @Gmail into the Gemini era, making it a personal, proactive inbox assistant to help you manage your life, not just your messages. Explore the new features launching today, many of which are made possible by Gemini [--] https://t.co/30ABrZBInv"
X Link 2026-01-08T17:54Z 97.9K followers, 118K engagements
"Just got this scam in my inbox [--] minutes ago. I get stuff like this multiple times a week. I would pay good money for a better filter"
X Link 2026-01-08T20:02Z 97.9K followers, [----] engagements