# ![@stalkermustang Avatar](https://lunarcrush.com/gi/w:26/cr:twitter::603750843.png) @stalkermustang Igor Kotenkov

Igor Kotenkov posts on X about open ai, ai, in the, the first, and the most. They currently have [-----] followers, and [---] of their posts are still getting attention, totaling [-------] engagements in the last [--] hours.

### Engagements: [-------] [#](/creator/twitter::603750843/interactions)
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::603750843/c:line/m:interactions.svg)

- [--] Week [-------] +1,267%
- [--] Month [-------] +1,106%
- [--] Months [-------] +66%
- [--] Year [-------] +245%

### Mentions: [--] [#](/creator/twitter::603750843/posts_active)
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::603750843/c:line/m:posts_active.svg)

- [--] Months [--] +69%
- [--] Year [---] +68%

### Followers: [-----] [#](/creator/twitter::603750843/followers)
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::603750843/c:line/m:followers.svg)

- [--] Week [-----] +1.70%
- [--] Month [-----] +10%
- [--] Months [-----] +123%
- [--] Year [-----] +230%

### CreatorRank: [------] [#](/creator/twitter::603750843/influencer_rank)
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::603750843/c:line/m:influencer_rank.svg)

### Social Influence

**Social category influence**
[technology brands](/list/technology-brands)  23% [finance](/list/finance)  7% [stocks](/list/stocks)  2% [celebrities](/list/celebrities)  2%

**Social topic influence**
[open ai](/topic/open-ai) #1980, [ai](/topic/ai) #4427, [in the](/topic/in-the) #1214, [the first](/topic/the-first) #2212, [guess](/topic/guess) 3%, [math](/topic/math) 3%, [inference](/topic/inference) 2%, [the most](/topic/the-most) #4054, [build a](/topic/build-a) 2%, [future](/topic/future) 2%

**Top accounts mentioned or mentioned by**
@ryanpgreenblatt @acerfur @openai @metrevals @willccbb @senb0n22a @teknium @xeophon @bshlgrs @amit05prakash @kalomaze @teortaxestex @sdmat123 @deredleritt3r @gputhief @chasebrowe32432 @overworldai @bayesian0_0 @dylan522p @andymasley

**Top assets mentioned**
[Alphabet Inc Class A (GOOGL)](/topic/$googl)
### Top Social Posts
Top posts by engagements in the last [--] hours

"btw this is how you know there's nothing to expect from the MSL team and LLAMA [--]. Another quick update: last Friday support finally approved my aws cpu limit increase and so I thought why not let me see this through. I finished the job; time was roughly the same it took on the Hetzner server about a day. Then suddenly. ouch the bill I can run the https://t.co/BNTud2IcJC Another quick update: last Friday support finally approved my aws cpu limit increase and so I thought why not let me see this through. I finished the job; time was roughly the same it took on the Hetzner server about a day."  
[X Link](https://x.com/stalkermustang/status/2016243025429590395)  2026-01-27T20:12Z [----] followers, 11.1K engagements


"TBH I'd like to see the breakdown @METR_Evals Yes [---] is slower as measured by TPS and yes it generates more tokens. But does this account for the 26x slowdown I doubt. Not many benchmarks report average token consumption so i'll take ReBench here. The difference is only 30%. According to openrouter Opus [---] generates at [--] TPS while [---] is stable at [--] TPS (even before the recent speed up). So what's going on there One explanation might be that [---] ran more experiments or they were much longer. I'm not sure if Opus underutilizes the provided compute limits (say 1h per tool call / 24h total"  
[X Link](https://x.com/stalkermustang/status/2019370932431634887)  2026-02-05T11:22Z [----] followers, [---] engagements


"@dylan522p I can't help but notice that the person you quote is the CTO of an AI Cloud. How can't they know or even think of it as a viable option That's like the first thing that pops into the head"  
[X Link](https://x.com/stalkermustang/status/2020304637677695045)  2026-02-08T01:12Z [----] followers, [----] engagements


"So it was GPT-4.5 inference "loop" [---] Tb/s data rates over [---] km distance have been demonstrated on single mode fiber optic which works out to [--] GB of data in flight stored in the fiber with [--] TB/s bandwidth. Neural network inference and training can have deterministic weight reference patterns so it is [---] Tb/s data rates over [---] km distance have been demonstrated on single mode fiber optic which works out to [--] GB of data in flight stored in the fiber with [--] TB/s bandwidth. Neural network inference and training can have deterministic weight reference patterns so it is"  
[X Link](https://x.com/stalkermustang/status/2020515659734475183)  2026-02-08T15:10Z [----] followers, [---] engagements
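The data-in-flight figure in the post above is easy to sanity-check: light in single-mode fiber travels at roughly c/1.47, so at any instant the fiber holds line rate times one-way propagation delay worth of bits. The post's actual rate and distance are redacted here, so the sketch below uses illustrative values (1 Tb/s over 1,000 km), clearly marked as assumptions.

```python
# Back-of-envelope check of "data in flight" in a fiber link.
# The post's actual figures are redacted; these values are illustrative.
C = 299_792_458            # speed of light in vacuum, m/s
N_FIBER = 1.47             # typical refractive index of single-mode fiber

def data_in_flight_gb(rate_tbps: float, distance_km: float) -> float:
    """Bits resident in the fiber = line rate x one-way propagation delay."""
    v = C / N_FIBER                          # signal velocity in fiber, m/s
    delay_s = (distance_km * 1_000) / v      # one-way propagation delay, s
    bits = rate_tbps * 1e12 * delay_s        # bits "stored" in the glass
    return bits / 8 / 1e9                    # convert to gigabytes

# Illustrative only: 1 Tb/s over 1,000 km keeps ~0.6 GB in flight.
print(f"{data_in_flight_gb(1, 1_000):.2f} GB")
```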


"From what we know the biggest model served on Cerebras is GLM-4.7 a 355B-A32B model. They never served DeepSeek's 671B or Kimi's 1T models. At the same time the advertised speed for GLM-4.7 is 1k tokens roughly matching what OpenAI has written in the blogpost. So Spark is likely in 300b-500b total params range and the number of active params should be close to 32B (because for 22b active Qwen3 the speed is [----] tps). Of course they could've chosen a different BS/interactivity tradeoff but ballpark should be like this. .which means full-scale models are much larger in total params (2T+). Open"  
[X Link](https://x.com/stalkermustang/status/2022020161554067897)  2026-02-12T18:49Z [----] followers, 36.2K engagements
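The sizing argument in the post above reduces to a simple proportionality: if decoding on fixed hardware is memory-bandwidth bound, tokens-per-second scales roughly inversely with active parameter count. Below is a minimal sketch of that reasoning; the reference TPS figure is redacted in the source, so `qwen3_tps` is a hypothetical placeholder you would replace with the observed number.

```python
# Rough active-parameter estimate from observed decode speed.
# Assumption: decoding is memory-bandwidth bound on the same hardware,
# so TPS is approximately inversely proportional to active params.
def estimate_active_params(ref_active_b: float, ref_tps: float,
                           observed_tps: float) -> float:
    """Scale a reference model's active params by the TPS ratio."""
    return ref_active_b * ref_tps / observed_tps

qwen3_active_b = 22.0   # Qwen3's active params (from the post), in billions
qwen3_tps = 1_400.0     # hypothetical reference speed; redacted in the post
spark_tps = 1_000.0     # advertised ~1k TPS for the model being sized

est = estimate_active_params(qwen3_active_b, qwen3_tps, spark_tps)
print(f"~{est:.0f}B active params")  # ~31B with these illustrative numbers
```

With these assumed inputs the estimate lands near the post's "close to 32B" figure, which is the shape of the argument rather than a confirmation of it.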


"My friend had GPT-5.2-pro grade the solution given the ground truth PDF. Results: Net: incorrect for [--] [--] [--] correct for [--] [--] [--] [--] [--] and [--] [--] look correct but are the ones where correctness depends on technical details Interestingly Jakub says they're somehow confident [--] is correct. We'll wait for the official grading of course but it seems interesting. Very excited about the "First Proof" challenge. I believe novel frontier research is perhaps the most important way to evaluate capabilities of the next generation of AI models. We have run our internal model with limited human supervision on the"  
[X Link](https://x.com/stalkermustang/status/2022651406457454821)  2026-02-14T12:37Z [----] followers, [----] engagements


"The OpenAI Codex ad is cool but what was the second one I've heard there were [--] ad slots. I've seen Antrhopic twice (sport + therapist) but OpenAI Did I blink Or was it part of the hoax and they bought only [--] slot"  
[X Link](https://x.com/stalkermustang/status/2020841430558859625)  2026-02-09T12:45Z [----] followers, [---] engagements


"Thinking about the recent experiments by Cursor and Anthropicwhere they let dozens or even hundreds of agents run for a weekIm reminded that we don't all share the same mental models. People are reacting to these experiments with "wow" for very different reasons. Some are impressed by the surface-level output: what these specific agents achieved on these specific tasks. Others point outrightly sothat the browser wasn't actually built from scratch the compiler is slow or the code quality is mediocre. Consequently they don't see what the big deal is. But that perspective completely misses the"  
[X Link](https://x.com/stalkermustang/status/2021263868148785619)  2026-02-10T16:43Z [----] followers, [---] engagements


"(for those who will comment on kimi swarm)"  
[X Link](https://x.com/stalkermustang/status/2021264557503582615)  2026-02-10T16:46Z [----] followers, [---] engagements


"@AcerFur @AndyMasley i think it was given the premise that something actually fails and then maybe even given formulas/proofs with n up to [--]. And from that it was prompted to simplify & generalize the formulas (gpt-5.2.-pro) and then the internal scaffold proved the generalized version"  
[X Link](https://x.com/stalkermustang/status/2022485368155910584)  2026-02-14T01:37Z [----] followers, [--] engagements


"My first blogpost is out: Like many of you my feed has been dominated recently by Anthropic's Optimization team take-home. TL;DR: They retired this "notoriously difficult" exam because Claude Opus [---] effectively solved it. So they released the task and everyone started the grind. Yes AI beat human candidates here but AI-generated solutions aren't easy to follow if you aren't familiar with the domain. And they carry [--] educational value. I didn't just want to see the solution; I wanted to understand the mechanics under the hood. So I spent the weekend digging into the task. I started with a"  
[X Link](https://x.com/stalkermustang/status/2018692493554966654)  2026-02-03T14:26Z [----] followers, 48.2K engagements


"Dude thinks LLAMA 5's best model (= biggest in the lineup) will be opensourced 🫢🫢 I hope it's better than Kimi K2. We need American Open source to be more competitive. I hope it's better than Kimi K2. We need American Open source to be more competitive"  
[X Link](https://x.com/stalkermustang/status/2019626813782155665)  2026-02-06T04:18Z [----] followers, [---] engagements


"GPT-5.2's @METR_Evals time horizon has been added to the chart. Here it is in linear scale. Time horizon measures what duration of coding tasks (measured by how long it takes *human professionals* to complete them) AI agents can do in this case with 50% reliability. https://t.co/s9hS4PjRVi GPT-5.2's @METR_Evals time horizon has been added to the chart. Here it is in linear scale. Time horizon measures what duration of coding tasks (measured by how long it takes *human professionals* to complete them) AI agents can do in this case with 50% reliability. https://t.co/s9hS4PjRVi"  
[X Link](https://x.com/stalkermustang/status/2019762328829386856)  2026-02-06T13:17Z [----] followers, [---] engagements


"Should'va called it 25:17 tbh Venators weapon is named 51:20 This is a Bible verse: Thou art my battle axe and weapons of war: for with thee will I break in pieces the nations and with thee will I destroy kingdoms; https://t.co/U7ftZLFp15 Venators weapon is named 51:20 This is a Bible verse: Thou art my battle axe and weapons of war: for with thee will I break in pieces the nations and with thee will I destroy kingdoms; https://t.co/U7ftZLFp15"  
[X Link](https://x.com/stalkermustang/status/2019786717918527840)  2026-02-06T14:54Z [----] followers, [---] engagements


"Clawdbot but for gamers: just banned [--] accounts for manual (non-bot) play on the runescape server just banned [--] accounts for manual (non-bot) play on the runescape server"  
[X Link](https://x.com/stalkermustang/status/2019833942451581043)  2026-02-06T18:01Z [----] followers, [---] engagements


"hate to be that guy but. is @primeintellect's env hub initiative abandoned now [--] new env in the repo for over [--] months E.g. mine from late October and on the bounty list wasn't reviewed despite multiple pings and reminders (to be honest some Jan PRs have comments from the team members) cc @willccbb https://twitter.com/i/web/status/2020127586857189736 https://twitter.com/i/web/status/2020127586857189736"  
[X Link](https://x.com/stalkermustang/status/2020127586857189736)  2026-02-07T13:28Z [----] followers, [----] engagements


"In the spark of recent conversations on long-running agents I recalled there was early-METR work called "Evaluating Language-Model Agents on Realistic Autonomous Tasks" (when METR was "Alignment Research Center") What is up with those How well current gen models perform there @BethMayBarnes @METR_Evals @paulfchristiano I appreciate @Anthropic's honesty in their latest system card but the content of it does not give me confidence that the company will act responsibly with deployment of advanced AI models: -They primarily relied on an internal survey to determine whether Opus [---] crossed their"  
[X Link](https://x.com/stalkermustang/status/2021287409317773424)  2026-02-10T18:17Z [----] followers, [---] engagements


"glm-5 is online for me go check it out https://chat.z.ai/ https://chat.z.ai/"  
[X Link](https://x.com/stalkermustang/status/2021565261359571268)  2026-02-11T12:41Z [----] followers, [----] engagements


"to this day base brain (no human body) *still* can't shit the level of cope are insane Lots of folks spread false narratives about how ARC-1 was created in response to LLMs or how ARC-2 was only created because ARC-1 was saturated. Setting the record straight: [--]. ARC-1 was designed 2017-2019 and released in [----] (pre LLMs). [--]. The coming of ARC-2 was announced Lots of folks spread false narratives about how ARC-1 was created in response to LLMs or how ARC-2 was only created because ARC-1 was saturated. Setting the record straight: [--]. ARC-1 was designed 2017-2019 and released in [----] (pre"  
[X Link](https://x.com/stalkermustang/status/2022097088897855861)  2026-02-12T23:54Z [----] followers, [----] engagements


"I love Dario I love Dwarkesh but oh my god why would they put this cut in the podcast recording"  
[X Link](https://x.com/stalkermustang/status/2022452081530147162)  2026-02-13T23:25Z [----] followers, 406.3K engagements


"Very excited to see the results in [--] hours Very excited about the "First Proof" challenge. I believe novel frontier research is perhaps the most important way to evaluate capabilities of the next generation of AI models. We have run our internal model with limited human supervision on the ten proposed problems. The Very excited about the "First Proof" challenge. I believe novel frontier research is perhaps the most important way to evaluate capabilities of the next generation of AI models. We have run our internal model with limited human supervision on the ten proposed problems. The"  
[X Link](https://x.com/stalkermustang/status/2022519015445348750)  2026-02-14T03:51Z [----] followers, [----] engagements


"tech memes are funny but please check out my tech blogpost on ANthropic's Take Home it's a great read: https://x.com/stalkermustang/status/2018692493554966654s=20 My first blogpost is out: https://t.co/zUi5DtqG1K Like many of you my feed has been dominated recently by Anthropic's Optimization team take-home. TL;DR: They retired this "notoriously difficult" exam because Claude Opus [---] effectively solved it. So they released the task https://t.co/UZK5Fwxe0q https://x.com/stalkermustang/status/2018692493554966654s=20 My first blogpost is out: https://t.co/zUi5DtqG1K Like many of you my feed has"  
[X Link](https://x.com/stalkermustang/status/2022660162796790233)  2026-02-14T13:12Z [----] followers, 21.7K engagements


"NBP made it better I love Dario I love Dwarkesh but oh my god why would they put this cut in the podcast recording. https://t.co/vbs08DIy9U I love Dario I love Dwarkesh but oh my god why would they put this cut in the podcast recording. https://t.co/vbs08DIy9U"  
[X Link](https://x.com/stalkermustang/status/2022673276095369464)  2026-02-14T14:04Z [----] followers, [----] engagements


"Codex Spark The Context Killer (video is NOT sped up)"  
[X Link](https://x.com/stalkermustang/status/2022695210602631451)  2026-02-14T15:31Z [----] followers, 15.6K engagements


"of course no. Do I really need to explain 1) "value aligned" 2) "safety-conscious" 3) "comes close to building AGI" - getting first to the public release doesn't mean shit in his context. Orion aka GPT-4.5 could be much better than this but we wouldn't know (until the release)"  
[X Link](https://x.com/stalkermustang/status/1891965321835057215)  2025-02-18T21:37Z [---] followers, [--] engagements


"@OpenRouterAI curious to know what's the token-per-second speed you've clearly messed up something here:"  
[X Link](https://x.com/stalkermustang/status/1933093314019864786)  2025-06-12T09:25Z [----] followers, [---] engagements


"this is a big step for our Cerebras bros. [--] EF is like 20-25k H100s FP8 (w/o sparsity). Hope to see a lot of really cool and big morels to be served at 1k+ TPS with generous TPM/RPM limits Big W for reasoning and web agents too Meet OKC: the World's Fastest AI Datacenter powered by Cerebras. Located in the heart of Oklahoma our OKC datacenter is delivering over [--] ExaFlops of AI compute. That's trillions of tokens per second for AI developers to build with. Read @andrewdfeldman's blog: https://t.co/i7fa6nYtiR Meet OKC: the World's Fastest AI Datacenter powered by Cerebras. Located in the"  
[X Link](https://x.com/stalkermustang/status/1970265701697855941)  2025-09-22T23:15Z [----] followers, [---] engagements


"@senb0n22a source was it reported anywhere"  
[X Link](https://x.com/stalkermustang/status/1979351501396152607)  2025-10-18T00:59Z [----] followers, [--] engagements


"@senb0n22a he wrote "In December as we roll out age-gating more fully" and I understand this as an UX update that requires near zero training. Why do you say "likely would take a lot of training so if gpt-6 is coming it's likely with that alignment update.""  
[X Link](https://x.com/stalkermustang/status/1979352957608169900)  2025-10-18T01:04Z [----] followers, [--] engagements


"@senb0n22a brou. this likely means a small change in post training like sub $30 mil on ocmpute. Like another 4o iteration of some sort"  
[X Link](https://x.com/stalkermustang/status/1979353298617749701)  2025-10-18T01:06Z [----] followers, [--] engagements


"predictions: - there will be no grok [--] by EOY (globally available; maybe announced with very limited usage) - grok [---] no different than grok [--] - gemini [--] in dec ok & cool but actually maybe slightly better than gpt-5-high (in real world tasks and OOD benchmarks. Definitely won't feel as "left in dust" for OAI maybe except a few domains like HTML site oneshot generation or idk); maybe no GA"  
[X Link](https://x.com/stalkermustang/status/1979355454552604844)  2025-10-18T01:14Z [----] followers, [--] engagements


"@ElliotGlazer we had to develop an extra tier of difficulty to FrontierMath just to feel safe it would resist saturation . till next year. peace of AI in a nutshell"  
[X Link](https://x.com/stalkermustang/status/1980742065764921579)  2025-10-21T21:04Z [----] followers, [----] engagements


"I honestly do not understand this response. How does focusing on inference scaling and long context eliminate the possibility of Agent-0/1 training We know for sure: - OAI trained GPT-4.5 - OAI can afford to run RLVR atop of it - OAI has some model X system that won IMO gold + some other things and that system a) is not available b) very costly to run - that system might be announced by the end of the year; it might be advertised as expert-only like o1/o3-mini/o1 pro were not for all kinds of everyday tasks. - Anthropic has a gigantic Opus which isn't very convenient for consumer usage and."  
[X Link](https://x.com/stalkermustang/status/1981712505182285838)  2025-10-24T13:20Z [----] followers, [---] engagements


"Class Assignment - Assault [--] - Get kills while using the Adrenaline Injector - [--] to [--] Weapon Assignment - Deadeye [--] - Get headshot kills over 200m with Sniper Rifles - from [---] to [--] In addition a later game update will reduce the 200m distance WTF What's the reason to change these They are supposed to be HARD to achieve. What makes them hard (esp. 1) is that the game logic to check the conditions is faulty. Not every adrenaline kill counts; thus [--] becomes like [---]. The rest ARE challenging AND are good. I didn't play previous BF games and I have [--] experience sniping. I really enjoyed"  
[X Link](https://x.com/stalkermustang/status/1981718055332860209)  2025-10-24T13:42Z [----] followers, [---] engagements


"Direct quote from PDF: OpenAI proposes a Classified Stargate initiative to help meet this needmobilizing private capital alongside government partners to establish accredited classified data centers purpose-built for government AI. https://t.co/zI8sHMQt7e https://t.co/zI8sHMQt7e"  
[X Link](https://x.com/stalkermustang/status/1982926677757042913)  2025-10-27T21:45Z [----] followers, 26.2K engagements


"re-reading HF's ultrascale playbook and just noticed there's no OpenAI papers in "Landmark LLM scaling papers" section What a joke :D"  
[X Link](https://x.com/stalkermustang/status/1983534198431609198)  2025-10-29T13:59Z [----] followers, [---] engagements


"Btw OpenAI is set to announce "Aardvark" today"  
[X Link](https://x.com/stalkermustang/status/1983949609744646256)  2025-10-30T17:30Z [----] followers, [---] engagements


"Atlas bugs / "features" that kill the experience: - no "no power saving" mode when battery is low (I'm tired of [--] fps scrolling) - the black bars on the media mini-player are added every time the pop up is shown. Try switching between tabs and all of a sudden the black bars are the size of the video - no "copy the current page link" hotkey - the issue with hotkeys not working when the keyboard language is switched to non-english is still here. - "ask ChatGPT" opens every time I switch to arxiv pdf page page even if I closed it [--] times in the last minute - not atlas related but still"  
[X Link](https://x.com/stalkermustang/status/1985454016831373540)  2025-11-03T21:08Z [----] followers, [---] engagements


"@leothecurious @kalomaze What did he say Like some small models can't learn some stuff"  
[X Link](https://x.com/stalkermustang/status/1985891665006248013)  2025-11-05T02:07Z [----] followers, [---] engagements


"@Teknium If OpenAI cant even win with private models against OS then why tf should they be given Note: there was [--] months in the last [--] years where OS models were winning OAI's private models"  
[X Link](https://x.com/stalkermustang/status/1986413745913143308)  2025-11-06T12:41Z [----] followers, [---] engagements


"Even with a healthy degree of scepticism around Chinese models being overtrained on benchmark-like data it's still impressive that ALL the scores reported here use INT4-quantized model. we adopt Quantization-Aware Training (QAT) during the post-training phase applying INT4 weight-only quantization to the MoE components. All benchmark results are reported under INT4 precision. 🚀 Hello Kimi K2 Thinking The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to [---] [---] sequential tool calls without human interference 🔹 Excels in reasoning"  
[X Link](https://x.com/stalkermustang/status/1986453669470961933)  2025-11-06T15:20Z [----] followers, 10.1K engagements


"@SHL0MS @Teknium from the creators of "yeah sure R1 is better than o1" (i feel like I need to specify that by "creators" I don't literally mean the deepseek's team)"  
[X Link](https://x.com/stalkermustang/status/1986496916759212179)  2025-11-06T18:12Z [----] followers, [---] engagements


"back in the days i've gathered a suite of benchmarks and compared the scores of the models. Some of them are marked red and orange due to some feedback but tldr " as you see there's really only a few where DS substantially beats o1 but the reverse is not true - o1 beats DSR1 by 1-2-3% much more frequently. Meaning if you choose tasks / benchmarks at random you will frequently see o1 topping DS. The longer the tail - the bigger the difference. For example if you translate these to you know languages other than Ch and En - you'll see it even more clearly. i wish people were looking at"  
[X Link](https://x.com/stalkermustang/status/1986499924117049656)  2025-11-06T18:24Z [----] followers, [--] engagements


"While I don't think GPT-5 has the same base it boggles my mind that someone could use inference prices as evidence for distinguishing base models. It's widely known the inference margins are 100s of % in frontier labs. They can cut it by 20% and ya'all will be saying "wow that's a new model" good old o1/o3 didn't teach you anything people It think it's unlikely that gpt-5 shares base models with 4o. For one thing gpt-5 inference is much cheaper for input. I think there is a latency difference too though don't have numbers at hand. None of this is definitive of course. https://t.co/F7vMP8ie5J"  
[X Link](https://x.com/stalkermustang/status/1986834844387520808)  2025-11-07T16:35Z [----] followers, [---] engagements


"Prediction: in the next 2-3 months we'll see a suite of new benchmarks which will show that in fact Kimi's K2 is lagging behind frontier models. (and some of these benchmarks will be multimodal. guess why K2/DS won't be there on the LB) When people will start looking at the results outside the release blogposts that were picked by the authors A good illustration of my point could be the newest @OfirPress 's bench. Luckily they compared Qwen [--] Coder which was advertised as a sonnet-4 level coder model and you know what The further you go from the benchmarks in the announcement the bigger the"  
[X Link](https://x.com/stalkermustang/status/1986976148731969700)  2025-11-08T01:56Z [----] followers, 81.8K engagements


"@krishnanrohit @OpenAI +1 you're not alone"  
[X Link](https://x.com/stalkermustang/status/1987006447155106030)  2025-11-08T03:57Z [----] followers, [---] engagements


"@kalomaze @teortaxesTex https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/ https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/"  
[X Link](https://x.com/stalkermustang/status/1987469923065491667)  2025-11-09T10:38Z [----] followers, [--] engagements


"@iamgrigorev What's l in llm"  
[X Link](https://x.com/stalkermustang/status/1988211260027269235)  2025-11-11T11:44Z [----] followers, [--] engagements


"Spoiler: it won't. Why Because it shows the Sama-Investor relationships. The latter believe him and in him so much they are not ready to sell. This episode might become known as the podcast that popped the AI bubble Let that sink in This episode might become known as the podcast that popped the AI bubble Let that sink in"  
[X Link](https://x.com/stalkermustang/status/1989379236893982741)  2025-11-14T17:05Z [----] followers, [----] engagements


"I'm not sure why this new ByteDance Seed paper is not all over my feed. Am I missing something - trained Qwen2VL-7B to play genshin - SFT only no RL - [----] hours of human gameplay + 15k short reasoning traces to decompose the tasks - sub 20k H100 hours (3 epochs) - heaps of inference optimisations to fit generation sub [---] ms (5HZ controls) - non-surprising part: generalization to unseen Genshin locations missions items and characters - surprising result: IT FUCKING GENERALIZED TO PLAY HONKAI AND TACKLE 5H (yes [--] hours) LONG MISSIONS. ZERO SHOT NO CHANGES - IT EVEN CAN PLAY WUKONG though bc"  
[X Link](https://x.com/stalkermustang/status/1989794651906212114)  2025-11-15T20:36Z [----] followers, [---] engagements


"@sdmat123 @groundruled @bubblebabyboi yes. I don't doubt ppl are working on this and I'm also a big fan of Test-Time Training (TTT) idea I'm just skeptical that something Google is publishing nowadays is somehow relevant to advancing the architecture frontier"  
[X Link](https://x.com/stalkermustang/status/1989994899945451562)  2025-11-16T09:52Z [----] followers, [--] engagements


"My reading is that GPT-5.1 (likely) could already reach human level on ARC-AGI-2 given compute compatible with that December's o3 announcement (was it $10k per task or something) and maybe with some prompting tricks from @RyanPGreenblatt GPT-5-1 (Thinking High) on ARC-AGI Semi-Private Eval - ARC-AGI-1: 72.83% $0.67/task - ARC-AGI-2: 17.64% $1.17/task New frontier model SOTA from @OpenAI https://t.co/1TGHMnJA7V GPT-5-1 (Thinking High) on ARC-AGI Semi-Private Eval - ARC-AGI-1: 72.83% $0.67/task - ARC-AGI-2: 17.64% $1.17/task New frontier model SOTA from @OpenAI https://t.co/1TGHMnJA7V"  
[X Link](https://x.com/stalkermustang/status/1990520534229282977)  2025-11-17T20:40Z [----] followers, [---] engagements


"TIL: gpt-5.1 codex made a huge jump on Terminal-Bench [--] to the degree it's still better than G3 pro that's kind of a surprise for me Gemini [---] Pro is SoTA in: - HLE - ARC-AGI [--] - GPQA Diamond - AIME25 - MathArena Apex - MMMU-Pro - Video-MMMU - LiveCodeBench Pro - Terminal Bench [---] - tau2 bench - SimpleQA Verified - MMMLU As predicted it does not get a SoTA result on SWE-Bench verified losing to Sonnet https://t.co/v3O8BHDGFA Gemini [---] Pro is SoTA in: - HLE - ARC-AGI [--] - GPQA Diamond - AIME25 - MathArena Apex - MMMU-Pro - Video-MMMU - LiveCodeBench Pro - Terminal Bench [---] - tau2 bench -"  
[X Link](https://x.com/stalkermustang/status/1990879012248240399)  2025-11-18T20:25Z [----] followers, 22.1K engagements


"So nanobanana [--] is live for me on Vertex release probably today here's the proof: wow. Finally Yann Lecun just officially posted on his FB an hour before that hes leaving Meta and launching a startup. To continue the Advanced Machine Intelligence research program (AMI). And that he will be "sticking around Meta until the end of the year." https://t.co/7EXYTRoWZB wow. Finally Yann Lecun just officially posted on his FB an hour before that hes leaving Meta and launching a startup. To continue the Advanced Machine Intelligence research program (AMI). And that he will be "sticking around Meta"  
[X Link](https://x.com/stalkermustang/status/1991452602694090951)  2025-11-20T10:24Z [----] followers, [---] engagements


"@xeophon_ pretraining not fp8 NGMI"  
[X Link](https://x.com/stalkermustang/status/1991516115219140631)  2025-11-20T14:36Z [----] followers, [---] engagements


"Nano Banana Pro is live for me in Gemini app anyone"  
[X Link](https://x.com/stalkermustang/status/1991520141885284514)  2025-11-20T14:52Z [----] followers, [---] engagements


"I would call this an era-defining document demonstrating the current capabilities of LLMs to accelerate scientific discovery. This is a very special moment for all of us where we see those sparks and yet we're able to contribute to the science. Vibe coding will be replaced by vibe research in the future :) OpenAI has already assembled a dedicated team working closely with scientists from various fields to start systematically pushing the frontier of science next year using LLMs. As we all know in many tasks getting to 5-7% quality is harder than developing it up to 70%. And now is that very"  
[X Link](https://x.com/stalkermustang/status/1991646914769539280)  2025-11-20T23:16Z [----] followers, [----] engagements


"@EpochAIResearch has already measured @GoogleDeepMind 's G3 on FrontierMath (all tiers) top-1 tier [--] solving [--] problems. This means we have at least one new problem solved as the total set of problems solved before that contained only 8"  
[X Link](https://x.com/stalkermustang/status/1991908193153216817)  2025-11-21T16:34Z [----] followers, [---] engagements


"@mehedi_u @ArtificialAnlys you know that this benchmark is out of reach for 99%+ of people and thus has nothing to do with AGI right oh you don't well"  
[X Link](https://x.com/stalkermustang/status/1991938984432521645)  2025-11-21T18:37Z [----] followers, [---] engagements


"@deredleritt3r care to elaborate didn't get what you mean"  
[X Link](https://x.com/stalkermustang/status/1991974969941876831)  2025-11-21T21:00Z [----] followers, [----] engagements


"real metrics banger is hidden in the system card. Yes you can overfit on Django and nail SWE-bench Verified. But there's this recent SWE-bench Pro from @scale_AI and opus gets 52%. The next best sonet [---] is only [----] and non-anthropic model GPT-5 is 36%. This is HUGE for real world tasks. @_sholtodouglas has cooked 💀💀"  
[X Link](https://x.com/stalkermustang/status/1993043231223799900)  2025-11-24T19:45Z [----] followers, 57.3K engagements


"my reading is this: because one of the assumptions of our model is that the progress is dependant on the compute scale a software-only singularity is impossible due to the inability to scale infinitely compute i.e. we don't believe we can achieve exponential increase in capabilities w/ fixed compute"  
[X Link](https://x.com/stalkermustang/status/1993251790234513916)  2025-11-25T09:33Z [----] followers, [--] engagements


"@thenomadevel nice ui btw not for humans not for agents. A choice of a true esthete"  
[X Link](https://x.com/stalkermustang/status/1993354657524019598)  2025-11-25T16:22Z [----] followers, [--] engagements


"it would be cool if @OpenAI has released something of this kind like how do they approach their frontier model alignment This doesn't speed up the progress for competitors and only helps to achieve good outcomes for an AGI-pilled world. This is [--] (3) work from Anthropic in the past week New Anthropic research: We build a diverse suite of dishonest models and use it to systematically test methods for improving honesty and detecting lies. Of the 25+ methods we tested simple ones like fine-tuning models to be honest despite deceptive instructions worked best. https://t.co/sUEwwYSmaN New"  
[X Link](https://x.com/stalkermustang/status/1993413384369328470)  2025-11-25T20:16Z [----] followers, [---] engagements


"The first Math model introduced GRPO (before o1 / reasoning models). What will this release bring us LETS SEEEE New whale 👀 https://t.co/2WANNgjN5Q New whale 👀 https://t.co/2WANNgjN5Q"  
[X Link](https://x.com/stalkermustang/status/1993999547962675548)  2025-11-27T11:05Z [----] followers, [----] engagements


"The paper is interesting but I struggled a little with the total reward formula. For anyone like me here's an annotated version with all variables on the same screen without a need to go back and forth across pages (why don't people do this Maybe w/o colors but just the legend): deepseek math v2 is the first open source model to reach gold on IMO and we get a tech report what an amazing release https://t.co/23hi8Ay142 deepseek math v2 is the first open source model to reach gold on IMO and we get a tech report what an amazing release https://t.co/23hi8Ay142"  
[X Link](https://x.com/stalkermustang/status/1994044444476141943)  2025-11-27T14:03Z [----] followers, 16.4K engagements


"wtf did bros at xai just read in that Qwen-Genshin paper and decide to train grok-5-mini in the same fashion https://x.com/stalkermustang/status/1989794651906212114s=20 I want to break down how challenging the setup is and how fundamental the breakthrough will be. It requires abilities to: - recognize a computer interface from a video stream w/o APIs - reason with complexity under tight time limits - execute actions on a computer w/ no need of https://x.com/stalkermustang/status/1989794651906212114s=20 I want to break down how challenging the setup is and how fundamental the breakthrough will"  
[X Link](https://x.com/stalkermustang/status/1994189938707890318)  2025-11-27T23:41Z [----] followers, 47.4K engagements


"@Teknium i really can't believe Cofounder and Head of Post Training @NousResearch buys this 3y bullshit from Burry"  
[X Link](https://x.com/stalkermustang/status/1994494252647420405)  2025-11-28T19:51Z [----] followers, [----] engagements


"@giffmana @SebastienBubeck guys who have the "actually it only simulates reasoning" meme I wasn't able to find it but we have to share it with Lucas"  
[X Link](https://x.com/stalkermustang/status/1995908509011992957)  2025-12-02T17:30Z [----] followers, [----] engagements


"This is a great insight & all and I'm really looking forward to most of the agentic benchmarks to include default scaffolds into evaluation. A separate question arises: why did the score jump so significantly Could Opus or Sonnet have seen these tasks before resulting in this spike on this particular bench I believe this question is irrelevant to the post's main point. Even if the models were trained on these specific examples we can see that certain agent implementations degrade quality by nearly half. In other words specific tools fail to leverage the model's full potential even when it"  
[X Link](https://x.com/stalkermustang/status/1997072460319957395)  2025-12-05T22:35Z [----] followers, [---] engagements


"@gpu_thief @ChaseBrowe32432 wait what Where in my package i still see medium as default. Where do you see this"  
[X Link](https://x.com/stalkermustang/status/1998881053398946238)  2025-12-10T22:22Z [----] followers, [---] engagements


"@gpu_thief @ChaseBrowe32432 I guess your hypothesis is correct then"  
[X Link](https://x.com/stalkermustang/status/1998881717969690762)  2025-12-10T22:25Z [----] followers, [--] engagements


"Send this to your pookie anon https://t.co/YxYvTMKLBE https://t.co/YxYvTMKLBE"  
[X Link](https://x.com/stalkermustang/status/2001280118195462628)  2025-12-17T13:15Z [----] followers, [---] engagements


"I recall after that FT piece where they said OAI needs to raise 220b i wrote "huh if that's it that'll be pretty easy for sama. 220b isn't that much they've raised 60b on previous levels of revenue." and lots of people were like "are you crazy Nobody will give them this much" huh what a time OpenAI has discussed a new funding round at a valuation of around $750 billion () They could raise as much as $100b. Major scoop from my colleagues @srimuppidi and @Katie_Roof https://t.co/nd1fztUpCG https://t.co/KxuD8G9Fxv OpenAI has discussed a new funding round at a valuation of around $750 billion ()"  
[X Link](https://x.com/stalkermustang/status/2001461103751238131)  2025-12-18T01:14Z [----] followers, 10.2K engagements


"@GregHBurnham This seems a little bit bugged or idk the UI for Agent is different for me but also I can access the VM and click UI / fill forms / etc"  
[X Link](https://x.com/stalkermustang/status/2007022000183918931)  2026-01-02T09:31Z [----] followers, [--] engagements


"@willccbb @aidan_mclau @figuret20 what aboy these"  
[X Link](https://x.com/stalkermustang/status/2007921928863158769)  2026-01-04T21:07Z [----] followers, [---] engagements


"@RyanPGreenblatt @sebkrier Was curious what [---] pro would say about your comment https://chatgpt.com/share/695c6ee8-9df8-8008-bb4e-77949fb9b881 https://chatgpt.com/share/695c6ee8-9df8-8008-bb4e-77949fb9b881"  
[X Link](https://x.com/stalkermustang/status/2008360481023578508)  2026-01-06T02:10Z [----] followers, [---] engagements


"1) it won't be in fact in the bank 2) as of last summer OAI had $17B CASH in the bank and that even without using any credit lines that were green-lighted early in [----] $20B in the bank. The OpenAI hegemony is officially over. Why am I bullish on xAI Its simple math: The Talent Density: Elon is vacuuming up the top 1% who are tired of OAIs corporate politics. The Full Stack: Unlike MSFT or Amazon Elon owns the entire loopCompute $20B in the bank. The OpenAI hegemony is officially over. Why am I bullish on xAI Its simple math: The Talent Density: Elon is vacuuming up the top 1% who are tired"  
[X Link](https://x.com/stalkermustang/status/2008660893236867124)  2026-01-06T22:04Z [----] followers, [---] engagements


"@bshlgrs @RyanPGreenblatt this is cool listening RN but how much does @RyanPGreenblatt bench"  
[X Link](https://x.com/stalkermustang/status/2008720930781229459)  2026-01-07T02:02Z [----] followers, [---] engagements


"@bshlgrs @RyanPGreenblatt he estimates bro this is serious we need confidence distribution forecasting model an essay in two pieces and polymarket for this :D"  
[X Link](https://x.com/stalkermustang/status/2008730657837715777)  2026-01-07T02:41Z [----] followers, [--] engagements


"@bshlgrs @RyanPGreenblatt fuck every time I pause @RyanPGreenblatt is golden *safety can wait lemme take a nap mid-question*"  
[X Link](https://x.com/stalkermustang/status/2009368261322805305)  2026-01-08T20:55Z [----] followers, [---] engagements


"@sdmat123 @VahidK @xai @OpenAI but also where's Grok [--] in open source"  
[X Link](https://x.com/stalkermustang/status/2009572112642793655)  2026-01-09T10:25Z [----] followers, [--] engagements


"Our guy @dwarkesh_sp has advertised Gemini [--] Pro [--] times and got private access to the next-gen version Gemini [---]. Last time he was playing around AlphaGo for Agents. In the sponsor section for the next part he'll seemingly have a dyson swarm prototype or idk And it's up This was a really fun format - you get the back-and-forth energy of a conversation but you can actually think through an idea in writing. Thanks to @SubstackInc for hosting and to @patio11 @michaeljburry and @jackclarkSF for a great discussion And it's up This was a really fun format - you get the back-and-forth energy of a"  
[X Link](https://x.com/stalkermustang/status/2009705426615759240)  2026-01-09T19:14Z [----] followers, [---] engagements


"Ok there's new Whale drop cc @teortaxesTex https://github.com/deepseek-ai/Engram/tree/main https://github.com/deepseek-ai/Engram/tree/main"  
[X Link](https://x.com/stalkermustang/status/2010758154985095381)  2026-01-12T16:57Z [----] followers, 45.3K engagements


"@kelly_nicholes idk they pass the big tech L5 IC interview"  
[X Link](https://x.com/stalkermustang/status/2010850797852991765)  2026-01-12T23:06Z [----] followers, [---] engagements


"@Gusarich @AcerFur likely but do we know anything about their internals"  
[X Link](https://x.com/stalkermustang/status/2011456428645744950)  2026-01-14T15:12Z [----] followers, [--] engagements


"These two are the biggest non-highlighted takeaways for me: 1) pass@5 for Opus is well above all other models 2) gpt-5.2-medium is really token-efficient. Yes it lags behind opus (which offers bigger TPS i believe) but the score is still strong. We have updated SWE-rebench with the December tasks SWE-rebench is a live benchmark with fresh SWE tasks (issue+PR) from GitHub every month. Some insights: top-3 models right now are: [--]. Claude Opus [---] [--]. gpt-5.2-2025-12-11-xhigh [--]. Gemini [--] Flash Preview Gemini [--] https://t.co/gz6OX1XEfP We have updated SWE-rebench with the December tasks SWE-rebench"  
[X Link](https://x.com/stalkermustang/status/2012158042851410083)  2026-01-16T13:40Z [----] followers, [----] engagements


"@overworld_ai Hey boys why don't you want to wrap the outputs of the model with DLSS to upscale + generate frames If tried what's the problem - image quality is poor or something"  
[X Link](https://x.com/stalkermustang/status/2013719071280234777)  2026-01-20T21:03Z [----] followers, [---] engagements


"Wow I didn't know @ykilcher had a hair transplant operation. WAGMI Me defending my O(n3) solution to the coding interviewer. https://t.co/2p0tWhuFtb Me defending my O(n3) solution to the coding interviewer. https://t.co/2p0tWhuFtb"  
[X Link](https://x.com/stalkermustang/status/2013721554996064403)  2026-01-20T21:13Z [----] followers, [---] engagements


"@anm5704 @overworld_ai IDK it adds negligible overhead to the rendering pipeline in games. On [----] with dlss [--] on quality you get 100+ FPS in most games"  
[X Link](https://x.com/stalkermustang/status/2013906094330888674)  2026-01-21T09:26Z [----] followers, [--] engagements


"@deredleritt3r smart /compact lol :D"  
[X Link](https://x.com/stalkermustang/status/2014443091906769065)  2026-01-22T21:00Z [----] followers, [---] engagements


"1) gaia-2 is out 2) type of benchmark Elon will never retweet (check gemini and grok scores) 3) meta paid so much for MSL they can't afford to bench Opus thus GPT-5 is on top links in the first reply"  
[X Link](https://x.com/stalkermustang/status/1970131353170588034)  2025-09-22T14:21Z [----] followers, [---] engagements


"Great Paid AI Studio Experience @OfficialLoganK: 1) spent [--] minutes setting up the billing through the 2000s Google Cloud UI 2) just so that now I have to wait 15-20 sec before the model starts to generate COT tokens (on free tier it was 1-2 sec prefilling for the same # of tokens) /s https://twitter.com/i/web/status/2015842225805037647 https://twitter.com/i/web/status/2015842225805037647"  
[X Link](https://x.com/stalkermustang/status/2015842225805037647)  2026-01-26T17:40Z [----] followers, 10.2K engagements


"@AcerFur I really hope it gains traction around the world and that many mathematicians will contribute problems across various fields"  
[X Link](https://x.com/stalkermustang/status/2016192121431003343)  2026-01-27T16:50Z [----] followers, [---] engagements


"@xeophon I like Lucas I hold nothing against him"  
[X Link](https://x.com/stalkermustang/status/2016243775232700659)  2026-01-27T20:15Z [----] followers, [---] engagements


"@littmath I guess i won't know hahahah"  
[X Link](https://x.com/stalkermustang/status/2016542992769712388)  2026-01-28T16:04Z [----] followers, [----] engagements


"@DeadlockAir IMO the problem was much more obvious on the bases (esp. hinned king that has bright red lights). There I struggle the most with creep separation. Can you please make a comparison there"  
[X Link](https://x.com/stalkermustang/status/2016722971545481606)  2026-01-29T03:59Z [----] followers, [----] engagements


"@AcerFur @CarinaLHong Breaking: Acer closely tied to OpenAi's Kevin Weil ex CPO an head of science department and the first person to made ChatGPT to discover new science has hinted at arrival of AGI in just TWO weeks. Buy this course go know how to prepare -"  
[X Link](https://x.com/stalkermustang/status/2016853089634885696)  2026-01-29T12:37Z [----] followers, [--] engagements


"without the annoying hard usage limits Does he know Does he know they generate more thank 10k CoTs per problem Each probably having 10s of thousands reasoning tokens Looking forward to generating 100M+ tokens per problem on 671b-sized models without usage limits :))))))) (to be clear 100M of output tokens on v3.2 costs only $42) whale bros never seem to disappoint. It's basically a Gemini IMO Gold DeepThink without the annoying hard usage limits. They're the first lab that actually provides some info on the current limitations of these LLM-based approaches which I value highly. It still"  
[X Link](https://x.com/stalkermustang/status/1994138839988699593)  2025-11-27T20:18Z [----] followers, [----] engagements
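The "$42 for 100M output tokens" aside in the post above is plain unit arithmetic: token count divided by a per-million-token price. A quick check, assuming an output price of $0.42 per million tokens, which is the value that reproduces the quoted total:

```python
# Sanity check of the quoted inference cost.
# Assumption: output price of $0.42 per million tokens, chosen because
# it reproduces the $42 total quoted in the post.
output_tokens = 100_000_000        # 100M output tokens
price_per_mtok = 0.42              # USD per million output tokens (assumed)

cost = output_tokens / 1_000_000 * price_per_mtok
print(f"${cost:.2f}")              # -> $42.00
```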


"Nope OpenAI is famous for its reasoning effort budgeting. The models are aware of this budget because of some changes in the training. At inferece yes that's just a variable in the prompt (juice number) but that's not the same as just prompting "think hard" or "think breiefly" https://twitter.com/i/web/status/2018815049645560197 https://twitter.com/i/web/status/2018815049645560197"  
[X Link](https://x.com/stalkermustang/status/2018815049645560197)  2026-02-03T22:33Z [----] followers, [--] engagements

X Link 2025-09-22T23:15Z [----] followers, [---] engagements
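
The H100-equivalence above is back-of-envelope arithmetic; a sketch assuming ~0.99 PFLOPS of dense FP8 per H100 (the exact exaFLOPS figure is redacted here, so the inputs are illustrative):

```python
# Back-of-envelope conversion from datacenter exaFLOPS to H100-equivalents.
H100_FP8_DENSE_FLOPS = 0.989e15  # ~0.99 PFLOPS dense FP8 per H100, no sparsity

def h100_equivalents(exaflops: float) -> float:
    return exaflops * 1e18 / H100_FP8_DENSE_FLOPS

# e.g. 20-25 EF of dense FP8 compute lands in the 20-25k H100 range:
for ef in (20, 25):
    print(f"{ef} EF ~= {h100_equivalents(ef):,.0f} H100s")
```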

"@senb0n22a source was it reported anywhere"
X Link 2025-10-18T00:59Z [----] followers, [--] engagements

"@senb0n22a he wrote "In December as we roll out age-gating more fully" and I understand this as an UX update that requires near zero training. Why do you say "likely would take a lot of training so if gpt-6 is coming it's likely with that alignment update.""
X Link 2025-10-18T01:04Z [----] followers, [--] engagements

"@senb0n22a brou. this likely means a small change in post training like sub $30 mil on ocmpute. Like another 4o iteration of some sort"
X Link 2025-10-18T01:06Z [----] followers, [--] engagements

"predictions: - there will be no grok [--] by EOY (globally available; maybe announced with very limited usage) - grok [---] no different than grok [--] - gemini [--] in dec ok & cool but actually maybe slightly better than gpt-5-high (in real world tasks and OOD benchmarks. Definitely won't feel as "left in dust" for OAI maybe except a few domains like HTML site oneshot generation or idk); maybe no GA"
X Link 2025-10-18T01:14Z [----] followers, [--] engagements

"@ElliotGlazer we had to develop an extra tier of difficulty to FrontierMath just to feel safe it would resist saturation . till next year. peace of AI in a nutshell"
X Link 2025-10-21T21:04Z [----] followers, [----] engagements

"I honestly do not understand this response. How does focusing on inference scaling and long context eliminate the possibility of Agent-0/1 training We know for sure: - OAI trained GPT-4.5 - OAI can afford to run RLVR atop of it - OAI has some model X system that won IMO gold + some other things and that system a) is not available b) very costly to run - that system might be announced by the end of the year; it might be advertised as expert-only like o1/o3-mini/o1 pro were not for all kinds of everyday tasks. - Anthropic has a gigantic Opus which isn't very convenient for consumer usage and."
X Link 2025-10-24T13:20Z [----] followers, [---] engagements

"Class Assignment - Assault [--] - Get kills while using the Adrenaline Injector - [--] to [--] Weapon Assignment - Deadeye [--] - Get headshot kills over 200m with Sniper Rifles - from [---] to [--] In addition a later game update will reduce the 200m distance WTF What's the reason to change these They are supposed to be HARD to achieve. What makes them hard (esp. 1) is that the game logic to check the conditions is faulty. Not every adrenaline kill counts; thus [--] becomes like [---]. The rest ARE challenging AND are good. I didn't play previous BF games and I have [--] experience sniping. I really enjoyed"
X Link 2025-10-24T13:42Z [----] followers, [---] engagements

"Direct quote from PDF: OpenAI proposes a Classified Stargate initiative to help meet this needmobilizing private capital alongside government partners to establish accredited classified data centers purpose-built for government AI. https://t.co/zI8sHMQt7e https://t.co/zI8sHMQt7e"
X Link 2025-10-27T21:45Z [----] followers, 26.2K engagements

"re-reading HF's ultrascale playbook and just noticed there's no OpenAI papers in "Landmark LLM scaling papers" section What a joke :D"
X Link 2025-10-29T13:59Z [----] followers, [---] engagements

"Btw OpenAI is set to announce "Aardvark" today"
X Link 2025-10-30T17:30Z [----] followers, [---] engagements

"Atlas bugs / "features" that kill the experience: - no "no power saving" mode when battery is low (I'm tired of [--] fps scrolling) - the black bars on the media mini-player are added every time the pop up is shown. Try switching between tabs and all of a sudden the black bars are the size of the video - no "copy the current page link" hotkey - the issue with hotkeys not working when the keyboard language is switched to non-english is still here. - "ask ChatGPT" opens every time I switch to arxiv pdf page page even if I closed it [--] times in the last minute - not atlas related but still"
X Link 2025-11-03T21:08Z [----] followers, [---] engagements

"@leothecurious @kalomaze What did he say Like some small models can't learn some stuff"
X Link 2025-11-05T02:07Z [----] followers, [---] engagements

"@Teknium If OpenAI cant even win with private models against OS then why tf should they be given Note: there was [--] months in the last [--] years where OS models were winning OAI's private models"
X Link 2025-11-06T12:41Z [----] followers, [---] engagements

"Even with a healthy degree of scepticism around Chinese models being overtrained on benchmark-like data it's still impressive that ALL the scores reported here use INT4-quantized model. we adopt Quantization-Aware Training (QAT) during the post-training phase applying INT4 weight-only quantization to the MoE components. All benchmark results are reported under INT4 precision. 🚀 Hello Kimi K2 Thinking The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to [---] [---] sequential tool calls without human interference 🔹 Excels in reasoning"
X Link 2025-11-06T15:20Z [----] followers, 10.1K engagements
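
To make the quoted QAT recipe concrete, here is a minimal sketch of INT4 weight-only fake-quantization with a straight-through estimator; this is an illustrative pattern, not Kimi's actual training code:

```python
# INT4 weight-only fake-quant for QAT: the forward pass sees weights snapped
# to a 16-level grid, while the backward pass treats quantization as identity
# (straight-through estimator), so training "feels" the deployment precision.
import torch

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7)  # signed 4-bit range
    w_q = q * scale
    return w + (w_q - w).detach()  # STE: forward uses w_q, backward is identity
```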

"@SHL0MS @Teknium from the creators of "yeah sure R1 is better than o1" (i feel like I need to specify that by "creators" I don't literally mean the deepseek's team)"
X Link 2025-11-06T18:12Z [----] followers, [---] engagements

"back in the days i've gathered a suite of benchmarks and compared the scores of the models. Some of them are marked red and orange due to some feedback but tldr " as you see there's really only a few where DS substantially beats o1 but the reverse is not true - o1 beats DSR1 by 1-2-3% much more frequently. Meaning if you choose tasks / benchmarks at random you will frequently see o1 topping DS. The longer the tail - the bigger the difference. For example if you translate these to you know languages other than Ch and En - you'll see it even more clearly. i wish people were looking at"
X Link 2025-11-06T18:24Z [----] followers, [--] engagements

"While I don't think GPT-5 has the same base it boggles my mind that someone could use inference prices as evidence for distinguishing base models. It's widely known the inference margins are 100s of % in frontier labs. They can cut it by 20% and ya'all will be saying "wow that's a new model" good old o1/o3 didn't teach you anything people It think it's unlikely that gpt-5 shares base models with 4o. For one thing gpt-5 inference is much cheaper for input. I think there is a latency difference too though don't have numbers at hand. None of this is definitive of course. https://t.co/F7vMP8ie5J"
X Link 2025-11-07T16:35Z [----] followers, [---] engagements
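
The margin argument is simple arithmetic; a toy example with invented numbers, since real serving costs are not public:

```python
# Toy numbers: with a 300% markup, a 20% price cut is still wildly profitable,
# so price moves alone can't identify a new (cheaper) base model.
cost = 1.00              # hypothetical serving cost, $/1M input tokens
price = 4.00             # hypothetical list price (300% markup)
new_price = 0.8 * price  # the "wow, that's a new model" price cut
print(f"markup after the cut: {(new_price - cost) / cost:.0%}")  # 220%
```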

"Prediction: in the next 2-3 months we'll see a suite of new benchmarks which will show that in fact Kimi's K2 is lagging behind frontier models. (and some of these benchmarks will be multimodal. guess why K2/DS won't be there on the LB) When people will start looking at the results outside the release blogposts that were picked by the authors A good illustration of my point could be the newest @OfirPress 's bench. Luckily they compared Qwen [--] Coder which was advertised as a sonnet-4 level coder model and you know what The further you go from the benchmarks in the announcement the bigger the"
X Link 2025-11-08T01:56Z [----] followers, 81.8K engagements

"@krishnanrohit @OpenAI +1 you're not alone"
X Link 2025-11-08T03:57Z [----] followers, [---] engagements

"@kalomaze @teortaxesTex https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/ https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/"
X Link 2025-11-09T10:38Z [----] followers, [--] engagements

"@iamgrigorev What's l in llm"
X Link 2025-11-11T11:44Z [----] followers, [--] engagements

"Spoiler: it won't. Why Because it shows the Sama-Investor relationships. The latter believe him and in him so much they are not ready to sell. This episode might become known as the podcast that popped the AI bubble Let that sink in This episode might become known as the podcast that popped the AI bubble Let that sink in"
X Link 2025-11-14T17:05Z [----] followers, [----] engagements

"I'm not sure why this new ByteDance Seed paper is not all over my feed. Am I missing something - trained Qwen2VL-7B to play genshin - SFT only no RL - [----] hours of human gameplay + 15k short reasoning traces to decompose the tasks - sub 20k H100 hours (3 epochs) - heaps of inference optimisations to fit generation sub [---] ms (5HZ controls) - non-surprising part: generalization to unseen Genshin locations missions items and characters - surprising result: IT FUCKING GENERALIZED TO PLAY HONKAI AND TACKLE 5H (yes [--] hours) LONG MISSIONS. ZERO SHOT NO CHANGES - IT EVEN CAN PLAY WUKONG though bc"
X Link 2025-11-15T20:36Z [----] followers, [---] engagements

"@sdmat123 @groundruled @bubblebabyboi yes. I don't doubt ppl are working on this and I'm also a big fan of Test-Time Training (TTT) idea I'm just skeptical that something Google is publishing nowadays is somehow relevant to advancing the architecture frontier"
X Link 2025-11-16T09:52Z [----] followers, [--] engagements

"My reading is that GPT-5.1 (likely) could already reach human level on ARC-AGI-2 given compute compatible with that December's o3 announcement (was it $10k per task or something) and maybe with some prompting tricks from @RyanPGreenblatt GPT-5-1 (Thinking High) on ARC-AGI Semi-Private Eval - ARC-AGI-1: 72.83% $0.67/task - ARC-AGI-2: 17.64% $1.17/task New frontier model SOTA from @OpenAI https://t.co/1TGHMnJA7V GPT-5-1 (Thinking High) on ARC-AGI Semi-Private Eval - ARC-AGI-1: 72.83% $0.67/task - ARC-AGI-2: 17.64% $1.17/task New frontier model SOTA from @OpenAI https://t.co/1TGHMnJA7V"
X Link 2025-11-17T20:40Z [----] followers, [---] engagements
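
The implied compute headroom is easy to work out, taking the tweet's half-remembered ~$10k/task o3 figure at face value:

```python
# Cost-per-task gap between GPT-5.1's reported ARC-AGI-2 run and the budget
# of the December o3 announcement (as recalled, not confirmed, above).
o3_cost_per_task = 10_000.0  # $/task, the tweet's own "was it $10k?" figure
gpt51_cost_per_task = 1.17   # $/task, reported ARC-AGI-2 number
ratio = o3_cost_per_task / gpt51_cost_per_task
print(f"headroom: ~{ratio:,.0f}x more test-time compute per task")  # ~8,547x
```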

"TIL: gpt-5.1 codex made a huge jump on Terminal-Bench [--] to the degree it's still better than G3 pro that's kind of a surprise for me Gemini [---] Pro is SoTA in: - HLE - ARC-AGI [--] - GPQA Diamond - AIME25 - MathArena Apex - MMMU-Pro - Video-MMMU - LiveCodeBench Pro - Terminal Bench [---] - tau2 bench - SimpleQA Verified - MMMLU As predicted it does not get a SoTA result on SWE-Bench verified losing to Sonnet https://t.co/v3O8BHDGFA Gemini [---] Pro is SoTA in: - HLE - ARC-AGI [--] - GPQA Diamond - AIME25 - MathArena Apex - MMMU-Pro - Video-MMMU - LiveCodeBench Pro - Terminal Bench [---] - tau2 bench -"
X Link 2025-11-18T20:25Z [----] followers, 22.1K engagements

"So nanobanana [--] is live for me on Vertex release probably today here's the proof: wow. Finally Yann Lecun just officially posted on his FB an hour before that hes leaving Meta and launching a startup. To continue the Advanced Machine Intelligence research program (AMI). And that he will be "sticking around Meta until the end of the year." https://t.co/7EXYTRoWZB wow. Finally Yann Lecun just officially posted on his FB an hour before that hes leaving Meta and launching a startup. To continue the Advanced Machine Intelligence research program (AMI). And that he will be "sticking around Meta"
X Link 2025-11-20T10:24Z [----] followers, [---] engagements

"@xeophon_ pretraining not fp8 NGMI"
X Link 2025-11-20T14:36Z [----] followers, [---] engagements

"Nano Banana Pro is live for me in Gemini app anyone"
X Link 2025-11-20T14:52Z [----] followers, [---] engagements

"I would call this an era-defining document demonstrating the current capabilities of LLMs to accelerate scientific discovery. This is a very special moment for all of us where we see those sparks and yet we're able to contribute to the science. Vibe coding will be replaced by vibe research in the future :) OpenAI has already assembled a dedicated team working closely with scientists from various fields to start systematically pushing the frontier of science next year using LLMs. As we all know in many tasks getting to 5-7% quality is harder than developing it up to 70%. And now is that very"
X Link 2025-11-20T23:16Z [----] followers, [----] engagements

"@EpochAIResearch has already measured @GoogleDeepMind 's G3 on FrontierMath (all tiers) top-1 tier [--] solving [--] problems. This means we have at least one new problem solved as the total set of problems solved before that contained only 8"
X Link 2025-11-21T16:34Z [----] followers, [---] engagements

"@mehedi_u @ArtificialAnlys you know that this benchmark is out of reach for 99%+ of people and thus has nothing to do with AGI right oh you don't well"
X Link 2025-11-21T18:37Z [----] followers, [---] engagements

"@deredleritt3r care to elaborate didn't get what you mean"
X Link 2025-11-21T21:00Z [----] followers, [----] engagements

"real metrics banger is hidden in the system card. Yes you can overfit on Django and nail SWE-bench Verified. But there's this recent SWE-bench Pro from @scale_AI and opus gets 52%. The next best sonet [---] is only [----] and non-anthropic model GPT-5 is 36%. This is HUGE for real world tasks. @_sholtodouglas has cooked 💀💀"
X Link 2025-11-24T19:45Z [----] followers, 57.3K engagements

"my reading is this: because one of the assumptions of our model is that the progress is dependant on the compute scale a software-only singularity is impossible due to the inability to scale infinitely compute i.e. we don't believe we can achieve exponential increase in capabilities w/ fixed compute"
X Link 2025-11-25T09:33Z [----] followers, [--] engagements
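
One way to formalize that assumption, with an illustrative functional form rather than the model's actual equations: let capability $A$ grow at a rate that depends on the compute scale $C$,

```latex
\frac{dA}{dt} = k\, C^{\alpha} A^{\beta}, \qquad \alpha > 0
```

With $C$ held fixed, $\beta < 1$ yields only polynomial growth of $A$; an exponential or faster "software-only" takeoff would require $\beta \ge 1$, which is exactly the premise being rejected.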

"@thenomadevel nice ui btw not for humans not for agents. A choice of a true esthete"
X Link 2025-11-25T16:22Z [----] followers, [--] engagements

"it would be cool if @OpenAI has released something of this kind like how do they approach their frontier model alignment This doesn't speed up the progress for competitors and only helps to achieve good outcomes for an AGI-pilled world. This is [--] (3) work from Anthropic in the past week New Anthropic research: We build a diverse suite of dishonest models and use it to systematically test methods for improving honesty and detecting lies. Of the 25+ methods we tested simple ones like fine-tuning models to be honest despite deceptive instructions worked best. https://t.co/sUEwwYSmaN New"
X Link 2025-11-25T20:16Z [----] followers, [---] engagements

"The first Math model introduced GRPO (before o1 / reasoning models). What will this release bring us LETS SEEEE New whale 👀 https://t.co/2WANNgjN5Q New whale 👀 https://t.co/2WANNgjN5Q"
X Link 2025-11-27T11:05Z [----] followers, [----] engagements
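
For context, the core of GRPO as introduced in the DeepSeekMath paper: the advantage is computed by normalizing rewards within a group of $G$ sampled completions per prompt, replacing a learned critic,

```latex
\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1,\dots,r_G\})}{\operatorname{std}(\{r_1,\dots,r_G\})}
```

This group-relative advantage then plugs into a PPO-style clipped objective, which is what makes the method cheap enough to run at scale.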

"The paper is interesting but I struggled a little with the total reward formula. For anyone like me here's an annotated version with all variables on the same screen without a need to go back and forth across pages (why don't people do this Maybe w/o colors but just the legend): deepseek math v2 is the first open source model to reach gold on IMO and we get a tech report what an amazing release https://t.co/23hi8Ay142 deepseek math v2 is the first open source model to reach gold on IMO and we get a tech report what an amazing release https://t.co/23hi8Ay142"
X Link 2025-11-27T14:03Z [----] followers, 16.4K engagements

"wtf did bros at xai just read in that Qwen-Genshin paper and decide to train grok-5-mini in the same fashion https://x.com/stalkermustang/status/1989794651906212114s=20 I want to break down how challenging the setup is and how fundamental the breakthrough will be. It requires abilities to: - recognize a computer interface from a video stream w/o APIs - reason with complexity under tight time limits - execute actions on a computer w/ no need of https://x.com/stalkermustang/status/1989794651906212114s=20 I want to break down how challenging the setup is and how fundamental the breakthrough will"
X Link 2025-11-27T23:41Z [----] followers, 47.4K engagements

"@Teknium i really can't believe Cofounder and Head of Post Training @NousResearch buys this 3y bullshit from Burry"
X Link 2025-11-28T19:51Z [----] followers, [----] engagements

"@giffmana @SebastienBubeck guys who have the "actually it only simulates reasoning" meme I wasn't able to find it but we have to share it with Lucas"
X Link 2025-12-02T17:30Z [----] followers, [----] engagements

"This is a great insight & all and I'm really looking forward to most of the agentic benchmarks to include default scaffolds into evaluation. A separate question arises: why did the score jump so significantly Could Opus or Sonnet have seen these tasks before resulting in this spike on this particular bench I believe this question is irrelevant to the post's main point. Even if the models were trained on these specific examples we can see that certain agent implementations degrade quality by nearly half. In other words specific tools fail to leverage the model's full potential even when it"
X Link 2025-12-05T22:35Z [----] followers, [---] engagements

"@gpu_thief @ChaseBrowe32432 wait what Where in my package i still see medium as default. Where do you see this"
X Link 2025-12-10T22:22Z [----] followers, [---] engagements

"@gpu_thief @ChaseBrowe32432 I guess your hypothesis is correct then"
X Link 2025-12-10T22:25Z [----] followers, [--] engagements

"Send this to your pookie anon https://t.co/YxYvTMKLBE https://t.co/YxYvTMKLBE"
X Link 2025-12-17T13:15Z [----] followers, [---] engagements

"I recall after that FT piece where they said OAI needs to raise 220b i wrote "huh if that's it that'll be pretty easy for sama. 220b isn't that much they've raised 60b on previous levels of revenue." and lots of people were like "are you crazy Nobody will give them this much" huh what a time OpenAI has discussed a new funding round at a valuation of around $750 billion () They could raise as much as $100b. Major scoop from my colleagues @srimuppidi and @Katie_Roof https://t.co/nd1fztUpCG https://t.co/KxuD8G9Fxv OpenAI has discussed a new funding round at a valuation of around $750 billion ()"
X Link 2025-12-18T01:14Z [----] followers, 10.2K engagements

"@GregHBurnham This seems a little bit bugged or idk the UI for Agent is different for me but also I can access the VM and click UI / fill forms / etc"
X Link 2026-01-02T09:31Z [----] followers, [--] engagements

"@willccbb @aidan_mclau @figuret20 what aboy these"
X Link 2026-01-04T21:07Z [----] followers, [---] engagements

"@RyanPGreenblatt @sebkrier Was curious what [---] pro would say about your comment https://chatgpt.com/share/695c6ee8-9df8-8008-bb4e-77949fb9b881 https://chatgpt.com/share/695c6ee8-9df8-8008-bb4e-77949fb9b881"
X Link 2026-01-06T02:10Z [----] followers, [---] engagements

"1) it won't be in fact in the bank 2) as of last summer OAI had $17B CASH in the bank and that even without using any credit lines that were green-lighted early in [----] $20B in the bank. The OpenAI hegemony is officially over. Why am I bullish on xAI Its simple math: The Talent Density: Elon is vacuuming up the top 1% who are tired of OAIs corporate politics. The Full Stack: Unlike MSFT or Amazon Elon owns the entire loopCompute $20B in the bank. The OpenAI hegemony is officially over. Why am I bullish on xAI Its simple math: The Talent Density: Elon is vacuuming up the top 1% who are tired"
X Link 2026-01-06T22:04Z [----] followers, [---] engagements

"@bshlgrs @RyanPGreenblatt this is cool listening RN but how much does @RyanPGreenblatt bench"
X Link 2026-01-07T02:02Z [----] followers, [---] engagements

"@bshlgrs @RyanPGreenblatt he estimates bro this is serious we need confidence distribution forecasting model an essay in two pieces and polymarket for this :D"
X Link 2026-01-07T02:41Z [----] followers, [--] engagements

"@bshlgrs @RyanPGreenblatt fuck every time I pause @RyanPGreenblatt is golden safety can wait lemme take a nap mid-question"
X Link 2026-01-08T20:55Z [----] followers, [---] engagements

"@sdmat123 @VahidK @xai @OpenAI but also where's Grok [--] in open source"
X Link 2026-01-09T10:25Z [----] followers, [--] engagements

"Our guy @dwarkesh_sp has advertised Gemini [--] Pro [--] times and got private access to the next-gen version Gemini [---]. Last time he was playing around AlphaGo for Agents. In the sponsor section for the next part he'll seemingly have a dyson swarm prototype or idk And it's up This was a really fun format - you get the back-and-forth energy of a conversation but you can actually think through an idea in writing. Thanks to @SubstackInc for hosting and to @patio11 @michaeljburry and @jackclarkSF for a great discussion And it's up This was a really fun format - you get the back-and-forth energy of a"
X Link 2026-01-09T19:14Z [----] followers, [---] engagements

"Ok there's new Whale drop cc @teortaxesTex https://github.com/deepseek-ai/Engram/tree/main https://github.com/deepseek-ai/Engram/tree/main"
X Link 2026-01-12T16:57Z [----] followers, 45.3K engagements

"@kelly_nicholes idk they pass the big tech L5 IC interview"
X Link 2026-01-12T23:06Z [----] followers, [---] engagements

"@Gusarich @AcerFur likely but do we know anything about their internals"
X Link 2026-01-14T15:12Z [----] followers, [--] engagements

"These two are the biggest non-highlighted takeaways for me: 1) pass@5 for Opus is well above all other models 2) gpt-5.2-medium is really token-efficient. Yes it lags behind opus (which offers bigger TPS i believe) but the score is still strong. We have updated SWE-rebench with the December tasks SWE-rebench is a live benchmark with fresh SWE tasks (issue+PR) from GitHub every month. Some insights: top-3 models right now are: [--]. Claude Opus [---] [--]. gpt-5.2-2025-12-11-xhigh [--]. Gemini [--] Flash Preview Gemini [--] https://t.co/gz6OX1XEfP We have updated SWE-rebench with the December tasks SWE-rebench"
X Link 2026-01-16T13:40Z [----] followers, [----] engagements
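
Since pass@5 is doing the work in point 1, here is the standard unbiased pass@k estimator (Chen et al., 2021) for anyone recomputing such numbers; the inputs below are made up:

```python
# Unbiased pass@k: probability that at least one of k draws (without
# replacement) from n samples, of which c are correct, is correct.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # not enough failures to fill all k draws
    return 1.0 - comb(n - c, k) / comb(n, k)

print(f"{pass_at_k(n=10, c=3, k=5):.3f}")  # 0.917 with these illustrative n, c
```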

"@overworld_ai Hey boys why don't you want to wrap the outputs of the model with DLSS to upscale + generate frames If tried what's the problem - image quality is poor or something"
X Link 2026-01-20T21:03Z [----] followers, [---] engagements

"Wow I didn't know @ykilcher had a hair transplant operation. WAGMI Me defending my O(n3) solution to the coding interviewer. https://t.co/2p0tWhuFtb Me defending my O(n3) solution to the coding interviewer. https://t.co/2p0tWhuFtb"
X Link 2026-01-20T21:13Z [----] followers, [---] engagements

"@anm5704 @overworld_ai IDK it adds negligible overhead to the rendering pipeline in games. On [----] with dlss [--] on quality you get 100+ FPS in most games"
X Link 2026-01-21T09:26Z [----] followers, [--] engagements

"@deredleritt3r smart /compact lol :D"
X Link 2026-01-22T21:00Z [----] followers, [---] engagements

"1) gaia-2 is out 2) type of benchmark Elon will never retweet (check gemini and grok scores) 3) meta paid so much for MSL they can't afford to bench Opus thus GPT-5 is on top links in the first reply"
X Link 2025-09-22T14:21Z [----] followers, [---] engagements

"Great Paid AI Studio Experience @OfficialLoganK: 1) spent [--] minutes setting up the billing through the 2000s Google Cloud UI 2) just so that now I have to wait 15-20 sec before the model starts to generate COT tokens (on free tier it was 1-2 sec prefilling for the same # of tokens) /s https://twitter.com/i/web/status/2015842225805037647 https://twitter.com/i/web/status/2015842225805037647"
X Link 2026-01-26T17:40Z [----] followers, 10.2K engagements

"@AcerFur I really hope it gains traction around the world and that many mathematicians will contribute problems across various fields"
X Link 2026-01-27T16:50Z [----] followers, [---] engagements

"@xeophon I like Lucas I hold nothing against him"
X Link 2026-01-27T20:15Z [----] followers, [---] engagements

"@littmath I guess i won't know hahahah"
X Link 2026-01-28T16:04Z [----] followers, [----] engagements

"@DeadlockAir IMO the problem was much more obvious on the bases (esp. hinned king that has bright red lights). There I struggle the most with creep separation. Can you please make a comparison there"
X Link 2026-01-29T03:59Z [----] followers, [----] engagements

"@AcerFur @CarinaLHong Breaking: Acer closely tied to OpenAi's Kevin Weil ex CPO an head of science department and the first person to made ChatGPT to discover new science has hinted at arrival of AGI in just TWO weeks. Buy this course go know how to prepare -"
X Link 2026-01-29T12:37Z [----] followers, [--] engagements

"without the annoying hard usage limits Does he know Does he know they generate more thank 10k CoTs per problem Each probably having 10s of thousands reasoning tokens Looking forward to generating 100M+ tokens per problem on 671b-sized models without usage limits :))))))) (to be clear 100M of output tokens on v3.2 costs only $42) whale bros never seem to disappoint. It's basically a Gemini IMO Gold DeepThink without the annoying hard usage limits. They're the first lab that actually provides some info on the current limitations of these LLM-based approaches which I value highly. It still"
X Link 2025-11-27T20:18Z [----] followers, [----] engagements
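
The scale claim is consistent with simple arithmetic, assuming the output price implied by the quoted "$42 per 100M tokens":

```python
# 10k CoTs per problem x ~10k reasoning tokens per CoT = 100M output tokens.
cots_per_problem = 10_000
tokens_per_cot = 10_000  # "10s of thousands", taking the low end
total_tokens = cots_per_problem * tokens_per_cot
price_per_mtok = 0.42    # $/1M output tokens, implied by the $42 figure above
print(f"{total_tokens/1e6:.0f}M tokens -> ${total_tokens/1e6 * price_per_mtok:.0f}")
```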

"Nope OpenAI is famous for its reasoning effort budgeting. The models are aware of this budget because of some changes in the training. At inferece yes that's just a variable in the prompt (juice number) but that's not the same as just prompting "think hard" or "think breiefly" https://twitter.com/i/web/status/2018815049645560197 https://twitter.com/i/web/status/2018815049645560197"
X Link 2026-02-03T22:33Z [----] followers, [--] engagements
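
From the outside, that budget surfaces as an API knob rather than a prompt phrase; a minimal sketch using the OpenAI SDK's reasoning-effort parameter (the model choice here is illustrative):

```python
# The effort level is a trained-in budget the model was optimized to respect,
# not a "think hard" instruction pasted into the prompt.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="o3-mini",         # any effort-aware reasoning model
    reasoning_effort="low",  # "low" | "medium" | "high"
    messages=[{"role": "user", "content": "Is 1009 prime?"}],
)
print(resp.choices[0].message.content)
```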
