[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

# ![@omarsar0 Avatar](https://lunarcrush.com/gi/w:26/cr:twitter::3448284313.png) @omarsar0 elvis

elvis posts on X most often about ai, banger, context engineering, and agentic. They currently have XXXXXXX followers and 1957 posts still getting attention, totaling XXXXXX engagements in the last XX hours.

### Engagements: XXXXXX [#](/creator/twitter::3448284313/interactions)
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::3448284313/c:line/m:interactions.svg)

- X Week XXXXXXX -XXXX%
- X Month XXXXXXXXX +36%
- X Months XXXXXXXXXX -XXXX%
- X Year XXXXXXXXXX +87%

### Mentions: XX [#](/creator/twitter::3448284313/posts_active)
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::3448284313/c:line/m:posts_active.svg)

- X Month XXX +6.90%
- X Months XXX +45%
- X Year XXXXX +76%

### Followers: XXXXXXX [#](/creator/twitter::3448284313/followers)
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::3448284313/c:line/m:followers.svg)

- X Week XXXXXXX +0.35%
- X Month XXXXXXX +2.10%
- X Months XXXXXXX +12%
- X Year XXXXXXX +29%

### CreatorRank: XXXXXXX [#](/creator/twitter::3448284313/influencer_rank)
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::3448284313/c:line/m:influencer_rank.svg)

### Social Influence

**Social category influence**
[musicians](/list/musicians) #2525, [technology brands](/list/technology-brands), [stocks](/list/stocks)

**Social topic influence**
[ai](/topic/ai), [banger](/topic/banger) #563, [context engineering](/topic/context-engineering) #10, [agentic](/topic/agentic) #193, [llm](/topic/llm) #15, [$googl](/topic/$googl), [builders](/topic/builders), [patterns](/topic/patterns), [the first](/topic/the-first), [imo](/topic/imo)

**Top accounts mentioned or mentioned by**
@dairai @openai @karpathy @officiallogank @elonmusk @xai @anthropicai @abacaj @abacusai @ibab @skirano @thefireworksai @andrewcurran @gregkamradt @rungalileo @orbatos @adonissingh @cometml @romainhuet @typeofalex

**Top assets mentioned**
[Alphabet Inc Class A (GOOGL)](/topic/$googl)

### Top Social Posts
Top posts by engagements in the last XX hours

"Claude Code subagents are all you need. Some will complain on # of tokens. However the output this spits out will save you days. The code quality is mindblowing Agentic search works exceptionally well. The subagents run in parallel. ChatGPT's deep research is no match"  
[X Link](https://x.com/omarsar0/status/1978235329237668214)  2025-10-14T23:03Z 278.9K followers, 65.6K engagements


"Banger paper for agent builders. Multi-agent systems often underdeliver. The problem isn't how the agents themselves are built. It's how they're organized. They are mostly built with fixed chains trees and graphs that can't adapt as tasks evolve. But what if the system could learn its own coordination patterns This new research introduces Puppeteer a framework that learns to orchestrate agents dynamically rather than relying on handcrafted topologies. Instead of pre-defining collaboration structures an orchestrator selects which agent speaks next based on the evolving conversation state. The"  
[X Link](https://x.com/omarsar0/status/1995553529436783096)  2025-12-01T18:00Z 278.9K followers, 54.4K engagements
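
The post describes the mechanism but not the code. Below is a minimal sketch of what dynamic orchestration looks like in practice; the names (`run_orchestrated`, `pick_next`) are illustrative, and in the paper `pick_next` is a learned policy rather than a hand-written function.

```python
# Minimal sketch of dynamic orchestration (illustrative, not the paper's
# implementation): instead of a fixed chain/tree/graph, a policy picks
# which agent acts next from the evolving conversation state.
from typing import Callable, Optional

Agent = Callable[[list[str]], str]  # takes the transcript, returns a message

def run_orchestrated(agents: dict[str, Agent],
                     pick_next: Callable[[list[str]], Optional[str]],
                     task: str, max_turns: int = 10) -> list[str]:
    transcript = [f"task: {task}"]
    for _ in range(max_turns):
        name = pick_next(transcript)  # learned policy in the paper;
        if name is None:              # here, any state -> agent-name function
            break                     # the policy may also decide to stop
        transcript.append(f"{name}: {agents[name](transcript)}")
    return transcript
```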


"Why would I ask an LLM what it thinks in the first place Is this how people are actually using LLMs for exploration What you suggest sounds more like a workaround for what current LLMs struggle at. IMO this doesnt say much about it being better to treat LLMs more like simulators than entities. For exploring I usually ask directly: I am interested in exploring xyz help me write a plan first before diving deep. Planning is extremely effective at things like topic exploration and coding. Just a matter of preference in interactions I guess. We probably need better research methods to understand"  
[X Link](https://x.com/omarsar0/status/1997743896387059897)  2025-12-07T19:03Z 278.9K followers, 37.7K engagements


"This is insane 🤯 Just built a new skill in Claude Code using Opus XXX. The skill uses Gemini X Pro (via API) for designing web pages. Look at what it generated from one simple prompt"  
[X Link](https://x.com/omarsar0/status/1993101718041903565)  2025-11-24T23:37Z 278.9K followers, 148.3K engagements


"What's missing to build useful deep research agents Deep research agents promise analyst-level reports through automated search and synthesis. However current systems fall short of genuinely useful research. The question is: where exactly do they fail This new paper introduces FINDER a benchmark of XXX human-curated research tasks with XXX structured checklist items for evaluating report quality. Unlike QA benchmarks FINDER focuses on comprehensive report generation. The researchers analyzed approximately 1000 reports from mainstream deep research agents. Their findings challenge assumptions"  
[X Link](https://x.com/omarsar0/status/1995915929973403827)  2025-12-02T18:00Z 278.9K followers, 23.5K engagements


"Quiet Feature Learning in Transformers This is one of the most fascinating papers I have read this week. Let me explain: It argues that loss curves can mislead about what a model is learning. The default approach to monitoring neural network training relies on loss as the primary progress measure. If loss is flat nothing is happening. If loss drops learning is occurring. But this assumption breaks down on algorithmic tasks. This new research trained Transformers on ten foundational algorithmic tasks and discovered "quiet features": internal representations that develop while loss appears"  
[X Link](https://x.com/omarsar0/status/1996233046799106128)  2025-12-03T15:00Z 278.9K followers, 17.5K engagements
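
One common way to detect representations the loss curve hides is a linear probe on frozen hidden states; the sketch below is my illustration of that general technique, not necessarily the paper's exact method.

```python
# Fit a linear probe on hidden states saved at each checkpoint. Probe
# accuracy rising while training loss stays flat is the "quiet feature"
# signature the post describes. (Illustrative sketch.)
import numpy as np
from sklearn.linear_model import LogisticRegression

def probe_accuracy(hidden: np.ndarray, labels: np.ndarray) -> float:
    """hidden: (n_examples, d_model) activations from one checkpoint."""
    n = len(labels) // 2  # first half trains the probe, second half tests it
    clf = LogisticRegression(max_iter=1000).fit(hidden[:n], labels[:n])
    return float(clf.score(hidden[n:], labels[n:]))
```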


"This is a really good report on the reality of building agents in production"  
[X Link](https://x.com/omarsar0/status/1997367450548031809)  2025-12-06T18:08Z 278.9K followers, 161.2K engagements


"New survey on Agentic LLMs. The survey spans three interconnected categories: reasoning and retrieval for better decision-making action-oriented models for practical assistance and multi-agent systems for collaboration and studying emergent social behavior. Key applications include medical diagnosis logistics financial analysis and augmenting scientific research through self-reflective role-playing agents. Notably the report highlights that agentic LLMs offer a solution to training data scarcity by generating new training states during inference. Paper:"  
[X Link](https://x.com/omarsar0/status/1997717251546583103)  2025-12-07T17:18Z 278.9K followers, 44.8K engagements


"I love this figure from Anthropic's new talk on "Skills Agents". Here are my notes: The more skills you build the more useful Claude Code gets. And it makes perfect sense. Procedural knowledge and continuous learning for the win Skills essentially are the way you make Claude Code more knowledgeable over time. This is why I had argued that Skills is a good name for this functionality. Claude Code acquires new capabilities from domain experts (they are the ones building skills). Claude Code can evolve the skills as needed and forget the ones it doesn't need anymore. It's a collaborative effort"  
[X Link](https://x.com/omarsar0/status/1998383154181361813)  2025-12-09T13:24Z 278.9K followers, 120.8K engagements


"2. The ScaleRL recipe that just works. The authors tested dozens of RL variations and found one that scales cleanly to 100k GPU hours without blowing up: - PipelineRL (8 pipelines) with CISPO loss (a stabilized REINFORCE variant). - Prompt-level averaging and batch-level normalization to reduce variance. - FP32 logits for better stability and higher final accuracy. - No-Positive-Resampling curriculum to avoid reward hacking. - Forced interruptions (stopping long thoughts) instead of punishing long completions. - This combo called ScaleRL hit the best trade-off between stability sample"  
[X Link](https://x.com/omarsar0/status/1978865070303232267)  2025-10-16T16:46Z 278.6K followers, 1947 engagements


"3. What actually matters for better RL results. Not every trick helps equally: - Loss choice and precision matter most; CISPO + FP32 logits boosted final pass rates from XX% to 61%. - Normalization aggregation and curriculum mainly affect how fast you improve (efficiency) not how far you can go. - Fancy variants like GRPO DAPO or Magistral didnt beat ScaleRL once scaled properly"  
[X Link](https://x.com/omarsar0/status/1978865085939654692)  2025-10-16T16:46Z 278.6K followers, 1595 engagements
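
The thread names the ingredients without code. Below is a rough PyTorch sketch of how clipped importance weights, FP32 logits, and prompt-level averaging might combine, based on my reading of CISPO as a clipped-IS-weight REINFORCE variant; treat the shapes and the clip threshold as assumptions, not the authors' recipe.

```python
import torch

def cispo_style_loss(logits: torch.Tensor,        # (B, T, V) policy outputs
                     old_logprobs: torch.Tensor,  # (B, T) from the sampler
                     actions: torch.Tensor,       # (B, T) sampled token ids
                     advantages: torch.Tensor,    # (B,) per-prompt advantages
                     clip_max: float = 2.0) -> torch.Tensor:
    logprobs = torch.log_softmax(logits.float(), dim=-1)      # FP32 logits
    lp = logprobs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    ratio = (lp - old_logprobs).exp()
    weight = ratio.detach().clamp(max=clip_max)  # clip the IS weight itself,
    per_token = -weight * advantages.unsqueeze(-1) * lp  # not the update
    return per_token.mean(dim=1).mean()  # prompt-level average, then batch mean
```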


"Nano Banana Pro is wild. I just built a little app in Google AI Studio to help build intuition around AI papers. Paper reading is more fun than ever. :) Images generated by Nano Banana Pro. Gemini X + Nano Banana Pro is an insane combo"  
[X Link](https://x.com/omarsar0/status/1991657126188773878)  2025-11-20T23:57Z 278.7K followers, 128.8K engagements


"Reasoning models are expensive. Not because the models are huge. It's because they generate thousands of tokens just to think. But what if smaller models could learn to reason efficiently This new paper compares training 12B models on reasoning traces from two frontier systems: - DeepSeek-R1 - gpt-oss (OpenAI's open-source reasoner) The key finding: gpt-oss traces produce 4x more efficient reasoning. DeepSeek-R1 averages 15500 tokens per response. gpt-oss averages 3500 tokens. Yet accuracy stays nearly identical across benchmarks. Verbose reasoning doesn't mean better reasoning. Why does this"  
[X Link](https://x.com/omarsar0/status/1993695515595444366)  2025-11-26T14:57Z 278.7K followers, 24.1K engagements


"Major release from DeepSeek. And a big deal for open-source LLMs. DeepSeek-V3.2-Speciale is on par with Gemini-3-Pro on the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI). It even surpasses the Gemini X Pro on several benchmarks. DeepSeek identifies three critical bottlenecks: vanilla attention mechanisms that choke on long sequences insufficient post-training compute and weak generalization in agentic scenarios. They introduce DeepSeek-V3.2 a model that tackles all three problems simultaneously. One key innovation is DeepSeek Sparse"  
[X Link](https://x.com/omarsar0/status/1995509721605038475)  2025-12-01T15:06Z 278.6K followers, 32.1K engagements


"Multi-agent AI systems are poor at communication. The default approach in multi-agent RL today focuses almost entirely on task success rates. Can agents coordinate Did they solve the problem The actual cost of communication is rarely measured or optimized. But in real-world systems bandwidth energy and compute are finite. Every message has a price. This new research introduces three Communication Efficiency Metrics (CEMs) and a framework for learning protocols that are both effective and efficient. They find that communication inefficiency arises primarily from poorly designed optimization"  
[X Link](https://x.com/omarsar0/status/1996263279052931372)  2025-12-03T17:00Z 278.7K followers, 10K engagements
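
The post doesn't define the three CEMs, so the sketch below is a hypothetical example of the kind of quantity such a metric might capture: task success normalized by communication spent, so chattier protocols score worse at equal success.

```python
def success_per_message_token(successes: int, episodes: int,
                              total_message_tokens: int) -> float:
    """Hypothetical efficiency metric (not one of the paper's CEMs):
    success rate divided by the tokens agents exchanged to achieve it."""
    if episodes == 0 or total_message_tokens == 0:
        return 0.0
    return (successes / episodes) / total_message_tokens
```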


"Another banger by Google It's a new technical guide on how to deploy scale and productionize AI Agents. (bookmark it) Love the emphasis of CI/CD and the Agent2Agent protocol"  
[X Link](https://x.com/omarsar0/status/1989689354789556479)  2025-11-15T13:38Z 278.8K followers, 88.9K engagements


"As usual Anthropic just published another banger. This one is on context engineering. Great section on how it is different from prompt engineering. A must-read for AI devs"  
[X Link](https://x.com/omarsar0/status/1973101118990254366)  2025-09-30T19:02Z 278.9K followers, 375K engagements


"LLMs can get "Brain Rot" Continual pretraining on junk high-engagement web text causes lasting "cognitive decline" in LLMs reducing reasoning long-context and safety performance. The main failure mode is thought-skipping where models skip reasoning steps and adopt dark personality traits like narcissism and low agreeableness. Even strong mitigations such as reflection or further fine-tuning only partially reverse the damage making data curation a critical safety concern for AI training"  
[X Link](https://x.com/omarsar0/status/1979217719082774873)  2025-10-17T16:07Z 278.9K followers, 292.3K engagements


"Don't sleep on Skills. Skills is easily one of the most effective ways to steer Claude Code. Impressive for optimization. I built a skill inside of Claude Code that automatically builds tests and optimizes MCP tools. It runs in a loop loading context and tools (bash scripts) efficiently to test and optimize MCP tools based on best practices implementation and outputs. Heck you could even run MCP tools within it if you like but that wasn't what I needed here. One of the most impressive aspects of using Claude Code with Skills is the efficient token usage. The context tiering system is a"  
[X Link](https://x.com/omarsar0/status/1979242073372164306)  2025-10-17T17:44Z 278.9K followers, 192.5K engagements
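
As a rough illustration of the loop described, here is a Python-flavored sketch; `run_tool_tests` and `rewrite_tool` are stand-ins for the bash scripts and model calls the post alludes to, not real Claude Code APIs.

```python
def run_tool_tests(tool_spec: dict) -> dict:
    # Stub: the real skill shells out to generated test scripts here.
    return {"all_passed": False, "failures": ["timeout on large input"]}

def rewrite_tool(tool_spec: dict, report: dict) -> dict:
    # Stub: the real skill has the model revise the tool against failures.
    return {**tool_spec, "revisions": tool_spec.get("revisions", 0) + 1}

def optimize_mcp_tool(tool_spec: dict, max_rounds: int = 5) -> dict:
    """Test-and-revise loop: keep fixing the tool until its tests pass."""
    for _ in range(max_rounds):
        report = run_tool_tests(tool_spec)
        if report["all_passed"]:
            break
        tool_spec = rewrite_tool(tool_spec, report)
    return tool_spec
```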


"Anthropic just posted another banger guide. This one is on building more efficient agents to handle more tools and efficient token usage. This is a must-read for AI devs (bookmark it) It helps with three major issues in AI agent tool calling: token costs latency and tool composition. How It combines code executions with MCP where it turns MCP servers into code APIs rather than direct tool calls. Here is all you need to know: X. Token Efficiency Problem: Loading all MCP tool definitions upfront and passing intermediate results through the context window creates massive token overhead sometimes"  
[X Link](https://x.com/omarsar0/status/1986099467914023194)  2025-11-05T15:53Z 278.9K followers, 181.1K engagements
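
A sketch of that pattern under stated assumptions: `mcp_call` below is a hypothetical thin wrapper, not a real SDK function. The point is that generated code chains the calls, so bulky intermediate results never pass through the model's context; only the final slice does.

```python
def mcp_call(server: str, tool: str, **kwargs) -> list[dict]:
    """Hypothetical wrapper around one MCP tool; returns dummy rows here."""
    return [{"name": f"c{i}", "revenue": i * 100}
            for i in range(kwargs.get("limit", 100))]

def top_customers_report() -> list[dict]:
    rows = mcp_call("crm", "export_customers", limit=10_000)  # stays out of context
    top = sorted(rows, key=lambda r: r["revenue"], reverse=True)[:10]
    return top  # only these 10 rows re-enter the model's context
```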


"As usual Anthropic just published another banger. This one is on building agents that continue to do useful work for an arbitrarily long time. Great tips on context management. A must-read for AI devs"  
[X Link](https://x.com/omarsar0/status/1993778777562960083)  2025-11-26T20:27Z 278.9K followers, 45K engagements


"Context engineering skills will be even more important in 2026. It will enable the most effective agent harnesses in domains like science and engineering. I'm particularly researching how context engineering applies to long-horizon/running problems and scientific discovery"  
[X Link](https://x.com/omarsar0/status/1994832449545933136)  2025-11-29T18:14Z 278.8K followers, 13K engagements


"Banger paper from DeepSeek. Math AI models have a fundamental problem. (bookmarks this one) The issue isn't accuracy on benchmarks. It's that correct answers don't mean correct reasoning. Models can brute-force solutions numerically guess or stumble into right answers through flawed derivations. When you move from equation-grinding to real mathematics theorem proving invariants and inequalities you can't bypass the reasoning anymore. This new research introduces DeepSeekMath-V2 a 685B parameter mixture-of-experts model built for self-verifiable mathematical reasoning. The core idea: A"  
[X Link](https://x.com/omarsar0/status/1994843836347290123)  2025-11-29T19:00Z 278.9K followers, 68.1K engagements


"Interesting research from Meta on hardware scaling trends. More GPUs doesn't always mean faster training. The default approach to scaling LLM training today remains throwing more hardware at the problem. More accelerators more parallelism more compute. However there's a ceiling that most teams don't see until they hit it. This new research demonstrates that scaling the total number of accelerators for large model training quickly yields diminishing returns even with optimized hardware and parallelization strategies. The researchers tested Llama-2 models (1B to 70B parameters) across X to 2048"  
[X Link](https://x.com/omarsar0/status/1995136500124827753)  2025-11-30T14:23Z 278.9K followers, 213.4K engagements
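
The post doesn't give the paper's cost model, but Amdahl's law is the classic way to see why adding accelerators saturates: if some fraction of each step won't parallelize (communication, stragglers), speedup plateaus no matter how much hardware you add. A quick illustration:

```python
def amdahl_speedup(n_accelerators: int, parallel_fraction: float = 0.95) -> float:
    """Speedup when only `parallel_fraction` of the work scales with n."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_accelerators)

for n in (8, 64, 512, 2048):
    print(n, round(amdahl_speedup(n), 1))  # 5.9, 15.4, 19.3, 19.8 -> plateau
```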


"Is Vibe Coding Safe There is finally research that goes deep into this question. Here is what the research found: AI coding agents can write functional code. But functional doesn't mean safe. The rise of "vibe coding" where developers hand off tasks to AI agents with minimal oversight is accelerating. More autonomy more speed more productivity. The assumption: if it works it's good enough. But working code and secure code are not the same thing. This new research introduces SUSVIBES a benchmark of XXX real-world feature requests from open-source projects specifically tasks that previously led"  
[X Link](https://x.com/omarsar0/status/1996595107924263287)  2025-12-04T14:59Z 278.9K followers, 62.3K engagements


"It's such a joy to use Opus XXX in Claude Code. It's the best coding assistant on the planet - Planning is brilliant - Creativity is remarkable - Understands intent with precision - Implements features extensively - Supercharges all agents - Next-level context understanding - Extremely efficient at context engineering - Easy to extend with skills and tools It makes very few mistakes and pays closer attention to details. Just on another level"  
[X Link](https://x.com/omarsar0/status/1997020528222130512)  2025-12-05T19:09Z 278.9K followers, 22.8K engagements


"Google just published a banger guide on effective context engineering for multi-agent systems. Pay attention to this one AI devs (bookmark it) Here are my key takeaways: Context windows aren't the bottleneck. Context engineering is. For more complex and long-horizon problems context management cannot be treated as a simple "string manipulation" problem. The default approach to handling context in agent systems today remains stuffing everything into the prompt. More history more tokens more confusion. Most teams treat context as a string concatenation problem. But raw context dumps create"  
[X Link](https://x.com/omarsar0/status/1997348089888374918)  2025-12-06T16:51Z 278.9K followers, 77.4K engagements
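
As a small sketch of that takeaway (my illustration, not Google's API): treat context as prioritized, budgeted sections rather than one concatenated string, so low-value history gets dropped deliberately instead of crowding out instructions.

```python
from dataclasses import dataclass

@dataclass
class Section:
    name: str
    text: str
    priority: int  # lower number = keep first
    budget: int    # max characters this section may occupy

def assemble_context(sections: list[Section], max_chars: int = 8000) -> str:
    out, used = [], 0
    for s in sorted(sections, key=lambda s: s.priority):
        chunk = s.text[: s.budget]
        if used + len(chunk) > max_chars:
            break  # drop whole low-priority sections rather than truncating all
        out.append(f"## {s.name}\n{chunk}")
        used += len(chunk)
    return "\n\n".join(out)
```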


"The AI Consumer Index (ACE) Most AI benchmarks today focus on reasoning and coding. But most people use AI to shop cook and plan their weekends. In those domains LLM hallucinations continue to be a real problem. XX% of ChatGPT messages (according a recent report) are now non-work-related. Consumers are using AI for everyday tasks and we have no systematic way to measure how well models perform on them. This new research introduces ACE (AI Consumer Index) a benchmark assessing whether frontier models can perform high-value consumer tasks across shopping food gaming and DIY. Consumer tasks"  
[X Link](https://x.com/omarsar0/status/1998039629556256995)  2025-12-08T14:39Z 278.9K followers, 89.1K engagements


"Looks like Mistral has entered the agentic coding arena They just released Mistral Vibe CLI an open-source command-line coding assistant powered by Devstral"  
[X Link](https://x.com/omarsar0/status/1998443420050760020)  2025-12-09T17:23Z 278.9K followers, 9545 engagements


"I will be covering Skills in our upcoming Claude Code build sessions. There are so many impactful ways that non-technical or technical folks can build with Skills. The time to jump into this stuff is now:"  
[X Link](https://x.com/omarsar0/status/1998495724276002862)  2025-12-09T20:51Z 278.9K followers, 5838 engagements


"The most effective AI Agents are built on these core ideas. It's what powers Claude Code. It's referred to as the Claude Agent SDK Loop which is an agent framework to build all kinds of AI agents. (bookmark it) The loop involves three steps: Gathering Context: Use subagents (parallelize them for task efficiency when possible) compact/maintain context and leverage agentic/semantic search for retrieving relevant context for the AI agent. Hybrid search approaches work really well for domains like agentic coding. Taking Action: Leverage tools prebuilt MCP servers bash/scripts (Skills have made it"  
[X Link](https://x.com/omarsar0/status/1987167737639325886)  2025-11-08T14:38Z 278.9K followers, 174.5K engagements
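
The post is cut off before the final step, but the gather/act loop it describes can be sketched as below; `decide` and `run_tool` are placeholders for the model and tool-execution calls, and this is my paraphrase of the pattern, not Claude Agent SDK code.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Step:
    thought: str
    tool: Optional[str] = None  # None means the agent is finished
    args: dict = field(default_factory=dict)

def agent_loop(task, decide, run_tool, max_steps: int = 20) -> str:
    """decide(task, history) -> Step; run_tool(tool, args) -> str."""
    history = []                      # gathered context accumulates here
    for _ in range(max_steps):
        step = decide(task, history)  # gather context + choose an action
        if step.tool is None:
            return step.thought       # final answer
        result = run_tool(step.tool, step.args)  # bash, MCP tools, scripts
        history.append((step, result))           # feed results back in
    return "step budget exhausted"
```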


"New research from Apple. Diffusion models dominate video generation. However the current approach has fundamental limitations like multi-step sampling no exact likelihood and training and inference objectives that don't align. This new research introduces STARFlow-V a novel normalizing flow-based causal video generator. It demonstrates that flow models can match diffusion quality while offering end-to-end training exact likelihood estimation and native multi-task support. The architecture uses a global-local two-level system. A deep autoregressive Transformer handles temporal reasoning in"  
[X Link](https://x.com/omarsar0/status/1995856241974005980)  2025-12-02T14:03Z 278.9K followers, 13.3K engagements


"Agentic Context Engineering The code for the paper is finally out I had built an implementation for this (not exactly the same) that already boosted performance for my agents. Evolving context for AI agents is a great idea. Official implementation out now"  
[X Link](https://x.com/omarsar0/status/1996980037161996691)  2025-12-05T16:28Z 278.9K followers, 82.5K engagements


"Next week we start our first live cohort on building with Claude Code. Excited to share how I've been using Claude Code for coding research designing searching and everything in between. You also get to build. :) A few more seats are available if you are interested"  
[X Link](https://x.com/omarsar0/status/1997035321243169061)  2025-12-05T20:08Z 278.9K followers, 10K engagements


"Claude Code can now run agents asynchronously. Huge for productivity. You can run many subagents in the background to explore your codebase. Work continues uninterrupted. When subagents complete tasks they wake up/report to the main agent. Workflows feel faster already"  
[X Link](https://x.com/omarsar0/status/1998774531188830304)  2025-12-10T15:19Z 278.9K followers, 81.7K engagements
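
A toy asyncio analogue of that behavior (not Claude Code's implementation): background subagents run concurrently, and the main coroutine is notified as each one finishes, so work continues uninterrupted.

```python
import asyncio

async def subagent(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for exploring the codebase
    return f"{name}: report ready"

async def main() -> None:
    tasks = [asyncio.create_task(subagent(f"explorer-{i}", i * 0.1))
             for i in range(3)]
    for finished in asyncio.as_completed(tasks):
        print(await finished)   # main agent "wakes up" as each reports back

asyncio.run(main())
```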
