# ![@omarsar0 Avatar](https://lunarcrush.com/gi/w:26/cr:twitter::3448284313.png) @omarsar0 elvis

elvis posts on X most often about ai, llm, agentic, and $googl. They currently have [-------] followers and [----] posts still getting attention, totaling [------] engagements in the last [--] hours.

### Engagements: [------] [#](/creator/twitter::3448284313/interactions)
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::3448284313/c:line/m:interactions.svg)

- [--] Week [-------] +56%
- [--] Month [---------] -31%
- [--] Months [----------] -17%
- [--] Year [----------] +51%

### Mentions: [--] [#](/creator/twitter::3448284313/posts_active)
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::3448284313/c:line/m:posts_active.svg)

- [--] Week [--] -1.60%
- [--] Month [---] +20%
- [--] Months [---] +21%
- [--] Year [-----] +21%

### Followers: [-------] [#](/creator/twitter::3448284313/followers)
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::3448284313/c:line/m:followers.svg)

- [--] Week [-------] +0.43%
- [--] Month [-------] +1.40%
- [--] Months [-------] +11%
- [--] Year [-------] +26%

### CreatorRank: [-------] [#](/creator/twitter::3448284313/influencer_rank)
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::3448284313/c:line/m:influencer_rank.svg)

### Social Influence

**Social category influence**
[musicians](/list/musicians)  #5190 [technology brands](/list/technology-brands)  [stocks](/list/stocks)  [social networks](/list/social-networks)  [finance](/list/finance)  [celebrities](/list/celebrities)  [travel destinations](/list/travel-destinations)  [currencies](/list/currencies)  [events](/list/events)  [vc firms](/list/vc-firms) 

**Social topic influence**
[ai](/topic/ai), [llm](/topic/llm), [agentic](/topic/agentic) #377, [$googl](/topic/$googl), [open ai](/topic/open-ai), [devs](/topic/devs), [if you](/topic/if-you), [agents](/topic/agents) #292, [claude code](/topic/claude-code) #123, [this is](/topic/this-is)

**Top assets mentioned**
[Alphabet Inc Class A (GOOGL)](/topic/$googl) [Microsoft Corp. (MSFT)](/topic/microsoft) [GrokCoin (GROKCOIN)](/topic/grok) [IBM (IBM)](/topic/ibm)

### Top Social Posts
Top posts by engagements in the last [--] hours

"12) Galactica - A large language model for science (Ross et al) A large language model for the science domain trained on a massive scientific corpus. https://arxiv.org/abs/2211.09085 https://arxiv.org/abs/2211.09085"  
[X Link](https://x.com/omarsar0/status/1607083687367303168)  2022-12-25T18:39Z 276.4K followers, 15.6K engagements


"The list is non-exhaustive. I tried to highlight trending papers for each month of the year based on trends. Feel free to share your favorite ML papers below. Happy holidays🎉 One last favor: follow me (@omarsar0) to keep track of more exciting ML papers in 2023"  
[X Link](https://x.com/omarsar0/status/1607084074534846465)  2022-12-25T18:41Z 276.2K followers, 13.6K engagements


"Batch Prompting: Efficient Inference with LLM APIs Batch prompting helps to reduce the inference token and time costs while achieving better or comparable performance. Love to find these neat little tricks on efficiency gains during inference with LLMs. https://arxiv.org/abs/2301.08721 https://arxiv.org/abs/2301.08721"  
[X Link](https://x.com/omarsar0/status/1617713910639128576)  2023-01-24T02:40Z 284.9K followers, 110.1K engagements
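The batching trick described in the post can be sketched in a few lines: pack several questions into one prompt so a single API call answers all of them, then split the reply back out. The prompt format and parser below are illustrative assumptions, not the paper's exact scheme.

```python
# Minimal sketch of batch prompting: several questions share one LLM call.
# The numbering scheme and parser are illustrative, not from the paper.
def build_batch_prompt(questions):
    numbered = "\n".join(f"Q{i + 1}: {q}" for i, q in enumerate(questions))
    return ("Answer every question on its own line, "
            "prefixed A1:, A2:, and so on.\n" + numbered)

def parse_batch_answers(text, n):
    answers = {}
    for line in text.splitlines():
        if line.startswith("A") and ":" in line:
            tag, _, body = line.partition(":")
            answers[tag.strip()] = body.strip()
    return [answers.get(f"A{i + 1}", "") for i in range(n)]

questions = ["2 + 2?", "Capital of France?"]
prompt = build_batch_prompt(questions)  # one call instead of two
# Pretend this came back from a single LLM call:
reply = "A1: 4\nA2: Paris"
print(parse_batch_answers(reply, len(questions)))  # ['4', 'Paris']
```

The token savings come from amortizing the shared instructions and context across the batch, at the cost of a slightly more brittle output format.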


"DetectGPT - an approach for zero-shot machine-generated text detection. Unlike other methods that require classifiers or watermarking generated text this work uses raw log probabilities from the LLM to determine if the passage was sampled from it. https://arxiv.org/abs/2301.11305 https://arxiv.org/abs/2301.11305"  
[X Link](https://x.com/omarsar0/status/1618801731957325825)  2023-01-27T02:43Z 283.2K followers, 64.5K engagements
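The log-probability test the post describes can be sketched as a perturbation-based score: compare the model's log-probability of the passage against slightly rewritten variants. `log_prob` and `perturb` below are hypothetical stand-ins for a scoring LM and a paraphrase model.

```python
# Sketch of the DetectGPT criterion: machine-generated text tends to sit
# near a local maximum of the model's log-probability, so the original
# passage scores higher than slightly perturbed rewrites of it.
def detectgpt_score(passage, log_prob, perturb, n_perturbations=10):
    original = log_prob(passage)
    perturbed = [log_prob(perturb(passage)) for _ in range(n_perturbations)]
    # A large positive gap suggests the passage was sampled from the model.
    return original - sum(perturbed) / len(perturbed)

# Toy stand-ins: a "model" that strongly prefers the passage it "wrote".
log_prob = lambda text: -1.0 if text == "machine text" else -3.0
perturb = lambda text: text + " (rewritten)"
print(detectgpt_score("machine text", log_prob, perturb))  # 2.0
```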


"Where I track ML trends: Papers with Code - for trending papers + code Twitter - for discussion of trends GitHub Trending - for trending ML projects"  
[X Link](https://x.com/omarsar0/status/1629576235369021440)  2023-02-25T20:17Z 276.4K followers, 41.2K engagements


"LLMs are getting cheaper better and more accessible. Introducing Stanford Alpaca a new 7B fine-tuned model based on Meta's LLaMA. That's the progress we need to see. https://github.com/tatsu-lab/stanford_alpaca https://github.com/tatsu-lab/stanford_alpaca"  
[X Link](https://x.com/omarsar0/status/1635378807510294532)  2023-03-13T20:34Z 283.5K followers, 103.4K engagements


"Not quite there yet with the zero-shot chain-of-thought prompting. Expected given the base model but I think this analysis could be an interesting one to look at further as this is something more emergent in larger models. GPT-4 is really good at these types of questions now"  
[X Link](https://x.com/omarsar0/status/1643057012433985538)  2023-04-04T01:04Z 283K followers, 12.6K engagements


"This is interesting RLHF models like GPT-4 can be aligned to output responses like this. But I am a bit surprised Vincuna can output these responses. Not sure how robust it is to these types of adversarial prompting. Needs further testing"  
[X Link](https://x.com/omarsar0/status/1643061146692341763)  2023-04-04T01:21Z 282.8K followers, 25.3K engagements


"The paper also includes some impressive results Check out how Gorilla compares with other notable LLMs like ChatGPT GPT-4 LLaMA and Claude in terms of accuracy as well as reducing hallucination errors. Comparison is performed using zero-shot BM25 retriever GPT-retriever and so on"  
[X Link](https://x.com/omarsar0/status/1661542492548866054)  2023-05-25T01:19Z 272.3K followers, [----] engagements


"ML Papers of the Week (2.7K) One of the best ways to keep track of AI is to read papers. If you are looking for good papers to read we've been collecting the top trending papers over the last [--] months. Check it out: https://github.com/dair-ai/ML-Papers-of-the-Week https://github.com/dair-ai/ML-Papers-of-the-Week"  
[X Link](https://x.com/omarsar0/status/1663539713255276546)  2023-05-30T13:35Z 276.4K followers, 143K engagements


"Meta AI is open-sourcing AudioCraft a multi-purpose framework for generating music and sounds and enabling compression capabilities. AudioCraft contains training and inference code for a series of models including MusicGen AudioGen and EnCodec. This is going to allow others in the community to extend these models to all sorts of use cases or research problems. This release is exciting as it simplifies building on top of the state-of-the-art in audio generation. People can now build things like sound generators and compression algorithms with the same code base. blog: library:"  
[X Link](https://x.com/omarsar0/status/1686776772799373312)  2023-08-02T16:31Z 285.4K followers, 63.9K engagements


"🎓ML Papers of The Week (August Edition) ICYMI we highlight some of the top trending ML papers every week. This is now used by 1000s of researchers and practitioners to follow and discover trending papers and AI topics. The August collection is now finished We also add quick summaries of the papers and work with our community to write explainers for outstanding papers. We use a combination of AI-powered tools analytics and human curation to build the lists of papers. Check it out here: https://github.com/dair-ai/ML-Papers-of-the-Week https://github.com/dair-ai/ML-Papers-of-the-Week"  
[X Link](https://x.com/omarsar0/status/1696163180483932379)  2023-08-28T14:09Z 276.2K followers, 57.5K engagements


"@abacaj GPT-4 response with a bit of clever prompting is interesting"  
[X Link](https://x.com/omarsar0/status/1737264855302692887)  2023-12-20T00:13Z 271.3K followers, 13.2K engagements


"ML Papers of the Week (5.3K) If you're looking for interesting and fun ML and LLM papers to read I got you covered. I've been curating all the top trending and most interesting papers since the beginning of the year. You will find a lot of gems there. https://github.com/dair-ai/ML-Papers-of-the-Week https://github.com/dair-ai/ML-Papers-of-the-Week"  
[X Link](https://x.com/omarsar0/status/1740052789953188253)  2023-12-27T16:51Z 276.5K followers, 82.6K engagements


"ML Papers of the Week (6K) If you are looking for good recent LLM papers to read you don't need to look too far. We have been tracking the most popular impressive and trending ones in the "ML Papers of the Week" repo. Our paper collection is used by thousands of students researchers and developers. We use this list to keep up to date on all the latest developments in AI and LLMs @dair_ai. We have also implemented and combined ideas from a few of these papers to power some of the LLM services we provide for companies. My recommendation for finding good papers is to narrow your search. While"  
[X Link](https://x.com/omarsar0/status/1748735510854377927)  2024-01-20T15:53Z 276.9K followers, 106.9K engagements


"Nice to see these library additions to ollama. I already heavily use experiment and build with LLMs locally. ollama is one of my favorite tools for this. These new Python and JavaScript libraries for ollama will make it even easier to do so. With this you can do things like streaming multimodal inference text completion creating custom models and even setting up your own custom client. Really impressed by this effort. Looks really straightforward to use: pip install ollama import ollama response = ollama. chat(model='llama2' messages= 'role': 'user' 'content': 'Why is the sky blue' )"  
[X Link](https://x.com/omarsar0/status/1750582038464172051)  2024-01-25T18:10Z 271.6K followers, 52.5K engagements
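A runnable version of the ollama snippet from the post above, assuming the `ollama` Python package and a locally running ollama server; the call is guarded so the payload shape is visible even without either.

```python
# Reconstruction of the ollama snippet from the post; requires
# `pip install ollama` and a running local ollama server to actually chat.
messages = [{"role": "user", "content": "Why is the sky blue?"}]

try:
    import ollama
    response = ollama.chat(model="llama2", messages=messages)
    print(response["message"]["content"])
except Exception:
    # The package or a local server may be unavailable; the `messages`
    # payload above still shows the shape the chat endpoint expects.
    pass
```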


"LoRA+: Efficient Low Rank Adaptation of Large Models 100s of LLM papers dropped on arXiv yesterday. This one caught my attention. It proposes LoRA+ which improves performance and finetuning speed (up to 2X speed up) at the same computational cost as LoRA. Lots of theory in this paper but the key difference between LoRA and LoRA+ is how the learning rate is set. LoRA+ sets different learning rates for LoRA adapter matrices while in LoRA the learning rate is the same"  
[X Link](https://x.com/omarsar0/status/1760063230406258892)  2024-02-20T22:05Z 276.9K followers, 43K engagements
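The learning-rate difference the post highlights maps naturally onto optimizer parameter groups: the adapter's B matrix gets a larger rate than A, whereas vanilla LoRA trains both with one rate. The names and the ratio below are illustrative, not the paper's exact hyperparameters.

```python
# LoRA+ in one idea: the adapter's B matrix gets a larger learning rate
# than A, whereas vanilla LoRA uses the same rate for both.
base_lr = 1e-4
lr_ratio = 16  # illustrative multiplier, not a value from the paper

param_groups = [
    {"name": "lora_A", "lr": base_lr},             # A: base rate, as in LoRA
    {"name": "lora_B", "lr": base_lr * lr_ratio},  # B: scaled up under LoRA+
]
print(param_groups[1]["lr"] / param_groups[0]["lr"])  # 16.0
```

In a real setup these groups would be passed to the optimizer (e.g. as `torch.optim` param groups), which is what lets the two matrices train at different speeds without any other change to the LoRA recipe.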


"It's here Introducing Llama [--] by Meta. 8B and 70B pretrained and instruction-tuned models are available. Details in the thread: https://llama.meta.com/llama3/ https://llama.meta.com/llama3/"  
[X Link](https://x.com/omarsar0/status/1780989879817478228)  2024-04-18T16:00Z 285.4K followers, 40.3K engagements


"Snowflake casually releases Arctic an open-source LLM (Apache [---] license.) that uses a unique Dense-MoE Hybrid transformer architecture. Arctic performs on par with Llama3 70B in enterprise metrics like coding (HumanEval+ & MBPP+) SQL (Spider) and instruction following (IFEval). The remarkable part is that it claims to use 17x less compute budget than Llama [--] 70B. The training compute is roughly under $2 million (less than 3K GPU weeks). We are witnessing the democratization of LLMs at an unprecedented rate. .@SnowflakeDB is thrilled to announce #SnowflakeArctic: A state-of-the-art large"  
[X Link](https://x.com/omarsar0/status/1783176059694821632)  2024-04-24T16:47Z 234.5K followers, 59.9K engagements


"Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations This is one of the more interesting LLM papers I read last week. It reports that LLMs struggle to acquire factual knowledge through fine-tuning. When examples with new knowledge are eventually learned they linearly increase the LLM's tendency to hallucinate. I mostly use fine-tuning to refine generations for my use cases and in some special situations and rarely for memorizing information. More thoughts on my latest LLM recap: https://youtu.be/p7xQRIHWG_Msi=Hi8xY0ROGFRCEzPI&t=1496"  
[X Link](https://x.com/omarsar0/status/1793292346978623812)  2024-05-22T14:46Z 276.5K followers, 43.2K engagements


"Introducing Genie. the most capable AI software engineering system. It achieves state-of-the-art on SWE-Bench with 30.08%. That's a 57% improvement After reviewing the short technical report here are my three key takeaways: 1) Reasoning datasets that codify human/SE reasoning processes. 2) Agentic systems with native abilities to retrieve plan write and execute 3) Self-improvement to continually improve the model to fix mistakes when they arise More of my thoughts here: The importance of the reasoning dataset cannot go unnoticed. This reminds me of this great post by @karpathy: I must admit"  
[X Link](https://x.com/omarsar0/status/1823118952362278962)  2024-08-12T22:06Z 284.6K followers, 29.1K engagements


"LLM News: AI agents continue to advance by combining ideas like self-play self-improvement self-evaluation and search. Other developments: building efficient RAG systems LMSYS rankings automating paper writing and reviewing Claude's prompt caching and distilling & pruning Llama [---] models. Here's all the latest in LLMs: https://youtu.be/x_GIh9NpVds https://youtu.be/x_GIh9NpVds"  
[X Link](https://x.com/omarsar0/status/1825618642517766554)  2024-08-19T19:39Z 270.4K followers, 26.7K engagements


"IBM devs release Bee Agent Framework an open-source framework to build deeply and serve agentic workflows at scale. Features include: - Bee agents refined for Llama [---] - sandboxed code execution - flexible memory management for optimizing token usage - handling complex agentic workflow controls and easily pausing and resuming agent states - provides traceability through MLFlow integration and event logging along with production-grade features such as caching and error handling. - API to integrate agents using an OpenAI compatible Assistants API and Python SDK - serve agents using a Chat UI"  
[X Link](https://x.com/omarsar0/status/1849909579817140691)  2024-10-25T20:23Z 269.7K followers, 110.6K engagements


"A Taxonomy of AgentOps for Enabling Observability of Foundation Model based Agents New research analyzes AgentOps platforms and tools highlighting the need for comprehensive observability and traceability features to ensure reliability in foundation model-based autonomous agent systems across their development and production lifecycle"  
[X Link](https://x.com/omarsar0/status/1857400667318702118)  2024-11-15T12:30Z 285.1K followers, 27.3K engagements


"paper: https://arxiv.org/abs/2411.04905 https://arxiv.org/abs/2411.04905"  
[X Link](https://x.com/omarsar0/status/1857515358267093223)  2024-11-15T20:05Z 277.7K followers, [----] engagements


"🦆 Docling reaches 12.3K There are now a few parsers for LLMs out there but this is one of the most popular. Supports PDF DOCX PPTX XLSX Images HTML AsciiDoc & Markdown. Other features include advanced PDF understanding integrations OCR for scanned PDFs and even a CLI. The big feature request I am seeing is support for other types of information like code and math equations. That's coming soon https://twitter.com/i/web/status/1864335669230727465 https://twitter.com/i/web/status/1864335669230727465"  
[X Link](https://x.com/omarsar0/status/1864335669230727465)  2024-12-04T15:47Z 284K followers, 17.6K engagements


"Learning how to think with Meta Chain-of-Thought Proposes Meta Chain-of-Thought (Meta-CoT) which extends traditional Chain-of-Thought (CoT) by modeling the underlying reasoning required to arrive at a particular CoT. The argument is that CoT is naive and Meta-CoT gets closer to the cognitive process required for advanced problem-solving. This is a very detailed paper (100 pages) presenting ideas and methods to achieve system [--] reasoning in LLMs. Lots of interesting discussion around scaling laws verifier roles iterative refinement and the search for novel reasoning algorithms"  
[X Link](https://x.com/omarsar0/status/1877729112569291131)  2025-01-10T14:48Z 233.2K followers, 44.1K engagements


"paper: code: https://github.com/sunnynexus/Search-o1 https://arxiv.org/abs/2501.05366 https://github.com/sunnynexus/Search-o1 https://arxiv.org/abs/2501.05366"  
[X Link](https://x.com/omarsar0/status/1877742482152436055)  2025-01-10T15:41Z 233.2K followers, [----] engagements


"Windsurf makes coding insanely fun and fast It's quickly becoming my favorite coding tool. And the new features are 🔥 - Web search - Autogenerated Memories - Code Execution Improvements Here is what's new:"  
[X Link](https://x.com/omarsar0/status/1880646342265205169)  2025-01-18T16:00Z 270.2K followers, 76.5K engagements


"Agentic RAG Overview This is a great intro to LLM agents and Agentic RAG. It provides a comprehensive exploration of Agentic RAG architectures applications and implementation strategies"  
[X Link](https://x.com/omarsar0/status/1881360794019156362)  2025-01-20T15:19Z 229.6K followers, 59.8K engagements


"Building Effective Agents Cookbook Nice repo from the Anthropic team with code examples of how to build common agent workflows. In my experience building agent workflows I highly recommend learning these and other concepts like function calling structured outputs evaluating outputs ReAct LLM-as-a-judge retrieval CoT/few-shot prompting and structuring inputs among others"  
[X Link](https://x.com/omarsar0/status/1882090602579550413)  2025-01-22T15:39Z 229.8K followers, 56.4K engagements


"AI Agents for Computer Use This report provides a comprehensive overview of the emerging field of instruction-based computer control examining available agents their taxonomy development and resources"  
[X Link](https://x.com/omarsar0/status/1885710957458149483)  2025-02-01T15:25Z 229.6K followers, 65.6K engagements


"Oumi is a fully open-source platform to help you build state-of-the-art foundation models end-to-end"  
[X Link](https://x.com/omarsar0/status/1886422397089440143)  2025-02-03T14:32Z 285.4K followers, 59.6K engagements


"s1: Simple test-time scaling Test-time scaling is an interesting problem as previous results have shown its potential to scale model performance with more compute. This new paper proposes a simple approach to achieve test-time scaling and strong reasoning performance"  
[X Link](https://x.com/omarsar0/status/1886428631041225030)  2025-02-03T14:56Z 230.3K followers, 73.4K engagements


"I guess now you can just do research More seriously using Deep Research as a researcher is such high leverage. Watch how I use it to generate a very detailed comparison between AI agentic frameworks. To get the best results you have to be very specific about what you want"  
[X Link](https://x.com/omarsar0/status/1886801646497288646)  2025-02-04T15:39Z 229.6K followers, [----] engagements


"How to Scale Your Model Google DeepMind just released an awesome book on scaling language models on TPUs. This is gold Worth checking you are an LLM developer"  
[X Link](https://x.com/omarsar0/status/1886867831289270727)  2025-02-04T20:02Z 229.7K followers, 23.6K engagements


"DeepSeek-R1: Technical Deep Dive 36-page report courtesy of Deep Research. Models like o3 still make mistakes so keep that in mind. (prompt + report in the ALT)"  
[X Link](https://x.com/omarsar0/status/1887220193874649378)  2025-02-05T19:22Z 229.3K followers, 75.1K engagements


"🎓OpenAI Deep Research Guide Just finished our live webinar on Deep Research including examples prompting tips use cases and what's missing. I am releasing the full guide I shared with our members (link in the comments)"  
[X Link](https://x.com/omarsar0/status/1887565129711014114)  2025-02-06T18:12Z 230.6K followers, 34.1K engagements


"Claude [--] when Llama [--] when Claude is still my most used AI. But with all the latest models and products usage and workflows are changing"  
[X Link](https://x.com/omarsar0/status/1887605238472867875)  2025-02-06T20:52Z 229.6K followers, [----] engagements


"Teaching AI Agents to Work Smarter Not Harder This new paper introduces MaAS (Multi-agent Architecture Search) a new framework that optimizes multi-agent systems. Instead of searching for a single optimal multi-agent system MaAS develops an "agentic supernet" - a probabilistic distribution of agent architectures that can adapt to different queries based on their difficulty and domain. MaAS can dynamically sample different multi-agent architectures tailored to each specific query. For simple arithmetic questions it might use a lightweight system while for complex coding tasks it can deploy a"  
[X Link](https://x.com/omarsar0/status/1887884027530727876)  2025-02-07T15:20Z 230.9K followers, 40.4K engagements


"ACU (Awesome Agents for Computer Use) This is a nice curated knowledge base of AI agents that can operate computers. It compiles research papers projects frameworks and tools for agents that autonomously perform tasks via clicks keystrokes command-line calls and API calls on computer and mobile devices. https://twitter.com/i/web/status/1887911960551043166 https://twitter.com/i/web/status/1887911960551043166"  
[X Link](https://x.com/omarsar0/status/1887911960551043166)  2025-02-07T17:11Z 283.5K followers, 28.8K engagements


"LLM Functions allows you to build LLM tools and agents using Bash Javascript and Python"  
[X Link](https://x.com/omarsar0/status/1888248390326075860)  2025-02-08T15:27Z 230.5K followers, 32.2K engagements


"Training LLMs to Reason Efficiently This new paper introduces an RL approach to train LLMs to allocate inference-time computation dynamically. This process helps to optimize the reasoning efficiency of the LLMs. Key insights include: Optimizing inference cost without sacrificing accuracy The method incentivizes models to use minimal computational resources while preserving accuracy reducing the excessive inference cost associated with long chain-of-thought reasoning. RL for efficiency Instead of enforcing fixed-length reasoning they apply RL policy gradient methods to balance computational"  
[X Link](https://x.com/omarsar0/status/1889328796224127428)  2025-02-11T15:01Z 231K followers, 30.4K engagements


"Hierarchical LLM Reasoning ReasonFlux is a hierarchical reasoning framework for LLMs that optimizes complex problem-solving using scaling thought templates. It outperforms state-of-the-art models in mathematical reasoning. Key contributions include: Structured Thought Template Library A curated set of 500+ high-level templates designed to generalize across complex problems. This enables efficient retrieval and structured reasoning without exhaustive search. Hierarchical Reinforcement Learning Instead of training on long Chain-of-Thought (CoT) sequences ReasonFlux optimizes trajectories of"  
[X Link](https://x.com/omarsar0/status/1889343676272525600)  2025-02-11T16:00Z 231.1K followers, 25.4K engagements


"Large Memory Models for Long-Context Reasoning This paper focuses on improving long-context reasoning with explicit memory mechanisms. It presents LM2 a Transformer-based architecture equipped with a dedicated memory module to enhance long-context reasoning multi-hop inference and numerical reasoning. LM2 outperforms Recurrent Memory Transformer (RMT) by 37.1% and a non-memory baseline (Llama-3.2) by 86.3% on memory-intensive benchmarks. Key components: Memory-Augmented Transformer LM2 integrates a memory module that acts as an explicit long-term storage system interacting with input tokens"  
[X Link](https://x.com/omarsar0/status/1889681118913577345)  2025-02-12T14:21Z 231.3K followers, 33.9K engagements


""We detect global cache sharing across users in seven API providers including OpenAI resulting in potential privacy leakage about users prompts." Concerning if true"  
[X Link](https://x.com/omarsar0/status/1889685386856673463)  2025-02-12T14:37Z 231K followers, 20.9K engagements


"Excited to launch our new course on prompt engineering for devs Every developer and company I work with often struggles to prompt LLMs properly. We've built this course to provide the best and most up-to-date info and best practices on designing and optimizing prompts. This should help devs how to effectively apply prompting techniques to their LLM applications and agentic workflows. We are also going to include several new guides on how to effectively prompt models like DeepSeek-R1 open-source models and newer reasoning models. Our Prompt Engineering Guide is used by over 6M+ people to learn"  
[X Link](https://x.com/omarsar0/status/1890034651135176822)  2025-02-13T13:45Z 230.5K followers, [----] engagements


"The Hundred-Page Language Models Book Finally got a chance to look at @burkov's book on language models. This book provides an accessible technical exploration of language models practical examples and the core maths behind them. Highly recommend it"  
[X Link](https://x.com/omarsar0/status/1890469968082280517)  2025-02-14T18:35Z 231.5K followers, 56.7K engagements


"Perplexity just announced Deep Research (PDR) I'm now testing and comparing it with OpenAI's Deep Research (ODR). I still think the o3 variant powering ODR is a massive advantage. 20.5% (PDR) vs. 26.6% (ODR) on Humanity's Last Exam"  
[X Link](https://x.com/omarsar0/status/1890525249977872640)  2025-02-14T22:15Z 231.4K followers, [----] engagements


"1 Leader After evaluating [--] leading LLMs across [--] diverse datasets here are the key findings: Google's -.- leads with a [----] score at a remarkably low cost"  
[X Link](https://x.com/omarsar0/status/1890789260627751041)  2025-02-15T15:44Z 230.5K followers, [----] engagements


"2 Pricing The top [--] models span a 10x price difference with only 4% performance gap. Many of you might be overpaying"  
[X Link](https://x.com/omarsar0/status/1890789262066348450)  2025-02-15T15:44Z 230.5K followers, [----] engagements


"3 Open-source Mistral AI's mistral-small-2501 leads open-source options matching GPT-4o-mini at [----]. Smaller models tuned for tool calling have a lot of potential"  
[X Link](https://x.com/omarsar0/status/1890789263320424624)  2025-02-15T15:44Z 230.5K followers, [----] engagements


"4 Reasoning models Whilereasoning models like o1ando3-minidemonstrated excellent integration with function calling capabilities DeepSeek-R1 didn't make the rankings as it doesn't support native function calling (yet)"  
[X Link](https://x.com/omarsar0/status/1890789264679379152)  2025-02-15T15:44Z 230.5K followers, [----] engagements


"@adonis_singh well they are unifying things so whats the point of that now o3 mini is really good btw"  
[X Link](https://x.com/omarsar0/status/1891563566907445438)  2025-02-17T19:01Z 230.6K followers, [---] engagements


"grok [--] will be fire if it comes with its deep research too"  
[X Link](https://x.com/omarsar0/status/1891688247174001055)  2025-02-18T03:16Z 230.9K followers, 17.3K engagements


"Grok [--] coding example: Thinking traces as generated as the model tries to solve the problem. Elon confirmed that the thinking steps have been obscured to avoid getting copied"  
[X Link](https://x.com/omarsar0/status/1891708550407176517)  2025-02-18T04:37Z 231.3K followers, 73.6K engagements


"Grok [--] also excels at creative coding like generating creative and novel games. Elon emphasized Grok 3's creative emergent capabilities. You can also use the Big Brain mode to use more compute and reasoning with Grok 3"  
[X Link](https://x.com/omarsar0/status/1891709371802910967)  2025-02-18T04:40Z 231.6K followers, 69.5K engagements


"Grok [--] Reasoning performance: The results correspond to the beta version of Grok-3 Reasoning. It outperforms o1 and DeepSeek-R1 when given more test-time compute (allowing it to think longer). The Grok [--] mini reasoning model is also very capable"  
[X Link](https://x.com/omarsar0/status/1891710283401347367)  2025-02-18T04:44Z 231.5K followers, 63.9K engagements


"Grok [--] Reasoning Beta performance on AIME [----]. Grok [--] shows generalization capabilities. It not only does coding and math problem-solving but it can also do other creative and useful real-world tasks"  
[X Link](https://x.com/omarsar0/status/1891711110476111884)  2025-02-18T04:47Z 231.1K followers, 58.7K engagements


"DeepSearch also exposes the steps that it takes to conduct the search itself"  
[X Link](https://x.com/omarsar0/status/1891714589185765659)  2025-02-18T05:01Z 231.1K followers, 46.4K engagements


"Grok [--] on X Premium+"  
[X Link](https://x.com/omarsar0/status/1891715441292083572)  2025-02-18T05:04Z 231.1K followers, 45.8K engagements


"SuperGrok dedicated app is also available with a polished experience. Try on the web as well: The web will include the latest Grok features. http://grok.com http://grok.com"  
[X Link](https://x.com/omarsar0/status/1891715443770946026)  2025-02-18T05:04Z 231.4K followers, 46.3K engagements


"Improvements will happen rapidly and almost daily according to the team. There is also a Grok-powered voice app coming too -- about a week away"  
[X Link](https://x.com/omarsar0/status/1891715813956108699)  2025-02-18T05:06Z 231.2K followers, 42.6K engagements


"@xai Thread with all the details that were announced: https://x.com/omarsar0/status/1891705029083512934 BREAKING: xAI announces Grok [--] Here is everything you need to know: https://t.co/EZKnjq57qh https://x.com/omarsar0/status/1891705029083512934 BREAKING: xAI announces Grok [--] Here is everything you need to know: https://t.co/EZKnjq57qh"  
[X Link](https://x.com/omarsar0/status/1891716013609136626)  2025-02-18T05:06Z 230.8K followers, [----] engagements


"More results: AI co-scientist outperforms other SoTA agentic and reasoning models for complex problems generated by domain experts. Just look at how performance increases with more time spent on reasoning surpassing unassisted human experts"  
[X Link](https://x.com/omarsar0/status/1892223531485724714)  2025-02-19T14:43Z 231K followers, [----] engagements


"How about novelty Experts assessed the AI co-scientist to have a higher potential for novelty and impact. It was even preferred over other models like OpenAI o1"  
[X Link](https://x.com/omarsar0/status/1892223534207820023)  2025-02-19T14:43Z 231K followers, [----] engagements


"Introduction to CUDA Programming for Python Developers (link in comments)"  
[X Link](https://x.com/omarsar0/status/1892938951108751757)  2025-02-21T14:06Z 277.4K followers, 103.6K engagements


"@alan__xiao i use it mostly for getting either gpt-4o or claude (both have this feature) to produce text that's closer to the style i prefer. i find that it's good for instance to get it to produce cleaner summaries of technical docs for personal consumption"  
[X Link](https://x.com/omarsar0/status/1893056869528150306)  2025-02-21T21:55Z 231.2K followers, [--] engagements


"@kristallpirat yeah thats not ideal. I do get issues like that when there is too much stuff in the rules. Organizing and structuring with like xml tags has helped improve my windsurf setup"  
[X Link](https://x.com/omarsar0/status/1893442284152095191)  2025-02-22T23:26Z 231.3K followers, [--] engagements


"🤯 We also get the GitHub integration They heard us loud LFG"  
[X Link](https://x.com/omarsar0/status/1894145008556519602)  2025-02-24T21:58Z 232K followers, 19.9K engagements


"this is next-level vibe coding 🤯 watch me use windsurf's new preview feature to make quick ui improvements all within my coding environment this feature is insane agentic capabilities + long-context awareness in full display"  
[X Link](https://x.com/omarsar0/status/1897390212323643857)  2025-03-05T20:54Z 271.9K followers, 68K engagements


"Claude [---] Sonnet has serious competition Gemini [---] Pro is a legit good model for code. - code quality is really good - 1M token context - native multimodality - long code generation - understand large codebases I used it with Windsurf to generate an AI search agent app:"  
[X Link](https://x.com/omarsar0/status/1906404825509560408)  2025-03-30T17:55Z 277.2K followers, 212.9K engagements


"Hard to deny how much better Gemini [---] Pro is at long context understanding compared to other models. It's strong in many areas but Google has been working on long context LLMs since the early days. Strong performance on benchmarks and our own internal tests. They lead"  
[X Link](https://x.com/omarsar0/status/1912141918080737648)  2025-04-15T13:52Z 284.7K followers, 64.3K engagements


"Agent Zero A personal agentic framework that dynamically grows and learns with you. - It uses the OS as a tool. - Has search and terminal execution too. - It has persistent memory to memorize key information to solve future tasks more reliably. - Multi-agent support"  
[X Link](https://x.com/omarsar0/status/1929219531374813672)  2025-06-01T16:52Z 271.3K followers, 81.5K engagements


"MemAgent MemAgent-14B is trained on 32K-length documents with an 8K context window. Achieves 76% accuracy even at 3.5M tokens That consistency is crazy Here are my notes:"  
[X Link](https://x.com/omarsar0/status/1942667308368871457)  2025-07-08T19:29Z 270.8K followers, 102K engagements


"Overview Introduces an RL-driven memory agent that enables transformer-based LLMs to handle documents up to [---] million tokens with near-lossless performance, linear complexity, and no architectural modifications"  
[X Link](https://x.com/omarsar0/status/1942667311401558372)  2025-07-08T19:29Z 270.7K followers, [----] engagements


"any good open-source agentic browsers? it's important that this exists"  
[X Link](https://x.com/omarsar0/status/1942977931040739458)  2025-07-09T16:03Z 285.1K followers, 20.9K engagements


"comment below on what you are building with AI agents great chance to bring some attention to your projects"  
[X Link](https://x.com/omarsar0/status/1943712210573725925)  2025-07-11T16:41Z 269.7K followers, 10.6K engagements


"This handbook is so good It covers *everything* you need to know about LLM inference. FREE to access:"  
[X Link](https://x.com/omarsar0/status/1943727674637033601)  2025-07-11T17:42Z 274.6K followers, 87.2K engagements


"Each scenario includes up to eight interdependent goals, user personas, ambiguous or missing tools, and dynamic user intent, mimicking real-world complexity"  
[X Link](https://x.com/omarsar0/status/1945956451697988032)  2025-07-17T21:19Z 280.8K followers, [----] engagements


"Results: GPT-4.1 leads overall with 62% AC while Gemini-2.5-flash tops TSQ at 94% but lags on AC (38%). GPT-4.1-mini offers strong cost-efficiency ($0.014/session vs. GPT-4.1's $0.068). Open-source Kimi K2 is the top performer among non-closed models (53% AC, 90% TSQ). Reasoning-enabled models generally underperform non-reasoning counterparts in action completion. Evaluation relies on a simulation loop involving synthetic users and tools, measuring whether agents can meet real user demands while selecting and executing the right toolchains"  
[X Link](https://x.com/omarsar0/status/1945956454806254048)  2025-07-17T21:19Z 280.8K followers, [----] engagements


"GLM-4.5 looks like a big deal MoE Architecture Hybrid reasoning models 355B total (32B active) GQA + partial RoPE Multi-Token Prediction Muon Optimizer + QK-Norm 22T-token training corpus Slime RL Infrastructure Native tool use Here's all you need to know:"  
[X Link](https://x.com/omarsar0/status/1949884927316762713)  2025-07-28T17:29Z 285.7K followers, 60.3K engagements


"Model Architecture & Pre-Training GLM-4.5 is 355B total parameters (32B active); deeper model with narrower width; optimized for reasoning via more layers and [--] attention heads. GLM-4.5-Air is 106B (12B active). 22T-token training corpus that combines 15T general data with 7T code/reasoning-focused data. Grouped-Query Attention + partial RoPE to enhance long-context efficiency and accuracy in reasoning tasks"  
[X Link](https://x.com/omarsar0/status/1949884942512771543)  2025-07-28T17:29Z 269.6K followers, [----] engagements


"You simply cannot ignore evals AI agents are now selling insurance handling legal docs and assisting with finance. But without real-time evaluation they're one hallucination away from disaster. Meet Luna-2 the small model built to guardrail agents before they screw up"  
[X Link](https://x.com/omarsar0/status/1954942654007091636)  2025-08-11T16:27Z 274.2K followers, 17.4K engagements


"The GLM-4.5 technical report is out Sharing some key details in case you missed it:"  
[X Link](https://x.com/omarsar0/status/1955007573419233461)  2025-08-11T20:45Z 278.8K followers, 32.6K engagements


"Overview GLM-4.5 is an open MoE LLM (355B total / 32B activated) with a smaller 106B Air variant (12B activated). It trains on 23T tokens supports hybrid reasoning (thinking + direct modes) and posts strong ARC results: TAU-Bench [----] AIME24 [----] SWE-bench Verified 64.2"  
[X Link](https://x.com/omarsar0/status/1955007588808135100)  2025-08-11T20:45Z 278.8K followers, [----] engagements


"Go to: define persona & capabilities, select tools, map sub-agents, test scenarios, deploy, scale. Their architecture uses design patterns that demonstrate 3x better task focus than generic models. It gives you a competitive advantage that is unique to you. http://emergent.sh"  
[X Link](https://x.com/omarsar0/status/1955618521192464864)  2025-08-13T13:12Z 275.1K followers, [---] engagements


"Their prompt-to-production pipeline bypasses traditional mobile development entirely. Natural language input gets parsed through their semantic layer and compiled into native iOS/Android apps. Here is a personalized news app with minimal clean UI sourcing global news in a categorised manner"  
[X Link](https://x.com/omarsar0/status/1955618627446833657)  2025-08-13T13:13Z 275.2K followers, [---] engagements


"Don't sleep on small models Anemoi is the latest multi-agent system that proves small models pack a punch when combined effectively. GPT-4.1-mini (for planning) and GPT-4o (for worker agents) surpass the strongest open-source baseline on GAIA. A must-read for devs:"  
[X Link](https://x.com/omarsar0/status/1960799241888260513)  2025-08-27T20:19Z 282.4K followers, 94K engagements


"I'm surprised Agentic RAG is not getting more attention. That's all about to change. Here's why:"  
[X Link](https://x.com/omarsar0/status/1965115682322042954)  2025-09-08T18:11Z 269.9K followers, 88.7K engagements


"This is one of the most promising directions to improve RAG systems. It involves combining dynamic retrieval with structured knowledge. It helps to mitigate hallucinations and outdated information and improves knowledge quality. Pay attention to this one AI devs"  
[X Link](https://x.com/omarsar0/status/1967963949158240485)  2025-09-16T14:49Z 270.3K followers, 66K engagements


"If you are looking to get started with Codex you will find this little OpenAI guide useful. (bookmark it)"  
[X Link](https://x.com/omarsar0/status/1968438129846567163)  2025-09-17T22:13Z 274.1K followers, 171.4K engagements


"MCP is extremely underrated. It's crazy to me that most devs and teams are not taking advantage of MCP. I have been using MCP to make my prompts, memory, and context portable. I believe this is the ultimate application of MCP. Context is king; you really want complete control of it. MCP is your advantage. You are not tied to one tool. Your context goes where you go. Whether it's ChatGPT, Claude Code, Windsurf, or Codex. I now worry less about switching model providers. It's really liberating. It has built my confidence in the use of tools knowing that I can switch off as I please. Plus LLMs"  
[X Link](https://x.com/omarsar0/status/1969824708372893865)  2025-09-21T18:03Z 270.5K followers, 95.9K engagements


"Very cool work from Meta Superintelligence Lab. They are open-sourcing Meta Agents Research Environments (ARE) the platform they use to create and scale agent environments. Great resource to stress-test agents in environments closer to real apps. Read on for more:"  
[X Link](https://x.com/omarsar0/status/1970147840245879116)  2025-09-22T15:27Z 271.4K followers, 152K engagements


"It doesn't matter what tools you use for AI Agents. I've put together the ultimate curriculum to learn how to build AI agents. (bookmark it) From context engineering to evaluating, optimizing, and shipping agentic applications"  
[X Link](https://x.com/omarsar0/status/1973053220462244141)  2025-09-30T15:51Z 273.5K followers, 93.5K engagements


"How do you apply effective context engineering for AI agents Read this if you are an AI dev building AI agents today. Context is king And it must be engineered not just prompted. I wrote a few notes after reading through the awesome new context engineering guide from Anthropic: Context Engineering vs. Prompt Engineering - Prompt Engineering = writing and organizing instructions - Context Engineering = curating and maintaining prompts tools history and external data - Context Engineering is iterative and context is curated regularly Why Context Engineering Matters - Finite attention budget -"  
[X Link](https://x.com/omarsar0/status/1973848472576274562)  2025-10-02T20:32Z 277.4K followers, 50.5K engagements


"Cool research paper from Google. This is what clever context engineering looks like. It proposes Tool-Use-Mixture (TUMIX), leveraging diverse tool-use strategies to improve reasoning. This work shows how to get better reasoning from LLMs by running a bunch of diverse agents (text-only, code, search, etc.) in parallel and letting them share notes across a few rounds. Instead of brute-forcing more samples it mixes strategies, stops when confident, and ends up both more accurate and cheaper. Mix different agents not just more of one: They ran [--] different agent styles (CoT, code execution, web search"  
[X Link](https://x.com/omarsar0/status/1974106927287447725)  2025-10-03T13:39Z 269.9K followers, 82.4K engagements


"Very excited about OpenAIs new AgentKit. Visual agent builders are a game changer for iterating on and shipping agents"  
[X Link](https://x.com/omarsar0/status/1975265351731716463)  2025-10-06T18:22Z 285.3K followers, 47.2K engagements


"2025 is the year of AI agents. But they need a lot more work. More work is needed on architecture design optimization context engineering environments observability reliability evaluations scaling and more"  
[X Link](https://x.com/omarsar0/status/1975916120935948623)  2025-10-08T13:28Z 270.1K followers, 13.4K engagements


"Agentic Context Engineering Great paper on agentic context engineering. The recipe: Treat your system prompts and agent memory as a living playbook. Log trajectories, reflect to extract actionable bullets (strategies, tool schemas, failure modes), then merge as append-only deltas with periodic semantic de-dupe. Use execution signals and unit tests as supervision. Start offline to warm up a seed playbook, then continue online to self-improve. On AppWorld, ACE consistently beats strong baselines in both offline and online adaptation. Example: ReAct+ACE (offline) lifts average score to 59.4% vs"  
[X Link](https://x.com/omarsar0/status/1976746822204113072)  2025-10-10T20:29Z 276.9K followers, 84.6K engagements
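The "living playbook" loop described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: `Playbook`, `reflect`, and `merge_delta` are made-up names, and the semantic de-dupe is stubbed with exact matching.

```python
# Hedged sketch of the ACE-style playbook loop: reflect on trajectories,
# extract bullets, and merge them as append-only deltas with de-dupe.
from dataclasses import dataclass, field

@dataclass
class Playbook:
    bullets: list = field(default_factory=list)

    def merge_delta(self, delta):
        # Append-only merge; exact-match de-dupe stands in for the
        # semantic de-dupe the paper describes.
        for bullet in delta:
            if bullet not in self.bullets:
                self.bullets.append(bullet)

def reflect(trajectory, passed):
    # Turn an execution trajectory into actionable bullets, using the
    # unit-test outcome as the supervision signal.
    tag = "strategy" if passed else "failure-mode"
    return ["[{}] {}".format(tag, step) for step in trajectory]

# Offline warm-up on logged trajectories; the same loop continues online.
playbook = Playbook()
playbook.merge_delta(reflect(["prefer batch API for >100 items"], passed=True))
playbook.merge_delta(reflect(["retried without backoff"], passed=False))
```

The key design choice is that deltas are appended rather than rewriting the whole prompt, so earlier strategies are never silently lost.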


"Great recap of security risks associated with LLM-based agents. The literature keeps growing but these are key papers worth reading. Analysis of 150+ papers finds that there is a shift from monolithic to planner-executor and multi-agent architectures. Multi-agent security is a widely underexplored space for devs. Issues include LLM-to-LLM prompt infection, spoofing, trust delegation, and collusion"  
[X Link](https://x.com/omarsar0/status/1977073477309043023)  2025-10-11T18:07Z 270.6K followers, 35.1K engagements


"Memory is key to effective AI agents but it's hard to get right. Google presents memory-aware test-time scaling for improving self-evolving agents. It outperforms other memory mechanisms by leveraging structured and adaptable memory. Technical highlights:"  
[X Link](https://x.com/omarsar0/status/1977404165916930181)  2025-10-12T16:01Z 269.9K followers, 27.4K engagements


"Is your LLM-based multi-agent system actually coordinating? That's the question behind this paper. They use information theory to tell the difference between a pile of chatbots and a true collective intelligence. They introduce a clean measurement loop. First, test if the group's overall output predicts future outcomes better than any single agent. If yes, there is synergy: information that only exists at the collective level. Next, decompose that information using partial information decomposition. This splits what's shared, unique, or synergistic between agents. Real emergence shows up as synergy not"  
[X Link](https://x.com/omarsar0/status/1977784668323008641)  2025-10-13T17:13Z 269.9K followers, 24.3K engagements


"Why does RL work for enhancing agentic reasoning? This paper studies what actually works when using RL to improve tool-using LLM agents across three axes: data, algorithm, and reasoning mode. Instead of chasing bigger models or fancy algorithms, the authors find that real diverse data and a few smart RL tweaks make the biggest difference -- even for small models. My [--] key takeaways from the paper:"  
[X Link](https://x.com/omarsar0/status/1978112328974692692)  2025-10-14T14:55Z 270.3K followers, 32.5K engagements


"4. Don't kill the entropy. Too little exploration and the model stops learning; too much and it becomes unstable. Finding just the right clip range depends on model size; small models need more room to explore"  
[X Link](https://x.com/omarsar0/status/1978112377506996626)  2025-10-14T14:55Z 269.7K followers, [----] engagements


"5. Slow thoughtful agents win. Agents that plan before acting (fewer but smarter tool calls) outperform reactive ones that constantly rush to use tools. The best ones pause think internally then act once with high precision"  
[X Link](https://x.com/omarsar0/status/1978112389209026920)  2025-10-14T14:55Z 269.7K followers, [----] engagements


"Most agents today are shallow. They easily break down on long multi-step problems (e.g. deep research or agentic coding). That's changing fast. We're entering the era of "Deep Agents": systems that strategically plan, remember, and delegate intelligently for solving very complex problems. We at @dair_ai and other folks from LangChain, Claude Code, as well as more recently individuals like Philipp Schmid, have been documenting this idea. Here's roughly the core idea behind Deep Agents (based on my own thoughts and notes that I've gathered from others): // Planning // Instead of reasoning ad-hoc inside a"  
[X Link](https://x.com/omarsar0/status/1978175740832284782)  2025-10-14T19:07Z 270.4K followers, 42.7K engagements
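The Deep Agents pattern above (explicit planning, persistent memory, delegation to specialists) can be sketched roughly as follows. Everything here is an illustrative assumption: `planner`, `SUBAGENTS`, and `deep_agent` are not a real framework's API.

```python
# Hedged sketch: a top-level agent that plans first, delegates each step
# to a specialized subagent, and keeps results in persistent memory.

def planner(task):
    # In a real system an LLM decomposes the task; hardcoded plan here.
    return ["research: " + task, "draft: " + task, "review: " + task]

SUBAGENTS = {
    "research": lambda goal: "notes on " + goal,
    "draft":    lambda goal: "draft for " + goal,
    "review":   lambda goal: "approved " + goal,
}

def deep_agent(task):
    memory = {}  # persists across steps instead of ad-hoc in-context reasoning
    for step in planner(task):
        role, goal = step.split(": ", 1)
        memory[role] = SUBAGENTS[role](goal)  # delegate to a specialist
    return memory
```

The point is the structure, not the stubs: the plan lives outside any single model call, so long-horizon tasks survive context limits.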


"Dr.LLM: Dynamic Layer Routing in LLMs Neat technique to reduce computation in LLMs while improving accuracy. Routers increase accuracy while reducing layers by roughly [--] to [--] per query. My notes below:"  
[X Link](https://x.com/omarsar0/status/1978829550709866766)  2025-10-16T14:25Z 270.6K followers, 22.7K engagements


"Banger paper from Meta and collaborators. This paper is one of the best deep dives yet on how reinforcement learning (RL) actually scales for LLMs. The team ran over [------] GPU hours of experiments to find a predictable scaling pattern and a stable recipe (ScaleRL) that consistently works as you scale up compute. Think of it as a practical guide for anyone trying to train reasoning or alignment models with RL. More on why this is a big deal:"  
[X Link](https://x.com/omarsar0/status/1978865039529689257)  2025-10-16T16:46Z 278.6K followers, 35.3K engagements


"2. The ScaleRL recipe that just works. The authors tested dozens of RL variations and found one that scales cleanly to 100k GPU hours without blowing up: - PipelineRL (8 pipelines) with CISPO loss (a stabilized REINFORCE variant). - Prompt-level averaging and batch-level normalization to reduce variance. - FP32 logits for better stability and higher final accuracy. - No-Positive-Resampling curriculum to avoid reward hacking. - Forced interruptions (stopping long thoughts) instead of punishing long completions. - This combo called ScaleRL hit the best trade-off between stability sample"  
[X Link](https://x.com/omarsar0/status/1978865070303232267)  2025-10-16T16:46Z 285.9K followers, [----] engagements


"3. What actually matters for better RL results. Not every trick helps equally: - Loss choice and precision matter most; CISPO + FP32 logits boosted final pass rates from 52% to 61%. - Normalization, aggregation, and curriculum mainly affect how fast you improve (efficiency) not how far you can go. - Fancy variants like GRPO, DAPO, or Magistral didn't beat ScaleRL once scaled properly."  
[X Link](https://x.com/omarsar0/status/1978865085939654692)  2025-10-16T16:46Z 285.9K followers, [----] engagements


"I am not going to lie. I see a lot of potential in the Skills feature that Anthropic just dropped. Just tested with Claude Code. It leads to sharper and more precise outputs. It's structured context engineering to power CC with specialized capabilities, leveraging the filesystem"  
[X Link](https://x.com/omarsar0/status/1978919087137804567)  2025-10-16T20:20Z 270.9K followers, 64.1K engagements


"An easy way to try Skills in Claude Code is by asking it to help you build one. I am surprised by how aware it is of Skills and how to build comprehensive ones"  
[X Link](https://x.com/omarsar0/status/1978923010854646142)  2025-10-16T20:36Z 269.8K followers, [----] engagements


"This is also neat To help deal with context rot or context collapse Skills uses a neat tiered system (3 levels) to help Claude Code load context efficiently and only when it needs it. Don't sleep on agentic search"  
[X Link](https://x.com/omarsar0/status/1978925302018347057)  2025-10-16T20:45Z 269.8K followers, [----] engagements
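The tiered loading idea above can be sketched as a progressive-disclosure lookup. This is an assumption-laden toy, not Anthropic's implementation: the tier names, the `SKILL` dict shape, and `context_for` are all illustrative.

```python
# Hedged sketch of 3-level progressive context loading for a skill:
# tier 1 (name/description) is always in context; tier 2 (full
# instructions) loads on invocation; tier 3 (bundled files) on demand.

SKILL = {
    "tier1": "pdf-report: generates branded PDF reports",  # always loaded
    "tier2": "Full SKILL.md instructions...",              # on invocation
    "tier3": {"template.html": "<html>...</html>"},        # on demand
}

def context_for(invoked, needed_files):
    ctx = [SKILL["tier1"]]              # cheap metadata, always present
    if invoked:
        ctx.append(SKILL["tier2"])      # instructions only when invoked
        for f in needed_files:
            ctx.append(SKILL["tier3"][f])  # files only as referenced
    return ctx
```

Until a skill is invoked, only one short line occupies the context window, which is what keeps token usage flat as the number of installed skills grows.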


"Another interesting development on LLM efficiency. How do you make diffusion LLMs faster without breaking them? This is the problem tackled in Elastic-Cache, a new approach to decoding efficiency for diffusion-based LLMs. It shows that attention really is all you need if you know when to stop paying too much of it. The idea is surprisingly simple. Instead of recomputing every Key/Value (KV) pair at every denoising step, Elastic-Cache learns to reuse what hasn't changed and only update where attention truly drifts. Here's roughly how it works: [--] Track attention drift. At each step the model checks"  
[X Link](https://x.com/omarsar0/status/1979180865520570615)  2025-10-17T13:41Z 283K followers, 21.9K engagements
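The reuse-unless-drifted rule above can be shown with a toy decision function. The drift metric (L1 distance between attention distributions) and the threshold are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of the Elastic-Cache decision: keep cached K/V entries
# while attention is stable across denoising steps; recompute when the
# attention distribution has drifted past a threshold.

def attention_drift(prev_weights, curr_weights):
    # L1 distance between attention weight vectors from adjacent steps.
    return sum(abs(p - c) for p, c in zip(prev_weights, curr_weights))

def should_recompute_kv(prev_weights, curr_weights, threshold=0.1):
    return attention_drift(prev_weights, curr_weights) > threshold

# Attention barely moved: reuse the cache and skip recomputation.
stable = should_recompute_kv([0.5, 0.3, 0.2], [0.49, 0.31, 0.2])
# Attention shifted substantially: refresh the cache for these positions.
drifted = should_recompute_kv([0.5, 0.3, 0.2], [0.1, 0.2, 0.7])
```

The savings come from the common case: most positions' attention changes very little between adjacent denoising steps, so most KV recomputation is wasted work.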


"LLMs can get "Brain Rot" Continual pretraining on junk high-engagement web text causes lasting "cognitive decline" in LLMs, reducing reasoning, long-context, and safety performance. The main failure mode is thought-skipping, where models skip reasoning steps and adopt dark personality traits like narcissism and low agreeableness. Even strong mitigations such as reflection or further fine-tuning only partially reverse the damage, making data curation a critical safety concern for AI training"  
[X Link](https://x.com/omarsar0/status/1979217719082774873)  2025-10-17T16:07Z 285.2K followers, 293.3K engagements


"Don't sleep on Skills. Skills is easily one of the most effective ways to steer Claude Code. Impressive for optimization. I built a skill inside of Claude Code that automatically builds, tests, and optimizes MCP tools. It runs in a loop, loading context and tools (bash scripts) efficiently to test and optimize MCP tools based on best practices, implementation, and outputs. Heck, you could even run MCP tools within it if you like but that wasn't what I needed here. One of the most impressive aspects of using Claude Code with Skills is the efficient token usage. The context tiering system is a"  
[X Link](https://x.com/omarsar0/status/1979242073372164306)  2025-10-17T17:44Z 282.5K followers, 193.8K engagements


"At a high level Skills is great for context engineering and steering Claude Code. But I strongly agree on the continual learning stuff. Skills are a bigger deal than I initially thought. I wrote yesterday about how great it is at optimizing tools (with an MCP tuning example) but this ability can generalize to self-improving/evolving agents through adaptive skills. A bit like how humans actually learn and upskill over time. As an example I am building a high-level skill that enables Claude Code to actively monitor my interactions with it and my feedback and to document those where needed (and"  
[X Link](https://x.com/omarsar0/status/1979547649519899019)  2025-10-18T13:58Z 272.1K followers, 98.2K engagements


"So I wrote down a better-formatted version of my post on Deep Agents. I added it to the AI Agents section of the Prompt Engineering Guide. If you are building with AI Agents this is a must-read. I also added links to other useful references. promptingguide.ai/agents"  
[X Link](https://x.com/omarsar0/status/1979570754317352968)  2025-10-18T15:30Z 270.9K followers, 19K engagements


"@nikitabier You say something about finding the right signals. Great, I get that. And hope the UX improves things. But can you explain why there is literally no post with actual links on my timeline? It feels like some deserve to. What's up with that? Might be a separate issue"  
[X Link](https://x.com/omarsar0/status/1980049904467652724)  2025-10-19T23:14Z 269.8K followers, [----] engagements


"I tried Codex on ChatGPT today. Claude Code is just irreplaceable to me at this point. And with this new Skills feature the edge it gives is just too good to pass on. I am sure Codex will get better. Will keep trying future iterations. What's your experience?"  
[X Link](https://x.com/omarsar0/status/1980052095639126457)  2025-10-19T23:23Z 270.8K followers, 104.5K engagements


"This is huge. I sensed that Claude Code on web was coming. Coding agents on web serve specific use cases. There is a wide opening to improve the coding experience on the web. If you tried Codex (web) you know this well. If it's anything like Claude Code on the terminal, it's game on. Introducing Claude Code on the web. You can now delegate coding tasks to Claude without opening your terminal. https://t.co/Hw8KkKiFGj"  
[X Link](https://x.com/omarsar0/status/1980403571935130073)  2025-10-20T22:39Z 270.8K followers, 23.6K engagements


"Claude Code on mobile is exciting. I am often on the road, so this is extremely helpful. I normally used GitHub Actions, but this should feel more native"  
[X Link](https://x.com/omarsar0/status/1980406365781991642)  2025-10-20T22:50Z 270K followers, [----] engagements


"Safe agentic code execution matters. The sandboxing feature in Claude Code is also neat and well thought out"  
[X Link](https://x.com/omarsar0/status/1980407566158196747)  2025-10-20T22:55Z 270.3K followers, [----] engagements


"People are sleeping on Deep Agents. Start using them now. This is a fun paper showcasing how to put together advanced deep agents for enterprise use cases. Uses the best techniques: task decomposition, planning, specialized subagents, MCP for NL2SQL, file analysis, and more"  
[X Link](https://x.com/omarsar0/status/1980629163976675779)  2025-10-21T13:36Z 272.7K followers, 60.6K engagements


"BREAKING: OpenAI launches ChatGPT Atlas. A new AI-powered browser built around ChatGPT. Chat goes with you anywhere on the web"  
[X Link](https://x.com/omarsar0/status/1980684241202520238)  2025-10-21T17:14Z 270.7K followers, 15.9K engagements


"It's OpenAI's attempt at combining the AI chat experience with the browser. ChatGPT is the "beating heart of Atlas" Built to be fast and flexible"  
[X Link](https://x.com/omarsar0/status/1980684242980884483)  2025-10-21T17:14Z 271.7K followers, [----] engagements


"Three core features of Atlas: - Chat goes with you anywhere on the web - Browser memory to personalize the experience across the web - Agents can take actions for you"  
[X Link](https://x.com/omarsar0/status/1980684244818301158)  2025-10-21T17:14Z 273.5K followers, [----] engagements


"You can invite ChatGPT to any tab you have open. It can see the webpage and answer questions"  
[X Link](https://x.com/omarsar0/status/1980685471987507241)  2025-10-21T17:19Z 270.1K followers, [----] engagements


"You can do web searches and click through links to preview pages. And you can continue interacting with webpages seamlessly"  
[X Link](https://x.com/omarsar0/status/1980685725826691098)  2025-10-21T17:20Z 270.1K followers, [----] engagements


"🎓Stanford CME295 Transformers & LLMs Nice to see the release of this new course on Transformers and LLMs. Great way to catch up on the world of LLMs and AI Agents. Includes topics from the basics of attention and mixture-of-experts to agents. Excited to see more on evals. First lessons available now. https://cme295.stanford.edu/syllabus/"  
[X Link](https://x.com/omarsar0/status/1981030346037612847)  2025-10-22T16:10Z 272.3K followers, 46.2K engagements


"Lookahead Routing for LLMs Proposes Lookahead a routing framework to enable more informed routing without full inference. Achieves an average performance gain of 7.7% over the state-of-the-art. Here is why it works: Lookahead is a new framework for routing in multi-LLM systems deciding which model should handle each query. Key idea: Instead of routing based only on the input query Lookahead predicts latent representations of potential responses giving it a peek into what each model would say without fully generating text. Smarter decisions: This response-aware prediction makes routing more"  
[X Link](https://x.com/omarsar0/status/1981360482813710384)  2025-10-23T14:02Z 271.5K followers, 23.8K engagements
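The Lookahead idea above can be sketched as routing on a cheap prediction of each model's response rather than on the query alone. `predict_response_embedding` and `score` are hypothetical stand-ins for the paper's learned components, and the two-model setup is an assumption for illustration.

```python
# Hedged sketch of response-aware routing: peek at a predicted latent
# response per candidate model, then send the query to the best one,
# without running full inference on every candidate.

def predict_response_embedding(model, query):
    # Stand-in for a small learned predictor of the model's latent response.
    return [float(len(query) % 7), {"small": 0.2, "large": 0.9}[model]]

def score(embedding):
    # Stand-in for a learned quality estimate over the predicted response.
    return sum(embedding)

def route(query, models=("small", "large")):
    return max(models, key=lambda m: score(predict_response_embedding(m, query)))
```

The contrast with query-only routing is the middle step: the router conditions on what each model *would* say, which is where the reported accuracy gain comes from.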


"Builds an OS agent for long-horizon tasks. Uses step-wise RL and self-evolve training to enable the agent to carry out long-horizon interactions. Instead of one big model doing everything, it uses multiple cooperating agents: one for memory and context, one for breaking down complex tasks, and another to check and fix mistakes along the way. This is another great example of deep agents to personalize user-agent interactions"  
[X Link](https://x.com/omarsar0/status/1981364255564935339)  2025-10-23T14:17Z 270.9K followers, 15.3K engagements


"Not sure what's up with ChatGPT lately. Projects don't work as they used to. I have moved all my Projects to Claude Code Skills. Night and day. Skills give you so much personalization. They can easily be adapted too. The Claude team really cooked with this one"  
[X Link](https://x.com/omarsar0/status/1981373662524821996)  2025-10-23T14:54Z 271.1K followers, 14.8K engagements


"Fundamentals of Building Autonomous LLM Agents Great overview of LLM-based agents. Great if you are just getting started with AI agents. It covers the basics well"  
[X Link](https://x.com/omarsar0/status/1981793327956865504)  2025-10-24T18:42Z 279.3K followers, 63.1K engagements


"On Skills vs. Subagents Been using Skills extensively for the past couple of days. But like many other Claude Code users I've been thinking about the difference between subagents and Skills. Subagents are useful for handing off subtasks (i.e. separation of concerns) from the main task (e.g. deep research with a planner). And Skills are about loading context efficiently with a neat tiered system. Subagents can also leverage Skills. And I sometimes also use custom commands to orchestrate more complex scenarios. They are all useful features to have. Who knows? This will probably change again or"  
[X Link](https://x.com/omarsar0/status/1981798842866557281)  2025-10-24T19:03Z 271.9K followers, 57.7K engagements


"Curious how Kimi CLI will stack up against Claude Code and the like. I think there is a lot of room for innovation with CLI agents"  
[X Link](https://x.com/omarsar0/status/1982463725589168152)  2025-10-26T15:05Z 277.5K followers, 13.8K engagements


"Project page: https://github.com/MoonshotAI/kimi-cli"  
[X Link](https://x.com/omarsar0/status/1982463822158795238)  2025-10-26T15:06Z 277.5K followers, [----] engagements


"Damn it. Now it's hard to think of building AI agents without access to the filesystem and agentic search. Claude Code has spoiled me. But my agents are so much better with bash, glob, grep, files, skills, and the like. Memory, context engineering, evals, etc. are all better"  
[X Link](https://x.com/omarsar0/status/1983175377393479759)  2025-10-28T14:13Z 274.3K followers, 19K engagements


"Microsoft's open-source game is on 🔥 They release Agent Lightning to help optimize multi-agent systems. It works with popular agent frameworks. Barely any code change needed. Integrates algos like RL, Automatic Prompt Optimization, Supervised Fine-tuning, and more"  
[X Link](https://x.com/omarsar0/status/1983180523473354906)  2025-10-28T14:34Z 272.4K followers, 46.1K engagements


"@claudeusmaximus i wouldn't throw away vector dbs just yet -- there is a world where both and even emerging search tools exist"  
[X Link](https://x.com/omarsar0/status/1983186700810760214)  2025-10-28T14:58Z 270.8K followers, [---] engagements


"Memory for AI agents is a bigger deal than it looks. Huge congrats to the Mem0 team on their Series A. I've been building with mem0 for the past couple of months. It's clear this team is building some impressive stuff. Check them out if you are an AI dev building agents. Memory is what makes us human. It's also what makes AI truly intelligent. @mem0ai has raised $24M to build the universal memory layer for AI. Thousands of teams in production. 14M downloads. 41K GitHub stars. Intelligence needs memory & we're building it for everyone. More👇 https://t.co/7r2zHCnNYh"  
[X Link](https://x.com/omarsar0/status/1983200901402927115)  2025-10-28T15:55Z 272.3K followers, 17.1K engagements


"What happened to AGI I don't see anyone mentioning it on my timeline anymore. Apparently it's now ASI. :)"  
[X Link](https://x.com/omarsar0/status/1983630211506958699)  2025-10-29T20:21Z 272.8K followers, 17.6K engagements


"There is so much value in data for training/tuning LLM agents. But there aren't too many good public ones. If you do find a good one it's not in a standard format and tools vary. Agent Data Protocol attempts to solve this by unifying datasets for fine-tuning LLM agents"  
[X Link](https://x.com/omarsar0/status/1983637226996298210)  2025-10-29T20:49Z 272.5K followers, 29.8K engagements


"This is actually a clever context engineering technique for web agents. It's called AgentFold, an agent that acts as a self-aware knowledge manager. It treats context as a dynamic cognitive workspace by folding information at different scales: - Light folding: Compressing small details while keeping the important stuff - Deep folding: Combining multiple steps or tasks into a simplified summary More of my notes: 1) Solving context saturation Traditional ReAct-based web agents accumulate noisy histories, causing context overload, while fixed summarization methods risk irreversible information"  
[X Link](https://x.com/omarsar0/status/1983646041850495140)  2025-10-29T21:24Z 278K followers, 30.9K engagements
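The light vs. deep folding distinction above can be sketched with two small functions over a step history. The data shapes and fold rules here are illustrative assumptions, not AgentFold's actual code.

```python
# Hedged sketch of folding context at two scales: light folds compress one
# step; deep folds collapse a finished sub-task into a single summary.

def light_fold(step):
    # Compress one step: drop the raw tool output, keep the conclusion.
    return {"action": step["action"], "takeaway": step["conclusion"]}

def deep_fold(steps):
    # Collapse several completed steps into one summary entry.
    return {"summary": "; ".join(s["conclusion"] for s in steps)}

history = [
    {"action": "search docs", "raw": "...10KB of HTML...",
     "conclusion": "API v2 is current"},
    {"action": "read page", "raw": "...8KB of text...",
     "conclusion": "auth uses OAuth"},
]

# The completed sub-task gets deep-folded; the recent step stays light.
workspace = [deep_fold(history[:1]), light_fold(history[1])]
```

The workspace keeps shrinking as sub-tasks finish, which is how the agent avoids the context saturation the tweet describes.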


"Graph-based Agent Planning It lets AI agents run multiple tools in parallel to accelerate task completion. Uses graphs to map tool dependencies + RL to learn the best execution order. RL also helps with scheduling strategies and planning. Major speedup for complex tasks"  
[X Link](https://x.com/omarsar0/status/1983892163990843692)  2025-10-30T13:42Z 272.6K followers, 21.9K engagements
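The parallel execution part above can be sketched as layered scheduling over a tool dependency graph: every tool whose dependencies are done runs concurrently. The tool names and the thread pool are illustrative; the paper learns the schedule with RL rather than hardcoding it.

```python
# Hedged sketch: run independent tools in parallel, layer by layer,
# according to an explicit dependency graph.
from concurrent.futures import ThreadPoolExecutor

def run_plan(deps, tools):
    results, done = {}, set()
    while len(done) < len(deps):
        # All tools whose dependencies are satisfied can run concurrently.
        ready = [t for t in deps if t not in done and deps[t] <= done]
        with ThreadPoolExecutor() as pool:
            for tool, out in zip(ready, pool.map(lambda t: tools[t](), ready)):
                results[tool] = out
        done.update(ready)
    return results

deps = {"search": set(), "fetch": set(), "summarize": {"search", "fetch"}}
tools = {"search": lambda: "links", "fetch": lambda: "pages",
         "summarize": lambda: "report"}
out = run_plan(deps, tools)
```

Here "search" and "fetch" run in the same layer; a sequential ReAct loop would have serialized all three calls.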


"It's pretty exciting to see how fast agentic models are getting. Besides improved reliability, speed is a must for coding agents today. Today we're releasing SWE-1.5 our fast agent model. It achieves near-SOTA coding performance while setting a new standard for speed. Now available in @windsurf. https://t.co/0RvQVLezA0"  
[X Link](https://x.com/omarsar0/status/1983892272492978178)  2025-10-30T13:42Z 271.3K followers, [----] engagements


"🚀 MiniMax-M2 is the new top-tier open model for agentic workflows and advanced coding. #1 Open-source Model & #4 Intelligence (Artificial Analysis) 200k context window (128K max output tokens) [---] TPS throughput 8% Claude Sonnet price 2x faster More details:"  
[X Link](https://x.com/omarsar0/status/1983915573215162873)  2025-10-30T15:15Z 271.1K followers, 42.9K engagements


"🐙 Engineered for Agents & Code M2 excels in math science and code. #1 Open-source Model & #4 Intelligence (by Artificial Analysis & LMArena) Crushes SWE-Bench + Terminal-Bench. Comparable to top proprietary models in BrowseComp. Powered by MiniMaxs CISPO scaling algorithm the technique is highlighted in Metas "Art of Scaling RL Compute.""  
[X Link](https://x.com/omarsar0/status/1983915592055968239)  2025-10-30T15:15Z 273.1K followers, [----] engagements


"1 #1 on OpenRouter's Top Today Chart Just a few days after release it's trending to be the top open-source model in terms of token usage in OpenRouter. It's taking off really quickly"  
[X Link](https://x.com/omarsar0/status/1983915607608455540)  2025-10-30T15:15Z 273.1K followers, [----] engagements


"👨💻 Agentic Intelligence Simplified Supports full dev cycles: multi-file edits run debug auto-fix. M2 enables developers to transition seamlessly from the shell to the browser to MCP tools. Great for agents & long-horizon tasks. M2 plans executes retrieves and self-corrects autonomously"  
[X Link](https://x.com/omarsar0/status/1983915623181894127)  2025-10-30T15:15Z 271.4K followers, [---] engagements


"🧠 Built for Efficiency & Accuracy Built for multi-hop retrieval-heavy reasoning. M2 also shows that activation size matters. It uses 10B activations which leads to more responsive agent loops + better unit economics. More info in the repo: https://github.com/MiniMax-AI/MiniMax-M2 https://github.com/MiniMax-AI/MiniMax-M2"  
[X Link](https://x.com/omarsar0/status/1983915635655766483)  2025-10-30T15:15Z 271.3K followers, [----] engagements


"Free for all devs globally via MiniMax Agent and APIs for a limited time. Try it: 🔹 MiniMax Agent (Web): 🔹 API: Also live across Claude Code Cursor Cline Roo Code Grok CLI Gemini CLI & more. https://platform.minimax.io/docs/guides/text-generation https://bit.ly/elvism2 https://platform.minimax.io/docs/guides/text-generation https://bit.ly/elvism2"  
[X Link](https://x.com/omarsar0/status/1983915652764332401)  2025-10-30T15:15Z 271.1K followers, [----] engagements


"@bryan_johnson Any thoughts on the Hearing feature inside the iPhone's Health app It does seem to measure well and it also alerts when it gets very noisy"  
[X Link](https://x.com/omarsar0/status/1983978091996430542)  2025-10-30T19:23Z 271.2K followers, [----] engagements


"The MCP is dead camp is just as clueless as the RAG is dead camp. They have absolutely no clue what they are talking about. Just watch what happens in the next couple of weeks in this timeline. I am so tired of would be AI influencers shilling GitHub as an example of why MCP is bad (and thus why CLI is better). You have all done a disservice to the community by miseducating people. GitHub is a really poorly designed MCP. The CLI works well(-ish) because its heavily I am so tired of would be AI influencers shilling GitHub as an example of why MCP is bad (and thus why CLI is better). You have"  
[X Link](https://x.com/omarsar0/status/1984069751518515644)  2025-10-31T01:27Z 272.9K followers, 18.3K engagements


"Excited to announce our academy's first cohort-based course Building Effective AI Agents. This is the one solution that teaches end-to-end how to build evaluate and deploy AI agents. For [--] month you'll build with cutting-edge techniques like memory and deep agents"  
[X Link](https://x.com/omarsar0/status/1984235423707844958)  2025-10-31T12:26Z 271.7K followers, 37.8K engagements


"The more I use Claude Code the more it makes me believe that I can build whatever I want. It coded this little feature flawlessly in one shot. It's a luxury feature (I haven't had the time to implement it) that lots of people have been asking for our guides. Shipping soon"  
[X Link](https://x.com/omarsar0/status/1984248563711586493)  2025-10-31T13:18Z 273.2K followers, 42.4K engagements


"@OfficialLoganK 💯 vibe coding removes so many barriers"  
[X Link](https://x.com/omarsar0/status/1984647765377307081)  2025-11-01T15:44Z 271.4K followers, [----] engagements


"As someone who browses through arxiv papers daily on CS and CL I can totally understand this decision. I have seen many low quality submissions lately and they are surveys a lot of the times. I enjoy a few surveys and position papers so I think there is still a place for those. Hope the review process improves things overall"  
[X Link](https://x.com/omarsar0/status/1984701784691261653)  2025-11-01T19:19Z 276.3K followers, [----] engagements


"People really seem to like this Copy Markdown/AI Summarization functionality. This is the crazy thing about AI today. There are so many useful experiences just waiting to be unlocked"  
[X Link](https://x.com/omarsar0/status/1984999926200152135)  2025-11-02T15:03Z 272.3K followers, [----] engagements


"Proactive agents are going to fundamentally change how we interact with software. Traditional software and UI/UX are about to get a major upgrade. Proactive agents will eliminate broken interfaces. We know really well that the way we interact with computers and devices today is broken (including mobile phones) and I believe we might finally be able to fix that. Proactive agents will carry out an insane amount of work for you and the way we provide them feedback and assess outputs cannot be supported by current interfaces. Current interfaces limit proactive agents. We need to reassess how we"  
[X Link](https://x.com/omarsar0/status/1985022450606714899)  2025-11-02T16:33Z 273.7K followers, 36.9K engagements


"@mattpocockuk Windsurf takes the crown for me. Haven't had the need to change it since I installed it. I know their team has worked really hard on the tab to complete and they keep improving it"  
[X Link](https://x.com/omarsar0/status/1985025447625896143)  2025-11-02T16:45Z 273.7K followers, [----] engagements


"Long-term memory doesn't have to suck Here is the problem: even LLMs with 1M token windows struggle substantially as dialogues lengthen. Raw context capacity alone is insufficient for effective long-term conversational memory. On the other hand hybrid memory approaches work really well on AI agents"  
[X Link](https://x.com/omarsar0/status/1985348779193860414)  2025-11-03T14:10Z 273.7K followers, 38K engagements


"The framework shows the largest relative improvements in tasks requiring long-range reasoning: summarization (+160.6%) multi-hop reasoning (+27.2%) and preference following (+76.5%). Ablation studies reveal that all components become increasingly essential as context grows. At 10M tokens removing any single component causes significant performance drops (retrieval -8.5% scratchpad -3.7% noise filtering -8.3%). https://arxiv.org/abs/2510.27246 https://arxiv.org/abs/2510.27246"  
[X Link](https://x.com/omarsar0/status/1985348825197039718)  2025-11-03T14:10Z 273.3K followers, [----] engagements
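The hybrid design described in these two posts (a retrieval store, a scratchpad of salient facts, and noise filtering, the three components ablated in the paper) can be sketched in a few lines. The class, thresholds, and method names are illustrative; a real system would use embedding search rather than word overlap.

```python
class HybridMemory:
    """Minimal sketch of a hybrid conversational memory with three parts:
    a retrieval store over past turns, a running scratchpad of salient
    facts, and a noise filter applied before indexing."""

    def __init__(self, min_words=3):
        self.turns = []          # retrieval store
        self.scratchpad = []     # distilled salient facts
        self.min_words = min_words

    def add_turn(self, text, salient=False):
        # Noise filtering: skip turns too short to carry information.
        if len(text.split()) < self.min_words:
            return
        self.turns.append(text)
        if salient:
            self.scratchpad.append(text)

    def retrieve(self, query, k=2):
        # Toy lexical-overlap retrieval standing in for vector search.
        q = set(query.lower().split())
        scored = sorted(self.turns,
                        key=lambda t: len(q & set(t.lower().split())),
                        reverse=True)
        return scored[:k]

    def context(self, query, k=2):
        # Prompt context = scratchpad facts + retrieved turns, deduplicated.
        out = list(self.scratchpad)
        for t in self.retrieve(query, k):
            if t not in out:
                out.append(t)
        return out

mem = HybridMemory()
mem.add_turn("ok")                                    # dropped as noise
mem.add_turn("user prefers dark mode", salient=True)  # indexed + scratchpad
mem.add_turn("we compared vector stores yesterday")
print(mem.context("which theme does the user prefer"))
# → ['user prefers dark mode', 'we compared vector stores yesterday']
```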


"Cool project showcasing the use of multi-agent systems for scientific research"  
[X Link](https://x.com/omarsar0/status/1985353551984427435)  2025-11-03T14:29Z 274.6K followers, 12K engagements


"Vibe coding is cute but pair it with intentional development cycles and watch how far you can take a project with coding agents today. Learn to code and vibe code. You won't regret it"  
[X Link](https://x.com/omarsar0/status/1985358804469284945)  2025-11-03T14:49Z 273.4K followers, [----] engagements


"I find this puzzling as well. Like there are so many ways to use coding agents and there are even free ones too. I use coding agents (paid ones) for my side projects as it also provides a good test bed to gain experience on where to improve my workflows and pushes me to figure out how to ship things faster"  
[X Link](https://x.com/omarsar0/status/1985380447614894151)  2025-11-03T16:15Z 271.6K followers, [----] engagements


"Mostly agree with this. "Write evals" sounds scary to most people but it's a stand-out skill for AI agent builders today. From building datasets to verification to understanding how to connect metrics to business value. Evals are now a requirement in our academy. With superintelligence we will need to reset education. Human labor will be contingent on our capacity to imagine and prompt and write evals instead of merely doing the work Unlimited human desire will take care of the rest. We just have to rewrite HOW we build. With superintelligence we will need to reset education. Human labor will"  
[X Link](https://x.com/omarsar0/status/1985406276986098174)  2025-11-03T17:58Z 273.8K followers, 22.1K engagements


"Tools-to-Agent Retrieval This work presents a unified vector space embedding of both tools and agents with metadata links. Enables fine-grained tool-level and agent retrieval. It's a great context engineering approach in that it retrieves at both the tool and agent levels using a joint index returning whichever better matches the query. It preserves fine-grained tool details while keeping the agent context intact avoiding the information loss from collapsing many tools into coarse descriptions. This is useful for scaling tools and multi-agent systems. Huge improvements on LiveMCPBench. Paper:"  
[X Link](https://x.com/omarsar0/status/1985745152204554720)  2025-11-04T16:25Z 275.8K followers, 18.7K engagements
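A toy sketch of the joint-index idea from this post: tools and agents live in one searchable index, each tool keeping a metadata link back to its parent agent, and a query returns whichever entry matches best. Lexical overlap stands in for the paper's shared vector space; all names are made up.

```python
def build_index(agents):
    """Build one joint index over agents and their tools.
    `agents` maps agent name -> {tool name: description}."""
    index = []
    for agent, tools in agents.items():
        # Agent entry: coarse description (name + tool names).
        index.append({"kind": "agent", "name": agent,
                      "text": agent + " " + " ".join(tools)})
        for tool, desc in tools.items():
            # Tool entry keeps fine-grained detail + a link to its agent.
            index.append({"kind": "tool", "name": tool,
                          "agent": agent, "text": desc})
    return index

def search(index, query, k=1):
    # Toy lexical scoring standing in for vector similarity.
    q = set(query.lower().split())
    return sorted(index,
                  key=lambda e: len(q & set(e["text"].lower().split())),
                  reverse=True)[:k]

agents = {"browser": {"click": "click a page element",
                      "scrape": "extract page text content"},
          "coder": {"run_tests": "execute the project test suite"}}
idx = build_index(agents)
best = search(idx, "extract text from a web page")[0]
print(best["kind"], best["name"])  # → tool scrape
```

Because tool entries keep their own descriptions instead of being collapsed into the agent's summary, fine-grained queries hit the right tool directly while the `agent` field preserves routing context.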


"Context Engineering [---] This report discusses the context of context engineering and examines key design considerations for its practice. Explosion of intelligence will lead to greater context-processing capabilities so it's important to build for the future too. This aligns well with my vision on proactive agents that can proactively build context and both reduce the cost of and close the gap on human-AI interactions. Great read for AI devs building AI agents. Paper -- arxiv. org/abs/2510.26493"  
[X Link](https://x.com/omarsar0/status/1985747789796483109)  2025-11-04T16:35Z 276.6K followers, 84.2K engagements


"Code understanding is a huge unlock for devs building with coding agents. Windsurf Codemaps is an exciting feature and a good example of what effective and automatic context engineering looks like. This really has the potential to unleash 10x AI engineers. Introducing Codemaps in @windsurf powered by SWE-1.5 and Sonnet [---] Your code is your understanding of the problem youre exploring. So its only when you have your code in your head that you really understand the problem. @paulg https://t.co/Tyodea1MrN Introducing Codemaps in @windsurf powered by SWE-1.5 and Sonnet [---] Your code is your"  
[X Link](https://x.com/omarsar0/status/1986078529641975809)  2025-11-05T14:29Z 275.7K followers, 12.1K engagements


"This is an idea I have been posting about for the last couple of weeks. I've been building deep research subagents and have a repo/code-mapper to run deep research in codebases. Code understanding leads to output quality that's on another level. https://x.com/omarsar0/status/1978235329237668214 Claude Code subagents are all you need. Some will complain on # of tokens. However the output this spits out will save you days. The code quality is mindblowing Agentic search works exceptionally well. The subagents run in parallel. ChatGPT's deep research is no match https://t.co/I5cCKLZJuV"  
[X Link](https://x.com/omarsar0/status/1986078532754128922)  2025-11-05T14:29Z 273.3K followers, [----] engagements


"Codemaps is more polished and tightly integrated into Windsurf which is the IDE I already use. If you are interested I will be showcasing and talking more about this in tomorrow's workshop for my academy subs. https://dair-ai.thinkific.com/courses/ai-agents-workshops https://dair-ai.thinkific.com/courses/ai-agents-workshops"  
[X Link](https://x.com/omarsar0/status/1986078536449311009)  2025-11-05T14:29Z 272.8K followers, [----] engagements


"Anthropic just posted another banger guide. This one is on building more efficient agents to handle more tools and efficient token usage. This is a must-read for AI devs (bookmark it) It helps with three major issues in AI agent tool calling: token costs latency and tool composition. How It combines code executions with MCP where it turns MCP servers into code APIs rather than direct tool calls. Here is all you need to know: [--]. Token Efficiency Problem: Loading all MCP tool definitions upfront and passing intermediate results through the context window creates massive token overhead sometimes"  
[X Link](https://x.com/omarsar0/status/1986099467914023194)  2025-11-05T15:53Z 280.7K followers, 181.9K engagements


"Confidence is everything when building great software. Love how Yansu is approaching this. Yansu is a new AI coding platform built by @isoformai for serious and complex software development. It puts scenario simulation before coding. Here is the sauce:"  
[X Link](https://x.com/omarsar0/status/1986118895108002012)  2025-11-05T17:10Z 274.4K followers, 11.5K engagements


"@AdamDittrichOne I am working on some new material to make all of these context engineering approaches and MCP building more accessible. More soon"  
[X Link](https://x.com/omarsar0/status/1986181460055728423)  2025-11-05T21:18Z 274.3K followers, [----] engagements


"MCP when used correctly with AI agents is extremely high-leverage. To make MCP more approachable I just launched our first course on the topic. As the name implies anyone can take and find this short course useful. Check it out here: https://dair-ai.thinkific.com/courses/building-mcp-servers https://dair-ai.thinkific.com/courses/building-mcp-servers"  
[X Link](https://x.com/omarsar0/status/1986222200878215332)  2025-11-06T00:00Z 274.3K followers, 15.2K engagements


"Building MCP tools for LLMs is a standout AI skill today. I just launched a short hands-on lab to help everyone build their first MCP server and integrate it with AI tools like ChatGPT and Claude. Anyone (even non-technical people) will find the lab useful. What you will learn: - How to index your data into Pinecone (popular vector store) - Build an MCP server in n8n on top of the data source - Build and connect an agentic search tool that efficiently retrieves data from your Pinecone datasource - And connect and test your MCP server and tools inside ChatGPT and Claude Learn it once and scale"  
[X Link](https://x.com/omarsar0/status/1986439248745275577)  2025-11-06T14:23Z 275.6K followers, 53.3K engagements


"Claude Code + Bash + Skills is all you need Jokes aside after playing around with a couple of computer-using agents I gave up on them. They are heavily optimized for clicking on common buttons and interfaces (e.g. order delivery) but suck at actual creative work like writing editing formatting etc. I decided to move those workflows to Claude Code + Skills. I wasn't so hopeful because unlike other tasks creative tasks benefit from visual cues hence the strong urge to use computer-using agents. Night and day Turns out Claude Code + bash + Skills is all I needed for the creative work I was"  
[X Link](https://x.com/omarsar0/status/1986568278605521009)  2025-11-06T22:56Z 274.9K followers, 55.6K engagements


"Unlocking the Power of Multi-Agent LLM for Reasoning Designing and optimizing multi-agent systems is important. This paper analyzes multiagent systems where one metathinking agent plans and another reasoning agent executes and identifies a lazy agent failure mode. They find that one agent does most work the other contributes little essentially collapsing into a singleagent system. This is something that happens very often and that you might not want in your design. To address this issue they propose Dr. MAMR (MultiAgent MetaReasoning Done Right) which introduces a Shapleystyle causal"  
[X Link](https://x.com/omarsar0/status/1986831275144138756)  2025-11-07T16:21Z 275.6K followers, 30.3K engagements


"@SrinivasValekar this test was on a customer support agentic system but i think the agent coding use case will be interesting too. on it"  
[X Link](https://x.com/omarsar0/status/1987567082150547532)  2025-11-09T17:04Z 273.5K followers, [----] engagements


"This is a wild use case I used Gamma + n8n to automatically generate a complete presentation on AI Agents research. In just minutes It combines web search (for research) GPT-5 (narrative) and Gamma (for slide content generation). Full workflow breakdown below 👇"  
[X Link](https://x.com/omarsar0/status/1987900645031252298)  2025-11-10T15:10Z 276.7K followers, 50.1K engagements


"1/ THE PROBLEM: Creating visual content is time-consuming. Research takes hours. Writing requires deep focus. Design demands specialized skills. What if AI could handle the entire pipeline"  
[X Link](https://x.com/omarsar0/status/1987900657161117831)  2025-11-10T15:10Z 273.6K followers, [----] engagements


"2/ THE SOLUTION: An n8n workflow that orchestrates Tavily for web research GPT-5 for storytelling Gamma for visual generation and Google Sheets for tracking. You provide a topic and audience. The system outputs a LinkedIn-ready carousel"  
[X Link](https://x.com/omarsar0/status/1987900669051998301)  2025-11-10T15:10Z 273.6K followers, [----] engagements


"5/ VISUAL GENERATION Gamma API takes the narrative and generates a scroll-friendly carousel with engaging headlines professional design and relevant imagery using Imagen [--] Pro. It specifies [--] cards in social format with custom instructions tailored to your target platform"  
[X Link](https://x.com/omarsar0/status/1987900704707813453)  2025-11-10T15:10Z 273.6K followers, [---] engagements


"6/ WORKFLOW BENEFITS: Time goes from hours to minutes. Quality remains high with researched content and visual polish. You can also refine everything in the Gamma UI. Perfect for content creators and technical communicators who need to scale their output"  
[X Link](https://x.com/omarsar0/status/1987900716623736849)  2025-11-10T15:10Z 273.6K followers, [---] engagements


"7/ USE CASES: This workflow is perfect for daily research summaries weekly newsletters community content marketing materials and educational content. It targets visual learners who consume content efficiently and need professional-looking presentations quickly"  
[X Link](https://x.com/omarsar0/status/1987900728527253601)  2025-11-10T15:10Z 274.9K followers, [----] engagements


"8/ GIVEAWAY I am sharing the full n8n workflow JSON and video walkthrough. Comment Gamma & Ill DM it to you"  
[X Link](https://x.com/omarsar0/status/1987900740401352832)  2025-11-10T15:10Z 274.9K followers, [----] engagements


"@thisisgrantlee @sarahdingwang @a16z Gamma is awesome Thanks for allowing me to test it out. https://x.com/omarsar0/status/1987900645031252298s=20 This is a wild use case I used Gamma + n8n to automatically generate a complete presentation on AI Agents research. In just minutes It combines web search (for research) GPT-5 (narrative) and Gamma (for slide content generation). Full workflow breakdown below 👇 https://t.co/4J9zM8tpuA https://x.com/omarsar0/status/1987900645031252298s=20 This is a wild use case I used Gamma + n8n to automatically generate a complete presentation on AI Agents"  
[X Link](https://x.com/omarsar0/status/1987905623204233270)  2025-11-10T15:30Z 275.6K followers, [----] engagements


"It turns out that Kimi K2 Thinking is also a beast at deep research. It can run 200-300 tool requests for impressive multi-agent capabilities. Would you like to see a code example of it Kimi K2 Thinking is a bigger deal than I thought I just ran a quick eval on a deep agent I built for customer support. It's on par with GPT-5; no other LLM has reached this level of agentic orchestration and reasoning capabilities. Huge for agentic and reasoning tasks. https://t.co/tW3BYThgPf Kimi K2 Thinking is a bigger deal than I thought I just ran a quick eval on a deep agent I built for customer support."  
[X Link](https://x.com/omarsar0/status/1987912692099682399)  2025-11-10T15:58Z 276.1K followers, 34.9K engagements


"Hey AI Devs Don't sleep on the new Gemini File Search API Feels like the easiest way to build agentic RAG systems. I built a little MCP server to analyze codebases with semantic search (Gemini File Search) & agentic search. Fun chatting with @karpathy's nanochat project"  
[X Link](https://x.com/omarsar0/status/1988236096195776683)  2025-11-11T13:23Z 277K followers, 89.9K engagements


"@karpathy The best part is that at this moment you only pay for indexing. Thanks to @OfficialLoganK and team for this"  
[X Link](https://x.com/omarsar0/status/1988237435671556505)  2025-11-11T13:28Z 273.9K followers, [----] engagements


"@karpathy @OfficialLoganK I also like the logs view in Google AI Studio. I show it towards the end of the clip. It gives me a good understanding of what happened on each request. Working on something to let folks try it out"  
[X Link](https://x.com/omarsar0/status/1988238506070581265)  2025-11-11T13:32Z 273.9K followers, [----] engagements


"@karpathy @OfficialLoganK As for the agent orchestration that's powering the MCP server I am using the AI SDK v6 for the agentic loop. It allows both semantic search via Gemini's File Search and keyword search over files. I am finding that having both forms of search makes it more robust"  
[X Link](https://x.com/omarsar0/status/1988239178165879189)  2025-11-11T13:35Z 273.9K followers, [----] engagements


"@HashgraphOnline @karpathy It works great on large codebases. More efficient than regular keyword/grep search. For now I am using simple similarity metrics but this can be tuned further. I combine both semantic + keyword search. It's working great so far. More testing is required"  
[X Link](https://x.com/omarsar0/status/1988241100964851974)  2025-11-11T13:43Z 273.8K followers, [---] engagements


"@karpathy @OfficialLoganK The inspiration behind the MCP server is that I want to be able to use it everywhere -- Claude ChatGPT Claude Code etc"  
[X Link](https://x.com/omarsar0/status/1988242251894444146)  2025-11-11T13:47Z 276.2K followers, [----] engagements


"It's crazy to me how tedious it can be to build a RAG system today. All the setup required for it is insane. This File Search API simplifies everything. Now I can focus on the agent harness and context engineering that takes a bit of time to get right but is important for user experience"  
[X Link](https://x.com/omarsar0/status/1988244779029692491)  2025-11-11T13:57Z 276.2K followers, [----] engagements


"@Beareka @karpathy I will share more on this soon. The MCP stuff was just an easy way to showcase the File Search API at work but there are actually many new implementation details I have baked into the MCP component"  
[X Link](https://x.com/omarsar0/status/1988261355762139157)  2025-11-11T15:03Z 273.8K followers, [---] engagements


"This simple Claude Code hack has reduced token usage by 90%. It adopts the "Code Execution with MCP" concept published by Anthropic. Remove preloaded MCP tools from context and use Python to execute tools via bash instead. BTW this can be optimized much further. Insane"  
[X Link](https://x.com/omarsar0/status/1988269255604007275)  2025-11-11T15:35Z 277.5K followers, 135.2K engagements


"For this test I used a prebuilt py script but ultimately what I want is for the agent (in this case Claude Code) to generate whatever code it needs to execute the right sequence of tools. The best part is that we avoid context bloat. It keeps CC more focused on what matters"  
[X Link](https://x.com/omarsar0/status/1988270477005967417)  2025-11-11T15:39Z 275.4K followers, 11.6K engagements
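A minimal sketch of the pattern these two posts describe: instead of preloading every MCP tool schema into the model's context, expose tools to agent-written code and let only the final value flow back. The registry and script below are toy stand-ins; in Anthropic's write-up the wrappers are generated from real MCP servers.

```python
# Toy tool registry standing in for MCP servers (names are illustrative).
TOOLS = {
    "get_orders": lambda user: [{"id": 1, "total": 40}, {"id": 2, "total": 60}],
    "refund": lambda order_id: f"refunded order {order_id}",
}

def run_tool_script(source):
    """Execute agent-written code with tools in scope. Only the value
    bound to `result` returns to the model's context, so intermediate
    tool payloads never consume tokens."""
    scope = {"tools": TOOLS}
    exec(source, scope)
    return scope.get("result")

# The agent emits code that chains tools and keeps only the final answer.
script = """
orders = tools["get_orders"]("u123")
big = [o for o in orders if o["total"] > 50]
result = [tools["refund"](o["id"]) for o in big]
"""
print(run_tool_script(script))  # → ['refunded order 2']
```

The token savings come from two places: tool definitions never sit in the prompt, and filtering/aggregation happens in code rather than by round-tripping raw payloads through the model.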


**Top assets mentioned**
Alphabet Inc Class A (GOOGL), Microsoft Corp. (MSFT), GrokCoin (GROKCOIN), IBM (IBM)

### Top Social Posts

Top posts by engagements in the last [--] hours

"12) Galactica - A large language model for science (Ross et al) A large language model for the science domain trained on a massive scientific corpus. https://arxiv.org/abs/2211.09085 https://arxiv.org/abs/2211.09085"
X Link 2022-12-25T18:39Z 276.4K followers, 15.6K engagements

"The list is non-exhaustive. I tried to highlight trending papers for each month of the year based on trends. Feel free to share your favorite ML papers below. Happy holidays🎉 One last favor: follow me (@omarsar0) to keep track of more exciting ML papers in 2023"
X Link 2022-12-25T18:41Z 276.2K followers, 13.6K engagements

"Batch Prompting: Efficient Inference with LLM APIs Batch prompting helps to reduce the inference token and time costs while achieving better or comparable performance. Love to find these neat little tricks on efficiency gains during inference with LLMs. https://arxiv.org/abs/2301.08721 https://arxiv.org/abs/2301.08721"
X Link 2023-01-24T02:40Z 284.9K followers, 110.1K engagements
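The trick from the paper, sketched with a stubbed model call: N questions share one prompt and one round trip, and the reply is split back into per-question answers. `call_llm` is a deterministic placeholder to swap for a real API client.

```python
def batch_prompt(questions):
    """Pack several questions into one prompt so the shared instructions
    and request overhead are paid for once instead of per question."""
    numbered = "\n".join(f"Q{i + 1}: {q}" for i, q in enumerate(questions))
    prompt = ("Answer each question on its own line as 'A<n>: <answer>'.\n"
              + numbered)
    reply = call_llm(prompt)
    # Parse the numbered answers back into per-question results.
    answers = {}
    for line in reply.splitlines():
        if line.startswith("A") and ":" in line:
            tag, ans = line.split(":", 1)
            answers[int(tag[1:])] = ans.strip()
    return [answers.get(i + 1, "") for i in range(len(questions))]

def call_llm(prompt):
    # Stand-in for an LLM API call; returns a fixed batched reply.
    return "A1: 4\nA2: Paris"

print(batch_prompt(["What is 2+2?", "Capital of France?"]))  # → ['4', 'Paris']
```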

"DetectGPT - an approach for zero-shot machine-generated text detection. Unlike other methods that require classifiers or watermarking generated text this work uses raw log probabilities from the LLM to determine if the passage was sampled from it. https://arxiv.org/abs/2301.11305 https://arxiv.org/abs/2301.11305"
X Link 2023-01-27T02:43Z 283.2K followers, 64.5K engagements
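The core of DetectGPT fits in one function: compare the model's log-probability of the passage against the average over small perturbations of it. Machine-written text tends to sit near a local maximum of the model's likelihood, so the gap is large. Here `log_prob` and `perturb` are stand-ins for the scoring model and the mask-fill perturbation model.

```python
import random

def detect_gpt_score(text, log_prob, perturb, n=10, seed=0):
    """DetectGPT-style curvature score: log p(text) minus the mean
    log-probability of n small perturbations. A large positive score
    suggests the text was sampled from the scoring model."""
    rng = random.Random(seed)
    avg = sum(log_prob(perturb(text, rng)) for _ in range(n)) / n
    return log_prob(text) - avg
```

In the paper, `perturb` is a T5-style mask-and-fill model and `log_prob` comes from the LLM under test; the toy functions in any demo only illustrate the sign of the score.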

"Where I track ML trends: Papers with Code - for trending papers + code Twitter - for discussion of trends GitHub Trending - for trending ML projects"
X Link 2023-02-25T20:17Z 276.4K followers, 41.2K engagements

"LLMs are getting cheaper better and more accessible. Introducing Stanford Alpaca a new 7B fine-tuned model based on Meta's LLaMA. That's the progress we need to see. https://github.com/tatsu-lab/stanford_alpaca https://github.com/tatsu-lab/stanford_alpaca"
X Link 2023-03-13T20:34Z 283.5K followers, 103.4K engagements

"Not quite there yet with the zero-shot chain-of-thought prompting. Expected given the base model but I think this analysis could be an interesting one to look at further as this is something more emergent in larger models. GPT-4 is really good at these types of questions now"
X Link 2023-04-04T01:04Z 283K followers, 12.6K engagements

"This is interesting RLHF models like GPT-4 can be aligned to output responses like this. But I am a bit surprised Vincuna can output these responses. Not sure how robust it is to these types of adversarial prompting. Needs further testing"
X Link 2023-04-04T01:21Z 282.8K followers, 25.3K engagements

"The paper also includes some impressive results Check out how Gorilla compares with other notable LLMs like ChatGPT GPT-4 LLaMA and Claude in terms of accuracy as well as reducing hallucination errors. Comparison is performed using zero-shot BM25 retriever GPT-retriever and so on"
X Link 2023-05-25T01:19Z 272.3K followers, [----] engagements

"ML Papers of the Week (2.7K) One of the best ways to keep track of AI is to read papers. If you are looking for good papers to read we've been collecting the top trending papers over the last [--] months. Check it out: https://github.com/dair-ai/ML-Papers-of-the-Week https://github.com/dair-ai/ML-Papers-of-the-Week"
X Link 2023-05-30T13:35Z 276.4K followers, 143K engagements

"Meta AI is open-sourcing AudioCraft a multi-purpose framework for generating music and sounds and enabling compression capabilities. AudioCraft contains training and inference code for a series of models including MusicGen AudioGen and EnCodec. This is going to allow others in the community to extend these models to all sorts of use cases or research problems. This release is exciting as it simplifies building on top of the state-of-the-art in audio generation. People can now build things like sound generators and compression algorithms with the same code base. blog: library:"
X Link 2023-08-02T16:31Z 285.4K followers, 63.9K engagements

"🎓ML Papers of The Week (August Edition) ICYMI we highlight some of the top trending ML papers every week. This is now used by 1000s of researchers and practitioners to follow and discover trending papers and AI topics. The August collection is now finished We also add quick summaries of the papers and work with our community to write explainers for outstanding papers. We use a combination of AI-powered tools analytics and human curation to build the lists of papers. Check it out here: https://github.com/dair-ai/ML-Papers-of-the-Week https://github.com/dair-ai/ML-Papers-of-the-Week"
X Link 2023-08-28T14:09Z 276.2K followers, 57.5K engagements

"@abacaj GPT-4 response with a bit of clever prompting is interesting"
X Link 2023-12-20T00:13Z 271.3K followers, 13.2K engagements

"ML Papers of the Week (5.3K) If you're looking for interesting and fun ML and LLM papers to read I got you covered. I've been curating all the top trending and most interesting papers since the beginning of the year. You will find a lot of gems there. https://github.com/dair-ai/ML-Papers-of-the-Week https://github.com/dair-ai/ML-Papers-of-the-Week"
X Link 2023-12-27T16:51Z 276.5K followers, 82.6K engagements

"ML Papers of the Week (6K) If you are looking for good recent LLM papers to read you don't need to look too far. We have been tracking the most popular impressive and trending ones in the "ML Papers of the Week" repo. Our paper collection is used by thousands of students researchers and developers. We use this list to keep up to date on all the latest developments in AI and LLMs @dair_ai. We have also implemented and combined ideas from a few of these papers to power some of the LLM services we provide for companies. My recommendation for finding good papers is to narrow your search. While"
X Link 2024-01-20T15:53Z 276.9K followers, 106.9K engagements

"Nice to see these library additions to ollama. I already heavily use experiment and build with LLMs locally. ollama is one of my favorite tools for this. These new Python and JavaScript libraries for ollama will make it even easier to do so. With this you can do things like streaming multimodal inference text completion creating custom models and even setting up your own custom client. Really impressed by this effort. Looks really straightforward to use: pip install ollama import ollama response = ollama. chat(model='llama2' messages= 'role': 'user' 'content': 'Why is the sky blue' )"
X Link 2024-01-25T18:10Z 271.6K followers, 52.5K engagements

"LoRA+: Efficient Low Rank Adaptation of Large Models 100s of LLM papers dropped on arXiv yesterday. This one caught my attention. It proposes LoRA+ which improves performance and finetuning speed (up to 2X speed up) at the same computational cost as LoRA. Lots of theory in this paper but the key difference between LoRA and LoRA+ is how the learning rate is set. LoRA+ sets different learning rates for LoRA adapter matrices while in LoRA the learning rate is the same"
X Link 2024-02-20T22:05Z 276.9K followers, 43K engagements
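The learning-rate split the post describes can be sketched in a few lines of plain Python. This is a toy illustration, not the paper's code: in LoRA the update is W + B @ A with one shared learning rate, and LoRA+ simply gives B a larger rate than A. The 16x ratio and the 2x2 shapes here are assumptions for the example.

```python
# Toy illustration of the LoRA+ learning-rate split (not the paper's code).
# The 16x ratio and 2x2 matrix shapes are assumptions for this example.

def sgd_step(param, grad, lr):
    """Plain SGD update on a matrix stored as a list of lists."""
    return [[p - lr * g for p, g in zip(prow, grow)]
            for prow, grow in zip(param, grad)]

lr_A = 1e-2
lr_B = 16 * lr_A   # the only change vs. vanilla LoRA: B gets a larger rate

A = [[0.0, 0.0], [0.0, 0.0]]        # adapter "down" matrix (toy 2x2)
B = [[0.0, 0.0], [0.0, 0.0]]        # adapter "up" matrix (toy 2x2)
grad_A = [[1.0, 1.0], [1.0, 1.0]]   # pretend gradients from one batch
grad_B = [[1.0, 1.0], [1.0, 1.0]]

A = sgd_step(A, grad_A, lr_A)       # small step for A
B = sgd_step(B, grad_B, lr_B)       # 16x larger step for B
```

In a real framework this is just two optimizer parameter groups with different learning rates, one for the A matrices and one for the B matrices.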

"It's here! Introducing Llama [--] by Meta. 8B and 70B pretrained and instruction-tuned models are available. Details in the thread: https://llama.meta.com/llama3/"
X Link 2024-04-18T16:00Z 285.4K followers, 40.3K engagements

"Snowflake casually releases Arctic, an open-source LLM (Apache [---] license) that uses a unique Dense-MoE Hybrid transformer architecture. Arctic performs on par with Llama3 70B in enterprise metrics like coding (HumanEval+ & MBPP+), SQL (Spider), and instruction following (IFEval). The remarkable part is that it claims to use 17x less compute budget than Llama [--] 70B. The training compute is roughly under $2 million (less than 3K GPU weeks). We are witnessing the democratization of LLMs at an unprecedented rate. .@SnowflakeDB is thrilled to announce #SnowflakeArctic: A state-of-the-art large"
X Link 2024-04-24T16:47Z 234.5K followers, 59.9K engagements

"Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? This is one of the more interesting LLM papers I read last week. It reports that LLMs struggle to acquire factual knowledge through fine-tuning. When examples with new knowledge are eventually learned, they linearly increase the LLM's tendency to hallucinate. I mostly use fine-tuning to refine generations for my use cases and in some special situations and rarely for memorizing information. More thoughts on my latest LLM recap: https://youtu.be/p7xQRIHWG_M?si=Hi8xY0ROGFRCEzPI&t=1496"
X Link 2024-05-22T14:46Z 276.5K followers, 43.2K engagements

"Introducing Genie, the most capable AI software engineering system. It achieves state-of-the-art on SWE-Bench with 30.08%. That's a 57% improvement. After reviewing the short technical report, here are my three key takeaways: 1) Reasoning datasets that codify human/SE reasoning processes. 2) Agentic systems with native abilities to retrieve, plan, write, and execute. 3) Self-improvement to continually improve the model to fix mistakes when they arise. More of my thoughts here: The importance of the reasoning dataset cannot go unnoticed. This reminds me of this great post by @karpathy: I must admit"
X Link 2024-08-12T22:06Z 284.6K followers, 29.1K engagements

"LLM News: AI agents continue to advance by combining ideas like self-play, self-improvement, self-evaluation, and search. Other developments: building efficient RAG systems, LMSYS rankings, automating paper writing and reviewing, Claude's prompt caching, and distilling & pruning Llama [---] models. Here's all the latest in LLMs: https://youtu.be/x_GIh9NpVds"
X Link 2024-08-19T19:39Z 270.4K followers, 26.7K engagements

"IBM devs release Bee Agent Framework, an open-source framework to build, deploy, and serve agentic workflows at scale. Features include: - Bee agents refined for Llama [---] - sandboxed code execution - flexible memory management for optimizing token usage - handling complex agentic workflow controls and easily pausing and resuming agent states - provides traceability through MLFlow integration and event logging along with production-grade features such as caching and error handling - API to integrate agents using an OpenAI-compatible Assistants API and Python SDK - serve agents using a Chat UI"
X Link 2024-10-25T20:23Z 269.7K followers, 110.6K engagements

"A Taxonomy of AgentOps for Enabling Observability of Foundation Model based Agents New research analyzes AgentOps platforms and tools highlighting the need for comprehensive observability and traceability features to ensure reliability in foundation model-based autonomous agent systems across their development and production lifecycle"
X Link 2024-11-15T12:30Z 285.1K followers, 27.3K engagements

"paper: https://arxiv.org/abs/2411.04905"
X Link 2024-11-15T20:05Z 277.7K followers, [----] engagements

"🦆 Docling reaches 12.3K There are now a few parsers for LLMs out there but this is one of the most popular. Supports PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown. Other features include advanced PDF understanding, integrations, OCR for scanned PDFs, and even a CLI. The big feature request I am seeing is support for other types of information like code and math equations. That's coming soon https://twitter.com/i/web/status/1864335669230727465"
X Link 2024-12-04T15:47Z 284K followers, 17.6K engagements

"Learning how to think with Meta Chain-of-Thought Proposes Meta Chain-of-Thought (Meta-CoT) which extends traditional Chain-of-Thought (CoT) by modeling the underlying reasoning required to arrive at a particular CoT. The argument is that CoT is naive and Meta-CoT gets closer to the cognitive process required for advanced problem-solving. This is a very detailed paper (100 pages) presenting ideas and methods to achieve system [--] reasoning in LLMs. Lots of interesting discussion around scaling laws verifier roles iterative refinement and the search for novel reasoning algorithms"
X Link 2025-01-10T14:48Z 233.2K followers, 44.1K engagements

"paper: https://arxiv.org/abs/2501.05366 code: https://github.com/sunnynexus/Search-o1"
X Link 2025-01-10T15:41Z 233.2K followers, [----] engagements

"Windsurf makes coding insanely fun and fast It's quickly becoming my favorite coding tool. And the new features are 🔥 - Web search - Autogenerated Memories - Code Execution Improvements Here is what's new:"
X Link 2025-01-18T16:00Z 270.2K followers, 76.5K engagements

"Agentic RAG Overview This is a great intro to LLM agents and Agentic RAG. It provides a comprehensive exploration of Agentic RAG architectures applications and implementation strategies"
X Link 2025-01-20T15:19Z 229.6K followers, 59.8K engagements

"Building Effective Agents Cookbook Nice repo from the Anthropic team with code examples of how to build common agent workflows. In my experience building agent workflows I highly recommend learning these and other concepts like function calling structured outputs evaluating outputs ReAct LLM-as-a-judge retrieval CoT/few-shot prompting and structuring inputs among others"
X Link 2025-01-22T15:39Z 229.8K followers, 56.4K engagements

"AI Agents for Computer Use This report provides a comprehensive overview of the emerging field of instruction-based computer control examining available agents their taxonomy development and resources"
X Link 2025-02-01T15:25Z 229.6K followers, 65.6K engagements

"Oumi is a fully open-source platform to help you build state-of-the-art foundation models end-to-end"
X Link 2025-02-03T14:32Z 285.4K followers, 59.6K engagements

"s1: Simple test-time scaling Test-time scaling is an interesting problem as previous results have shown its potential to scale model performance with more compute. This new paper proposes a simple approach to achieve test-time scaling and strong reasoning performance"
X Link 2025-02-03T14:56Z 230.3K followers, 73.4K engagements

"I guess now you can just do research More seriously using Deep Research as a researcher is such high leverage. Watch how I use it to generate a very detailed comparison between AI agentic frameworks. To get the best results you have to be very specific about what you want"
X Link 2025-02-04T15:39Z 229.6K followers, [----] engagements

"How to Scale Your Model Google DeepMind just released an awesome book on scaling language models on TPUs. This is gold. Worth checking out if you are an LLM developer"
X Link 2025-02-04T20:02Z 229.7K followers, 23.6K engagements

"DeepSeek-R1: Technical Deep Dive 36-page report courtesy of Deep Research. Models like o3 still make mistakes so keep that in mind. (prompt + report in the ALT)"
X Link 2025-02-05T19:22Z 229.3K followers, 75.1K engagements

"🎓OpenAI Deep Research Guide Just finished our live webinar on Deep Research including examples prompting tips use cases and what's missing. I am releasing the full guide I shared with our members (link in the comments)"
X Link 2025-02-06T18:12Z 230.6K followers, 34.1K engagements

"Claude [--] when? Llama [--] when? Claude is still my most used AI. But with all the latest models and products, usage and workflows are changing"
X Link 2025-02-06T20:52Z 229.6K followers, [----] engagements

"Teaching AI Agents to Work Smarter Not Harder This new paper introduces MaAS (Multi-agent Architecture Search) a new framework that optimizes multi-agent systems. Instead of searching for a single optimal multi-agent system MaAS develops an "agentic supernet" - a probabilistic distribution of agent architectures that can adapt to different queries based on their difficulty and domain. MaAS can dynamically sample different multi-agent architectures tailored to each specific query. For simple arithmetic questions it might use a lightweight system while for complex coding tasks it can deploy a"
X Link 2025-02-07T15:20Z 230.9K followers, 40.4K engagements
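The per-query sampling idea above can be sketched in a few lines. This is my own toy illustration of the pattern, not the paper's code: the architecture names and the 0.9/0.1 probabilities are assumptions standing in for the learned "agentic supernet" distribution.

```python
# Toy sketch of the MaAS "agentic supernet" idea (my own illustration;
# architecture names and probabilities are assumptions, not the paper's).
import random

ARCHITECTURES = {
    "light": ["solver"],                      # cheap: a single agent
    "heavy": ["planner", "coder", "critic"],  # expensive: a full pipeline
}

def sample_architecture(difficulty, rng):
    # Stand-in for the learned distribution over architectures,
    # conditioned on the query: hard queries favor the heavy pipeline.
    p_heavy = 0.9 if difficulty == "hard" else 0.1
    return ARCHITECTURES["heavy" if rng.random() < p_heavy else "light"]

rng = random.Random(0)                 # seeded for reproducibility
arch = sample_architecture("hard", rng)
```

The point of the supernet framing is that the sampler, not a fixed pipeline, decides how much multi-agent machinery each query deserves.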

"ACU (Awesome Agents for Computer Use) This is a nice curated knowledge base of AI agents that can operate computers. It compiles research papers, projects, frameworks, and tools for agents that autonomously perform tasks via clicks, keystrokes, command-line calls, and API calls on computer and mobile devices. https://twitter.com/i/web/status/1887911960551043166"
X Link 2025-02-07T17:11Z 283.5K followers, 28.8K engagements

"LLM Functions allows you to build LLM tools and agents using Bash Javascript and Python"
X Link 2025-02-08T15:27Z 230.5K followers, 32.2K engagements

"Training LLMs to Reason Efficiently This new paper introduces an RL approach to train LLMs to allocate inference-time computation dynamically. This process helps to optimize the reasoning efficiency of the LLMs. Key insights include: Optimizing inference cost without sacrificing accuracy The method incentivizes models to use minimal computational resources while preserving accuracy reducing the excessive inference cost associated with long chain-of-thought reasoning. RL for efficiency Instead of enforcing fixed-length reasoning they apply RL policy gradient methods to balance computational"
X Link 2025-02-11T15:01Z 231K followers, 30.4K engagements

"Hierarchical LLM Reasoning ReasonFlux is a hierarchical reasoning framework for LLMs that optimizes complex problem-solving using scaling thought templates. It outperforms state-of-the-art models in mathematical reasoning. Key contributions include: Structured Thought Template Library A curated set of 500+ high-level templates designed to generalize across complex problems. This enables efficient retrieval and structured reasoning without exhaustive search. Hierarchical Reinforcement Learning Instead of training on long Chain-of-Thought (CoT) sequences ReasonFlux optimizes trajectories of"
X Link 2025-02-11T16:00Z 231.1K followers, 25.4K engagements

"Large Memory Models for Long-Context Reasoning This paper focuses on improving long-context reasoning with explicit memory mechanisms. It presents LM2 a Transformer-based architecture equipped with a dedicated memory module to enhance long-context reasoning multi-hop inference and numerical reasoning. LM2 outperforms Recurrent Memory Transformer (RMT) by 37.1% and a non-memory baseline (Llama-3.2) by 86.3% on memory-intensive benchmarks. Key components: Memory-Augmented Transformer LM2 integrates a memory module that acts as an explicit long-term storage system interacting with input tokens"
X Link 2025-02-12T14:21Z 231.3K followers, 33.9K engagements

""We detect global cache sharing across users in seven API providers including OpenAI resulting in potential privacy leakage about users' prompts." Concerning if true"
X Link 2025-02-12T14:37Z 231K followers, 20.9K engagements

"Excited to launch our new course on prompt engineering for devs. Every developer and company I work with often struggles to prompt LLMs properly. We've built this course to provide the best and most up-to-date info and best practices on designing and optimizing prompts. This should help devs learn how to effectively apply prompting techniques to their LLM applications and agentic workflows. We are also going to include several new guides on how to effectively prompt models like DeepSeek-R1, open-source models, and newer reasoning models. Our Prompt Engineering Guide is used by over 6M+ people to learn"
X Link 2025-02-13T13:45Z 230.5K followers, [----] engagements

"The Hundred-Page Language Models Book Finally got a chance to look at @burkov's book on language models. This book provides an accessible technical exploration of language models practical examples and the core maths behind them. Highly recommend it"
X Link 2025-02-14T18:35Z 231.5K followers, 56.7K engagements

"Perplexity just announced Deep Research (PDR) I'm now testing and comparing it with OpenAI's Deep Research (ODR). I still think the o3 variant powering ODR is a massive advantage. 20.5% (PDR) vs. 26.6% (ODR) on Humanity's Last Exam"
X Link 2025-02-14T22:15Z 231.4K followers, [----] engagements

"1 Leader After evaluating [--] leading LLMs across [--] diverse datasets here are the key findings: Google's -.- leads with a [----] score at a remarkably low cost"
X Link 2025-02-15T15:44Z 230.5K followers, [----] engagements

"2 Pricing The top [--] models span a 10x price difference with only 4% performance gap. Many of you might be overpaying"
X Link 2025-02-15T15:44Z 230.5K followers, [----] engagements

"3 Open-source Mistral AI's mistral-small-2501 leads open-source options matching GPT-4o-mini at [----]. Smaller models tuned for tool calling have a lot of potential"
X Link 2025-02-15T15:44Z 230.5K followers, [----] engagements

"4 Reasoning models While reasoning models like o1 and o3-mini demonstrated excellent integration with function calling capabilities DeepSeek-R1 didn't make the rankings as it doesn't support native function calling (yet)"
X Link 2025-02-15T15:44Z 230.5K followers, [----] engagements

"@adonis_singh well they are unifying things so whats the point of that now o3 mini is really good btw"
X Link 2025-02-17T19:01Z 230.6K followers, [---] engagements

"grok [--] will be fire if it comes with its deep research too"
X Link 2025-02-18T03:16Z 230.9K followers, 17.3K engagements

"Grok [--] coding example: Thinking traces are generated as the model tries to solve the problem. Elon confirmed that the thinking steps have been obscured to avoid getting copied"
X Link 2025-02-18T04:37Z 231.3K followers, 73.6K engagements

"Grok [--] also excels at creative coding like generating creative and novel games. Elon emphasized Grok 3's creative emergent capabilities. You can also use the Big Brain mode to use more compute and reasoning with Grok 3"
X Link 2025-02-18T04:40Z 231.6K followers, 69.5K engagements

"Grok [--] Reasoning performance: The results correspond to the beta version of Grok-3 Reasoning. It outperforms o1 and DeepSeek-R1 when given more test-time compute (allowing it to think longer). The Grok [--] mini reasoning model is also very capable"
X Link 2025-02-18T04:44Z 231.5K followers, 63.9K engagements

"Grok [--] Reasoning Beta performance on AIME [----]. Grok [--] shows generalization capabilities. It not only does coding and math problem-solving but it can also do other creative and useful real-world tasks"
X Link 2025-02-18T04:47Z 231.1K followers, 58.7K engagements

"DeepSearch also exposes the steps that it takes to conduct the search itself"
X Link 2025-02-18T05:01Z 231.1K followers, 46.4K engagements

"Grok [--] on X Premium+"
X Link 2025-02-18T05:04Z 231.1K followers, 45.8K engagements

"SuperGrok dedicated app is also available with a polished experience. Try it on the web as well: The web will include the latest Grok features. http://grok.com"
X Link 2025-02-18T05:04Z 231.4K followers, 46.3K engagements

"Improvements will happen rapidly and almost daily according to the team. There is also a Grok-powered voice app coming too -- about a week away"
X Link 2025-02-18T05:06Z 231.2K followers, 42.6K engagements

"@xai Thread with all the details that were announced: https://x.com/omarsar0/status/1891705029083512934 BREAKING: xAI announces Grok [--] Here is everything you need to know: https://t.co/EZKnjq57qh"
X Link 2025-02-18T05:06Z 230.8K followers, [----] engagements

"More results: AI co-scientist outperforms other SoTA agentic and reasoning models for complex problems generated by domain experts. Just look at how performance increases with more time spent on reasoning surpassing unassisted human experts"
X Link 2025-02-19T14:43Z 231K followers, [----] engagements

"How about novelty Experts assessed the AI co-scientist to have a higher potential for novelty and impact. It was even preferred over other models like OpenAI o1"
X Link 2025-02-19T14:43Z 231K followers, [----] engagements

"Introduction to CUDA Programming for Python Developers (link in comments)"
X Link 2025-02-21T14:06Z 277.4K followers, 103.6K engagements

"@alan__xiao i use it mostly for getting either gpt-4o or claude (both have this feature) to produce text that's closer to the style i prefer. i find that it's good for instance to get it to produce cleaner summaries of technical docs for personal consumption"
X Link 2025-02-21T21:55Z 231.2K followers, [--] engagements

"@kristallpirat yeah thats not ideal. I do get issues like that when there is too much stuff in the rules. Organizing and structuring with like xml tags has helped improve my windsurf setup"
X Link 2025-02-22T23:26Z 231.3K followers, [--] engagements

"🤯 We also get the GitHub integration They heard us loud LFG"
X Link 2025-02-24T21:58Z 232K followers, 19.9K engagements

"this is next-level vibe coding 🤯 watch me use windsurf's new preview feature to make quick ui improvements all within my coding environment this feature is insane agentic capabilities + long-context awareness in full display"
X Link 2025-03-05T20:54Z 271.9K followers, 68K engagements

"Claude [---] Sonnet has serious competition Gemini [---] Pro is a legit good model for code. - code quality is really good - 1M token context - native multimodality - long code generation - understand large codebases I used it with Windsurf to generate an AI search agent app:"
X Link 2025-03-30T17:55Z 277.2K followers, 212.9K engagements

"Hard to deny how much better Gemini [---] Pro is at long context understanding compared to other models. It's strong in many areas but Google has been working on long context LLMs since the early days. Strong performance on benchmarks and our own internal tests. They lead"
X Link 2025-04-15T13:52Z 284.7K followers, 64.3K engagements

"Agent Zero A personal agentic framework that dynamically grows and learns with you. - It uses the OS as a tool. - Has search and terminal execution too. - It has persistent memory to memorize key information to solve future tasks more reliably. - Multi-agent support"
X Link 2025-06-01T16:52Z 271.3K followers, 81.5K engagements

"MemAgent MemAgent-14B is trained on 32K-length documents with an 8K context window. Achieves 76% accuracy even at 3.5M tokens. That consistency is crazy. Here are my notes:"
X Link 2025-07-08T19:29Z 270.8K followers, 102K engagements

"Overview Introduces an RL-driven memory agent that enables transformer-based LLMs to handle documents up to [---] million tokens with near-lossless performance, linear complexity, and no architectural modifications"
X Link 2025-07-08T19:29Z 270.7K followers, [----] engagements
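The fixed-window pattern behind this (a bounded memory updated chunk by chunk, so cost is linear in document length) can be sketched in plain Python. This is my own illustration, not MemAgent's code; the "key"-filtering update rule stands in for the RL-trained memory policy.

```python
# Toy sketch of the fixed-window memory-agent pattern (my own illustration,
# not MemAgent's code).

MEMORY_LIMIT = 5   # stand-in for the token budget of the memory

def update_memory(memory, chunk_words):
    # Stand-in for the RL-trained policy: keep only words that look relevant.
    memory = memory + [w for w in chunk_words if "key" in w]
    return memory[-MEMORY_LIMIT:]      # drop oldest entries past the budget

def read_document(words, chunk_size=8):
    memory = []
    for i in range(0, len(words), chunk_size):   # linear in document length
        memory = update_memory(memory, words[i:i + chunk_size])
    return memory

doc = ["filler"] * 10 + ["key1"] + ["filler"] * 10 + ["key2"]
memory = read_document(doc)
```

Because the memory is overwritten rather than grown, a document of any length fits: only the update policy has to learn what is worth keeping.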

"any good open-source agentic browsers it's important that this exists"
X Link 2025-07-09T16:03Z 285.1K followers, 20.9K engagements

"comment below on what you are building with AI agents great chance to bring some attention to your projects"
X Link 2025-07-11T16:41Z 269.7K followers, 10.6K engagements

"This handbook is so good It covers everything you need to know about LLM inference. FREE to access:"
X Link 2025-07-11T17:42Z 274.6K followers, 87.2K engagements

"Each scenario includes up to eight interdependent goals user personas ambiguous or missing tools and dynamic user intent mimicking real-world complexity"
X Link 2025-07-17T21:19Z 280.8K followers, [----] engagements

"Results: GPT-4.1 leads overall with 62% AC while Gemini-2.5-flash tops TSQ at 94% but lags on AC (38%). GPT-4.1-mini offers strong cost-efficiency ($0.014/session vs. GPT-4.1's $0.068). Open-source Kimi K2 is the top performer among non-closed models (53% AC, 90% TSQ). Reasoning-enabled models generally underperform non-reasoning counterparts in action completion. Evaluation relies on a simulation loop involving synthetic users and tools, measuring whether agents can meet real user demands while selecting and executing the right toolchains"
X Link 2025-07-17T21:19Z 280.8K followers, [----] engagements

"GLM-4.5 looks like a big deal MoE Architecture Hybrid reasoning models 355B total (32B active) GQA + partial RoPE Multi-Token Prediction Muon Optimizer + QK-Norm 22T-token training corpus Slime RL Infrastructure Native tool use Here's all you need to know:"
X Link 2025-07-28T17:29Z 285.7K followers, 60.3K engagements

"Model Architecture & Pre-Training GLM-4.5 is 355B total parameters (32B active); deeper model with narrower width; optimized for reasoning via more layers and [--] attention heads. GLM-4.5-Air is 106B (12B active). 22T-token training corpus that combines 15T general data with 7T code/reasoning-focused data. Grouped-Query Attention + partial RoPE to enhance long-context efficiency and accuracy in reasoning tasks"
X Link 2025-07-28T17:29Z 269.6K followers, [----] engagements

"You simply cannot ignore evals. AI agents are now selling insurance, handling legal docs, and assisting with finance. But without real-time evaluation they're one hallucination away from disaster. Meet Luna-2, the small model built to guardrail agents before they screw up"
X Link 2025-08-11T16:27Z 274.2K followers, 17.4K engagements

"The GLM-4.5 technical report is out Sharing some key details in case you missed it:"
X Link 2025-08-11T20:45Z 278.8K followers, 32.6K engagements

"Overview GLM-4.5 is an open MoE LLM (355B total / 32B activated) with a smaller 106B Air variant (12B activated). It trains on 23T tokens supports hybrid reasoning (thinking + direct modes) and posts strong ARC results: TAU-Bench [----] AIME24 [----] SWE-bench Verified 64.2"
X Link 2025-08-11T20:45Z 278.8K followers, [----] engagements

"Go define persona & capabilities, select tools, map sub-agents, test scenarios, deploy, scale. Their architecture uses design patterns that demonstrate 3x better task focus than generic models. It gives you a competitive advantage that is unique to you. http://emergent.sh"
X Link 2025-08-13T13:12Z 275.1K followers, [---] engagements

"Their prompt-to-production pipeline bypasses traditional mobile development entirely. Natural language input gets parsed through their semantic layer and compiled into native iOS/Android apps. Here is a personalized news app with minimal clean UI sourcing global news in a categorised manner"
X Link 2025-08-13T13:13Z 275.2K followers, [---] engagements

"Don't sleep on small models Anemoi is the latest multi-agent system that proves small models pack a punch when combined effectively. GPT-4.1-mini (for planning) and GPT-4o (for worker agents) surpass the strongest open-source baseline on GAIA. A must-read for devs:"
X Link 2025-08-27T20:19Z 282.4K followers, 94K engagements

"I'm surprised Agentic RAG is not getting more attention. That's all about to change. Here's why:"
X Link 2025-09-08T18:11Z 269.9K followers, 88.7K engagements

"This is one of the most promising directions to improve RAG systems. It involves combining dynamic retrieval with structured knowledge. It helps to mitigate hallucinations and outdated information and improves knowledge quality. Pay attention to this one AI devs"
X Link 2025-09-16T14:49Z 270.3K followers, 66K engagements

"If you are looking to get started with Codex you will find this little OpenAI guide useful. (bookmark it)"
X Link 2025-09-17T22:13Z 274.1K followers, 171.4K engagements

"MCP is extremely underrated It's crazy to me that most devs and teams are not taking advantage of MCP. I have been using MCP to make my prompts memory and context portable. I believe this is the ultimate application of MCP. Context is king; you really want complete control of it. MCP is your advantage. You are not tied to one tool. Your context goes where you go. Whether it's ChatGPT Claude Code Windsurf or Codex. I now worry less about switching model providers. It's really liberating. It has built my confidence in the use of tools knowing that I can switch off as I please. Plus LLMs"
X Link 2025-09-21T18:03Z 270.5K followers, 95.9K engagements

"Very cool work from Meta Superintelligence Lab. They are open-sourcing Meta Agents Research Environments (ARE) the platform they use to create and scale agent environments. Great resource to stress-test agents in environments closer to real apps. Read on for more:"
X Link 2025-09-22T15:27Z 271.4K followers, 152K engagements

"It doesn't matter what tools you use for AI Agents. I've put together the ultimate curriculum to learn how to build AI agents. (bookmark it) From context engineering to evaluating optimizing and shipping agentic applications"
X Link 2025-09-30T15:51Z 273.5K followers, 93.5K engagements

"How do you apply effective context engineering for AI agents Read this if you are an AI dev building AI agents today. Context is king And it must be engineered not just prompted. I wrote a few notes after reading through the awesome new context engineering guide from Anthropic: Context Engineering vs. Prompt Engineering - Prompt Engineering = writing and organizing instructions - Context Engineering = curating and maintaining prompts tools history and external data - Context Engineering is iterative and context is curated regularly Why Context Engineering Matters - Finite attention budget -"
X Link 2025-10-02T20:32Z 277.4K followers, 50.5K engagements

"Cool research paper from Google. This is what clever context engineering looks like. It proposes Tool-Use-Mixture (TUMIX), leveraging diverse tool-use strategies to improve reasoning. This work shows how to get better reasoning from LLMs by running a bunch of diverse agents (text-only, code, search, etc.) in parallel and letting them share notes across a few rounds. Instead of brute-forcing more samples it mixes strategies, stops when confident, and ends up both more accurate and cheaper. Mix different agents not just more of one: They ran [--] different agent styles (CoT, code execution, web search"
X Link 2025-10-03T13:39Z 269.9K followers, 82.4K engagements
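The loop described in the post (diverse agents answer in parallel, see each other's notes next round, stop early on confidence) can be sketched with stand-in agents. These functions are entirely hypothetical, and the unanimous-consensus stop rule is a simplification of the paper's confidence-based stopping.

```python
# Toy sketch of the TUMIX loop (hypothetical agents; consensus stands in
# for the paper's confidence-based early stopping).

def cot_agent(question, notes):
    return 42                     # stand-in for a chain-of-thought answer

def code_agent(question, notes):
    return 42                     # stand-in for an answer computed by code

def search_agent(question, notes):
    # Stand-in for a search-based agent that reads the shared notes.
    return max(set(notes), key=notes.count) if notes else 41

def tumix(question, agents, max_rounds=3):
    notes = []
    for _ in range(max_rounds):
        answers = [agent(question, notes) for agent in agents]
        if len(set(answers)) == 1:   # stop when confident: full agreement
            return answers[0]
        notes = answers              # share notes across rounds
    return max(set(notes), key=notes.count)  # fall back to majority vote

result = tumix("toy question", [cot_agent, code_agent, search_agent])
```

The early stop is where the cost savings come from: easy queries converge in one round, so only the disagreements pay for extra rounds.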

"Very excited about OpenAIs new AgentKit. Visual agent builders are a game changer for iterating on and shipping agents"
X Link 2025-10-06T18:22Z 285.3K followers, 47.2K engagements

"2025 is the year of AI agents. But they need a lot more work. More work is needed on architecture design optimization context engineering environments observability reliability evaluations scaling and more"
X Link 2025-10-08T13:28Z 270.1K followers, 13.4K engagements

"Agentic Context Engineering Great paper on agentic context engineering. The recipe: Treat your system prompts and agent memory as a living playbook. Log trajectories, reflect to extract actionable bullets (strategies, tool schemas, failure modes), then merge as append-only deltas with periodic semantic de-dupe. Use execution signals and unit tests as supervision. Start offline to warm up a seed playbook then continue online to self-improve. On AppWorld ACE consistently beats strong baselines in both offline and online adaptation. Example: ReAct+ACE (offline) lifts average score to 59.4% vs"
X Link 2025-10-10T20:29Z 276.9K followers, 84.6K engagements
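The append-only merge step of the recipe can be sketched in a few lines. This is my own illustration, not the paper's code: simple string normalization stands in for the semantic de-dupe.

```python
# Toy sketch of the append-only playbook merge (my own illustration;
# string normalization stands in for semantic de-dupe).

def normalize(bullet):
    return " ".join(bullet.lower().split())

def merge_delta(playbook, delta):
    """Append new bullets only; existing bullets are never rewritten."""
    seen = {normalize(b) for b in playbook}
    for bullet in delta:
        if normalize(bullet) not in seen:
            playbook.append(bullet)
            seen.add(normalize(bullet))
    return playbook

playbook = ["Retry failed tool calls once before giving up"]
# Reflection over a logged trajectory yields candidate bullets:
delta = ["Always validate JSON tool arguments",
         "retry failed tool calls once  before giving up"]  # near-duplicate
playbook = merge_delta(playbook, delta)
```

Appending deltas instead of rewriting the whole prompt is what makes the playbook safe to update continuously: a bad reflection can add noise but cannot erase what already works.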

"Great recap of security risks associated with LLM-based agents. The literature keeps growing but these are key papers worth reading. Analysis of 150+ papers finds that there is a shift from monolithic to planner-executor and multi-agent architectures. Multi-agent security is a widely underexplored space for devs. Issues range from LLM-to-LLM prompt infection spoofing trust delegation and collusion"
X Link 2025-10-11T18:07Z 270.6K followers, 35.1K engagements

"Memory is key to effective AI agents but it's hard to get right. Google presents memory-aware test-time scaling for improving self-evolving agents. It outperforms other memory mechanisms by leveraging structured and adaptable memory. Technical highlights:"
X Link 2025-10-12T16:01Z 269.9K followers, 27.4K engagements

"Is your LLM-based multi-agent system actually coordinating? That's the question behind this paper. They use information theory to tell the difference between a pile of chatbots and a true collective intelligence. They introduce a clean measurement loop. First, test if the group's overall output predicts future outcomes better than any single agent. If yes, there is synergy: information that only exists at the collective level. Next, decompose that information using partial information decomposition. This splits what's shared, unique, or synergistic between agents. Real emergence shows up as synergy not"
X Link 2025-10-13T17:13Z 269.9K followers, 24.3K engagements

"Why does RL work for enhancing agentic reasoning This paper studies what actually works when using RL to improve tool-using LLM agents across three axes: data algorithm and reasoning mode. Instead of chasing bigger models or fancy algorithms the authors find that real diverse data and a few smart RL tweaks make the biggest difference -- even for small models. My [--] key takeaways from the paper:"
X Link 2025-10-14T14:55Z 270.3K followers, 32.5K engagements

"4. Don't kill the entropy. Too little exploration and the model stops learning; too much and it becomes unstable. Finding just the right clip range depends on model size; small models need more room to explore"
X Link 2025-10-14T14:55Z 269.7K followers, [----] engagements

"5. Slow thoughtful agents win. Agents that plan before acting (fewer but smarter tool calls) outperform reactive ones that constantly rush to use tools. The best ones pause think internally then act once with high precision"
X Link 2025-10-14T14:55Z 269.7K followers, [----] engagements

"Most agents today are shallow. They easily break down on long multi-step problems (e.g. deep research or agentic coding). That's changing fast. We're entering the era of "Deep Agents": systems that strategically plan, remember, and delegate intelligently for solving very complex problems. We at @dair_ai and other folks from LangChain, Claude Code, as well as more recently individuals like Philipp Schmid, have been documenting this idea. Here's roughly the core idea behind Deep Agents (based on my own thoughts and notes that I've gathered from others): // Planning // Instead of reasoning ad-hoc inside a"
X Link 2025-10-14T19:07Z 270.4K followers, 42.7K engagements

"Dr.LLM: Dynamic Layer Routing in LLMs Neat technique to reduce computation in LLMs while improving accuracy. Routers increase accuracy while reducing layers by roughly [--] to [--] per query. My notes below:"
X Link 2025-10-16T14:25Z 270.6K followers, 22.7K engagements
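The routing idea (skip layers per query instead of running the full stack) can be sketched like this. My own toy illustration, not the paper's router: the every-other-layer policy for easy queries is an assumption standing in for the learned routers.

```python
# Toy sketch of per-query layer routing (my own illustration, not Dr.LLM's
# router): a router decides, per layer, whether to execute or skip it.

layers = [(lambda x: x + 1) for _ in range(12)]  # stand-ins for transformer blocks

def route(difficulty, layer_idx):
    # Assumed toy policy: easy queries run every other layer, hard run all.
    return difficulty == "hard" or layer_idx % 2 == 0

def forward(x, difficulty):
    executed = 0
    for i, layer in enumerate(layers):
        if route(difficulty, i):
            x = layer(x)
            executed += 1
    return x, executed

_, easy_count = forward(0, "easy")   # easy query: 6 of 12 layers
_, hard_count = forward(0, "hard")   # hard query: all 12 layers
```

The savings scale with how many queries are easy, which is why routers can cut several layers per query on average without hurting accuracy.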

"Banger paper from Meta and collaborators. This paper is one of the best deep dives yet on how reinforcement learning (RL) actually scales for LLMs. The team ran over [------] GPU hours of experiments to find a predictable scaling pattern and a stable recipe (ScaleRL) that consistently works as you scale up compute. Think of it as a practical guide for anyone trying to train reasoning or alignment models with RL. More on why this is a big deal:"
X Link 2025-10-16T16:46Z 278.6K followers, 35.3K engagements

"2. The ScaleRL recipe that just works. The authors tested dozens of RL variations and found one that scales cleanly to 100k GPU hours without blowing up: - PipelineRL (8 pipelines) with CISPO loss (a stabilized REINFORCE variant). - Prompt-level averaging and batch-level normalization to reduce variance. - FP32 logits for better stability and higher final accuracy. - No-Positive-Resampling curriculum to avoid reward hacking. - Forced interruptions (stopping long thoughts) instead of punishing long completions. - This combo called ScaleRL hit the best trade-off between stability sample"
X Link 2025-10-16T16:46Z 285.9K followers, [----] engagements
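Two of the recipe's ingredients — batch-level normalization and prompt-level averaging around a truncated importance-sampling REINFORCE-style loss — can be sketched numerically. This is my own toy approximation of those pieces, not the paper's exact CISPO formulation; the clip value and loss form are assumptions.

```python
import numpy as np

def scale_rl_loss(logp_new, logp_old, rewards, clip_max=4.0):
    """Toy sketch: truncated importance-sampling REINFORCE-style loss
    (CISPO-like) with batch-level reward normalization and prompt-level
    averaging. Arrays have shape (num_prompts, samples_per_prompt)."""
    # batch-level normalization: raw rewards -> advantages
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # truncated importance-sampling weights (clipped from above only)
    weights = np.clip(np.exp(logp_new - logp_old), None, clip_max)
    # average within each prompt first, then across prompts
    per_prompt = (weights * adv * logp_new).mean(axis=1)
    return -float(per_prompt.mean())
```

In a real trainer the weights would be treated as constants (stop-gradient) and `logp_new` would carry the gradient; here everything is plain NumPy for illustration.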

"3. What actually matters for better RL results. Not every trick helps equally: - Loss choice and precision matter most; CISPO + FP32 logits boosted final pass rates from 52% to 61%. - Normalization, aggregation, and curriculum mainly affect how fast you improve (efficiency), not how far you can go. - Fancy variants like GRPO, DAPO, or Magistral didn't beat ScaleRL once scaled properly. https://twitter.com/i/web/status/1978865085939654692"
X Link 2025-10-16T16:46Z 285.9K followers, [----] engagements

"I am not going to lie. I see a lot of potential in the Skills feature that Anthropic just dropped. Just tested it with Claude Code. It leads to sharper, more precise outputs. It's structured context engineering to power CC with specialized capabilities, leveraging the filesystem"
X Link 2025-10-16T20:20Z 270.9K followers, 64.1K engagements

"An easy way to try Skills in Claude Code is by asking it to help you build one. I am surprised by how aware it is of Skills and how to build comprehensive ones"
X Link 2025-10-16T20:36Z 269.8K followers, [----] engagements

"This is also neat. To help deal with context rot or context collapse, Skills uses a neat tiered system (3 levels) to help Claude Code load context efficiently and only when it needs it. Don't sleep on agentic search"
X Link 2025-10-16T20:45Z 269.8K followers, [----] engagements
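The tiered loading described above can be sketched roughly as progressive disclosure: only metadata is always in context, and deeper layers load on demand. The skill layout and function below are my own simplified assumptions, not Anthropic's actual format.

```python
def load_skill_context(skill, relevant, needed_resources=()):
    """Toy sketch of 3-level tiered loading: tier 1 (name + description)
    is always in context; tier 2 (full instructions) loads only when the
    skill is relevant to the task; tier 3 (bundled resources) loads only
    when a specific resource is actually needed."""
    context = [f"{skill['name']}: {skill['description']}"]  # tier 1: always
    if relevant:
        context.append(skill["instructions"])               # tier 2: on trigger
        for name in needed_resources:
            context.append(skill["resources"][name])        # tier 3: on demand
    return context
```

The payoff is that dozens of skills cost only a line of context each until one is actually triggered.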

"Another interesting development on LLM efficiency. How do you make diffusion LLMs faster without breaking them? This is the problem tackled by Elastic-Cache, a new approach to decoding efficiency for diffusion-based LLMs. It shows that attention really is all you need, if you know when to stop paying too much of it. The idea is surprisingly simple. Instead of recomputing every Key/Value (KV) pair at every denoising step, Elastic-Cache learns to reuse what hasn't changed and only update where attention truly drifts. Here's roughly how it works: [--] Track attention drift. At each step the model checks"
X Link 2025-10-17T13:41Z 283K followers, 21.9K engagements
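The drift-gated reuse idea can be illustrated with a small sketch: compare each token's current attention row against the cached one, and only refresh the KV entries whose attention has drifted. This is my reading of the summary, not the paper's exact algorithm; `recompute_kv` is a hypothetical stand-in.

```python
import numpy as np

def elastic_cache_step(attn_now, attn_cached, kv_cached, recompute_kv, tau=0.9):
    """Toy sketch: reuse a token's cached KV pair while its attention row
    stays similar (cosine >= tau) to the cached one; recompute only
    where attention drifted."""
    kv = list(kv_cached)
    recomputed = []
    for i, (now, old) in enumerate(zip(attn_now, attn_cached)):
        denom = np.linalg.norm(now) * np.linalg.norm(old) + 1e-8
        similarity = float(now @ old / denom)
        if similarity < tau:          # attention drifted: refresh this entry
            kv[i] = recompute_kv(i)
            recomputed.append(i)
    return kv, recomputed
```

When nothing drifts, the step is nearly free; compute concentrates only on the positions that actually changed.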

"LLMs can get "Brain Rot" Continual pretraining on junk high-engagement web text causes lasting "cognitive decline" in LLMs, reducing reasoning, long-context, and safety performance. The main failure mode is thought-skipping, where models skip reasoning steps and adopt dark personality traits like narcissism and low agreeableness. Even strong mitigations such as reflection or further fine-tuning only partially reverse the damage, making data curation a critical safety concern for AI training"
X Link 2025-10-17T16:07Z 285.2K followers, 293.3K engagements

"Don't sleep on Skills. Skills is easily one of the most effective ways to steer Claude Code. Impressive for optimization. I built a skill inside of Claude Code that automatically builds, tests, and optimizes MCP tools. It runs in a loop, loading context and tools (bash scripts) efficiently, to test and optimize MCP tools based on best practices, implementation, and outputs. Heck, you could even run MCP tools within it if you like, but that wasn't what I needed here. One of the most impressive aspects of using Claude Code with Skills is the efficient token usage. The context tiering system is a"
X Link 2025-10-17T17:44Z 282.5K followers, 193.8K engagements

"At a high level Skills is great for context engineering and steering Claude Code. But I strongly agree on the continual learning stuff. Skills are a bigger deal than I initially thought. I wrote yesterday about how great it is at optimizing tools (with an MCP tuning example) but this ability can generalize to self-improving/evolving agents through adaptive skills. A bit like how humans actually learn and upskill over time. As an example I am building a high-level skill that enables Claude Code to actively monitor my interactions with it and my feedback and to document those where needed (and"
X Link 2025-10-18T13:58Z 272.1K followers, 98.2K engagements

"So I wrote down a better-formatted version of my post on Deep Agents. I added it to the AI Agents section of the Prompt Engineering Guide. If you are building with AI Agents this is a must-read. I also added links to other useful references. promptingguide.ai/agents"
X Link 2025-10-18T15:30Z 270.9K followers, 19K engagements

"@nikitabier You say something about finding the right signals. Great I get that. And hope the UX improves things. But can you explain why there is literally no post with actual links on my timeline. It feels like some deserve to. Whats up with that. Might be a separate issue"
X Link 2025-10-19T23:14Z 269.8K followers, [----] engagements

"I tried Codex on ChatGPT today. Claude Code is just irreplaceable to me at this point. And with this new Skills feature, the edge it gives is just too good to pass on. I am sure Codex will get better. Will keep trying future iterations. What's your experience?"
X Link 2025-10-19T23:23Z 270.8K followers, 104.5K engagements

"This is huge. I sensed that Claude Code on web was coming. Coding agents on the web serve specific use cases. There is a wide opening to improve the coding experience on the web. If you tried Codex (web) you know this well. If it's anything like Claude Code on the terminal, it's game on. Introducing Claude Code on the web. You can now delegate coding tasks to Claude without opening your terminal. https://t.co/Hw8KkKiFGj"
X Link 2025-10-20T22:39Z 270.8K followers, 23.6K engagements

"Claude Code on mobile is exciting. I am on the road a lot so this is extremely helpful. I normally used GitHub Actions but this should feel more native"
X Link 2025-10-20T22:50Z 270K followers, [----] engagements

"Safe agentic code execution matters. The sandboxing feature in Claude Code is also neat and well thought out"
X Link 2025-10-20T22:55Z 270.3K followers, [----] engagements

"People are sleeping on Deep Agents. Start using them now. This is a fun paper showcasing how to put together advanced deep agents for enterprise use cases. Uses the best techniques: task decomposition, planning, specialized subagents, MCP for NL2SQL, file analysis, and more"
X Link 2025-10-21T13:36Z 272.7K followers, 60.6K engagements

"BREAKING: OpenAI launches ChatGPT Atlas. A new AI-powered browser built around ChatGPT. Chat goes with you anywhere on the web"
X Link 2025-10-21T17:14Z 270.7K followers, 15.9K engagements

"It's OpenAI's attempt at combining the AI chat experience with the browser. ChatGPT is the "beating heart of Atlas" Built to be fast and flexible"
X Link 2025-10-21T17:14Z 271.7K followers, [----] engagements

"Three core features of Atlas: - Chat goes with you anywhere on the web - Browser memory to personalize the experience across the web - Agents can take actions for you"
X Link 2025-10-21T17:14Z 273.5K followers, [----] engagements

"You can invite ChatGPT to any tab you have open. It can see the webpage and answer questions"
X Link 2025-10-21T17:19Z 270.1K followers, [----] engagements

"You can do web searches and click through links to preview pages. And you can continue interacting with webpages seamlessly"
X Link 2025-10-21T17:20Z 270.1K followers, [----] engagements

"🎓Stanford CME295 Transformers & LLMs Nice to see the release of this new course on Transformers and LLMs. Great way to catch up on the world of LLMs and AI Agents. Includes topics from the basics of attention and mixture-of-experts to agents. Excited to see more on evals. First lessons available now. https://cme295.stanford.edu/syllabus/"
X Link 2025-10-22T16:10Z 272.3K followers, 46.2K engagements

"Lookahead Routing for LLMs Proposes Lookahead, a routing framework that enables more informed routing without full inference. Achieves an average performance gain of 7.7% over the state of the art. Here is why it works: Lookahead is a new framework for routing in multi-LLM systems, deciding which model should handle each query. Key idea: Instead of routing based only on the input query, Lookahead predicts latent representations of potential responses, giving it a peek into what each model would say without fully generating text. Smarter decisions: This response-aware prediction makes routing more"
X Link 2025-10-23T14:02Z 271.5K followers, 23.8K engagements
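The response-aware routing idea can be sketched as follows: for each candidate model, cheaply predict a latent representation of its would-be response, score that prediction, and route without generating any text. The predictors and scorer below are hypothetical stand-ins, not the paper's architecture.

```python
import numpy as np

def route_query(query_vec, response_predictors, score):
    """Toy sketch of response-aware routing: peek at each model's likely
    response in latent space, score the peek, pick the best model."""
    scores = {}
    for model, predict in response_predictors.items():
        predicted_latent = predict(query_vec)   # latent response, not text
        scores[model] = score(predicted_latent)
    return max(scores, key=scores.get), scores
```

The contrast with query-only routing is that the scoring function sees a (predicted) response representation, not just the input.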

"Builds an OS agent for long-horizon tasks. Uses step-wise RL and self-evolving training to enable the agent to carry out long-horizon interactions. Instead of one big model doing everything, it uses multiple cooperating agents: one for memory and context, one for breaking down complex tasks, and another to check and fix mistakes along the way. This is another great example of deep agents to personalize user-agent interactions"
X Link 2025-10-23T14:17Z 270.9K followers, 15.3K engagements

"Not sure what's up with ChatGPT lately. Projects don't work as they used to. I have moved all my Projects to Claude Code Skills. Night and day. Skills give you so much personalization. They can easily be adapted too. The Claude team really cooked with this one"
X Link 2025-10-23T14:54Z 271.1K followers, 14.8K engagements

"Fundamentals of Building Autonomous LLM Agents Great overview of LLM-based agents. Great if you are just getting started with AI agents. This covers the basics well"
X Link 2025-10-24T18:42Z 279.3K followers, 63.1K engagements

"On Skills vs. Subagents Been using Skills extensively for the past couple of days. But like many other Claude Code users, I've been thinking about the difference between subagents and Skills. Subagents are useful for handing off subtasks (i.e. separation of concerns) from the main task (e.g. deep research with a planner). And Skills are about loading context efficiently with a neat tiered system. Subagents can also leverage Skills. And I sometimes also use custom commands to orchestrate more complex scenarios. They are all useful features to have. Who knows, this will probably change again or"
X Link 2025-10-24T19:03Z 271.9K followers, 57.7K engagements

"Curious how Kimi CLI will stack up against Claude Code and the like. I think there is a lot of room for innovation with CLI agents"
X Link 2025-10-26T15:05Z 277.5K followers, 13.8K engagements

"Project page: https://github.com/MoonshotAI/kimi-cli"
X Link 2025-10-26T15:06Z 277.5K followers, [----] engagements

"Damn it. Now it's hard to think of building AI agents without access to the filesystem and agentic search. Claude Code has spoiled me. But my agents are so much better with bash, glob, grep, files, skills, and the like. Memory, context engineering, evals, etc. are all better"
X Link 2025-10-28T14:13Z 274.3K followers, 19K engagements

"Microsoft's open-source game is on 🔥 They release Agent Lightning to help optimize multi-agent systems. It works with popular agent frameworks. Barely any code change needed. Integrates algos like RL, Automatic Prompt Optimization, Supervised Fine-tuning, and more"
X Link 2025-10-28T14:34Z 272.4K followers, 46.1K engagements

"@claudeusmaximus i wouldn't throw away vector dbs just yet -- there is a world where both and even emerging search tools exist"
X Link 2025-10-28T14:58Z 270.8K followers, [---] engagements

"Memory for AI agents is a bigger deal than it looks. Huge congrats to the Mem0 team on their Series A. I've been building with mem0 for the past couple of months. It's clear this team is building some impressive stuff. Check them out if you are an AI dev building agents. Memory is what makes us human. It's also what makes AI truly intelligent. @mem0ai has raised $24M to build the universal memory layer for AI. Thousands of teams in production. 14M downloads. 41K GitHub stars. Intelligence needs memory & we're building it for everyone. More👇 https://t.co/7r2zHCnNYh"
X Link 2025-10-28T15:55Z 272.3K followers, 17.1K engagements

"What happened to AGI I don't see anyone mentioning it on my timeline anymore. Apparently it's now ASI. :)"
X Link 2025-10-29T20:21Z 272.8K followers, 17.6K engagements

"There is so much value in data for training/tuning LLM agents. But there aren't too many good public ones. If you do find a good one it's not in a standard format and tools vary. Agent Data Protocol attempts to solve this by unifying datasets for fine-tuning LLM agents"
X Link 2025-10-29T20:49Z 272.5K followers, 29.8K engagements

"This is actually a clever context engineering technique for web agents. It's called AgentFold an agent that acts as a self-aware knowledge manager. It treats context as a dynamic cognitive workspace by folding information at different scales: - Light folding: Compressing small details while keeping the important stuff - Deep folding: Combining multiple steps or tasks into a simplified summary More of my notes: 1) Solving context saturation Traditional ReAct-based web agents accumulate noisy histories causing context overload while fixed summarization methods risk irreversible information"
X Link 2025-10-29T21:24Z 278K followers, 30.9K engagements
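The two folding scales can be sketched as a simple budgeted history rewrite: deep-fold the oldest steps into one summary, light-fold mid-range steps by trimming detail, keep the most recent steps verbatim. This is my reading of the summary, a simplified assumption rather than AgentFold's actual mechanism; `summarize` stands in for an LLM summarizer.

```python
def fold_context(steps, budget, summarize):
    """Toy sketch of multi-scale context folding: collapse old history,
    compress the middle, keep the recent tail verbatim."""
    if len(steps) <= budget:
        return list(steps)
    overflow = len(steps) - budget + 1
    deep = [summarize(steps[:overflow])]            # deep fold: many steps -> one entry
    light = [s[:40] for s in steps[overflow:-2]]    # light fold: compress details
    return deep + light + list(steps[-2:])          # recent steps stay verbatim
```

Unlike fixed summarization, the folding scale here adapts to how far a step is from the present, which is the core of treating context as a workspace rather than a log.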

"Graph-based Agent Planning It lets AI agents run multiple tools in parallel to accelerate task completion. Uses graphs to map tool dependencies + RL to learn the best execution order. RL also helps with scheduling strategies and planning. Major speedup for complex tasks"
X Link 2025-10-30T13:42Z 272.6K followers, 21.9K engagements
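The graph part of this idea is straightforward to sketch: given each tool's prerequisites, a Kahn-style topological leveling yields "waves" of tools that can run in parallel. The RL-learned scheduling the post mentions would reorder within these constraints; this sketch covers only the dependency-graph step.

```python
def execution_waves(deps):
    """Toy sketch of graph-based tool planning: deps maps each tool to
    its prerequisite tools; returns waves of tools that can run in
    parallel, in dependency order."""
    indegree = {tool: len(pre) for tool, pre in deps.items()}
    children = {tool: [] for tool in deps}
    for tool, pre in deps.items():
        for p in pre:
            children[p].append(tool)
    wave = sorted(t for t, d in indegree.items() if d == 0)
    waves = []
    while wave:
        waves.append(wave)
        ready = []
        for tool in wave:
            for child in children[tool]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        wave = sorted(ready)
    return waves
```

Each wave's tools share no dependencies, so an executor can dispatch them concurrently and only synchronize between waves.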

"It's pretty exciting to see how fast agentic models are getting. Besides improved reliability, speed is a must for coding agents today. Today we're releasing SWE-1.5 our fast agent model. It achieves near-SOTA coding performance while setting a new standard for speed. Now available in @windsurf. https://t.co/0RvQVLezA0"
X Link 2025-10-30T13:42Z 271.3K followers, [----] engagements

"🚀 MiniMax-M2 is the new top-tier open model for agentic workflows and advanced coding. #1 Open-source Model & #4 Intelligence (Artificial Analysis) 200k context window (128K max output tokens), [---] TPS throughput, 8% of Claude Sonnet's price, 2x faster More details:"
X Link 2025-10-30T15:15Z 271.1K followers, 42.9K engagements

"🐙 Engineered for Agents & Code M2 excels in math, science, and code. #1 Open-source Model & #4 Intelligence (by Artificial Analysis & LMArena) Crushes SWE-Bench + Terminal-Bench. Comparable to top proprietary models in BrowseComp. Powered by MiniMax's CISPO scaling algorithm, the technique highlighted in Meta's "Art of Scaling RL Compute.""
X Link 2025-10-30T15:15Z 273.1K followers, [----] engagements

"#1 on OpenRouter's Top Today Chart Just a few days after release, it's trending to be the top open-source model in terms of token usage on OpenRouter. It's taking off really quickly"
X Link 2025-10-30T15:15Z 273.1K followers, [----] engagements

"👨💻 Agentic Intelligence Simplified Supports full dev cycles: multi-file edits, run, debug, auto-fix. M2 enables developers to transition seamlessly from the shell to the browser to MCP tools. Great for agents & long-horizon tasks. M2 plans, executes, retrieves, and self-corrects autonomously"
X Link 2025-10-30T15:15Z 271.4K followers, [---] engagements

"🧠 Built for Efficiency & Accuracy Built for multi-hop retrieval-heavy reasoning. M2 also shows that activation size matters. It uses 10B activations, which leads to more responsive agent loops + better unit economics. More info in the repo: https://github.com/MiniMax-AI/MiniMax-M2"
X Link 2025-10-30T15:15Z 271.3K followers, [----] engagements

"Free for all devs globally via MiniMax Agent and APIs for a limited time. Try it: 🔹 MiniMax Agent (Web): 🔹 API: Also live across Claude Code, Cursor, Cline, Roo Code, Grok CLI, Gemini CLI & more. https://platform.minimax.io/docs/guides/text-generation https://bit.ly/elvism2"
X Link 2025-10-30T15:15Z 271.1K followers, [----] engagements

"@bryan_johnson Any thoughts on the Hearing feature inside the iPhone's Health app It does seem to measure well and it also alerts when it gets very noisy"
X Link 2025-10-30T19:23Z 271.2K followers, [----] engagements

"The "MCP is dead" camp is just as clueless as the "RAG is dead" camp. They have absolutely no clue what they are talking about. Just watch what happens in the next couple of weeks in this timeline. I am so tired of would-be AI influencers shilling GitHub as an example of why MCP is bad (and thus why CLI is better). You have all done a disservice to the community by miseducating people. GitHub is a really poorly designed MCP. The CLI works well(-ish) because it's heavily"
X Link 2025-10-31T01:27Z 272.9K followers, 18.3K engagements

"Excited to announce our academy's first cohort-based course, Building Effective AI Agents. This is the one solution that teaches end-to-end how to build, evaluate, and deploy AI agents. For [--] month you'll build with cutting-edge techniques like memory and deep agents"
X Link 2025-10-31T12:26Z 271.7K followers, 37.8K engagements

"The more I use Claude Code, the more it makes me believe that I can build whatever I want. It coded this little feature flawlessly in one shot. It's a luxury feature (I haven't had the time to implement it) that lots of people have been asking for on our guides. Shipping soon"
X Link 2025-10-31T13:18Z 273.2K followers, 42.4K engagements

"@OfficialLoganK 💯 vibe coding removes so many barriers"
X Link 2025-11-01T15:44Z 271.4K followers, [----] engagements

"As someone who browses through arxiv papers daily on CS and CL, I can totally understand this decision. I have seen many low-quality submissions lately, and a lot of the time they are surveys. I enjoy a few surveys and position papers, so I think there is still a place for those. Hope the review process improves things overall"
X Link 2025-11-01T19:19Z 276.3K followers, [----] engagements

"People really seem to like this Copy Markdown/AI Summarization functionality. This is the crazy thing about AI today. There are so many useful experiences just waiting to be unlocked"
X Link 2025-11-02T15:03Z 272.3K followers, [----] engagements

"Proactive agents are going to fundamentally change how we interact with software. Traditional software and UI/UX are about to get a major upgrade. Proactive agents will eliminate broken interfaces. We know really well that the way we interact with computers and devices today is broken (including mobile phones) and I believe we might finally be able to fix that. Proactive agents will carry out an insane amount of work for you and the way we provide them feedback and assess outputs cannot be supported by current interfaces. Current interfaces limit proactive agents. We need to reassess how we"
X Link 2025-11-02T16:33Z 273.7K followers, 36.9K engagements

"@mattpocockuk Windsurf takes the crown for me. Haven't had the need to change it since I installed it. I know their team has worked really hard on the tab to complete and they keep improving it"
X Link 2025-11-02T16:45Z 273.7K followers, [----] engagements

"Long-term memory doesn't have to suck. Here is the problem: even LLMs with 1M token windows struggle substantially as dialogues lengthen. Raw context capacity alone is insufficient for effective long-term conversational memory. On the other hand, hybrid memory approaches work really well in AI agents"
X Link 2025-11-03T14:10Z 273.7K followers, 38K engagements

"The framework shows the largest relative improvements in tasks requiring long-range reasoning: summarization (+160.6%), multi-hop reasoning (+27.2%), and preference following (+76.5%). Ablation studies reveal that all components become increasingly essential as context grows. At 10M tokens, removing any single component causes significant performance drops (retrieval -8.5%, scratchpad -3.7%, noise filtering -8.3%). https://arxiv.org/abs/2510.27246"
X Link 2025-11-03T14:10Z 273.3K followers, [----] engagements
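The three ablated components — retrieval, scratchpad, and noise filtering — suggest a hybrid read path that can be sketched simply. The data layout, similarity metric, and threshold below are illustrative assumptions, not the paper's design.

```python
import numpy as np

def hybrid_recall(query_vec, store, scratchpad, k=2, noise_floor=0.3):
    """Toy sketch of a hybrid memory read: retrieve top-k entries from a
    long-term store, drop low-similarity noise, and combine the
    survivors with a running scratchpad."""
    scored = []
    for text, vec in store:
        denom = np.linalg.norm(query_vec) * np.linalg.norm(vec) + 1e-8
        sim = float(query_vec @ vec / denom)
        if sim >= noise_floor:              # noise filtering component
            scored.append((sim, text))
    scored.sort(reverse=True)
    retrieved = [text for _, text in scored[:k]]   # retrieval component
    return retrieved + list(scratchpad)            # scratchpad component
```

The point of the ablation numbers above is that each of these three pieces carries real weight once contexts get long; dropping any one of them hurts.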

"Cool project showcasing the use of multi-agent systems for scientific research"
X Link 2025-11-03T14:29Z 274.6K followers, 12K engagements

"Vibe coding is cute but pair it with intentional development cycles and watch how far you can take a project with coding agents today. Learn to code and vibe code. You won't regret it"
X Link 2025-11-03T14:49Z 273.4K followers, [----] engagements

"I find this puzzling as well. Like there are so many ways to use coding agents and there are even free ones too. I use coding agents (paid ones) for my side projects as it also provides a good test bed to gain experience on where to improve my workflows and pushes me to figure out how to ship things faster"
X Link 2025-11-03T16:15Z 271.6K followers, [----] engagements

"Mostly agree with this. "Write evals" sounds scary to most people but it's a stand-out skill for AI agent builders today. From building datasets to verification to understanding how to connect metrics to business value. Evals are now a requirement in our academy. With superintelligence we will need to reset education. Human labor will be contingent on our capacity to imagine and prompt and write evals instead of merely doing the work. Unlimited human desire will take care of the rest. We just have to rewrite HOW we build."
X Link 2025-11-03T17:58Z 273.8K followers, 22.1K engagements

"Tools-to-Agent Retrieval This work presents a unified vector-space embedding of both tools and agents with metadata links. Enables fine-grained tool-level and agent retrieval. It's a great context engineering approach in that it retrieves at both the tool and agent levels using a joint index, returning whichever better matches the query. It preserves fine-grained tool details while keeping the agent context intact, avoiding the information loss from collapsing many tools into coarse descriptions. This is useful for scaling tools and multi-agent systems. Huge improvements on LiveMCPBench. Paper:"
X Link 2025-11-04T16:25Z 275.8K followers, 18.7K engagements
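The joint-index idea can be sketched as one ranked search over entries of mixed kinds. The `(kind, name, vector)` layout and cosine scoring below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def joint_retrieve(query_vec, index, k=2):
    """Toy sketch of a unified tool/agent index: tools and agents share
    one vector space, and a query retrieves whichever entries match
    best, regardless of kind."""
    scored = []
    for kind, name, vec in index:
        denom = np.linalg.norm(query_vec) * np.linalg.norm(vec) + 1e-8
        scored.append((float(query_vec @ vec / denom), kind, name))
    scored.sort(reverse=True)
    return [(kind, name) for _, kind, name in scored[:k]]
```

Because tools are indexed individually rather than summarized into their parent agent's description, a query can land directly on the right tool even when the agent's coarse description would have missed it.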

"Context Engineering [---] This report discusses the context of context engineering and examines key design considerations for its practice. Explosion of intelligence will lead to greater context-processing capabilities, so it's important to build for the future too. This aligns well with my vision of proactive agents that can proactively build context and both reduce the cost of and close the gap on human-AI interactions. Great read for AI devs building AI agents. Paper -- arxiv.org/abs/2510.26493"
X Link 2025-11-04T16:35Z 276.6K followers, 84.2K engagements

"Code understanding is a huge unlock for devs building with coding agents. Windsurf Codemaps is an exciting feature and a good example of what effective and automatic context engineering looks like. This really has the potential to unleash 10x AI engineers. Introducing Codemaps in @windsurf powered by SWE-1.5 and Sonnet [---] Your code is your understanding of the problem you're exploring. So it's only when you have your code in your head that you really understand the problem. @paulg https://t.co/Tyodea1MrN"
X Link 2025-11-05T14:29Z 275.7K followers, 12.1K engagements

"This is an idea I have been posting about for the last couple of weeks. I've been building deep research subagents and have a repo/code-mapper to run deep research in codebases. Code understanding leads to output quality that's on another level. https://x.com/omarsar0/status/1978235329237668214 Claude Code subagents are all you need. Some will complain on # of tokens. However the output this spits out will save you days. The code quality is mindblowing Agentic search works exceptionally well. The subagents run in parallel. ChatGPT's deep research is no match https://t.co/I5cCKLZJuV"
X Link 2025-11-05T14:29Z 273.3K followers, [----] engagements

"Codemaps is more polished and tightly integrated into Windsurf, which is the IDE I already use. If you are interested, I will be showcasing and talking more about this in tomorrow's workshop for my academy subs. https://dair-ai.thinkific.com/courses/ai-agents-workshops"
X Link 2025-11-05T14:29Z 272.8K followers, [----] engagements

"Anthropic just posted another banger guide. This one is on building more efficient agents that handle more tools with efficient token usage. This is a must-read for AI devs (bookmark it). It helps with three major issues in AI agent tool calling: token costs, latency, and tool composition. How? It combines code execution with MCP, turning MCP servers into code APIs rather than direct tool calls. Here is all you need to know: [--]. Token Efficiency Problem: Loading all MCP tool definitions upfront and passing intermediate results through the context window creates massive token overhead sometimes"
X Link 2025-11-05T15:53Z 280.7K followers, 181.9K engagements
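The core move — tools exposed as code APIs so intermediate results stay in the execution environment instead of the context window — can be illustrated with a tiny sketch. Both tool functions are hypothetical stand-ins for MCP calls, not real server bindings.

```python
# Toy sketch of the "MCP servers as code APIs" pattern: instead of two
# direct tool calls whose large intermediate result passes through the
# model's context window, the agent writes a short script that chains
# the tools in-process and returns only a small final result.

def fetch_rows(table):
    # stands in for an MCP tool returning a large payload
    return [{"id": i, "amount": i * 10} for i in range(1000)]

def summarize(rows):
    # second tool, consuming the payload without the LLM ever seeing it
    return {"count": len(rows), "total": sum(r["amount"] for r in rows)}

def generated_script():
    rows = fetch_rows("orders")   # large intermediate stays in the sandbox
    return summarize(rows)        # only this small dict returns to the model
```

With direct tool calls, all 1000 rows would round-trip through the model's context; here only the two-field summary does.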

"Confidence is everything when building great software. Love how Yansu is approaching this. Yansu is a new AI coding platform built by @isoformai for serious and complex software development. It puts scenario simulation before coding. Here is the sauce:"
X Link 2025-11-05T17:10Z 274.4K followers, 11.5K engagements

"@AdamDittrichOne I am working on some new material to make all of these context engineering approaches and MCP building more accessible. More soon"
X Link 2025-11-05T21:18Z 274.3K followers, [----] engagements

"MCP, when used correctly with AI agents, is extremely high-leverage. To make MCP more approachable, I just launched our first course on the topic. As the name implies, anyone can take and find this short course useful. Check it out here: https://dair-ai.thinkific.com/courses/building-mcp-servers"
X Link 2025-11-06T00:00Z 274.3K followers, 15.2K engagements

"Building MCP tools for LLMs is a standout AI skill today. I just launched a short hands-on lab to help everyone build their first MCP server and integrate it with AI tools like ChatGPT and Claude. Anyone (even non-technical people) will find the lab useful. What you will learn: - How to index your data into Pinecone (popular vector store) - Build an MCP server in n8n on top of the data source - Build and connect an agentic search tool that efficiently retrieves data from your Pinecone datasource - And connect and test your MCP server and tools inside ChatGPT and Claude Learn it once and scale"
X Link 2025-11-06T14:23Z 275.6K followers, 53.3K engagements

"Claude Code + Bash + Skills is all you need. Jokes aside, after playing around with a couple of computer-using agents, I gave up on them. They are heavily optimized for clicking on common buttons and interfaces (e.g. order delivery) but suck at actual creative work like writing, editing, formatting, etc. I decided to move those workflows to Claude Code + Skills. I wasn't so hopeful because, unlike other tasks, creative tasks benefit from visual cues, hence the strong urge to use computer-using agents. Night and day. Turns out Claude Code + bash + Skills is all I needed for the creative work I was"
X Link 2025-11-06T22:56Z 274.9K followers, 55.6K engagements

"Unlocking the Power of Multi-Agent LLM for Reasoning Designing and optimizing multi-agent systems is important. This paper analyzes multi-agent systems where one meta-thinking agent plans and another reasoning agent executes, and identifies a lazy-agent failure mode. They find that one agent does most of the work while the other contributes little, essentially collapsing into a single-agent system. This is something that happens very often and that you might not want in your design. To address this issue they propose Dr. MAMR (Multi-Agent Meta-Reasoning Done Right), which introduces a Shapley-style causal"
X Link 2025-11-07T16:21Z 275.6K followers, 30.3K engagements
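Shapley-style credit assignment is easy to make concrete for the two-agent case: each agent's credit is its average marginal contribution over both join orders, and a near-zero value flags a lazy agent. This is a simplified illustration of the general idea, not the paper's Dr. MAMR mechanism.

```python
def shapley_two_agents(v):
    """Toy sketch: v maps coalitions (frozensets of agent ids) to task
    value; returns each agent's Shapley value. An agent with near-zero
    credit is the 'lazy' one."""
    empty, a1, a2, both = (v[frozenset()], v[frozenset({1})],
                           v[frozenset({2})], v[frozenset({1, 2})])
    # average marginal contribution over both join orders
    phi1 = 0.5 * (a1 - empty) + 0.5 * (both - a2)
    phi2 = 0.5 * (a2 - empty) + 0.5 * (both - a1)
    return phi1, phi2
```

By construction the two values sum to the full-coalition gain, so all the credit is accounted for; a training signal can then push up the lazy agent's contribution.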

"@SrinivasValekar this test was on a customer support agentic system but i think the agent coding use case will be interesting too. on it"
X Link 2025-11-09T17:04Z 273.5K followers, [----] engagements

"This is a wild use case I used Gamma + n8n to automatically generate a complete presentation on AI Agents research. In just minutes It combines web search (for research) GPT-5 (narrative) and Gamma (for slide content generation). Full workflow breakdown below 👇"
X Link 2025-11-10T15:10Z 276.7K followers, 50.1K engagements

"1/ THE PROBLEM: Creating visual content is time-consuming. Research takes hours. Writing requires deep focus. Design demands specialized skills. What if AI could handle the entire pipeline?"
X Link 2025-11-10T15:10Z 273.6K followers, [----] engagements

"2/ THE SOLUTION: An n8n workflow that orchestrates Tavily for web research GPT-5 for storytelling Gamma for visual generation and Google Sheets for tracking. You provide a topic and audience. The system outputs a LinkedIn-ready carousel"
X Link 2025-11-10T15:10Z 273.6K followers, [----] engagements

"5/ VISUAL GENERATION Gamma API takes the narrative and generates a scroll-friendly carousel with engaging headlines professional design and relevant imagery using Imagen [--] Pro. It specifies [--] cards in social format with custom instructions tailored to your target platform"
X Link 2025-11-10T15:10Z 273.6K followers, [---] engagements

"6/ WORKFLOW BENEFITS: Time goes from hours to minutes. Quality remains high with researched content and visual polish. You can also refine everything in the Gamma UI. Perfect for content creators and technical communicators who need to scale their output"
X Link 2025-11-10T15:10Z 273.6K followers, [---] engagements

"7/ USE CASES: This workflow is perfect for daily research summaries weekly newsletters community content marketing materials and educational content. It targets visual learners who consume content efficiently and need professional-looking presentations quickly"
X Link 2025-11-10T15:10Z 274.9K followers, [----] engagements

"8/ GIVEAWAY I am sharing the full n8n workflow JSON and video walkthrough. Comment Gamma & Ill DM it to you"
X Link 2025-11-10T15:10Z 274.9K followers, [----] engagements

"@thisisgrantlee @sarahdingwang @a16z Gamma is awesome. Thanks for allowing me to test it out. https://x.com/omarsar0/status/1987900645031252298?s=20 This is a wild use case I used Gamma + n8n to automatically generate a complete presentation on AI Agents research. In just minutes It combines web search (for research) GPT-5 (narrative) and Gamma (for slide content generation). Full workflow breakdown below 👇 https://t.co/4J9zM8tpuA"
X Link 2025-11-10T15:30Z 275.6K followers, [----] engagements

"It turns out that Kimi K2 Thinking is also a beast at deep research. It can run 200-300 tool requests for impressive multi-agent capabilities. Would you like to see a code example of it? Kimi K2 Thinking is a bigger deal than I thought I just ran a quick eval on a deep agent I built for customer support. It's on par with GPT-5; no other LLM has reached this level of agentic orchestration and reasoning capabilities. Huge for agentic and reasoning tasks. https://t.co/tW3BYThgPf"
X Link 2025-11-10T15:58Z 276.1K followers, 34.9K engagements

"Hey AI Devs Don't sleep on the new Gemini File Search API Feels like the easiest way to build agentic RAG systems. I built a little MCP server to analyze codebases with semantic search (Gemini File Search) & agentic search. Fun chatting with @karpathy's nanochat project"
X Link 2025-11-11T13:23Z 277K followers, 89.9K engagements

"@karpathy The best part is that, at the moment, you only pay for indexing. Thanks to @OfficialLoganK and team for this"
X Link 2025-11-11T13:28Z 273.9K followers, [----] engagements

"@karpathy @OfficialLoganK I also like the logs view in Google AI Studio. I show it towards the end of the clip. It gives me a good understanding of what happened on each request. Working on something to let folks try it out"
X Link 2025-11-11T13:32Z 273.9K followers, [----] engagements

"@karpathy @OfficialLoganK As for the agent orchestration that's powering the MCP server I am using the AI SDK v6 for the agentic loop. It allows both semantic search via Gemini's File Search and keyword search over files. I am finding that having both forms of search makes it more robust"
X Link 2025-11-11T13:35Z 273.9K followers, [----] engagements
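
The combination described above, semantic search plus keyword search over the same files, can be sketched as a simple score blend. The actual setup in the thread uses the AI SDK (TypeScript) with Gemini's File Search API on the semantic side; the bag-of-words cosine below is only a local stand-in so the ranking logic is runnable, and the file contents are hypothetical.

```python
import math
from collections import Counter

def keyword_score(query, text):
    # Fraction of query terms that appear verbatim in the file.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def semantic_score(query, text):
    # Stand-in for an embedding similarity (e.g. a Gemini File Search
    # relevance score): cosine over bag-of-words vectors.
    a, b = Counter(query.lower().split()), Counter(text.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def hybrid_search(query, files, w_semantic=0.6, w_keyword=0.4):
    # Blend both signals; weights are arbitrary and would need tuning.
    scored = [
        (w_semantic * semantic_score(query, text) + w_keyword * keyword_score(query, text), path)
        for path, text in files.items()
    ]
    return [path for _, path in sorted(scored, reverse=True)]

files = {
    "train.py": "training loop gradient descent optimizer",
    "tokenizer.py": "byte pair encoding tokenizer vocabulary",
    "README.md": "nanochat a minimal chatgpt clone",
}
ranking = hybrid_search("tokenizer vocabulary", files)
```

The robustness claim in the post follows from the signals failing differently: keyword search misses paraphrases, while embedding search can miss exact identifiers, so blending covers both.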

"@HashgraphOnline @karpathy It works great on large codebases. More efficient than regular keyword/grep search. For now I am using simple similarity metrics but this can be tuned further. I combine both semantic + keyword search. It's working great so far. More testing is required"
X Link 2025-11-11T13:43Z 273.8K followers, [---] engagements

"@karpathy @OfficialLoganK The inspiration behind the MCP server is that I want to be able to use it everywhere: Claude, ChatGPT, Claude Code, etc."
X Link 2025-11-11T13:47Z 276.2K followers, [----] engagements

"It's crazy to me how tedious it can be to build a RAG system today; all the setup required is insane. This File Search API simplifies everything. Now I can focus on the agent harness and context engineering, which take a bit of time to get right but are important for the user experience"
X Link 2025-11-11T13:57Z 276.2K followers, [----] engagements

"@Beareka @karpathy I will share more on this soon. The MCP stuff was just an easy way to showcase the File Search API at work but there are actually many new implementation details I have baked into the MCP component"
X Link 2025-11-11T15:03Z 273.8K followers, [---] engagements

"This simple Claude Code hack has reduced token usage by 90%. It adopts the "Code Execution with MCP" concept published by Anthropic. Remove preloaded MCP tools from context and use Python to execute tools via bash instead. BTW this can be optimized much further. Insane"
X Link 2025-11-11T15:35Z 277.5K followers, 135.2K engagements
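
The hack described above can be sketched concretely: rather than preloading every MCP tool schema into the model's context, tools are wrapped as ordinary Python functions behind a tiny CLI, and the agent (e.g. Claude Code) runs `python tools_cli.py <tool> <json-args>` via bash, so only the code it writes occupies context. This is a minimal illustration of the pattern, not Anthropic's implementation; the tool names and behavior are hypothetical stand-ins for real MCP servers.

```python
# tools_cli.py -- sketch of the "Code Execution with MCP" pattern.
import json
import sys

def search_issues(query: str) -> list[str]:
    # Stand-in for an MCP tool that would hit a real issue tracker.
    return [f"issue matching '{query}'"]

def summarize(items: list[str]) -> str:
    return f"{len(items)} result(s): " + "; ".join(items)

# The registry lives here, in code, instead of as tool schemas in the
# model's context window -- that's where the token savings come from.
REGISTRY = {"search_issues": search_issues, "summarize": summarize}

def dispatch(tool: str, payload: str) -> str:
    args = json.loads(payload)
    result = REGISTRY[tool](**args)
    return json.dumps(result)

if __name__ == "__main__" and len(sys.argv) > 2:
    # e.g. python tools_cli.py search_issues '{"query": "login bug"}'
    print(dispatch(sys.argv[1], sys.argv[2]))
```

A further optimization, hinted at in the follow-up post, is letting the agent generate a one-off script that chains several `REGISTRY` calls directly, so intermediate results never round-trip through the context window at all.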

"For this test I used a prebuilt Python script, but ultimately what I want is for the agent (in this case Claude Code) to generate whatever code it needs to execute the right sequence of tools. The best part is that we avoid context bloat; it keeps Claude Code more focused on what matters"
X Link 2025-11-11T15:39Z 275.4K followers, 11.6K engagements
