Dark | Light
# ![@ankrgyl Avatar](https://lunarcrush.com/gi/w:26/cr:twitter::161602232.png) @ankrgyl Ankur Goyal

Ankur Goyal posts on X about ai, in the, braintrust, products the most. They currently have [------] followers and [---] posts still getting attention that total [------] engagements in the last [--] hours.

### Engagements: [------] [#](/creator/twitter::161602232/interactions)
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::161602232/c:line/m:interactions.svg)

- [--] Week [-------] +37%
- [--] Month [-------] -20%
- [--] Months [---------] +132%
- [--] Year [---------] +107%

### Mentions: [--] [#](/creator/twitter::161602232/posts_active)
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::161602232/c:line/m:posts_active.svg)

- [--] Week [--] -33%
- [--] Month [--] -18%
- [--] Months [---] +182%
- [--] Year [---] +24%

### Followers: [------] [#](/creator/twitter::161602232/followers)
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::161602232/c:line/m:followers.svg)

- [--] Week [------] +0.43%
- [--] Month [------] +2.70%
- [--] Months [------] +20%
- [--] Year [------] +48%

### CreatorRank: [-------] [#](/creator/twitter::161602232/influencer_rank)
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::161602232/c:line/m:influencer_rank.svg)

### Social Influence

**Social category influence**
[technology brands](/list/technology-brands)  10.26% [cryptocurrencies](/list/cryptocurrencies)  9.4% [finance](/list/finance)  5.98% [social networks](/list/social-networks)  1.71% [vc firms](/list/vc-firms)  0.85%

**Social topic influence**
[ai](/topic/ai) 25.64%, [in the](/topic/in-the) 10.26%, [braintrust](/topic/braintrust) #15, [products](/topic/products) 6.84%, [code](/topic/code) 6.84%, [llm](/topic/llm) 5.98%, [prompt](/topic/prompt) 5.13%, [open ai](/topic/open-ai) 5.13%, [systems](/topic/systems) 5.13%, [super](/topic/super) 4.27%

**Top accounts mentioned or mentioned by**
[@braintrustdata](/creator/undefined) [@braintrust](/creator/undefined) [@financialvice](/creator/undefined) [@eladgil](/creator/undefined) [@alanaagoyal](/creator/undefined) [@benhylak](/creator/undefined) [@thejaykhatri](/creator/undefined) [@zxytim](/creator/undefined) [@courtstarr](/creator/undefined) [@akionuernberger](/creator/undefined) [@pradyuprasad](/creator/undefined) [@basecasevc](/creator/undefined) [@saammotamedi](/creator/undefined) [@greylockvc](/creator/undefined) [@vercel](/creator/undefined) [@replit](/creator/undefined) [@andrewqu](/creator/undefined) [@cramforce](/creator/undefined) [@mikepmunroe](/creator/undefined) [@morganepaloma](/creator/undefined)

**Top assets mentioned**
[Braintrust (BTRST)](/topic/braintrust)
### Top Social Posts
Top posts by engagements in the last [--] hours

"The couple of cases it scored lower on are a bit ambiguous and IMO open to interpretation. For example in this case the output produces the correct answer but misses some of the "frills" in the ground truth I think this is a benchmark quirk. https://www.braintrust.dev/app/braintrust-labs/p/bash-evals/experiments/sql-glm-5-2f26a644c=sql-glm-5&r=13095d8a-250e-4af7-8b11-5d2c9f3e365d&s=13095d8a-250e-4af7-8b11-5d2c9f3e365d&fs=1"  
[X Link](https://x.com/ankrgyl/status/2021767118644293781)  2026-02-12T02:03Z 10.6K followers, [---] engagements


"why is glm5 missing on all the inference providers am i missing something"  
[X Link](https://x.com/ankrgyl/status/2021726694982426679)  2026-02-11T23:23Z 10.6K followers, 13.1K engagements


"what is safer from an ip standpoint use american models and let the LLM provider do who knows what with your API requests use chinese models that you can run on your own GPUs"  
[X Link](https://x.com/ankrgyl/status/2021815422589579378)  2026-02-12T05:15Z 10.6K followers, 13.2K engagements


"Despite modern LLMs' familiarity with bash it turns out that they're pretty darn great at writing SQL. In fact many highly unstructured problems mapped to bash are both less accurate and less token efficient than just giving an agent SQL. We dug into this in depth"  
[X Link](https://x.com/ankrgyl/status/2014440816820027900)  2026-01-22T20:51Z 10.6K followers, 30K engagements


"We did a super fun exploration of this with @andrewqu and @cramforce optimized an agent that used sql bash and both. Ultimately the bash+sql agent was reliably the highest accuracy with some trade offs in cost/latency. https://www.braintrust.dev/blog/bash-agent-evals https://www.braintrust.dev/blog/bash-agent-evals"  
[X Link](https://x.com/ankrgyl/status/2014440818682261660)  2026-01-22T20:51Z 10.6K followers, [----] engagements


"A few learnings: * Looking at traces together is critical. Question every eng decision and experiment * Our test data had tons of mistakes. LOOK AT THE DATA. * You do not need much code to build a really powerful agent (this is just AI SDK + Braintrust + Claude Code)"  
[X Link](https://x.com/ankrgyl/status/2014440820095832430)  2026-01-22T20:51Z 10.6K followers, [----] engagements


"I am very excited about this model but at least on our fairly practical bash-sql-eval it was not competitive with sonnet [---]. it is much faster and uses fewer tokens though so i suspect there's room to prompt it to do more work and increase performance. ๐Ÿฅ Meet Kimi K2.5 Open-Source Visual Agentic Intelligence. ๐Ÿ”น Global SOTA on Agentic Benchmarks: HLE full set (50.2%) BrowseComp (74.9%) ๐Ÿ”น Open-source SOTA on Vision and Coding: MMMU Pro (78.5%) VideoMMMU (86.6%) SWE-bench Verified (76.8%) ๐Ÿ”น Code with Taste: turn chats https://t.co/wp6JZS47bN ๐Ÿฅ Meet Kimi K2.5 Open-Source Visual Agentic"  
[X Link](https://x.com/ankrgyl/status/2016282182474727488)  2026-01-27T22:48Z 10.6K followers, [----] engagements


"here's an example of a relatively silly error. it thinks the total database is [-----] events rather than [------] (which gpt-4.1-mini the grader successfully dinged it on) https://www.braintrust.dev/app/braintrust-labs/p/bash-evals/experiments/sql-kimi-k2.5c=sql-claude-sonnet-4-5&diff=offbetween_experiments&r=aa7e5bdd-376e-4175-91e0-eed778532671&s=aa7e5bdd-376e-4175-91e0-eed778532671 https://www.braintrust.dev/app/braintrust-labs/p/bash-evals/experiments/sql-kimi-k2.5c=sql-claude-sonnet-4-5&diff=offbetween_experiments&r=aa7e5bdd-376e-4175-91e0-eed778532671&s=aa7e5bdd-376e-4175-91e0-eed778532671"  
[X Link](https://x.com/ankrgyl/status/2016282184366371153)  2026-01-27T22:48Z 10.6K followers, [---] engagements


"@zxytim thinking. at least there are thinking outputs in the results we got back from the model. feel free to poke around the code or eval results https://www.braintrust.dev/app/braintrust-labs/p/bash-evals/experiments/sql-kimi-k2.5c=&diff=offbetween_experiments&r=aa7e5bdd-376e-4175-91e0-eed778532671&s=aa7e5bdd-376e-4175-91e0-eed778532671 https://github.com/braintrustdata/bash-agent-evals https://www.braintrust.dev/app/braintrust-labs/p/bash-evals/experiments/sql-kimi-k2.5c=&diff=offbetween_experiments&r=aa7e5bdd-376e-4175-91e0-eed778532671&s=aa7e5bdd-376e-4175-91e0-eed778532671"  
[X Link](https://x.com/ankrgyl/status/2016318254336389270)  2026-01-28T01:11Z 10.6K followers, [---] engagements


"whats the best linux laptop hardware (assuming you can max out dont care about portability)"  
[X Link](https://x.com/ankrgyl/status/2016341206784151825)  2026-01-28T02:43Z 10.6K followers, [----] engagements


"@mikepmunroe @braintrust Let me know when you try it out. Would love to hear your feedback ๐Ÿ™"  
[X Link](https://x.com/ankrgyl/status/2016543484895977839)  2026-01-28T16:06Z 10.6K followers, [---] engagements


"boring: coding agents that do even more stuff for me in the background exciting: new UIs/workflows that help me do the new parts of my job better. for example personal code review. https://twitter.com/i/web/status/2016962416844951807 https://twitter.com/i/web/status/2016962416844951807"  
[X Link](https://x.com/ankrgyl/status/2016962416844951807)  2026-01-29T19:51Z 10.6K followers, [----] engagements


"@financialvice @benhylak Unfortunately 100% of my coding time is spent making @braintrust better and this is outside of our product scope"  
[X Link](https://x.com/ankrgyl/status/2016963889972597012)  2026-01-29T19:57Z 10.6K followers, [---] engagements


"@braintrust events are popping evals are a critical step in shipping quality AI. And builders are paying attention cos quality doesnt improve by accident. @braintrust evals on tap SF. https://t.co/XT7p9S9rZa evals are a critical step in shipping quality AI. And builders are paying attention cos quality doesnt improve by accident. @braintrust evals on tap SF. https://t.co/XT7p9S9rZa"  
[X Link](https://x.com/ankrgyl/status/2017291138013794326)  2026-01-30T17:37Z 10.6K followers, [----] engagements


"The "agent line" is the capability threshold of an agent to independently perform engineering work. Many tasks (eg can be one-shot) are below the "agent line" As a human engineer you need to be "above" not "below" the agent line. This takes an attitude and focus adjustment"  
[X Link](https://x.com/ankrgyl/status/2018826414787903827)  2026-02-03T23:18Z 10.6K followers, [----] engagements


"@courtstarr @eladgil @alanaagoyal @braintrust ๐Ÿ™ DM me your feedback if you try us out"  
[X Link](https://x.com/ankrgyl/status/2018827198392869198)  2026-02-03T23:21Z 10.6K followers, [---] engagements


"everyone should do this. open up codex paste this tweet and have it audit your code. Here's a [--] GB memory reduction for very long Claude Code sessions Before: () = controller.abort() Fix: controller.abort.bind(controller) https://t.co/4tsFe4VJVL Here's a [--] GB memory reduction for very long Claude Code sessions Before: () = controller.abort() Fix: controller.abort.bind(controller) https://t.co/4tsFe4VJVL"  
[X Link](https://x.com/ankrgyl/status/2019132517836886291)  2026-02-04T19:34Z 10.6K followers, 32K engagements


"AI-powered coding bloats the illusion of work more significantly than it increases the output itself. Of course it does (or can) increase output. But it's easy to be tricked by the illusion of work and we have to learn to un-train our intuition that code - product output"  
[X Link](https://x.com/ankrgyl/status/2019138618376483052)  2026-02-04T19:58Z 10.6K followers, 29K engagements


"everything @morgane_paloma wrote here applies to building great products too at @braintrust there is a very strong continuum from product--marketing. it's one thing. Thoughts on taste (from a marketing pov): * Quality is the baseline - Taste starts with where you set the bar. You dont ship work you wouldnt put your name on you notice sloppiness others normalize and you feel friction when something is almost right but not quite. * Thoughts on taste (from a marketing pov): * Quality is the baseline - Taste starts with where you set the bar. You dont ship work you wouldnt put your name on you"  
[X Link](https://x.com/ankrgyl/status/2019171810974298364)  2026-02-04T22:10Z 10.6K followers, [----] engagements


"@aidenybai i think fundamentally it demonstrates that you can design an experience from first principles that is powerful and surprisingly simple. the popular UIs are all quite bloated so its refreshing to be able to accomplish just as much without them. imo UI will win long term"  
[X Link](https://x.com/ankrgyl/status/2019459242210603337)  2026-02-05T17:12Z 10.6K followers, [----] engagements


"AI has gone from weights to APIs to a rich and thriving ecosystem of models SDKs services and agents. Excited to build more and more integrations into all of these systems so you can evaluate and observe them with Braintrust"  
[X Link](https://x.com/ankrgyl/status/2019545921323954303)  2026-02-05T22:57Z 10.6K followers, [----] engagements


"In-product chat is great but sometimes it's not enough. Loop now lets you directly escalate to our support team when you need more help"  
[X Link](https://x.com/ankrgyl/status/2019579105839403256)  2026-02-06T01:09Z 10.6K followers, [----] engagements


"I'm quite excited about this concept but I think it is risky to compare it to a sandbox. I found a trivial exploit within like [--] minutes of poking around the source code. Fuck it a bit early but here goes: Monty: a new python implementation from scratch in rust for LLMs to run code without host access. Startup time measured in single digit microseconds not seconds. @mitsuhiko here's another sandbox/not-sandbox to be snarky about ๐Ÿ˜œ Thanks Fuck it a bit early but here goes: Monty: a new python implementation from scratch in rust for LLMs to run code without host access. Startup time measured"  
[X Link](https://x.com/ankrgyl/status/2019904571536207950)  2026-02-06T22:42Z 10.6K followers, [----] engagements


"the interruption UX in codex has gotten really good. i think this is the new standard"  
[X Link](https://x.com/ankrgyl/status/2020578720302707147)  2026-02-08T19:21Z 10.6K followers, [----] engagements


"over: database-wrapper software not over: llm-wrapper software"  
[X Link](https://x.com/ankrgyl/status/2020695633427087537)  2026-02-09T03:05Z 10.6K followers, [----] engagements


"trying to see if [---] has a sense of humor"  
[X Link](https://x.com/ankrgyl/status/2020697398033735711)  2026-02-09T03:12Z 10.6K followers, [----] engagements


"this seems like a very clear distinction to me RLMs force the agent to explicitly reason about what enters context vs what is managed as opaque state. @Teknium The following are not standard in a coding agent: [--]. The user prompt P itself (not just external data) is a symbolic object in the environment. The model is not allowed to grep/read long snippets from P. [--]. The model has to write recursive code (that calls LMs) to understand or https://t.co/DzPIcfHAuA @Teknium The following are not standard in a coding agent: [--]. The user prompt P itself (not just external data) is a symbolic object in"  
[X Link](https://x.com/ankrgyl/status/2020885331269312566)  2026-02-09T15:39Z 10.6K followers, 11.3K engagements


"@AkioNuernberger @braintrust Sorry about that. This was not intentional and will be fixed ASAP"  
[X Link](https://x.com/ankrgyl/status/2021094147269206149)  2026-02-10T05:29Z 10.6K followers, [----] engagements


"@samrags_ Having worked closely with them and visited the office many times I think it's quite simple: * Important problem for customers * Incredibly smart colleagues * Fun / energetic vibes It's just a great environment"  
[X Link](https://x.com/ankrgyl/status/2021321206155313285)  2026-02-10T20:31Z 10.6K followers, [----] engagements


"Full text search is super fast but often limited. You usually have to provide an exact set of tokens for the engine to consecutively match. Until now We just shipped support for '%' which even works in the realtime search bar :)"  
[X Link](https://x.com/ankrgyl/status/2018838060470059072)  2026-02-04T00:04Z 10.6K followers, [----] engagements


"GLM5 is an impressive model. It's the first OSS model to perform competitively well to a leading commercial model (claude sonnet 4.5) on our bash eval"  
[X Link](https://x.com/ankrgyl/status/2021767117100790186)  2026-02-12T02:03Z 10.6K followers, [----] engagements


"It's also fast. I'm hoping that the inference providers will compete to make it both ultra-fast and somewhat-cheap. If they do. I think there is a very real chance this thing ends up in a lot of products"  
[X Link](https://x.com/ankrgyl/status/2021767120330403903)  2026-02-12T02:03Z 10.6K followers, [---] engagements


"remember when docker came out there were tons of PaaS startups (incl. dotCloud which became Docker) and none had much traction then Solomon gave a talk about Docker and like [--] week later everyone was using it i feel like that's about to happen for sandboxes"  
[X Link](https://x.com/ankrgyl/status/2021807938235678914)  2026-02-12T04:45Z 10.6K followers, [----] engagements


"@PradyuPrasad People are often worried about tampering (eg imagine a model misbehaves under very specific conditions that are designed to mess with your biz)"  
[X Link](https://x.com/ankrgyl/status/2021817065414496269)  2026-02-12T05:22Z 10.6K followers, [---] engagements


"@theJayKhatri not sure i agree. my local dev loop (even when not handwriting code) is significantly faster than any sandbox solution i've tried. i'm sure this is not true of expert homegrown solutions but is not yet true of commercial solutions"  
[X Link](https://x.com/ankrgyl/status/2021842771301289988)  2026-02-12T07:04Z 10.6K followers, [---] engagements


"Still digesting minimax's m2.5 but one interesting snippet is that they chose to emulate the Anthropic API vs. the OpenAI one. First time I've seen this"  
[X Link](https://x.com/ankrgyl/status/2022036982327914607)  2026-02-12T19:55Z 10.6K followers, [---] engagements


"Quick benchmark suggests minimax m2.5 is close to GLM5 and Sonnet. I'm curious what the US inference provider price will be but with current available numbers it's 3x cheaper than GLM5 (which is 3x cheaper than Sonnet) GLM5 is an impressive model. It's the first OSS model to perform competitively well to a leading commercial model (claude sonnet 4.5) on our bash eval. https://t.co/zujV3IQ1lU GLM5 is an impressive model. It's the first OSS model to perform competitively well to a leading commercial model (claude sonnet 4.5) on our bash eval. https://t.co/zujV3IQ1lU"  
[X Link](https://x.com/ankrgyl/status/2022051654108164393)  2026-02-12T20:54Z 10.6K followers, [---] engagements


"Amidst the chaos we built something: an AI proxy that lets you use a variety of providers (OpenAI Anthropic LLaMa2 Mistral and others) behind a single interface w/ caching & API key management"  
[X Link](https://x.com/anyuser/status/1726640887227388210)  2023-11-20T16:37Z 10.6K followers, 412.1K engagements


"The blog post goes into details on how it all works under the hood. The proxy is available for anyone to use with or without a @braintrustdata account for free. https://www.braintrustdata.com/blog/ai-proxy https://www.braintrustdata.com/blog/ai-proxy"  
[X Link](https://x.com/ankrgyl/status/1726640889865617607)  2023-11-20T16:37Z 10.6K followers, [----] engagements


"Very excited to announce that the proxy is now open source The response has been overwhelming and we have a simple framework for what we open source: if it's on the critical path of production it ought to be OSS https://github.com/braintrustdata/braintrust-proxy Amidst the chaos we built something: an AI proxy that lets you use a variety of providers (OpenAI Anthropic LLaMa2 Mistral and others) behind a single interface w/ caching & API key management. https://t.co/jJWiRElMHB https://github.com/braintrustdata/braintrust-proxy Amidst the chaos we built something: an AI proxy that lets you use"  
[X Link](https://x.com/anyuser/status/1729212862902755638)  2023-11-27T18:57Z 10.6K followers, 71.8K engagements


"Extremely proud of @alanaagoyal for being featured in Forbes She is an inspiration and is trailblazing the next generation of VCs who put in the work to help your company succeed. We are lucky to have @basecasevc as an investor in @braintrustdata. https://www.forbes.com/30-under-30/2024/venture-capital https://www.forbes.com/30-under-30/2024/venture-capital"  
[X Link](https://x.com/anyuser/status/1729526671618060460)  2023-11-28T15:44Z 10.6K followers, 31.9K engagements


"Not too long ago I announced a new company called @braintrustdata. Today I'm super excited to share our $5m seed round led by @saammotamedi at @GreylockVC"  
[X Link](https://x.com/anyuser/status/1734971104765280263)  2023-12-13T16:18Z 10.6K followers, 199.4K engagements


"New PMs (esp frmr eng) are often very worried about scope sequencing etc. Their intuition is to perfectly articulate the next few days of work into small tasks and closely monitor eng progress. This is the opposite of what you should do Great eng want visionary forward looking (quarters or years) well researched roadmaps that are rooted in market research and customer learnings. They are more than capable of sequencing their own work. Also incredibly motivating to work towards a big vision vs. incremental tasks"  
[X Link](https://x.com/ankrgyl/status/1741241178060914998)  2023-12-30T23:33Z 10.6K followers, 17.4K engagements


"It's truly wild how many small bugs you have to fix to build a great product"  
[X Link](https://x.com/ankrgyl/status/1766582504792723797)  2024-03-09T21:51Z 10.6K followers, 22.4K engagements


"Stare at race condition for 2h Go on a 5min walk Solve race in 30s"  
[X Link](https://x.com/ankrgyl/status/1768095832128274803)  2024-03-14T02:04Z 10.6K followers, [----] engagements


"Quick rant about RDBMS multi-tenancy and AI. Data warehouses are great if you control the shape of your data and hand-code schema definitions. And they are basically useless for multi-tenant products (like @datadoghq @splunk @braintrustdata etc.) where you have to ingest customer data. The recommended fixes -- "reshape your data" "add casts" "fix the schema" "create more tables" etc. -- simply do not work with multi-tenant data. In fact companies that scale like @datadoghq end up building tech like which overlap significantly with powerful tech buried/shackled inside of RDBMS. The raw query"  
[X Link](https://x.com/anyuser/status/1775941202510246063)  2024-04-04T17:39Z 10.6K followers, 28.7K engagements


"Super excited for @braintrustdata to be part of the [----] enterprise tech [--] list by @Wing_VC. We exist to serve our customers -- and feel very grateful to be recognized for the traction we've demonstrated so far by focusing on that. Excited for what's ahead in [----] and beyond"  
[X Link](https://x.com/anyuser/status/1777745974703816967)  2024-04-09T17:10Z 10.6K followers, 26.3K engagements


"Extremely excited to unveil our new landing page We wanted to elevate our message with visuals that speak to how tough it is to build effective AI products without great evals logging and prompt tooling"  
[X Link](https://x.com/ankrgyl/status/1780277586561769784)  2024-04-16T16:50Z 10.6K followers, 13.6K engagements


"If you're using AI on the internet it's likely eval'd by @braintrustdata"  
[X Link](https://x.com/ankrgyl/status/1802868475129680066)  2024-06-18T00:58Z 10.6K followers, 15.7K engagements


"Amazing to see a FastCompany article featuring @alanaagoyal who rarely speaks publicly about her approach to investing. We write a lot of code in the Goyal household :)"  
[X Link](https://x.com/ankrgyl/status/1810021564207042860)  2024-07-07T18:42Z 10.6K followers, 159.7K engagements


"Turns out Llama [---] can do tool calls with prompt engineering I did a deep dive on this and it is remarkably accurate (much more than any other open model we've tested). ๐Ÿ‘‡"  
[X Link](https://x.com/ankrgyl/status/1816914982631800934)  2024-07-26T19:14Z 10.6K followers, 17.9K engagements


"Not gonna name names but would highly recommend running evals on the same model (this is "llama-3.1-8B") across inference providers. Looks like massive difference in quality"  
[X Link](https://x.com/ankrgyl/status/1817563275313451097)  2024-07-28T14:10Z 10.6K followers, 37.6K engagements


"Okay I had a chance to do a more thorough test of [--] providers (1-3). The results are pretty shocking ๐Ÿคฏ Details and code below"  
[X Link](https://x.com/anyuser/status/1818036782543634439)  2024-07-29T21:32Z 10.6K followers, 47.7K engagements


"Today is @braintrustdata's [--] year bday Things I'm most proud of: * Supporting the world's best AI products: @zapier @NotionHQ @vercel @airtable @coda_hq @Replit @browsercompany @Superhuman and many others * With humble smart down-to-earth colleagues and investors"  
[X Link](https://x.com/anyuser/status/1829666706022940985)  2024-08-30T23:45Z 10.6K followers, 21K engagements


"I deeply relate to this post -- I was "VPE" at [--] and "CEO" at [--]. The advice I got was "stop coding" "hire PMs" "delegate everything". Listening to that crap was catastrophic. Now I spend all day talking to users writing code/content and collaborating with ICs. Founder Mode: https://t.co/3hOnlKOJBi Founder Mode: https://t.co/3hOnlKOJBi"  
[X Link](https://x.com/anyuser/status/1830307891649433843)  2024-09-01T18:13Z 10.6K followers, 95K engagements


"Lots of noise about how product doesnt matter anymore because AI. only distribution As someone who is all in on AI coding it is still extremely hard to build a good product. Code volume was never the bottleneck"  
[X Link](https://x.com/ankrgyl/status/1832428521769111866)  2024-09-07T14:39Z 10.6K followers, [----] engagements


"some predictions on o1 and what it means for ai eng: * more evidence that convoluted/over complicated agent frameworks are not the future * more english fewer programs * expect async to be the next streaming"  
[X Link](https://x.com/ankrgyl/status/1834325648510476760)  2024-09-12T20:18Z 10.6K followers, [----] engagements


"Object storage is getting so good that any database system not built on object storage will be obsolete in a few years"  
[X Link](https://x.com/ankrgyl/status/1840146666206441486)  2024-09-28T21:48Z 10.6K followers, 46.8K engagements


"Excited to share that we've raised $36m from @martin_casado at @a16z along with @saammotamedi @GreylockVC @eladgil @basecasevc to further our mission of helping developers build AI products that work. A bit more on what we're up to ๐Ÿงต"  
[X Link](https://x.com/anyuser/status/1843685307344146833)  2024-10-08T16:10Z 10.6K followers, 172.3K engagements


"If yesterday was not enough Braintrust for you -- I went on No priors with @eladgil to talk about the journey so far fundraise and what's ahead. Was especially fun to talk about learnings from Impira since Elad was an investor there too :)"  
[X Link](https://x.com/anyuser/status/1844045170331070613)  2024-10-09T16:00Z 10.6K followers, 10.4K engagements


"LLM-as-a-judge is a very powerful technique but it's difficult to get reliable results. The #1 mistake I see people make is to have an LLM produce a numeric score which like asking a human to rate 1-10 is not precise. We reproduce that and walk through how to do better :) LLM-as-a-judge scorers are a powerful tool you can use when you need to evaluate more complex responses to LLM calls. We published an OpenAI cookbook to work through different strategies for detecting hallucinations check it out in the @OpenAIDevs cookbook library. https://t.co/25FJfadcOe LLM-as-a-judge scorers are a"  
[X Link](https://x.com/ankrgyl/status/1851284901812969703)  2024-10-29T15:28Z 10.6K followers, 82.2K engagements


"If you're starting a company take customer interviews EXTREMELY seriously. This was prob the main thing we did well when starting @braintrustdata. Key things are: * Research as much as possible to avoid wasting time on basic context * Come with specific ideas. "What do you think of X" vs. "what should we build Tell me everything you've looked at" * Offer something: a demo early access to a product insights about the space"  
[X Link](https://x.com/ankrgyl/status/1857183373825098135)  2024-11-14T22:06Z 10.6K followers, [----] engagements


"If you want someone to help with your code the #1 thing you can do is create the smallest possible easiest to run repro. It feels like a huge pain in the ass but (a) you have more context and (b) the other person would have to do this anyway"  
[X Link](https://x.com/ankrgyl/status/1858230226762801310)  2024-11-17T19:26Z 10.6K followers, 35.6K engagements


"At some point the AI ecosystem is going to move beyond dual implementing everything in python and typescript"  
[X Link](https://x.com/anyuser/status/1858635054702096477)  2024-11-18T22:15Z 10.6K followers, 13.6K engagements


""React for AI" is going to come from neither an LLM vendor nor a frameworks vendor. It'll come from a company who builds a top [--] LLM application at scale. In the meantime buckle up for lots of distracting big announcements"  
[X Link](https://x.com/anyuser/status/1861124510470332790)  2024-11-25T19:07Z 10.6K followers, 34.8K engagements


"@rauchg @nextjs @vercel @nodejs Websockets"  
[X Link](https://x.com/anyuser/status/1872731035395604934)  2024-12-27T19:47Z 10.6K followers, [----] engagements


"things that most serverless providers dont handle that will be existential for AI in 2025: * Async execution (o1 frequently returns first byte after 1m) * Websockets (realtime audio) * Secure runtime code execution (AIs generate code where can I run it)"  
[X Link](https://x.com/anyuser/status/1873131006628352029)  2024-12-28T22:16Z 10.6K followers, 88.2K engagements


"To anti leet code folks -- just spent the last day optimizing a hash table implementation which makes a user visible action in our product 7x faster. Would not have been able to do that without knowledge of CS fundamentals (nor was today's AI)"  
[X Link](https://x.com/anyuser/status/1879327937306186030)  2025-01-15T00:41Z 10.6K followers, 78K engagements


"To me the term "agent" can be precisely defined as "a way to get budget allocated for my project". AI engineering is in its infancy and almost every sufficiently advanced team we work with abandons "agent architecture" in favor of "good engineering". Please stop calling a cron job an AI agent ๐Ÿ˜ญ Please stop calling a cron job an AI agent ๐Ÿ˜ญ"  
[X Link](https://x.com/ankrgyl/status/1879669350489371055)  2025-01-15T23:17Z 10.6K followers, 13.9K engagements


"I no longer speculate about how various systems work. Go to github clone locally open cursor and chat with the codebase to find out"  
[X Link](https://x.com/ankrgyl/status/1885101084315893994)  2025-01-30T23:01Z 10.6K followers, 124.3K engagements


"Reasoning for evals is such a killer use case As some of you have noticed avoid boomer prompts with o-series models. Instead be simple and direct with specific guidelines. Delimiters (xml tags) will help keep things clean for the model and improve output. Read the full best practices guide: https://t.co/mLi4M8woOs As some of you have noticed avoid boomer prompts with o-series models. Instead be simple and direct with specific guidelines. Delimiters (xml tags) will help keep things clean for the model and improve output. Read the full best practices guide: https://t.co/mLi4M8woOs"  
[X Link](https://x.com/ankrgyl/status/1890480151189700625)  2025-02-14T19:16Z 10.6K followers, 13.3K engagements


"When your startup starts to work EVERYTHING tries to pull you away from building on actually good product. Resist the temptation. Focus"  
[X Link](https://x.com/anyuser/status/1890506512759411179)  2025-02-14T21:00Z 10.6K followers, 16.2K engagements


"There are two somewhat rapidly diverging views in the AI eng world: * English is the new programming language (ppl will be writing prompts) * English is the new assembly language (programs will be generating prompts) Will be interesting to see how it plays out"  
[X Link](https://x.com/anyuser/status/1893723809263284424)  2025-02-23T18:05Z 10.6K followers, 205.3K engagements


"Super excited to announce probably our biggest release yet -- Brainstore Braintrust is now so fast that we have an instant search bar built into the logs page :) Braintrust is now 80x faster than any other LLM observability platform on the market. To achieve this benchmark we built Brainstore the first database designed to handle the high scale and complexity of AI data. https://t.co/7YESnVbQ1Q Braintrust is now 80x faster than any other LLM observability platform on the market. To achieve this benchmark we built Brainstore the first database designed to handle the high scale and complexity"  
[X Link](https://x.com/anyuser/status/1896612570586550497)  2025-03-03T17:24Z 10.6K followers, 34.1K engagements


"I think the world wants what MCP offers but not quite the protocol definition. I'm both interested to see what its next iteration looks like (specifically native remote APIs) and whether it takes off despite the design shortcomings"  
[X Link](https://x.com/ankrgyl/status/1898095059381403954)  2025-03-07T19:35Z 10.6K followers, 13K engagements


"Something small that I am very proud of @OpenAI included us in their tracing docs And Braintrust is the only SDK that actually works "as intended". I.e. we fit neatly into the tracing processor abstraction which works across both logs and evals"  
[X Link](https://x.com/anyuser/status/1899546102212538765)  2025-03-11T19:40Z 10.6K followers, 26.3K engagements


"I explored this idea a bit this weekend and it's now a first class MCP server that you can start using right away. This video takes a super lazy prompt I wrote to auto generate queries and improves it w/ few shots. I just hit command enter a few times :) Build your own pseudo DSPy-ish prompt auto-optimiser in [--] steps with @braintrustdata: [--]. Grab Braintrusts OpenAPI and turn into MCP (there are OSS libs to do that) [--]. Plug-in that MCP to Cursor [--]. In Agent mode: please optimise this prompt (@src/prompt.ts) based on evals (pnpm Build your own pseudo DSPy-ish prompt auto-optimiser in [--] steps"  
[X Link](https://x.com/ankrgyl/status/1903988673093538271)  2025-03-24T01:54Z 10.6K followers, 15.1K engagements


"AI seems to multiply the quality of code that someone would write without AI. Bad programmer * AI = Lots of bad code Medium programmer * AI = Lots of medium code Great programmer * AI = Lots of great code"  
[X Link](https://x.com/ankrgyl/status/1903996985713270931)  2025-03-24T02:27Z 10.6K followers, 34.3K engagements


"People say you no longer need to learn how to code every couple years. I think it's pretty exciting that technology progresses so fast that you no longer need to code the same things you used to. But I've never spent more time writing code than I do today. Thanks to AI. I think you should learn to code. I think you should learn to code"  
[X Link](https://x.com/ankrgyl/status/1905407332181577962)  2025-03-27T23:51Z 10.6K followers, 12.8K engagements


"The #1 most useful thing I learned early on as a programmer: When something doesn't work assume I am wrong. Not the computer. Not the library. Not my colleagues. Not the vendor. Me"  
[X Link](https://x.com/ankrgyl/status/1913356044807663728)  2025-04-18T22:16Z 10.6K followers, 43.1K engagements


"As models get more powerful i find myself focusing more effort on context engineering which is the task of bringing the right information (in the right format) to the LLM. Context engineering is hard because it is pervasive. You need to engineer every layer of the stack to capture and make context available. Send too little context and the LLM wont know what to do. Too much and youre out of tokens or the LLM gets lost. Good context engineering caches well. Bad context engineering is both slow and expensive"  
[X Link](https://x.com/anyuser/status/1913766591910842619)  2025-04-20T01:28Z 10.6K followers, 172.2K engagements


"RAG is one flavor of context engineering. For those that say RAG is dead hopefully this framing makes it obvious why thats not the case"  
[X Link](https://x.com/ankrgyl/status/1913776821537329270)  2025-04-20T02:08Z 10.6K followers, [----] engagements


"the fact that k8s still exists and thrives is the exception not the norm. shitty software does not usually make it"  
[X Link](https://x.com/anyuser/status/1914161948910121220)  2025-04-21T03:39Z 10.6K followers, 10K engagements


"Random observations from recent AI coding adventures: * You'll have to rip o3 out of my cold dead hands. It's that good. I prefer to interact with it directly on (UX + feels like it's the right format for open ended chats) * Agents are now "good enough" to write an overwhelming majority of tests including hardcore stuff (like brainstore for us). They benefit from very specific instructions about what you are trying to test and picking up on existing patterns. * AI code review is good enough that it should be on for every project (we use @withgraphite diamond). I find a lot of value in the"  
[X Link](https://x.com/ankrgyl/status/1916990064955167232)  2025-04-28T22:57Z 10.6K followers, 21.2K engagements


"every moment I spend with @alanaagoyal is the best part of every day highly recommend marrying someone great https://t.co/OCUvGUznci highly recommend marrying someone great https://t.co/OCUvGUznci"  
[X Link](https://x.com/ankrgyl/status/1920218173053546727)  2025-05-07T20:44Z 10.6K followers, 56.2K engagements


"keys to effective agentic llms: * long context w/ caching * extremely accurate tool calls * reliable API perf"  
[X Link](https://x.com/ankrgyl/status/1922802294703186095)  2025-05-14T23:52Z 10.6K followers, [----] engagements


"Getting really good at AI coding is a pretty fun new challenge. My goal is to produce very high quality code that will stand the test of time faster than I could without AI or with "basic" AI (code completion). It's quite different than vibe coding"  
[X Link](https://x.com/ankrgyl/status/1924252775703273893)  2025-05-18T23:56Z 10.6K followers, [----] engagements


"I personally don't like the term "eval driven development" I think evals are the end goal. In many ways a good AI system is literally just a good set of evals. Or more generally you need to develop two workflows: * A way to create evals. This is: good o11y + scoring functions + discipline to package up user issues into datasets * A way to convert good evals into good AI systems. This is somewhat manual right now but if you have good evals this is relatively simple/fun experimentation. I think some of the stuff we're working on right now will make this even clearer. Or said another way you'll"  
[X Link](https://x.com/anyuser/status/1927493331203875084)  2025-05-27T22:33Z 10.6K followers, [----] engagements


"We're back baby Evals should be easy. Meet Loop the AI agent for automatic prompt dataset and scorer optimization. @aiDotEngineer https://t.co/frhLLLceNw Evals should be easy. Meet Loop the AI agent for automatic prompt dataset and scorer optimization. @aiDotEngineer https://t.co/frhLLLceNw"  
[X Link](https://x.com/ankrgyl/status/1930794030067294348)  2025-06-06T01:09Z 10.6K followers, [----] engagements


"Counter intuitively if you're good at something that is probably the thing you should put the most effort into improving because (a) you probably can improve it and (b) there are likely still people much better. Don't underestimate the cost of excellence"  
[X Link](https://x.com/ankrgyl/status/1931936764480123375)  2025-06-09T04:49Z 10.6K followers, [----] engagements


"I tried to get both Claude Code and Codex to add OTEL to a popular well implemented open source repo. Both failed and eventually gave up with snarky comments about how bad the OTEL libraries are :"  
[X Link](https://x.com/anyuser/status/1934415308800053485)  2025-06-16T00:58Z 10.6K followers, 13.3K engagements


""decade of agents" is probably the most insightful thing i've heard recently"  
[X Link](https://x.com/anyuser/status/1936506779342700830)  2025-06-21T19:29Z 10.6K followers, 13.8K engagements


"I think one of the most interesting opportunities over the next decade will be building simple elegant systems that LLMs can use and abstract away hard computer science problems. I've noticed over the past several months that tool use is prompt engineering. You HAVE to create beautiful elegant simple tool definitions otherwise agents get lost trying to navigate through a complex trajectory. The precedent for this predates AI however. Simple elegant systems that abstract away hard computer science problems have also been hugely important and valuable for human programmers (OS RDBMS"  
[X Link](https://x.com/anyuser/status/1941145387114332626)  2025-07-04T14:41Z 10.6K followers, [----] engagements


"i would suggest having two programming environments: sync and async. The sync one should use an IDE of your choice and optimize for your synchronous attention. The async one should use a background agent of your choice and let you work on a loop of request check in a few mins later review work iterate. Im sure theres fancier ways to do this but I just have two repos. One of the benefits of this architecture is that you can think deliberately about which tool is best in class for each workflow"  
[X Link](https://x.com/anyuser/status/1941879104074035558)  2025-07-06T15:17Z 10.6K followers, [----] engagements


"Gentle reminder We're hiring across a lot of roles including systems (brainstore) product/design eng infra (help us make aws/azure/gcp/k8s deployments gr8) support growth sales BD SE you name it. Anyone who wants to chat about roles this weekend I'm around"  
[X Link](https://x.com/anyuser/status/1943825989202837739)  2025-07-12T00:13Z 10.6K followers, 12.6K engagements


"CI environments are about to be the new hot thing Who is going to make it super easy to quickly give your agent a sandbox with your whole dev setup so it can iterate on its own outside of your computer Not an easy problem to nail"  
[X Link](https://x.com/anyuser/status/1947319679183417466)  2025-07-21T15:36Z 10.6K followers, 23K engagements


"something i've been thinking about recently there are no more engineers designers PMs etc there are product owners. product owners write code solicit feedback drive roadmap collaborate talk to customers answer support tickets etc"  
[X Link](https://x.com/anyuser/status/1948252078968889384)  2025-07-24T05:21Z 10.6K followers, 30.5K engagements


"@deedydas Low level systems / C are fantastic with LLMs. Most folks I know who dont find it useful are simply in denial (which to be fair is understandable)"  
[X Link](https://x.com/anyuser/status/1949137663866679766)  2025-07-26T16:00Z 10.6K followers, [----] engagements


"We are no longer charging per user. This applies to both free and pro plans. It is a privilege to be able to simplify our pricing. We've grown exponentially over the last year which has made it obvious that the more you eval & log the more value we can provide. Plain & simple"  
[X Link](https://x.com/ankrgyl/status/1951316115126108535)  2025-08-01T16:16Z 10.6K followers, [----] engagements


"i'm increasingly diverging my programming towards: * Code I write and understand literally every character * Code I don't write at all and only review. I never even manually edit it"  
[X Link](https://x.com/anyuser/status/1952125402257997864)  2025-08-03T21:52Z 10.6K followers, [----] engagements


"I've been surprised and impressed by how all of the leading agents are essentially while loops with llm + tool calls. As a systems nerd I've always wished that AI software (even pre-LLM) were modular. I love abstractions :) But the reality is the bitter lesson of AI systems is that the more you can push decision making & control flow to the model the better your system will get as models improve. Therefore the simplest architecture wins. Agents are transforming how we interact with technology but building them can feel like navigating a maze of frameworks. Some of the best agents follow a"  
[X Link](https://x.com/anyuser/status/1954288217274110130)  2025-08-09T21:06Z 10.6K followers, 93.2K engagements


"gosh it's wonderful to talk to a smart person who has spent many many hours working on a problem and learn things you couldn't even conceive otherwise"  
[X Link](https://x.com/ankrgyl/status/1958412384042918115)  2025-08-21T06:14Z 10.6K followers, 19K engagements


""llms all the way down" is the best way to build software now i dont think most realize this yet"  
[X Link](https://x.com/ankrgyl/status/1959312812104388817)  2025-08-23T17:52Z 10.6K followers, 50.6K engagements


"Fellow founders pay close attention to the VCs who react negatively to this tweet. They are essentially the non-consensus "hipsters" that you actually want to stay away from. The idea that non consensus investing is where the alpha is is actually quite dangerous in the early stage. Follow on capital tends to be more and more consensus aligned. The idea that non consensus investing is where the alpha is is actually quite dangerous in the early stage. Follow on capital tends to be more and more consensus aligned"  
[X Link](https://x.com/ankrgyl/status/1959778122657894844)  2025-08-25T00:41Z 10.6K followers, 46.2K engagements


"i continue to not be a fan of the direction model apis are headed. i think the beautiful thing about chat completions is that it does exactly one thing -- messages in and messages out -- and leaves the rest to the creativity of a developer. the next gen of APIs have lots of specific built-in flags for specific features (web search caching reasoning) and lots of portability issues across providers. i genuinely do not believe the motivation is lock in (even though it seems suspicious) and instead i think they're just trying to get us access to new stuff that has an evolving surface area. but a"  
[X Link](https://x.com/anyuser/status/1961525101658276196)  2025-08-29T20:23Z 10.6K followers, 59.3K engagements


"@JayaGup10 By the time you fine tune a model your competitor will prompt the latest gen LLM and produce better results"  
[X Link](https://x.com/ankrgyl/status/1961825076216766834)  2025-08-30T16:15Z 10.6K followers, [----] engagements


"fun hack: create two worktrees and have claude and codex race to solve a problem it's fun to watch them in parallel and see how each thinks. also to experience each UX"  
[X Link](https://x.com/ankrgyl/status/1962167740531400762)  2025-08-31T14:56Z 10.6K followers, [----] engagements


"talked to another great team today who ripped out their ai framework the story is the same every time -- most of the value of an abstraction is abstracting across LLMs the rest eventually weighs you down"  
[X Link](https://x.com/anyuser/status/1962377267340013724)  2025-09-01T04:49Z 10.6K followers, 107.1K engagements


"@pchamal pretty much everyone i talk to lands here: https://www.braintrust.dev/blog/agent-while-loop https://www.braintrust.dev/blog/agent-while-loop"  
[X Link](https://x.com/anyuser/status/1962378572271947967)  2025-09-01T04:54Z 10.6K followers, [----] engagements


"This is the exact opposite conclusion to draw from the acquisition. AI enables you to build much more dynamic products that evolve faster (and even automatically). The foundation underlying that is good evals. A/B tests are officially the way of the past. wow openai just bought @statsig evals are dead. a/b tests are the future of building AI products. wow openai just bought @statsig evals are dead. a/b tests are the future of building AI products"  
[X Link](https://x.com/anyuser/status/1962983886100824554)  2025-09-02T21:00Z 10.6K followers, 63.7K engagements


"As promised some more extended thoughts The tl;dr is that software is changing dramatically and one of the gifts of AI is that you can iterate at the speed of evals rather than A/B tests. Evals are non-trivial to build. figure them out and you win. https://www.braintrust.dev/blog/ab-testing-evals This is the exact opposite conclusion to draw from the acquisition. AI enables you to build much more dynamic products that evolve faster (and even automatically). The foundation underlying that is good evals. A/B tests are officially the way of the past."  
[X Link](https://x.com/anyuser/status/1963347774935814172)  2025-09-03T21:06Z 10.6K followers, 13.7K engagements


"cool to see @Replit ship something genuinely novel -- feels like the code space has been stuck in "terminal vs. ide" wars for the past couple quarters AI agents can prototype apps But shipping real software takes hours of testing debugging and refactoring. Agent [--] is [--] more autonomous it keeps going where others get stuck. The Full Self-Driving moment of software. https://t.co/z66nxQKieO AI agents can prototype apps But shipping real software takes hours of testing debugging and refactoring. Agent [--] is [--] more autonomous it keeps going where others get stuck. The Full Self-Driving moment of"  
[X Link](https://x.com/anyuser/status/1965843671515046309)  2025-09-10T18:23Z 10.6K followers, 27.9K engagements


"it takes an insane amount of work to structure a software project so that an agent can make large meaningful changes to it without careful review. i suspect that this flavor of "meta engineering" will be the new "software engineering""  
[X Link](https://x.com/anyuser/status/1969921508773711915)  2025-09-22T00:27Z 10.6K followers, 62.9K engagements


"a prediction in advance of dev day tomorrow: agent logic/control flow is going to move down into the API layer rather than live in your code. in other words the agent not chat turn will be the API. responses is already designed to support this"  
[X Link](https://x.com/ankrgyl/status/1974966982765363660)  2025-10-05T22:36Z 10.6K followers, 36.4K engagements


"If you love databases and you love basketball please DM me. Have an interesting opportunity"  
[X Link](https://x.com/ankrgyl/status/1976703326436909101)  2025-10-10T17:36Z 10.6K followers, 20.3K engagements


"mark my words door to door sales is making a comeback"  
[X Link](https://x.com/anyuser/status/1984712321336492301)  2025-11-01T20:01Z 10.6K followers, 76.8K engagements


"there is so much value in focus"  
[X Link](https://x.com/anyuser/status/1989017213282746825)  2025-11-13T17:07Z 10.6K followers, [----] engagements

Limited data mode. Full metrics available with subscription: lunarcrush.com/pricing

@ankrgyl Avatar @ankrgyl Ankur Goyal

Ankur Goyal posts on X about ai, in the, braintrust, products the most. They currently have [------] followers and [---] posts still getting attention that total [------] engagements in the last [--] hours.

Engagements: [------] #

Engagements Line Chart

  • [--] Week [-------] +37%
  • [--] Month [-------] -20%
  • [--] Months [---------] +132%
  • [--] Year [---------] +107%

Mentions: [--] #

Mentions Line Chart

  • [--] Week [--] -33%
  • [--] Month [--] -18%
  • [--] Months [---] +182%
  • [--] Year [---] +24%

Followers: [------] #

Followers Line Chart

  • [--] Week [------] +0.43%
  • [--] Month [------] +2.70%
  • [--] Months [------] +20%
  • [--] Year [------] +48%

CreatorRank: [-------] #

CreatorRank Line Chart

Social Influence

Social category influence technology brands 10.26% cryptocurrencies 9.4% finance 5.98% social networks 1.71% vc firms 0.85%

Social topic influence ai 25.64%, in the 10.26%, braintrust #15, products 6.84%, code 6.84%, llm 5.98%, prompt 5.13%, open ai 5.13%, systems 5.13%, super 4.27%

Top accounts mentioned or mentioned by @braintrustdata @braintrust @financialvice @eladgil @alanaagoyal @benhylak @thejaykhatri @zxytim @courtstarr @akionuernberger @pradyuprasad @basecasevc @saammotamedi @greylockvc @vercel @replit @andrewqu @cramforce @mikepmunroe @morganepaloma

Top assets mentioned Braintrust (BTRST)

Top Social Posts

Top posts by engagements in the last [--] hours

"The couple of cases it scored lower on are a bit ambiguous and IMO open to interpretation. For example in this case the output produces the correct answer but misses some of the "frills" in the ground truth I think this is a benchmark quirk. https://www.braintrust.dev/app/braintrust-labs/p/bash-evals/experiments/sql-glm-5-2f26a644c=sql-glm-5&r=13095d8a-250e-4af7-8b11-5d2c9f3e365d&s=13095d8a-250e-4af7-8b11-5d2c9f3e365d&fs=1"
X Link 2026-02-12T02:03Z 10.6K followers, [---] engagements

"why is glm5 missing on all the inference providers am i missing something"
X Link 2026-02-11T23:23Z 10.6K followers, 13.1K engagements

"what is safer from an ip standpoint use american models and let the LLM provider do who knows what with your API requests use chinese models that you can run on your own GPUs"
X Link 2026-02-12T05:15Z 10.6K followers, 13.2K engagements

"Despite modern LLMs' familiarity with bash it turns out that they're pretty darn great at writing SQL. In fact many highly unstructured problems mapped to bash are both less accurate and less token efficient than just giving an agent SQL. We dug into this in depth"
X Link 2026-01-22T20:51Z 10.6K followers, 30K engagements

"We did a super fun exploration of this with @andrewqu and @cramforce optimized an agent that used sql bash and both. Ultimately the bash+sql agent was reliably the highest accuracy with some trade offs in cost/latency. https://www.braintrust.dev/blog/bash-agent-evals https://www.braintrust.dev/blog/bash-agent-evals"
X Link 2026-01-22T20:51Z 10.6K followers, [----] engagements

"A few learnings: * Looking at traces together is critical. Question every eng decision and experiment * Our test data had tons of mistakes. LOOK AT THE DATA. * You do not need much code to build a really powerful agent (this is just AI SDK + Braintrust + Claude Code)"
X Link 2026-01-22T20:51Z 10.6K followers, [----] engagements

"I am very excited about this model but at least on our fairly practical bash-sql-eval it was not competitive with sonnet [---]. it is much faster and uses fewer tokens though so i suspect there's room to prompt it to do more work and increase performance. ๐Ÿฅ Meet Kimi K2.5 Open-Source Visual Agentic Intelligence. ๐Ÿ”น Global SOTA on Agentic Benchmarks: HLE full set (50.2%) BrowseComp (74.9%) ๐Ÿ”น Open-source SOTA on Vision and Coding: MMMU Pro (78.5%) VideoMMMU (86.6%) SWE-bench Verified (76.8%) ๐Ÿ”น Code with Taste: turn chats https://t.co/wp6JZS47bN ๐Ÿฅ Meet Kimi K2.5 Open-Source Visual Agentic"
X Link 2026-01-27T22:48Z 10.6K followers, [----] engagements

"here's an example of a relatively silly error. it thinks the total database is [-----] events rather than [------] (which gpt-4.1-mini the grader successfully dinged it on) https://www.braintrust.dev/app/braintrust-labs/p/bash-evals/experiments/sql-kimi-k2.5c=sql-claude-sonnet-4-5&diff=offbetween_experiments&r=aa7e5bdd-376e-4175-91e0-eed778532671&s=aa7e5bdd-376e-4175-91e0-eed778532671 https://www.braintrust.dev/app/braintrust-labs/p/bash-evals/experiments/sql-kimi-k2.5c=sql-claude-sonnet-4-5&diff=offbetween_experiments&r=aa7e5bdd-376e-4175-91e0-eed778532671&s=aa7e5bdd-376e-4175-91e0-eed778532671"
X Link 2026-01-27T22:48Z 10.6K followers, [---] engagements

"@zxytim thinking. at least there are thinking outputs in the results we got back from the model. feel free to poke around the code or eval results https://www.braintrust.dev/app/braintrust-labs/p/bash-evals/experiments/sql-kimi-k2.5c=&diff=offbetween_experiments&r=aa7e5bdd-376e-4175-91e0-eed778532671&s=aa7e5bdd-376e-4175-91e0-eed778532671 https://github.com/braintrustdata/bash-agent-evals https://www.braintrust.dev/app/braintrust-labs/p/bash-evals/experiments/sql-kimi-k2.5c=&diff=offbetween_experiments&r=aa7e5bdd-376e-4175-91e0-eed778532671&s=aa7e5bdd-376e-4175-91e0-eed778532671"
X Link 2026-01-28T01:11Z 10.6K followers, [---] engagements

"whats the best linux laptop hardware (assuming you can max out dont care about portability)"
X Link 2026-01-28T02:43Z 10.6K followers, [----] engagements

"@mikepmunroe @braintrust Let me know when you try it out. Would love to hear your feedback ๐Ÿ™"
X Link 2026-01-28T16:06Z 10.6K followers, [---] engagements

"boring: coding agents that do even more stuff for me in the background exciting: new UIs/workflows that help me do the new parts of my job better. for example personal code review. https://twitter.com/i/web/status/2016962416844951807 https://twitter.com/i/web/status/2016962416844951807"
X Link 2026-01-29T19:51Z 10.6K followers, [----] engagements

"@financialvice @benhylak Unfortunately 100% of my coding time is spent making @braintrust better and this is outside of our product scope"
X Link 2026-01-29T19:57Z 10.6K followers, [---] engagements

"@braintrust events are popping evals are a critical step in shipping quality AI. And builders are paying attention cos quality doesnt improve by accident. @braintrust evals on tap SF. https://t.co/XT7p9S9rZa evals are a critical step in shipping quality AI. And builders are paying attention cos quality doesnt improve by accident. @braintrust evals on tap SF. https://t.co/XT7p9S9rZa"
X Link 2026-01-30T17:37Z 10.6K followers, [----] engagements

"The "agent line" is the capability threshold of an agent to independently perform engineering work. Many tasks (eg can be one-shot) are below the "agent line" As a human engineer you need to be "above" not "below" the agent line. This takes an attitude and focus adjustment"
X Link 2026-02-03T23:18Z 10.6K followers, [----] engagements

"@courtstarr @eladgil @alanaagoyal @braintrust ๐Ÿ™ DM me your feedback if you try us out"
X Link 2026-02-03T23:21Z 10.6K followers, [---] engagements

"everyone should do this. open up codex paste this tweet and have it audit your code. Here's a [--] GB memory reduction for very long Claude Code sessions Before: () = controller.abort() Fix: controller.abort.bind(controller) https://t.co/4tsFe4VJVL Here's a [--] GB memory reduction for very long Claude Code sessions Before: () = controller.abort() Fix: controller.abort.bind(controller) https://t.co/4tsFe4VJVL"
X Link 2026-02-04T19:34Z 10.6K followers, 32K engagements

"AI-powered coding bloats the illusion of work more significantly than it increases the output itself. Of course it does (or can) increase output. But it's easy to be tricked by the illusion of work and we have to learn to un-train our intuition that code - product output"
X Link 2026-02-04T19:58Z 10.6K followers, 29K engagements

"everything @morgane_paloma wrote here applies to building great products too at @braintrust there is a very strong continuum from product--marketing. it's one thing. Thoughts on taste (from a marketing pov): * Quality is the baseline - Taste starts with where you set the bar. You dont ship work you wouldnt put your name on you notice sloppiness others normalize and you feel friction when something is almost right but not quite. * Thoughts on taste (from a marketing pov): * Quality is the baseline - Taste starts with where you set the bar. You dont ship work you wouldnt put your name on you"
X Link 2026-02-04T22:10Z 10.6K followers, [----] engagements

"@aidenybai i think fundamentally it demonstrates that you can design an experience from first principles that is powerful and surprisingly simple. the popular UIs are all quite bloated so its refreshing to be able to accomplish just as much without them. imo UI will win long term"
X Link 2026-02-05T17:12Z 10.6K followers, [----] engagements

"AI has gone from weights to APIs to a rich and thriving ecosystem of models SDKs services and agents. Excited to build more and more integrations into all of these systems so you can evaluate and observe them with Braintrust"
X Link 2026-02-05T22:57Z 10.6K followers, [----] engagements

"In-product chat is great but sometimes it's not enough. Loop now lets you directly escalate to our support team when you need more help"
X Link 2026-02-06T01:09Z 10.6K followers, [----] engagements

"I'm quite excited about this concept but I think it is risky to compare it to a sandbox. I found a trivial exploit within like [--] minutes of poking around the source code. Fuck it a bit early but here goes: Monty: a new python implementation from scratch in rust for LLMs to run code without host access. Startup time measured in single digit microseconds not seconds. @mitsuhiko here's another sandbox/not-sandbox to be snarky about ๐Ÿ˜œ Thanks Fuck it a bit early but here goes: Monty: a new python implementation from scratch in rust for LLMs to run code without host access. Startup time measured"
X Link 2026-02-06T22:42Z 10.6K followers, [----] engagements

"the interruption UX in codex has gotten really good. i think this is the new standard"
X Link 2026-02-08T19:21Z 10.6K followers, [----] engagements

"over: database-wrapper software not over: llm-wrapper software"
X Link 2026-02-09T03:05Z 10.6K followers, [----] engagements

"trying to see if [---] has a sense of humor"
X Link 2026-02-09T03:12Z 10.6K followers, [----] engagements

"this seems like a very clear distinction to me RLMs force the agent to explicitly reason about what enters context vs what is managed as opaque state. @Teknium The following are not standard in a coding agent: [--]. The user prompt P itself (not just external data) is a symbolic object in the environment. The model is not allowed to grep/read long snippets from P. [--]. The model has to write recursive code (that calls LMs) to understand or https://t.co/DzPIcfHAuA @Teknium The following are not standard in a coding agent: [--]. The user prompt P itself (not just external data) is a symbolic object in"
X Link 2026-02-09T15:39Z 10.6K followers, 11.3K engagements

"@AkioNuernberger @braintrust Sorry about that. This was not intentional and will be fixed ASAP"
X Link 2026-02-10T05:29Z 10.6K followers, [----] engagements

"@samrags_ Having worked closely with them and visited the office many times I think it's quite simple: * Important problem for customers * Incredibly smart colleagues * Fun / energetic vibes It's just a great environment"
X Link 2026-02-10T20:31Z 10.6K followers, [----] engagements

"Full text search is super fast but often limited. You usually have to provide an exact set of tokens for the engine to consecutively match. Until now We just shipped support for '%' which even works in the realtime search bar :)"
X Link 2026-02-04T00:04Z 10.6K followers, [----] engagements

"GLM5 is an impressive model. It's the first OSS model to perform competitively well to a leading commercial model (claude sonnet 4.5) on our bash eval"
X Link 2026-02-12T02:03Z 10.6K followers, [----] engagements

"It's also fast. I'm hoping that the inference providers will compete to make it both ultra-fast and somewhat-cheap. If they do. I think there is a very real chance this thing ends up in a lot of products"
X Link 2026-02-12T02:03Z 10.6K followers, [---] engagements

"remember when docker came out there were tons of PaaS startups (incl. dotCloud which became Docker) and none had much traction then Solomon gave a talk about Docker and like [--] week later everyone was using it i feel like that's about to happen for sandboxes"
X Link 2026-02-12T04:45Z 10.6K followers, [----] engagements

"@PradyuPrasad People are often worried about tampering (eg imagine a model misbehaves under very specific conditions that are designed to mess with your biz)"
X Link 2026-02-12T05:22Z 10.6K followers, [---] engagements

"@theJayKhatri not sure i agree. my local dev loop (even when not handwriting code) is significantly faster than any sandbox solution i've tried. i'm sure this is not true of expert homegrown solutions but is not yet true of commercial solutions"
X Link 2026-02-12T07:04Z 10.6K followers, [---] engagements

"Still digesting minimax's m2.5 but one interesting snippet is that they chose to emulate the Anthropic API vs. the OpenAI one. First time I've seen this"
X Link 2026-02-12T19:55Z 10.6K followers, [---] engagements

"Quick benchmark suggests minimax m2.5 is close to GLM5 and Sonnet. I'm curious what the US inference provider price will be but with current available numbers it's 3x cheaper than GLM5 (which is 3x cheaper than Sonnet) GLM5 is an impressive model. It's the first OSS model to perform competitively well to a leading commercial model (claude sonnet 4.5) on our bash eval. https://t.co/zujV3IQ1lU GLM5 is an impressive model. It's the first OSS model to perform competitively well to a leading commercial model (claude sonnet 4.5) on our bash eval. https://t.co/zujV3IQ1lU"
X Link 2026-02-12T20:54Z 10.6K followers, [---] engagements

"Amidst the chaos we built something: an AI proxy that lets you use a variety of providers (OpenAI Anthropic LLaMa2 Mistral and others) behind a single interface w/ caching & API key management"
X Link 2023-11-20T16:37Z 10.6K followers, 412.1K engagements

"The blog post goes into details on how it all works under the hood. The proxy is available for anyone to use with or without a @braintrustdata account for free. https://www.braintrustdata.com/blog/ai-proxy https://www.braintrustdata.com/blog/ai-proxy"
X Link 2023-11-20T16:37Z 10.6K followers, [----] engagements

"Very excited to announce that the proxy is now open source The response has been overwhelming and we have a simple framework for what we open source: if it's on the critical path of production it ought to be OSS https://github.com/braintrustdata/braintrust-proxy Amidst the chaos we built something: an AI proxy that lets you use a variety of providers (OpenAI Anthropic LLaMa2 Mistral and others) behind a single interface w/ caching & API key management. https://t.co/jJWiRElMHB https://github.com/braintrustdata/braintrust-proxy Amidst the chaos we built something: an AI proxy that lets you use"
X Link 2023-11-27T18:57Z 10.6K followers, 71.8K engagements

"Extremely proud of @alanaagoyal for being featured in Forbes She is an inspiration and is trailblazing the next generation of VCs who put in the work to help your company succeed. We are lucky to have @basecasevc as an investor in @braintrustdata. https://www.forbes.com/30-under-30/2024/venture-capital https://www.forbes.com/30-under-30/2024/venture-capital"
X Link 2023-11-28T15:44Z 10.6K followers, 31.9K engagements

"Not too long ago I announced a new company called @braintrustdata. Today I'm super excited to share our $5m seed round led by @saammotamedi at @GreylockVC"
X Link 2023-12-13T16:18Z 10.6K followers, 199.4K engagements

"New PMs (esp frmr eng) are often very worried about scope sequencing etc. Their intuition is to perfectly articulate the next few days of work into small tasks and closely monitor eng progress. This is the opposite of what you should do Great eng want visionary forward looking (quarters or years) well researched roadmaps that are rooted in market research and customer learnings. They are more than capable of sequencing their own work. Also incredibly motivating to work towards a big vision vs. incremental tasks"
X Link 2023-12-30T23:33Z 10.6K followers, 17.4K engagements

"It's truly wild how many small bugs you have to fix to build a great product"
X Link 2024-03-09T21:51Z 10.6K followers, 22.4K engagements

"Stare at race condition for 2h Go on a 5min walk Solve race in 30s"
X Link 2024-03-14T02:04Z 10.6K followers, [----] engagements

"Quick rant about RDBMS multi-tenancy and AI. Data warehouses are great if you control the shape of your data and hand-code schema definitions. And they are basically useless for multi-tenant products (like @datadoghq @splunk @braintrustdata etc.) where you have to ingest customer data. The recommended fixes -- "reshape your data" "add casts" "fix the schema" "create more tables" etc. -- simply do not work with multi-tenant data. In fact companies that scale like @datadoghq end up building tech like which overlap significantly with powerful tech buried/shackled inside of RDBMS. The raw query"
X Link 2024-04-04T17:39Z 10.6K followers, 28.7K engagements

"Super excited for @braintrustdata to be part of the [----] enterprise tech [--] list by @Wing_VC. We exist to serve our customers -- and feel very grateful to be recognized for the traction we've demonstrated so far by focusing on that. Excited for what's ahead in [----] and beyond"
X Link 2024-04-09T17:10Z 10.6K followers, 26.3K engagements

"Extremely excited to unveil our new landing page We wanted to elevate our message with visuals that speak to how tough it is to build effective AI products without great evals logging and prompt tooling"
X Link 2024-04-16T16:50Z 10.6K followers, 13.6K engagements

"If you're using AI on the internet it's likely eval'd by @braintrustdata"
X Link 2024-06-18T00:58Z 10.6K followers, 15.7K engagements

"Amazing to see a FastCompany article featuring @alanaagoyal who rarely speaks publicly about her approach to investing. We write a lot of code in the Goyal household :)"
X Link 2024-07-07T18:42Z 10.6K followers, 159.7K engagements

"Turns out Llama [---] can do tool calls with prompt engineering I did a deep dive on this and it is remarkably accurate (much more than any other open model we've tested). ๐Ÿ‘‡"
X Link 2024-07-26T19:14Z 10.6K followers, 17.9K engagements

"Not gonna name names but would highly recommend running evals on the same model (this is "llama-3.1-8B") across inference providers. Looks like massive difference in quality"
X Link 2024-07-28T14:10Z 10.6K followers, 37.6K engagements

"Okay I had a chance to do a more thorough test of [--] providers (1-3). The results are pretty shocking ๐Ÿคฏ Details and code below"
X Link 2024-07-29T21:32Z 10.6K followers, 47.7K engagements

"Today is @braintrustdata's [--] year bday Things I'm most proud of: * Supporting the world's best AI products: @zapier @NotionHQ @vercel @airtable @coda_hq @Replit @browsercompany @Superhuman and many others * With humble smart down-to-earth colleagues and investors"
X Link 2024-08-30T23:45Z 10.6K followers, 21K engagements

"I deeply relate to this post -- I was "VPE" at [--] and "CEO" at [--]. The advice I got was "stop coding" "hire PMs" "delegate everything". Listening to that crap was catastrophic. Now I spend all day talking to users writing code/content and collaborating with ICs. Founder Mode: https://t.co/3hOnlKOJBi Founder Mode: https://t.co/3hOnlKOJBi"
X Link 2024-09-01T18:13Z 10.6K followers, 95K engagements

"Lots of noise about how product doesnt matter anymore because AI. only distribution As someone who is all in on AI coding it is still extremely hard to build a good product. Code volume was never the bottleneck"
X Link 2024-09-07T14:39Z 10.6K followers, [----] engagements

"some predictions on o1 and what it means for ai eng: * more evidence that convoluted/over complicated agent frameworks are not the future * more english fewer programs * expect async to be the next streaming"
X Link 2024-09-12T20:18Z 10.6K followers, [----] engagements

"Object storage is getting so good that any database system not built on object storage will be obsolete in a few years"
X Link 2024-09-28T21:48Z 10.6K followers, 46.8K engagements

"Excited to share that we've raised $36m from @martin_casado at @a16z along with @saammotamedi @GreylockVC @eladgil @basecasevc to further our mission of helping developers build AI products that work. A bit more on what we're up to ๐Ÿงต"
X Link 2024-10-08T16:10Z 10.6K followers, 172.3K engagements

"If yesterday was not enough Braintrust for you -- I went on No priors with @eladgil to talk about the journey so far fundraise and what's ahead. Was especially fun to talk about learnings from Impira since Elad was an investor there too :)"
X Link 2024-10-09T16:00Z 10.6K followers, 10.4K engagements

"LLM-as-a-judge is a very powerful technique but it's difficult to get reliable results. The #1 mistake I see people make is to have an LLM produce a numeric score which like asking a human to rate 1-10 is not precise. We reproduce that and walk through how to do better :) LLM-as-a-judge scorers are a powerful tool you can use when you need to evaluate more complex responses to LLM calls. We published an OpenAI cookbook to work through different strategies for detecting hallucinations check it out in the @OpenAIDevs cookbook library. https://t.co/25FJfadcOe LLM-as-a-judge scorers are a"
X Link 2024-10-29T15:28Z 10.6K followers, 82.2K engagements

"If you're starting a company take customer interviews EXTREMELY seriously. This was prob the main thing we did well when starting @braintrustdata. Key things are: * Research as much as possible to avoid wasting time on basic context * Come with specific ideas. "What do you think of X" vs. "what should we build Tell me everything you've looked at" * Offer something: a demo early access to a product insights about the space"
X Link 2024-11-14T22:06Z 10.6K followers, [----] engagements

"If you want someone to help with your code the #1 thing you can do is create the smallest possible easiest to run repro. It feels like a huge pain in the ass but (a) you have more context and (b) the other person would have to do this anyway"
X Link 2024-11-17T19:26Z 10.6K followers, 35.6K engagements

"At some point the AI ecosystem is going to move beyond dual implementing everything in python and typescript"
X Link 2024-11-18T22:15Z 10.6K followers, 13.6K engagements

""React for AI" is going to come from neither an LLM vendor nor a frameworks vendor. It'll come from a company who builds a top [--] LLM application at scale. In the meantime buckle up for lots of distracting big announcements"
X Link 2024-11-25T19:07Z 10.6K followers, 34.8K engagements

"@rauchg @nextjs @vercel @nodejs Websockets"
X Link 2024-12-27T19:47Z 10.6K followers, [----] engagements

"things that most serverless providers dont handle that will be existential for AI in 2025: * Async execution (o1 frequently returns first byte after 1m) * Websockets (realtime audio) * Secure runtime code execution (AIs generate code where can I run it)"
X Link 2024-12-28T22:16Z 10.6K followers, 88.2K engagements

"To anti leet code folks -- just spent the last day optimizing a hash table implementation which makes a user visible action in our product 7x faster. Would not have been able to do that without knowledge of CS fundamentals (nor was today's AI)"
X Link 2025-01-15T00:41Z 10.6K followers, 78K engagements

"To me the term "agent" can be precisely defined as "a way to get budget allocated for my project". AI engineering is in its infancy and almost every sufficiently advanced team we work with abandons "agent architecture" in favor of "good engineering". Please stop calling a cron job an AI agent ๐Ÿ˜ญ Please stop calling a cron job an AI agent ๐Ÿ˜ญ"
X Link 2025-01-15T23:17Z 10.6K followers, 13.9K engagements

"I no longer speculate about how various systems work. Go to github clone locally open cursor and chat with the codebase to find out"
X Link 2025-01-30T23:01Z 10.6K followers, 124.3K engagements

"Reasoning for evals is such a killer use case As some of you have noticed avoid boomer prompts with o-series models. Instead be simple and direct with specific guidelines. Delimiters (xml tags) will help keep things clean for the model and improve output. Read the full best practices guide: https://t.co/mLi4M8woOs As some of you have noticed avoid boomer prompts with o-series models. Instead be simple and direct with specific guidelines. Delimiters (xml tags) will help keep things clean for the model and improve output. Read the full best practices guide: https://t.co/mLi4M8woOs"
X Link 2025-02-14T19:16Z 10.6K followers, 13.3K engagements

"When your startup starts to work EVERYTHING tries to pull you away from building on actually good product. Resist the temptation. Focus"
X Link 2025-02-14T21:00Z 10.6K followers, 16.2K engagements

"There are two somewhat rapidly diverging views in the AI eng world: * English is the new programming language (ppl will be writing prompts) * English is the new assembly language (programs will be generating prompts) Will be interesting to see how it plays out"
X Link 2025-02-23T18:05Z 10.6K followers, 205.3K engagements

"Super excited to announce probably our biggest release yet -- Brainstore Braintrust is now so fast that we have an instant search bar built into the logs page :) Braintrust is now 80x faster than any other LLM observability platform on the market. To achieve this benchmark we built Brainstore the first database designed to handle the high scale and complexity of AI data. https://t.co/7YESnVbQ1Q Braintrust is now 80x faster than any other LLM observability platform on the market. To achieve this benchmark we built Brainstore the first database designed to handle the high scale and complexity"
X Link 2025-03-03T17:24Z 10.6K followers, 34.1K engagements

"I think the world wants what MCP offers but not quite the protocol definition. I'm both interested to see what its next iteration looks like (specifically native remote APIs) and whether it takes off despite the design shortcomings"
X Link 2025-03-07T19:35Z 10.6K followers, 13K engagements

"Something small that I am very proud of @OpenAI included us in their tracing docs And Braintrust is the only SDK that actually works "as intended". I.e. we fit neatly into the tracing processor abstraction which works across both logs and evals"
X Link 2025-03-11T19:40Z 10.6K followers, 26.3K engagements

"I explored this idea a bit this weekend and it's now a first class MCP server that you can start using right away. This video takes a super lazy prompt I wrote to auto generate queries and improves it w/ few shots. I just hit command enter a few times :) Build your own pseudo DSPy-ish prompt auto-optimiser in [--] steps with @braintrustdata: [--]. Grab Braintrusts OpenAPI and turn into MCP (there are OSS libs to do that) [--]. Plug-in that MCP to Cursor [--]. In Agent mode: please optimise this prompt (@src/prompt.ts) based on evals (pnpm Build your own pseudo DSPy-ish prompt auto-optimiser in [--] steps"
X Link 2025-03-24T01:54Z 10.6K followers, 15.1K engagements

"AI seems to multiply the quality of code that someone would write without AI. Bad programmer * AI = Lots of bad code Medium programmer * AI = Lots of medium code Great programmer * AI = Lots of great code"
X Link 2025-03-24T02:27Z 10.6K followers, 34.3K engagements

"People say you no longer need to learn how to code every couple years. I think it's pretty exciting that technology progresses so fast that you no longer need to code the same things you used to. But I've never spent more time writing code than I do today. Thanks to AI. I think you should learn to code. I think you should learn to code"
X Link 2025-03-27T23:51Z 10.6K followers, 12.8K engagements

"The #1 most useful thing I learned early on as a programmer: When something doesn't work assume I am wrong. Not the computer. Not the library. Not my colleagues. Not the vendor. Me"
X Link 2025-04-18T22:16Z 10.6K followers, 43.1K engagements

"As models get more powerful i find myself focusing more effort on context engineering which is the task of bringing the right information (in the right format) to the LLM. Context engineering is hard because it is pervasive. You need to engineer every layer of the stack to capture and make context available. Send too little context and the LLM wont know what to do. Too much and youre out of tokens or the LLM gets lost. Good context engineering caches well. Bad context engineering is both slow and expensive"
X Link 2025-04-20T01:28Z 10.6K followers, 172.2K engagements

"RAG is one flavor of context engineering. For those that say RAG is dead hopefully this framing makes it obvious why thats not the case"
X Link 2025-04-20T02:08Z 10.6K followers, [----] engagements

"the fact that k8s still exists and thrives is the exception not the norm. shitty software does not usually make it"
X Link 2025-04-21T03:39Z 10.6K followers, 10K engagements

"Random observations from recent AI coding adventures: * You'll have to rip o3 out of my cold dead hands. It's that good. I prefer to interact with it directly on (UX + feels like it's the right format for open ended chats) * Agents are now "good enough" to write an overwhelming majority of tests including hardcore stuff (like brainstore for us). They benefit from very specific instructions about what you are trying to test and picking up on existing patterns. * AI code review is good enough that it should be on for every project (we use @withgraphite diamond). I find a lot of value in the"
X Link 2025-04-28T22:57Z 10.6K followers, 21.2K engagements

"every moment I spend with @alanaagoyal is the best part of every day highly recommend marrying someone great https://t.co/OCUvGUznci highly recommend marrying someone great https://t.co/OCUvGUznci"
X Link 2025-05-07T20:44Z 10.6K followers, 56.2K engagements

"keys to effective agentic llms: * long context w/ caching * extremely accurate tool calls * reliable API perf"
X Link 2025-05-14T23:52Z 10.6K followers, [----] engagements

"Getting really good at AI coding is a pretty fun new challenge. My goal is to produce very high quality code that will stand the test of time faster than I could without AI or with "basic" AI (code completion). It's quite different than vibe coding"
X Link 2025-05-18T23:56Z 10.6K followers, [----] engagements

"I personally don't like the term "eval driven development" I think evals are the end goal. In many ways a good AI system is literally just a good set of evals. Or more generally you need to develop two workflows: * A way to create evals. This is: good o11y + scoring functions + discipline to package up user issues into datasets * A way to convert good evals into good AI systems. This is somewhat manual right now but if you have good evals this is relatively simple/fun experimentation. I think some of the stuff we're working on right now will make this even clearer. Or said another way you'll"
X Link 2025-05-27T22:33Z 10.6K followers, [----] engagements

"We're back baby Evals should be easy. Meet Loop the AI agent for automatic prompt dataset and scorer optimization. @aiDotEngineer https://t.co/frhLLLceNw Evals should be easy. Meet Loop the AI agent for automatic prompt dataset and scorer optimization. @aiDotEngineer https://t.co/frhLLLceNw"
X Link 2025-06-06T01:09Z 10.6K followers, [----] engagements

"Counter intuitively if you're good at something that is probably the thing you should put the most effort into improving because (a) you probably can improve it and (b) there are likely still people much better. Don't underestimate the cost of excellence"
X Link 2025-06-09T04:49Z 10.6K followers, [----] engagements

"I tried to get both Claude Code and Codex to add OTEL to a popular well implemented open source repo. Both failed and eventually gave up with snarky comments about how bad the OTEL libraries are :"
X Link 2025-06-16T00:58Z 10.6K followers, 13.3K engagements

""decade of agents" is probably the most insightful thing i've heard recently"
X Link 2025-06-21T19:29Z 10.6K followers, 13.8K engagements

"I think one of the most interesting opportunities over the next decade will be building simple elegant systems that LLMs can use and abstract away hard computer science problems. I've noticed over the past several months that tool use is prompt engineering. You HAVE to create beautiful elegant simple tool definitions otherwise agents get lost trying to navigate through a complex trajectory. The precedent for this predates AI however. Simple elegant systems that abstract away hard computer science problems have also been hugely important and valuable for human programmers (OS RDBMS"
X Link 2025-07-04T14:41Z 10.6K followers, [----] engagements

"i would suggest having two programming environments: sync and async. The sync one should use an IDE of your choice and optimize for your synchronous attention. The async one should use a background agent of your choice and let you work on a loop of request check in a few mins later review work iterate. Im sure theres fancier ways to do this but I just have two repos. One of the benefits of this architecture is that you can think deliberately about which tool is best in class for each workflow"
X Link 2025-07-06T15:17Z 10.6K followers, [----] engagements

"Gentle reminder We're hiring across a lot of roles including systems (brainstore) product/design eng infra (help us make aws/azure/gcp/k8s deployments gr8) support growth sales BD SE you name it. Anyone who wants to chat about roles this weekend I'm around"
X Link 2025-07-12T00:13Z 10.6K followers, 12.6K engagements

"CI environments are about to be the new hot thing Who is going to make it super easy to quickly give your agent a sandbox with your whole dev setup so it can iterate on its own outside of your computer Not an easy problem to nail"
X Link 2025-07-21T15:36Z 10.6K followers, 23K engagements

"something i've been thinking about recently there are no more engineers designers PMs etc there are product owners. product owners write code solicit feedback drive roadmap collaborate talk to customers answer support tickets etc"
X Link 2025-07-24T05:21Z 10.6K followers, 30.5K engagements

"@deedydas Low level systems / C are fantastic with LLMs. Most folks I know who dont find it useful are simply in denial (which to be fair is understandable)"
X Link 2025-07-26T16:00Z 10.6K followers, [----] engagements

"We are no longer charging per user. This applies to both free and pro plans. It is a privilege to be able to simplify our pricing. We've grown exponentially over the last year which has made it obvious that the more you eval & log the more value we can provide. Plain & simple"
X Link 2025-08-01T16:16Z 10.6K followers, [----] engagements

"i'm increasingly diverging my programming towards: * Code I write and understand literally every character * Code I don't write at all and only review. I never even manually edit it"
X Link 2025-08-03T21:52Z 10.6K followers, [----] engagements

"I've been surprised and impressed by how all of the leading agents are essentially while loops with llm + tool calls. As a systems nerd I've always wished that AI software (even pre-LLM) were modular. I love abstractions :) But the reality is the bitter lesson of AI systems is that the more you can push decision making & control flow to the model the better your system will get as models improve. Therefore the simplest architecture wins. Agents are transforming how we interact with technology but building them can feel like navigating a maze of frameworks. Some of the best agents follow a"
X Link 2025-08-09T21:06Z 10.6K followers, 93.2K engagements

"gosh it's wonderful to talk to a smart person who has spent many many hours working on a problem and learn things you couldn't even conceive otherwise"
X Link 2025-08-21T06:14Z 10.6K followers, 19K engagements

""llms all the way down" is the best way to build software now i dont think most realize this yet"
X Link 2025-08-23T17:52Z 10.6K followers, 50.6K engagements

"Fellow founders pay close attention to the VCs who react negatively to this tweet. They are essentially the non-consensus "hipsters" that you actually want to stay away from. The idea that non consensus investing is where the alpha is is actually quite dangerous in the early stage. Follow on capital tends to be more and more consensus aligned. The idea that non consensus investing is where the alpha is is actually quite dangerous in the early stage. Follow on capital tends to be more and more consensus aligned"
X Link 2025-08-25T00:41Z 10.6K followers, 46.2K engagements

"i continue to not be a fan of the direction model apis are headed. i think the beautiful thing about chat completions is that it does exactly one thing -- messages in and messages out -- and leaves the rest to the creativity of a developer. the next gen of APIs have lots of specific built-in flags for specific features (web search caching reasoning) and lots of portability issues across providers. i genuinely do not believe the motivation is lock in (even though it seems suspicious) and instead i think they're just trying to get us access to new stuff that has an evolving surface area. but a"
X Link 2025-08-29T20:23Z 10.6K followers, 59.3K engagements

"@JayaGup10 By the time you fine tune a model your competitor will prompt the latest gen LLM and produce better results"
X Link 2025-08-30T16:15Z 10.6K followers, [----] engagements

"fun hack: create two worktrees and have claude and codex race to solve a problem it's fun to watch them in parallel and see how each thinks. also to experience each UX"
X Link 2025-08-31T14:56Z 10.6K followers, [----] engagements

"talked to another great team today who ripped out their ai framework the story is the same every time -- most of the value of an abstraction is abstracting across LLMs the rest eventually weighs you down"
X Link 2025-09-01T04:49Z 10.6K followers, 107.1K engagements

"@pchamal pretty much everyone i talk to lands here: https://www.braintrust.dev/blog/agent-while-loop https://www.braintrust.dev/blog/agent-while-loop"
X Link 2025-09-01T04:54Z 10.6K followers, [----] engagements

"This is the exact opposite conclusion to draw from the acquisition. AI enables you to build much more dynamic products that evolve faster (and even automatically). The foundation underlying that is good evals. A/B tests are officially the way of the past. wow openai just bought @statsig evals are dead. a/b tests are the future of building AI products. wow openai just bought @statsig evals are dead. a/b tests are the future of building AI products"
X Link 2025-09-02T21:00Z 10.6K followers, 63.7K engagements

"As promised some more extended thoughts The tl;dr is that software is changing dramatically and one of the gifts of AI is that you can iterate at the speed of evals rather than A/B tests. Evals are non-trivial to build. figure them out and you win. https://www.braintrust.dev/blog/ab-testing-evals This is the exact opposite conclusion to draw from the acquisition. AI enables you to build much more dynamic products that evolve faster (and even automatically). The foundation underlying that is good evals. A/B tests are officially the way of the past."
X Link 2025-09-03T21:06Z 10.6K followers, 13.7K engagements

"cool to see @Replit ship something genuinely novel -- feels like the code space has been stuck in "terminal vs. ide" wars for the past couple quarters AI agents can prototype apps But shipping real software takes hours of testing debugging and refactoring. Agent [--] is [--] more autonomous it keeps going where others get stuck. The Full Self-Driving moment of software. https://t.co/z66nxQKieO AI agents can prototype apps But shipping real software takes hours of testing debugging and refactoring. Agent [--] is [--] more autonomous it keeps going where others get stuck. The Full Self-Driving moment of"
X Link 2025-09-10T18:23Z 10.6K followers, 27.9K engagements

"it takes an insane amount of work to structure a software project so that an agent can make large meaningful changes to it without careful review. i suspect that this flavor of "meta engineering" will be the new "software engineering""
X Link 2025-09-22T00:27Z 10.6K followers, 62.9K engagements

"a prediction in advance of dev day tomorrow: agent logic/control flow is going to move down into the API layer rather than live in your code. in other words the agent not chat turn will be the API. responses is already designed to support this"
X Link 2025-10-05T22:36Z 10.6K followers, 36.4K engagements

"If you love databases and you love basketball please DM me. Have an interesting opportunity"
X Link 2025-10-10T17:36Z 10.6K followers, 20.3K engagements

"mark my words door to door sales is making a comeback"
X Link 2025-11-01T20:01Z 10.6K followers, 76.8K engagements

"there is so much value in focus"
X Link 2025-11-13T17:07Z 10.6K followers, [----] engagements

Limited data mode. Full metrics available with subscription: lunarcrush.com/pricing

@ankrgyl
/creator/twitter::ankrgyl