[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

# ![@_avichawla Avatar](https://lunarcrush.com/gi/w:26/cr:twitter::1175166450832687104.png) @_avichawla Avi Chawla

Avi Chawla posts on X most often about llm, for your, should be, and mit. They currently have XXXXXX followers and XX posts still getting attention, totaling XXXXXX engagements in the last XX hours.

### Engagements: XXXXXX [#](/creator/twitter::1175166450832687104/interactions)
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1175166450832687104/c:line/m:interactions.svg)

- X Week XXXXXXX -XX%
- X Month XXXXXXXXX -XX%
- X Months XXXXXXXXXX +150%
- X Year XXXXXXXXXX +8,320%

### Mentions: XX [#](/creator/twitter::1175166450832687104/posts_active)
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1175166450832687104/c:line/m:posts_active.svg)

- X Week XX -XX%
- X Month XX -XX%
- X Months XXX +190%
- X Year XXX +1,892%

### Followers: XXXXXX [#](/creator/twitter::1175166450832687104/followers)
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1175166450832687104/c:line/m:followers.svg)

- X Week XXXXXX +0.45%
- X Month XXXXXX +3.10%
- X Months XXXXXX +101%
- X Year XXXXXX +1,948%

### CreatorRank: XXXXXXX [#](/creator/twitter::1175166450832687104/influencer_rank)
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1175166450832687104/c:line/m:influencer_rank.svg)

### Social Influence [#](/creator/twitter::1175166450832687104/influence)
---

**Social category influence**
[technology brands](/list/technology-brands)  XXXX% [stocks](/list/stocks)  XXXX% [social networks](/list/social-networks)  XXXX%

**Social topic influence**
[llm](/topic/llm) #21, [for your](/topic/for-your) 4.76%, [should be](/topic/should-be) 2.38%, [mit](/topic/mit) 2.38%, [instead of](/topic/instead-of) 2.38%, [inference](/topic/inference) 2.38%, [token](/topic/token) 2.38%, [$googl](/topic/$googl) 2.38%, [sim](/topic/sim) 2.38%, [stack](/topic/stack) XXXX%

**Top accounts mentioned or mentioned by**
@akshaypachaar @ianisarobot @ronald_vanloon @suhrabautomates @akshay_pachaar @grok @howardaulsbrook @lokeshsivakumar @davethackeray @1__________l1l_ @otto_katz_456 @whitewoodcity @freedrthug @realmikeescarn @_junaidkhalid1 @duru_tobe

**Top assets mentioned**
[Alphabet Inc Class A (GOOGL)](/topic/$googl)
### Top Social Posts [#](/creator/twitter::1175166450832687104/posts)
---
Top posts by engagements in the last XX hours

"This should be impossible You can clean any ML dataset in just three lines of code. Flag outliers find label errors and more across: - Any data (tabular text image etc.) - Any task (classification entity recognition etc.) XXX% open-source built by MIT researchers"  
[X Link](https://x.com/_avichawla/status/1977629357108867396) [@_avichawla](/creator/x/_avichawla) 2025-10-13T06:55Z 53K followers, 29.4K engagements
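
The post doesn't name the tool, but the description (three lines of code, label errors and outliers, built by MIT researchers) matches the open-source cleanlab library. Treating that as an assumption, here is a minimal sketch of its Datalab audit on a toy tabular dataset:

```python
# Minimal sketch, assuming the unnamed tool is cleanlab's Datalab API.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from cleanlab import Datalab

X, y = load_iris(return_X_y=True)
# Out-of-sample predicted probabilities drive the issue detection
pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, method="predict_proba"
)

data = pd.DataFrame(X)
data["label"] = y

lab = Datalab(data=data, label_name="label")  # wrap any labeled dataset
lab.find_issues(pred_probs=pred_probs)        # flag label errors, outliers, duplicates
lab.report()                                  # print a summary of detected issues
```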


"Fine-tuning LLM Agents without Fine-tuning LLMs Imagine improving your AI agent's performance from experience without ever touching the model weights. It's just like how humans remember past episodes and learn from them. That's precisely what Memento does. The core concept: Instead of updating LLM weights Memento learns from experiences using memory. It reframes continual learning as memory-based online reinforcement learning over a memory-augmented MDP. Think of it as giving your agent a notebook to remember what worked and what didn't How does it work The system breaks down into two key"  
[X Link](https://x.com/_avichawla/status/1981246733322768780) [@_avichawla](/creator/x/_avichawla) 2025-10-23T06:30Z 53K followers, 56.3K engagements


"KV caching is a technique used to speed up LLM inference. Before understanding the internal details look at the inference speed difference in the video: - with KV caching X seconds - without KV caching XX seconds (5x slower) Let's dive in"  
[X Link](https://x.com/_avichawla/status/1949356658658005482) [@_avichawla](/creator/x/_avichawla) 2025-07-27T06:30Z 53K followers, 22.3K engagements
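
The with/without gap is easy to reproduce. A rough timing sketch using Hugging Face transformers (the model choice is illustrative; any small causal LM shows the effect):

```python
# Rough sketch: toggle the KV cache during generation and compare wall time.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
inputs = tok("KV caching speeds up decoding because", return_tensors="pt")

for use_cache in (True, False):
    start = time.perf_counter()
    with torch.no_grad():
        # With use_cache=False, keys/values are recomputed for every token
        model.generate(**inputs, max_new_tokens=128, use_cache=use_cache)
    print(f"use_cache={use_cache}: {time.perf_counter() - start:.2f}s")
```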


"To understand KV caching we must know how LLMs output tokens. - Transformer produces hidden states for all tokens. - Hidden states are projected to the vocab space. - Logits of the last token are used to generate the next token. - Repeat for subsequent tokens. Check this๐Ÿ‘‡"  
[X Link](https://x.com/_avichawla/status/1949356680871067925) [@_avichawla](/creator/x/_avichawla) 2025-07-27T06:30Z 53K followers, 19.3K engagements


"Google did it again First they launched ADK a fully open-source framework to build orchestrate evaluate and deploy production-grade Agentic systems. And now they have made it even powerful Google ADK is now fully compatible with all three major AI protocols out there: - MCP: To connect to external tools - A2A: To connect to other agents - AG-UI: To connect to users. AG-UI is the newest addition which is an open-source protocol that enables agents to collaborate with users. They worked with the AG-UI team to build this. It takes just two steps: - Define the agent with ADK with its tools state"  
[X Link](https://x.com/_avichawla/status/1975811218763096199) [@_avichawla](/creator/x/_avichawla) 2025-10-08T06:31Z 52.9K followers, 164.3K engagements
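
For reference, a minimal ADK agent definition looks roughly like the sketch below (from the google-adk package; the tool and instruction are illustrative placeholders, and the MCP/A2A/AG-UI wiring the post mentions is omitted):

```python
# Minimal sketch of a Google ADK agent; names and tool are hypothetical.
from google.adk.agents import Agent

def get_time(city: str) -> str:
    """Toy tool: return a canned time for a city."""
    return f"It is 12:00 in {city}."

root_agent = Agent(
    name="helper",
    model="gemini-2.0-flash",
    instruction="Answer questions, using tools when helpful.",
    tools=[get_time],  # plain Python functions become callable tools
)
```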


"10 MCP AI Agents and RAG projects for AI Engineers (with code):"  
[X Link](https://x.com/_avichawla/status/1911306413932163338) [@_avichawla](/creator/x/_avichawla) 2025-04-13T06:32Z 53K followers, 699.3K engagements


"How LLMs work clearly explained (with visuals):"  
[X Link](https://x.com/_avichawla/status/1942472125484523605) [@_avichawla](/creator/x/_avichawla) 2025-07-08T06:33Z 53K followers, 745.7K engagements


"Figma canvas to build AI agent workflows. Sim is a lightweight user-friendly platform for building AI agent workflows in minutes. It natively supports all major LLMs Vector DBs etc. XXX% open-source with 7k+ stars"  
[X Link](https://x.com/_avichawla/status/1957691571908038717) [@_avichawla](/creator/x/_avichawla) 2025-08-19T06:30Z 53K followers, 81.9K engagements


"DeepMind built a simple RAG technique that: - reduces hallucinations by XX% - improves answer relevancy by XX% Let's understand how to use it in RAG systems (with code):"  
[X Link](https://x.com/_avichawla/status/1958053890831872065) [@_avichawla](/creator/x/_avichawla) 2025-08-20T06:30Z 53K followers, 668.8K engagements


"I've been coding in Python for X years now. If I were to start over today here's a complete roadmap:"  
[X Link](https://x.com/_avichawla/status/1968926369599078503) [@_avichawla](/creator/x/_avichawla) 2025-09-19T06:33Z 53K followers, 221.2K engagements


"A great tool to estimate how much VRAM your LLMs actually need. Alter the hardware config quantization etc. it tells you about: - Generation speed (tokens/sec) - Precise memory allocation - System throughput etc. No more VRAM guessing"  
[X Link](https://x.com/_avichawla/status/1978710362557235495) [@_avichawla](/creator/x/_avichawla) 2025-10-16T06:31Z 53K followers, 22K engagements
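
The post doesn't name the tool, so as a stand-in, here is a back-of-envelope sketch of what such estimators compute: weight memory plus KV-cache memory dominates inference VRAM. This is my own rule of thumb, not the tool's formula:

```python
# Back-of-envelope VRAM estimate (rule-of-thumb sketch, not the post's tool).
def estimate_vram_gb(
    n_params_b: float,       # parameters, in billions
    bytes_per_param: float,  # 2 for fp16/bf16, 1 for int8, 0.5 for int4
    n_layers: int,
    hidden_dim: int,
    context_len: int,
    batch: int = 1,
) -> float:
    weights = n_params_b * 1e9 * bytes_per_param
    # KV cache: 2 tensors (K and V) per layer, fp16 (2 bytes) per value
    kv_cache = 2 * n_layers * context_len * hidden_dim * 2 * batch
    return (weights + kv_cache) / 1e9

# e.g. a 7B model in fp16 with a 4k context (Llama-like shape assumed)
print(f"{estimate_vram_gb(7, 2, 32, 4096, 4096):.1f} GB")
```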


"Keras now lets you quantize models with just one line of code You can either quantize your own models or any pre-trained model obtained from KerasHub. Simply run model.quantize(quantization_mode). Supports quantization to int4 int8 float8 and GPTQ modes"  
[X Link](https://x.com/_avichawla/status/1979435056251965809) [@_avichawla](/creator/x/_avichawla) 2025-10-18T06:31Z 53K followers, 20.8K engagements
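
Based on the API named in the post, a minimal sketch (the KerasHub preset is an illustrative choice):

```python
# Sketch of Keras one-line quantization on a KerasHub preset.
import keras_hub

model = keras_hub.models.GemmaCausalLM.from_preset("gemma2_instruct_2b_en")
model.quantize("int8")  # the post says "int4", "float8", and GPTQ also work
print(model.generate("Quantization shrinks models by", max_length=30))
```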


"The open-source RAG stack (2025):"  
[X Link](https://x.com/_avichawla/status/1980521956446462455) [@_avichawla](/creator/x/_avichawla) 2025-10-21T06:30Z 53K followers, 42.9K engagements


"@akshay_pachaar Zep is my go-to for memory. This is one of my local use cases where I have both Claude and Cursor connected via a common memory layer. My thread about it ๐Ÿ‘‡"  
[X Link](https://x.com/_avichawla/status/1978091449884233966) [@_avichawla](/creator/x/_avichawla) 2025-10-14T13:32Z 53K followers, XXX engagements


"Try it here"  
[X Link](https://x.com/_avichawla/status/1978710374540341407) [@_avichawla](/creator/x/_avichawla) 2025-10-16T06:31Z 53K followers, 1826 engagements


"The only MCP server you'll ever need MindsDB lets you query data from 200+ sources like Slack Gmail social platforms and more in both SQL and natural language. A federated query engine that comes with a built-in MCP server. XXX% open-source with 33k+ stars"  
[X Link](https://x.com/_avichawla/status/1944283926622875816) [@_avichawla](/creator/x/_avichawla) 2025-07-13T06:33Z 53K followers, 213.4K engagements


"The power of MCP explained in one picture Without MCP: - Every LLM app wrote its own tool integration - M apps & N tools = MN integrations With MCP: - Create an MCP server for your tool and plug it into an LLM app - You go from MN integrations to M+N integrations"  
[X Link](https://x.com/_avichawla/status/1966751224356892769) [@_avichawla](/creator/x/_avichawla) 2025-09-13T06:30Z 53K followers, 45.5K engagements
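
To make the M+N point concrete, a minimal sketch of a tool written once as an MCP server using the official Python SDK (server name and tool are illustrative):

```python
# Minimal MCP server sketch: the tool is defined once here, and any
# MCP-capable app can connect to it, instead of per-app glue code.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather-demo")

@mcp.tool()
def get_forecast(city: str) -> str:
    """Toy tool: return a canned forecast for a city."""
    return f"Sunny in {city}."

if __name__ == "__main__":
    mcp.run()  # expose the tool over the MCP protocol
```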


"Agents forget everything after each task Graphiti builds temporally-aware knowledge graphs for your AI agents. Integrating its MCP server with Claude/Cursor adds a powerful memory layer to all your AI interactions across apps. XXX% open-source with 18k+ stars"  
[X Link](https://x.com/_avichawla/status/1976536015969120312) [@_avichawla](/creator/x/_avichawla) 2025-10-10T06:31Z 53K followers, 58.7K engagements


"Researchers from Meta built a new RAG approach that: - outperforms LLaMA on XX RAG benchmarks. - has 30.85x faster time-to-first-token. - handles 16x larger context windows. - and it utilizes 2-4x fewer tokens. Here's the core problem with a typical RAG setup that Meta solves: Most of what we retrieve in RAG setups never actually helps the LLM. In classic RAG when a query arrives: - You encode it into a vector. - Fetch similar chunks from vector DB. - Dump the retrieved context into the LLM. It typically works but at a huge cost: - Most chunks contain irrelevant text. - The LLM has to process"  
[X Link](https://x.com/_avichawla/status/1977260787027919209) [@_avichawla](/creator/x/_avichawla) 2025-10-12T06:31Z 53K followers, 99.9K engagements
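
A toy sketch of the classic retrieve-and-stuff loop the post critiques (the embed() here is a random stand-in for a real embedding model), showing how everything retrieved lands in the prompt, relevant or not:

```python
# Toy "classic RAG" sketch: embed the query, fetch nearest chunks,
# stuff them into the prompt. embed() is a placeholder, not a real model.
import numpy as np

chunks = [
    "KV caching stores attention keys/values.",
    "RAG retrieves context before generation.",
    "GPUs prefer pinned host memory for transfers.",
]

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

index = np.stack([embed(c) for c in chunks])  # the "vector DB"

query = "How does RAG work?"
scores = index @ embed(query)                 # cosine similarity (unit vectors)
top = [chunks[i] for i in np.argsort(scores)[::-1][:2]]
prompt = "Context:\n" + "\n".join(top) + f"\n\nQuestion: {query}"
print(prompt)  # retrieved chunks are dumped in wholesale, relevant or not
```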


"Finally Python XXXX lets you disable GIL It's a big deal because earlier even if you wrote multi-threaded code Python could only run one thread at a time giving no performance benefit. But now Python can run your multi-threaded code in parallel. And uv fully supports it"  
[X Link](https://x.com/_avichawla/status/1977985594103140710) [@_avichawla](/creator/x/_avichawla) 2025-10-14T06:31Z 53K followers, 547.8K engagements
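
A quick way to see the effect on a free-threaded CPython build (installable with uv, e.g. `uv python install 3.13t`; the workload below is an illustrative CPU-bound loop):

```python
# Sketch: CPU-bound threads only speed up when the GIL is disabled.
import sys
import time
from concurrent.futures import ThreadPoolExecutor

def burn(n: int) -> int:
    total = 0
    for i in range(n):
        total += i * i
    return total

gil = getattr(sys, "_is_gil_enabled", lambda: True)()
print("GIL enabled:", gil)  # False on a free-threaded build

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as ex:
    list(ex.map(burn, [5_000_000] * 4))
print(f"4 CPU-bound threads took {time.perf_counter() - start:.2f}s")
# With the GIL disabled the threads run in parallel; on a standard
# build they serialize and take roughly 4x as long.
```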


"A time-complexity cheat sheet of XX ML algorithms: What's the inference time-complexity of KMeans"  
[X Link](https://x.com/_avichawla/status/1978353875494240260) [@_avichawla](/creator/x/_avichawla) 2025-10-15T06:54Z 53K followers, 32.9K engagements
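
For the record, the quiz answer: KMeans inference is a nearest-centroid lookup, O(k·d) per sample for k clusters and d features. A minimal sketch:

```python
# KMeans inference: assign a point to its nearest centroid, O(k * d).
import numpy as np

def assign(x: np.ndarray, centroids: np.ndarray) -> int:
    # distances to all k centroids: k sums over d dims -> O(k * d)
    d2 = ((centroids - x) ** 2).sum(axis=1)
    return int(np.argmin(d2))

centroids = np.random.rand(8, 4)   # k=8 clusters, d=4 features
print(assign(np.random.rand(4), centroids))
```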


"Youre in an ML Engineer interview at Stripe. The interviewer asks: "People often dispute transactions they actually made. How to build a supervised model that predicts fake disputes Theres no labeled data." You: "I'll flag cards with high dispute rates." Interview over. Here's what you missed: Active learning is a relatively easy and inexpensive way to build supervised models when you dont have annotated data to begin with. As the name suggests the idea is to build the model with active human feedback on examples it is struggling with. The visual below summarizes this. 1) Begin by manually"  
[X Link](https://x.com/_avichawla/status/1979072665194500539) [@_avichawla](/creator/x/_avichawla) 2025-10-17T06:31Z 53K followers, 40.4K engagements
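
A minimal uncertainty-sampling sketch of the loop the post describes (toy data; in practice the "oracle" step is a human annotator):

```python
# Active learning via uncertainty sampling, sketched on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, random_state=0)
labeled = np.zeros(len(X), dtype=bool)
labeled[np.random.default_rng(0).choice(len(X), 20, replace=False)] = True

for round_ in range(5):
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    probs = model.predict_proba(X[~labeled])
    # least-confident examples are the ones worth sending to a human
    uncertain = np.argsort(probs.max(axis=1))[:10]
    idx = np.flatnonzero(~labeled)[uncertain]
    labeled[idx] = True  # the "oracle" labels them (we reuse y here)
    print(f"round {round_}: {labeled.sum()} labeled, "
          f"acc={model.score(X[~labeled], y[~labeled]):.3f}")
```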


"Here's a neural net optimization trick that leads to 4x faster CPU to GPU transfers. Imagine an image classification task. - We define the network load the data and transform it. - In the training loop we transfer the data to the GPU and train. Here's the problem with this: If you look at the profiler: - Most of the time/resources will be allocated to the kernel (the actual training code). - However a significant amount of time will also be dedicated to data transfer from CPU to GPU (this appears under cudaMemcpyAsync). Reducing the data transfer is simple. Recall that the original dataset"  
[X Link](https://x.com/_avichawla/status/1979797527772930278) [@_avichawla](/creator/x/_avichawla) 2025-10-19T06:31Z 53K followers, 37K engagements
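
In PyTorch terms, the trick is pinned (page-locked) host memory plus non-blocking copies; a minimal sketch (shapes are illustrative):

```python
# Sketch: pinned host buffers + non_blocking copies overlap the
# cudaMemcpyAsync transfer with GPU compute.
import torch
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(
    torch.randn(512, 3, 224, 224), torch.zeros(512, dtype=torch.long)
)
loader = DataLoader(data, batch_size=64, pin_memory=True)  # page-locked buffers

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for xb, yb in loader:
    # non_blocking=True returns immediately; the copy overlaps compute
    xb = xb.to(device, non_blocking=True)
    yb = yb.to(device, non_blocking=True)
    # ... forward/backward pass would go here ...
    break
```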


"Finally researchers have open-sourced a new reasoning approach that actually prevents hallucinations in LLMs. It beats popular techniques like Chain-of-Thought and has a SOTA success rate of 90.2%. Here's the core problem with current techniques that this new approach solves: We have enough research to conclude that LLMs often struggle to assess what truly matters in a particular stage of a long multi-turn conversation. For instance when you give Agents a 2000-word system prompt filled with policies tone rules and behavioral dos and donts you expect them to follow it word by word. But heres"  
[X Link](https://x.com/_avichawla/status/1980159925109309799) [@_avichawla](/creator/x/_avichawla) 2025-10-20T06:31Z 53K followers, 89.4K engagements


"Finally someone fixed LLM hallucinations Parlant lets you build Agents that do not hallucinate and actually follow instructions. One of the things that makes this possible is the new reasoning approach that the open-source Parlant framework uses. Detailed explainer below:"  
[X Link](https://x.com/_avichawla/status/1980359702602215825) [@_avichawla](/creator/x/_avichawla) 2025-10-20T19:45Z 53K followers, 56.7K engagements


"Pytest for LLM Apps is finally here DeepEval turns LLM evals into a two-line test suite to help you identify the best models prompts and architecture for AI workflows (including MCPs). Learn the limitations of G-Eval and an alternative to it in the explainer below:"  
[X Link](https://x.com/_avichawla/status/1981076827591626943) [@_avichawla](/creator/x/_avichawla) 2025-10-22T19:14Z 53K followers, 11.5K engagements
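
A minimal sketch of the two-line idea with DeepEval (metric and threshold are illustrative; the metric needs a judge model configured, e.g. an OpenAI key, and the post's G-Eval alternative isn't named here):

```python
# Sketch of a DeepEval test case, run like any pytest test.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What does KV caching speed up?",
        actual_output="It speeds up LLM inference by reusing attention keys/values.",
    )
    # The two lines the post alludes to: build a case, assert on metrics
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```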


"Let's build a reasoning LLM using GRPO from scratch (100% local):"  
[X Link](https://x.com/_avichawla/status/1981609451498156413) [@_avichawla](/creator/x/_avichawla) 2025-10-24T06:31Z 53K followers, 14.5K engagements
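
The core of GRPO fits in a few lines: advantages are computed relative to a group of sampled completions for the same prompt, so no learned value network is needed. A sketch of just that step (not the full training loop):

```python
# GRPO's group-relative advantage: normalize each completion's reward
# against its group's mean and std.
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    return (rewards - rewards.mean()) / (rewards.std() + eps)

group_rewards = np.array([0.2, 0.9, 0.4, 0.1])  # e.g. correctness scores
print(grpo_advantages(group_rewards))            # positive => reinforce
```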


"You're in an ML Engineer interview at Apple. The interviewer asks: "Two models are XX% accurate. - Model A is XX% confident. - Model B is XX% confident. Which one would you pick" You: "Any would work since both have same accuracy." Interview over. Here's what you missed: Modern neural networks can be misleading. They are overconfident in their predictions. For instance I saw an experiment that used the CIFAR-100 dataset to compare LeNet with ResNet. LeNet produced: - Accuracy = XXXX - Average confidence = XXXX ResNet produced: - Accuracy = XXX - Average confidence = XXX Despite being more"  
[X Link](https://x.com/_avichawla/status/1981971955990544832) [@_avichawla](/creator/x/_avichawla) 2025-10-25T06:31Z 53K followers, 60.9K engagements
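
The accuracy-vs-confidence gap the post describes is usually quantified with expected calibration error (ECE); a minimal sketch:

```python
# Expected calibration error: bin predictions by confidence and average
# the |accuracy - confidence| gap, weighted by bin size.
import numpy as np

def ece(confidences: np.ndarray, correct: np.ndarray, n_bins: int = 10) -> float:
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            total += mask.mean() * gap
    return total

conf = np.array([0.95, 0.9, 0.85, 0.8, 0.99])
hit = np.array([1, 0, 1, 1, 0])
print(f"ECE = {ece(conf, hit):.3f}")  # large value => overconfident model
```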


"@IanIsARobot I doubt if that would be good enough. Going back to what I mentioned in the post the model would be XX% confident that it has written the correct code which turns out to be XX% accurate only. @grok any suggestions specifically for calibration in generative models"  
[X Link](https://x.com/_avichawla/status/1982070441998012729) [@_avichawla](/creator/x/_avichawla) 2025-10-25T13:03Z 53K followers, 1177 engagements
