@_avichawla Avi Chawla posts on X about ai, token, open ai, and agentic the most. They currently have XXXXXX followers and XXX posts still getting attention that total XXXXXX engagements in the last XX hours.
Social category influence: technology brands, finance, stocks, social networks
Social topic influence: ai, token #1640, open ai #742, agentic #64, llm #134, hidden, native #275, the first, if you, mcp server #64
Top assets mentioned: Alphabet Inc Class A (GOOGL), Spotify Technology (SPOT)
Top posts by engagements in the last XX hours
"I have been fine-tuning LLMs for over X years now Here are the top X LLM fine-tuning techniques explained with visuals: First of all what's so different about LLM finetuning Traditional finetuning is impractical for LLMs (billions of params; 100s GB). Since this kind of compute isn't accessible to everyone parameter-efficient finetuning (PEFT) came into existence. Before we go into details of each technique here's some background that will help you better understand these techniques: LLM weights are matrices of numbers adjusted during finetuning. Most PEFT techniques involve finding a"
X Link 2025-12-04T06:30Z 57.1K followers, 127.6K engagements
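The post is truncated, but LoRA is one of the best-known PEFT techniques of this kind: rather than updating a full weight matrix, it learns a small low-rank correction on top of the frozen weights. A minimal PyTorch sketch (the module name, rank, and scaling are illustrative, not taken from the post):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a trainable low-rank update: W x + (B A) x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # original weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(1024, 1024))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")    # only A and B are trained
```

Only the small A and B matrices receive gradients, which is why this kind of finetuning fits on hardware that full finetuning cannot.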
"Stanford researchers built a new prompting technique By adding XX words to a prompt it: - boosts LLM's creativity by 1.6-2x - raises human-rated diversity by XXXX% - beats fine-tuned model without any retraining - restores XXXX% of LLM's lost creativity after alignment Post-training alignment methods such as RLHF are designed to make LLMs helpful and safe. However these methods unintentionally cause a significant drop in output diversity (called mode collapse). When an LLM collapses to a mode it starts favoring a narrow set of predictable or stereotypical responses over other outputs. This"
X Link 2025-11-27T06:59Z 57.1K followers, 146.3K engagements
"You're in a Research Scientist interview at OpenAI. The interviewer asks: "How would you expand the context length of an LLM from 2K to 128K tokens" You: "I will fine-tune the model on longer docs with 128K context." Interview over. Here's what you missed:"
X Link 2025-12-07T06:42Z 57.1K followers, 247.9K engagements
"To understand KV caching we must know how LLMs output tokens. - Transformer produces hidden states for all tokens. - Hidden states are projected to the vocab space. - Logits of the last token are used to generate the next token. - Repeat for subsequent tokens. Check this๐"
X Link 2025-12-10T06:41Z 57.1K followers, 16.5K engagements
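A tiny numpy sketch of that decode step, with made-up shapes and greedy sampling for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, seq_len = 100, 16, 5

hidden_states = rng.normal(size=(seq_len, d_model))   # one hidden state per context token
W_vocab = rng.normal(size=(d_model, vocab_size))      # projection to the vocab space

logits = hidden_states @ W_vocab                      # (seq_len, vocab_size)
next_token = int(np.argmax(logits[-1]))               # only the last token's logits matter
print(next_token)
```

Only the last row of the logits matrix is used to pick the next token, which is the observation the rest of the thread builds on.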
"Thus to generate a new token we only need the hidden state of the most recent token. None of the other hidden states are required. Next let's see how the last hidden state is computed within the transformer layer from the attention mechanism"
X Link 2025-12-10T06:41Z 57.1K followers, 12.5K engagements
"Finally React gets a native way to talk to agents. Building agentic UIs is still way harder than it should be. You've got your agent running on the backend. Maybe it's LangGraph CrewAI or something else. Now you need to: Stream its outputs to your frontend Keep state in sync between UI and agent Handle reconnections when users refresh Manage the agent lifecycle (start/stop/reset) Make it all feel real-time To solve this most teams end up writing a ton of custom glue code like WebSockets here state management there and manual event parsing everywhere. CopilotKit just shipped v1.50 and it"
X Link 2025-12-11T06:31Z 57.1K followers, 35.6K engagements
"Context engineering clearly explained (with visuals): (an illustrated guide below)"
X Link 2025-11-25T06:30Z 56.9K followers, 9925 engagements
"@akshay_pachaar Manual prompt engineering is indeed tedious. I really liked that this solution universally applies to any agentic workflow since all you need is an evaluation dataset an initial prompt and clear evaluation criteria"
X Link 2025-12-02T13:29Z 56.9K followers, 1107 engagements
"Bias-variance tradeoff has a missing detail Not many ML engineers know about it. Consider fitting a polynomial regression model on some dummy dataset say y=sin(x) + noise. As shown in the first plot in the image as we increase the degree (m): - The training loss will go down to zero. - The test (or validation) loss will decrease and then increase. But notice what happens as we continue to increase the degree (m): Test loss decreases again (shown in the second plot) This is called the double descent phenomenon and it is commonly observed in deep learning models. It is counterintuitive since it"
X Link 2025-12-03T06:44Z 56.9K followers, 9967 engagements
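A minimal sketch of that experiment, assuming the same y = sin(x) + noise setup. It reproduces the first part of the curve (training loss falling, test loss rising); observing the second descent typically requires pushing the degree past the number of training points and using a minimum-norm fit, which is beyond a few lines:

```python
import numpy as np

# dummy dataset: y = sin(x) + noise, with a held-out test set
rng = np.random.default_rng(42)
x_train = rng.uniform(-3, 3, 30)
y_train = np.sin(x_train) + rng.normal(0, 0.2, x_train.size)
x_test = rng.uniform(-3, 3, 200)
y_test = np.sin(x_test) + rng.normal(0, 0.2, x_test.size)

for degree in (1, 3, 9, 15, 25):
    coeffs = np.polyfit(x_train, y_train, degree)     # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```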
"Extending the context window isn't just about larger matrices. In a traditional transformer expanding tokens by 8x increases memory needs by 64x due to the quadratic complexity of attention. Refer to the image below So how do we manage it continue.๐"
X Link 2025-12-07T06:42Z 56.9K followers, 25K engagements
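A back-of-the-envelope check of the 8x tokens, 64x memory claim, assuming float16 attention scores and an arbitrary 2,048-token baseline:

```python
base_context = 2_048
for factor in (1, 8):
    n = base_context * factor
    score_bytes = n * n * 2                  # one n x n float16 attention-score matrix
    print(f"{n:>6} tokens -> {score_bytes / 1e6:,.0f} MB of scores per head per layer")
# 8x more tokens -> 64x more memory for the attention scores
```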
"Docker explained in X minutes Most developers use Docker daily without understanding what happens under the hood. Here's everything you need to know. Docker has X main components: X Docker Client: Where you type commands that talk to the Docker daemon via API. X Docker Host: The daemon runs here handling all the heavy lifting (building images running containers and managing resources) X Docker Registry: Stores Docker images. Docker Hub is public but companies run private registries. Here's what happens when you run "docker run": Docker pulls the image from the registry (if not available"
X Link 2025-12-06T06:30Z 57.1K followers, 28.4K engagements
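The same client, daemon, and registry flow is visible from code; a minimal sketch using the Docker SDK for Python (assumes Docker is running locally and the `docker` package is installed):

```python
import docker   # pip install docker  (the Docker SDK for Python)

client = docker.from_env()                  # the "client": talks to the daemon over its API
# run() pulls the image from the registry if it isn't available locally,
# then the daemon creates and starts the container on the Docker host
logs = client.containers.run("hello-world", remove=True)
print(logs.decode())
```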
"The above insight suggests that to generate a new token every attention operation in the network only needs: - query vector of the last token. - all key & value vectors. But there's one more key insight here"
X Link 2025-12-10T06:42Z 57.1K followers, 7253 engagements
"A visual guide to KV caching in LLMs:"
X Link 2025-12-10T19:56Z 57.1K followers, 32.4K engagements
"If you found it insightful reshare it with your network. Find me @_avichawla Every day I share tutorials and insights on DS ML LLMs and RAGs"
X Link 2025-12-11T11:53Z 57.1K followers, 2518 engagements
"10 MCP AI Agents and RAG projects for AI Engineers (with code):"
X Link 2025-04-13T06:32Z 57.1K followers, 712.6K engagements
"You're in an AI engineer interview at Apple. The interviewer asks: "Siri processes 25B requests/mo. How would you use this data to improve its speech recognition" You: "Upload all voice notes from devices to iCloud and train a model" Interview over Here's what you missed: Modern devices (like smartphones) host a ton of data that can be useful for ML models. To get some perspective consider the number of images you have on your phone right now the number of keystrokes you press daily etc. And this is just about one user: you. But applications have millions of users so the amount of data is"
X Link 2025-11-20T06:31Z 57.1K followers, 50.3K engagements
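The post is truncated, but the setup it describes, valuable data spread across millions of devices, is the classic motivation for federated learning, where training happens on-device and only model updates are aggregated. A toy FedAvg-style sketch, purely illustrative and not necessarily what the post goes on to propose:

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
global_w = np.zeros(2)

for round_ in range(20):                              # communication rounds
    local_ws = []
    for device in range(5):                           # 5 simulated devices
        X = rng.normal(size=(50, 2))                  # raw data never leaves the "device"
        y = X @ true_w + rng.normal(0, 0.1, 50)
        w = global_w.copy()
        for _ in range(10):                           # a few local gradient steps
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= 0.05 * grad
        local_ws.append(w)
    global_w = np.mean(local_ws, axis=0)              # server only sees the weight updates

print(global_w)                                       # converges toward [2, -1]
```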
"Pentesting firms don't want you to see this. An open-source AI agent just replicated their $50k service. A "normal" pentest today looks like this: - $20k-$50k per engagement - 4-6 weeks of scoping NDAs kickoff calls - A big PDF that's outdated the moment you ship a new feature Meanwhile AI agents are quietly starting to perform on-par with human pentester on the stuff that actually matters day-to-day: Enumerating attack surface Fuzzing endpoints Chaining simple vulns into real impact Producing PoCs and remediation steps developers can actually use And they do it in hours instead of weeks and"
X Link 2025-11-28T07:19Z 57.1K followers, 225.1K engagements
"Few people know this about L2 regularization: (Hint: it is NOT just a regularization technique) Most models intend to use L2 Regularization for just one thing: Reduce overfitting. However L2 regularization is a great remedy for multicollinearity. Multicollinearity arises when: Two (or more) features are highly correlated OR Two (or more) features can predict another feature. To understand how L2 regularization addresses multicollinearity consider a dataset with two features and a dependent variable (y): featureA featureB Highly correlated with featureA. y = some linear combination of featureA"
X Link 2025-12-02T06:50Z 57.1K followers, 34.4K engagements
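A small sketch of that two-feature setup, with made-up data, comparing plain least squares against an L2-penalized (ridge) fit:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
feature_a = rng.normal(size=500)
feature_b = feature_a + rng.normal(0, 0.001, 500)     # nearly identical to feature_a
X = np.column_stack([feature_a, feature_b])
y = 3 * feature_a + rng.normal(0, 0.1, 500)           # y depends on the shared signal

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)                     # L2 penalty on the coefficients

print("OLS coefficients:  ", ols.coef_)   # typically unstable, far from (3, 0)
print("Ridge coefficients:", ridge.coef_) # roughly 1.5 each: the effect is shared evenly
```

The L2 penalty keeps the coefficients small and splits the effect between the near-duplicate features, which is the multicollinearity remedy the post refers to.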
"An MCP server that detects production-grade code quality issues in real-time Even though AI is now generating code at light speed the engineering bottleneck has just moved from writing to reviewing and now devs spend XX% of their debugging time on AI-generated code. AI reviewers aren't that reliable either because they share the same fundamental blind spots as AI generators do: - They pattern match not proof check. - They validate syntax not system behavior. - They review code not consequences. I have been using the SonarQube MCP Server (by @SonarSource) to solve this. It produces"
X Link 2025-12-05T06:31Z 57.1K followers, 28.5K engagements
"SonarQube MCP server: (don't forget to star it )"
X Link 2025-12-05T06:31Z 57.1K followers, 2195 engagements
"If you need a video guide to Karpathy's nanochat check out Stanford's CS336 It covers: - Tokenization - Resource Accounting - Pretraining - Finetuning (SFT/RLHF) - Overview of Key Architectures - Working with GPUs - Kernels and Tritons - Parallelism - Scaling Laws - Inference - Evaluation - Alignment Everything you need to prepare for a job at Frontier AI Labs. I have shared the playlist in the replies"
X Link 2025-12-08T06:31Z 57.1K followers, 41.7K engagements
"AWS did it again They have introduced a novel way for developers to build Agents. Today when you build an Agent you start with a simple goal then end up juggling prompts routing logic error handling tool orchestration and fallback flows. One unexpected user input and the whole thing collapses. Strands Agents framework by AWS approaches Agent building differently. It takes a model-driven approach that lets the LLM decide how to plan choose tools and adapt to edge cases on its own. You provide the capabilities and guardrails and the model handles the workflow which is different from the brittle"
X Link 2025-12-09T06:31Z 57.1K followers, 43K engagements
"You're in an AI Engineer interview at OpenAI. The interviewer asks: "Our GPT model generates XXX tokens in XX seconds. How do you make it 5x faster" You: "I'll allocate more GPUs for faster generation." Interview over. Here's what you missed:"
X Link 2025-12-10T06:41Z 57.1K followers, 185K engagements
"During attention: The last row of query-key-product involves: - the last query vector. - all key vectors. Also the last row of the final attention result involves: - the last query vector. - all key & value vectors. Check this visual to understand better:"
X Link 2025-12-10T06:41Z 57.1K followers, 9510 engagements
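A quick numpy check of that claim, ignoring the causal mask (which does not affect the last row anyway); the shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 6, 8
Q = rng.normal(size=(seq_len, d))
K = rng.normal(size=(seq_len, d))
V = rng.normal(size=(seq_len, d))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

full = softmax(Q @ K.T / np.sqrt(d)) @ V              # attention for every position
last = softmax(Q[-1:] @ K.T / np.sqrt(d)) @ V         # only the last query + all K, V

print(np.allclose(full[-1], last[0]))                 # True
```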
"This is called KV caching To reiterate instead of redundantly computing KV vectors of all context tokens cache them. To generate a token: - Generate QKV vector for the token generated one step before. - Get all other KV vectors from cache. - Compute attention. Check this๐"
X Link 2025-12-10T06:42Z 57.1K followers, 5711 engagements
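A minimal single-head sketch of that loop, with made-up weights: only the newest token's Q, K, and V are computed each step, and the cached K/V are reused:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
k_cache, v_cache = [], []

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decode_step(token_embedding):
    """Compute Q/K/V only for the newest token; reuse every cached K and V."""
    q = token_embedding @ W_q
    k_cache.append(token_embedding @ W_k)             # cache grows by one entry per step
    v_cache.append(token_embedding @ W_v)
    K, V = np.stack(k_cache), np.stack(v_cache)
    return softmax(q @ K.T / np.sqrt(d)) @ V          # attention output for the new token

for _ in range(5):                                    # 5 decoding steps
    out = decode_step(rng.normal(size=d))
print(out.shape, len(k_cache))                        # (8,) 5
```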
"KV caching speeds up inference by computing the prompt's KV cache before generating tokens. This is exactly why ChatGPT takes longer to generate the first token than the rest. This delay is known as time-to-first-token (TTFT). Improving TTFT is a topic for another day"
X Link 2025-12-10T06:42Z 57.1K followers, 6216 engagements
"GitHub repo: (don't forget to star it )"
X Link 2025-12-11T06:31Z 57.1K followers, 2786 engagements
"- Google Maps uses graph ML to predict ETA - Netflix uses graph ML in recommendation - Spotify uses graph ML in recommendation - Pinterest uses graph ML in recommendation Here are X must-know ways for graph feature engineering (with code):"
X Link 2025-12-12T06:46Z 57.1K followers, 144.7K engagements
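The post's own code isn't included here, but two of the simplest graph features are node degree and PageRank; a small networkx sketch on a toy graph, purely illustrative:

```python
import networkx as nx

G = nx.karate_club_graph()                  # stand-in for a user-interaction graph

degree = dict(G.degree())                   # local connectivity of each node
pagerank = nx.pagerank(G, alpha=0.85)       # global importance of each node

node_features = {n: (degree[n], pagerank[n]) for n in G.nodes}
print(node_features[0])                     # feature vector fed to a downstream ML model
```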
"@akshay_pachaar This was much-awaited Sharing this visual I created once that summarizes the three main protocols:"
X Link 2025-12-12T13:38Z 57.1K followers, XXX engagements