# @kwindla posts on X about realtime, open ai, llm, inference the most

They currently have [------] followers and [---] posts still getting attention that total [-----] engagements in the last [--] hours.

### Engagements: [-----] [#](/creator/twitter::16375739/interactions)

- [--] Week [------] +490%
- [--] Month [------] -76%
- [--] Months [-------] -56%
- [--] Year [---------] +40%

### Mentions: [--] [#](/creator/twitter::16375739/posts_active)

- [--] Months [--] -50%
- [--] Year [---] +51%

### Followers: [------] [#](/creator/twitter::16375739/followers)

- [--] Week [------] +1.30%
- [--] Month [------] +2.80%
- [--] Months [------] +16%
- [--] Year [------] +77%

### CreatorRank: [-------] [#](/creator/twitter::16375739/influencer_rank)

### Social Influence

**Social category influence** [technology brands](/list/technology-brands) 21.01% [stocks](/list/stocks) 10.14% [social networks](/list/social-networks) 1.45% [travel destinations](/list/travel-destinations) 0.72% [countries](/list/countries) 0.72% [cryptocurrencies](/list/cryptocurrencies) 0.72% [finance](/list/finance) 0.72%

**Social topic influence** [realtime](/topic/realtime) #572, [open ai](/topic/open-ai) 7.97%, [llm](/topic/llm) #515, [inference](/topic/inference) #239, [ai](/topic/ai) 4.35%, [claude code](/topic/claude-code) #401, [voice](/topic/voice) #3214, [cloudflare](/topic/cloudflare) 3.62%, [mcp server](/topic/mcp-server) 2.9%, [hosted](/topic/hosted) 2.9%

**Top accounts mentioned or mentioned by** [@pipecatai](/creator/undefined) [@nelson](/creator/undefined) [@huggingface](/creator/undefined) [@pion](/creator/undefined) [@trydaily](/creator/undefined) [@fal](/creator/undefined) [@cloudflare](/creator/undefined) [@aidotengineer](/creator/undefined) [@elevenlabsio](/creator/undefined) [@awscloud](/creator/undefined) [@googledeepmind](/creator/undefined) [@tmztmobile](/creator/undefined) [@bnicholehopkins](/creator/undefined) [@picanteverde](/creator/undefined) [@simonw](/creator/undefined) [@modal](/creator/undefined) [@rimelabs](/creator/undefined) [@chadbailey59](/creator/undefined) [@riteshchopra](/creator/undefined) [@iamhenrymascot](/creator/undefined)

**Top assets mentioned** [Cloudflare, Inc. (NET)](/topic/cloudflare) [Alphabet Inc Class A (GOOGL)](/topic/$googl) [Robot Consulting Co., Ltd. (LAWR)](/topic/robot) [sETH (SETH)](/topic/$seth) [Scallop (SCA)](/topic/scallop)

### Top Social Posts

Top posts by engagements in the last [--] hours

"Benchmarking LLMs for voice agent use cases. New open source repo along with a deep dive into how we think about measuring LLM performance. The headline results: - The newest SOTA models are all *really* good but too slow for production voice agents. GPT-4.1 and Gemini [---] Flash are still the most widely used models in production. The benchmark shows why. - Ultravox [---] shows that it's possible to close the "intelligence gap" between speech-to-speech models and text-mode LLMs. This is a big deal - Open weights models are climbing up the capability curve. Nemotron [--] Nano is almost as capable" [X Link](https://x.com/kwindla/status/2018439972123185379) 2026-02-02T21:42Z 12.2K followers, [----] engagements

"The NVIDIA DGX Spark is a desktop GPU workstation with 128GB of unified memory. Working with the team at @NVIDIAAIDev we've been using these little powerhouse machines for voice agent development testing new models and inference stacks and training LLMs and audio models. Today we published a guide to training the Smart Turn model on the DGX Spark. Smart Turn is a fully open source (and open training data) native audio turn detection model that supports [--] languages. The guide walks you through installing the right dependencies for this new Arm + Blackwell architecture and includes benchmarks" [X Link](https://x.com/kwindla/status/2021353464270590328) 2026-02-10T22:39Z 12.2K followers, [----] engagements
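The benchmark post above lends itself to a small illustration. Below is a hypothetical sketch (not the actual aiewf-eval harness) of the core loop such a benchmark needs: stream one turn of a scripted multi-turn conversation against a candidate model, record time-to-first-token, and collect which tools the model called so they can be checked against expectations. The model name, tools, and scoring are placeholders.

```python
# Hypothetical voice-agent LLM benchmark turn: measure TTFT and capture
# tool calls while streaming. Not the benchmark repo's real code.
import time
from openai import OpenAI

client = OpenAI()

def run_turn(messages, tools, model="gpt-4.1"):
    """Stream one conversation turn; return (ttft_seconds, tool_names, text)."""
    start = time.monotonic()
    ttft, tool_names, text = None, set(), []
    stream = client.chat.completions.create(
        model=model, messages=messages, tools=tools, stream=True
    )
    for chunk in stream:
        if ttft is None:
            ttft = time.monotonic() - start  # first streamed token back
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if delta.tool_calls:
            for call in delta.tool_calls:
                if call.function and call.function.name:
                    tool_names.add(call.function.name)
        if delta.content:
            text.append(delta.content)
    return ttft, tool_names, "".join(text)
```

A harness would run this over many scripted turns per model and report TTFT percentiles next to tool-call and instruction-following accuracy, which is the latency-vs-intelligence trade-off the post describes.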
"If you have skills that are useful for voice agent development contribute to the repo https://github.com/pipecat-ai/skills" [X Link](https://x.com/kwindla/status/2022011499599081701) 2026-02-12T18:14Z 12.2K followers, [---] engagements

"The Claude Code / Ralph Wiggum moment is exciting for a lot of reasons. One of them is that all of us building AI systems that are just a little bit beyond the capabilities of just prompting a SOTA model now have a shared set of baseline ideas we're building on. Plus an overlapping set of open questions - An agent is an LLM in a loop. (Plus a bunch of tooling integration and domain-specific optimization.) - Context management is a critical job. (Lots of ways to think about this.) - You almost certainly need multiple agents/models/processors/loops/whatever. (Lots of ways to think about this" [X Link](https://x.com/kwindla/status/2015185924788015350) 2026-01-24T22:12Z 12.2K followers, [----] engagements

"Voice-only programming with Claude Code. I've been playing with @aconchillo's MCP server that lets you talk to Claude Code from anywhere today. I always have multiple Claudes running and I often want to check in on them when I'm not in front of a computer. Here's a video of Claude doing some front-end web testing hitting an issue and getting input from me and then reporting that the test passed. In the video the Pipecat bot is using Deepgram for transcription and Cartesia for the voice. (Note: I sped up the web testing clickety-click sections of the video.) The code for the MCP server and" [X Link](https://x.com/kwindla/status/2015956506118914221) 2026-01-27T01:14Z 12.2K followers, [----] engagements

"Pipecat MCP Server: This is infinitely customizable. Getting started with Pipecat: https://docs.pipecat.ai/getting-started/quickstart https://github.com/pipecat-ai/pipecat-mcp-server" [X Link](https://x.com/kwindla/status/2015956508522250311) 2026-01-27T01:14Z 12.2K followers, [---] engagements

"Async automatic non-blocking context compaction for long-running agents. Last week I gave a talk called Space Machine Sandboxes at the @daytonaio AI builders meetup about patterns for long-running agents. I work a lot on voice AI agents which are fundamentally multi-turn long-context loops. I also build lots of other AI agent stuff often as part of bigger systems that include voice. One of the patterns I showed in the talk is non-blocking compaction. Here's a short clip. https://twitter.com/i/web/status/2016288112629187054" [X Link](https://x.com/kwindla/status/2016288112629187054) 2026-01-27T23:12Z 12.2K followers, 26.5K engagements

"@andxdy @terronk @rootvc Nice Related: (We are a Root Ventures company.) https://github.com/pipecat-ai/pipecat-mcp-server" [X Link](https://x.com/kwindla/status/2019528306618667389) 2026-02-05T21:47Z 12.1K followers, [--] engagements
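The non-blocking compaction post above describes a pattern that is easy to sketch: summarize old turns in a background task so the conversation never waits on compaction. The names here (summarize, Agent) are illustrative assumptions, not Pipecat APIs.

```python
# A minimal sketch of async non-blocking context compaction, assuming a
# summarize() LLM call and a simple message-list context. Illustrative only.
import asyncio

async def summarize(messages):
    # Placeholder for an LLM call that compresses old turns into a summary.
    await asyncio.sleep(1.0)
    return {"role": "system", "content": f"[summary of {len(messages)} messages]"}

class Agent:
    def __init__(self, max_live_turns=20):
        self.context = []
        self.max_live_turns = max_live_turns
        self._compaction = None

    async def on_turn(self, user_msg):
        self.context.append(user_msg)
        # Kick off compaction in the background; do NOT await it here,
        # so the agent keeps responding at full speed.
        if len(self.context) > self.max_live_turns and self._compaction is None:
            old, self.context = self.context[:-10], self.context[-10:]
            self._compaction = asyncio.create_task(summarize(old))
        # Splice a finished summary in front of the live window.
        if self._compaction and self._compaction.done():
            self.context.insert(0, self._compaction.result())
            self._compaction = None
```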
"Blog post link: The Smart Turn open source turn detection model: https://github.com/pipecat-ai/smart-turn https://www.daily.co/blog/training-smart-turn-on-the-nvidia-dgx-spark/" [X Link](https://x.com/kwindla/status/2021353466397065708) 2026-02-10T22:39Z 12.2K followers, [---] engagements

"New repo: Pipecat Skills for Claude Code So far: - Create and configure a basic voice agent (running locally using any combination of models and services) - Deploy to Pipecat Cloud for production - Start the Pipecat MCP Server to talk to Claude Code via voice (including remotely from your phone) I'm working on an end-to-end testing skill. https://twitter.com/i/web/status/2022011497996816826" [X Link](https://x.com/kwindla/status/2022011497996816826) 2026-02-12T18:14Z 12.2K followers, [---] engagements

"Why do you not call the UI a sub agent if you are not speaking to it directly In this pattern I am speaking to the UI agent directly. It sees the speech input. But it doesn't respond conversationally. It performs specialized tasks related to the UI. I don't think of it as a sub-agent because it isn't controlled by the voice model. I think of it as a parallel agent or a "parallel inference loop." The reason not to have the voice agent control the UI sub-agent is that I think it's hard to implement that without adding latency. I do use sub-agent patterns for other things where the control is" [X Link](https://x.com/kwindla/status/2022419275232026835) 2026-02-13T21:15Z 12.2K followers, [--] engagements

"Detailed technical post about this voice agents STT benchmark: Benchmark source code: Benchmark data set on @huggingface. [----] human speech samples captured from real voice agent interactions with verified ground truth transcriptions: https://huggingface.co/datasets/pipecat-ai/stt-benchmark-data https://github.com/pipecat-ai/stt-benchmark https://www.daily.co/blog/benchmarking-stt-for-voice-agents/" [X Link](https://x.com/kwindla/status/2022426777285788098) 2026-02-13T21:44Z 12.2K followers, [----] engagements

"@huggingface We also published a benchmark of LLM performance in real-world voice agent use cases recently (long multi-turn conversations with multiple tool calls and accurate instruction following required). https://x.com/kwindla/status/2019120857923375586?s=20 Open source voice agent LLM benchmark: https://t.co/CbLijComqQ Technical deep dive into voice agent benchmarking: https://t.co/gHTNRSjwdI" [X Link](https://x.com/kwindla/status/2022428506408558870) 2026-02-13T21:51Z 12.2K followers, [----] engagements

"Final transcript what about time until transcription starts streaming In general what we care about is the time from end of speech until the final transcript segment is available. We need the full transcript in order to run LLM inference.
I've experimented a fair amount with greedy LLM inference on partial transcript segments and there are not enough gains to make up for the extra work. So "time to first token" from a transcription model isn't a useful metric. This is different from how we measure latency for LLMs and TTS models where we definitely focus on TTFT/TTFB" [X Link](https://x.com/kwindla/status/2022447926983954523) 2026-02-13T23:08Z 12.2K followers, [---] engagements

"These are voice agents. Pipecat supports Gemini Live (and Ultravox and OpenAI Realtime). But almost all production voice agents today use multiple models (STT - LLM - TTS) instead of a single speech-to-speech model. You get better latency intelligence and observability from a multi-model approach. I fully expect speech-to-speech models to have more market share over time. But right now SOTA is the multi-model pipeline. https://twitter.com/i/web/status/2022449946881069165" [X Link](https://x.com/kwindla/status/2022449946881069165) 2026-02-13T23:16Z 12.2K followers, [--] engagements

"Our goal was to set up the test the same way real-world input pipelines most often work. [--]. Audio chunks are sent to the STT service at real-time pacing. [--]. Silero VAD is configured to trigger after 200ms of non-speech frames. [--]. When the VAD triggers the STT service is sent a finalize signal. (Not all services support explicit finalization. But we think it's an important feature for real-time STT.) [--]. TTFS is the time between the first non-speech audio frame and the last transcription segment. If you use a service that sends you VAD or end-of-turn events it will function much the same way as" [X Link](https://x.com/kwindla/status/2022566485370245271) 2026-02-14T07:00Z 12.2K followers, [--] engagements

".@tavus just published a nice blog post about their "real-time conversation flow and floor transfer" model Sparrow-1. This model does turn detection predicting when it's the Tavus video agent's turn to speak. It does this by analyzing conversation audio in a continuous stream and learning and adapting to user behavior. This model is an impressive achievement. I've had a few opportunities to talk to @code_brian who led the R&D on this model at Tavus about his work. I love Brian's approach to this problem. Among other things the Sparrow-1 architecture allows this model to do things like handle" [X Link](https://x.com/kwindla/status/2011286036207583740) 2026-01-14T03:55Z 12.2K followers, [----] engagements

"You are the average of the tokens you spend the most time with. belated 'aha' moment: Context engineering is as impt to inference as Data engineering is important to training" [X Link](https://x.com/kwindla/status/2018571783444815889) 2026-02-03T06:26Z 12.2K followers, [----] engagements

"I sat down with @zachk and @bnicholehopkins to talk about how we benchmark models for voice AI. Benchmarks are hard to do well and good ones are really useful We covered what makes an LLM actually "intelligent" in a real-world voice conversation the latency vs intelligence trade-off how speech-to-speech models compare to text-mode LLMs infrastructure and full stack challenges and what we're all most focused on in [----]. https://twitter.com/i/web/status/2019120855570366548" [X Link](https://x.com/kwindla/status/2019120855570366548) 2026-02-04T18:48Z 12.2K followers, [----] engagements
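The "time to final transcript" (TTFS) methodology a few posts up reduces to a small measurement loop: stream audio at real-time pacing, trigger a VAD after 200ms of non-speech, send the STT service a finalize signal, and stop the clock at the last transcript segment. The sketch below assumes hypothetical stt_client and vad objects; it shows the shape of the measurement, not the benchmark's actual code.

```python
# Rough TTFS measurement sketch, assuming hypothetical stt_client/vad APIs.
import time

VAD_SILENCE_S = 0.2  # 200ms of non-speech frames triggers end of turn

def measure_ttfs(audio_frames, stt_client, vad):
    speech_end = None
    for frame in audio_frames:
        stt_client.send(frame)             # real-time pacing assumed upstream
        if speech_end is None and vad.is_silence(frame, VAD_SILENCE_S):
            speech_end = time.monotonic()  # first non-speech frame after turn
            stt_client.finalize()          # explicit finalization, if supported
    final_segment_at = stt_client.wait_for_final_transcript()  # timestamp
    return final_segment_at - speech_end   # TTFS in seconds
```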
"Open source voice agent LLM benchmark: Technical deep dive into voice agent benchmarking: https://www.daily.co/blog/benchmarking-llms-for-voice-agent-use-cases/ https://github.com/kwindla/aiewf-eval" [X Link](https://x.com/kwindla/status/2019120857923375586) 2026-02-04T18:48Z 12.2K followers, [----] engagements

"Voice-controlled UI. This is an agent design pattern I'm calling EPIC "explicit prompting for implicit coordination." Feel free to suggest a better name. :-) In the video I'm navigating around a map conversationally pulling in information dynamically from tool calls and realtime streamed events. There are two separate agents (inference loops) here: a voice agent and a UI control agent. They know about each other (at the prompt level) but they work independently. https://twitter.com/i/web/status/2022087764720988296" [X Link](https://x.com/kwindla/status/2022087764720988296) 2026-02-12T23:17Z 12.2K followers, 14.2K engagements

"The critical things here are: - We can't block the voice agent's fast responses. - The voice agent already has a lot of instructions in its context and a large number of tools to call so we don't want to give it more to do each inference turn. So we prompt the voice agent to know at a high level what the UI agent will do but to ignore or respond minimally to UI-related requests. This adds relatively little complexity to the voice agent system instruction. We prompt the UI agent with a small subset of world knowledge a few tools and a lot of examples about how to perform useful UI actions in" [X Link](https://x.com/kwindla/status/2022087769242448134) 2026-02-12T23:17Z 12.2K followers, [----] engagements

"I'm really looking forward to @NVIDIAGTC in March. Last year was amazing. (And I came home with a new 5090) I've been working on building multi-agent local/cloud hybrid applications on my NVIDIA DGX Spark. Here's a video of an LLM-powered game running on the Spark in which you fly around by talking to your AI space ship. The conversational voice agent is a @pipecat_ai pipeline built with: - Nemotron Speech ASR - Nemotron [--] Nano - Magpie TTS The Nemotron [--] Nano voice agent delegates the long-running agent-loop tasks to bigger models in the cloud. You can see it start tasks in the video. It has" [X Link](https://x.com/kwindla/status/2022933254995927087) 2026-02-15T07:17Z 12.2K followers, [----] engagements

"I did a talk a couple of weeks ago about the agent patterns in the game and how they're similar to patterns we use in coding agent harnesses and in voice agents for enterprise applications. Space Machine Sandboxes: https://www.youtube.com/watch?v=HnYafj9h-48" [X Link](https://x.com/kwindla/status/2022933256975610080) 2026-02-15T07:17Z 12.2K followers, [---] engagements

"@picanteverde @simonw I love the Sesame work but there's no API and the consumer app is still Test Flight only as far as I know. The version that was released as open source is not a fully capable model" [X Link](https://x.com/kwindla/status/2023255052631318851) 2026-02-16T04:36Z 12.2K followers, [---] engagements
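The EPIC "parallel inference loops" posts above suggest a simple structural sketch: both agents see every user utterance, but each runs as its own independent loop so UI work never blocks the fast voice response. Everything here is an illustrative assumption, not the actual implementation.

```python
# Two independent inference loops fed from the same speech input.
import asyncio

async def voice_loop(q: asyncio.Queue):
    while True:
        text = await q.get()
        # Fast path: conversational reply, prompted to mostly ignore UI asks.
        print(f"[voice] {text!r} -> quick spoken response")

async def ui_loop(q: asyncio.Queue):
    while True:
        text = await q.get()
        # Slow path: map navigation, tool calls, streamed UI updates.
        print(f"[ui] {text!r} -> UI actions")

def fan_out(text: str, *queues: asyncio.Queue):
    for q in queues:
        q.put_nowait(text)  # both loops see every utterance

# Run voice_loop and ui_loop concurrently (e.g. with asyncio.gather) and call
# fan_out(transcript, voice_q, ui_q) from the transcription callback.
```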
"These days Sergio Sillero Head of the Cloud Data & AI at MAPFRE is programming via voice while he shops for groceries. If you're deep in the Claude Code / Ralph Wiggum / tmux world this is not super surprising to you. If you're not it sounds like crazy ridiculous hype. Sergio wrote some voice interface code for his Meta Ray-Bans using the @pipecat_ai MCP server that lets him keep working on a project in @AnthropicAI's Claude Code when he steps away from his desk. https://twitter.com/i/web/status/2023264920968757521" [X Link](https://x.com/kwindla/status/2023264920968757521) 2026-02-16T05:15Z 12.2K followers, [---] engagements

"@_dr5w @simonw True. But for most of these there are only [--] providers. OpenAI/Azure or DeepMind/Vertex. 😀" [X Link](https://x.com/kwindla/status/2023270217921785946) 2026-02-16T05:36Z 12.2K followers, [---] engagements

"NVIDIA just released a new open source transcription model Nemotron Speech ASR designed from the ground up for low-latency use cases like voice agents. Here's a voice agent built with this new model. 24ms transcription finalization and total voice-to-voice inference time under 500ms. This agent actually uses *three* NVIDIA open source models: - Nemotron Speech ASR - Nemotron [--] Nano 30GB in a 4-bit quant (released in December) - A preview checkpoint of the upcoming Magpie text-to-speech model These models are all truly open source: weights training data training code and inference code. This" [X Link](https://x.com/kwindla/status/2008601714392514722) 2026-01-06T18:09Z 12.2K followers, 279.6K engagements

"New text-to-speech model from @rimelabs today: Arcana v3. Rime's models excel at customization and personality. The new model is fast available in [--] languages and you can use it as a cloud API or run it on-prem. The model also outputs word-level timestamps which is very important for maintaining accurate LLM context during a voice agent conversation. Listen to Arcana v3 in this video. @chadbailey59 uses the open source Pipecat CLI to set up a voice agent from scratch customize the prompt and talk to it" [X Link](https://x.com/kwindla/status/2019199774604554709) 2026-02-05T00:01Z 12.2K followers, [----] engagements

"Arcana v3 launch post: Pipecat CLI: https://docs.pipecat.ai/cli/overview https://rime.ai/resources/arcana-v3" [X Link](https://x.com/kwindla/status/2019199776106115239) 2026-02-05T00:01Z 12.2K followers, [---] engagements

"@riteshchopra Yes definitely Progressive "skills" loading inside a Pipecat pipeline is something we're doing fairly often these days. For a version of this in a fun voice agent context see the LoadGameInfo tool here: https://github.com/pipecat-ai/gradient-bang/blob/0a5c19f1459b2ddbbb4bc6182944239c63e4c702/src/gradientbang/pipecat_server/voice_task_manager.py#L26" [X Link](https://x.com/kwindla/status/2022027383965200619) 2026-02-12T19:17Z 12.2K followers, [--] engagements

"Wake up babe. New Pareto frontier chart just dropped.
Benchmarking STT for voice agents: we just published one of the internal benchmarks we use to measure latency and real-world performance of transcription models. - Median P95 and P99 "time to final transcript" numbers for hosted STT APIs. - A standardized "Semantic Word Error Rate" metric that measures transcription accuracy in the context of a voice agent pipeline. - We worked with all the model providers to optimize the configurations and @pipecat_ai implementations so that the benchmark is as fair and representative as we can possibly" [X Link](https://x.com/kwindla/status/2022426774815281630) 2026-02-13T21:44Z 12.2K followers, 11.2K engagements

"@iAmHenryMascot It's a new game we're building: https://github.com/pipecat-ai/gradient-bang" [X Link](https://x.com/kwindla/status/2022747330844500083) 2026-02-14T18:58Z 12.2K followers, [--] engagements

"Spending valentine's day exactly as you'd expect. (Arguing politely on LinkedIn about how to accurately measure latency and word error rates.) Wake up babe. New Pareto frontier chart just dropped. Benchmarking STT for voice agents: we just published one of the internal benchmarks we use to measure latency and real-world performance of transcription models. - Median P95 and P99 "time to final transcript" numbers https://t.co/y9qCrJLe0L" [X Link](https://x.com/kwindla/status/2022751171761623153) 2026-02-14T19:13Z 12.2K followers, [----] engagements

"@MAnfilofyev Super-impressive work from the @ultravox_dot_ai team on v0.7" [X Link](https://x.com/kwindla/status/2023249452463767797) 2026-02-16T04:13Z 12.2K followers, [---] engagements

"I do think Gemini Live has a lot of potential. It's currently too slow (2.5s voice-to-voice P50) and the API is missing important features for real-world voice workflows. You can't do context engineering mid-conversation. If you really need a speech-to-speech model for production use you're better off right now with gpt-realtime. But I expect the Gemini Live team to make progress this year https://twitter.com/i/web/status/2023261706970124674" [X Link](https://x.com/kwindla/status/2023261706970124674) 2026-02-16T05:02Z 12.2K followers, [---] engagements

"I don't think Sergio is here so you have to go follow him on the other thing: He's planning to demo his Ray-Bans + Claude Code integration at the February 25th Voice AI Meetup in Barcelona: https://www.voiceaispace.com/events/voice-ai-space-barcelona-meetup https://www.linkedin.com/feed/update/urn:li:activity:7428880495075078144/" [X Link](https://x.com/kwindla/status/2023264922638160349) 2026-02-16T05:15Z 12.2K followers, [---] engagements

"My thinking about this has evolved a lot now that we have real-world data from millions of interactions with voice agents. I used to aim for 500-800ms voice-to-voice latency. It turns out that people are totally fine in real conversations until latency gets above 1500ms. So now I talk about 1500ms as the "hard" cutoff that you need your P95 to be under. Note this is voice-to-voice measured on the client side so that you include networks audio buffers OS and bluetooth playout delays etc. https://twitter.com/i/web/status/2023447213725413853" [X Link](https://x.com/kwindla/status/2023447213725413853) 2026-02-16T17:19Z 12.2K followers, [--] engagements
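The post above is specific that voice-to-voice latency should be measured on the client side, so that network, audio buffers, OS, and Bluetooth playout delays are all included. A minimal sketch of that measurement, assuming hypothetical client-side vad and player objects:

```python
# Client-side voice-to-voice latency: end of user speech -> first agent
# audio actually played out. vad/player APIs are illustrative assumptions.
import time

def measure_voice_to_voice(vad, player):
    end_of_speech = vad.wait_for_end_of_user_speech()     # local timestamp
    first_playout = player.wait_for_first_agent_sample()  # local timestamp
    return (first_playout - end_of_speech) * 1000         # milliseconds

# Collect these per turn and track the P95; the post's "hard" cutoff is
# P95 < 1500 ms.
```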
"thats just called proactive interruptions I interpreted what @rajivayyangar was proposing in a different way: use predictive inference at the turn detection step to claw back delays introduced by other parts of the pipeline. So not aiming to actually interrupt but to reduce the voice-to-voice latency a little bit" [X Link](https://x.com/kwindla/status/1944989720045609028) 2025-07-15T05:17Z [----] followers, [--] engagements

"@taishik_ @thorwebdev @elevenlabsio @pipecat_ai @posthog Hardware hacking is so much fun Big props to @_pion and the libpeer author sepfy for laying the open source foundation for the pipecat-esp32 webrtc project" [X Link](https://x.com/kwindla/status/1945175588790047085) 2025-07-15T17:36Z [----] followers, [--] engagements

"Hot takes (from me) and reasoned discussion (from Sam) on: The current best practices for building production enterprise scale voice agents. What models to use how to think about infrastructure and what the solved and unsolved problems in voice AI are right now. Why I spend most of my time now working on an open source vendor neutral community driven codebase even though there's always an (additional) infinite amount of work to do running a company. Building developer tools in this new AI era. The importance of moving towards voice and multimodal realtime standards. Doing inference on the" [X Link](https://x.com/kwindla/status/1945326773480177998) 2025-07-16T03:36Z [----] followers, [----] engagements

"Smart Turn v2: open source native audio turn detection in [--] languages. New checkpoint of the open source open data open training code semantic VAD model on @huggingface @FAL and @pipecat_ai. - 3x faster inference (12ms on an L40) - [--] languages (13 more than v1 which was english-only) - New synthetic data set chirp_3_all with 163k audio samples - 99% accuracy on held out human_5_all test data Good turn detection is critical for voice agents. This model "understands" both semantic and audio patterns and mitigates the voice AI trade-off between unwanted turn latency vs the agent interrupting" [X Link](https://x.com/kwindla/status/1946267669638046010) 2025-07-18T17:55Z 10K followers, 42K engagements

"This model is designed to be used together with a traditional VAD model for voice AI conversations. The voice AI pipeline typically looks like this: [--]. A very short VAD timeout chunks the audio stream for the smart-turn model [--]. Transcription runs in parallel [--]. Transcription output is gated on turn detection before going to the rest of the pipeline The basic idea here is that you want turn detection to happen faster than transcription. It doesn't really matter how much faster because you need "final" transcription fragments before you can run LLM inference. We're generally aiming for 400ms" [X Link](https://x.com/kwindla/status/1946267672280535407) 2025-07-18T17:55Z [----] followers, [----] engagements
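The pipeline in the post above (short VAD timeout chunks audio for the turn model, transcription runs in parallel, transcripts are gated on turn detection) can be sketched in a few lines. The stt, smart_turn, and llm objects here are stand-ins for real components, not Pipecat classes.

```python
# Gating transcripts on turn detection, assuming hypothetical components.
async def run(audio_chunks, stt, smart_turn, llm):
    pending = []
    async for chunk in audio_chunks:           # realtime audio frames
        stt.feed(chunk)                        # transcription in parallel
        if await smart_turn.is_turn_complete(chunk):
            pending += stt.take_final_fragments()
            # The gate opens only here: the LLM sees a complete turn.
            await llm.respond(" ".join(pending))
            pending.clear()
```

The design point from the post: turn detection only needs to beat transcription, because LLM inference has to wait for final transcript fragments either way.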
"Blog post with code examples and docs pointers: - Repo with training code and development notes: - Weights: - Data sets: - Use the model for free in Pipecat Cloud + @FAL: - Here's a demo app hosted on @vercel and Pipecat Cloud: - - https://github.com/pipecat-ai/pipecat/tree/main/examples/fal-smart-turn https://pcc-smart-turn.vercel.app/ https://docs.pipecat.daily.co/pipecat-in-production/smart-turn https://huggingface.co/pipecat-ai https://huggingface.co/pipecat-ai/smart-turn-v2 https://github.com/pipecat-ai/smart-turn" [X Link](https://x.com/kwindla/status/1946267675182907422) 2025-07-18T17:55Z [----] followers, [----] engagements

"And there's still some low-hanging stuff in the FastAPI request path that could cut another 100ms I think. Actual inference on an L40 is 12ms. Marcus who did the heavy lifting on this version but isn't on X was focused on getting inference time down without sacrificing accuracy. I think the P50 could easily be under 300ms including the 200ms VAD timeout and the network request to do smart-turn inference" [X Link](https://x.com/kwindla/status/1946283953616523515) 2025-07-18T19:00Z [----] followers, [---] engagements

"Lots of prior art using the Wav2Vec2 family of models for classification tasks. This task feels more like a classification task than a generative task. Though you can definitely get to a good model for turn detection either way I'm sure. We did a bunch of experiments with different architectures and base models. Wav2Vec2 had a good combination of useful pre-training to build on flexible architecture that can accommodate a lot of different potential classification heads and other modifications and the potential for very fast inference. I actually think an LSTM approach here makes a lot of" [X Link](https://x.com/kwindla/status/1946441277857501628) 2025-07-19T05:25Z [----] followers, [---] engagements

"Oh cool. I can definitely imagine Whisper performing better as a base for some kinds of audio classification tasks. Maybe especially the larger model sizes. A primary requirement for this task is very fast inference time. I'm not super optimistic that it would be easy to get a classification-oriented Whisper variant down to 12ms inference on an L40 which is what this checkpoint of the smart-turn model clocks in at. But I could be wrong. My intuition after working on this problem for a bit is that I'd probably put effort into going smaller and continuing to improve the data sets. Rather than" [X Link](https://x.com/kwindla/status/1946633889533108322) 2025-07-19T18:11Z [----] followers, [--] engagements

"Join us for ⚡ talks Wednesday night at @Cloudflare in San Francisco. There will be presentations and conversations about voice observability autonomy testing and evals and collaborative development.
⚡ @MarcKlingen / @Langfuse ⚡ @AxelBacklund / @AndonLabs ⚡ @ShivSakhuja / @AthinaAI ⚡ @lilyjclifford / @rimelabs I'm going to show some brand new @pipecat_ai voice agent user interface tooling 🔊🤖💻 📅 When: Wednesday July [--] 🕑 Time: 6:00 PM - 8:30 PM RSVP below" [X Link](https://x.com/kwindla/status/1947347825609658512) 2025-07-21T17:27Z [----] followers, [----] engagements

"RSVP link: https://lu.ma/frqm1umn" [X Link](https://x.com/kwindla/status/1947347827840999452) 2025-07-21T17:27Z [----] followers, [---] engagements

"@anarchyco @Cloudflare @MarcKlingen @langfuse @axelbacklund @andonlabs @shivsakhuja If you come for a visit I'd love to hang out" [X Link](https://x.com/kwindla/status/1947391550994153585) 2025-07-21T20:21Z [----] followers, [--] engagements

"@lukestanley Why even use WebRTC at all Short answer you need a UDP-based protocol for realtime voice. WebRTC is UDP and has a lot of other things you need built in too. https://voiceaiandvoiceagents.com/#websockets-webrtc" [X Link](https://x.com/kwindla/status/1947477986808369484) 2025-07-22T02:05Z [----] followers, [---] engagements

"I think there's going to be a lot of evolution in how we think about WebRTC because use cases for voice (and video) AI are very different from what we spent most of our engineering time on for the last few years. I'm excited about the return of peer-to-peer though there are trade-offs of course. I think it's possible that voice AI might even push the QUIC and MOC (and related) standards work in a good direction. I wrote about that here: https://voiceaiandvoiceagents.com/#quic-moq" [X Link](https://x.com/kwindla/status/1947684381147160637) 2025-07-22T15:45Z [----] followers, [---] engagements

".@lizziepika kicking off AI builder night at @Cloudflare" [X Link](https://x.com/kwindla/status/1948195160292200666) 2025-07-24T01:34Z [----] followers, [---] engagements

"LLM selective response. If you're building a voice agent yourself you can achieve this (mostly) with a combination of prompting and orchestration logic. Basically "The user will sometimes tell you not to respond. When that happens the only thing you should do is call this tool output this specific token ." Then you need to define either a tool for the model to call that means "cool I intentionally didn't respond yet" or special token handling in your processing pipeline. I've built a few versions of this in @pipecat_ai including one that I still use for personal voice note-taking. I will try" [X Link](https://x.com/kwindla/status/1948418055190794658) 2025-07-24T16:20Z [----] followers, [----] engagements

"Hands-on workshop: Build real-time AI Voice Agents on AWS Join us Monday July 28th at the AWS Builder Loft for a hands-on workshop with engineers from @DeepgramAI @awscloud and @trydaily. Explore building realtime conversational agents at production scale. - using Deepgram's transcription and voice models - LLMs and the AWS Strands Agents - running on AWS Bedrock and Pipecat Cloud. We will have a complete repo with code you can clone run locally modify and deploy to the cloud. If you're new to voice AI this is a great way to get started. If you're an experienced voice AI developer and want" [X Link](https://x.com/kwindla/status/1948782922217230847) 2025-07-25T16:30Z [----] followers, [---] engagements
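The "selective response" post above describes giving the model a tool whose only meaning is "I intentionally did not respond", then swallowing that turn in the pipeline. A minimal sketch of that idea; the tool name and handler wiring are illustrative, not the Pipecat implementation:

```python
# A "no response" tool for selective response, in OpenAI tool-schema style.
NO_RESPONSE_TOOL = {
    "type": "function",
    "function": {
        "name": "no_response",
        "description": "Call this instead of answering when the user has "
                       "told you not to respond to this input.",
        "parameters": {"type": "object", "properties": {}},
    },
}

def handle_model_output(message: dict):
    """Return text to speak, or None to stay silent this turn."""
    calls = message.get("tool_calls") or []
    if any(c["function"]["name"] == "no_response" for c in calls):
        return None               # swallow the turn; don't synthesize speech
    return message.get("content")  # normal spoken reply
```

Pair this with a system-prompt instruction like the one quoted in the post, and the orchestration layer decides what a "no response" turn means for the conversation flow.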
"Open source voice AI meetup on Wednesday. Join @trychroma @modal_labs and @trydaily for the monthly SF Voice AI meetup this upcoming Wednesday. The theme is using open source/weights models for conversational voice agents. I'm moderating a panel featuring: - @charles_irl - Pavankumar Reddy (Mistral) - @NikhilKMurthy - Kunal Dhawan (NVIDIA) We'll also show some new voice AI tech stack crossover code: (Modal x @pipecat_ai). Join us in person or via livestream" [X Link](https://x.com/kwindla/status/1949177066982899891) 2025-07-26T18:36Z [----] followers, [----] engagements

"Local voice AI with a [---] billion parameter LLM. ✅ - smart-turn v2 - MLX Whisper (large-v3-turbo-q4) - Qwen3-235B-A22B-Instruct-2507-3bit-DWQ - Kokoro All models running local on an M4 mac. Max RAM usage 110GB. Voice-to-voice latency is 950ms. There are a couple of relatively easy ways to carve another 100ms off that number. But it's not a bad start" [X Link](https://x.com/kwindla/status/1949308553015263609) 2025-07-27T03:19Z 10.5K followers, 64K engagements

"Code is here: This was made possible by the quant @ivanfioravanti posted today and his advice about changing the memory limits for this big model. And @Prince_Canuma's work on mlx-audio made implementing the in-process Kokoro generation for @pipecat_ai a breeze If you're interested in open source/weights models for voice AI come hang out with us in person or via livestream on Wednesday at the monthly Voice AI meetup. https://lu.ma/u3hzaj71 https://x.com/ivanfioravanti/status/1949081653663469591 https://x.com/ivanfioravanti/status/1949010482230108210" [X Link](https://x.com/kwindla/status/1949308554487459877) 2025-07-27T03:19Z [----] followers, [----] engagements

"Here's voice agent code that runs entirely locally on macOS: Models you can run locally are getting better and better. You still need a pretty high-end machine to run an LLM that's good at conversation and tool calling. In my opinion the 30B parameter models are now good enough to be really interesting. But depending on what you're doing you might find a smaller LLM works well too. https://github.com/kwindla/macos-local-voice-agents" [X Link](https://x.com/kwindla/status/1949309222606475569) 2025-07-27T03:21Z [----] followers, [--] engagements

"@AhmedRezaT Have not had enough time with the model running in this quant yet to evaluate (even vibe evaluate) tool calling. But I definitely plan to" [X Link](https://x.com/kwindla/status/1949314550689677597) 2025-07-27T03:43Z [----] followers, [---] engagements

"@McQueenFu 950ms voice to voice is faster than the majority of voice agents running in production today. Expensive I can't argue with. At least in terms of the initial cost of the M4 machine. But inference is free. 😆" [X Link](https://x.com/kwindla/status/1949321969264169176) 2025-07-27T04:12Z [----] followers, [--] engagements

"@McQueenFu It's pretty easy with @pipecat_ai" [X Link](https://x.com/kwindla/status/1949337651485798545) 2025-07-27T05:14Z [----] followers, [--] engagements

"I agree with this and it's nicely put. But I think I'd describe it as reinventing software development tooling over the next few years rather than vibe coding. It's clear now that the bottleneck is not the language models and harnesses. The bottleneck is the shape of the libraries and components we design for them to use. We need a (partly) new set of LEGO blocks for the vibe coding tools to snap together.
There is an argument vibe coding will make apps more safe not less. A lot of sloppy languages/systems exist because they're easier to learn. But with AI that is no longer needed. So we can" [X Link](https://x.com/kwindla/status/1949501876103450982) 2025-07-27T16:07Z [----] followers, [----] engagements

"It's not trivial to train a really good unified end-to-end audio model (projectors between the stages etc). The Ultravox work is impressive I do think native audio is the long-term future. But you lose some things when you fuse the inference stages as well as gain some (big) advantages in audio understanding and (potentially) latency. Evals are harder. Observability is harder. Being able to pick and choose the most effective STT and TTS based on your own evals is sometimes very helpful. I'm also hoping that we get true bidirectional streaming architectures that we can scale up soon. I'm actually" [X Link](https://x.com/kwindla/status/1949595104609808436) 2025-07-27T22:17Z 10.1K followers, [----] engagements

"The TTFT (latency) for the Qwen model here is about the same as using Groq's API with their larger models. Prefill time dominates the TTFT and Groq's speed advantage doesn't come into play very much because the network penalty is 100ms most of the individual turns add only a few hundred tokens and KV caching locally works well. Having said that for almost all production use cases you can't run a good enough LLM locally on real users' devices. This demo uses 110GB of (unified) RAM And for anything where you care about throughput and not just latency Groq blows the M4 out of the water. @kwindla Is" [X Link](https://x.com/kwindla/status/1949597006806286745) 2025-07-27T22:25Z [----] followers, [----] engagements

"@almeida_dril I need to run my voice evals set. Have not done that yet" [X Link](https://x.com/kwindla/status/1949597819909922926) 2025-07-27T22:28Z [----] followers, [--] engagements

"No I don't think so for general conversational intelligence. Have an open-ended conversation on the good side of the uncanny valley *and* tool calls/instruction following work well with just generic prompting techniques. I think our hardware needs to improve. I think the magic number at least with current architectures is going to be 30B parameters. So maybe just barely if quantization-aware training SOTA continues to improve. But I'm a cautious no 16GB is not enough. It's dangerous to make these predictions though. I could be wrong" [X Link](https://x.com/kwindla/status/1949600096800411959) 2025-07-27T22:37Z [----] followers, [--] engagements

"More great voice AI example code from @uberboffin. Play the game Guess Who interactively with an AI partner. We had a thread here last week about "selective refusal." Selective refusal means prompting or training an LLM to sometimes not respond to input. Or more accurately to generate a no response response. Your application then handles the no response response however makes sense for your conversation flow. Sam's code defines a function call that facilitates selective refusal. Sam also gives the LLM structured data about the game characters and a very clear prompt defining the game rules." [X Link](https://x.com/kwindla/status/1949920476060606596) 2025-07-28T19:50Z [----] followers, [----] engagements
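The Groq/TTFT post a few replies up carries an implicit back-of-envelope model: TTFT is roughly network round trip plus prefill time, and a warm local KV cache means only the new tokens of each turn get prefilled. A quick illustration with placeholder rates (the token counts and speeds below are assumptions, not measured numbers):

```python
# Back-of-envelope TTFT comparison implied by the post above.
def estimate_ttft_ms(new_tokens, prefill_tok_per_s, network_rtt_ms=0.0):
    prefill_ms = new_tokens / prefill_tok_per_s * 1000
    return network_rtt_ms + prefill_ms

# Hosted: ~100 ms network penalty, fast prefill of a few hundred new tokens.
hosted = estimate_ttft_ms(new_tokens=300, prefill_tok_per_s=4000, network_rtt_ms=100)
# Local M4: no network, slower prefill, same few hundred tokens (KV cache warm).
local = estimate_ttft_ms(new_tokens=300, prefill_tok_per_s=1500)
print(f"hosted ~{hosted:.0f} ms, local ~{local:.0f} ms")  # comparable TTFTs
```

With numbers in this neighborhood the two come out comparable, which matches the post's observation; throughput is a different story entirely.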
"The code: The AI character "Humphrey" is powered by: 🎙@Speechmatics ASR + diarization 🔗@pipecat_ai for WebRTC + function calls 🧠@OpenAI for smarts 🗣@elevenlabsio for that smooth smooth British voice https://github.com/sam-s10s/pipecat-guess-who-irl/tree/main" [X Link](https://x.com/kwindla/status/1949920477801267332) 2025-07-28T19:50Z [----] followers, [---] engagements

"Voice + AWS Bedrock & Claude + Strands Agents [---] people came to the AWS Builder Loft in San Francisco yesterday for a voice agents hands-on workshop. @natrugrats talked about @deepgram_ai's speech models and APIs. Ayan Ray gave a great overview of @awscloud AI models and services. Strands Agents is an open source agents SDK that works with any model/API and has extensive AWS integrations. Strands will make as many tool calls as needed to execute a task so in a conversational voice context you need to run the agent in a parallel process alongside the realtime voice pipeline. You also want to" [X Link](https://x.com/kwindla/status/1950366372078493703) 2025-07-30T01:22Z 10.4K followers, [----] engagements

"@mattmireles @natrugrats @awscloud My understanding is that the same Claude you know and love in ChatClaude and Claude Code is what's served in Bedrock" [X Link](https://x.com/kwindla/status/1950381530922307797) 2025-07-30T02:22Z [----] followers, [---] engagements

"@swyx @aiDotEngineer @DKundel @OpenAI @shresbm @GoogleDeepMind @intercom @bnicholehopkins @dwyer_neil @raizamrtn Sean is @_pion here" [X Link](https://x.com/kwindla/status/1951000317283409989) 2025-07-31T19:21Z [----] followers, [---] engagements

"I feel like we've seen more work on vision than on voice from the frontier labs. (And I'm not against it) And a lot of good vision work in the open weights world too. Shout out to @vikhyatk and Moondream My feeling is that the vision capabilities are there in the models and we just need to be building more applications that use vision and more user interface experiments that aim at the new kinds of software we want to enable. Also image generation that's fast enough to use interactively in applications is here too. But I'd like to see a lot more developers incorporating near-realtime image" [X Link](https://x.com/kwindla/status/1951037970653520021) 2025-07-31T21:51Z 10K followers, [---] engagements

"@ivanfioravanti Thank you for your frontier work in LLM applied physics. 🫡" [X Link](https://x.com/kwindla/status/1951324520570298594) 2025-08-01T16:49Z 10K followers, [----] engagements

"Vibe coding multimodal voice user interfaces today with Claude Code OpenAI Codex and Google Jules. Jules understood the assignment and went all-in on the story line. Sound on" [X Link](https://x.com/kwindla/status/1951868849139146986) 2025-08-03T04:52Z 10.4K followers, 20.9K engagements

"New turn detection model for voice agents from the excellent team at @krispHQ. Fast accurate turn detection is one of the "hard problems" for all of us building voice AI right now. Many many voice agents use Krisp's background voice cancellation models to improve transcription accuracy and reduce unintended interruptions. It's great to see the Krisp team working on turn detection too and offering customers a really good native audio turn detection model. The launch blog post is a good read for anyone interested in voice AI.
It covers: - approaches to turn taking - the importance of latency -" [X Link](https://x.com/kwindla/status/1952761378558915026) 2025-08-05T15:59Z 10.5K followers, 18K engagements

"@broadfield_dev TTS is Kokoro (via @Prince_Canuma's mlx-audio). Links and code in the repo" [X Link](https://x.com/kwindla/status/1952973507568226711) 2025-08-06T06:02Z 10.8K followers, [----] engagements

"@wiltongorske @ollama I can help make that happen" [X Link](https://x.com/kwindla/status/1953220666049872231) 2025-08-06T22:24Z 10.5K followers, [---] engagements

"GPT-5 is out in the world Here's a single-file voice agent powered by GPT-5. All you need is an OpenAI API key and Python. export OPENAI_API_KEY=sk_proj-. uv run gpt-5-voice-agent.py The first time you run this it will take about [--] seconds to install all the dependencies accept connections and begin processing audio and video. For voice AI use cases you probably want these parameter settings for GPT-5. service_tier: priority reasoning_effort: minimal verbosity: low Note that using the "priority" service tier doubles the cost per token. Having this option is great for latency sensitive" [X Link](https://x.com/kwindla/status/1953597224585494863) 2025-08-07T23:20Z 10.5K followers, [----] engagements

"Here's the code: The code above uses the standard GPT-5 model plus OpenAI's transcription and voice generation models. OpenAI also released a new version of the natively voice-to-voice Realtime model and API today. Congratulations to everyone who worked on all the new things that shipped today For more expansive starter kit code that shows how to use both the three-model approach and the Realtime API see this guide: If you're interested in a technical deep dive into voice AI and building production voice agents check out the Voice AI & Voice Agents Primer: https://voiceaiandvoiceagents.com/" [X Link](https://x.com/kwindla/status/1953597226271564259) 2025-08-07T23:20Z 10.5K followers, [----] engagements

"@Sanava_AI Why did you comment out verbosity in the code Regression in the openai library required pinning to a previous version (that didn't yet support the verbosity property). It's fixed now. https://github.com/openai/openai-python/issues/2525" [X Link](https://x.com/kwindla/status/1953825531818127515) 2025-08-08T14:28Z 10.5K followers, [--] engagements

"GPT-5 is multimodal in the sense that it has vision input. (A lot of people use the term that way. A bit of a different perspective from those of us who are obsessed with realtime voice) The GPT-5 launch live stream had a really nice voice demo. The new model is in ChatGPT now but not yet in the public API. https://www.youtube.com/live/0Uu_VJeVVfo?si=nVzHuy4sT09R3J2P&t=1402" [X Link](https://x.com/kwindla/status/1953860474388586670) 2025-08-08T16:46Z 10.5K followers, [--] engagements

"Quick PSA. Settings for minimizing GPT-5 latency (time to first token). "service_tier": "priority" "reasoning_effort": "minimal" "verbosity": "low". P50 TTFT with these settings is 750ms. With the defaults it's 3s. The default settings are the right starting point for most use cases. It's *good* that this model can think proactively. As @swyx says "It's a good model ser." For use cases where you care a lot about TTFT use the above settings. Posting this here because I've answered this question a bunch of times today in various DMs and channels" [X Link](https://x.com/kwindla/status/1953868672470331423) 2025-08-08T17:19Z 10.5K followers, 11K engagements
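For reference, the low-latency settings from the PSA above drop into a Chat Completions call like this. Treat it as a sketch: the parameter names follow the post, but exact support depends on your openai library version (the @Sanava_AI reply above notes a pinning issue around the verbosity property).

```python
# GPT-5 with the latency-minimizing settings from the PSA above.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Hello!"}],
    service_tier="priority",     # lower queueing latency; ~2x cost per token
    reasoning_effort="minimal",  # skip long hidden reasoning before replying
    verbosity="low",             # keep replies short, good for spoken output
)
print(resp.choices[0].message.content)
```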
"@joshwhiton It's part of the open source voice-ui-kit which supports lots of different network transports. You can use it with Gemini Live API OpenAI Realtime API serverless WebRTC WebSockets and Daily or LiveKit WebRTC. More complete docs coming soon: https://github.com/pipecat-ai/voice-ui-kit/tree/main/examples/01-console" [X Link](https://x.com/kwindla/status/1953890686136447011) 2025-08-08T18:46Z 10.5K followers, [---] engagements

"Here's my advice about how to scale WebRTC after helping hundreds of teams get to production with voice agents. tldr: either run "serverless" WebRTC if you want to host the infrastructure yourself or use a global commercial cloud. Don't try to run your own WebRTC SFUs because building out a reliable performant WebRTC cloud is many engineer-years worth of effort: I also wrote down a bunch of general advice about building and scaling voice AI including network transports here: https://voiceaiandvoiceagents.com/#network-transport" [X Link](https://x.com/kwindla/status/1954974922637554132) 2025-08-11T18:35Z 10.8K followers, [--] engagements

"@ID_AA_Carmack The Moshi paper is fantastic. Bidirectional multimodal streaming tokens architecture. https://kyutai.org/Moshi.pdf" [X Link](https://x.com/kwindla/status/1954978486088765655) 2025-08-11T18:49Z 10.8K followers, [----] engagements

"@gautham_vijay_ @pipecat_ai Thank you for the kind words And for contributing knowledge to the community I totally agree that understanding the WebRTC building blocks is super valuable for voice AI developers" [X Link](https://x.com/kwindla/status/1954980722063839356) 2025-08-11T18:58Z 10.5K followers, [--] engagements

"Teaser for something coming soon. Created using FLUX.1 dev on @FAL" [X Link](https://x.com/kwindla/status/1955119014105272795) 2025-08-12T04:07Z 10.6K followers, [----] engagements

"@joshwhiton Huh. That doesn't happen for me. I'll do a fresh clone of the repo and try it again later today. Anything interesting in the js console on the client side when you turn off the network" [X Link](https://x.com/kwindla/status/1956050618428395673) 2025-08-14T17:49Z 10.5K followers, [--] engagements

"At the @aiDotEngineer World's Fair in June @shresbm and I gave a talk about all the "magic" that goes into making great voice AI experiences. Magic in the sense of making hard things look easy. Magic in the sense of sufficiently advanced technology being indistinguishable from. Shrestha and her team train the LLMs and make the APIs we all use. I work mostly higher up at the orchestration and application levels. We thought it would be fun to show the push-pull tension between making use of the open-ended emergent capabilities of today's SOTA models while also writing scaffolding that makes" [X Link](https://x.com/kwindla/status/1956089235179757884) 2025-08-14T20:23Z 10.6K followers, [----] engagements

"Today I refactor the app I have been vibe coding for ten days. I do not expect to return. Give my kinesis foot switches to a junior developer capable of wielding the ancient and pure tools car cdr and cons" [X Link](https://x.com/kwindla/status/1957122078999752876) 2025-08-17T16:47Z 10.5K followers, [----] engagements

"Voice-only programming with the new OpenAI Realtime API.
I spend a lot of time these days pair programming with LLMs. Often I'm talking rather than typing. This "voice dictation" use case has become an important vibe benchmark for me. Being able to create text input just by talking flexibly in a context dependent way with tool calling is a *hard* problem for today's models. Natural language dictation requires a very high degree of contextual intelligence instruction following accuracy and tool calling reliability. Today's new gpt-realtime model is quite good at this hard problem. The" [X Link](https://x.com/kwindla/status/1961130737878683996) 2025-08-28T18:16Z 10.7K followers, 50K engagements

"My goals for voice input are to: [--]. Be able to talk to my computer the same way I talk to another person. I don't want to have to dictate literal phrases. I want to stop and start go back and correct things I said before rely on previous context have my tools interpret what I mean to say rather than what I literally said and have the model fill in gaps and rewrite things for me on the fly. [--]. Do many of the things I can easily do with a keyboard and mouse. Send input to different windows. Perform sequences of actions. Copy and paste. Take screenshots. [--]. Have context and memory so I don't" [X Link](https://x.com/kwindla/status/1961130741313724485) 2025-08-28T18:16Z 10.7K followers, [----] engagements

"I got a bunch of questions about the cost of the Realtime API yesterday after posting this. tldr: OpenAI has followed their usual (and much appreciated) path of cutting the pricing of the Realtime API with every release. Cost is now about $0.04/minute of speech-to-speech time factoring in the implicit token caching. But note: you generally do not get charged for non-talking time because the OpenAI voice activity detection filters out non-speech input. So for a use case like voice programming you're probably only talking 5% of the time and dictation assistant output is extremely brief so it's" [X Link](https://x.com/kwindla/status/1961473869115765177) 2025-08-29T16:59Z 10.7K followers, 25K engagements

"I heard you like TUIs. Here's a TUI for the gpt-realtime voice dictation stuff I posted last week. Also a gpt-5 version. And a local @pipecat_ai transport for macOS with acoustic echo cancellation. I'm having a lot of fun blurring the boundaries between voice dictation and using an LLM as a full natural language assistant. I *definitely* do not have the prompting right yet to be able to talk completely free-form to the LLM and have it dictate the right text to the right window every time. Things work a lot of the time. You get used to a certain level of magic happening. And then a command" [X Link](https://x.com/kwindla/status/1962597878138053036) 2025-09-01T19:26Z 10.8K followers, [----] engagements

"This is a really nice piece on five iconic companies informed by Paul's own experience working at Twilio and now building @browserbasehq. Commonalities: be early (to a very big market) be relentless be lucky. The first one is really hard because you will be *too* early if you're early enough. (But lots of things about startups are hard.) The second one is largely within your control. The third one is probably not something you can control. A long time ago @collision said to me that a startup is very fortunate if its big incumbent competitors are asleep at the wheel. That's not enough by" [X Link](https://x.com/kwindla/status/1962934369309593629) 2025-09-02T17:43Z 10.8K followers, [----] engagements
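The Realtime API cost post above implies some simple arithmetic worth making explicit: at roughly $0.04 per minute of active speech, with server-side VAD filtering out silence, a session where you talk about 5% of the time is very cheap. A quick check using the post's numbers:

```python
# Rough session-cost arithmetic from the Realtime API cost post above.
RATE_PER_SPEECH_MIN = 0.04   # $/minute of active speech (approx, per the post)

def session_cost(session_minutes, talking_fraction=0.05):
    return session_minutes * talking_fraction * RATE_PER_SPEECH_MIN

print(f"${session_cost(60):.2f} for an hour of voice programming")  # ~$0.12
```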
"Voice AI people: join us Wednesday in San Francisco at the @covaldev office for a conversation about reliable speech generation for realtime AI" [X Link](https://x.com/kwindla/status/1963087990189916671) 2025-09-03T03:53Z 10.7K followers, [----] engagements

"If you'll be in San Francisco on September 25th come hang out with great people from Cloudflare various AI luminaries a robot hand and me. 🌉 SF we are throwing a watch party of the first two episodes https://t.co/MotYBHMUrj Come hang out with the stars of the show 🤩 @thorwebdev from @elevenlabsio @kwindla from Daily and @pipecat_ai @josephofiowa from @roboflow Yorick the Robot Hand 🍿See you there" [X Link](https://x.com/kwindla/status/1964081601341759504) 2025-09-05T21:41Z 10.7K followers, [----] engagements

"@_ricburton Yes I did a bunch of kite surfing when I took a break between Oblong and Daily. Dawn patrol surf. Then a few hours of Haskell hacking. Then afternoon kiting. That was a good schedule. This was before foils though so I know there's a new learning curve to climb up" [X Link](https://x.com/kwindla/status/1964802047112622452) 2025-09-07T21:24Z 10.7K followers, [--] engagements

"Re talking past each other: Some people are mostly thinking about the "CPU" in karpathy's LLM operating system (loosely speaking) and some people are mostly thinking about tools and libraries (loosely speaking). in ai products quality dominates a lot esp for generally intelligent products everything else like privacy cost etc is just rounding error I agree with this and it pushes in the direction of both using the biggest best (probably proprietary models for the foreseeable future) *and* using small models everywhere all the time. Application responsiveness is an important part of overall" [X Link](https://x.com/kwindla/status/1965797429892317537) 2025-09-10T15:20Z 10.7K followers, [----] engagements

"Blog post with more details about the new v3 version of Smart Turn: Training and inference code on GitHub: Model weights and all data sets are on @huggingface: The Krisp Turn-Taking model integrated into their suite of voice AI models: The Ultravox context-aware endpointing (turn detection) model: https://www.ultravox.ai/blog/ultravad-is-now-open-source-introducing-the-first-context-aware-audio-native-endpointing-model https://krisp.ai/blog/turn-taking-for-voice-ai/ https://huggingface.co/pipecat-ai https://github.com/pipecat-ai/smart-turn/" [X Link](https://x.com/kwindla/status/1966354340379275701) 2025-09-12T04:13Z 10.8K followers, [---] engagements

"Tiny SOTA model release today: v3 of the Smart Turn semantic VAD model. Smart Turn is a native audio open source open data open training code model for detecting whether a human has stopped speaking and expects a voice agent to respond. The model now runs in 60ms on most cloud vCPUs faster than that on your local CPU and in 10ms on GPU. Running on CPU makes it essentially free to use this in a voice AI agent. [--] languages and you can contribute data or data labeling to add a language or improve the model performance in any of the existing languages. This model is a community effort. We think" [X Link](https://x.com/kwindla/status/1966359269080707363) 2025-09-12T04:32Z 10.8K followers, [----] engagements
We think" [X Link](https://x.com/kwindla/status/1966359269080707363) 2025-09-12T04:32Z 10.8K followers, [----] engagements "Blog post with more details about the new v3 version: Training and inference code on GitHub: Model weights and all data sets are on @huggingface: The Krisp Turn-Taking model integrated into their suite of voice AI models: The Ultravox context-aware endpointing (turn detection) model: https://www.ultravox.ai/blog/ultravad-is-now-open-source-introducing-the-first-context-aware-audio-native-endpointing-model https://krisp.ai/blog/turn-taking-for-voice-ai/ https://huggingface.co/pipecat-ai https://github.com/pipecat-ai/smart-turn/" [X Link](https://x.com/kwindla/status/1966359271391817770) 2025-09-12T04:32Z 10.7K followers, [---] engagements "There are benchmarks and notes about technical goals in all the blog posts I linked to and the GitHub Smart Turn repo. You can actually run the training code for the Smart Turn model yourself. It's all in that repo. You can train the model locally if you have a GPU, or run the training on @modal" [X Link](https://x.com/kwindla/status/1966517864896282743) 2025-09-12T15:02Z 10.8K followers, [--] engagements "@tensordot_ Hindi, Marathi, and Bengali today. Language table is here: And you can contribute data to add or improve language support https://www.daily.co/blog/announcing-smart-turn-v3-with-cpu-inference-in-just-12ms/" [X Link](https://x.com/kwindla/status/1966525487070822768) 2025-09-12T15:33Z 10.7K followers, [---] engagements "Introducing the @aiDotEngineer World's Fair multi-turn conversation benchmark. I wrote a post for the @AITinkerers newsletter about: - why multi-turn is hard for LLMs - the particular challenges of audio/voice, and - a benchmark I use to think about LLM performance for conversational agents" [X Link](https://x.com/kwindla/status/1966529328633843953) 2025-09-12T15:48Z 10.8K followers, [----] engagements "@notnotrishi That's just demo/training code I think. I was talking about this: https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/audio/turn/smart_turn/local_smart_turn_v3.py" [X Link](https://x.com/kwindla/status/1967025451135836529) 2025-09-14T00:39Z 10.8K followers, [---] engagements "Pipecat TV Episode [--] - Marcus joins the show to talk about the Smart Turn model v3 release (making CPUs go brrr). Plus some notes about the new universal LLMContext component which now supports the following LLM Service classes: - AWS Bedrock - Azure - Cerebras - Deepseek - Fireworks AI - Google AI Studio - Google Vertex AI - Grok - Groq - Mistral - NVIDIA NIM - Ollama - OpenAI - OpenPipe - OpenRouter - Perplexity - Qwen - SambaNova - http://Together.ai After the first episode of Pipecat TV it's been hard to go unnoticed outside 😅. It went so well that we couldn't resist recording a second" [X Link](https://x.com/kwindla/status/1967677319054758398) 2025-09-15T19:50Z 10.8K followers, [----] engagements "Last time I looked at the Alexa SDKs the environment was still a little too constrained to run full Pipecat agents with Alexa. That might have changed. 
Would be a cool thing to be able to do. We support a lot of hardware environments ranging from ESP32 to Raspberry Pi to NVIDIA Jetson https://x.com/thorwebdev/status/1945158921179570557 https://github.com/pipecat-ai/pipecat-esp32" [X Link](https://x.com/kwindla/status/1967677862191960155) 2025-09-15T19:52Z 10.8K followers, [--] engagements "Squobert runs on a Raspberry Pi: An even smaller/cheaper embedded hardware platform, and one that's very widely used, is the ESP32 family of microcontrollers: @_pion posted that he'll be in San Francisco at a hardware AI meetup on 11/1: https://luma.com/physical-ai-meetup https://github.com/pipecat-ai/pipecat-esp32 https://github.com/chadbailey59/squobert/" [X Link](https://x.com/kwindla/status/1968780302618337403) 2025-09-18T20:52Z 10.8K followers, [---] engagements "@pierre_s @aiDotEngineer @_pion @chadbailey59 Yes but WebSockets are totally fine for server-to-server connections hosted in data centers with good peering. So no worries. Read more here for a deeper dive: https://voiceaiandvoiceagents.com/" [X Link](https://x.com/kwindla/status/1968906476354760928) 2025-09-19T05:14Z 10.8K followers, [--] engagements ".@craigsdennis introducing AI Avenue Episode [--] at haus lava lamp (@Cloudflare HQ)" [X Link](https://x.com/kwindla/status/1971390925395067183) 2025-09-26T01:46Z 10.8K followers, [---] engagements "Sun, surf, and beach pianos are hard to beat. Damn @rajivayyangar and @kwindla making a pretty strong case for moving to SF 🌁🌊🤩 https://t.co/HkIrYcI3Ci" [X Link](https://x.com/kwindla/status/1971697154998886478) 2025-09-26T22:03Z 10.8K followers, [----] engagements "Join us for a hackathon at @ycombinator on October 11th. Gemini x Pipecat realtime AI fun and games. Build an application using Gemini and Pipecat. See some new APIs. Show off interesting things you're doing in your startup or side project. Hang out with engineers from Google DeepMind and Google Cloud, the AI Tinkerers community, and YC companies Daily, Boundary, Coval, Langfuse, and Tavus. Eat Outta Sight pizza. Limited space. Apply below" [X Link](https://x.com/kwindla/status/1972053907439595998) 2025-09-27T21:41Z 10.8K followers, 24K engagements "@anarchyco Would love to take a sun, surf, and piano tour of Spain" [X Link](https://x.com/kwindla/status/1972746052978831784) 2025-09-29T19:31Z 10.8K followers, [--] engagements "A new transcription model from @DeepgramAI launched today: Flux. Flux is completely free for all of October and is integrated into Pipecat and Pipecat Cloud. This model shows where speech recognition is headed as speech models evolve to enable more and more voice agent use cases. Deepgram has always been the market leader in very low latency transcription. (Which is critical for conversational voice) My "magic number" here is 300ms. I want the finalized transcript to be delivered no more than 300ms after the user stops speaking. 
One reason that 300ms is a good baseline number is that the open" [X Link](https://x.com/kwindla/status/1973902976822751652) 2025-10-03T00:08Z 10.8K followers, [----] engagements "Deepgram's blog post: Pipecat Deepgram and Flux docs: Using Flux (for free) in Pipecat Cloud, no API key needed: https://docs.pipecat.ai/deployment/pipecat-cloud/guides/managed-api-keys#managed-api-keys https://docs.pipecat.ai/server/services/stt/deepgram https://deepgram.com/flux" [X Link](https://x.com/kwindla/status/1973902978454335616) 2025-10-03T00:08Z 10.8K followers, [---] engagements "I will say that "pre-filling" a system instruction using grep-ish tools is surprisingly not good in the things I've built for myself. But totally willing to believe this is a skill issue. Things that have worked better for me: - Pre-filling deterministically based on something like a CRM lookup - Dynamically updating using filesystem tools, but with a fair amount of work put into the harness here. Naive approaches don't usually work as well as I hope they will. - Pre-filling or dynamic lookup using vector search. Again, naive approaches are frustratingly not good. Chunking etc require some" [X Link](https://x.com/kwindla/status/1974553917058146538) 2025-10-04T19:15Z 10.8K followers, [--] engagements "As someone who does a lot of not-just-vibe coding using voice input, I'm 100% convinced that much (maybe most) programming will be done by voice. I still type a lot too. But once you get used to voice mode and change your workflow to be effective with voice, it feels scratchy going back to typing. A few notes. - My setup is often 1) a clean-up and processing voice loop 2) with a fair amount of LLM prompting and iteration before I send the text into the coding environment. As you see in the video @patloeber reposted below, a little bit of cleanup is helpful. I would argue that *a lot* of cleanup" [X Link](https://x.com/kwindla/status/1975990338432139726) 2025-10-08T18:23Z 10.8K followers, [----] engagements "Source code: Hackathon sign-up link: Credit to @mark_backman for the code, the really nice README, and the demo video. https://events.ycombinator.com/pipecat-gemini-yc25 https://github.com/daily-co/pcc-gemini-twilio" [X Link](https://x.com/kwindla/status/1975995394615333259) 2025-10-08T18:43Z 10.9K followers, [---] engagements "Starter kit for a web app that can see your screen and talk to you. - @GoogleDeepMind Gemini Live API for voice conversation and vision. - React front end built with Next.js and voice-ui-kit. - Deploy to Pipecat Cloud or anywhere you can run @pipecat_ai code. Come build multimodal realtime AI stuff with us this Saturday at YC in San Francisco. 
We're doing an all-day Gemini x Pipecat hackathon" [X Link](https://x.com/kwindla/status/1976408031417287166) 2025-10-09T22:02Z 10.9K followers, 10.7K engagements "Github repo: Hackathon application: Open source voice-ui-kit docs: https://github.com/pipecat-ai/voice-ui-kit/ https://events.ycombinator.com/pipecat-gemini-yc25 https://github.com/daily-co/pcc-gemini-screen-voice-ui-kit" [X Link](https://x.com/kwindla/status/1976408032889458804) 2025-10-09T22:02Z 10.8K followers, [---] engagements "Filipi asks Gemini how to build a WhatsApp voice agent with Gemini in a WhatsApp voice call powered by Gemini. Facebook recently launched the WhatsApp Business Calling API, which integrates voice calls into WhatsApp's chat threads interface. WhatsApp users can initiate voice calls to verified business accounts. Businesses can get customers' permission to start a voice call. Businesses can answer the calls inside WhatsApp, route the calls to a call center, or answer the calls with a @pipecat_ai voice agent. See below for a guide to WhatsApp voice AI and a GitHub repo with code you can use as a" [X Link](https://x.com/kwindla/status/1976670435912892853) 2025-10-10T15:25Z 10.9K followers, [---] engagements "Pipecat WhatsApp voice agent guide: Starter kit: Application for Saturday hackathon at YC: https://events.ycombinator.com/pipecat-gemini-yc25 https://github.com/daily-co/pcc-gemini-whatsapp/ https://docs.pipecat.ai/guides/features/whatsapp" [X Link](https://x.com/kwindla/status/1976670437364170753) 2025-10-10T15:25Z 10.9K followers, [---] engagements "Bob the humanoid robot" [X Link](https://x.com/kwindla/status/1978312512933445745) 2025-10-15T04:10Z 10.9K followers, [---] engagements "Arthur MCP. An MCP server that integrated tools from all the sponsoring companies. https://x.com/moeintechoff/status/1977494768629522686 It was an eye-opening experience at @ycombinator yesterday exploring the world of Voice/Video AI. Hosted by @GoogleDeepMind x @pipecat_ai (@trydaily's framework for voice and multi-modal conversational AI) I had the opportunity to see how companies today are shaping the future https://t.co/lFYBzvsSZ3" [X Link](https://x.com/kwindla/status/1978312514392957279) 2025-10-15T04:10Z 10.9K followers, [---] engagements "Hey Kids Get Off My Lawn: The Once and Future Visual Programming Environment http://t.co/tduxASNG http://techcrunch.com/2012/05/27/hey-kids-get-off-my-lawn-the-once-and-future-visual-programming-environment/" [X Link](https://x.com/kwindla/status/206805370746839040) 2012-05-27T17:53Z 10.7K followers, [--] engagements "Foundry beer and now Foundry coffee @jasonmendelson @bfeld @ryan_mcintyre and @sether you guys are everywhere http://t.co/mcZi5KI5" [X Link](https://x.com/kwindla/status/246045813803991040) 2012-09-13T00:41Z [----] followers, [--] engagements "@nelson Just had an emergency no-flaps landing at Washington Dulles. I've never flown from DCA to IAD before. 
Fast touch-down" [X Link](https://x.com/kwindla/status/335187488979185664) 2013-05-17T00:18Z 10.7K followers, [--] engagements "@nelson mmm questions: salt/vinegar/%; did you blanch" [X Link](https://x.com/kwindla/status/467405404784177152) 2014-05-16T20:45Z 10.7K followers, [--] engagements "@nelson Peach crisp = (pie - crust) * oatmeal" [X Link](https://x.com/kwindla/status/498895897837584385) 2014-08-11T18:17Z 10.7K followers, [--] engagements "@nelson a "friend's" machine" [X Link](https://x.com/kwindla/status/558096175476523009) 2015-01-22T02:57Z 10.7K followers, [--] engagements "Euell Gibbons. Stalking the Blue-Eyed Scallop. Van Nees [----]. http://t.co/Kpz68NGL3n" [X Link](https://x.com/kwindla/status/592741622129041409) 2015-04-27T17:26Z [----] followers, [--] engagements "@nelson Genuinely curious: is it not worth the $$ to subscribe? Does The Economist max out your neoliberal-old-timey-news budget 😂" [X Link](https://x.com/kwindla/status/596058765075550208) 2015-05-06T21:07Z 10.7K followers, [--] engagements "Tommaso Toffoli and Norman Margolus. Cellular Automata Machines. MIT [----]. http://t.co/eU35FbiWIl" [X Link](https://x.com/kwindla/status/611967740115488768) 2015-06-19T18:44Z [----] followers, [--] engagements "@nelson That's lovely. She mentions that she started with a vector watershed map she found online. Your work" [X Link](https://x.com/kwindla/status/644291983058427904) 2015-09-16T23:29Z 10.7K followers, [--] engagements "C'mon @nytimes this is ridiculous. You can do better. https://mobile.nytimes.com/2017/08/18/opinion/joseph-conrad-congo-river.html" [X Link](https://x.com/kwindla/status/899093915864715264) 2017-08-20T02:21Z 10.7K followers, [--] engagements
"Benchmarking LLMs for voice agent use cases. New open source repo along with a deep dive into how we think about measuring LLM performance. The headline results: - The newest SOTA models are all really good but too slow for production voice agents. GPT-4.1 and Gemini [---] Flash are still the most widely used models in production. The benchmark shows why. - Ultravox [---] shows that it's possible to close the "intelligence gap" between speech-to-speech models and text-mode LLMs. This is a big deal - Open weights models are climbing up the capability curve. Nemotron [--] Nano is almost as capable"
X Link 2026-02-02T21:42Z 12.2K followers, [----] engagements
"The NVIDIA DGX Spark is a desktop GPU workstation with 128GB of unified memory. Working with the team at @NVIDIAAIDev we've been using these little powerhouse machines for voice agent development testing new models and inference stacks and training LLMs and audio models. Today we published a guide to training the Smart Turn model on the DGX Spark. Smart Turn is a fully open source (and open training data) native audio turn detection model that supports [--] languages. The guide walks you through installing the right dependencies for this new Arm + Blackwell architecture and includes benchmarks"
X Link 2026-02-10T22:39Z 12.2K followers, [----] engagements
"If you have skills that are useful for voice agent development contribute to the repo https://github.com/pipecat-ai/skills https://github.com/pipecat-ai/skills"
X Link 2026-02-12T18:14Z 12.2K followers, [---] engagements
"The Claude Code / Ralph Wiggum moment is exciting for a lot of reasons. One of them is that all of us building AI systems that are just a little bit beyond the capabilities of just prompting a SOTA model now have a shared set of baseline ideas we're building on. Plus an overlapping set of open questions - An agent is an LLM in a loop. (Plus a bunch of tooling integration and domain-specific optimization.) - Context management is a critical job. (Lots of ways to think about this.) - You almost certainly need multiple agents/models/processors/loops/whatever. (Lots of ways to think about this"
X Link 2026-01-24T22:12Z 12.2K followers, [----] engagements
"Voice-only programming with Claude Code . I've been playing with @aconchillo's MCP server that lets you talk to Claude Code from anywhere today. I always have multiple Claudes running and I often want to check in on them when I'm not in front of a computer. Here's a video of Claude doing some front-end web testing hitting an issue and getting input from me and then reporting that the test passed. In the video the Pipecat bot is using Deepgram for transcription and Cartesia for the voice. (Note: I sped up the web testing clickety-click sections of the video.) The code for the MCP server and"
X Link 2026-01-27T01:14Z 12.2K followers, [----] engagements
"Pipecat MCP Server: This is infinitely customizable. Getting started with Pipecat: https://docs.pipecat.ai/getting-started/quickstart https://github.com/pipecat-ai/pipecat-mcp-server https://docs.pipecat.ai/getting-started/quickstart https://github.com/pipecat-ai/pipecat-mcp-server"
X Link 2026-01-27T01:14Z 12.2K followers, [---] engagements
"Async automatic non-blocking context compaction for long-running agents. Last week I gave a talk called Space Machine Sandboxes at the @daytonaio AI builders meetup about patterns for long-running agents. I work a lot on voice AI agents which are fundamentally multi-turn long-context loops. I also build lots of other AI agent stuff often as part of bigger systems that include voice. One of the patterns I showed in the talk is non-blocking compaction. Here's a short clip. https://twitter.com/i/web/status/2016288112629187054 https://twitter.com/i/web/status/2016288112629187054"
X Link 2026-01-27T23:12Z 12.2K followers, 26.5K engagements
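The clip isn't embedded here, but the pattern is easy to sketch. Below is a minimal, hedged illustration of non-blocking compaction: an asyncio task summarizes older context in the background while the conversation loop keeps appending turns. The `summarize()` call, class, and message shapes are hypothetical stand-ins, not the code from the talk.

```python
import asyncio

async def summarize(messages: list[dict]) -> str:
    # Stand-in for an LLM summarization call; replace with your model of choice.
    return "Summary of the earlier conversation: ..."

class CompactingContext:
    """Keeps a message list, compacting old turns in the background."""

    def __init__(self, max_messages: int = 40):
        self.messages: list[dict] = []
        self.max_messages = max_messages
        self._task: asyncio.Task | None = None

    def append(self, message: dict) -> None:
        self.messages.append(message)
        # Start compaction in the background; the conversation loop never waits.
        if len(self.messages) > self.max_messages and self._task is None:
            old = self.messages[: self.max_messages // 2]
            self._task = asyncio.create_task(self._compact(old))

    async def _compact(self, old: list[dict]) -> None:
        summary = await summarize(old)
        # Swap in the summary; turns appended while summarizing are preserved.
        self.messages = [{"role": "system", "content": summary}] + self.messages[len(old):]
        self._task = None
```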
"@andxdy @terronk @rootvc Nice Related: (We are a Root Ventures company.) https://github.com/pipecat-ai/pipecat-mcp-server https://github.com/pipecat-ai/pipecat-mcp-server"
X Link 2026-02-05T21:47Z 12.1K followers, [--] engagements
"Blog post link: The Smart Turn open source turn detection model: https://github.com/pipecat-ai/smart-turn https://www.daily.co/blog/training-smart-turn-on-the-nvidia-dgx-spark/ https://github.com/pipecat-ai/smart-turn https://www.daily.co/blog/training-smart-turn-on-the-nvidia-dgx-spark/"
X Link 2026-02-10T22:39Z 12.2K followers, [---] engagements
"New repo: Pipecat Skills for Claude Code So far: - Create and configure a basic voice agent (running locally using any combination of models and services) - Deploy to Pipecat Cloud for production - Start the Pipecat MCP Server to talk to Claude Code via voice (including remotely from your phone) I'm working on an end-to-end testing skill. https://twitter.com/i/web/status/2022011497996816826 https://twitter.com/i/web/status/2022011497996816826"
X Link 2026-02-12T18:14Z 12.2K followers, [---] engagements
"Why do you not call the UI a sub agent if you are not speaking to it directly In this pattern I am speaking to the UI agent directly. It sees the speech input. But it doesn't respond conversationally. It performs specialized tasks related to the UI. I don't think of it as a sub-agent because it isn't controlled by the voice model. I think of it as a parallel agent or a "parallel inference loop." The reason not to have the voice agent control the UI sub-agent is that I think it's hard to implement that without adding latency. I do use sub-agent patterns for other things where the control is"
X Link 2026-02-13T21:15Z 12.2K followers, [--] engagements
"Detailed technical post about this voice agents STT benchmark: Benchmark source code: Benchmark data set on @huggingface . [----] human speech samples captured from real voice agent interactions with verified ground truth transcriptions: https://huggingface.co/datasets/pipecat-ai/stt-benchmark-data https://github.com/pipecat-ai/stt-benchmark https://www.daily.co/blog/benchmarking-stt-for-voice-agents/ https://huggingface.co/datasets/pipecat-ai/stt-benchmark-data https://github.com/pipecat-ai/stt-benchmark https://www.daily.co/blog/benchmarking-stt-for-voice-agents/"
X Link 2026-02-13T21:44Z 12.2K followers, [----] engagements
"@huggingface We also published a benchmark of LLM performance in real-world voice agent use cases recently (long multi-turn conversations with multiple tool calls and accurate instruction following required). https://x.com/kwindla/status/2019120857923375586s=20 Open source voice agent LLM benchmark: https://t.co/CbLijComqQ Technical deep dive into voice agent benchmarking: https://t.co/gHTNRSjwdI https://x.com/kwindla/status/2019120857923375586s=20 Open source voice agent LLM benchmark: https://t.co/CbLijComqQ Technical deep dive into voice agent benchmarking: https://t.co/gHTNRSjwdI"
X Link 2026-02-13T21:51Z 12.2K followers, [----] engagements
"Final transcript what about time until transcription starts streaming In general what we care about is the time from end of speech until the final transcript segment is available. We need the full transcript in order to run LLM inference. I've experimented a fair amount with greedy LLM inference on partial transcript segments and there are not enough gains to make up for the extra work. So "time to first token" from a transcription model isn't a useful metric. This is different from how we measure latency for LLMs and TTS models where we definitely focus on TTFT/TTFB"
X Link 2026-02-13T23:08Z 12.2K followers, [---] engagements
"These are voice agents. Pipecat supports Gemini Live (and Ultravox and OpenAI Realtime). But almost all production voice agents today use multiple models (STT - LLM - TTS) instead of a single speech-to-speech model. You get better latency intelligence and observability from a multi-model approach. I fully expect speech-to-speech models to have more market share over time. But right now SOTA is the multi-model pipeline. https://twitter.com/i/web/status/2022449946881069165 https://twitter.com/i/web/status/2022449946881069165"
X Link 2026-02-13T23:16Z 12.2K followers, [--] engagements
"Our goal was to set up the test the same way real-world input pipelines most often work. [--]. Audio chunks are sent to the STT service at real-time pacing. [--]. Silero VAD is configured to trigger after 200ms of non-speech frames. [--]. When the VAD triggers the STT service is sent a finalize signal. (Not all services support explicit finalization. But we think it's an important feature for real-time STT.) [--]. TTFS is the time between the first non-speech audio frame and the last transcription segment. If you use a service that sends you VAD or end-of-turn events it will function much the same way as"
X Link 2026-02-14T07:00Z 12.2K followers, [--] engagements
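A rough sketch of that measurement loop, for orientation only: the real-time pacing, the 200ms VAD window, the finalize signal, and the TTFS definition come from the post above; the 20ms chunk size and the stubbed STT client interface are assumptions.

```python
import asyncio, time

CHUNK_MS = 20          # assumed audio frame size
VAD_SILENCE_MS = 200   # Silero VAD triggers after 200ms of non-speech

class SttStub:
    """Stand-in for a streaming STT client."""
    async def send(self, chunk: bytes) -> None: ...
    async def finalize(self) -> None: ...
    async def wait_for_final_transcript(self) -> None: ...

async def measure_ttfs(chunks, is_speech, stt) -> float:
    """TTFS: first non-speech audio frame -> last transcription segment."""
    silence_ms, speech_end = 0, None
    for chunk, speech in zip(chunks, is_speech):
        await stt.send(chunk)
        await asyncio.sleep(CHUNK_MS / 1000)   # real-time pacing
        if speech:
            silence_ms, speech_end = 0, None
        else:
            if speech_end is None:
                speech_end = time.monotonic()  # clock starts here
            silence_ms += CHUNK_MS
            if silence_ms >= VAD_SILENCE_MS:
                await stt.finalize()           # if the service supports it
                break
    if speech_end is None:
        raise ValueError("no end of speech in this sample")
    await stt.wait_for_final_transcript()
    return time.monotonic() - speech_end
```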
".@tavus just published a nice blog post about their "real-time conversation flow and floor transfer" model Sparrow-1. This model does turn detection predicting when it's the Tavus video agent's turn to speak. It does this by analyzing conversation audio in a continuous stream and learning and adapting to user behavior. This model is an impressive achievement. I've had a few opportunities to talk to @code_brian who led the R&D on this model at Tavus about his work. I love Brian's approach to this problem. Among other things the Sparrow-1 architecture allows this model to do things like handle"
X Link 2026-01-14T03:55Z 12.2K followers, [----] engagements
"You are the average of the tokens you spend the most time with. belated 'aha' moment: Context engineering is as impt to inference as Data engineering is important to training belated 'aha' moment: Context engineering is as impt to inference as Data engineering is important to training"
X Link 2026-02-03T06:26Z 12.2K followers, [----] engagements
"I sat down with @zachk and @bnicholehopkins to talk about how we benchmark models for voice AI. Benchmarks are hard to do well and good ones are really useful We covered what makes an LLM actually "intelligent" in a real-world voice conversation the latency vs intelligence trade-off how speech-to-speech models compare to text-mode LLMs infrastructure and full stack challenges and what we're all most focused on in [----]. https://twitter.com/i/web/status/2019120855570366548 https://twitter.com/i/web/status/2019120855570366548"
X Link 2026-02-04T18:48Z 12.2K followers, [----] engagements
"Open source voice agent LLM benchmark: Technical deep dive into voice agent benchmarking: https://www.daily.co/blog/benchmarking-llms-for-voice-agent-use-cases/ https://github.com/kwindla/aiewf-eval https://www.daily.co/blog/benchmarking-llms-for-voice-agent-use-cases/ https://github.com/kwindla/aiewf-eval"
X Link 2026-02-04T18:48Z 12.2K followers, [----] engagements
"Voice-controlled UI. This is an agent design pattern I'm calling EPIC "explicit prompting for implicit coordination." Feel free to suggest a better name. :-) In the video I'm navigating around a map conversationally pulling in information dynamically from tool calls and realtime streamed events. There are two separate agents (inference loops) here: a voice agent and a UI control agent. They know about each other (at the prompt level) but they work independently. https://twitter.com/i/web/status/2022087764720988296 https://twitter.com/i/web/status/2022087764720988296"
X Link 2026-02-12T23:17Z 12.2K followers, 14.2K engagements
"The critical things here are: - We can't block the voice agent's fast responses. - The voice agent already has a lot of instructions in its context and a large number of tools to call so we don't want to give it more to do each inference turn. So we prompt the voice agent to know at a high level what the UI agent will do but to ignore or respond minimally to UI-related requests. This adds relatively little complexity to the voice agent system instruction. We prompt the UI agent with a small subset of world knowledge a few tools and a lot of examples about how to perform useful UI actions in"
X Link 2026-02-12T23:17Z 12.2K followers, [----] engagements
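A minimal sketch of the coordination described above, with hypothetical agent objects standing in for the real voice and UI inference loops:

```python
import asyncio

async def on_user_transcript(transcript: str, voice_agent, ui_agent):
    # Both loops see the same speech input but run independently; neither
    # controls or awaits the other, so voice responsiveness is unaffected.
    voice_task = asyncio.create_task(voice_agent.respond(transcript))
    ui_task = asyncio.create_task(ui_agent.maybe_act(transcript))
    # Voice agent's prompt: respond minimally to UI-only requests.
    # UI agent's prompt: the tools plus many examples of useful UI actions.
    return await asyncio.gather(voice_task, ui_task)
```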
"I'm really looking forward to @NVIDIAGTC in March. Last year was amazing. (And I came home with a new 5090) I've been working on building multi-agent local/cloud hybrid applications on my NVIDIA DGX Spark. Here's a video of an LLM-powered game running on the Spark in which you fly around by talking to your AI space ship. The conversational voice agent is a @pipecat_ai pipeline built with: - Nemotron Speech ASR - Nemotron [--] Nano - Magpie TTS The Nemotron [--] Nano voice agent delegates the long-running agent-loop tasks to bigger models in the cloud. You can see it start tasks in the video. It has"
X Link 2026-02-15T07:17Z 12.2K followers, [----] engagements
"I did a talk a couple of weeks ago about the agent patterns in the game and how they're similar to patterns we use in coding agent harnesses and in voice agents for enterprise applications. Space Machine Sandboxes: https://www.youtube.com/watchv=HnYafj9h-48 https://www.youtube.com/watchv=HnYafj9h-48"
X Link 2026-02-15T07:17Z 12.2K followers, [---] engagements
"@picanteverde @simonw I love the Sesame work but there's no API and the consumer app is still Test Flight only as far as I know. Th version that was released as open source is not a fully capable model"
X Link 2026-02-16T04:36Z 12.2K followers, [---] engagements
"These days Sergio Sillero Head of the Cloud Data & AI at MAPFRE is programming via voice while he shops for groceries. If you're deep in the Claude Code / Ralph Wiggum / tmux world this is not super surprising to you. If you're not it sounds like crazy ridiculous hype. Sergio wrote some voice interface code for his Meta Ray-Bans using the @pipecat_ai MCP server that lets him keep working on a project in @AnthropicAI's Claude Code when he steps away from his desk. https://twitter.com/i/web/status/2023264920968757521 https://twitter.com/i/web/status/2023264920968757521"
X Link 2026-02-16T05:15Z 12.2K followers, [---] engagements
"@_dr5w @simonw True. But for most of these there are only [--] providers. OpenAI/Azure or DeepMind/Vertex. 😀"
X Link 2026-02-16T05:36Z 12.2K followers, [---] engagements
"NVIDIA just released a new open source transcription model Nemotron Speech ASR designed from the ground up for low-latency use cases like voice agents. Here's a voice agent built with this new model. 24ms transcription finalization and total voice-to-voice inference time under 500ms. This agent actually uses three NVIDIA open source models: - Nemotron Speech ASR - Nemotron [--] Nano 30GB in a 4-bit quant (released in December) - A preview checkpoint of the upcoming Magpie text-to-speech model These models are all truly open source: weights training data training code and inference code. This"
X Link 2026-01-06T18:09Z 12.2K followers, 279.6K engagements
"New text-to-speech model from @rimelabs today: Arcana v3. Rime's models excel at customization and personality. The new model is fast available in [--] languages and you can use it as a cloud API or run it on-prem. The model also outputs word-level timestamps which is very important for maintaining accurate LLM context during a voice agent conversation. Listen to Arcana v3 in this video. @chadbailey59 uses the open source Pipecat CLI to set up a voice agent from scratch customize the prompt and talk to it"
X Link 2026-02-05T00:01Z 12.2K followers, [----] engagements
"Arcana v3 launch post: Pipecat CLI: https://docs.pipecat.ai/cli/overview https://rime.ai/resources/arcana-v3 https://docs.pipecat.ai/cli/overview https://rime.ai/resources/arcana-v3"
X Link 2026-02-05T00:01Z 12.2K followers, [---] engagements
"@riteshchopra Yes definitely Progressive "skills" loading inside a Pipecat pipeline is something we're doing fairly often these days. For a version of this in a fun voice agent context see the LoadGameInfo tool here: https://github.com/pipecat-ai/gradient-bang/blob/0a5c19f1459b2ddbbb4bc6182944239c63e4c702/src/gradientbang/pipecat_server/voice_task_manager.py#L26 https://github.com/pipecat-ai/gradient-bang/blob/0a5c19f1459b2ddbbb4bc6182944239c63e4c702/src/gradientbang/pipecat_server/voice_task_manager.py#L26"
X Link 2026-02-12T19:17Z 12.2K followers, [--] engagements
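In the same spirit as the LoadGameInfo tool linked above, here is an illustrative (not actual) sketch of progressive skills loading as a tool call; all names and strings are hypothetical:

```python
SKILLS = {
    "combat": "Detailed instructions and examples for combat encounters ...",
    "trading": "Detailed instructions and examples for trading screens ...",
}

async def load_skill(skill_name: str, context: list[dict]) -> dict:
    """Registered as a tool the LLM can call when it needs more instructions."""
    skill = SKILLS.get(skill_name)
    if skill is None:
        return {"error": f"unknown skill: {skill_name}"}
    # Appending to the context makes the instructions available on the next
    # inference turn, instead of front-loading every skill at startup.
    context.append({"role": "system", "content": skill})
    return {"loaded": skill_name}
```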
"Wake up babe. New Pareto frontier chart just dropped. Benchmarking STT for voice agents: we just published one of the internal benchmarks we use to measure latency and real-world performance of transcription models. - Median P95 and P99 "time to final transcript" numbers for hosted STT APIs. - A standardized "Semantic Word Error Rate" metric that measures transcription accuracy in the context of a voice agent pipeline. - We worked with all the model providers to optimize the configurations and @pipecat_ai implementations so that the benchmark is as fair and representative as we can possibly"
X Link 2026-02-13T21:44Z 12.2K followers, 11.2K engagements
"@iAmHenryMascot It's a new game we're building: https://github.com/pipecat-ai/gradient-bang https://github.com/pipecat-ai/gradient-bang"
X Link 2026-02-14T18:58Z 12.2K followers, [--] engagements
"Spending valentine's day exactly as you'd expect. (Arguing politely on LinkedIn about how to accurately measure latency and word error rates.) Wake up babe. New Pareto frontier chart just dropped. Benchmarking STT for voice agents: we just published one of the internal benchmarks we use to measure latency and real-world performance of transcription models. - Median P95 and P99 "time to final transcript" numbers https://t.co/y9qCrJLe0L Wake up babe. New Pareto frontier chart just dropped. Benchmarking STT for voice agents: we just published one of the internal benchmarks we use to measure"
X Link 2026-02-14T19:13Z 12.2K followers, [----] engagements
"@MAnfilofyev Super-impressive work from the @ultravox_dot_ai team on v0.7"
X Link 2026-02-16T04:13Z 12.2K followers, [---] engagements
"I do think Gemini Live has a lot of potential. It's currently too slow (2.5s voice-to-voice P50) and the API is missing important features for real-world voice workflows. You can't do context engineering mid-conversation. If you really need a speech-to-speech model for production use you're better off right now with gpt-realtime. But I expect the Gemini Live team to make progress this year https://twitter.com/i/web/status/2023261706970124674 https://twitter.com/i/web/status/2023261706970124674"
X Link 2026-02-16T05:02Z 12.2K followers, [---] engagements
"I don't think Sergio is here so you have to go follow him on the other thing: He's planning to demo his Ray-Bans + Claude Code integration at the February 25th Voice AI Meetup in Barcelona: https://www.voiceaispace.com/events/voice-ai-space-barcelona-meetup https://www.linkedin.com/feed/update/urn:li:activity:7428880495075078144/ https://www.voiceaispace.com/events/voice-ai-space-barcelona-meetup https://www.linkedin.com/feed/update/urn:li:activity:7428880495075078144/"
X Link 2026-02-16T05:15Z 12.2K followers, [---] engagements
"My thinking about this has evolved a lot now that we have real-world data from millions of interactions with voice agents. I used to aim for 500-800ms voice-to-voice latency. It turns out that people are totally fine in real conversations until latency gets above 1500ms. So now I talk about 1500ms as the "hard" cutoff that you need your P95 to be under. Note this is voice-to-voice measured on the client side so that you include networks audio buffers OS and bluetooth playout delays etc. https://twitter.com/i/web/status/2023447213725413853 https://twitter.com/i/web/status/2023447213725413853"
X Link 2026-02-16T17:19Z 12.2K followers, [--] engagements
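A sketch of where that client-side measurement sits, with hypothetical event hooks; the point is to start the clock at local end of speech and stop it only when remote audio actually plays:

```python
import time

class VoiceToVoiceTimer:
    def __init__(self):
        self._end_of_speech: float | None = None
        self.samples_ms: list[float] = []

    def on_local_end_of_speech(self) -> None:        # from client-side VAD
        self._end_of_speech = time.monotonic()

    def on_first_remote_audio_played(self) -> None:  # after buffer/OS/bluetooth playout
        if self._end_of_speech is not None:
            self.samples_ms.append((time.monotonic() - self._end_of_speech) * 1000)
            self._end_of_speech = None

    def p95_ms(self) -> float:
        xs = sorted(self.samples_ms)
        return xs[int(0.95 * (len(xs) - 1))]  # target: under 1500ms
```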
"thats just called proactive interruptions I interpreted what @rajivayyangar was proposing in a different way: use predictive inference at the turn detection step to claw back delays introduced by other parts of the pipeline. So not aiming to actually interrupt but to reduce the voice-to-voice latency a little bit"
X Link 2025-07-15T05:17Z [----] followers, [--] engagements
"@taishik_ @thorwebdev @elevenlabsio @pipecat_ai @posthog Hardware hacking is so much fun Big props to @_pion and the libpeer author sepfy for laying the open source foundation for the pipecat-esp32 webrtc project"
X Link 2025-07-15T17:36Z [----] followers, [--] engagements
"Hot takes (from me) and reasoned discussion (from Sam) on: The current best practices for building production enterprise scale voice agents. What models to use how to think about infrastructure and what the solved and unsolved problems in voice AI are right now. Why I spend most of my time now working on an an open source vendor neutral community driven codebase even though there's always an (additional) infinite amount of work to do running a company. Building developer tools in this new AI era. The importance of moving towards voice and multimodal realtime standards. Doing inference on the"
X Link 2025-07-16T03:36Z [----] followers, [----] engagements
"Smart Turn v2: open source native audio turn detection in [--] languages. New checkpoint of the open source open data open training code semantic VAD model on @huggingface @FAL and @pipecat_ai. - 3x faster inference (12ms on an L40) - [--] languages (13 more than v1 which was english-only) - New synthetic data set chirp_3_all with 163k audio samples - 99% accuracy on held out human_5_all test data Good turn detection is critical for voice agents. This model "understands" both semantic and audio patterns and mitigates the voice AI trade-off between unwanted turn latency vs the agent interrupting"
X Link 2025-07-18T17:55Z 10K followers, 42K engagements
"This model is designed to be used together with a traditional VAD model for voice AI conversations. The voice AI pipeline typically looks like this: [--]. A very short VAD timeout chunks the audio stream for the smart-turn model [--]. Transcription runs in parallel [--]. Transcription output is gated on turn detection before going to the rest of the pipeline The basic idea here is that you want turn detection to happen faster than transcription. It doesn't really matter how much faster because you need "final" transcription fragments before you can run LLM inference. We're generally aiming for 400ms"
X Link 2025-07-18T17:55Z [----] followers, [----] engagements
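A toy sketch of the gating step described in that pipeline; the queue and event wiring are assumptions, but the ordering (turn detection finishing before the final transcript fragment) is the point:

```python
import asyncio

async def gate_on_turn_complete(
    transcripts: asyncio.Queue,    # finalized STT fragments (strings)
    turn_complete: asyncio.Event,  # set by the smart-turn model
    run_llm,                       # async callable: full user turn -> response
):
    buffered: list[str] = []
    while True:
        buffered.append(await transcripts.get())
        # Turn detection is faster than transcription, so by the time the last
        # fragment arrives the event is usually already set: no added wait.
        if turn_complete.is_set() and transcripts.empty():
            await run_llm(" ".join(buffered))
            buffered.clear()
            turn_complete.clear()
```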
"Blog post with code examples and docs pointers: - Repo with training code and development notes: - Weights: - Data sets: - Use the model for free in Pipecat Cloud + @FAL: - Here's a demo app hosted on @ vercel and Pipecat Cloud: - - https://github.com/pipecat-ai/pipecat/tree/main/examples/fal-smart-turn https://pcc-smart-turn.vercel.app/ https://docs.pipecat.daily.co/pipecat-in-production/smart-turn https://huggingface.co/pipecat-ai https://huggingface.co/pipecat-ai/smart-turn-v2 https://github.com/pipecat-ai/smart-turn"
X Link 2025-07-18T17:55Z [----] followers, [----] engagements
"And there's still some low-hanging stuff in the FastAPI request path that could cut another 100ms I think. Actual inference on an L40 is 12ms. Marcus who did the heavy lifting on this version but isn't on X was focused on getting inference time down without sacrificing accuracy. I think the P50 could easily be under 300ms including the 200ms VAD timeout and the network request to do smart-turn inference"
X Link 2025-07-18T19:00Z [----] followers, [---] engagements
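The implied P50 budget, as back-of-envelope arithmetic (the network figure is an assumption; the other two numbers are from this thread):

```python
vad_timeout_ms = 200  # VAD window before the smart-turn request is made
network_ms = 80       # HTTP round trip to the inference endpoint (assumed)
inference_ms = 12     # smart-turn v2 inference on an L40
print(vad_timeout_ms + network_ms + inference_ms)  # ~292ms, under the 300ms P50
```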
"Lots of prior art using the Wav2Vec2 family of models for classification tasks. This task feels more like a classification task than a generative task. Though you can definitely get to a good model for turn detection either way I'm sure. We did a bunch of experiments with different architectures and base models. Wav2Vec2 had a good combination of useful pre-training to build on flexible architecture that can accommodate a lot of different potential classification heads and other modifications and the potential for very fast inference. I actually think an LSTM approach here makes a lot of"
X Link 2025-07-19T05:25Z [----] followers, [---] engagements
"Oh cool. I can definitely imagine Whisper performing better as a base for some kinds of audio classification tasks. Maybe especially the larger model sizes. A primary requirement for this task is very fast inference time. Im not super optimistic that it would be easy to get a classification-oriented Whisper variant down to 12ms inference on an L40 which is what this checkpoint of the smart-turn model clocks in at. But I could be wrong. My intuition after working on this problem for a bit is that Id probably put effort into going smaller and continuing to improve the data sets. Rather than"
X Link 2025-07-19T18:11Z [----] followers, [--] engagements
"Join us for ⚡ talks Wednesday night at @Cloudflare in San Francisco. There will be presentations and conversations about voice observability autonomy testing and evals and collaborative development. ⚡ @MarcKlingen / @Langfuse ⚡ @AxelBacklund / @AndonLabs ⚡ @ShivSakhuja / @AthinaAI ⚡ @lilyjclifford / @rimelabs I'm going to show some brand new @pipecat_ai voice agent user interface tooling 🔊🤖💻 📅 When: Wednesday July [--] 🕑 Time: 6:00 PM 8:30 PM RSVP below"
X Link 2025-07-21T17:27Z [----] followers, [----] engagements
"RSVP link: https://lu.ma/frqm1umn https://lu.ma/frqm1umn"
X Link 2025-07-21T17:27Z [----] followers, [---] engagements
"@anarchyco @Cloudflare @MarcKlingen @langfuse @axelbacklund @andonlabs @shivsakhuja If you come for a visit Id love to hang out"
X Link 2025-07-21T20:21Z [----] followers, [--] engagements
"@lukestanley Why even use WebRTC at all Short answer you need a UDP-based protocol for realtime voice. WebRTC is UDP and has a lot of other things you need built in too. https://voiceaiandvoiceagents.com/#websockets-webrtc https://voiceaiandvoiceagents.com/#websockets-webrtc"
X Link 2025-07-22T02:05Z [----] followers, [---] engagements
"I think there's going to be a lot of evolution in how we think about WebRTC because use cases for voice (and video) AI are very different from what we spent most of our engineering time on for the last few years. I'm excited about the return of peer-to-peer though there are trade-offs of course. I think it's possible that voice AI might even push the QUIC and MOC (and related) standards work in a good direction. I wrote about that here: https://voiceaiandvoiceagents.com/#quic-moq https://voiceaiandvoiceagents.com/#quic-moq"
X Link 2025-07-22T15:45Z [----] followers, [---] engagements
".@lizziepika kicking off AI builder night at @Cloudflare"
X Link 2025-07-24T01:34Z [----] followers, [---] engagements
"LLM selective response . If you're building a voice agent yourself you can achieve this (mostly) with a combination of prompting and orchestration logic. Basically "The user will sometimes tell you not to respond. When that happens the only thing you should do is call this tool output this specific token ." Then you need to define either a tool for the model to call that means "cool I intentionally didn't respond yet" or special token handling in your processing pipeline. I've built a few versions of this in @pipecat_ai including one that I still use for personal voice note-taking. I will try"
X Link 2025-07-24T16:20Z [----] followers, [----] engagements
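A sketch of that pattern with an illustrative tool definition and pipeline hook; the names and event shapes are hypothetical, not Pipecat's:

```python
NO_RESPONSE_TOOL = {
    "name": "stay_silent",
    "description": "Call this instead of answering when the user has asked you not to respond.",
    "parameters": {"type": "object", "properties": {}},
}

def handle_llm_output(event: dict, tts) -> None:
    if event.get("type") == "tool_call" and event.get("name") == "stay_silent":
        return  # intentionally produce no audio this turn
    if event.get("type") == "text":
        tts.say(event["text"])  # normal path: speak the response
```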
"Hands-on workshop: Build real-time AI Voice Agents on AWS Join us Monday July 28th at the AWS Builder Loft for a hands-on workshop with engineers from @DeepgramAI @awscloud and @trydaily. Explore building realtime conversational agents at production scale . - using Deepgram's transcription and voice models - LLMs and the AWS Strands Agents - running on AWS Bedrock and Pipecat Cloud. We will have a complete repo with code you can clone run locally modify and deploy to the cloud. If you're new to voice AI this is a great way to get started. If you're an experienced voice AI developer and want"
X Link 2025-07-25T16:30Z [----] followers, [---] engagements
"Open source voice AI meetup on Wednesday . Join @trychroma @modal_labs and @trydaily for the monthly SF Voice AI meetup this upcoming Wednesday. The theme is using open source/weights models for conversational voice agents. I'm moderating a panel featuring: - @charles_irl - Pavankumar Reddy (Mistral) - @NikhilKMurthy - Kunal Dhawan (NVIDIA) We'll also show some new voice AI tech stack crossover code: (Modal x @pipecat_ai). Join us in person or via livestream"
X Link 2025-07-26T18:36Z [----] followers, [----] engagements
"Local voice AI with a [---] billion parameter LLM. ✅ - smart-turn v2 - MLX Whisper (large-v3-turbo-q4) - Qwen3-235B-A22B-Instruct-2507-3bit-DWQ - Kokoro All models running local on an M4 mac. Max RAM usage 110GB. Voice-to-voice latency is 950ms. There are a couple of relatively easy ways to carve another 100ms off that number. But it's not a bad start"
X Link 2025-07-27T03:19Z 10.5K followers, 64K engagements
"Code is here: This was made possible by the quant @ivanfioravanti posted today and his advice about changing the memory limits for this big model. And @Prince_Canuma's work on mlx-audio made implementing the in-process Kokoro generation for @pipecat_ai a breeze If you're interested in open source/weights models for voice AI come hang out with us in person or via livestream on Wednesday at the monthly Voice AI meetup. https://lu.ma/u3hzaj71 https://x.com/ivanfioravanti/status/1949081653663469591 https://x.com/ivanfioravanti/status/1949010482230108210"
X Link 2025-07-27T03:19Z [----] followers, [----] engagements
"Here's voice agent code that runs entirely locally on macOS: Models you can run locally are getting better and better. You still need a pretty high-end machine to run an LLM that's good at conversation and tool calling. In my opinion the 30B parameter models are now good enough to be really interesting. But depending on what you're doing you might find a smaller LLM works well too. https://github.com/kwindla/macos-local-voice-agents https://github.com/kwindla/macos-local-voice-agents https://github.com/kwindla/macos-local-voice-agents https://github.com/kwindla/macos-local-voice-agents"
X Link 2025-07-27T03:21Z [----] followers, [--] engagements
"@AhmedRezaT Have not had enough time with the model running in this quant yet to evaluate (even vibe evaluate) tool calling. But I definitely plan to"
X Link 2025-07-27T03:43Z [----] followers, [---] engagements
"@McQueenFu 950ms voice to voice is faster than the majority of voice agents running in production today. Expensive I cant argue with. At least in terms of the initial cost of the M4 machine. But inference is free. 😆"
X Link 2025-07-27T04:12Z [----] followers, [--] engagements
"@McQueenFu Its pretty easy with @pipecat_ai"
X Link 2025-07-27T05:14Z [----] followers, [--] engagements
"I agree with this and its nicely put. But I think Id describe it as reinventing software development tooling over the next few years rather than vibe coding. Its clear now that the bottleneck is not the language models and harnesses. The bottleneck is the shape of the libraries and components we design for them to use. We need a (partly) new set of LEGO blocks for the vibe coding tools to snap together. There is an argument vibe coding will make apps more safe not less. A lot of sloppy languages/systems exist because they're easier to learn. But with AI that is no longer needed. So we can"
X Link 2025-07-27T16:07Z [----] followers, [----] engagements
"Its not trivial to train a really good unified end-to-end audio model (projectors between the stages etc). The Ultravox work is impressive I do think native audio is the long-term future. But you lose some things when you fuse the inference stages as well as gain some (big) advantages in audio understanding and (potentially) latency. Evals are harder. Observability is harder. Being able to pick and choose the most effective STT and TTS based on your own evals is sometimes very helpful. Im also hoping that we get true bidirectional streaming architectures that we can scale up soon. Im actually"
X Link 2025-07-27T22:17Z 10.1K followers, [----] engagements
"The TTFT (latency) for the Qwen model here is about the same as using Groqs API with their larger models. Prefill time dominates the TTFT and Groqs speed advantage doesnt come into play very much because the network penalty is 100ms most of the individual turns add only a few hundred tokens and KV caching locally works well. Having said that for almost all production use cases you cant run a good enough LLM locally on real users devices. This demo uses 110GB of (unified) RAM And for anything where you care about throughput and not just latency Groq blows the M4 out of the water. @kwindla Is"
X Link 2025-07-27T22:25Z [----] followers, [----] engagements
"@almeida_dril I need to run my voice evals set. Have not done that yet"
X Link 2025-07-27T22:28Z [----] followers, [--] engagements
"No I dont think so for general conversational intelligence. Have an open-ended conversation on the good side of the uncanny valley and tool calls/instruction following work well with just generic prompting techniques. I think our hardware needs to improve. I think the magic number at least with current architectures is going to be 30B parameters. So maybe just barely if quantization-aware training SOTA continues to improve. But Im a cautious no 16GB is not enough. Its dangerous to make these predictions though. I could be wrong"
X Link 2025-07-27T22:37Z [----] followers, [--] engagements
"More great voice AI example code from @uberboffin. Play the game Guess Who interactively with an AI partner. We had a thread here last week about "selective refusal." Selective refusal means prompting or training an LLM to sometimes not respond to input. Or more accurately to generate a no response response. Your application then handles the no response response however makes sense for your conversation flow. Sam's code defines a function call that facilitates selective refusal. Sam also gives the LLM structured data about the game characters and a very clear prompt defining the game rules."
X Link 2025-07-28T19:50Z [----] followers, [----] engagements
"The code: The AI character "Humphrey" is powered by: 🎙@Speechmatics ASR + diarization 🔗@pipecat_ai for WebRTC + function calls 🧠@OpenAI for smarts 🗣@elevenlabsio for that smooth smooth British voice https://github.com/sam-s10s/pipecat-guess-who-irl/tree/main https://github.com/sam-s10s/pipecat-guess-who-irl/tree/main"
X Link 2025-07-28T19:50Z [----] followers, [---] engagements
"Voice + AWS Bedrock & Claude + Strands Agents [---] people came to the AWS Builder Loft in San Francisco yesterday for a voice agents hands-on workshop. @natrugrats talked about @deepgram_ai's speech models and APIs. Ayan Ray gave a great overview of @awscloud AI models and services. Strands Agents is an open source agents SDK that works with any model/API and has extensive AWS integrations. Strands will make as many tool calls as needed to execute a task so in a conversational voice context you need to run the agent in a parallel process alongside the realtime voice pipeline. You also want to"
X Link 2025-07-30T01:22Z 10.4K followers, [----] engagements
"@mattmireles @natrugrats @awscloud My understanding is that the same Claude you know and love in ChatClaude and Claude Code is whats served in Bedrock"
X Link 2025-07-30T02:22Z [----] followers, [---] engagements
"@swyx @aiDotEngineer @DKundel @OpenAI @shresbm @GoogleDeepMind @intercom @bnicholehopkins @dwyer_neil @raizamrtn Sean is @_pion here"
X Link 2025-07-31T19:21Z [----] followers, [---] engagements
"I feel like we've seen more work on vision than on voice from the frontier labs. (And I'm not against it) And a lot of good vision work in the open weights world too. Shout out to @vikhyatk and Moondream My feeling is that the vision capabilities are there in the models and we just need to be building more applications that use vision and more user interface experiments that aim at the new kinds of software we want to enable. Also image generation that's fast enough to use interactively in applications is here too. But I'd like to see a lot more developers incorporating near-realtime image"
X Link 2025-07-31T21:51Z 10K followers, [---] engagements
"@ivanfioravanti Thank you for your frontier work in LLM applied physics. 🫡"
X Link 2025-08-01T16:49Z 10K followers, [----] engagements
"Vibe coding multimodal voice user interfaces today with Claude Code OpenAI Codex and Google Jules. Jules understood the assignment and went all-in on the story line. Sound on"
X Link 2025-08-03T04:52Z 10.4K followers, 20.9K engagements
"New turn detection model for voice agents from the excellent team at @krispHQ. Fast accurate turn detection is one of the "hard problems" for all of us building voice AI right now. Many many voice agents use Krisp's background voice cancellation models to improve transcription accuracy and reduce unintended interruptions. It's great to see the Krisp team working on turn detection too and offering customers a really good native audio turn detection model. The launch blog post is a good read for anyone interested in voice AI. It covers: - approaches to turn taking - the importance of latency -"
X Link 2025-08-05T15:59Z 10.5K followers, 18K engagements
"@broadfield_dev TTS is Kokoro (via @Prince_Canumas mlx-audio). Links and code in the repo"
X Link 2025-08-06T06:02Z 10.8K followers, [----] engagements
"@wiltongorske @ollama I can help make that happen"
X Link 2025-08-06T22:24Z 10.5K followers, [---] engagements
"GPT-5 is out in the world Here's a single-file voice agent powered by GPT-5. All you need is an OpenAI API key and Python. export OPENAI_API_KEY=sk_proj-. uv run gpt-5-voice-agent .py The first time you run this it will take about [--] seconds to install all the dependencies accept connections and begin processing audio and video. For voice AI use cases you probably want these parameter settings for GPT-5. service_tier: priority reasoning_effort: minimal verbosity: low Note that using the "priority" service tier doubles the cost per token. Having this option is great for latency sensitive"
X Link 2025-08-07T23:20Z 10.5K followers, [----] engagements
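For reference, the way a single-file `uv run` script typically declares its dependencies is inline script metadata (PEP 723), which uv resolves and installs on first run, which is the startup delay mentioned above. The dependency list below is illustrative, not the actual script's:

```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "openai",      # illustrative; not the actual script's dependency list
#   "pipecat-ai",
# ]
# ///
import os

assert "OPENAI_API_KEY" in os.environ, "export OPENAI_API_KEY=... first"
```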
"Here's the code: The code above uses the standard GPT-5 model plus OpenAI's transcription and voice generation models. OpenAI also released a new version of the natively voice-to-voice Realtime model and API today. Congratulations to everyone who worked on all the new things that shipped today For more expansive starter kit code that shows how to use both the three-model approach and the Realtime API see this guide: If you're interested in a technical deep dive into voice AI and building production voice agents check out the Voice AI & Voice Agents Primer: https://voiceaiandvoiceagents.com/"
X Link 2025-08-07T23:20Z 10.5K followers, [----] engagements
"@Sanava_AI Why did you comment out verbosity in the code Regression in the openai library required pinning to a previous version (that didn't yet support the verbosity property). It's fixed now. https://github.com/openai/openai-python/issues/2525 https://github.com/openai/openai-python/issues/2525"
X Link 2025-08-08T14:28Z 10.5K followers, [--] engagements
"GPT-5 is multimodal in the sense that it has vision input. (A lot of people use the term that way. A bit of a different perspective from those of us who are obsessed with realtime voice) The GPT-5 launch live stream had a really nice voice demo. The new model is in ChatGPT now but not yet in the public API. https://www.youtube.com/live/0Uu_VJeVVfosi=nVzHuy4sT09R3J2P&t=1402 https://www.youtube.com/live/0Uu_VJeVVfosi=nVzHuy4sT09R3J2P&t=1402"
X Link 2025-08-08T16:46Z 10.5K followers, [--] engagements
"Quick PSA. Settings for minimizing GPT-5 latency (time to first token). "service_tier": "priority" "reasoning_effort": "minimal" "verbosity": "low". P50 TTFT with these settings is 750ms. With the defaults it's 3s. The default settings are the right starting point for most use cases. It's good that this model can think proactively. As @swyx says "It's a good model ser." For use cases where you care a lot about TTFT use the above settings. Posting this here because I've answered this question a bunch of times today in various DMs and channels"
X Link 2025-08-08T17:19Z 10.5K followers, 11K engagements
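Those settings as they would appear in a Chat Completions call; treat this as illustrative, since exact parameter names and support vary by model and SDK version:

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Reply in one short sentence."}],
    service_tier="priority",     # lower latency; roughly doubles cost per token
    reasoning_effort="minimal",  # skip long reasoning to cut time to first token
    verbosity="low",             # keep responses brief, good for spoken output
)
print(response.choices[0].message.content)
```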
"@joshwhiton It's part of the open source voice-ui-kit which supports lots of different network transports. You can use it with Gemini Live API OpenAI Realtime API serverless WebRTC WebSockets and Daily or LiveKit WebRTC. More complete docs coming soon: https://github.com/pipecat-ai/voice-ui-kit/tree/main/examples/01-console https://github.com/pipecat-ai/voice-ui-kit/tree/main/examples/01-console"
X Link 2025-08-08T18:46Z 10.5K followers, [---] engagements
"Here's my advice about how to scale WebRTC after helping hundreds of teams get to production with voice agents. tldr: either run "serverless" WebRTC if you want to host the infrastructure yourself or use a global commercial cloud. Don't try to run your own WebRTC SFUs because building out a reliable performant WebRTC cloud is many engineer-years worth of effort: I also wrote down a bunch of general advice about building and scaling voice AI including network transports here: https://voiceaiandvoiceagents.com/#network-transport"
X Link 2025-08-11T18:35Z 10.8K followers, [--] engagements
"@ID_AA_Carmack The Moshi paper is fantastic. Bidirectional multimodal streaming tokens architecture. https://kyutai.org/Moshi.pdf https://kyutai.org/Moshi.pdf"
X Link 2025-08-11T18:49Z 10.8K followers, [----] engagements
"@gautham_vijay_ @pipecat_ai Thank you for the kind words And for contributing knowledge to the community I totally agree that understanding the WebRTC building blocks is super valuable for voice AI developers"
X Link 2025-08-11T18:58Z 10.5K followers, [--] engagements
"Teaser for something coming soon. Created using FLUX.1 dev on @FAL"
X Link 2025-08-12T04:07Z 10.6K followers, [----] engagements
"@joshwhiton Huh. That doesn't happen for me. I'll do a fresh clone of the repo and try it again later today. Anything interesting in the js console on the client side when you turn off the network"
X Link 2025-08-14T17:49Z 10.5K followers, [--] engagements
"At the @aiDotEngineer World's Fair in June @shresbm and I gave a talk about all the "magic" that goes into making great voice AI experiences. Magic in the sense of making hard things look easy. Magic in the sense of sufficiently advanced technology being indistinguishable from. Shrestha and her team train the LLMs and make the APIs we all use. I work mostly higher up at the orchestration and application levels. We thought it would be fun to show the push-pull tension between making use of the open-ended emergent capabilities of today's SOTA models while also writing scaffolding that makes"
X Link 2025-08-14T20:23Z 10.6K followers, [----] engagements
"Today I refactor the app I have been vibe coding for ten days. I do not expect to return. Give my kinesis foot switches to a junior developer capable of wielding the ancient and pure tools car cdr and cons"
X Link 2025-08-17T16:47Z 10.5K followers, [----] engagements
"Voice-only programming with the new OpenAI Realtime API . I spend a lot of time these days pair programming with LLMs. Often I'm talking rather than typing. This "voice dictation" use case has become an important vibe benchmark for me. Being able to create text input just by talking flexibly in a context dependent way with tool calling is a hard problem for today's models. Natural language dictation requires a very high degree of contextual intelligence instruction following accuracy and tool calling reliability. Today's new gpt-realtime model is quite good at this hard problem. The"
X Link 2025-08-28T18:16Z 10.7K followers, 50K engagements
"My goals for voice input are to: [--]. Be able to talk to my computer the same way I talk to another person. I don't want to have to dictate literal phrases. I want to stop and start go back and correct things I said before rely on previous context have my tools interpret what I mean to say rather than what I literally said and have the model fill in gaps and rewrite things for me on the fly. [--]. Do many of the things I can easily do with a keyboard and mouse. Send input to different windows. Perform sequences of actions. Copy and paste. Take screenshots. [--]. Have context and memory so I don't"
X Link 2025-08-28T18:16Z 10.7K followers, [----] engagements
"I got a bunch of questions about the cost of the Realtime API yesterday after posting this. tldr: OpenAI has followed their usual (and much appreciated) path of cutting the pricing of the Realtime API with every release. Cost is now about $0.04/minute of speech-to-speech time factoring in the implicit token caching. But note: you generally do not get charged for non-talking time because the OpenAI voice activity detection filters out non-speech input. So for a use case like voice programming you're probably only talking 5% of the time and dictation assistant output is extremely brief so it's"
X Link 2025-08-29T16:59Z 10.7K followers, 25K engagements
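To make the pricing claim above concrete, here is the back-of-the-envelope arithmetic it implies. The $0.04/minute rate and the ~5% talking-time estimate are from the post; the one-hour session length is purely illustrative:

```python
# Back-of-the-envelope Realtime API cost for a voice-programming session.
# Rate and talking fraction are from the post; session length is illustrative.
cost_per_speech_minute = 0.04  # USD per minute of speech-to-speech time
talking_fraction = 0.05        # ~5% of a voice-programming session is speech
session_minutes = 60           # illustrative one-hour session

billable_minutes = session_minutes * talking_fraction      # 3.0 minutes
session_cost = billable_minutes * cost_per_speech_minute   # ~$0.12
print(f"~${session_cost:.2f} for a {session_minutes}-minute session")
```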
"I heard you like TUIs. Here's a TUI for the gpt-realtime voice dictation stuff I posted last week. Also a gpt-5 version. And a local @pipecat_ai transport for macOS with acoustic echo cancellation. I'm having a lot of fun blurring the boundaries between voice dictation and using an LLM as a full natural language assistant. I definitely do not have the prompting right yet to be able to talk completely free-form to the LLM and have it dictate the right text to the right window every time. Things work a lot of the time. You get used to a certain level of magic happening. And then a command"
X Link 2025-09-01T19:26Z 10.8K followers, [----] engagements
"This is a really nice piece on five iconic companies informed by Paul's own experience working at Twilio and now building @browserbasehq. Commonalities: be early (to a very big market) be relentless be lucky. The first one is really hard because you will be too early if you're early enough. (But lots of things about startups are hard.) The second one is largely within your control. The third one is probably not something you can control. A long time ago @collision said to me that a startup is very fortunate if its big incumbent competitors are asleep at the wheel. That's not enough by"
X Link 2025-09-02T17:43Z 10.8K followers, [----] engagements
"Voice AI people: join us Wednesday in San Francisco at the @covaldev office for a conversation about reliable speech generation for realtime AI"
X Link 2025-09-03T03:53Z 10.7K followers, [----] engagements
"If you'll be in San Francisco on September 25th come hang out with great people from Cloudflare various AI luminaries a robot hand and me. 🌉 SF we are throwing a watch party of the first two episodes https://t.co/MotYBHMUrj Come hang out with the stars of the show 🤩 @thorwebdev from @elevenlabsio @kwindla from Daily and @pipecat_ai @josephofiowa from @roboflow Yorick the Robot Hand 🍿See you there 🌉 SF we are throwing a watch party of the first two episodes https://t.co/MotYBHMUrj Come hang out with the stars of the show 🤩 @thorwebdev from @elevenlabsio @kwindla from Daily and @pipecat_ai"
X Link 2025-09-05T21:41Z 10.7K followers, [----] engagements
"@_ricburton Yes I did a bunch of kite surfing when I took a break between Oblong and Daily. Dawn patrol surf. Then a few hours of Haskell hacking. Then afternoon kiting. That was a good schedule. This was before foils though so I know theres a new learning curve to climb up"
X Link 2025-09-07T21:24Z 10.7K followers, [--] engagements
"Re talking past each other: Some people are mostly thinking about the "CPU" in karpathy's LLM operating system (loosely speaking) and some people are mostly thinking about tools and libraries (loosely speaking). in ai products quality dominates a lot esp for generally intelligent products everything else like privacy cost etc is just rounding error I agree with this and it pushes in the direction of both using the biggest best (probably proprietary models for the foreseeable future) and using small models everywhere all the time. Application responsiveness is an important part of overall"
X Link 2025-09-10T15:20Z 10.7K followers, [----] engagements
"Blog post with more details about the new v3 version of Smart Turn: Training and inference code on GitHub: Model weights and all data sets are on @huggingface: The Krisp Turn-Taking model integrated into their suite of voice AI models: The Ultravox context-aware endpointing (turn detection) model: https://www.ultravox.ai/blog/ultravad-is-now-open-source-introducing-the-first-context-aware-audio-native-endpointing-model https://krisp.ai/blog/turn-taking-for-voice-ai/ https://huggingface.co/pipecat-ai https://github.com/pipecat-ai/smart-turn/"
X Link 2025-09-12T04:13Z 10.8K followers, [---] engagements
"Tiny SOTA model release today: v3 of the Smart Turn semantic VAD model. Smart Turn is a native audio open source open data open training code model for detecting whether a human has stopped speaking and expects a voice agent to respond. The model now runs in 60ms on most cloud vCPUs faster than that on your local CPU and in 10ms on GPU. Running on CPU makes it essentially free to use this in a voice AI agent. [--] languages and you can contribute data or data labeling to add a language or improve the model performance in any of the existing language. This model is a community effort. We think"
X Link 2025-09-12T04:32Z 10.8K followers, [----] engagements
"Blog post with more details about the new v3 version: Training and inference code on GitHub: Model weights and all data sets are on @huggingface: The Krisp Turn-Taking model integrated into their suite of voice AI models: The Ultravox context-aware endpointing (turn detection) model: https://www.ultravox.ai/blog/ultravad-is-now-open-source-introducing-the-first-context-aware-audio-native-endpointing-model https://krisp.ai/blog/turn-taking-for-voice-ai/ https://huggingface.co/pipecat-ai https://github.com/pipecat-ai/smart-turn/"
X Link 2025-09-12T04:32Z 10.7K followers, [---] engagements
"There are benchmarks and notes about technical goals in all the blog posts I linked to and the GitHub Smart Turn repo. You can actually run the training code for the Smart Turn model yourself It's all in that repo. You can train the model locally if you have a GPU or run the training on @modal"
X Link 2025-09-12T15:02Z 10.8K followers, [--] engagements
"@tensordot_ Hindi Marathi and Bengali today. Language table is here: And you can contribute data to add or improve language support https://www.daily.co/blog/announcing-smart-turn-v3-with-cpu-inference-in-just-12ms/ https://www.daily.co/blog/announcing-smart-turn-v3-with-cpu-inference-in-just-12ms/"
X Link 2025-09-12T15:33Z 10.7K followers, [---] engagements
"Introducing the @aiDotEngineer World's Fair multi-turn conversation benchmark. I wrote a post for the @AITinkerers newsletter about: - why multi-turn is hard for LLMs - the particular challenges of audio/voice and - a benchmark I use to think about LLM performance for conversational agents"
X Link 2025-09-12T15:48Z 10.8K followers, [----] engagements
"@notnotrishi Thats just demo/training code I think. I was talking about this: https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/audio/turn/smart_turn/local_smart_turn_v3.py https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/audio/turn/smart_turn/local_smart_turn_v3.py"
X Link 2025-09-14T00:39Z 10.8K followers, [---] engagements
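For readers wiring this into their own agents, here is a minimal sketch of using the local Smart Turn v3 analyzer referenced above. The module path comes from the file linked in the post; the class name LocalSmartTurnAnalyzerV3 and the turn_analyzer transport parameter are assumptions based on Pipecat's layout, so check the repo for your version:

```python
# Sketch: local Smart Turn v3 turn detection in a Pipecat transport.
# Module path is from the linked file; the class name and the
# `turn_analyzer` parameter are assumptions -- verify against the repo.
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import (
    LocalSmartTurnAnalyzerV3,
)
from pipecat.transports.base_transport import TransportParams

params = TransportParams(
    audio_in_enabled=True,
    audio_out_enabled=True,
    # Runs the open-weights model locally; the post reports ~60ms on
    # cloud vCPUs and ~10ms on GPU.
    turn_analyzer=LocalSmartTurnAnalyzerV3(),
)
```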
"Pipecat TV Episode [--] - Marcus joins the show to talk about the Smart Turn model v3 release (making CPUs go brrr). Plus some notes about the new universal LLMContext component which now supports the following LLM Service classes: - AWS Bedrock - Azure - Cerebras - Deepseek - Fireworks AI - Google AI Studio - Google Vertex AI - Grok - Groq - Mistral - NVIDIA NIM - Ollama - OpenAI - OpenPipe - OpenRouter - Perplexity - Qwen - SambaNova - http://Together.ai After the first episode of Pipecat TV its been hard to go unnoticed outside 😅. It went so well that we couldnt resist recording a second"
X Link 2025-09-15T19:50Z 10.8K followers, [----] engagements
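A rough sketch of what the universal context pattern looks like in practice. The import paths and constructor signatures here are assumptions based on Pipecat's module layout, not confirmed by the post; the point is just that one context object can drive any of the service classes listed above:

```python
# Sketch: one shared context object driving interchangeable LLM services.
# Import paths and signatures are assumptions -- check the Pipecat repo.
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.services.openai.llm import OpenAILLMService

context = LLMContext(
    messages=[{"role": "system", "content": "You are a terse voice agent."}]
)
# Swap OpenAILLMService for any of the service classes listed in the post
# (Bedrock, Gemini, Groq, Ollama, etc.) without changing the context.
llm = OpenAILLMService(model="gpt-4.1")
```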
"Last time I looked at the Alexa SDKs the environment was still a little too constrained to run full Pipecat agents with Alexa. That might have changed. Would be a cool thing to be able to do We support a lot of hardware environments ranging from ESP32 to Raspberry Pi to NVIDIA Jetson https://x.com/thorwebdev/status/1945158921179570557 https://github.com/pipecat-ai/pipecat-esp32 https://x.com/thorwebdev/status/1945158921179570557 https://github.com/pipecat-ai/pipecat-esp32"
X Link 2025-09-15T19:52Z 10.8K followers, [--] engagements
"Squobert runs on a Raspberry Pi: An even smaller/cheaper embedded hardware platform and one that's very widely used is the ESP32 family of microcontrollers: @_pion posted that he'll be in San Francisco at a hardware AI meetup on 11/1: https://luma.com/physical-ai-meetup https://github.com/pipecat-ai/pipecat-esp32 https://github.com/chadbailey59/squobert/ https://github.com/pipecat-ai/pipecat-esp32 https://github.com/chadbailey59/squobert/ https://luma.com/physical-ai-meetup https://github.com/pipecat-ai/pipecat-esp32 https://github.com/chadbailey59/squobert/"
X Link 2025-09-18T20:52Z 10.8K followers, [---] engagements
"@pierre_s @aiDotEngineer @_pion @chadbailey59 Yes but WebSockets are totally fine for server-to-server connections hosted in data centers with good peering So no worries. Read more here for a deeper dive: https://voiceaiandvoiceagents.com/ https://voiceaiandvoiceagents.com/"
X Link 2025-09-19T05:14Z 10.8K followers, [--] engagements
".@craigsdennis introducing AI Avenue Episode [--] at haus lava lamp (@Cloudflare HQ)"
X Link 2025-09-26T01:46Z 10.8K followers, [---] engagements
"Sun surf and beach pianos are hard to beat. Damn @rajivayyangar and @kwindla making a pretty strong case for moving to SF 🌁🌊🤩 https://t.co/HkIrYcI3Ci Damn @rajivayyangar and @kwindla making a pretty strong case for moving to SF 🌁🌊🤩 https://t.co/HkIrYcI3Ci"
X Link 2025-09-26T22:03Z 10.8K followers, [----] engagements
"Join us for a hackathon at @ycombinator on October 11th. Gemini x Pipecat realtime AI fun and games Build an application using Gemini and Pipecat. See some new APIs. Show off interesting things you're doing in your startup or side project. Hang out with engineers from Google DeepMind and Google Cloud the AI Tinkerers community and YC companies Daily Boundary Coval Langfuse and Tavus. Eat Outta Sight pizza. Limited space . apply below"
X Link 2025-09-27T21:41Z 10.8K followers, 24K engagements
"@anarchyco Would love to take a sun surf and piano tour of Spain"
X Link 2025-09-29T19:31Z 10.8K followers, [--] engagements
"A new transcription model from @DeepgramAI launched today: Flux. Flux is completely free for all of October and is integrated into Pipecat and Pipecat Cloud. This model shows where speech recognition is headed as speech models evolve to enable more and more voice agent use cases. Deepgram has always been the market leader in very low latency transcription. (Which is critical for conversational voice) My "magic number" here is 300ms. I want the finalized transcript to be delivered no more than 300ms after the user stops speaking. One reason that 300ms is a good baseline number is that the open"
X Link 2025-10-03T00:08Z 10.8K followers, [----] engagements
"Deepgram's blog post: Pipecat Deepgram and Flux docs: Using Flux (for free) in Pipecat cloud no API key needed: https://docs.pipecat.ai/deployment/pipecat-cloud/guides/managed-api-keys#managed-api-keys https://docs.pipecat.ai/server/services/stt/deepgram https://deepgram.com/flux https://docs.pipecat.ai/deployment/pipecat-cloud/guides/managed-api-keys#managed-api-keys https://docs.pipecat.ai/server/services/stt/deepgram https://deepgram.com/flux"
X Link 2025-10-03T00:08Z 10.8K followers, [---] engagements
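A minimal sketch of the Pipecat side, assuming Pipecat's DeepgramSTTService class; the import path varies by Pipecat version, and the exact Flux model identifier lives in the docs linked above rather than here:

```python
# Sketch: low-latency Deepgram transcription inside a Pipecat agent.
# The import path varies across Pipecat versions, and the Flux model
# identifier is not shown -- see the linked Pipecat Deepgram docs.
import os

from pipecat.services.deepgram.stt import DeepgramSTTService

stt = DeepgramSTTService(
    api_key=os.environ["DEEPGRAM_API_KEY"],
    # Goal from the post: finalized transcripts within ~300ms of the
    # user finishing speech.
)
```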
"I will say that "pre-filling" a system instruction using grep-ish tools is surprisingly not good in the things I've built for myself. But totally willing to believe this is a skill issue. Things that have worked better for me: - Pre-filling deterministically based on something like a CRM lookup - Dynamically updating using filesystem tools but with a fair amount of work put into the harness here. Naive approaches don't usually work as well as I hope they will. - Pre-filling or dynamic lookup using vector search. Again naive approaches are frustratingly not good. Chunking etc require some"
X Link 2025-10-04T19:15Z 10.8K followers, [--] engagements
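A hypothetical sketch of the first approach the post says has worked well: deterministic pre-fill from something like a CRM lookup. Every name here (lookup_customer, the prompt template) is an illustrative stand-in, not a real API:

```python
# Hypothetical sketch: deterministically pre-filling a system instruction
# from a CRM lookup before the session starts. All names are illustrative.
def lookup_customer(phone_number: str) -> dict:
    # In practice: a CRM query keyed on the inbound caller ID.
    return {"name": "Ada", "plan": "pro", "open_tickets": 1}

def build_system_instruction(phone_number: str) -> str:
    customer = lookup_customer(phone_number)
    return (
        "You are a support voice agent.\n"
        f"Caller: {customer['name']} (plan: {customer['plan']}, "
        f"open tickets: {customer['open_tickets']}).\n"
        "Greet the caller by name and mention any open ticket."
    )
```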
"As someone who does a lot of not just vibe coding using voice input I'm 100% convinced that much (maybe most) programming will be done by voice. I still type a lot too. But once you get used to voice mode and change your workflow to be effective with voice it feels scratchy going back to typing. A few notes. - My setup is often 1) a clean-up and processing voice loop 2) with a fair amount of LLM prompting and iteration before I send the text into the coding environment. As you see in the video @patloeber reposted below a little bit of cleanup is helpful. I would argue that a lot of cleanup"
X Link 2025-10-08T18:23Z 10.8K followers, [----] engagements
"Source code: Hackathon sign-up link: Credit to @mark_backman for the code the really nice README and the demo video. https://events.ycombinator.com/pipecat-gemini-yc25 https://github.com/daily-co/pcc-gemini-twilio https://events.ycombinator.com/pipecat-gemini-yc25 https://github.com/daily-co/pcc-gemini-twilio"
X Link 2025-10-08T18:43Z 10.9K followers, [---] engagements
"Starter kit for a web app that can see your screen and talk to you. - @GoogleDeepMind Gemini Live API for voice conversation and vision. - React front end built with Next.js and voice-ui-kit. - Deploy to Pipecat Cloud or anywhere you can run @pipecat_ai code. Come build multimodal realtime AI stuff with us this Saturday at YC in San Francisco. We're doing an all-day Gemini x Pipecat hackathon"
X Link 2025-10-09T22:02Z 10.9K followers, 10.7K engagements
"Github repo: Hackathon application: Open source voice-ui-kit docs: https://github.com/pipecat-ai/voice-ui-kit/ https://events.ycombinator.com/pipecat-gemini-yc25 https://github.com/daily-co/pcc-gemini-screen-voice-ui-kit https://github.com/pipecat-ai/voice-ui-kit/ https://events.ycombinator.com/pipecat-gemini-yc25 https://github.com/daily-co/pcc-gemini-screen-voice-ui-kit"
X Link 2025-10-09T22:02Z 10.8K followers, [---] engagements
"Filipi asks Gemini how to build a WhatsApp voice agent with Gemini in a WhatsApp voice call powered by Gemini. Facebook recently launched the WhatsApp Business Calling API which integrates voice calls into WhatsApp's chat threads interface. WhatsApp users can initiate voice calls to verified business accounts. Businesses can get customers' permission to start a voice call. Businesses can answer the calls inside WhatsApp route the calls to a call center or . answer the calls with a @pipecat_ai voice agent. See below for a guide to WhatsApp voice AI and a github repo with code you can use as a"
X Link 2025-10-10T15:25Z 10.9K followers, [---] engagements
"Pipecat WhatsApp voice agent guide: Starter kit: Application for Saturday hackathon at YC: https://events.ycombinator.com/pipecat-gemini-yc25 https://github.com/daily-co/pcc-gemini-whatsapp/ https://docs.pipecat.ai/guides/features/whatsapp https://events.ycombinator.com/pipecat-gemini-yc25 https://github.com/daily-co/pcc-gemini-whatsapp/ https://docs.pipecat.ai/guides/features/whatsapp"
X Link 2025-10-10T15:25Z 10.9K followers, [---] engagements
"Bob the humanoid robot"
X Link 2025-10-15T04:10Z 10.9K followers, [---] engagements
"Arthur MCP. An MCP server that integrated tools from all the sponsoring companies. https://x.com/moeintechoff/status/1977494768629522686 It was an eye-opening experience at @ycombinator yesterday exploring the world of Voice/Video AI. Hosted by @GoogleDeepMind x @pipecat_ai (@trydailys framework for voice and multi-modal conversational AI) I had the opportunity to see how companies today are shaping the future https://t.co/lFYBzvsSZ3 https://x.com/moeintechoff/status/1977494768629522686 It was an eye-opening experience at @ycombinator yesterday exploring the world of Voice/Video AI. Hosted by"
X Link 2025-10-15T04:10Z 10.9K followers, [---] engagements
"Hey Kids Get Off My Lawn: The Once and Future Visual Programming Environment http://t.co/tduxASNG http://techcrunch.com/2012/05/27/hey-kids-get-off-my-lawn-the-once-and-future-visual-programming-environment/ http://techcrunch.com/2012/05/27/hey-kids-get-off-my-lawn-the-once-and-future-visual-programming-environment/"
X Link 2012-05-27T17:53Z 10.7K followers, [--] engagements
"Foundry beer and now Foundry coffee @jasonmendelson @bfeld @ryan_mcintyre and @sether you guys are everywhere http://t.co/mcZi5KI5"
X Link 2012-09-13T00:41Z [----] followers, [--] engagements
"@nelson Just had an emergency no-flaps landing at washington dulles. I've never flown from DCA to IAD before. Fast touch-down"
X Link 2013-05-17T00:18Z 10.7K followers, [--] engagements
"@nelson mmm questions: salt/vinegar/%; did you blanche"
X Link 2014-05-16T20:45Z 10.7K followers, [--] engagements
"@nelson Peach crisp = (pie - crust) * oatmeal"
X Link 2014-08-11T18:17Z 10.7K followers, [--] engagements
"@nelson a "friend's" machine "
X Link 2015-01-22T02:57Z 10.7K followers, [--] engagements
"Euell Gibbons. Stalking the Blue-Eyed Scallop. Van Nees [----]. http://t.co/Kpz68NGL3n"
X Link 2015-04-27T17:26Z [----] followers, [--] engagements
"@nelson Genuinely curious . is it not worth the $$ to subscribe Does The Economist max out your neoliberal-old-timey-news budget 😂"
X Link 2015-05-06T21:07Z 10.7K followers, [--] engagements
"Tomasso Toffoli and Norman Margolus. Cellular Automata Machines. MIT [----]. http://t.co/eU35FbiWIl"
X Link 2015-06-19T18:44Z [----] followers, [--] engagements
"@nelson That's lovely She mentions that she started with a vector watershed map she found online. Your work"
X Link 2015-09-16T23:29Z 10.7K followers, [--] engagements
"C'mon @nytimes this is ridiculous. You can do better. https://mobile.nytimes.com/2017/08/18/opinion/joseph-conrad-congo-river.htmlreferer=https://t.co/kPO4m540LWamp=1 https://mobile.nytimes.com/2017/08/18/opinion/joseph-conrad-congo-river.htmlreferer=https://t.co/kPO4m540LWamp=1"
X Link 2017-08-20T02:21Z 10.7K followers, [--] engagements