# @kwindla

kwindla posts on X about realtime, ai, voice, and open ai the most. They currently have [------] followers and [---] posts still getting attention that total [-----] engagements in the last [--] hours.

### Engagements: [-----]

- [--] Week [------] +490%
- [--] Month [------] -76%
- [--] Months [-------] -56%
- [--] Year [---------] +40%

### Mentions: [--]

- [--] Months [--] -50%
- [--] Year [---] +51%

### Followers: [------]

- [--] Week [------] +1.30%
- [--] Month [------] +2.80%
- [--] Months [------] +16%
- [--] Year [------] +77%

### CreatorRank: [---------]

### Social Influence

**Social category influence** [technology brands](/list/technology-brands), [stocks](/list/stocks), [social networks](/list/social-networks), [finance](/list/finance), [countries](/list/countries), [vc firms](/list/vc-firms), [travel destinations](/list/travel-destinations), [gaming](/list/gaming), [celebrities](/list/celebrities), [products](/list/products)

**Social topic influence** [realtime](/topic/realtime), [ai](/topic/ai), [voice](/topic/voice), [open ai](/topic/open-ai), [agent](/topic/agent) #3834, [llm](/topic/llm) #1019, [inference](/topic/inference) #255, [in the](/topic/in-the), [$googl](/topic/$googl), [model](/topic/model) #2713

**Top accounts mentioned or mentioned by** @pipecatai, @trydaily, @openai, @twilio, @chadbailey59, @krisphq, @aidotengineer, @deepgramai, @groqinc, @huggingface, @cartesiaai, @aconchillo, @nelson, @googledeepmind, @producthunt,
@jonptaylor, @moodisadi, @markbackman, @rajivayyangar, @solarislll

**Top assets mentioned** [Alphabet Inc Class A (GOOGL)](/topic/$googl), [Cloudflare, Inc. (NET)](/topic/cloudflare), [Robot Consulting Co., Ltd. (LAWR)](/topic/robot)

### Top Social Posts

Top posts by engagements in the last [--] hours

"Pipecat 0.0.99 is a pretty big release: [--] items in the "Added" section, including vision (image input) support for OpenAI Realtime, word-level timestamps in AzureTTSService, the @krisp_ai VIVA turn detection model, and Grok Realtime voice-to-voice. There's also a fundamental new abstraction in this release: turn and interruption "strategies." We started working on Pipecat in [----]. In those early days we had just a few STT, TTS, and LLM models we could use for voice agents. The only turn detection option was Silero VAD. We were building fairly simple pipelines and targeting fairly simple use" [X Link](https://x.com/kwindla/status/2011660766446043243) 2026-01-15T04:44Z 12.2K followers, [----] engagements

"I posted about context engineering for voice agents a few days ago and realized I should also point people to @mark_backman's "Instruction Following and Workflows" tutorial from last year. Mark takes you from "this is going to be easy" through the "you're looking like the crazy guy with the conspiracy board" meme all the way out the other side to prompt and context engineering success. Or at least gives you tools to tackle long, multi-turn, real-world agent challenges: - LLM reliability and context window limits - Strategies for managing context - Pipecat Flows state machine library - Will we" [X Link](https://x.com/kwindla/status/2011919339906351312) 2026-01-15T21:52Z 12K followers, [---] engagements

"Kushagra Jain put together a comprehensive Pipecat introduction and architecture/concepts walk-through as a [--] minute video plus slides.
Lots of good general Voice AI information here (latency, interruptions, network transport, turn taking, inference, state, etc.), which Kushagra connects to the specific Pipecat building blocks that you use in a production voice agent for each of those things. https://twitter.com/i/web/status/2013313713533788176" [X Link](https://x.com/kwindla/status/2013313713533788176) 2026-01-19T18:12Z 12.2K followers, [----] engagements

"Small concrete example of a Claude skill that handles a repetitive, previously human-dependent software engineering task. @aconchillo wrote a skill to generate changelogs for releases using the pipecat repo conventions" [X Link](https://x.com/kwindla/status/2014184899415085550) 2026-01-22T03:54Z 12.2K followers, [---] engagements

"I will be hacking on self-improving agents at WeaveHacks [--]. And I'm bringing a bunch of Pipecat swag. ⚡ WeaveHacks [--] is happening at wandb HQ in SF Jan 31-Feb [--]. @altryne will be hosting with a stacked judge panel. Self-improving agents. A literal robot dog grand prize + over $15K in other prizes. Sponsored by @Redisinc @browserbase @vercel @trydaily @googlecloud. https://t.co/xHGdlhK00i" [X Link](https://x.com/kwindla/status/2014796987431113120) 2026-01-23T20:26Z 12.2K followers, [----] engagements

"@nvbalaji I don't think input and output were easy to define or model in the pre-AI software era. I say that after many years of arguments about things like Model View Controller abstractions and what the ideal design for a React state management library is. :-)" [X Link](https://x.com/kwindla/status/2015499823210991889) 2026-01-25T18:59Z 12K followers, [--] engagements

"Observability for voice agents is evolving quickly. And just in time.
Tracing, evals against real data, prompt management, latency and reliability metrics, simulation-based testing: these are all really, really valuable to teams shipping voice AI products. It's been great working with the @ArizePhoenix team on their Pipecat integration. 🚀 New OpenInference integration with @pipecat_ai. We've added OpenInference instrumentation for Pipecat using OpenTelemetry. Pipecat services can now emit standardized traces and spans with semantic context across realtime agent pipelines. The integration" [X Link](https://x.com/kwindla/status/2015920371221033080) 2026-01-26T22:50Z 12K followers, [----] engagements

"Voice-only programming with Claude Code. I've been playing with @aconchillo's MCP server that lets you talk to Claude Code from anywhere today. I always have multiple Claudes running and I often want to check in on them when I'm not in front of a computer. Here's a video of Claude doing some front-end web testing, hitting an issue and getting input from me, and then reporting that the test passed. In the video the Pipecat bot is using Deepgram for transcription and Cartesia for the voice. (Note: I sped up the web testing clickety-click sections of the video.) The code for the MCP server and" [X Link](https://x.com/kwindla/status/2015956506118914221) 2026-01-27T01:14Z 12.2K followers, [----] engagements

"Pipecat MCP Server: This is infinitely customizable. Getting started with Pipecat: https://docs.pipecat.ai/getting-started/quickstart https://github.com/pipecat-ai/pipecat-mcp-server" [X Link](https://x.com/kwindla/status/2015956508522250311) 2026-01-27T01:14Z 12.2K followers, [---] engagements

"Async, automatic, non-blocking context compaction for long-running agents. Last week I gave a talk called Space Machine Sandboxes at the @daytonaio AI builders meetup about patterns for long-running agents.
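The "non-blocking compaction" pattern this post names might look roughly like the following sketch. All names here (`summarize`, `CompactingContext`, the thresholds) are hypothetical illustrations, not Pipecat APIs: a background task summarizes older turns while the conversation loop keeps running, and the summary is spliced in only once it is ready.

```python
import asyncio

# Rough sketch of non-blocking context compaction (illustrative names only).
KEEP_RECENT = 4     # newest turns are never summarized
COMPACT_AFTER = 8   # trigger compaction once history exceeds this

async def summarize(turns):
    # Stand-in for an LLM summarization call.
    await asyncio.sleep(0)
    return {"role": "system", "content": f"[summary of {len(turns)} turns]"}

class CompactingContext:
    def __init__(self):
        self.turns = []
        self._task = None

    def add(self, turn):
        self.turns.append(turn)
        # Start compaction in the background; never await it here, so the
        # next voice response is not blocked on summarization.
        if len(self.turns) > COMPACT_AFTER and self._task is None:
            n = len(self.turns) - KEEP_RECENT
            self._task = asyncio.create_task(self._compact(n))

    async def _compact(self, n):
        summary = await summarize(self.turns[:n])
        # Replace the summarized prefix; keep everything added since.
        self.turns = [summary] + self.turns[n:]
        self._task = None

async def demo():
    ctx = CompactingContext()
    for i in range(12):
        ctx.add({"role": "user", "content": f"turn {i}"})
        await asyncio.sleep(0)  # the agent keeps taking turns meanwhile
    await asyncio.sleep(0.01)   # let any in-flight compaction finish
    return ctx.turns

history = asyncio.run(demo())
```

The key point is that `add()` never awaits the summarization, so response latency is unaffected while the context still shrinks.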
I work a lot on voice AI agents, which are fundamentally multi-turn, long-context loops. I also build lots of other AI agent stuff, often as part of bigger systems that include voice. One of the patterns I showed in the talk is non-blocking compaction. Here's a short clip. https://twitter.com/i/web/status/2016288112629187054" [X Link](https://x.com/kwindla/status/2016288112629187054) 2026-01-27T23:12Z 12.2K followers, 26.5K engagements

"The full video is here: Here's a pointer to source code. This is not a clean example. It's part of the code of the game I talked about in the talk. But let me know if you have questions. And Claude's summary: This is a 12-minute talk about building AI agents using a space trading game called Gradient Bang as a sandbox for experimenting with agent patterns. The core thesis is that voice agents, coding agents, and LLM-powered games are fundamentally the same thing: an LLM in a loop with tool calls and context engineering. --- Topics 00:00 Introduction Kwindla from Daily working on Pipecat" [X Link](https://x.com/kwindla/status/2016288114529206527) 2026-01-27T23:12Z 12.2K followers, [----] engagements

"@ishaansehgal @aconchillo Are you using @pipecat_ai" [X Link](https://x.com/kwindla/status/2016292387266691162) 2026-01-27T23:29Z 12.2K followers, [--] engagements

"@ApplyWiseAi @daytonaio Rules of thumb: - keep system instruction under 5k tokens (but you can swap it out for different "states" in a complex voice workflow) - compact the conversation after [--] turns if you can" [X Link](https://x.com/kwindla/status/2016652411943997925) 2026-01-28T23:19Z 12K followers, [--] engagements

"Benchmarking LLMs for voice agent use cases. New open source repo, along with a deep dive into how we think about measuring LLM performance. The headline results: - The newest SOTA models are all *really* good, but too slow for production voice agents.
GPT-4.1 and Gemini [---] Flash are still the most widely used models in production. The benchmark shows why. - Ultravox [---] shows that it's possible to close the "intelligence gap" between speech-to-speech models and text-mode LLMs. This is a big deal. - Open weights models are climbing up the capability curve. Nemotron [--] Nano is almost as capable" [X Link](https://x.com/anyuser/status/2018439972123185379) 2026-02-02T21:42Z 12.2K followers, [----] engagements

"@chadbailey59 This is good framing, and probably clearer and more succinct than I managed in my write-up about the benchmark. You can build ultra low latency, super reliable voice agents, but you have to do context engineering" [X Link](https://x.com/kwindla/status/2018729399470993780) 2026-02-03T16:52Z 12K followers, [---] engagements

"I sat down with @zachk and @bnicholehopkins to talk about how we benchmark models for voice AI. Benchmarks are hard to do well, and good ones are really useful. We covered what makes an LLM actually "intelligent" in a real-world voice conversation, the latency vs intelligence trade-off, how speech-to-speech models compare to text-mode LLMs, infrastructure and full stack challenges, and what we're all most focused on in [----]. https://twitter.com/i/web/status/2019120855570366548" [X Link](https://x.com/anyuser/status/2019120855570366548) 2026-02-04T18:48Z 12.2K followers, [----] engagements

"Open source voice agent LLM benchmark: https://github.com/kwindla/aiewf-eval Technical deep dive into voice agent benchmarking: https://www.daily.co/blog/benchmarking-llms-for-voice-agent-use-cases/" [X Link](https://x.com/anyuser/status/2019120857923375586) 2026-02-04T18:48Z 12.2K followers, [----] engagements

"@andxdy @terronk @rootvc Nice. Related: (We are a Root Ventures company.)
https://github.com/pipecat-ai/pipecat-mcp-server" [X Link](https://x.com/kwindla/status/2019528306618667389) 2026-02-05T21:47Z 12.1K followers, [--] engagements

"The NVIDIA DGX Spark is a desktop GPU workstation with 128GB of unified memory. Working with the team at @NVIDIAAIDev, we've been using these little powerhouse machines for voice agent development, testing new models and inference stacks, and training LLMs and audio models. Today we published a guide to training the Smart Turn model on the DGX Spark. Smart Turn is a fully open source (and open training data) native audio turn detection model that supports [--] languages. The guide walks you through installing the right dependencies for this new Arm + Blackwell architecture and includes benchmarks" [X Link](https://x.com/anyuser/status/2021353464270590328) 2026-02-10T22:39Z 12.2K followers, [----] engagements

"Blog post link: https://www.daily.co/blog/training-smart-turn-on-the-nvidia-dgx-spark/ The Smart Turn open source turn detection model: https://github.com/pipecat-ai/smart-turn" [X Link](https://x.com/anyuser/status/2021353466397065708) 2026-02-10T22:39Z 12.2K followers, [---] engagements

"New repo: Pipecat Skills for Claude Code. So far: - Create and configure a basic voice agent (running locally, using any combination of models and services) - Deploy to Pipecat Cloud for production - Start the Pipecat MCP Server to talk to Claude Code via voice (including remotely from your phone) I'm working on an end-to-end testing skill.
https://twitter.com/i/web/status/2022011497996816826" [X Link](https://x.com/anyuser/status/2022011497996816826) 2026-02-12T18:14Z 12.2K followers, [---] engagements

"If you have skills that are useful for voice agent development, contribute to the repo: https://github.com/pipecat-ai/skills" [X Link](https://x.com/anyuser/status/2022011499599081701) 2026-02-12T18:14Z 12.2K followers, [---] engagements

"Why do you not call the UI a sub-agent if you are not speaking to it directly? In this pattern I am speaking to the UI agent directly. It sees the speech input. But it doesn't respond conversationally. It performs specialized tasks related to the UI. I don't think of it as a sub-agent because it isn't controlled by the voice model. I think of it as a parallel agent or a "parallel inference loop." The reason not to have the voice agent control the UI sub-agent is that I think it's hard to implement that without adding latency. I do use sub-agent patterns for other things where the control is" [X Link](https://x.com/kwindla/status/2022419275232026835) 2026-02-13T21:15Z 12.2K followers, [--] engagements

"These are voice agents. Pipecat supports Gemini Live (and Ultravox and OpenAI Realtime). But almost all production voice agents today use multiple models (STT - LLM - TTS) instead of a single speech-to-speech model. You get better latency, intelligence, and observability from a multi-model approach. I fully expect speech-to-speech models to have more market share over time. But right now SOTA is the multi-model pipeline.
https://twitter.com/i/web/status/2022449946881069165" [X Link](https://x.com/kwindla/status/2022449946881069165) 2026-02-13T23:16Z 12.2K followers, [--] engagements

"@MAnfilofyev Super-impressive work from the @ultravox_dot_ai team on v0.7" [X Link](https://x.com/kwindla/status/2023249452463767797) 2026-02-16T04:13Z 12.2K followers, [---] engagements

"These days Sergio Sillero, Head of Cloud Data & AI at MAPFRE, is programming via voice while he shops for groceries. If you're deep in the Claude Code / Ralph Wiggum / tmux world, this is not super surprising to you. If you're not, it sounds like crazy, ridiculous hype. Sergio wrote some voice interface code for his Meta Ray-Bans using the @pipecat_ai MCP server that lets him keep working on a project in @AnthropicAI's Claude Code when he steps away from his desk. https://twitter.com/i/web/status/2023264920968757521" [X Link](https://x.com/anyuser/status/2023264920968757521) 2026-02-16T05:15Z 12.2K followers, [---] engagements

".@tavus just published a nice blog post about their "real-time conversation flow and floor transfer" model, Sparrow-1. This model does turn detection, predicting when it's the Tavus video agent's turn to speak. It does this by analyzing conversation audio in a continuous stream and learning and adapting to user behavior. This model is an impressive achievement. I've had a few opportunities to talk to @code_brian, who led the R&D on this model at Tavus, about his work. I love Brian's approach to this problem.
Among other things, the Sparrow-1 architecture allows this model to do things like handle" [X Link](https://x.com/kwindla/status/2011286036207583740) 2026-01-14T03:55Z 12.2K followers, [----] engagements

"https://github.com/pipecat-ai/pipecat/pull/3521" [X Link](https://x.com/kwindla/status/2014184902019703102) 2026-01-22T03:54Z 12.2K followers, [---] engagements

"@MoodiSadi Doing compacting in a parallel pipecat pipeline works really well in all the places I've used that approach: https://github.com/pipecat-ai/gradient-bang/blob/0234d85dda47fd0b4b72e0b140a72a5c8a54bb4d/src/gradientbang/pipecat_server/bot.py#L224" [X Link](https://x.com/kwindla/status/2018442535031710055) 2026-02-02T21:52Z 12.2K followers, [---] engagements

"New text-to-speech model from @rimelabs today: Arcana v3. Rime's models excel at customization and personality. The new model is fast, available in [--] languages, and you can use it as a cloud API or run it on-prem. The model also outputs word-level timestamps, which is very important for maintaining accurate LLM context during a voice agent conversation. Listen to Arcana v3 in this video. @chadbailey59 uses the open source Pipecat CLI to set up a voice agent from scratch, customize the prompt, and talk to it" [X Link](https://x.com/anyuser/status/2019199774604554709) 2026-02-05T00:01Z 12.2K followers, [----] engagements

"Arcana v3 launch post: https://rime.ai/resources/arcana-v3 Pipecat CLI: https://docs.pipecat.ai/cli/overview" [X Link](https://x.com/anyuser/status/2019199776106115239) 2026-02-05T00:01Z 12.2K followers, [---] engagements

"Voice-controlled UI. This is an agent design pattern I'm calling EPIC: "explicit prompting for implicit coordination." Feel free to suggest a better name.
:-) In the video I'm navigating around a map conversationally, pulling in information dynamically from tool calls and realtime streamed events. There are two separate agents (inference loops) here: a voice agent and a UI control agent. They know about each other (at the prompt level) but they work independently. https://twitter.com/i/web/status/2022087764720988296" [X Link](https://x.com/anyuser/status/2022087764720988296) 2026-02-12T23:17Z 12.2K followers, 14.1K engagements

"The Claude Code / Ralph Wiggum moment is exciting for a lot of reasons. One of them is that all of us building AI systems that are just a little bit beyond the capabilities of just prompting a SOTA model now have a shared set of baseline ideas we're building on. Plus an overlapping set of open questions: - An agent is an LLM in a loop. (Plus a bunch of tooling, integration, and domain-specific optimization.) - Context management is a critical job. (Lots of ways to think about this.) - You almost certainly need multiple agents/models/processors/loops/whatever. (Lots of ways to think about this" [X Link](https://x.com/kwindla/status/2015185924788015350) 2026-01-24T22:12Z 12.2K followers, [----] engagements

"The critical things here are: - We can't block the voice agent's fast responses. - The voice agent already has a lot of instructions in its context and a large number of tools to call, so we don't want to give it more to do each inference turn. So we prompt the voice agent to know at a high level what the UI agent will do, but to ignore or respond minimally to UI-related requests. This adds relatively little complexity to the voice agent system instruction.
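The "parallel inference loop" coordination described in these posts can be sketched as two independent tasks fed the same utterance. The two stand-in inference functions below are hypothetical, not real Pipecat or model APIs:

```python
import asyncio

# Toy sketch of the voice + UI parallel-loop pattern (illustrative names).
# Both loops see the same user utterance; neither controls the other.

async def voice_inference(utterance):
    # Conversational loop: prompted to respond minimally to UI-only requests.
    await asyncio.sleep(0.01)  # stand-in for fast LLM inference
    return "On it."

async def ui_inference(utterance):
    # UI loop: small prompt, a few tools, emits actions instead of speech.
    await asyncio.sleep(0.05)  # can be slower; it never blocks the voice reply
    return {"action": "pan_map", "target": "harbor"}

async def handle_user_turn(utterance):
    voice_task = asyncio.create_task(voice_inference(utterance))
    ui_task = asyncio.create_task(ui_inference(utterance))
    reply = await voice_task   # speak as soon as the voice loop finishes
    ui_action = await ui_task  # the UI action lands whenever it is ready
    return reply, ui_action

reply, ui_action = asyncio.run(handle_user_turn("show me the harbor"))
```

Because the voice reply is awaited first and the UI task runs concurrently, a slow UI inference never adds latency to the spoken response.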
We prompt the UI agent with a small subset of world knowledge, a few tools, and a lot of examples about how to perform useful UI actions in" [X Link](https://x.com/anyuser/status/2022087769242448134) 2026-02-12T23:17Z 12.2K followers, [----] engagements

"Detailed technical post about this voice agent STT benchmark: https://www.daily.co/blog/benchmarking-stt-for-voice-agents/ Benchmark source code: https://github.com/pipecat-ai/stt-benchmark Benchmark data set on @huggingface: [----] human speech samples captured from real voice agent interactions, with verified ground truth transcriptions: https://huggingface.co/datasets/pipecat-ai/stt-benchmark-data" [X Link](https://x.com/anyuser/status/2022426777285788098) 2026-02-13T21:44Z 12.2K followers, [----] engagements

"@huggingface We also published a benchmark of LLM performance in real-world voice agent use cases recently (long, multi-turn conversations with multiple tool calls and accurate instruction following required). https://x.com/kwindla/status/2019120857923375586?s=20 Open source voice agent LLM benchmark: https://t.co/CbLijComqQ Technical deep dive into voice agent benchmarking: https://t.co/gHTNRSjwdI" [X Link](https://x.com/anyuser/status/2022428506408558870) 2026-02-13T21:51Z 12.2K followers, [----] engagements

"Final transcript? What about time until transcription starts streaming? In general, what we care about is the time from end of speech until the final transcript segment is available. We need the full transcript in order to run LLM inference.
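The metric defined here, time from end of speech to the final transcript segment, can be sketched with event handlers. The handler names below are hypothetical, not a real STT SDK:

```python
import time

# Sketch of a "time to final transcript" meter. The clock starts at the VAD
# end-of-speech trigger, not at the first partial result, because LLM
# inference needs the complete transcript.

class TimeToFinalTranscript:
    def __init__(self):
        self.speech_ended_at = None
        self.seconds = None

    def on_end_of_speech(self):
        # VAD fired: start the clock here.
        self.speech_ended_at = time.monotonic()

    def on_partial_transcript(self, text):
        pass  # "time to first token" isn't useful for this metric

    def on_final_transcript(self, text):
        self.seconds = time.monotonic() - self.speech_ended_at

meter = TimeToFinalTranscript()
meter.on_end_of_speech()
meter.on_partial_transcript("hello wor")
time.sleep(0.02)  # stand-in for the STT service's finalization delay
meter.on_final_transcript("hello world")
```

Note the contrast with LLM and TTS latency, where a streaming time-to-first-token/byte measurement is the right one; for STT only the final segment matters.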
I've experimented a fair amount with greedy LLM inference on partial transcript segments, and there are not enough gains to make up for the extra work. So "time to first token" from a transcription model isn't a useful metric. This is different from how we measure latency for LLMs and TTS models, where we definitely focus on TTFT/TTFB" [X Link](https://x.com/kwindla/status/2022447926983954523) 2026-02-13T23:08Z 12.2K followers, [---] engagements

"Our goal was to set up the test the same way real-world input pipelines most often work. [--]. Audio chunks are sent to the STT service at real-time pacing. [--]. Silero VAD is configured to trigger after 200ms of non-speech frames. [--]. When the VAD triggers, the STT service is sent a finalize signal. (Not all services support explicit finalization. But we think it's an important feature for real-time STT.) [--]. TTFS is the time between the first non-speech audio frame and the last transcription segment. If you use a service that sends you VAD or end-of-turn events, it will function much the same way as" [X Link](https://x.com/kwindla/status/2022566485370245271) 2026-02-14T07:00Z 12.2K followers, [--] engagements

"NVIDIA just released a new open source transcription model, Nemotron Speech ASR, designed from the ground up for low-latency use cases like voice agents. Here's a voice agent built with this new model: 24ms transcription finalization and total voice-to-voice inference time under 500ms. This agent actually uses *three* NVIDIA open source models: - Nemotron Speech ASR - Nemotron [--] Nano 30GB in a 4-bit quant (released in December) - A preview checkpoint of the upcoming Magpie text-to-speech model. These models are all truly open source: weights, training data, training code, and inference code.
This" [X Link](https://x.com/anyuser/status/2008601714392514722) 2026-01-06T18:09Z 12.2K followers, 279.5K engagements

"@picanteverde @simonw I love the Sesame work, but there's no API and the consumer app is still TestFlight only as far as I know. The version that was released as open source is not a fully capable model" [X Link](https://x.com/kwindla/status/2023255052631318851) 2026-02-16T04:36Z 12.2K followers, [---] engagements

"I don't think Sergio is here, so you have to go follow him on the other thing: https://www.linkedin.com/feed/update/urn:li:activity:7428880495075078144/ He's planning to demo his Ray-Bans + Claude Code integration at the February 25th Voice AI Meetup in Barcelona: https://www.voiceaispace.com/events/voice-ai-space-barcelona-meetup" [X Link](https://x.com/anyuser/status/2023264922638160349) 2026-02-16T05:15Z 12.2K followers, [---] engagements

"@_dr5w @simonw True. But for most of these there are only [--] providers. OpenAI/Azure or DeepMind/Vertex. 😀" [X Link](https://x.com/kwindla/status/2023270217921785946) 2026-02-16T05:36Z 12.2K followers, [--] engagements

"@kstonekuan @aconchillo You've built some really cool stuff. I would love to see a configuration for the pipecat-mcp-server where Tambourine lets you verbally edit what you want to say to Claude and then "submit." If that makes sense" [X Link](https://x.com/kwindla/status/2016320629713338713) 2026-01-28T01:21Z 12.2K followers, [--] engagements

"@riteshchopra Yes, definitely. Progressive "skills" loading inside a Pipecat pipeline is something we're doing fairly often these days.
For a version of this in a fun voice agent context, see the LoadGameInfo tool here: https://github.com/pipecat-ai/gradient-bang/blob/0a5c19f1459b2ddbbb4bc6182944239c63e4c702/src/gradientbang/pipecat_server/voice_task_manager.py#L26" [X Link](https://x.com/kwindla/status/2022027383965200619) 2026-02-12T19:17Z 12.2K followers, [--] engagements

"Wake up babe. New Pareto frontier chart just dropped. Benchmarking STT for voice agents: we just published one of the internal benchmarks we use to measure latency and real-world performance of transcription models. - Median, P95, and P99 "time to final transcript" numbers for hosted STT APIs. - A standardized "Semantic Word Error Rate" metric that measures transcription accuracy in the context of a voice agent pipeline. - We worked with all the model providers to optimize the configurations and @pipecat_ai implementations so that the benchmark is as fair and representative as we can possibly" [X Link](https://x.com/anyuser/status/2022426774815281630) 2026-02-13T21:44Z 12.2K followers, 11.1K engagements

"@iAmHenryMascot It's a new game we're building: https://github.com/pipecat-ai/gradient-bang" [X Link](https://x.com/kwindla/status/2022747330844500083) 2026-02-14T18:58Z 12.2K followers, [--] engagements

"Spending valentine's day exactly as you'd expect. (Arguing politely on LinkedIn about how to accurately measure latency and word error rates.) Wake up babe. New Pareto frontier chart just dropped. Benchmarking STT for voice agents: we just published one of the internal benchmarks we use to measure latency and real-world performance of transcription models. - Median, P95, and P99 "time to final transcript" numbers https://t.co/y9qCrJLe0L" [X Link](https://x.com/anyuser/status/2022751171761623153) 2026-02-14T19:13Z 12.2K followers, [----] engagements

"I do think Gemini Live has a lot of potential. It's currently too slow (2.5s voice-to-voice P50) and the API is missing important features for real-world voice workflows. You can't do context engineering mid-conversation. If you really need a speech-to-speech model for production use, you're better off right now with gpt-realtime. But I expect the Gemini Live team to make progress this year. https://twitter.com/i/web/status/2023261706970124674" [X Link](https://x.com/kwindla/status/2023261706970124674) 2026-02-16T05:02Z 12.2K followers, [---] engagements

"My thinking about this has evolved a lot now that we have real-world data from millions of interactions with voice agents. I used to aim for 500-800ms voice-to-voice latency. It turns out that people are totally fine in real conversations until latency gets above 1500ms. So now I talk about 1500ms as the "hard" cutoff that you need your P95 to be under. Note this is voice-to-voice measured on the client side, so that you include networks, audio buffers, OS and bluetooth playout delays, etc. https://twitter.com/i/web/status/2023447213725413853" [X Link](https://x.com/kwindla/status/2023447213725413853) 2026-02-16T17:19Z 12.2K followers, [--] engagements

"@nelson With the caveat that what I *think* a novel means isn't necessarily at all what it means: I read this as part of the characterization of the narrator.
He's self-absorbed and a bit myopic, perhaps the way almost all young people are, perhaps more so" [X Link](https://x.com/kwindla/status/1262429635607851008) 2020-05-18T17:07Z 10.9K followers, [--] engagements

"@nelson Do you have an Android tablet recommendation or two? I'd like a small tablet for testing, ideally something with a great screen and a good processor. (And also a midrange mass market tablet for testing video calls. 😀)" [X Link](https://x.com/kwindla/status/1263513834926436353) 2020-05-21T16:55Z 10.9K followers, [--] engagements

"I'm happy and grateful to announce @trydaily has raised a $40M Series B led by @RenegadePtnrs. 🎉 @imthemusic is joining our board to work with us on the future of video, audio, and WebRTC. http://www.daily.co/blog/announcing-our-40m-series-b/" [X Link](https://x.com/kwindla/status/1458472117154942977) 2021-11-10T16:30Z 10.9K followers, [---] engagements

"@jennylefcourt @trydaily @ninacali4 @imthemusic It's been such a wonderful thing to get to work with you and learn from you, @jennylefcourt. Thank you" [X Link](https://x.com/kwindla/status/1458479981672087552) 2021-11-10T17:01Z 10.9K followers, [--] engagements

"Really exciting to see native WebRTC support in the OpenAI Realtime API. We're launching updates to the OpenAI RealtimeAPI today: - WebRTC support - [--] new models - Big price cut - New API patterns. Very excited about these changes; we think they will unblock previously difficult applications https://t.co/nGu65da5U0" [X Link](https://x.com/kwindla/status/1869091921714626930) 2024-12-17T18:46Z 11.7K followers, [----] engagements

"Launch day for @Google Imagen [--]. We updated Story Bot, one of the classic @pipecat_ai voice interaction
starter kits, to use Gemini [---] Flash and Imagen. The new experience with these two models is wow. Watch @chadbailey59's interactive story about a dragon who feels a little bit out of place and her magical friend. The consistency of the images that Imagen creates through the whole story is a new capability. We haven't seen that from any other model we've experimented with. Clone the repo to build your own voice+images experience" [X Link](https://x.com/kwindla/status/1887595394353463735) 2025-02-06T20:13Z [----] followers, 22.4K engagements

"Registration link: https://lu.ma/ffpyl57n" [X Link](https://x.com/kwindla/status/1891306046066430348) 2025-02-17T01:57Z [----] followers, [---] engagements

"Thank you to @Vapi_AI for hosting, this time in their office in San Francisco" [X Link](https://x.com/kwindla/status/1891346726088311085) 2025-02-17T04:39Z [----] followers, [---] engagements

"@gaunernst This is really interesting to read. Kudos! Curious if you've thought about TTFB (latency) speed-ups while doing this hacking. For conversational voice use cases latency matters a lot; lower latency improves the user experience measurably" [X Link](https://x.com/kwindla/status/1892365338592907746) 2025-02-20T00:07Z [----] followers, [--] engagements

"Apologies if I missed it, but I think the times in the graphic in your first tweet are for the complete generation. I was specifically thinking about using Kokoro in streaming mode. TTFB is time-to-first-byte in a streaming context. It can be a little hard to measure unless you specifically set up a timer to measure the delay between starting inference and getting the first audio bytes in a stream. The idea is that for voice conversations, as long as the model is just a little bit faster than realtime, that's fast enough.
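The TTFB measurement described here (delay from starting inference to the first streamed audio bytes, as opposed to total generation time) can be sketched against any chunked audio stream. `fake_tts_stream` below is a stand-in generator, not a real Kokoro or Pipecat API:

```python
import time

# Sketch of measuring TTFB for a streaming TTS model: time the first chunk
# separately from draining the rest of the stream.

def fake_tts_stream(text):
    # Stand-in for a streaming TTS call: first-chunk latency, then chunks.
    time.sleep(0.03)               # model warm-up / first-chunk delay
    for _word in text.split():
        yield b"\x00" * 320        # 10ms of 16 kHz 16-bit mono audio
        time.sleep(0.001)

def measure_ttfb(stream):
    start = time.monotonic()
    first_chunk = next(stream)     # TTFB: delay until the first audio bytes
    ttfb = time.monotonic() - start
    rest = sum(len(chunk) for chunk in stream)  # drain remaining chunks
    return ttfb, len(first_chunk) + rest

ttfb, total_bytes = measure_ttfb(fake_tts_stream("hello there voice agents"))
```

Timing only the full generation, as a non-streaming benchmark does, would hide exactly the number a conversational pipeline cares about.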
But the time it takes to "start streaming" is something that humans are" [X Link](https://x.com/kwindla/status/1892372521443278927) 2025-02-20T00:35Z [----] followers, [--] engagements "@gaunernst I should say that I haven't looked at the Kokoro streaming support yet other than that I saw the merge linked above. The streaming support is new. The issue has js sample code. I'm mostly interested in using Kokoro from Python (in @pipecat_ai) pipelines" [X Link](https://x.com/kwindla/status/1892374098069631383) 2025-02-20T00:41Z [----] followers, [--] engagements "@samptampubolon Yes. Speech input → VAD with a short timeout → smart-vad model → rest of voice AI processing pipeline" [X Link](https://x.com/kwindla/status/1897824715148062994) 2025-03-07T01:40Z 11.9K followers, [----] engagements "Instant startup for voice AI agents. Typical startup time for a voice agent today is 2-4 seconds. Here's an example starter kit that shows how to reduce that to 200ms. This code uses client-side buffering and careful sequencing to start capturing audio for the voice AI to respond to as soon as the local microphone is live. In the demo video here are the relevant numbers: - run 1: network and bot ready 1905ms. audio live: 157ms - run 2: network and bot ready 1828ms. audio live: 155ms - run 3: network and bot ready 1848ms. audio live: 161ms You can see in the video that I start to talk before" [X Link](https://x.com/kwindla/status/1899890631344087412) 2025-03-12T18:29Z 10.9K followers, 18.1K engagements "@kylo_the_cat You can shape that behavior pretty flexibly with a system_instruction" [X Link](https://x.com/kwindla/status/1900285314566152665) 2025-03-13T20:38Z 10.9K followers, [--] engagements "You are talking about the Gemini Multimodal Live API service in @pipecat_ai There was a PR for Vertex. I thought it got merged but I'll check. Do you have a Vertex use case for the Live API at scale? If so DM me.
I'd like to help however I can and make sure everything is optimized for you" [X Link](https://x.com/kwindla/status/1905654126505328697) 2025-03-28T16:12Z 10.9K followers, [--] engagements "@SumitPaul18_9 @oyacaro @OpenAI @GeminiApp Gemini [---] Flash native audio voice-to-voice is very fast. 700-900ms typically. It's new enough -- just GA last week on Vertex -- that we don't have a lot of monitoring data yet. But it's impressive" [X Link](https://x.com/kwindla/status/1936824851735671166) 2025-06-22T16:33Z 10.9K followers, [--] engagements "AINews (@Smol_AI to subscribe) today summarizes the buzz about "context engineering". Credit to @dexhorthy for coining this very useful term. I've been talking to voice AI developers a lot over the past few months about the need to do _this thing_ for a while: compress summarize focus tune the context in specific ways for specific segments of a voice AI conversation/workflow. It's hugely useful to have just the right term to describe "this thing". Makes it much easier to talk about" [X Link](https://x.com/kwindla/status/1938105866479407151) 2025-06-26T05:23Z 11.1K followers, [----] engagements "You don't need a WebRTC server for voice agents. If you're deploying your own voice AI infrastructure you should almost certainly be using the new serverless WebRTC approach. Serverless is much simpler which translates to faster development better scaling and higher reliability. You'll have slightly lower latency too compared to doing a network hop through a (single zone) WebRTC server cluster. More notes below " [X Link](https://x.com/kwindla/status/1947467873166733483) 2025-07-22T01:24Z 11.9K followers, 18.5K engagements "@hiteshGautam26 For fine-tuning I would start by reading the guides and posts from the @OpenPipeAI team.
https://openpipe.ai/blog" [X Link](https://x.com/kwindla/status/1949338303636930682) 2025-07-27T05:17Z 11.9K followers, [--] engagements "The launch livestream: The Realtime API docs: https://platform.openai.com/docs/guides/realtime https://www.youtube.com/watch?v=nfBbmtMJhX0" [X Link](https://x.com/kwindla/status/1961131022776758569) 2025-08-28T18:17Z 11.2K followers, [----] engagements "At the @aiDotEngineer World's Fair in June @_pion and I gave a talk about networking fundamentals for voice AI plus how to get started with voice AI on small hardware devices. @chadbailey59 built a great little voice AI toy named Squobert. Squobert helped us out during the talk. If you're interested in building voice AI toys consumer devices or just hacking on voice-controlled hardware experiments check out the video of the talk and links to code " [X Link](https://x.com/kwindla/status/1968780300005367970) 2025-09-18T20:52Z 10.9K followers, [---] engagements "Call 1-970-LIVE-API (1-970-548-3274) to play Truth or Lies with @GoogleDeepMind Gemini. (Are sunsets on Mars blue?) - Gemini [---] Flash - Google Live API or TTS - LLM - TTS - @twilio for the phone - Deploy for production to Pipecat Cloud Full source code below. If you're interested in building voice agents join us this Saturday at @ycombinator for a Gemini x Pipecat hackathon" [X Link](https://x.com/kwindla/status/1975995392815923679) 2025-10-08T18:43Z 10.9K followers, [----] engagements "@IronRedSandHive @DynamicWebPaige Oh I love that. @joshwhiton and I have hacked a little bit on some Kokoro + Pipecat stuff: https://github.com/kwindla/macos-local-voice-agents" [X Link](https://x.com/kwindla/status/1978118783588831380) 2025-10-14T15:20Z 11K followers, [--] engagements "Gemini x Pipecat hackathon last Saturday. [---] Developers.
[--] judges from Google YC companies and the multimodal AI ecosystem. [--] projects submitted. We're continuing remotely all week. You can still sign up your team and compete for $300,000 in API credits. You don't have to start a project from scratch. You can port something you've been working on to use a Gemini model. You can add a realtime voice or video feature to your startup's product. Here are the winners from the in-person day on Saturday " [X Link](https://x.com/kwindla/status/1978312502468546586) 2025-10-15T04:10Z 10.9K followers, [----] engagements "The new Sonic-3 voice model from @cartesia_ai launched today. The big additions are increased emotional range emotion steering and laughter tags. For example: <emotion value="curious" /> I wrote some quick demo code prompting Gemini Flash to know about the emotion tags. You can hear the results in the video and see the emotion tags in the Pipecat developer console. Code is below" [X Link](https://x.com/kwindla/status/1983249935257735661) 2025-10-28T19:10Z 10.9K followers, 11.7K engagements "Here's the code I ran to record the video: Sonic-3 docs: Some notes: - If you shipped this to production with a UI that shows transcripts to the user you'd add a simple frame processor that strips the emotion and laughter tags out of the LLM text stream after they're parsed by the Pipecat CartesiaTTSService. - My Gemini Flash prompting was very simple. You could definitely do some fun stuff here if you had a particular style valence in mind. - Cartesia is training different voices to have different "dynamic range." (My wording not theirs.) For a customer support agent you'd want to keep the" [X Link](https://x.com/kwindla/status/1983249937392644337) 2025-10-28T19:10Z 10.9K followers, [---] engagements ".@mark_backman and @aconchillo walk through a new voice agent CLI they've been working on in the latest episode of Pipecat TV. Mark shows using the CLI to build a voice agent in [--] minutes.
The CLI guides you through choosing transcription LLM and voice models and configuring functions like recording and turn detection. You can test this voice agent locally and then deploy it to Pipecat Cloud or to your own infrastructure. (And you can wire up phone numbers for inbound or outbound telephony use cases.) There are a lot of interesting little sub-problems in building a good CLI like this. One" [X Link](https://x.com/kwindla/status/1983613586237985004) 2025-10-29T19:15Z 10.9K followers, [---] engagements "Pokebench. The one true test of vision-language models" [X Link](https://x.com/kwindla/status/1984690555973222646) 2025-11-01T18:34Z 10.9K followers, [---] engagements "Agree with this. The progress towards natively speech-to-speech models is exciting. But almost all production voice AI right now is stt-llm-tts. - Most voice agent use cases need the best possible function calling and instruction following performance from the LLM. There's a big delta here between SOTA LLMs in text mode and the current speech-to-speech models. - Observability debugging and hill climbing is a lot easier with the cascaded approach. - Architectural flexibility matters to an increasing number of use cases. For example building multi-agent systems with partially shared context is" [X Link](https://x.com/kwindla/status/1986473392476987829) 2025-11-06T16:38Z 11K followers, [--] engagements "@abilash_speaks @pipecat_ai Is your problem transcription accuracy or turn detection For turn detection you should definitely use an audio-native model like smart turn or the built-in turn detection in deepgram flux" [X Link](https://x.com/kwindla/status/1986473662795685953) 2025-11-06T16:40Z 11K followers, [--] engagements "dynamic user interfaces With all the things we'll talk about at the meetup next week it feels like we're only just starting to scratch the surface of what's possible/interesting. That's really really true of the dynamic UI stuff. 
The demo I have to show of that is both the most fun (because @JonPTaylor's work is amazing) and also the least technically adventurous (because we need to do a *lot* more experimentation here). I'm 100% convinced that at some point LLMs will be "designing" all our UIs on the fly. But we don't know how to build those components yet. I lived through the transition from" [X Link](https://x.com/kwindla/status/1986660055585100027) 2025-11-07T05:00Z 11K followers, [----] engagements "At the voice agent meetup next week the theme is new patterns for agents. We are increasingly building voice agents that are much more than a single LLM prompt running in a loop. State machines various kinds of multi-agent systems combining "fast" and "thinking" models guardrails processes memory sub-systems. I'm starting to document some of these new patterns. Here's one that I particularly like: using a tool call to start a long-running task returning from that tool call immediately with a very simple success/failure response then injecting events into the voice agent context for as long as" [X Link](https://x.com/kwindla/status/1986952416848454012) 2025-11-08T00:22Z 11K followers, 13.5K engagements "Not arguing that sending your avatar to a meeting is rude if that's not the expectation. But other than that I have a different take here. If this person's avatar is plugged into their personal knowledge base or prepped specifically for this meeting the interaction here is very different from what you'll get from talking to Claude. That's a big deal This is not a generic LLM. The future shock here is a little like video calls. I can't tell you how many people said to me "I'll never do video calls it's just better to talk on the phone" when we were starting Daily. Those people were wrong." [X Link](https://x.com/kwindla/status/1988013479681470942) 2025-11-10T22:38Z 11K followers, [----] engagements "I really love this use case: personalized product demos.
There are at least three things this approach to building realtime AI experiences unlocks: - Perfect lookup and perfect recall. Demonstrate any part of a complex product; go down any path with me; if I only have a few minutes but I get totally engaged I can come back later and pick up exactly where we left off. - Removes people as the bottleneck for this kind of white glove experience. Interactive conversational engagement isn't gated anymore by the scarcity of someone who is really good at the product demo being available - I would" [X Link](https://x.com/kwindla/status/1988039280984498436) 2025-11-11T00:21Z 11.1K followers, [----] engagements "An alternative possibility: Anthropic ships a 200ms TTFT voice-to-voice model that is as "smart" as Sonnet [---] and this unlocks so many new use cases that all of us who build tooling and applications higher up the stack scramble to build stuff that leverages the model. The thesis here is that we will always want to do "20% more" than the best available model is capable of natively. You get that extra 20% by: - Context engineering. Writing a non-trivial amount of code to give the model the most useful tokens every inference call. State machines. Parallel inference loops that" [X Link](https://x.com/kwindla/status/1988311628417560827) 2025-11-11T18:23Z 11.1K followers, 13.7K engagements ".@TeamHathora launched model inference today on @ProductHunt. Go check it out. - Open weights models deployed in [--] regions for low-latency inference. - Running on the Hathora network which was designed from the ground up for low-latency gaming use cases. This is a big deal for voice AI developers because Hathora is filling a gap in the market: making it easier to use open weights models from anywhere in the world plus optimizing audio inference for time to first token/byte. 
@tarunipaleru from Hathora will be at the voice AI meetup tonight in SF talking about low-latency networking optimizing" [X Link](https://x.com/kwindla/status/1988665642996871212) 2025-11-12T17:50Z 11K followers, [----] engagements "Hang out on Tuesday at the AWS AI Loft in San Francisco with engineers from @awscloud @DeepgramAI @trydaily and @pipecat_ai. Build your first voice agent or go deep on scaling enterprise deployments building multi-agent architectures RAG and external systems integrations and more" [X Link](https://x.com/kwindla/status/1989102825620795649) 2025-11-13T22:47Z 11K followers, [----] engagements "This post about customizing models and optimizing inference for voice agents from the team at @modal is really cool. Almost every enterprise I talk to about voice agents wants to be able to use their internal data to iteratively improve the performance of their AI agents. The Modal team worked with Decagon to build model training tooling and data sets train a custom speculative draft model and modify the SGLang inference stack to serve the LLMs used by Decagon optimally on H200 GPUs. The blog post says that they've achieved a P90 latency of 342ms with this combined work. That's really good" [X Link](https://x.com/kwindla/status/1989783798574452876) 2025-11-15T19:53Z 11.1K followers, [----] engagements "I'm really looking forward to @aiDotEngineer CODE in New York this week. If you'll be there and want to hang out and talk about voice agents realtime video AI agent design patterns or building AI agents with AI code generation let me know The AI Engineer events are always great. Great content great people great hallway track great evening events that people host around the edges of the conference. I was telling someone how much growth we've seen on Pipecat Cloud since the summer and looking back at the talk I did about Pipecat Cloud at AIE World's Fair in June. 
The basic description of why we" [X Link](https://x.com/kwindla/status/1990169559060607315) 2025-11-16T21:26Z 11.1K followers, [----] engagements "My Pipecat Cloud talk at AIEWF: And here's one of @andthenchat's innovative super-fun voice games: https://x.com/andthenchat/status/1974179408878641252?s=20 https://www.youtube.com/watch?v=IA4lZjh9sTs We've entered our Swiftie Era 🎤 SAY LESS: Taylor Edition is now live Give it a try and let us know how you do Link below to play 👇 https://t.co/oSVUqJBug2" [X Link](https://x.com/kwindla/status/1990169560834834905) 2025-11-16T21:26Z 11.1K followers, [---] engagements "And @thorwebdev showing an embedded hardware open source voice pipeline on a tiny ESP32 device: https://x.com/thorwebdev/status/1945158921179570557 Is this the tiniest little voice agent yet? My @elevenlabsio voice clone running on an esp32 microcontroller via @pipecat_ai and WebRTC 🔥 Story time: I recently caught up with Danilo Campos who is building the awesome DeskHog (seriously check it out) at @posthog and he https://t.co/to4bKkrjTD" [X Link](https://x.com/kwindla/status/1990169562571280583) 2025-11-16T21:26Z 11.1K followers, [----] engagements "I'm on a plane to New York with Claude Code open on one side of the iPad split screen and Jane Austen on the other (finally reading the Vintage Classics edition of Persuasion with the Brandon Taylor introduction which go read the introduction and then reread the novel; I'll wait). WiFi is spotty. ssh is happy on the iPad at the moment but my laptop knoweth not the Internet.
So I'm thinking about this jagged frontier of AI code generation while reading about Captain Wentworth and Anne Elliot meeting again after eight years all hopes uncertain. This is both a more productive and less productive" [X Link](https://x.com/kwindla/status/1990579225922019426) 2025-11-18T00:34Z 11K followers, [----] engagements "I'm hanging out at the AI Lightning Talks event on Wednesday in New York hosted by our friends at @TeamHathora. I'll also be giving a very short talk on [--] new voice agent patterns we've been either using in production or experimenting with lately. If you're interested in voice AI (or really any kind of AI agents) please join us" [X Link](https://x.com/kwindla/status/1990986240393457962) 2025-11-19T03:31Z 11.1K followers, [----] engagements "The ThursdAI crew delivers again as they do every Thursday with all the news of the week. This time from the AI Engineer Code Summit in New York. They were streaming from a table right in the middle of everything and I was sorely tempted to go give everyone high fives while they were live on the air. Crazy crazy week in AI and also a crazy podcast episode recorded live on the floor of @aiDotEngineer today with @ryancarson @swyx @dkundel and @thorwebdev (on his 3rd day at DeepMind) 👉 https://t.co/6JoY4JMu6H 👈 We covered all the major releases of this week someone in https://t.co/pTPKHKZqgJ" [X Link](https://x.com/kwindla/status/1991720503090761864) 2025-11-21T04:09Z 11.1K followers, [----] engagements "Pipecat Thanksgiving day release. 🦃 Some highlights: Deepgram AWS SageMaker realtime speech-to-text support improved text aggregation simplified and more powerful error handling new MiniMax Speech [---] HD and Turbo models. SageMaker is AWS's AI platform for deploying and using machine learning models at scale. AWS has brand new support for streaming data in and out of models hosted on SageMaker which is great for voice AI use cases.
This Pipecat release includes a generic base class for SageMaker "bidirectional streaming" plus a new DeepgramSageMakerSTTService class. Text aggregation and" [X Link](https://x.com/kwindla/status/1994223172644905266) 2025-11-28T01:53Z 11.9K followers, [----] engagements "Full changelog: https://github.com/pipecat-ai/pipecat/blob/main/CHANGELOG.md" [X Link](https://x.com/kwindla/status/1994224296449946088) 2025-11-28T01:58Z 11.9K followers, [---] engagements "Smart Turn v3.1. Smart Turn is a completely open source open data open training code turn detection model for voice AI trained on audio data across [--] languages. The model operates on the input audio in a voice agent pipeline. Each time the user pauses briefly this model runs and returns a binary decision about whether the user has finished speaking or not. The [---] release has two big improvements: [--]. New data sets for English and Spanish collected and labeled by contributors Liva AI Midcentury and MundoAI. The majority of the training data for the Smart Turn model is synthetically generated."
[X Link](https://x.com/kwindla/status/1996290741187027206) 2025-12-03T18:49Z 11.9K followers, 35.2K engagements "Blog post with more details: Data sets for this release contributed by: Liva AI: Midcentury: MundoAI: All data sets are available here: Training code is here: https://github.com/pipecat-ai/smart-turn https://huggingface.co/pipecat-ai https://mundoai.world/ https://www.midcentury.xyz/ https://www.theliva.ai/ https://www.daily.co/blog/improved-accuracy-in-smart-turn-v3-1/" [X Link](https://x.com/kwindla/status/1996290743158358421) 2025-12-03T18:49Z 11.9K followers, [----] engagements "@garrxth @pipecat_ai Really great working with you on this" [X Link](https://x.com/kwindla/status/1996310027175625000) 2025-12-03T20:06Z 11.9K followers, [--] engagements "@sir4K_zen Accuracy benchmarks (against open test data) for all supported languages: https://www.daily.co/blog/announcing-smart-turn-v3-with-cpu-inference-in-just-12ms/" [X Link](https://x.com/kwindla/status/1996466638305284159) 2025-12-04T06:28Z 11.1K followers, [---] engagements "@sir4K_zen Training and eval code is all here: https://github.com/pipecat-ai/smart-turn" [X Link](https://x.com/kwindla/status/1996602514146574638) 2025-12-04T15:28Z 11.9K followers, [--] engagements "@scarlettx_eth You can configure and spin up a local voice agent using the Pipecat CLI. Just say "yes" to enabling the Smart Turn model in the prompts: https://docs.pipecat.ai/cli/overview" [X Link](https://x.com/kwindla/status/1996698445772718144) 2025-12-04T21:49Z 11.9K followers, [--] engagements "No this isn't a transcription model.
It's generally used in combination with Silero VAD and a transcription model. The turn detection inference runs in parallel with transcription. Check out the @pipecat_ai getting started docs to see typical configurations: https://docs.pipecat.ai/getting-started/introduction" [X Link](https://x.com/kwindla/status/1996698876984938923) 2025-12-04T21:51Z 11.9K followers, [---] engagements "Pipecat 0.0.97 release. Some highlights: Support for @GradiumAI's new speech-to-text and text-to-speech models. Gradium is a voice-focused AI lab that spun out of the non-profit Kyutai Labs which has been doing architecturally innovative work on neural codecs and speech-language models for the last two years. Continued improvements in the core text aggregator and interruption handling classes both to fix small corner cases and to make behavior as configurable as possible. This is the kind of often-invisible work that underpins Pipecat's ability to support a wide range of models and pipeline" [X Link](https://x.com/kwindla/status/1998182177947984184) 2025-12-09T00:05Z 11.9K followers, [----] engagements "The @GradiumAI launch video is fun. This paper about the Kyutai Moshi model authored by the Gradium founders was my favorite paper of 2024: Smart Turn open source open data open training code turn detection model: The PR adding a wait_for_all parameter for compatibility with parallel function calling from reasoning models: (I always try to link to PRs in this kind of post because I think reading the source code of libraries that you use is an under-rated activity) https://github.com/pipecat-ai/pipecat/pull/3120 https://github.com/pipecat-ai/smart-turn https://kyutai.org/Moshi.pdf" [X Link](https://x.com/kwindla/status/1998182180380709358) 2025-12-09T00:05Z 11.1K followers, [---] engagements "The team at @LangChainAI built voice AI support into their agent debugging and monitoring tool LangSmith.
LangSmith is built around the concept of "tracing." If you've used OpenTelemetry for application logging you're already familiar with tracing. If you haven't, think about it like this: a trace is a record of an operation that an application performs. Here's a very nice video from @_tanushreeeee that walks you through building and debugging a voice agent with full conversation tracing. Using the LangSmith interface you can find a specific agent session then dig into what happened during" [X Link](https://x.com/kwindla/status/1998833359267721249) 2025-12-10T19:13Z 11.9K followers, [----] engagements "How to debug voice agents with LangSmith: Getting started with LangSmith tracing: LangSmith Pipecat integration docs page: I always like to read the code for nifty Pipecat services like the LangSmith tracing processor. It's here though I think this nice work will likely make its way into Pipecat core soon: https://github.com/langchain-ai/voice-agents-tracing/blob/main/pipecat/langsmith_processor.py https://docs.langchain.com/langsmith/trace-with-pipecat https://www.youtube.com/watch?v=fA9b4D8IsPQ https://youtu.be/0FmbIgzKAkQ" [X Link](https://x.com/kwindla/status/1998833361914322953) 2025-12-10T19:13Z 11.9K followers, [---] engagements "New Gemini Live (speech-to-speech) model release today. Using the Google AI Studio API the model name is: gemini-2.5-flash-native-audio-preview-12-2025 The model is also GA (general availability so not considered a beta/preview release) on Google Cloud Vertex under this model name: gemini-live-2.5-flash-native-audio Try it out on the @pipecat_ai landing page" [X Link](https://x.com/kwindla/status/1999586704991375808) 2025-12-12T21:06Z 11.9K followers, 25.2K engagements "Today is Gemini [--] Flash launch day I've been experimenting with pre-release checkpoints of this model and it's very good.
I've been using it for various personal voice agent stuff long-running text-mode agent processes and of course running benchmarks. Gemini [--] Flash saturates my relatively hard multi-turn benchmarks even with thinking set to the "MINIMAL" level. And as with Gemini offerings in general cost per token is quite a bit lower than other similarly capable models. The main question for voice AI developers is whether this model will have the same really really good TTFT numbers that" [X Link](https://x.com/kwindla/status/2001391453243871375) 2025-12-17T20:38Z 11.2K followers, [----] engagements "@chinmay Are you using Pipecat?" [X Link](https://x.com/kwindla/status/2003352543947292748) 2025-12-23T06:30Z 11.9K followers, [---] engagements "@chinmay Among many other things hooks for observability tooling to understand the kind of issue you're having" [X Link](https://x.com/kwindla/status/2003362865546436750) 2025-12-23T07:11Z 11.9K followers, [--] engagements "@chinmay One way is to use a custom mute filter: Another way is to create a custom processor that blocks InputAudioFrames when you tell it to. https://docs.pipecat.ai/guides/fundamentals/user-input-muting#mute-strategies" [X Link](https://x.com/kwindla/status/2003688875320086801) 2025-12-24T04:47Z 11.9K followers, [--] engagements "@chinmay There's an even simpler approach that I forgot about: I believe you can just task.queue_frames(STTMuteFrame(True)) and task.queue_frames(STTMuteFrame(False))" [X Link](https://x.com/kwindla/status/2003984592609710096) 2025-12-25T00:22Z 11.2K followers, [--] engagements "@chinmay Prompting to get reliable function calling is definitely different between different models.
I think function call stuff is one of the key places where prompt iteration and context engineering is critical" [X Link](https://x.com/kwindla/status/2003985742624968961) 2025-12-25T00:26Z 11.9K followers, [--] engagements "@chinmay Need to see logs. Recommend posting debug logs to Pipecat Discord" [X Link](https://x.com/kwindla/status/2005344057724436636) 2025-12-28T18:24Z 11.9K followers, [--] engagements "The NVIDIA keynote at CES today was great. Lots of info about really nifty upcoming data center hardware that will power training and inference at scale of course. But also a deep dive into the multi-model multi-modal hybrid cloud/local world that we've been trying to help bring into being with our work on the open source Pipecat realtime multi-modal framework. NVIDIA is all in on open source. I lost track of the number of times Jensen said some version of "the entire thing is open" in the keynote. NVIDIA expects open models to drive a lot of the growth in use cases that wouldn't be practical" [X Link](https://x.com/kwindla/status/2008332241785725432) 2026-01-06T00:18Z 11.8K followers, [----] engagements "Thank you for the kind words about the crazy side-project we did for @aiDotEngineer world's fair mostly because I really love printed books. 
We had a lot of stuff we felt like we'd learned about building voice agents for production that we wanted to share and what better way to do that than to make something people could hold in their hands? Looking forward to doing fun stuff with you in [----] https://twitter.com/i/web/status/2008335942030139702" [X Link](https://x.com/kwindla/status/2008335942030139702) 2026-01-06T00:32Z 11.9K followers, [---] engagements "Here's a technical write-up about the voice agent in the video above the three NVIDIA models how to deploy to production and some fun optimizations if you're running locally on a single GPU: https://www.daily.co/blog/building-voice-agents-with-nvidia-open-models/" [X Link](https://x.com/kwindla/status/2008601716212854880) 2026-01-06T18:09Z 11.9K followers, [----] engagements "Code is all here: You can deploy these models to @modal cloud really easily. (I love the Modal developer experience.) To run locally you'll need to build a Docker container (because you know bleeding edge vLLM llama.cpp CUDA for Blackwell etc). But the Dockerfile in the repo should "just work" on DGX Spark and RTX [----]. If you have trouble or make patches to extend to other platforms please let me know https://github.com/pipecat-ai/nemotron-january-2026/" [X Link](https://x.com/kwindla/status/2008601717987045382) 2026-01-06T18:09Z 11.9K followers, [----] engagements "@Scobleizer Built with @pipecat_ai" [X Link](https://x.com/kwindla/status/2008627761267831163) 2026-01-06T19:52Z 11.9K followers, [---] engagements "I've learned so much from running models locally. And on multiple platforms (Mac with MLX, RTX 5090).
Also the pain of building bleeding edge versions of torch vllm llama.cpp sglang etc reminds me of the early days of Linux when every time I needed to do anything I had to recompile the kernel. :-) https://twitter.com/i/web/status/2008628807180427492" [X Link](https://x.com/kwindla/status/2008628807180427492) 2026-01-06T19:56Z 11.8K followers, [---] engagements "@MoodiSadi That's true for Pipecat (you need WSL). But the actual Docker container should work on Windows I would have thought" [X Link](https://x.com/kwindla/status/2008631220947599844) 2026-01-06T20:06Z 11.9K followers, [--] engagements "@slowhandzen @BryceWeiner @tabali_tigi @kamalrhubbard @2Randos @MeemStein @tykillxz @kaur_q24 @newmanass @diegopuig13 @parttimewhore1 @SouthDallasFood @mrubel495 @jessie_murphy_ @kas7649 @Jomari_P @FlashpointIs @SezarSurge @S3kr3tt0 @notsofast @ParallelTCG @Variety @DrTiaSpores @ggchronicles_ @DrFresch @Goldoshi @netflix @InvincibleHQ @WGAWest It's all relative. If I win the actual lottery I'm going to put in a pre-order for DGX Station: https://www.nvidia.com/en-us/products/workstations/dgx-station/" [X Link](https://x.com/kwindla/status/2008675338445091259) 2026-01-06T23:01Z 11.8K followers, [--] engagements "@max_does_tech You can hear it in the demo. That's the Pipecat open source native audio smart turn model. https://github.com/pipecat-ai/smart-turn" [X Link](https://x.com/kwindla/status/2008716097164636215) 2026-01-07T01:43Z 11.9K followers, [----] engagements "@letsbuildmore Sadly we are still a ways away from LLMs small enough to run on an iPhone that can do good open-ended conversation. What's your use case?" [X Link](https://x.com/kwindla/status/2008716396675690570) 2026-01-07T01:44Z 11.8K followers, [---] engagements "@computeless I think it depends on the use case.
If your LLM is in the cloud, it often makes sense to run the transcription and LLM pipeline tightly coupled in the cloud. (See all the production agents running on @pipecat_ai for things like phone answering and customer support, for example.)" [X Link](https://x.com/kwindla/status/2008718103837122994) 2026-01-07T01:51Z 11.9K followers, [---] engagements "This robot assistant from the NVIDIA CES Keynote on Monday is going viral. @NaderLikeLadder explains all the hottest emerging AI trends in one demo: AI applications in [----] will be multi-model, multi-modal, hybrid cloud/local use, open source models as well as proprietary models, control robots and embedded devices in the physical world, and have voice interfaces. (And the demo had a cute robot *and* a cute dog. Gold.) The demo was built with @pipecat_ai. NVIDIA posted a really nice technical walk-through and complete code. The Reachy Mini robot from @huggingface is open source hardware. (You can" [X Link](https://x.com/kwindla/status/2008743885523349774) 2026-01-07T03:33Z 11.9K followers, 48.6K engagements "How to build a robot assistant: GitHub repo: Get your own Reachy Mini robot: https://huggingface.co/blog/reachy-mini https://github.com/brevdev/reachy-personal-assistant https://huggingface.co/blog/nvidia-reachy-mini" [X Link](https://x.com/kwindla/status/2008743887318487344) 2026-01-07T03:33Z 11.9K followers, [----] engagements "@darshil @RealtimeUK Technical write-up and GitHub repo you can clone (run locally on a [----] or deploy to the cloud to test): https://github.com/pipecat-ai/nemotron-january-2026 https://www.daily.co/blog/building-voice-agents-with-nvidia-open-models/" [X Link](https://x.com/kwindla/status/2008790218217079253)
2026-01-07T06:38Z 11.9K followers, [--] engagements "@anayatkhan09 Hot take, but sub-300ms is *too* fast unless turn detection is *perfect*. I don't even like talking to most actual people who do sub-300ms responses. 😜" [X Link](https://x.com/kwindla/status/2008796004880183581) 2026-01-07T07:01Z 11.8K followers, [---] engagements "@TshwaneGaming Pipecat is a completely open source, vendor-neutral framework for building voice agents. It's the most widely used voice AI agent framework, so it's a good place to start: https://www.pipecat.ai/" [X Link](https://x.com/kwindla/status/2008955390134993294) 2026-01-07T17:34Z 11.9K followers, [--] engagements "New release of the @pipecat_ai Smart Turn model today. (Plus a funny LLM outtake in the demo video.) This is a point release (version 3.2) with some nice quantitative improvements for short speech segments and noisy environments. Good turn detection is important for voice agents. Smart Turn is an open source native audio turn detection model that you can drop into any voice agent to give you very fast, accurate turn detection. In Pipecat pipelines we generally run Smart Turn in parallel with transcription. This parallelization gives you the fastest possible end-to-end latency.
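The parallelization described above (turn detection running alongside transcription rather than after it) can be sketched with stand-in coroutines; `transcribe` and `detect_turn_end` are hypothetical placeholders for illustration, not Pipecat APIs:

```python
import asyncio

async def transcribe(audio: bytes) -> str:
    # Stand-in for a streaming STT call.
    await asyncio.sleep(0.05)
    return "hello there"

async def detect_turn_end(audio: bytes) -> bool:
    # Stand-in for a native-audio turn model like Smart Turn.
    await asyncio.sleep(0.02)
    return True

async def handle_segment(audio: bytes) -> tuple[str, bool]:
    # Run both on the same audio segment concurrently, so total latency
    # is max(stt, turn) instead of stt + turn.
    transcript, turn_complete = await asyncio.gather(
        transcribe(audio), detect_turn_end(audio)
    )
    return transcript, turn_complete

transcript, done = asyncio.run(handle_segment(b"\x00" * 320))
```

The same shape generalizes to any pair of per-segment analyses that do not depend on each other's output.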
If you're using" [X Link](https://x.com/kwindla/status/2009028816052552084) 2026-01-07T22:26Z 11.9K followers, [----] engagements "Here's the Smart Turn v3 announcement post: Model training code, weights, and inference code: Getting started with Smart Turn in Pipecat voice agents: Here's the code for the NVIDIA open source voice agent in the demo: https://github.com/pipecat-ai/nemotron-january-2026/tree/khk/smart-turn-3.2 https://docs.pipecat.ai/server/utilities/smart-turn/smart-turn-overview https://github.com/pipecat-ai/smart-turn https://www.daily.co/blog/smart-turn-v3-2-handling-noisy-environments-and-short-responses/" [X Link](https://x.com/kwindla/status/2009028818305229159) 2026-01-07T22:26Z 11.9K followers, [---] engagements "@MoodiSadi @pipecat_ai Yes. The main branch uses the Smart Turn v3.1 model weights. (They were the newest weights as of yesterday.) I put up a branch that uses the v3.2 weights today (basically a 2-line change): https://github.com/pipecat-ai/nemotron-january-2026/tree/khk/smart-turn-3.2" [X Link](https://x.com/kwindla/status/2009032949887369338) 2026-01-07T22:42Z 11.9K followers, [---] engagements "Nemotron [--] Nano is definitely on the small side for a conversational voice LLM. And the choppiness might be the experimental Magpie voice inference code we wrote, which is optimized for running on an RTX [----]. This is a tech demo showing preview models and the ability to run locally for specific use cases rather than production agent code.
For a typical production voice agent use case you could start with a more conventional Pipecat pipeline: https://docs.pipecat.ai/getting-started/quickstart" [X Link](https://x.com/kwindla/status/2009034687662960768) 2026-01-07T22:49Z 11.9K followers, [--] engagements "We should talk to Marcus (who does the heavy lifting on the model training) about it. There is a threshold parameter in the inference code, so it would be easy to add. But the model is trained to have a very bimodal distribution. The theory is that over-fitting is better in a relatively data-constrained training context. Another idea: if you record yourself doing real sessions and instrument things so that you can easily pull out the audio from the model's wrong decisions, we can add it to the data set. I think with [---] real samples the model will be 10x better for you." [X Link](https://x.com/kwindla/status/2009105789773074555) 2026-01-08T03:32Z 11.9K followers, [--] engagements "@letsbuildmore Terminus iOS app for ssh. Connect to one of my desktop machines at home into tmux sessions. Works great, except the Claude Code terminal buffering code isn't optimized for this exactly, so sometimes I see a lot of terminal repaints" [X Link](https://x.com/kwindla/status/2009106419338088500) 2026-01-08T03:34Z 11.8K followers, [---] engagements "Inference is fast because all three models are fast, we customized the inference code for streaming (possible because the whole stack is open source), and we built on the @pipecat_ai realtime agent core code that's designed to do this very fast multimodal AI processing. To be fast enough for human-quality voice conversations you need good performance at all three of those levels: model architecture, inference stack, and agent architecture.
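The threshold idea discussed above can be illustrated with a toy decision function; the 0.5 cutoff and the sample probabilities are made up for illustration, not the model's actual values:

```python
def is_turn_complete(prob: float, threshold: float = 0.5) -> bool:
    """Binary turn decision from the model's completion probability."""
    return prob >= threshold

# A strongly bimodal model pushes outputs toward 0.0 or 1.0, so the
# exact threshold matters much less than it would for a model whose
# outputs spread evenly across [0, 1].
samples = [0.02, 0.97, 0.99, 0.04]
decisions = [is_turn_complete(p) for p in samples]
```

With outputs clustered at the extremes like this, moving the threshold anywhere between roughly 0.1 and 0.9 would produce the same decisions, which is the practical upside of training for a bimodal distribution.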
We wrote about all the optimizations here: Happy to answer questions https://www.daily.co/blog/building-voice-agents-with-nvidia-open-models/" [X Link](https://x.com/kwindla/status/2009318356088967295) 2026-01-08T17:36Z 11.9K followers, [--] engagements "This model has a "confidence" threshold internally but it's not intended to be used in production. We intentionally trained this model to have a strongly bimodal distribution. The intuition here is that in a domain where data quantity is the limiting factor, it's best to aim to overfit in training (loosely speaking). We also don't use the length of the trailing silence as input to training. I'm a little less sure about this decision, to be honest, and can imagine revisiting it. https://twitter.com/i/web/status/2009320169135317191" [X Link](https://x.com/kwindla/status/2009320169135317191) 2026-01-08T17:43Z 11.9K followers, [--] engagements "Pipecat Cloud is @trydaily's enterprise hosting platform for open source voice agents. Today, after a 9-month beta period, we're promoting Pipecat Cloud to General Availability. With Pipecat Cloud you build your voice agent on @pipecat_ai's open source, vendor-neutral core, add your custom code and agent logic, and then docker push to Pipecat Cloud. As with everything we do, Pipecat Cloud is engineered to give you flexibility, not to lock you into any service, including Pipecat Cloud itself. Any code that you can host on Pipecat Cloud you can self-host with no changes at all.
We've focused on" [X Link](https://x.com/kwindla/status/2009382966284452175) 2026-01-08T21:53Z 11.9K followers, [----] engagements "Pipecat Cloud announcement blog post with more details: Quickstart: https://docs.pipecat.ai/getting-started/quickstart https://www.daily.co/blog/pipecat-cloud-is-now-generally-available/" [X Link](https://x.com/kwindla/status/2009382969438289981) 2026-01-08T21:53Z 11.9K followers, [---] engagements "Last May @mark_backman hosted the infrastructure session of our month-long community Voice AI course. Mark's overview is still the best primer on how to deploy voice agents to production: job processing and compute cluster requirements, network routing and audio transport, telephony interconnect, etc. It's definitely worth watching the whole thing if you're part of a team building voice agents. I was re-watching Mark's video today because we just declared GA for Pipecat Cloud, the enterprise hosting platform for open source voice agents. I was curious how much has changed since Mark's overview" [X Link](https://x.com/kwindla/status/2009698693633716329) 2026-01-09T18:48Z 11.9K followers, [----] engagements "Full video of Mark's infrastructure session: Get started with Pipecat Cloud: https://docs.pipecat.ai/getting-started/quickstart https://www.youtube.com/watch?v=pRaVTv8RqiU" [X Link](https://x.com/kwindla/status/2009698695814775129) 2026-01-09T18:48Z 11.9K followers, [---] engagements "100%. Native voice in Claude Code is something I'm sure a lot of people would use, based on my experience hacking together various versions and forcing myself to use voice until I got over the "this is new, I'm used to the keyboard" hump. I've posted a few times about my personal hacked-up versions of this.
https://x.com/kwindla/status/1962597878138053036?s=20 https://x.com/kwindla/status/1949308553015263609?s=20" [X Link](https://x.com/kwindla/status/2009744872228696425) 2026-01-09T21:51Z 11.8K followers, [--] engagements ".@chadbailey59 wrote a nice introduction to the "structured conversations" approach to building super-reliable voice agents. If you've been experimenting with Ralph Wiggum coding loops you already understand the most important thing about structured conversations: throwing away old context is the *only* way to design complex LLM workflows that execute reliably. Pipecat Flows is an open source library that supports structured conversations and context engineering for complex voice agent workflows. Things like: - Food ordering - the agent needs to answer questions, record items, confirm a" [X Link](https://x.com/kwindla/status/2010798611295322165) 2026-01-12T19:38Z 11.9K followers, [----] engagements "Beyond the Context Window: Why Your Voice Agent Needs Structure: Pipecat Flows open source structured conversations framework: More on engineering reliability, especially for instruction following and tool calling in multi-turn conversations: https://voiceaiandvoiceagents.com/#scripting https://github.com/pipecat-ai/pipecat-flows/ https://www.daily.co/blog/beyond-the-context-window-why-your-voice-agent-needs-structure-with-pipecat-flows/" [X Link](https://x.com/kwindla/status/2010798613774156166) 2026-01-12T19:38Z 11.9K followers, [---] engagements "Changelog: Pipecat quickstart: https://docs.pipecat.ai/getting-started/quickstart https://github.com/pipecat-ai/pipecat/releases/tag/v0.0.99" [X
Link](https://x.com/kwindla/status/2011660769017131459) 2026-01-15T04:44Z 11.9K followers, [---] engagements "@MoodiSadi @Krisp_ai Personally I want to see @mark_backman and @aconchillo do a mini-tutorial on the next episode of Pipecat TV" [X Link](https://x.com/kwindla/status/2011663312333799731) 2026-01-15T04:54Z 11.9K followers, [--] engagements "@codewithimanshu @Krisp_ai All the things: voice-to-voice, text-to-voice, voice-to-text, video-to-voice, voice-to-video, video-to-video, voice-to-code, voice-to-liquid-ui" [X Link](https://x.com/kwindla/status/2011862605938524311) 2026-01-15T18:06Z 11.9K followers, [--] engagements "Full session: Getting started with open source voice agent tooling: Chad's Pipecat Flows technical overview: https://www.daily.co/blog/beyond-the-context-window-why-your-voice-agent-needs-structure-with-pipecat-flows/ https://docs.pipecat.ai/getting-started/quickstart https://www.youtube.com/watch?v=j-ARPPjJtRQ&list=PLzU2zoMTQIHjMPZ-OnpC3ozZs3bp3kIUs" [X Link](https://x.com/kwindla/status/2011919341655310636) 2026-01-15T21:52Z 11.9K followers, [---] engagements "If you're getting started with voice agents and Android, the Pipecat Android demo client has all the core components a client-side voice AI app needs: voice input and output, device control, and network transport. Marcus just updated the code, which now supports two WebRTC transports: the Pipecat SmallWebRTCTransport for zero-dependency peer-to-peer connections, and the Daily WebRTC transport for large-scale production use. The demo bot also sends a video stream which the app renders.
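The "structured conversations" idea from the posts above (throw away old context at each step, carry only a summary forward) can be sketched as a tiny state machine; the node names, `summary` argument, and class shapes here are invented for illustration and are not the Pipecat Flows API:

```python
from dataclasses import dataclass, field

@dataclass
class FlowNode:
    name: str
    prompt: str

@dataclass
class Conversation:
    node: FlowNode
    context: list[str] = field(default_factory=list)

    def add_turn(self, text: str) -> None:
        self.context.append(text)

    def transition(self, next_node: FlowNode, summary: str) -> None:
        # Discard the per-node turn history; carry only a compact
        # summary plus the new node's prompt forward.
        self.node = next_node
        self.context = [summary, next_node.prompt]

convo = Conversation(FlowNode("order", "Take the food order."))
convo.add_turn("user: one pizza")
convo.add_turn("agent: anything else?")
convo.transition(FlowNode("confirm", "Confirm the order."), "Order: 1 pizza.")
```

The key property is that context size stays bounded by the current node, no matter how long the overall conversation runs.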
You can actually use this code to connect to any voice AI service that implements the RTVI standard, too, not just" [X Link](https://x.com/kwindla/status/2012299936608698780) 2026-01-16T23:04Z 11.9K followers, [---] engagements "Pipecat Android demo client: Pipecat Simple Chatbot example (for the client to connect to). Two flavors of bot are provided: gpt-4o and Gemini Live. But you can modify the bot to use any TTS, LLM, STT, or speech-to-speech model: https://github.com/pipecat-ai/pipecat-examples/tree/main/simple-chatbot https://github.com/pipecat-ai/pipecat-examples/tree/main/simple-chatbot/client/android" [X Link](https://x.com/kwindla/status/2012299937875431449) 2026-01-16T23:04Z 11.9K followers, [---] engagements "@nasimuddin01 We will definitely support Gemini Live in Pipecat Flows as soon as the Live API allows modifying the conversation context and tools list. That's the missing piece right now" [X Link](https://x.com/kwindla/status/2012301403050684701) 2026-01-16T23:10Z 11.9K followers, [--] engagements ".@maxipesfix forked the open source audio Smart Turn model and added video. Smart Turn is a "turn detection" model used in a conversational agent to decide when the agent should respond. The model, training data, and training code are all completely open source. When we built the first version of Smart Turn, enabling this kind of extension and collaboration is exactly why we wanted to make everything open source. Maxim's blog post is super useful to read if you're interested in training multimodal models.
It describes the design choices and technical details (3D ResNet, late fusion, two-stage" [X Link](https://x.com/kwindla/status/2012637353953869912) 2026-01-17T21:25Z 11.9K followers, [----] engagements "macOS really wants me to install some updates and reboot, so I'm going through all the "I need to look at these tabs" browser tabs before they are lost to me forever. Last month @freeplay_ai launched a nifty new AI analytics feature that looks quite useful for voice agents. In the post below @cairns describes this tooling as helping you identify patterns in production data. For example, by surfacing similar examples that initial human review didn't catch and suggesting next steps (building specific kinds of evals, creating new test data sets, automated optimization runs). Here's my mental model" [X Link](https://x.com/kwindla/status/2012981124490809390) 2026-01-18T20:11Z 11.9K followers, [----] engagements "@pxng0lin Oh cool. Yeah, CUDA version stuff often is a time sink for me. Feel free to submit a PR that adds another Dockerfile if it might be useful for other people" [X Link](https://x.com/kwindla/status/2013009501629120600) 2026-01-18T22:03Z 11.9K followers, [---] engagements "I'll be hanging out at the Weights & Biases office in San Francisco with a couple hundred old and new friends next weekend. Come join us and build some self-improving agents. WeaveHacks is back Jan 31-Feb [--] at W&B HQ in SF. This time we're building self-improving agents. We've seen @GeoffreyHuntley's Ralph and @Steve_Yegge's Gas Town push the boundaries of what agents can do. Now it's your turn to build what comes next. Details below. 👇 https://t.co/2396L2sd1r WeaveHacks is back Jan 31-Feb [--] at W&B HQ in SF. This time we're building self-improving agents.
We've seen @GeoffreyHuntley's Ralph" [X Link](https://x.com/kwindla/status/2014490113997475999) 2026-01-23T00:07Z 11.9K followers, [----] engagements "Sunday second screen World Cup reading: memories of playing soccer in Côte d'Ivoire - https://medium.com/@kwindla/scratch-the-surface-and-the-beautiful-game-is-always-there-5440dbc1781c" [X Link](https://x.com/kwindla/status/488415117873131520) 2014-07-13T20:10Z 10.9K followers, [--] engagements "Tap tap tap . announcement: I just reserved my bot on @botdotme All the social medias are belong to us. http://bot.me/" [X Link](https://x.com/kwindla/status/747508270361780227) 2016-06-27T19:13Z 10.9K followers, [--] engagements "@nelson Oh man I love Hollinghurst" [X Link](https://x.com/kwindla/status/1262424930097836033) 2020-05-18T16:48Z 10.7K followers, [--] engagements "The Rainbow Room, the Golden Gate Bridge, and AI" [X Link](https://x.com/kwindla/status/1664338083888889856) 2023-06-01T18:28Z [----] followers, [---] engagements "AI thought of the day: "without context a Large Language Model is basically a reddit simulator." @charles_irl. I think "reddit simulator" is a more useful mental model than "blurry JPEG of the web"" [X Link](https://x.com/kwindla/status/1699900156274184357) 2023-09-07T21:39Z [----] followers, [----] engagements "I took a break to check the surf. But mostly I checked Slack. Here's a picture to balance things out. It really is beautiful today" [X Link](https://x.com/kwindla/status/1700282630694928470) 2023-09-08T22:59Z [----] followers, [--] engagements "This is my favorite thing on @ProductHunt today. I don't make films, but I love story boards, and for some kinds of design and planning work "thinking in story boards" works better than anything else for me.
https://www.producthunt.com/posts/storiaboard" [X Link](https://x.com/kwindla/status/1704942003275153627) 2023-09-21T19:33Z [----] followers, [----] engagements "AI thought of the day. This morning in an ambiguous situation at a four-way stop, a self-driving car proceeded through the intersection when I thought it was my turn to go. Was it dangerous? No. Did it make me laugh? Yes. This morning did I see several human drivers at four-way stops drive more aggressively? Also yes. Did I see several human drivers do things I actually thought were dangerous? Again yes. Have we collected enough data to know if self-driving cars cause fewer injuries per mile than the average human driver? I have no idea. Do I think today's self-driving cars are better drivers than" [X Link](https://x.com/kwindla/status/1705369580611490029) 2023-09-22T23:52Z [----] followers, [--] engagements "My favorite thing on @ProductHunt today: PumpGPT #AWS group buying + LLM-powered devops engineering expertise. AWS is hard to beat for flexibility and breadth of cloud services. This is especially valuable early on in building out a new product or tech stack. If you're fortunate enough to grow, though, AWS cost can scale alarmingly, and the "committed spend" discounts AWS offers are complex to negotiate and as rigid as the pay-as-you-go AWS offerings are flexible. PumpGPT aggregates infrastructure discounts and gives you back that flexibility. They've also just announced an AWS Certified in" [X Link](https://x.com/kwindla/status/1706082676154024132) 2023-09-24T23:06Z [----] followers, [----] engagements "My "AI thought of the day" today is a long one. A whole blog post. Over the next couple of weeks we're launching two new AI-focused toolkits, publishing a bunch of sample code and fun demos, and announcing several partnerships. We've spent the last six years building the world's best infrastructure and SDKs for real-time audio and video.
Now we're building on that work to support all the exciting next-generation use cases that combine LLMs (and other AI tools) with WebRTC voice, video, streaming, and recording" [X Link](https://x.com/kwindla/status/1706743058086477988) 2023-09-26T18:50Z [----] followers, [----] engagements "My favorite launch on @ProductHunt today: Founder Salary Report. I *just* had this conversation today with a friend: "I raised a seed round and one of the things I'm trying to think through is how much to pay myself. I'm not paying myself anything right now, but should I be? And if so, how much?" This was maybe the 5th or 6th topic we talked about, sharing thoughts and notes about our experiences as startup founders. It's never *the* most important thing. But it's something every founder does have to think about at some point. And advice from early investors is likely to be all over the map. So" [X Link](https://x.com/kwindla/status/1706782640609903058) 2023-09-26T21:27Z [----] followers, [----] engagements ". and it's out in the world. An SDK for AI + real-time audio, video, and data. This library makes it so easy to connect an LLM into a video call that I've been tinkering with WebRTC + GPT-4 in a colab notebook. 🤯" [X Link](https://x.com/kwindla/status/1707209390057771374) 2023-09-28T01:43Z [----] followers, [----] engagements "Here's a closer look at the tech stack and APIs underneath the AI-powered Clinical Notes tools for #telehealth that we released last week. "Today's most advanced 'frontier' Large Language Models have an impressive range of use cases. But perhaps the most impressive thing about them to a computer programmer is that they are good at turning *unstructured* input data into *structured* output. This is a genuinely new capability and is perhaps the biggest reason so many engineers are so excited about these new tools."
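The unstructured-to-structured idea quoted above can be illustrated with a toy extractor; the post describes an LLM doing this, so the regex here is only a stand-in, and the field names and sample note are invented:

```python
import json
import re

NOTE = "Patient reports mild headache for 3 days; taking ibuprofen 200 mg."

def extract_structured(note: str) -> str:
    # Toy stand-in for the LLM step: pull duration and dosage out of
    # free-form clinical text into a structured JSON record.
    duration = re.search(r"for (\d+) days", note)
    dose = re.search(r"(\d+) mg", note)
    return json.dumps({
        "duration_days": int(duration.group(1)) if duration else None,
        "dose_mg": int(dose.group(1)) if dose else None,
    })

record = json.loads(extract_structured(NOTE))
```

The regex version only works for patterns you anticipate; the point of the quoted passage is that an LLM handles the long tail of phrasings the same pipeline shape can't.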
Thank you to our partners @DeepgramAI and @ScienceDotIO who have created amazing" [X Link](https://x.com/kwindla/status/1707437941658906962) 2023-09-28T16:51Z [----] followers, [----] engagements "AI thought of the day: I've been accumulating a list of the various ways "AI" feels like a platform shift. At some level this is just a grab bag of analogies to previous technology trends and patterns. Or, as the kids say, a vibe. On the other hand, there is fairly broad agreement (in retrospect) about what the *big* tech platform shifts have been in my lifetime: the PC, the Internet, the mobile phone. Anyway, I thought of a new one while listening to @eriktorenberg and @labenz on The Cognitive Revolution podcast. MICROPAYMENTS. Which, you know, all caps because micropayments are one of those" [X Link](https://x.com/kwindla/status/1707809451816935845) 2023-09-29T17:27Z [----] followers, [---] engagements "Also regulatory compliance, legal, pricing, customer support" [X Link](https://x.com/kwindla/status/1710478290765410532) 2023-10-07T02:12Z [----] followers, [---] engagements "Thank you for having me on the AI Chat podcast @jaeden_ai. Really fun conversation" [X Link](https://x.com/kwindla/status/1713227965188698323) 2023-10-14T16:19Z [----] followers, [---] engagements "AI thought of the day: nobody knows what they're doing. Half the conversations I have with engineers building the most interesting, capable, state of the art tools right now, and half the conversations I listen to on podcasts, swing around at some point to the "nobody in AI knows what they're doing right now including me" topic. We don't really know much about how to train large language models effectively. Relatedly, we don't know much about how to optimize the data sets we use for training. Evaluation (figuring out what a model is good at) is mostly ad hoc. Prompting is a dark art.
Retrieval" [X Link](https://x.com/kwindla/status/1715912842690584591) 2023-10-22T02:07Z [----] followers, [---] engagements "The DALL-E [--] prompt for the image was "An image of a computer programmer writing code that implements a very large-scale distributed system, smiling to herself and imagining the vast world of computation that she is creating." (Then three rounds of variations)" [X Link](https://x.com/kwindla/status/1715913059364213074) 2023-10-22T02:08Z [----] followers, [--] engagements "Speaking of moving the conversation forward: when @terronk tested an early version of this demo he started out with "tell me a story using Disney IP". Which turned out to be 1) a totally 🔥 voice prompt and 2) an interesting experiment to run if you're building LLM apps" [X Link](https://x.com/kwindla/status/1717562495983133083) 2023-10-26T15:23Z [----] followers, [---] engagements "There were way, way more Nintendo costumes than Star Wars costumes at trick-or-treating by the beach on Sunday. I'm old, so I still think of movies as the big thing. But that has not been true for a while. Video games are the big thing" [X Link](https://x.com/kwindla/status/1719211039982850512) 2023-10-31T04:33Z [----] followers, [---] engagements "My favorite launch on @ProductHunt today: LangChain Templates. LangChain is a framework for building AI-powered applications. It's widely used, evolving quickly, and strikes a nice balance between stability and experimentation. Depending on what you're doing, LangChain might be a good choice for prototyping, production, or both. In addition, reading how various features and integrations are implemented in LangChain is a great way to learn. (I'm a big fan of learning by reading source code.) The new templates feature gives you even easier ways to set up and deploy apps (and even more source code to" [X Link](https://x.com/kwindla/status/1719414459834450317) 2023-10-31T18:02Z [----] followers, [----] engagements "Agreed.
Observationally: it is extremely hard for companies operating at massive scale to ship net new product lines. Which has implications for acquisition strategies, anti-trust regulation, and startup planning, among other things. Amazon and Apple both seem to have developed organizational hacks to partially counteract this. Interestingly, from the outside at least, these two companies' mechanisms seem radically different. So there's not one answer here. Relatedly, it's extremely hard to scale up without losing a lot of product velocity. Kudos to Open AI for continuing to ship great stuff at speed" [X Link](https://x.com/kwindla/status/1721908807469760598) 2023-11-07T15:13Z [----] followers, [--] engagements "There are a few commercial options. I think @topazlabs is the most accessible tool that I've seen. I'm a little surprised that there aren't good open source models for this yet. There are lots of research papers. But searches on @huggingface and @sievedata didn't turn up anything ready to use" [X Link](https://x.com/kwindla/status/1722010423967518916) 2023-11-07T21:57Z [----] followers, [--] engagements "Speech-to-speech translation is a radically under-appreciated capability of SOTA AI. You can now have a conversation with most people in the world in their own language. (And they can talk to you in yours.) Translation accuracy is more than good enough to have a natural conversation. So is translation latency. And both are going to continue to improve. Here's a real-time, real-world AI text-to-speech comparison between #Azure Speech and @play_ht. (Transcription by @DeepgramAI and translation by @OpenAI GPT-4.) https://t.co/1fmhy90rZC" [X Link](https://x.com/kwindla/status/1725606168117346363) 2023-11-17T20:05Z [----] followers, [---] engagements "ChatGPT has replaced approximately half of my general googling, both work and non-work. ChatGPT Vision is better than Google Translate's "camera" function.
Which turns out to be really nice for helping with 3rd grade Spanish homework. When I have a piece of writing to do and am a little bit stuck in semi-procrastination mode, occasionally I just ask ChatGPT to draft me something. Because I'm very particular about writing I rarely end up using anything that ChatGPT produced (even when it's quite good for the task at hand). But the quick back-and-forth reliably gets me out of semi-procrastination" [X Link](https://x.com/kwindla/status/1730276467488256261) 2023-11-30T17:23Z [----] followers, [---] engagements "Twilio announced today that they are discontinuing their Programmable Video service. #WebRTC is a small world. If you are impacted in any way by this change my DMs are open and I'm here to be helpful however I can" [X Link](https://x.com/kwindla/status/1731831467729436909) 2023-12-05T00:22Z [----] followers, 18.6K engagements "@srs_server I humbly suggest looking at @trydaily. All the features you've relied on from a great provider like Twilio: scalable global infrastructure, HIPAA and SOC [--], plus SDKs that make it easy to build AI-powered features like meeting copilots and RAG video search" [X Link](https://x.com/kwindla/status/1731911635512779124) 2023-12-05T05:41Z [----] followers, [--] engagements "The follow-up today is that Twilio is partnering with Zoom to transition their WebRTC customers to Zoom. For engineering decision-makers trying to understand how this news impacts their products and teams, it's important to highlight that moving to Zoom is a core technology transition as well as a port to a new SDK. Zoom's tech stack is very good. It's also not WebRTC. There are some positives and some negatives to this.
Zoom doesn't have to be standards-compliant, so it can optimize some elements of their implementation without worrying about compatibility with the WebRTC implementations in" [X Link](https://x.com/kwindla/status/1732114579830997224) 2023-12-05T19:07Z [----] followers, [---] engagements "As exciting as anticipated advances in materials science and biology are, and as transformative as cognition too cheap to meter will be, let's be honest: it's really all about talking to whales. Sperm whales have equivalents to human vowels. We uncovered spectral properties in whales' clicks that are recurrent across whales, independent of traditional types, and compositional. We got clues to look into spectral properties from our AI interpretability technique CDEV. https://t.co/8sEAzPkMfo" [X Link](https://x.com/kwindla/status/1732243200247779740) 2023-12-06T03:38Z [----] followers, [---] engagements "@embirico @nickarner @with_multi @webrtc @Zoom @twilio @trydaily @embirico We have native macOS builds internally but haven't had enough customer pull to treat native macOS as a GA target. I'm assuming Electron isn't sufficient for what you're doing" [X Link](https://x.com/kwindla/status/1735752851715133814) 2023-12-15T20:05Z [----] followers, [--] engagements "Motivated by Twilio's announcement that Twilio Video is going away, I've been spending some time digging into what the latest version of Zoom's Web SDK can (and can't) do. It's definitely getting better, but it's still much less performant than native WebRTC. And it's still missing a lot of things that web video apps need" [X Link](https://x.com/kwindla/status/1736902762678640699) 2023-12-19T00:14Z [----] followers, [----] engagements "Agreed that looking at data manually is super important.
Like many things in test/evaluation/QA you can write lots and lots of test code (and you should) but unless you are also always doing manual validation you'll definitely miss things as you scale up. Relatedly here's GPT-4V going waaay off the rails in response to a prompt that I thought was relatively simple" [X Link](https://x.com/kwindla/status/1738614713070846359) 2023-12-23T17:37Z [----] followers, [---] engagements "This is thinking too small. The question is not at what market cap can you wear the same regular clothes every day. The question is at what market cap you can wear different *pajamas* (and only pajamas) every day. Asking for a friend but at what market cap can you pull a Jensen Huang and wear the same clothes every day Massive alpha in not having to care" [X Link](https://x.com/kwindla/status/1745260971227299850) 2024-01-11T01:46Z [----] followers, [---] engagements "@jeiting I don't know man. Seems like nobody has ever tried to compete with Salesforce for example. Seems easy. My startup barely needs even half the features" [X Link](https://x.com/kwindla/status/1745544726940651614) 2024-01-11T20:34Z [----] followers, [--] engagements "I agree with all of this. Today's glass rectangle will not be the primary device we all carry around ten years from now. Been thinking (a lot) lately about something Alan Kay used to say (a lot): People who are really serious about software should make their own hardware Friday afternoon hot take: I think we're at the start of a new wave of computing hardware (devices). Humane Pin Rabbit R1 Rewind pendant Meta Raybans Meta Quest Vision Pro more to come. Let's go back [--] years. It's [----].
Someone cool has the coolest phone: a Motorola" [X Link](https://x.com/kwindla/status/1748488760306889204) 2024-01-19T23:33Z [----] followers, [---] engagements "The Waymo::Uber experience difference today feels similar to the Uber::taxi experience difference in [----]. Quality of Uber drivers is tanking I think they just hire anyone now no quality control at all" [X Link](https://x.com/kwindla/status/1749879693271310393) 2024-01-23T19:40Z [----] followers, [----] engagements "I sat down to write a talk about how to migrate code between SaaS platforms (in this case focusing on migrating from Twilio Video which is now EOL). The first draft was [----] words. Don't worry I edited it down in the second draft. (Somewhat.) Here's the script and video if you're interested" [X Link](https://x.com/kwindla/status/1750748341095559212) 2024-01-26T05:11Z [----] followers, [---] engagements "The short answer is that though they had some excellent engineers working on video Twilio did not invest enough in that product. Real-time video is complex both at the infrastructure and SDK level and the surface area of features that customers want/need is pretty large. We are very lucky that venture investors believed in us and our market enough to fund us to build what the market needed" [X Link](https://x.com/kwindla/status/1750753268102561985) 2024-01-26T05:31Z [----] followers, [--] engagements "So this Lumiere thing (which looks awesome) is another AI "release" from Google that I can't try out even as a demo huh" [X Link](https://x.com/kwindla/status/1750914273478586834) 2024-01-26T16:11Z [----] followers, 80.3K engagements "Voice + AI meetup on Wednesday at the Cloudflare office in San Francisco. Pizza conversation and demos. Thanks to this meeting's panelists: @natrugrats @rajivayyangar and @Prafulfillment.
And thanks to @CloudflareDev for letting us come hang out with the lava lamps. RSVP here:" [X Link](https://x.com/kwindla/status/1751790871497068590) 2024-01-29T02:14Z [----] followers, [----] engagements "Really fun event last night. [--] people came out in the rain to talk about voice/real-time/AI topics. We threw up a little weather-appropriate AI-generated textures toy demo in the background on the big screen. Thanks to @fal_ai_data for the high-fps Stable Diffusion Turbo API endpoint. 🌧🔥" [X Link](https://x.com/kwindla/status/1753172008207372727) 2024-02-01T21:42Z [----] followers, [----] engagements "dark mirror (real-time image-to-image thanks to @fal_ai_data)" [X Link](https://x.com/kwindla/status/1753606936300597302) 2024-02-03T02:30Z [----] followers, [----] engagements "My dreams last night were Vision Pro punk. Crystalline rectangles fleched translucent and razor-lit stratified above muddy reality" [X Link](https://x.com/kwindla/status/1755641739254395157) 2024-02-08T17:16Z [----] followers, [---] engagements
"Kushagra Jain put together a comprehensive Pipecat introduction and architecture/concepts walk-through as a [--] minute video plus slides. Lots of good general Voice AI information here (latency interruptions network transport turn taking inference state etc) which Kushagra connects to the specific Pipecat building blocks that you use in a production voice agent for each of those things. https://twitter.com/i/web/status/2013313713533788176 https://twitter.com/i/web/status/2013313713533788176"
X Link 2026-01-19T18:12Z 12.2K followers, [----] engagements
"Small concrete example of a Claude skill that handles a repetitive previously human-dependent software engineering task. @aconchillo wrote a skill to generate changelogs for releases using the pipecat repo conventions"
X Link 2026-01-22T03:54Z 12.2K followers, [---] engagements
"I will be hacking on self-improving agents at WeaveHacks [--]. And Im bringing a bunch of Pipecat swag. ⚡ WeaveHacks [--] is happening at wandb HQ in SF Jan 31-Feb [--] @altryne will be hosting with a stacked judge panel. Self-improving agents. A literal robot dog grand prize + over $15K in other prizes. Sponsored by @Redisinc @browserbase @vercel @trydaily @googlecloud. https://t.co/xHGdlhK00i ⚡ WeaveHacks [--] is happening at wandb HQ in SF Jan 31-Feb [--] @altryne will be hosting with a stacked judge panel. Self-improving agents. A literal robot dog grand prize + over $15K in other prizes. Sponsored by"
X Link 2026-01-23T20:26Z 12.2K followers, [----] engagements
"@nvbalaji I don't think input and output were easy to define or model in the pre-AI software era. I say that after many years of arguments about things like Model View Controller abstractions and what the ideal design for a React state management library is. :-)"
X Link 2026-01-25T18:59Z 12K followers, [--] engagements
"Observability for voice agents is evolving quickly. And just in time. Tracing evals against real data prompt management latency and reliability metrics simulation-based testing . these are all really really valuable to teams shipping voice AI products. It's been great working with the @ArizePhoenix team on their Pipecat integration. 🚀 New OpenInference integration with @pipecat_ai Weve added OpenInference instrumentation for Pipecat using OpenTelemetry. Pipecat services can now emit standardized traces and spans with semantic context across realtime agent pipelines. The integration"
X Link 2026-01-26T22:50Z 12K followers, [----] engagements
"Voice-only programming with Claude Code . I've been playing with @aconchillo's MCP server that lets you talk to Claude Code from anywhere today. I always have multiple Claudes running and I often want to check in on them when I'm not in front of a computer. Here's a video of Claude doing some front-end web testing hitting an issue and getting input from me and then reporting that the test passed. In the video the Pipecat bot is using Deepgram for transcription and Cartesia for the voice. (Note: I sped up the web testing clickety-click sections of the video.) The code for the MCP server and"
X Link 2026-01-27T01:14Z 12.2K followers, [----] engagements
"Pipecat MCP Server: This is infinitely customizable. Getting started with Pipecat: https://docs.pipecat.ai/getting-started/quickstart https://github.com/pipecat-ai/pipecat-mcp-server https://docs.pipecat.ai/getting-started/quickstart https://github.com/pipecat-ai/pipecat-mcp-server"
X Link 2026-01-27T01:14Z 12.2K followers, [---] engagements
"Async automatic non-blocking context compaction for long-running agents. Last week I gave a talk called Space Machine Sandboxes at the @daytonaio AI builders meetup about patterns for long-running agents. I work a lot on voice AI agents which are fundamentally multi-turn long-context loops. I also build lots of other AI agent stuff often as part of bigger systems that include voice. One of the patterns I showed in the talk is non-blocking compaction. Here's a short clip. https://twitter.com/i/web/status/2016288112629187054 https://twitter.com/i/web/status/2016288112629187054"
X Link 2026-01-27T23:12Z 12.2K followers, 26.5K engagements
"The full video is here: Here's a pointer to source code. This is not a clean example It's part of the code of the game I talked about in the talk. But let me know if you have questions: And Claude's summary: This is a 12-minute talk about building AI agents using a space trading game called Gradient Bang as a sandbox for experimenting with agent patterns. The core thesis is that voice agents coding agents and LLM-powered games are fundamentally the same thing an LLM in a loop with tool calls and context engineering. --- Topics 00:00 Introduction Kwindla from Daily working on Pipecat"
X Link 2026-01-27T23:12Z 12.2K followers, [----] engagements
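A rough sketch of the non-blocking compaction pattern described in the posts above (illustrative Python only; `summarize` is a hypothetical stand-in for an LLM summarization call, and this is not the actual Pipecat or Gradient Bang code): the conversation keeps appending messages while an older prefix is summarized in a background task, and the compacted context is swapped in atomically once the summary arrives.

```python
import asyncio

async def summarize(messages):
    """Hypothetical stand-in for an LLM summarization call."""
    await asyncio.sleep(0)  # simulates waiting on the inference service
    return {"role": "system", "content": f"[summary of {len(messages)} messages]"}

class Context:
    def __init__(self, keep_recent=4):
        self.messages = []
        self.keep_recent = keep_recent
        self._compacting = None

    def add(self, msg):
        self.messages.append(msg)

    def maybe_compact(self, max_len=8):
        # Kick off compaction in the background; the conversation keeps going.
        if self._compacting is None and len(self.messages) > max_len:
            cut = len(self.messages) - self.keep_recent
            prefix = self.messages[:cut]
            self._compacting = asyncio.create_task(self._compact(prefix, cut))

    async def _compact(self, prefix, cut):
        summary = await summarize(prefix)
        # Atomic swap: replace the summarized prefix, but keep any messages
        # that arrived while the summary was being generated.
        self.messages = [summary] + self.messages[cut:]
        self._compacting = None
```

The key property is that `add()` never blocks on the summarization call, which is what "non-blocking" buys you in a latency-sensitive voice loop.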
"@ishaansehgal @aconchillo Are you using @pipecat_ai"
X Link 2026-01-27T23:29Z 12.2K followers, [--] engagements
"@ApplyWiseAi @daytonaio Rules of thumb: - keep system instruction under 5k tokens (but you can swap it out for different "states" in a complex voice workflow) - compact the conversation after [--] turns if you can"
X Link 2026-01-28T23:19Z 12K followers, [--] engagements
"Benchmarking LLMs for voice agent use cases. New open source repo along with a deep dive into how we think about measuring LLM performance. The headline results: - The newest SOTA models are all really good but too slow for production voice agents. GPT-4.1 and Gemini [---] Flash are still the most widely used models in production. The benchmark shows why. - Ultravox [---] shows that it's possible to close the "intelligence gap" between speech-to-speech models and text-mode LLMs. This is a big deal - Open weights models are climbing up the capability curve. Nemotron [--] Nano is almost as capable"
X Link 2026-02-02T21:42Z 12.2K followers, [----] engagements
"@chadbailey59 This is good framing and probably clearer and more succinct than I managed in my write-up about the benchmark. You can build ultra low latency super reliable voice agents but you have to do context engineering"
X Link 2026-02-03T16:52Z 12K followers, [---] engagements
"I sat down with @zachk and @bnicholehopkins to talk about how we benchmark models for voice AI. Benchmarks are hard to do well and good ones are really useful We covered what makes an LLM actually "intelligent" in a real-world voice conversation the latency vs intelligence trade-off how speech-to-speech models compare to text-mode LLMs infrastructure and full stack challenges and what we're all most focused on in [----]. https://twitter.com/i/web/status/2019120855570366548 https://twitter.com/i/web/status/2019120855570366548"
X Link 2026-02-04T18:48Z 12.2K followers, [----] engagements
"Open source voice agent LLM benchmark: Technical deep dive into voice agent benchmarking: https://www.daily.co/blog/benchmarking-llms-for-voice-agent-use-cases/ https://github.com/kwindla/aiewf-eval https://www.daily.co/blog/benchmarking-llms-for-voice-agent-use-cases/ https://github.com/kwindla/aiewf-eval"
X Link 2026-02-04T18:48Z 12.2K followers, [----] engagements
"@andxdy @terronk @rootvc Nice Related: (We are a Root Ventures company.) https://github.com/pipecat-ai/pipecat-mcp-server https://github.com/pipecat-ai/pipecat-mcp-server"
X Link 2026-02-05T21:47Z 12.1K followers, [--] engagements
"The NVIDIA DGX Spark is a desktop GPU workstation with 128GB of unified memory. Working with the team at @NVIDIAAIDev we've been using these little powerhouse machines for voice agent development testing new models and inference stacks and training LLMs and audio models. Today we published a guide to training the Smart Turn model on the DGX Spark. Smart Turn is a fully open source (and open training data) native audio turn detection model that supports [--] languages. The guide walks you through installing the right dependencies for this new Arm + Blackwell architecture and includes benchmarks"
X Link 2026-02-10T22:39Z 12.2K followers, [----] engagements
"Blog post link: The Smart Turn open source turn detection model: https://github.com/pipecat-ai/smart-turn https://www.daily.co/blog/training-smart-turn-on-the-nvidia-dgx-spark/ https://github.com/pipecat-ai/smart-turn https://www.daily.co/blog/training-smart-turn-on-the-nvidia-dgx-spark/"
X Link 2026-02-10T22:39Z 12.2K followers, [---] engagements
"New repo: Pipecat Skills for Claude Code So far: - Create and configure a basic voice agent (running locally using any combination of models and services) - Deploy to Pipecat Cloud for production - Start the Pipecat MCP Server to talk to Claude Code via voice (including remotely from your phone) I'm working on an end-to-end testing skill. https://twitter.com/i/web/status/2022011497996816826 https://twitter.com/i/web/status/2022011497996816826"
X Link 2026-02-12T18:14Z 12.2K followers, [---] engagements
"If you have skills that are useful for voice agent development contribute to the repo https://github.com/pipecat-ai/skills https://github.com/pipecat-ai/skills"
X Link 2026-02-12T18:14Z 12.2K followers, [---] engagements
"Why do you not call the UI a sub agent if you are not speaking to it directly In this pattern I am speaking to the UI agent directly. It sees the speech input. But it doesn't respond conversationally. It performs specialized tasks related to the UI. I don't think of it as a sub-agent because it isn't controlled by the voice model. I think of it as a parallel agent or a "parallel inference loop." The reason not to have the voice agent control the UI sub-agent is that I think it's hard to implement that without adding latency. I do use sub-agent patterns for other things where the control is"
X Link 2026-02-13T21:15Z 12.2K followers, [--] engagements
"These are voice agents. Pipecat supports Gemini Live (and Ultravox and OpenAI Realtime). But almost all production voice agents today use multiple models (STT - LLM - TTS) instead of a single speech-to-speech model. You get better latency intelligence and observability from a multi-model approach. I fully expect speech-to-speech models to have more market share over time. But right now SOTA is the multi-model pipeline. https://twitter.com/i/web/status/2022449946881069165 https://twitter.com/i/web/status/2022449946881069165"
X Link 2026-02-13T23:16Z 12.2K followers, [--] engagements
"@MAnfilofyev Super-impressive work from the @ultravox_dot_ai team on v0.7"
X Link 2026-02-16T04:13Z 12.2K followers, [---] engagements
"These days Sergio Sillero Head of the Cloud Data & AI at MAPFRE is programming via voice while he shops for groceries. If you're deep in the Claude Code / Ralph Wiggum / tmux world this is not super surprising to you. If you're not it sounds like crazy ridiculous hype. Sergio wrote some voice interface code for his Meta Ray-Bans using the @pipecat_ai MCP server that lets him keep working on a project in @AnthropicAI's Claude Code when he steps away from his desk. https://twitter.com/i/web/status/2023264920968757521 https://twitter.com/i/web/status/2023264920968757521"
X Link 2026-02-16T05:15Z 12.2K followers, [---] engagements
".@tavus just published a nice blog post about their "real-time conversation flow and floor transfer" model Sparrow-1. This model does turn detection predicting when it's the Tavus video agent's turn to speak. It does this by analyzing conversation audio in a continuous stream and learning and adapting to user behavior. This model is an impressive achievement. I've had a few opportunities to talk to @code_brian who led the R&D on this model at Tavus about his work. I love Brian's approach to this problem. Among other things the Sparrow-1 architecture allows this model to do things like handle"
X Link 2026-01-14T03:55Z 12.2K followers, [----] engagements
"https://github.com/pipecat-ai/pipecat/pull/3521 https://github.com/pipecat-ai/pipecat/pull/3521"
X Link 2026-01-22T03:54Z 12.2K followers, [---] engagements
"@MoodiSadi Doing compacting in a parallel pipecat pipeline works really well in all the places I've used that approach: https://github.com/pipecat-ai/gradient-bang/blob/0234d85dda47fd0b4b72e0b140a72a5c8a54bb4d/src/gradientbang/pipecat_server/bot.py#L224 https://github.com/pipecat-ai/gradient-bang/blob/0234d85dda47fd0b4b72e0b140a72a5c8a54bb4d/src/gradientbang/pipecat_server/bot.py#L224"
X Link 2026-02-02T21:52Z 12.2K followers, [---] engagements
"New text-to-speech model from @rimelabs today: Arcana v3. Rime's models excel at customization and personality. The new model is fast available in [--] languages and you can use it as a cloud API or run it on-prem. The model also outputs word-level timestamps which is very important for maintaining accurate LLM context during a voice agent conversation. Listen to Arcana v3 in this video. @chadbailey59 uses the open source Pipecat CLI to set up a voice agent from scratch customize the prompt and talk to it"
X Link 2026-02-05T00:01Z 12.2K followers, [----] engagements
"Arcana v3 launch post: Pipecat CLI: https://docs.pipecat.ai/cli/overview https://rime.ai/resources/arcana-v3 https://docs.pipecat.ai/cli/overview https://rime.ai/resources/arcana-v3"
X Link 2026-02-05T00:01Z 12.2K followers, [---] engagements
"Voice-controlled UI. This is an agent design pattern I'm calling EPIC "explicit prompting for implicit coordination." Feel free to suggest a better name. :-) In the video I'm navigating around a map conversationally pulling in information dynamically from tool calls and realtime streamed events. There are two separate agents (inference loops) here: a voice agent and a UI control agent. They know about each other (at the prompt level) but they work independently. https://twitter.com/i/web/status/2022087764720988296 https://twitter.com/i/web/status/2022087764720988296"
X Link 2026-02-12T23:17Z 12.2K followers, 14.1K engagements
"The Claude Code / Ralph Wiggum moment is exciting for a lot of reasons. One of them is that all of us building AI systems that are just a little bit beyond the capabilities of just prompting a SOTA model now have a shared set of baseline ideas we're building on. Plus an overlapping set of open questions - An agent is an LLM in a loop. (Plus a bunch of tooling integration and domain-specific optimization.) - Context management is a critical job. (Lots of ways to think about this.) - You almost certainly need multiple agents/models/processors/loops/whatever. (Lots of ways to think about this"
X Link 2026-01-24T22:12Z 12.2K followers, [----] engagements
"The critical things here are: - We can't block the voice agent's fast responses. - The voice agent already has a lot of instructions in its context and a large number of tools to call so we don't want to give it more to do each inference turn. So we prompt the voice agent to know at a high level what the UI agent will do but to ignore or respond minimally to UI-related requests. This adds relatively little complexity to the voice agent system instruction. We prompt the UI agent with a small subset of world knowledge a few tools and a lot of examples about how to perform useful UI actions in"
X Link 2026-02-12T23:17Z 12.2K followers, [----] engagements
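The two-loop "parallel agent" idea from this thread can be sketched as follows (a minimal illustration with hypothetical stand-ins for the inference calls, not the actual Pipecat implementation): both loops receive every transcript, and because they run concurrently the UI loop's work never sits in the voice loop's critical path.

```python
import asyncio

async def voice_inference(text):
    """Hypothetical stand-in for the conversational LLM call."""
    return f"spoken reply to: {text}"

async def ui_inference(text):
    """Hypothetical stand-in for the UI-control LLM call."""
    return {"action": "pan_map"} if "map" in text else None

async def fan_out(transcripts, voice_out, ui_actions):
    # Fan each final transcript out to two independent inference loops.
    async def voice_loop(q):
        while True:
            text = await q.get()
            voice_out.append(await voice_inference(text))
            q.task_done()

    async def ui_loop(q):
        while True:
            text = await q.get()
            action = await ui_inference(text)
            if action:  # the UI agent acts, but never responds conversationally
                ui_actions.append(action)
            q.task_done()

    vq, uq = asyncio.Queue(), asyncio.Queue()
    tasks = [asyncio.create_task(voice_loop(vq)), asyncio.create_task(ui_loop(uq))]
    for t in transcripts:
        vq.put_nowait(t)
        uq.put_nowait(t)
    await vq.join()
    await uq.join()
    for task in tasks:
        task.cancel()
```

The design point from the posts is visible in the structure: neither loop awaits the other, so a slow UI inference cannot delay the spoken reply.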
"Detailed technical post about this voice agents STT benchmark: Benchmark source code: Benchmark data set on @huggingface . [----] human speech samples captured from real voice agent interactions with verified ground truth transcriptions: https://huggingface.co/datasets/pipecat-ai/stt-benchmark-data https://github.com/pipecat-ai/stt-benchmark https://www.daily.co/blog/benchmarking-stt-for-voice-agents/ https://huggingface.co/datasets/pipecat-ai/stt-benchmark-data https://github.com/pipecat-ai/stt-benchmark https://www.daily.co/blog/benchmarking-stt-for-voice-agents/"
X Link 2026-02-13T21:44Z 12.2K followers, [----] engagements
"@huggingface We also published a benchmark of LLM performance in real-world voice agent use cases recently (long multi-turn conversations with multiple tool calls and accurate instruction following required). https://x.com/kwindla/status/2019120857923375586s=20 Open source voice agent LLM benchmark: https://t.co/CbLijComqQ Technical deep dive into voice agent benchmarking: https://t.co/gHTNRSjwdI https://x.com/kwindla/status/2019120857923375586s=20 Open source voice agent LLM benchmark: https://t.co/CbLijComqQ Technical deep dive into voice agent benchmarking: https://t.co/gHTNRSjwdI"
X Link 2026-02-13T21:51Z 12.2K followers, [----] engagements
"Final transcript what about time until transcription starts streaming In general what we care about is the time from end of speech until the final transcript segment is available. We need the full transcript in order to run LLM inference. I've experimented a fair amount with greedy LLM inference on partial transcript segments and there are not enough gains to make up for the extra work. So "time to first token" from a transcription model isn't a useful metric. This is different from how we measure latency for LLMs and TTS models where we definitely focus on TTFT/TTFB"
X Link 2026-02-13T23:08Z 12.2K followers, [---] engagements
"Our goal was to set up the test the same way real-world input pipelines most often work. [--]. Audio chunks are sent to the STT service at real-time pacing. [--]. Silero VAD is configured to trigger after 200ms of non-speech frames. [--]. When the VAD triggers the STT service is sent a finalize signal. (Not all services support explicit finalization. But we think it's an important feature for real-time STT.) [--]. TTFS is the time between the first non-speech audio frame and the last transcription segment. If you use a service that sends you VAD or end-of-turn events it will function much the same way as"
X Link 2026-02-14T07:00Z 12.2K followers, [--] engagements
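The measurement recipe in this post can be sketched as a toy harness (illustrative only, not the published benchmark code; the `stt` object's `send`/`finalize`/`final_transcript` interface is a hypothetical stand-in): replay frames at real-time pacing, trigger on 200 ms of non-speech, send the finalize signal, and time from the first non-speech frame to the final transcript.

```python
import time

def measure_ttfs(frames, stt, vad_silence_ms=200, frame_ms=20):
    """Replay audio frames at real-time pacing; return TTFS in milliseconds.

    `frames` is a list of (audio_bytes, is_speech) tuples. Assumes the input
    actually ends in enough silence to trip the VAD threshold.
    """
    silence_start = None
    silent_ms = 0
    for audio, is_speech in frames:
        time.sleep(frame_ms / 1000)            # real-time pacing
        stt.send(audio)
        if is_speech:
            silence_start, silent_ms = None, 0  # speech resumed, reset
        else:
            if silence_start is None:
                silence_start = time.monotonic()  # first non-speech frame
            silent_ms += frame_ms
            if silent_ms >= vad_silence_ms:       # VAD end-of-speech trigger
                stt.finalize()                    # ask the service to flush
                break
    stt.final_transcript()                        # block until last segment
    return (time.monotonic() - silence_start) * 1000
```

This mirrors the point made a few posts up: the clock runs until the *final* transcript segment, because that is what gates LLM inference.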
"NVIDIA just released a new open source transcription model Nemotron Speech ASR designed from the ground up for low-latency use cases like voice agents. Here's a voice agent built with this new model. 24ms transcription finalization and total voice-to-voice inference time under 500ms. This agent actually uses three NVIDIA open source models: - Nemotron Speech ASR - Nemotron [--] Nano 30GB in a 4-bit quant (released in December) - A preview checkpoint of the upcoming Magpie text-to-speech model These models are all truly open source: weights training data training code and inference code. This"
X Link 2026-01-06T18:09Z 12.2K followers, 279.5K engagements
"@picanteverde @simonw I love the Sesame work but there's no API and the consumer app is still Test Flight only as far as I know. Th version that was released as open source is not a fully capable model"
X Link 2026-02-16T04:36Z 12.2K followers, [---] engagements
"I don't think Sergio is here so you have to go follow him on the other thing: He's planning to demo his Ray-Bans + Claude Code integration at the February 25th Voice AI Meetup in Barcelona: https://www.voiceaispace.com/events/voice-ai-space-barcelona-meetup https://www.linkedin.com/feed/update/urn:li:activity:7428880495075078144/ https://www.voiceaispace.com/events/voice-ai-space-barcelona-meetup https://www.linkedin.com/feed/update/urn:li:activity:7428880495075078144/"
X Link 2026-02-16T05:15Z 12.2K followers, [---] engagements
"@_dr5w @simonw True. But for most of these there are only [--] providers. OpenAI/Azure or DeepMind/Vertex. 😀"
X Link 2026-02-16T05:36Z 12.2K followers, [--] engagements
"@kstonekuan @aconchillo You've built some really cool stuff I would love to see a configuration for the pipecat-mc-server where Tambourine lets you verbally edit what you want to say to Claude and then "submit." If that makes sense"
X Link 2026-01-28T01:21Z 12.2K followers, [--] engagements
"@riteshchopra Yes definitely Progressive "skills" loading inside a Pipecat pipeline is something we're doing fairly often these days. For a version of this in a fun voice agent context see the LoadGameInfo tool here: https://github.com/pipecat-ai/gradient-bang/blob/0a5c19f1459b2ddbbb4bc6182944239c63e4c702/src/gradientbang/pipecat_server/voice_task_manager.py#L26 https://github.com/pipecat-ai/gradient-bang/blob/0a5c19f1459b2ddbbb4bc6182944239c63e4c702/src/gradientbang/pipecat_server/voice_task_manager.py#L26"
X Link 2026-02-12T19:17Z 12.2K followers, [--] engagements
"Wake up babe. New Pareto frontier chart just dropped. Benchmarking STT for voice agents: we just published one of the internal benchmarks we use to measure latency and real-world performance of transcription models. - Median P95 and P99 "time to final transcript" numbers for hosted STT APIs. - A standardized "Semantic Word Error Rate" metric that measures transcription accuracy in the context of a voice agent pipeline. - We worked with all the model providers to optimize the configurations and @pipecat_ai implementations so that the benchmark is as fair and representative as we can possibly"
X Link 2026-02-13T21:44Z 12.2K followers, 11.1K engagements
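For reference, a plain word error rate over normalized text looks like the following (illustrative only; the benchmark's actual "Semantic Word Error Rate" normalization rules live in the stt-benchmark repo, and this sketch only applies generic lowercasing and punctuation stripping):

```python
import re

def normalize(text):
    # Illustrative normalization only: lowercase and strip punctuation.
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return text.split()

def wer(reference, hypothesis):
    """Word error rate via Levenshtein distance over normalized word lists."""
    r, h = normalize(reference), normalize(hypothesis)
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / max(len(r), 1)
```

The "semantic" part of the benchmark's metric is exactly the normalization step: errors that a voice agent's LLM would never notice (casing, punctuation, formatting) should not count against a transcription model.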
"@iAmHenryMascot It's a new game we're building: https://github.com/pipecat-ai/gradient-bang https://github.com/pipecat-ai/gradient-bang"
X Link 2026-02-14T18:58Z 12.2K followers, [--] engagements
"Spending valentine's day exactly as you'd expect. (Arguing politely on LinkedIn about how to accurately measure latency and word error rates.) Wake up babe. New Pareto frontier chart just dropped. Benchmarking STT for voice agents: we just published one of the internal benchmarks we use to measure latency and real-world performance of transcription models. - Median P95 and P99 "time to final transcript" numbers https://t.co/y9qCrJLe0L Wake up babe. New Pareto frontier chart just dropped. Benchmarking STT for voice agents: we just published one of the internal benchmarks we use to measure"
X Link 2026-02-14T19:13Z 12.2K followers, [----] engagements
"I do think Gemini Live has a lot of potential. It's currently too slow (2.5s voice-to-voice P50) and the API is missing important features for real-world voice workflows. You can't do context engineering mid-conversation. If you really need a speech-to-speech model for production use you're better off right now with gpt-realtime. But I expect the Gemini Live team to make progress this year https://twitter.com/i/web/status/2023261706970124674 https://twitter.com/i/web/status/2023261706970124674"
X Link 2026-02-16T05:02Z 12.2K followers, [---] engagements
"My thinking about this has evolved a lot now that we have real-world data from millions of interactions with voice agents. I used to aim for 500-800ms voice-to-voice latency. It turns out that people are totally fine in real conversations until latency gets above 1500ms. So now I talk about 1500ms as the "hard" cutoff that you need your P95 to be under. Note this is voice-to-voice measured on the client side so that you include networks audio buffers OS and bluetooth playout delays etc. https://twitter.com/i/web/status/2023447213725413853 https://twitter.com/i/web/status/2023447213725413853"
X Link 2026-02-16T17:19Z 12.2K followers, [--] engagements
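Checking against that budget is straightforward once you log client-side timestamps. A minimal sketch (the latency samples below are made up for illustration; nearest-rank percentile is used for simplicity):

```python
def percentile(samples, p):
    """Nearest-rank percentile: value at rank ceil(p/100 * n) in sorted order."""
    s = sorted(samples)
    k = -(-p * len(s) // 100)  # ceiling division without floats
    return s[min(len(s), max(1, k)) - 1]

# Client-side voice-to-voice samples (ms): end of user speech -> first agent
# audio, so network, audio buffers, OS, and Bluetooth playout delays are all
# included. These numbers are invented for the example.
latencies_ms = [620, 710, 820, 940, 1080, 1150, 1240, 1310, 1420, 1480]
p95 = percentile(latencies_ms, 95)
within_budget = p95 <= 1500  # the "hard" P95 cutoff from the post
```

Measuring on the client side is the important choice here: a server-side number that omits playout delays will systematically flatter your agent.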
"@nelson With the caveat that what I think a novel means isn't necessarily at all what it means . I read this as part of the characterization of the narrator. He's self-absorbed and a bit myopic perhaps the way almost all young people are perhaps more so"
X Link 2020-05-18T17:07Z 10.9K followers, [--] engagements
"@nelson Do you have an Android tablet recommendation or two Id like a small tablet for testing ideally something with a great screen and a good processor. (And also a midrange mass market tablet for testing video calls. 😀)"
X Link 2020-05-21T16:55Z 10.9K followers, [--] engagements
"I'm happy and grateful to announce @trydaily has raised a $40M Series B led by @RenegadePtnrs. 🎉 @imthemusic is joining our board to work with us on the future of video audio and WebRTC. http://www.daily.co/blog/announcing-our-40m-series-b/ http://www.daily.co/blog/announcing-our-40m-series-b/"
X Link 2021-11-10T16:30Z 10.9K followers, [---] engagements
"@jennylefcourt @trydaily @ninacali4 @imthemusic It's been such a wonderful thing to get to work with you and learn from you @jennylefcourt Thank you"
X Link 2021-11-10T17:01Z 10.9K followers, [--] engagements
"Really exciting to see native WebRTC support in the OpenAI Realtime API We're launching updates to the OpenAI RealtimeAPI today: - WebRTC support - [--] new models - Big price cut - New API patterns Very excited about these changes we think they will unblock previously difficult applications https://t.co/nGu65da5U0 We're launching updates to the OpenAI RealtimeAPI today: - WebRTC support - [--] new models - Big price cut - New API patterns Very excited about these changes we think they will unblock previously difficult applications https://t.co/nGu65da5U0"
X Link 2024-12-17T18:46Z 11.7K followers, [----] engagements
"Launch day for @Google Imagen [--] We updated Story Bot one of the classic @pipecat_ai voice interaction starter kits to use Gemini [---] Flash and Imagen. The new experience with these two models is wow Watch @chadbailey59's interactive story about a dragon who feels a little bit out of place and her magical friend. The consistency of the images that Imagen creates through the whole story is a new capability. We havent seen that from any other model weve experimented with. Clone the repo to build your own voice+images experience"
X Link 2025-02-06T20:13Z [----] followers, 22.4K engagements
"Registration link: https://lu.ma/ffpyl57n https://lu.ma/ffpyl57n"
X Link 2025-02-17T01:57Z [----] followers, [---] engagements
"Thank you to @Vapi_AI for hosting this time in their office in San Francisco"
X Link 2025-02-17T04:39Z [----] followers, [---] engagements
"@gaunernst This is really interesting to read. Kudos Curious if you've thought about ttfb (latency) speed ups while doing this hacking. For conversational voice use cases latency matters a lot lower latency improves the user experience measurably"
X Link 2025-02-20T00:07Z [----] followers, [--] engagements
"Apologies if I missed it but I think the times in the graphic in your first tweet are for the complete generation. I was specifically thinking about using Kokoro in streaming mode. TTFB is time-to-first-byte in a streaming context. It can be a little hard to measure unless you specifically set up a time to measure the delay between starting inference and getting the first audio bytes in a stream. The idea is that for voice conversations as long as the model is just a little bit faster than realtime that's fast enough. But the time it takes to "start streaming" is something that humans are"
X Link 2025-02-20T00:35Z [----] followers, [--] engagements
"@gaunernst I should say that I haven't looked at the Kokoro streaming support yet other than that I saw the merge linked above The streaming support is new. The issue has js sample code. I'm mostly interested in using Kokoro from Python (in @pipecat_ai) pipelines"
X Link 2025-02-20T00:41Z [----] followers, [--] engagements
"@samptampubolon Yes. Speech input - VAD with a short timeout - smart-vad model - . rest of voice AI processing pipeline"
X Link 2025-03-07T01:40Z 11.9K followers, [----] engagements
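The cascade in the reply above (speech input, a VAD with a short timeout, then a smarter audio model, then the rest of the pipeline) can be sketched roughly as follows. `vad_is_silent` and `smart_turn_predict` are hypothetical stand-ins, not real Pipecat APIs:

```python
# Sketch of the turn-detection cascade described above: a fast VAD fires on a
# short trailing silence, then a smarter audio model makes the final
# end-of-turn call. Both callables are hypothetical stand-ins.

SILENCE_TIMEOUT_FRAMES = 10  # short VAD timeout before consulting the model

def end_of_turn(frames, vad_is_silent, smart_turn_predict) -> bool:
    """Return True when the user has likely finished speaking."""
    trailing_silence = 0
    for frame in frames:
        trailing_silence = trailing_silence + 1 if vad_is_silent(frame) else 0
    if trailing_silence < SILENCE_TIMEOUT_FRAMES:
        return False  # still talking, or pause too short to bother the model
    return smart_turn_predict(frames)  # binary decision from the audio model
```

The point of the two-stage design is cost: the cheap VAD gates how often the (relatively expensive) turn model runs.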
"Instant startup for voice AI agents . Typical startup time for a voice agent today is 2-4 seconds. Here's an example starter kit that shows how to reduce that to 200ms. This code uses client-side buffering and careful sequencing to start capturing audio for the voice AI to respond to as soon as the local microphone is live. In the demo video here are the relevant numbers: - run 1: network and bot ready 1905ms. audio live: 157ms - run 2: network and bot ready 1828ms. audio live: 155ms - run 2: network and bot ready 1848ms. audio live: 161ms You can see in the video that I start to talk before"
X Link 2025-03-12T18:29Z 10.9K followers, 18.1K engagements
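The client-side buffering idea above can be sketched as a tiny queue: capture microphone audio the moment the device is live, hold it while the network/bot connection is still being set up, then flush in order once the transport is ready. `send` here is a hypothetical transport callable, not an API from the starter kit:

```python
# Minimal sketch (assumed interfaces): queue mic audio until the transport is
# ready, then flush the backlog in capture order so nothing the user said
# during connection setup is lost.
from collections import deque

class BufferedMicSender:
    def __init__(self, send):
        self._send = send          # downstream transport (hypothetical)
        self._ready = False
        self._backlog = deque()

    def on_audio(self, chunk: bytes) -> None:
        if self._ready:
            self._send(chunk)
        else:
            self._backlog.append(chunk)  # mic is live long before the bot is

    def on_transport_ready(self) -> None:
        self._ready = True
        while self._backlog:             # flush in capture order
            self._send(self._backlog.popleft())
```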
"@kylo_the_cat You can shape that behavior pretty flexibly with a system_instruction"
X Link 2025-03-13T20:38Z 10.9K followers, [--] engagements
"You are talking about the Gemini Multimodal Live API service in @pipecat_ai There was a PR for Vertex. I thought it got merged but Ill check. Do you have a Vertex use case for the Live API at scale If so DM me. Id like to help however I can and make sure everything is optimized for you"
X Link 2025-03-28T16:12Z 10.9K followers, [--] engagements
"@SumitPaul18_9 @oyacaro @OpenAI @GeminiApp Gemini [---] Flash native audio voice-to-voice is very fast. 700-900ms typically. It's new enough -- just GA last week on Vertext -- that we don't have a lot of monitoring data yet. But it's impressive"
X Link 2025-06-22T16:33Z 10.9K followers, [--] engagements
"AINews (@Smol_AI to subscribe) today summarizes the buzz about "context engineering". Credit to @dexhorthy for coining this very useful term. I've been talking to voice AI developers a lot over the past few months about the need to do this thing for a while: compress summarize focus tune the context in specific ways for specific segments of a voice AI conversation/workflow. It's hugely useful to have just the right term to describe "this thing". Makes it much easier to talk about"
X Link 2025-06-26T05:23Z 11.1K followers, [----] engagements
"You don't need a WebRTC server for voice agents. If you're deploying your own voice AI infrastructure you should almost certainly be using the new() serverless WebRTC approach. Serverless is much simpler which translates to faster development better scaling and higher reliability. You'll have slightly lower latency too compared to doing a network hop through a (single zone) WebRTC server cluster.() More notes below "
X Link 2025-07-22T01:24Z 11.9K followers, 18.5K engagements
"@hiteshGautam26 For fine-tuning I would start but reading the guides and posts from the @OpenPipeAI team. https://openpipe.ai/blog https://openpipe.ai/blog"
X Link 2025-07-27T05:17Z 11.9K followers, [--] engagements
"The launch livestream: The Realtime API docs: https://platform.openai.com/docs/guides/realtime https://www.youtube.com/watchv=nfBbmtMJhX0 https://platform.openai.com/docs/guides/realtime https://www.youtube.com/watchv=nfBbmtMJhX0"
X Link 2025-08-28T18:17Z 11.2K followers, [----] engagements
"At the @aiDotEngineer World's Fair in June @_pion and I gave a talk about networking fundamentals for voice AI plus how to get started with voice AI on small hardware devices. @chadbailey59 built a great little voice AI toy named Squobert. Squobert helped us out during the talk. If you're interested in building voice AI toys consumer devices or just hacking on voice-controlled hardware experiments check out the video of the talk and links to code "
X Link 2025-09-18T20:52Z 10.9K followers, [---] engagements
"Call 1-970-LIVE-API (1-970-548-3274) to play Truth or Lies with @GoogleDeepMind Gemini. (Are sunsets on Mars blue) - Gemini [---] Flash - Google Live API or TTS - LLM - TTS - @twilio for the phone - Deploy for production to Pipecat Cloud Full source code below. If you're interested in building voice agents join us this Saturday at @ycombinator for a Gemini x Pipecat hackathon"
X Link 2025-10-08T18:43Z 10.9K followers, [----] engagements
"@IronRedSandHive @DynamicWebPaige Oh I love that. @joshwhiton and I have hacked a little bit on some Kokoro + Pipecat stuff: https://github.com/kwindla/macos-local-voice-agents https://github.com/kwindla/macos-local-voice-agents"
X Link 2025-10-14T15:20Z 11K followers, [--] engagements
"Gemini x Pipecat hackathon last Saturday. [---] Developers. [--] judges from Google YC companies and the multimodal AI ecosystem. [--] projects submitted. We're continuing remotely all week. You can still sign up your team and compete for $300000 in API credits. You don't have to start a project from scratch. You can port something you've been working on to use a Gemini model. You can add a realtime voice or video feature to your startup's product. Here are the winners from the in-person day on Saturday "
X Link 2025-10-15T04:10Z 10.9K followers, [----] engagements
"The new Sonic-3 voice model from @cartesia_ai launched today. The big additions are increased emotional range emotion steering and laughter tags. For example: emotion value="curious" / I wrote some quick demo code prompting Gemini Flash to know about the emotion tags. You can hear the results in the video and see the emotion tags in the Pipecat developer console. Code is below"
X Link 2025-10-28T19:10Z 10.9K followers, 11.7K engagements
"Here's the code I ran to record the video: Sonic-3 docs: Some notes: - If you shipped this to production with a UI that shows transcripts to the user you'd add a simple frame processor that strips the emotion and laughter tags out of the LLM text stream after they're parsed by the Pipecat CartesiaTTSService. - My Gemini Flash prompting was very simple. You could definitely do some fun stuff here if you had a particular style valence in mind. - Cartesia is training different voices to have different "dynamic range." (My wording not theirs.) For a customer support agent you'd want to keep the"
X Link 2025-10-28T19:10Z 10.9K followers, [---] engagements
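The "simple frame processor that strips the emotion and laughter tags" mentioned above could look something like this in plain Python. The tag syntax here (`<emotion value="...">`, `<laugh/>`) is illustrative, not Cartesia's documented format, and this is a generic function rather than a Pipecat `FrameProcessor`:

```python
# Hedged sketch: strip emotion/laughter markup out of LLM text before showing
# a transcript to the user. Tag names and syntax are assumptions for
# illustration only.
import re

TAG_RE = re.compile(r'<emotion\s+value="[^"]*"\s*/?>|</?emotion>|<laugh\s*/?>')

def strip_voice_tags(text: str) -> str:
    """Remove voice-steering tags and collapse the whitespace they leave behind."""
    return re.sub(r"\s{2,}", " ", TAG_RE.sub("", text)).strip()
```

In a real pipeline this logic would run after the TTS service has already consumed the tags, so the voice keeps the emotion steering while the on-screen transcript stays clean.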
".@mark_backman and @aconchillo walk through a new voice agent CLI they've been working on in the latest episode of Pipecat TV. Mark shows using the CLI to build a voice agent in [--] minutes. The CLI guides you through choosing transcription LLM and voice models and configuring functions like recording and turn detection. You can test this voice agent locally and then deploy it to Pipecat Cloud or to your own infrastructure. (And you can wire up phone numbers for inbound or outbound telephony use cases.) There are a lot of interesting little sub-problems in building a good CLI like this. One"
X Link 2025-10-29T19:15Z 10.9K followers, [---] engagements
"Pokebench. The one true test of vision-language models"
X Link 2025-11-01T18:34Z 10.9K followers, [---] engagements
"Agree with this. The progress towards natively speech-to-speech models is exciting. But almost all production voice AI right now is stt-llm-tts. - Most voice agent use cases need the best possible function calling and instruction following performance from the LLM. There's a big delta here between SOTA LLMs in text mode and the current speech-to-speech models. - Observability debugging and hill climbing is a lot easier with the cascaded approach. - Architectural flexibility matters to an increasing number of use cases. For example building multi-agent systems with partially shared context is"
X Link 2025-11-06T16:38Z 11K followers, [--] engagements
"@abilash_speaks @pipecat_ai Is your problem transcription accuracy or turn detection For turn detection you should definitely use an audio-native model like smart turn or the built-in turn detection in deepgram flux"
X Link 2025-11-06T16:40Z 11K followers, [--] engagements
"dynamic user interfaces With all the things we'll talk about at the meetup next week it feels like we're only just starting to scratch the surface of what's possible/interesting. That's really really true of the dynamic UI stuff. The demo I have to show of that is both the most fun (because @JonPTaylor's work is amazing) and also the least technically adventurous (because we need to do a lot more experimentation here) I'm 100% convinced that at some point LLMs will be "designing" all our UIs on the fly. But we don't know how to built those components yet. I lived through the transition from"
X Link 2025-11-07T05:00Z 11K followers, [----] engagements
"At the voice agent meetup next week the theme is new patterns for agents. We are increasingly building voice agents that are much more than a single LLM prompt running in a loop. State machines various kinds of multi-agent systems combining "fast" and "thinking" models guardrails processes memory sub-systems . I'm starting to document some these new patterns. Here's one that I particularly like: using a tool call to start a long-running task returning from that tool call immediately with a very simple success/failure response then injecting events into the voice agent context for as long as"
X Link 2025-11-08T00:22Z 11K followers, 13.5K engagements
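The tool-call pattern described above (return immediately, then stream events back into the conversation) can be sketched with plain asyncio. `inject_event` is a hypothetical callback standing in for "append a message to the LLM context":

```python
# Sketch of the pattern, under assumed interfaces: the tool call kicks off a
# long-running job, acks immediately, and a background worker injects progress
# events into the agent's context as they arrive.
import asyncio

async def start_long_task(job, inject_event) -> dict:
    async def worker():
        async for event in job():                      # long-running work
            inject_event({"role": "system", "content": event})
    asyncio.get_running_loop().create_task(worker())   # fire and forget
    return {"status": "started"}                       # immediate tool result
```

The immediate return keeps the voice conversation responsive; the model learns about progress from the injected events on later inference turns rather than blocking on the tool.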
"Not arguing that sending your avatar to a meeting is rude if that's not the expectation. But . other than that I have a different take here. If this person's avatar is plugged into their personal knowledge base or prepped specifically for this meeting the interaction here is very different from what you'll get from talking to Claude. That's a big deal This is not a generic LLM. The future shock here is a little like video calls. I can't tell you how many people said to me "I'll never do video calls it's just better to talk on the phone" when we were starting Daily. Those people were wrong."
X Link 2025-11-10T22:38Z 11K followers, [----] engagements
"I really love this use case: personalized product demos. There are at least three things this approach to building realtime AI experiences unlocks: - Perfect lookup and perfect recall. Demonstrate any part of a complex product; go down any path with me; if I only have a few minutes but I get totally engaged I can come back later and pick up exactly where we left off. - Removes people as the bottleneck for this kind of white glove experience. Interactive conversational engagement isn't gated anymore by the scarcity of someone who is really good at the product demo being available - I would"
X Link 2025-11-11T00:21Z 11.1K followers, [----] engagements
"An alternative possibility: Anthropic ships a 200ms TTFT voice-to-voice model that is as "smart" as Sonnet [---] and this unlocks so many new use cases that all of us who build tooling and applications higher up the stack scramble to build stuff that leverages the model. The thesis here is that we will always want to do "20% more" than the best available model is capable of natively. You get that extra 20% by: - Context engineering. Writing a non-trivial amount of code to give the model the most useful tokens every inference call. State machines. Parallel inference loops that"
X Link 2025-11-11T18:23Z 11.1K followers, 13.7K engagements
".@TeamHathora launched model inference today on @ProductHunt. Go check it out. - Open weights models deployed in [--] regions for low-latency inference. - Running on the Hathora network which was designed from the ground up for low-latency gaming use cases. This is a big deal for voice AI developers because Hathora is filling a gap in the market: making it easier to use open weights models from anywhere in the world plus optimizing audio inference for time to first token/byte. @tarunipaleru from Hathora will be at the voice AI meetup tonight in SF talking about low-latency networking optimizing"
X Link 2025-11-12T17:50Z 11K followers, [----] engagements
"Hang out on Tuesday at the AWS AI Loft in San Francisco with engineers from @awscloud @DeepgramAI @trydaily and @pipecat_ai. Build your first voice agent or go deep on scaling enterprise deployments building multi-agent architectures RAG and external systems integrations and more"
X Link 2025-11-13T22:47Z 11K followers, [----] engagements
"This post about customizing models and optimizing inference for voice agents from the team at @modal is really cool. Almost every enterprise I talk to about voice agents wants to be able to use their internal data to iteratively improve the performance of their AI agents. The Modal team worked with Decagon to build model training tooling and data sets train a custom speculative draft model and modify the SGLang inference stack to serve the LLMs used by Decagon optimally on H200 GPUs. The blog post says that they've achieved a P90 latency of 342ms with this combined work. That's really good"
X Link 2025-11-15T19:53Z 11.1K followers, [----] engagements
"I'm really looking forward to @aiDotEngineer CODE in New York this week. If you'll be there and want to hang out and talk about voice agents realtime video AI agent design patterns or building AI agents with AI code generation let me know The AI Engineer events are always great. Great content great people great hallway track great evening events that people host around the edges of the conference. I was telling someone how much growth we've seen on Pipecat Cloud since the summer and looking back at the talk I did about Pipecat Cloud at AIE World's Fair in June. The basic description of why we"
X Link 2025-11-16T21:26Z 11.1K followers, [----] engagements
"My Pipecat Cloud talk at AIEWF: . And here's one of @andthenchat's innovative super-fun voice games . https://x.com/andthenchat/status/1974179408878641252s=20 https://www.youtube.com/watchv=IA4lZjh9sTs Weve entered our Swiftie Era 🎤 SAY LESS: Taylor Edition is now live Give it a try and let us know how you do Link below to play 👇 https://t.co/oSVUqJBug2 https://x.com/andthenchat/status/1974179408878641252s=20 https://www.youtube.com/watchv=IA4lZjh9sTs Weve entered our Swiftie Era 🎤 SAY LESS: Taylor Edition is now live Give it a try and let us know how you do Link below to play 👇"
X Link 2025-11-16T21:26Z 11.1K followers, [---] engagements
"And @thorwebdev showing an embedded hardware open source voice pipeline on a tiny ESP32 device . https://x.com/thorwebdev/status/1945158921179570557 Is this the tiniest little voice agent yet My @elevenlabsio voice clone running on an esp32 microcontroller via @pipecat_ai and WebRTC 🔥 Story time: I recently caught up with Danilo Campos who is building the awesome DeskHog (seriously check it out) at @posthog and he https://t.co/to4bKkrjTD https://x.com/thorwebdev/status/1945158921179570557 Is this the tiniest little voice agent yet My @elevenlabsio voice clone running on an esp32"
X Link 2025-11-16T21:26Z 11.1K followers, [----] engagements
"Im on a plane to New York with Claude Code open on one side of the iPad split screen and Jane Austen on the other (finally reading the Vintage Classics edition of Persuasion with the Brandon Taylor introduction which go read the introduction and then reread the novel; Ill wait). WiFI is spotty. ssh is happy on the iPad at the moment but my laptop knoweth not the Internet. So Im thinking about this jagged frontier of AI code generation while reading about Captain Wentworth and Anne Elliott meeting again after eight years all hopes uncertain. This is both a more productive and less productive"
X Link 2025-11-18T00:34Z 11K followers, [----] engagements
"I'm hanging out at the AI Lightning Talks event on Wednesday in New York hosted by our friends at @TeamHathora. I'll also be giving a very short talk on [--] new voice agent patterns we've been either using in production or experimenting with lately. If you're interested in voice AI (or really any kind of AI agents) please join us"
X Link 2025-11-19T03:31Z 11.1K followers, [----] engagements
"The ThursdAI crew delivers again as they do every Thursday with all the news of the week. This time from the AI Engineer Code Summit in New York. They were streaming from a table right in the middle of everything and I was sorely tempted to go give everyone high fives while they were live on the air. Crazy crazy week in AI and also a crazy podcast episode recorded live on the floor of @aiDotEngineer today with @ryancarson @swyx @dkundel and @thorwebdev (on his 3rd day at DeepMind) 👉 https://t.co/6JoY4JMu6H 👈 We covered all the major releases of this week someone in https://t.co/pTPKHKZqgJ"
X Link 2025-11-21T04:09Z 11.1K followers, [----] engagements
"Pipecat Thanksgiving day release. 🦃 Some highlights: Deepgram AWS SageMaker realtime speech-to-text support improved text aggregation simplified and more powerful error handling new MiniMax Speech [---] HD and Turbo models. SageMaker is AWS's AI platform for deploying and using machine learning models at scale. AWS has brand new support for streaming data in and out of models hosted on SageMaker which is great for voice AI use cases. This Pipecat release includes a generic base class for SageMaker "bidirectional streaming" plus a new DeepgramSageMakerSTTService class. Text aggregation and"
X Link 2025-11-28T01:53Z 11.9K followers, [----] engagements
"Full changelog: https://github.com/pipecat-ai/pipecat/blob/main/CHANGELOG.md https://github.com/pipecat-ai/pipecat/blob/main/CHANGELOG.md"
X Link 2025-11-28T01:58Z 11.9K followers, [---] engagements
"Smart Turn v3.1. Smart Turn is a completely open source open data open training code turn detection model for voice AI trained on audio data across [--] languages. The model operates on the input audio in a voice agent pipeline. Each time the user pauses briefly this model runs and returns a binary decision about whether the user has finished speaking or not. The [---] release has two big improvements: [--]. New data sets for English and Spanish collected and labeled by contributors Liva AI Midcentury and MundoAI. The majority of the training data for the Smart Turn model is synthetically generated."
X Link 2025-12-03T18:49Z 11.9K followers, 35.2K engagements
"Blog post with more details: Data sets for this release contributed by: Liva AI: Midcentury: MundoAI: All data sets are available here: Training code is here: https://github.com/pipecat-ai/smart-turn https://huggingface.co/pipecat-ai https://mundoai.world/ https://www.midcentury.xyz/ https://www.theliva.ai/ https://www.daily.co/blog/improved-accuracy-in-smart-turn-v3-1/ https://huggingface.co/pipecat-ai https://mundoai.world/ https://www.midcentury.xyz/ https://www.theliva.ai/ https://www.daily.co/blog/improved-accuracy-in-smart-turn-v3-1/ https://github.com/pipecat-ai/smart-turn"
X Link 2025-12-03T18:49Z 11.9K followers, [----] engagements
"@garrxth @pipecat_ai Really great working with you on this"
X Link 2025-12-03T20:06Z 11.9K followers, [--] engagements
"@sir4K_zen Accuracy benchmarks (against open test data) for all supported languages: https://www.daily.co/blog/announcing-smart-turn-v3-with-cpu-inference-in-just-12ms/ https://www.daily.co/blog/announcing-smart-turn-v3-with-cpu-inference-in-just-12ms/"
X Link 2025-12-04T06:28Z 11.1K followers, [---] engagements
"@sir4K_zen Training and eval code is all here: https://github.com/pipecat-ai/smart-turn https://github.com/pipecat-ai/smart-turn"
X Link 2025-12-04T15:28Z 11.9K followers, [--] engagements
"@scarlettx_eth You can configure and spin up a local voice agent using the Pipecat CLI. Just say "yes" to enabling the Smart Turn model in the prompts: https://docs.pipecat.ai/cli/overview https://docs.pipecat.ai/cli/overview"
X Link 2025-12-04T21:49Z 11.9K followers, [--] engagements
"No this isn't a transcription model. It's generally used in combination with Silero VAD and a transcription model. The turn detection inference runs in parallel with transcription. Check out the @pipecat_ai getting started docs to see typical configurations: https://docs.pipecat.ai/getting-started/introduction https://docs.pipecat.ai/getting-started/introduction"
X Link 2025-12-04T21:51Z 11.9K followers, [---] engagements
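The "turn detection runs in parallel with transcription" point above can be sketched with `asyncio.gather`: both inferences run on the same audio segment concurrently, so the end-of-turn decision adds no latency on top of the transcript. Both callables are hypothetical async stand-ins, not Pipecat services:

```python
# Sketch (assumed interfaces): run STT and end-of-turn detection concurrently
# on the same audio segment and await both results together.
import asyncio

async def process_segment(audio, transcribe, detect_turn_end):
    transcript, turn_complete = await asyncio.gather(
        transcribe(audio),        # STT on the segment
        detect_turn_end(audio),   # binary end-of-turn decision
    )
    return transcript, turn_complete
```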
"Pipecat 0.0.97 release. Some highlights: Support for @GradiumAI's new speech-to-text and text-to-speech models. Gradium is a voice-focused AI lab that spun out of the non-profit Kyutai Labs which has been doing architecturally innovative work on neural codecs and speech-language models for the last two years. Continued improvements in the core text aggregator and interruption handling classes both to fix small corner cases and to make behavior as configurable as possible. This is the kind of often-invisible work that underpins Pipecat's ability to support a wide range of models and pipeline"
X Link 2025-12-09T00:05Z 11.9K followers, [----] engagements
"The @GradiumAI launch video is fun This paper about the Kyutai Moshi model authored by the Gradium founders was my favorite paper of 2024: Smart Turn open source open data open training code turn detection model: The PR adding a wait_for_all parameter for compatibility with parallel function calling from reasoning models: (I always try to link to PRs in this kind of post because I think reading the source code of libraries that you use is an under-rated activity) https://github.com/pipecat-ai/pipecat/pull/3120 https://github.com/pipecat-ai/smart-turn https://kyutai.org/Moshi.pdf"
X Link 2025-12-09T00:05Z 11.1K followers, [---] engagements
"The team at @LangChainAI built voice AI support into their agent debugging and monitoring tool LangSmith. LangSmith is built around the concept of "tracing." If you've used OpenTelemetery for application logging you're already familiar with tracing. If you haven't think about it like this: a trace is a record of an operation that an application performs. Here's a very nice video from @_tanushreeeee that walks you through building and debugging a voice agent with full conversation tracing. Using the LangSmith interface you can find a specific agent session then dig into what happened during"
X Link 2025-12-10T19:13Z 11.9K followers, [----] engagements
"How to debug voice agents with LangSmith: Getting started with LangSmith tracing: LangSmith Pipecat integration docs page: I always like to read the code for nifty Pipecat services like the LangSmith tracing processor. It's here though I think this nice work will likely make its way into Pipecat core soon: https://github.com/langchain-ai/voice-agents-tracing/blob/main/pipecat/langsmith_processor.py https://docs.langchain.com/langsmith/trace-with-pipecat https://www.youtube.com/watchv=fA9b4D8IsPQ https://youtu.be/0FmbIgzKAkQ https://docs.langchain.com/langsmith/trace-with-pipecat"
X Link 2025-12-10T19:13Z 11.9K followers, [---] engagements
"New Gemini Live (speech-to-speech) model release today. Using the Google AI Studio API the model name is: gemini-2.5-flash-native-audio-preview-12-2025 The model is also GA (general availability so not considered a beta/preview release) on Google Cloud Vertex under this model name: gemini-live-2.5-flash-native-audio Try it out on the @pipecat_ai landing page"
X Link 2025-12-12T21:06Z 11.9K followers, 25.2K engagements
"Today is Gemini [--] Flash launch day I've been experimenting with pre-release checkpoints of this model and it's very good. I've been using it for various personal voice agent stuff long-running text-mode agent processes and of course running benchmarks. Gemini [--] Flash saturates my relatively hard multi-turn bechmarks even with thinking set to the "MINIMAL" level. And as with Gemini offerings in general cost per token is quite a bit lower than other similarly capable models. The main question for voice AI developers is whether this model will have the same really really good TTFT numbers that"
X Link 2025-12-17T20:38Z 11.2K followers, [----] engagements
"@chinmay Are you using Pipecat"
X Link 2025-12-23T06:30Z 11.9K followers, [---] engagements
"@chinmay Among many other things hooks for observability tooling to understand the kind of issue youre having"
X Link 2025-12-23T07:11Z 11.9K followers, [--] engagements
"@chinmay One way is to use a custom mute filter: Another way is to create a custom processor that blocks InputAudioFrames when you tell it to. https://docs.pipecat.ai/guides/fundamentals/user-input-muting#mute-strategies https://docs.pipecat.ai/guides/fundamentals/user-input-muting#mute-strategies"
X Link 2025-12-24T04:47Z 11.9K followers, [--] engagements
"@chinmay There's an even simpler approach that I forgot about: I believe you can just task.queue_frames(STTMuteFrame(True)) and task.queue_frames(STTMuteFrame(False))"
X Link 2025-12-25T00:22Z 11.2K followers, [--] engagements
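The "custom processor that blocks InputAudioFrames" idea from this thread can be sketched as a simple gate: drop audio frames while muted, pass everything else through. This is generic Python with assumed dict-shaped frames, not Pipecat's actual `FrameProcessor` API:

```python
# Minimal sketch (assumed frame shape): a gate that swallows input-audio
# frames while muted and forwards all other frames unchanged.
class AudioGate:
    def __init__(self, push):
        self._push = push      # downstream frame sink (hypothetical)
        self.muted = False

    def process(self, frame) -> None:
        if self.muted and frame.get("type") == "input_audio":
            return             # swallow audio while muted
        self._push(frame)      # pass all other frames through
```

The design point is that only audio is gated: control and text frames keep flowing, so the rest of the pipeline stays live while the user's microphone input is ignored.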
"@chinmay Prompting to get reliable function calling is definitely different between different models. I think function call stuff is one of the key places where prompt iteration and context engineering is critical"
X Link 2025-12-25T00:26Z 11.9K followers, [--] engagements
"@chinmay Need to see logs. Recommend posting debug logs to Pipecat Discord"
X Link 2025-12-28T18:24Z 11.9K followers, [--] engagements
"The NVIDIA keynote at CES today was great. Lots of info about really nifty upcoming data center hardware that will power training and inference at scale of course. But also a deep dive into the multi-model multi-modal hybrid cloud/local world that we've been trying to help bring into being with our work on the open source Pipecat realtime multi-modal framework. NVIDIA is all in on open source. I lost track of the number of times Jensen said some version of "the entire thing is open" in the keynote. NVIDIA expects open models to drive a lot of the growth in use cases that wouldn't be practical"
X Link 2026-01-06T00:18Z 11.8K followers, [----] engagements
"Thank you for the kind words about the crazy side-project we did for @aiDotEngineer world's fair mostly because I really love printed books. We had a lot of stuff we felt like we'd learned about build voice agents for production that we wanted to share and what better way to do that then make something people could hold in their hands Looking forward to doing fun stuff with you in [----] https://twitter.com/i/web/status/2008335942030139702 https://twitter.com/i/web/status/2008335942030139702"
X Link 2026-01-06T00:32Z 11.9K followers, [---] engagements
"Here's a technical write-up about the voice agent in the video above the three NVIDIA models how to deploy to production and some fun optimizations if you're running locally on a single GPU: https://www.daily.co/blog/building-voice-agents-with-nvidia-open-models/ https://www.daily.co/blog/building-voice-agents-with-nvidia-open-models/"
X Link 2026-01-06T18:09Z 11.9K followers, [----] engagements
"Code is all here: You can deploy these models to @modal cloud really easily. (I love the Modal developer experience.) To run locally you'll need to build a Docker container (because you know bleeding edge vLLM llama.cpp CUDA for Blackwell etc). But the Dockerfile in the repo should "just work" on DGX Spark and RTX [----]. If you have trouble or make patches to extend to other platforms please let me know https://github.com/pipecat-ai/nemotron-january-2026/ https://github.com/pipecat-ai/nemotron-january-2026/ https://github.com/pipecat-ai/nemotron-january-2026/"
X Link 2026-01-06T18:09Z 11.9K followers, [----] engagements
"@Scobleizer Built with @pipecat_ai"
X Link 2026-01-06T19:52Z 11.9K followers, [---] engagements
"I've learned so much from running models locally. And on multiple platforms (mac with MLX RTX 5090). Also the pain of building bleeding edge versions of torch vllm llama.cpp sglang etc reminds me of the early days of Linux when every time I needed to do anything I had to recompile the kernel. :-) https://twitter.com/i/web/status/2008628807180427492 https://twitter.com/i/web/status/2008628807180427492"
X Link 2026-01-06T19:56Z 11.8K followers, [---] engagements
"@MoodiSadi That's true for Pipecat (you need WSL). But the actual Docker container should work on Windows I would have thought"
X Link 2026-01-06T20:06Z 11.9K followers, [--] engagements
"@slowhandzen @BryceWeiner @tabali_tigi @kamalrhubbard @2Randos @MeemStein @tykillxz @kaur_q24 @newmanass @diegopuig13 @parttimewhore1 @SouthDallasFood @mrubel495 @jessie_murphy_ @kas7649 @Jomari_P @FlashpointIs @SezarSurge @S3kr3tt0 @notsofast @ParallelTCG @Variety @DrTiaSpores @ggchronicles_ @DrFresch @Goldoshi @netflix @InvincibleHQ @WGAWest It's all relative. If I win the actual lottery I'm going to put in a pre-order for DGX Station: https://www.nvidia.com/en-us/products/workstations/dgx-station/ https://www.nvidia.com/en-us/products/workstations/dgx-station/"
X Link 2026-01-06T23:01Z 11.8K followers, [--] engagements
"@max_does_tech You can hear it in the demo Thats the Pipecat open source native audio smart turn model. https://github.com/pipecat-ai/smart-turn https://github.com/pipecat-ai/smart-turn"
X Link 2026-01-07T01:43Z 11.9K followers, [----] engagements
"@letsbuildmore Sadly we are still a ways away from LLMs small enough to run on an iPhone that can do good open-ended conversation. Whats your use case"
X Link 2026-01-07T01:44Z 11.8K followers, [---] engagements
"@computeless I think it depends on the use case. If your LLM is in the cloud it often makes sense to run the transcription and LLM pipeline tightly coupled in the cloud. (See all the production agents running on @pipecat_ai for things like phone answering and customer support for example.)"
X Link 2026-01-07T01:51Z 11.9K followers, [---] engagements
"This robot assistant from the NVIDIA CES Keynote on Monday is going viral. @NaderLikeLadder explains all the hottest emerging AI trends in one demo: AI applications in [----] will be multi-model multi-modal hybrid cloud/local use open source models as well as proprietary models control robots and embedded devices in the physical world and have voice interfaces. (And the demo had a cute robot and a cute dog. Gold.) The demo was built with @pipecat_ai. NVIDIA posted a really nice technical walk-through and complete code. The Reachy Mini robot from @huggingface is open source hardware. (You can"
X Link 2026-01-07T03:33Z 11.9K followers, 48.6K engagements
"How to build a robot assistant: GitHub repo: Get your own Reachy Mini robot: https://huggingface.co/blog/reachy-mini https://github.com/brevdev/reachy-personal-assistant https://huggingface.co/blog/nvidia-reachy-mini https://huggingface.co/blog/reachy-mini https://github.com/brevdev/reachy-personal-assistant https://huggingface.co/blog/nvidia-reachy-mini"
X Link 2026-01-07T03:33Z 11.9K followers, [----] engagements
"@darshil @RealtimeUK Technical write-up and GitHub repo you can clone (run locally on a [----] or deploy to the cloud to test): https://github.com/pipecat-ai/nemotron-january-2026 https://www.daily.co/blog/building-voice-agents-with-nvidia-open-models/"
X Link 2026-01-07T06:38Z 11.9K followers, [--] engagements
"@anayatkhan09 Hot take but sub-300ms is too fast unless turn detection is perfect. I don't even like talking to most actual people who do sub-300ms responses. 😜"
X Link 2026-01-07T07:01Z 11.8K followers, [---] engagements
"@TshwaneGaming Pipecat is a completely open source vendor neutral framework for building voice agents. It's the most widely used voice AI agent framework so it's a good place to start: https://www.pipecat.ai/"
X Link 2026-01-07T17:34Z 11.9K followers, [--] engagements
"New release of the @pipecat_ai Smart Turn model today. (Plus a funny LLM outtake in the demo video.) This is a point release (version 3.2) with some nice quantitative improvements for short speech segments and noisy environments. Good turn detection is important for voice agents. Smart Turn is an open source native audio turn detection model that you can drop into any voice agent to give you very fast accurate turn detection. In Pipecat pipelines we generally run Smart Turn in parallel with transcription. This parallelization gives you the fastest possible end-to-end latency. If you're using"
X Link 2026-01-07T22:26Z 11.9K followers, [----] engagements
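The parallel-processing idea in the post above can be sketched in a few lines of asyncio. This is an illustrative sketch only, not the real Pipecat or Smart Turn API: every function name and the toy audio handling are invented for the example. The point is that turn detection and transcription start on the same audio at the same time, so end-to-end latency is the max of the two, not the sum.

```python
import asyncio

async def detect_turn_end(audio: bytes) -> bool:
    """Stand-in for a native-audio turn model (e.g. Smart Turn)."""
    await asyncio.sleep(0.01)          # simulated model latency
    return audio.endswith(b"silence")  # toy stand-in for a real decision

async def transcribe(audio: bytes) -> str:
    """Stand-in for a streaming STT service."""
    await asyncio.sleep(0.02)          # simulated network latency
    return "hello world"

async def process_chunk(audio: bytes):
    # Both tasks start immediately and run concurrently, so this
    # awaits for ~0.02s total rather than ~0.03s sequentially.
    turn_done, text = await asyncio.gather(
        detect_turn_end(audio), transcribe(audio)
    )
    return turn_done, text

print(asyncio.run(process_chunk(b"...silence")))  # (True, 'hello world')
```

The same shape applies with real services: start the turn-detection inference as soon as the user pauses, without waiting for the transcript to come back.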
"Here's the Smart Turn v3 announcement post: https://www.daily.co/blog/smart-turn-v3-2-handling-noisy-environments-and-short-responses/ Model training code, weights, and inference code: https://github.com/pipecat-ai/smart-turn Getting started with Smart Turn in Pipecat voice agents: https://docs.pipecat.ai/server/utilities/smart-turn/smart-turn-overview Here's the code for the NVIDIA open source voice agent in the demo: https://github.com/pipecat-ai/nemotron-january-2026/tree/khk/smart-turn-3.2"
X Link 2026-01-07T22:26Z 11.9K followers, [---] engagements
"@MoodiSadi @pipecat_ai Yes. The main branch uses the Smart Turn v3.1 model weights. (They were the newest weights as of yesterday.) I put up a branch that uses the v3.2 weights today (basically a 2-line change): https://github.com/pipecat-ai/nemotron-january-2026/tree/khk/smart-turn-3.2"
X Link 2026-01-07T22:42Z 11.9K followers, [---] engagements
"Nemotron [--] Nano is definitely on the small side for a conversational voice LLM. And the choppiness might be the experimental Magpie voice inference code we wrote, which is optimized for running on an RTX [----]. This is a tech demo showing preview models and the ability to run locally for specific use cases, rather than production agent code. For a typical production voice agent use case you could start with a more conventional Pipecat pipeline: https://docs.pipecat.ai/getting-started/quickstart"
X Link 2026-01-07T22:49Z 11.9K followers, [--] engagements
"We should talk to Marcus (who does the heavy lifting on the model training) about it. There is a threshold parameter in the inference code so it would be easy to add. But the model is trained to have a very bimodal distribution. The theory is that over-fitting is better in a relatively data-constrained training context. Another idea: if you record yourself doing real sessions and instrument things so that you can easily pull out the audio from the model's wrong decisions, we can add to the data set. I think with [---] real samples the model will be 10x better for you."
X Link 2026-01-08T03:32Z 11.9K followers, [--] engagements
"@letsbuildmore Terminus iOS app for ssh. Connect to one of my desktop machines at home into tmux sessions. Works great except Claude Code's terminal buffering code isn't optimized for this exactly, so sometimes I see a lot of terminal repaints"
X Link 2026-01-08T03:34Z 11.8K followers, [---] engagements
"Inference is fast because all three models are fast, we customized the inference code for streaming (possible because the whole stack is open source), and we built on the @pipecat_ai realtime agent core code that's designed to do this very fast multimodal AI processing. To be fast enough for human-quality voice conversations you need good performance at all three of those levels: model architecture, inference stack, and agent architecture. We wrote about all the optimization here: https://www.daily.co/blog/building-voice-agents-with-nvidia-open-models/ Happy to answer questions"
X Link 2026-01-08T17:36Z 11.9K followers, [--] engagements
"This model has a "confidence" threshold internally but it's not intended to be used in production. We intentionally trained this model to have a strongly bimodal distribution. The intuition here is that in a domain where data quantity is the limiting factor it's best to aim to overfit in training (loosely speaking). We also don't use the length of the trailing silence as input to training. I'm a little less sure about this decision to be honest and can imagine re-visiting it. https://twitter.com/i/web/status/2009320169135317191"
X Link 2026-01-08T17:43Z 11.9K followers, [--] engagements
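The bimodal-output design described above has a practical consequence: the exact decision threshold barely matters, because the model pushes almost all probability mass toward 0 or 1. A minimal sketch, with hypothetical numbers rather than real Smart Turn outputs:

```python
def is_turn_complete(prob: float, threshold: float = 0.5) -> bool:
    # Binary decision on the model's "turn is complete" probability.
    return prob >= threshold

# With a strongly bimodal model, outputs cluster near 0.0 and 1.0,
# so sweeping the threshold leaves every decision unchanged.
probs = [0.02, 0.97, 0.01, 0.99]
decisions = {t: [is_turn_complete(p, t) for p in probs] for t in (0.3, 0.5, 0.7)}
print(decisions[0.3] == decisions[0.5] == decisions[0.7])  # True
```

A model with outputs spread across the middle of the range would instead force careful per-deployment threshold tuning, which is exactly what the bimodal training objective avoids.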
"Pipecat Cloud is @trydaily's enterprise hosting platform for open source voice agents. Today, after a 9-month beta period, we're promoting Pipecat Cloud to General Availability. With Pipecat Cloud you build your voice agent on @pipecat_ai's open source vendor neutral core, add your custom code and agent logic, and then docker push to Pipecat Cloud. As with everything we do, Pipecat Cloud is engineered to give you flexibility and to not lock you into any service, including Pipecat Cloud itself. Any code that you can host on Pipecat Cloud you can self-host with no changes at all. We've focused on"
X Link 2026-01-08T21:53Z 11.9K followers, [----] engagements
"Pipecat Cloud announcement blog post with more details: https://www.daily.co/blog/pipecat-cloud-is-now-generally-available/ Quickstart: https://docs.pipecat.ai/getting-started/quickstart"
X Link 2026-01-08T21:53Z 11.9K followers, [---] engagements
"Last May @mark_backman hosted the infrastructure session of our month-long community Voice AI course. Mark's overview is still the best primer on how to deploy voice agents to production: job processing and compute cluster requirements, network routing and audio transport, telephony interconnect, etc. It's definitely worth watching the whole thing if you're part of a team building voice agents. I was re-watching Mark's video today because we just declared GA for Pipecat Cloud, the enterprise hosting platform for open source voice agents. I was curious how much has changed since Mark's overview"
X Link 2026-01-09T18:48Z 11.9K followers, [----] engagements
"Full video of Mark's infrastructure session: https://www.youtube.com/watch?v=pRaVTv8RqiU Get started with Pipecat Cloud: https://docs.pipecat.ai/getting-started/quickstart"
X Link 2026-01-09T18:48Z 11.9K followers, [---] engagements
"100%. Native voice in Claude Code is something I'm sure a lot of people would use, based on my experience hacking together various versions and forcing myself to use voice until I got over the "this is new, I'm used to the keyboard" hump. I've posted a few times about my personal hacked-up versions of this. https://x.com/kwindla/status/1962597878138053036?s=20 https://x.com/kwindla/status/1949308553015263609?s=20"
X Link 2026-01-09T21:51Z 11.8K followers, [--] engagements
".@chadbailey59 wrote a nice introduction to the "structured conversations" approach to building super-reliable voice agents. If you've been experimenting with Ralph Wiggum coding loops you already understand the most important thing about structured conversations: throwing away old context is the only way to design complex LLM workflows that execute reliably. Pipecat Flows is an open source library that supports structured conversations and context engineering for complex voice agent workflows. Things like: - Food ordering - the agent needs to answer questions, record items, confirm a"
X Link 2026-01-12T19:38Z 11.9K followers, [----] engagements
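The core idea of structured conversations, throwing away stale context at each step, can be sketched without any framework. This is a hypothetical illustration, not the Pipecat Flows API: the class and field names are invented. Each node carries its own task prompt, and entering a node rebuilds the message list from scratch instead of appending forever.

```python
class FlowNode:
    def __init__(self, name: str, task_prompt: str):
        self.name = name
        self.task_prompt = task_prompt

class StructuredConversation:
    def __init__(self, system_prompt: str):
        self.system_prompt = system_prompt
        self.context: list[dict] = []

    def enter(self, node: FlowNode, summary: str = ""):
        # Discard the previous node's messages; keep only the system
        # prompt, an optional summary of prior nodes, and the new task.
        self.context = [
            {"role": "system", "content": self.system_prompt},
            {"role": "system", "content": node.task_prompt},
        ]
        if summary:
            self.context.insert(1, {"role": "system", "content": summary})

convo = StructuredConversation("You are a food-ordering agent.")
convo.enter(FlowNode("collect_items", "Record the items the caller wants."))
convo.enter(FlowNode("confirm", "Read back the order and confirm."),
            summary="Order so far: 1 pizza.")
print(len(convo.context))  # 3: system prompt, summary, new task
```

However many turns the "collect_items" node took, the "confirm" node starts with a short, bounded context, which is what keeps instruction following reliable in long workflows.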
"Beyond the Context Window: Why Your Voice Agent Needs Structure: https://www.daily.co/blog/beyond-the-context-window-why-your-voice-agent-needs-structure-with-pipecat-flows/ Pipecat Flows open source structured conversations framework: https://github.com/pipecat-ai/pipecat-flows/ More on engineering reliability, especially for instruction following and tool calling in multi-turn conversations: https://voiceaiandvoiceagents.com/#scripting"
X Link 2026-01-12T19:38Z 11.9K followers, [---] engagements
"Changelog: https://github.com/pipecat-ai/pipecat/releases/tag/v0.0.99 Pipecat quickstart: https://docs.pipecat.ai/getting-started/quickstart"
X Link 2026-01-15T04:44Z 11.9K followers, [---] engagements
"@MoodiSadi @Krisp_ai Personally I want to see @mark_backman and @aconchillo do a mini-tutorial on the next episode of Pipecat TV"
X Link 2026-01-15T04:54Z 11.9K followers, [--] engagements
"@codewithimanshu @Krisp_ai All the things: voice-to-voice, text-to-voice, voice-to-text, video-to-voice, voice-to-video, video-to-video, voice-to-code, voice-to-liquid-ui "
X Link 2026-01-15T18:06Z 11.9K followers, [--] engagements
"Full session: https://www.youtube.com/watch?v=j-ARPPjJtRQ&list=PLzU2zoMTQIHjMPZ-OnpC3ozZs3bp3kIUs Getting started with open source voice agent tooling: https://docs.pipecat.ai/getting-started/quickstart Chad's Pipecat Flows technical overview: https://www.daily.co/blog/beyond-the-context-window-why-your-voice-agent-needs-structure-with-pipecat-flows/"
X Link 2026-01-15T21:52Z 11.9K followers, [---] engagements
"If you're getting started with voice agents and Android, the Pipecat Android demo client has all the core components a client-side voice AI app needs: voice input and output, device control, and network transport. Marcus just updated the code, which now supports two WebRTC transports: the Pipecat SmallWebRTCTransport for zero-dependency peer-to-peer connections, and the Daily WebRTC transport for large-scale production use. The demo bot also sends a video stream which the app renders. You can actually use this code to connect to any voice AI service that implements the RTVI standard too, not just"
X Link 2026-01-16T23:04Z 11.9K followers, [---] engagements
"Pipecat Android demo client: https://github.com/pipecat-ai/pipecat-examples/tree/main/simple-chatbot/client/android Pipecat Simple Chatbot example (for the client to connect to). Two flavors of bot are provided: gpt-4o and Gemini Live. But you can modify the bot to use any TTS, LLM, STT, or speech-to-speech model: https://github.com/pipecat-ai/pipecat-examples/tree/main/simple-chatbot"
X Link 2026-01-16T23:04Z 11.9K followers, [---] engagements
"@nasimuddin01 We will definitely support Gemini Live in Pipecat Flows as soon as the Live API allows modifying the conversation context and tools list. That's the missing piece right now"
X Link 2026-01-16T23:10Z 11.9K followers, [--] engagements
".@maxipesfix forked the open source audio Smart Turn model and added video. Smart Turn is a "turn detection" model used in a conversational agent to decide when the agent should respond. The model, training data, and training code are all completely open source. When we built the first version of Smart Turn, enabling this kind of extension and collaboration is exactly why we wanted to make everything open source. Maxim's blog post is super useful to read if you're interested in training multimodal models. It describes the design choices and technical details (3D ResNet late fusion two-stage"
X Link 2026-01-17T21:25Z 11.9K followers, [----] engagements
"macOS really wants me to install some updates and reboot so I'm going through all the "I need to look at these tabs" browser tabs before they are lost to me forever. Last month @freeplay_ai launched a nifty new AI analytics feature that looks quite useful for voice agents. In the post below @cairns describes this tooling as helping you identify patterns in production data. For example by surfacing similar examples that initial human review didn't catch and suggesting next steps (building specific kinds of evals creating new test data sets automated optimization runs). Here's my mental model"
X Link 2026-01-18T20:11Z 11.9K followers, [----] engagements
"@pxng0lin Oh cool. Yeah, CUDA version stuff often is a time sink for me. Feel free to submit a PR that adds another Dockerfile if it might be useful for other people"
X Link 2026-01-18T22:03Z 11.9K followers, [---] engagements
"I'll be hanging out at the Weights & Biases office in San Francisco with a couple hundred old and new friends next weekend. Come join us and build some self-improving agents. WeaveHacks is back Jan 31-Feb [--] at W&B HQ in SF. This time we're building self-improving agents. We've seen @GeoffreyHuntley's Ralph and @Steve_Yegge's Gas Town push the boundaries of what agents can do. Now it's your turn to build what comes next. Details below. 👇 https://t.co/2396L2sd1r"
X Link 2026-01-23T00:07Z 11.9K followers, [----] engagements
"Sunday second screen World Cup reading: memories of playing soccer in Côte d'Ivoire - https://medium.com/@kwindla/scratch-the-surface-and-the-beautiful-game-is-always-there-5440dbc1781c"
X Link 2014-07-13T20:10Z 10.9K followers, [--] engagements
"Tap tap tap. Announcement: I just reserved my bot on @botdotme. All the social medias are belong to us. http://bot.me/"
X Link 2016-06-27T19:13Z 10.9K followers, [--] engagements
"@nelson Oh man I love Hollinghurst"
X Link 2020-05-18T16:48Z 10.7K followers, [--] engagements
"The Rainbow Room The Golden Gate Bridge and AI "
X Link 2023-06-01T18:28Z [----] followers, [---] engagements
"AI thought of the day: "without context a Large Language Model is basically a reddit simulator." @charles_irl. I think "reddit simulator" is a more useful mental model than "blurry JPEG of the web""
X Link 2023-09-07T21:39Z [----] followers, [----] engagements
"I took a break to check the surf. But mostly I checked Slack. Here's a picture to balance things out. It really is beautiful today"
X Link 2023-09-08T22:59Z [----] followers, [--] engagements
"This is my favorite thing on @ProductHunt today. I don't make films but I love story boards, and for some kinds of design and planning work "thinking in story boards" works better than anything else for me. https://www.producthunt.com/posts/storiaboard"
X Link 2023-09-21T19:33Z [----] followers, [----] engagements
"AI thought of the day. This morning in an ambiguous situation at a four-way stop a self-driving car proceeded through the intersection when I thought it was my turn to go. Was it dangerous? No. Did it make me laugh? Yes. This morning did I see several human drivers at four-way stops drive more aggressively? Also yes. Did I see several human drivers do things I actually thought were dangerous? Again yes. Have we collected enough data to know if self-driving cars cause fewer injuries per mile than the average human driver? I have no idea. Do I think today's self-driving cars are better drivers than"
X Link 2023-09-22T23:52Z [----] followers, [--] engagements
"My favorite thing on @ProductHunt today: PumpGPT #AWS group buying + LLM-powered devops engineering expertise. AWS is hard to beat for flexibility and breadth of cloud services. This is especially valuable early on in building out a new product or tech stack. If you're fortunate enough to grow, though, AWS cost can scale alarmingly, and the "committed spend" discounts AWS offers are complex to negotiate and as rigid as the pay-as-you-go AWS offerings are flexible. PumpGPT aggregates infrastructure discounts and gives you back that flexibility. They've also just announced an AWS Certified in"
X Link 2023-09-24T23:06Z [----] followers, [----] engagements
"My "AI thought of the day" today is a long one A whole blog post. Over the next couple of weeks we're launching two new AI-focused toolkits publishing a bunch of sample code and fun demos and announcing several partnerships. We've spent the last six years building the world's best infrastructure and SDKs for real-time audio and video. Now we're building on that work to support all the exciting next-generation use cases that combine LLMs (and other AI tools) with WebRTC voice video streaming and recording"
X Link 2023-09-26T18:50Z [----] followers, [----] engagements
"My favorite launch on @ProductHunt today: Founder Salary Report I just had this conversation today with a friend: "I raised a seed round and one of the things I'm trying to think through is how much to pay myself. I'm not paying myself anything right now but should I be? And if so, how much?" This was maybe the 5th or 6th topic we talked about, sharing thoughts and notes about our experiences as startup founders. It's never the most important thing. But it's something every founder does have to think about at some point. And advice from early investors is likely to be all over the map. So"
X Link 2023-09-26T21:27Z [----] followers, [----] engagements
". and it's out in the world An SDK for AI + real-time audio video and data. This library makes it so easy to connect an LLM into a video call that I've been tinkering with WebRTC + GPT-4 in a colab notebook. 🤯"
X Link 2023-09-28T01:43Z [----] followers, [----] engagements
"Here's a closer look at the tech stack and APIs underneath the AI-powered Clinical Notes tools for #telehealth that we released last week. "Today's most advanced 'frontier' Large Language Models have an impressive range of use cases. But perhaps the most impressive thing about them to a computer programmer is that they are good at turning unstructured input data into structured output. This is a genuinely new capability and is perhaps the biggest reason so many engineers are so excited about these new tools." Thank you to our partners @DeepgramAI and @ScienceDotIO who have created amazing"
X Link 2023-09-28T16:51Z [----] followers, [----] engagements
"AI thought of the day: I've been accumulating a list of the various ways "AI" feels like a platform shift. At some level this is just a grab bag of analogies to previous technology trends and patterns. Or as the kids say a vibe. On the other hand there is fairly broad agreement (in retrospect) about what the big tech platform shifts have been in my lifetime: the PC the Internet the mobile phone. Anyway I thought of a new one while listening to @eriktorenberg and @labenz on The Cognitive Revolution podcast . MICROPAYMENTS. Which you know all caps because micropayments are one of those"
X Link 2023-09-29T17:27Z [----] followers, [---] engagements
"Also regulatory compliance legal pricing customer support"
X Link 2023-10-07T02:12Z [----] followers, [---] engagements
"Thank you for having me on the AI Chat podcast @jaeden_ai. Really fun conversation"
X Link 2023-10-14T16:19Z [----] followers, [---] engagements
"AI thought of the day: nobody knows what they're doing. Half the conversations I have with engineers building the most interesting capable state of the art tools right now and half the conversations I listen to on podcasts swing around at some point to the "nobody in AI knows what they're doing right now including me" topic. We don't really know much about how to train large language models effectively. Relatedly we don't know much about how to optimize the data sets we use for training. Evaluation (figuring out what a model is good at) is mostly ad hoc. Prompting is a dark art. Retrieval"
X Link 2023-10-22T02:07Z [----] followers, [---] engagements
"The DALL-E [--] prompt for the image was "An image of a computer programmer writing code that implements a very large-scale distributed system smiling to herself and imagining the vast world of computation that she is creating." (Then three rounds of variations)"
X Link 2023-10-22T02:08Z [----] followers, [--] engagements
"Speaking of moving the conversation forward: when @terronk tested an early version of this demo he started out with "tell me a story using Disney IP." Which turned out to be 1) a totally 🔥 voice prompt and 2) an interesting experiment to run if you're building LLM apps"
X Link 2023-10-26T15:23Z [----] followers, [---] engagements
"There were way way more Nintendo costumes than Star Wars costumes at trick-or-treating by the beach on Sunday. I'm old, so I still think of movies as the big thing. But that has not been true for a while. Video games are the big thing"
X Link 2023-10-31T04:33Z [----] followers, [---] engagements
"My favorite launch on @ProductHunt today: LangChain Templates LangChain is a framework for building AI-powered applications. It's widely used evolving quickly and strikes a nice balance between stability and experimentation. Depending on what you're doing LangChain might be a good choice for prototyping production or both. In addition reading how various features and integrations are implemented in LangChain is a great way to learn. (I'm a big fan of learning by reading source code.) The new templates feature gives you even easier ways to set up and deploy apps (and even more source code to"
X Link 2023-10-31T18:02Z [----] followers, [----] engagements
"Agreed. Observationally: it is extremely hard for companies operating at massive scale to ship net new product lines. Which has implications for acquisition strategies, anti-trust regulation, and startup planning, among other things. Amazon and Apple both seem to have developed organizational hacks to partially counteract this. Interestingly, from the outside at least, these two companies' mechanisms seem radically different. So there's not one answer here. Relatedly, it's extremely hard to scale up without losing a lot of product velocity. Kudos to OpenAI for continuing to ship great stuff at speed"
X Link 2023-11-07T15:13Z [----] followers, [--] engagements
"There are a few commercial options. I think @topazlabs is the most accessible tool that I've seen. I'm a little surprised that there aren't good open source models for this yet. There are lots of research papers. But searches on @huggingface and @sievedata didn't turn up anything ready to use"
X Link 2023-11-07T21:57Z [----] followers, [--] engagements
"Speech-to-speech translation is a radically under-appreciated capability of SOTA AI. You can now have a conversation with most people in the world in their own language. (And they can talk to you in yours.) Translation accuracy is more than good enough to have a natural conversation. So is translation latency. And both are going to continue to improve. Here's a real-time real-world AI text-to-speech comparison between #Azure Speech and @play_ht. (Transcription by @DeepgramAI and translation by @OpenAI GPT-4.) https://t.co/1fmhy90rZC"
X Link 2023-11-17T20:05Z [----] followers, [---] engagements
"ChatGPT has replaced approximately half of my general googling, both work and non-work. ChatGPT Vision is better than Google Translate's "camera" function. Which turns out to be really nice for helping with 3rd grade Spanish homework. When I have a piece of writing to do and am a little bit stuck in semi-procrastination mode, occasionally I just ask ChatGPT to draft me something. Because I'm very particular about writing I rarely end up using anything that ChatGPT produced (even when it's quite good for the task at hand). But the quick back-and-forth reliably gets me out of semi-procrastination"
X Link 2023-11-30T17:23Z [----] followers, [---] engagements
"Twilio announced today that they are discontinuing their Programmable Video service. #WebRTC is a small world. If you are impacted in any way by this change my DMs are open and I'm here to be helpful however I can"
X Link 2023-12-05T00:22Z [----] followers, 18.6K engagements
"@srs_server I humbly suggest looking at @trydaily. All the features you've relied on from a great provider like Twilio: scalable global infrastructure, HIPAA and SOC [--], plus SDKs that make it easy to build AI-powered features like meeting copilots and RAG video search"
X Link 2023-12-05T05:41Z [----] followers, [--] engagements
"The follow-up today is that Twilio is partnering with Zoom to transition their WebRTC customers to Zoom. For engineering decision-makers trying to understand how this news impacts their products and teams it's important to highlight that moving to Zoom is a core technology transition as well as a port to a new SDK. Zoom's tech stack is very good. It's also not WebRTC. There are some positives and some negatives to this. Zoom doesn't have to be standards compliant so can optimize some elements of their implementation without worrying about compatibility with the WebRTC implementations in"
X Link 2023-12-05T19:07Z [----] followers, [---] engagements
"As exciting as anticipated advances in materials science and biology are, and as transformative as cognition too cheap to meter will be, let's be honest: it's really all about talking to whales. Sperm whales have equivalents to human vowels. We uncovered spectral properties in whales' clicks that are recurrent across whales, independent of traditional types, and compositional. We got clues to look into spectral properties from our AI interpretability technique CDEV. https://t.co/8sEAzPkMfo"
X Link 2023-12-06T03:38Z [----] followers, [---] engagements
"@embirico @nickarner @with_multi @webrtc @Zoom @twilio @trydaily @embirico We have native macOS builds internally but haven't had enough customer pull to treat native macOS as a GA target. I'm assuming Electron isn't sufficient for what you're doing"
X Link 2023-12-15T20:05Z [----] followers, [--] engagements
"Motivated by Twilio's announcement that Twilio Video is going away I've been spending some time digging into what the latest version of Zoom's Web SDK can (and can't) do. It's definitely getting better but it's still much less performant than native WebRTC. And it's still missing a lot of things that web video apps need"
X Link 2023-12-19T00:14Z [----] followers, [----] engagements
"Agreed that looking at data manually is super important. Like many things in test/evaluation/QA, you can write lots and lots of test code (and you should) but unless you are also always doing manual validation you'll definitely miss things as you scale up. Relatedly, here's GPT-4V going waaay off the rails in response to a prompt that I thought was relatively simple"
X Link 2023-12-23T17:37Z [----] followers, [---] engagements
"This is thinking too small. The question is not at what market cap can you wear the same regular clothes every day. The question is at what market cap you can wear different pajamas (and only pajamas) every day. Asking for a friend but at what market cap can you pull a Jensen Huang and wear the same clothes every day? Massive alpha in not having to care."
X Link 2024-01-11T01:46Z [----] followers, [---] engagements
"@jeiting I don't know, man. Seems like nobody has ever tried to compete with Salesforce, for example. Seems easy. My startup barely needs even half the features"
X Link 2024-01-11T20:34Z [----] followers, [--] engagements
"I agree with all of this. Today's glass rectangle will not be the primary device we all carry around ten years from now. Been thinking (a lot) lately about something Alan Kay used to say (a lot): People who are really serious about software should make their own hardware. Friday afternoon hot take: I think we're at the start of a new wave of computing hardware (devices). Humane Pin, Rabbit R1, Rewind pendant, Meta Raybans, Meta Quest, Vision Pro, more to come. Let's go back [--] years. It's [----]. Someone cool has the coolest phone: a Motorola"
X Link 2024-01-19T23:33Z [----] followers, [---] engagements
"The Waymo::Uber experience difference today feels similar to the Uber::taxi experience difference in [----]. Quality of Uber drivers is tanking. I think they just hire anyone now, no quality control at all"
X Link 2024-01-23T19:40Z [----] followers, [----] engagements
"I sat down to write a talk about how to migrate code between SaaS platforms (in this case focusing on migrating from Twilio Video which is now EOL). The first draft was [----] words. Don't worry I edited it down in the second draft. (Somewhat.) Here's the script and video if you're interested"
X Link 2024-01-26T05:11Z [----] followers, [---] engagements
"The short answer is that though they had some excellent engineers working on video Twilio did not invest enough in that product. Real-time video is complex both at the infrastructure and SDK level and the surface area of features that customers want/need is pretty large. We are very lucky that venture investors believed in us and our market enough to fund us to build what the market needed"
X Link 2024-01-26T05:31Z [----] followers, [--] engagements
"So this Lumiere thing (which looks awesome) is another AI "release" from Google that I can't try out even as a demo, huh?"
X Link 2024-01-26T16:11Z [----] followers, 80.3K engagements
"Voice + AI meetup on Wednesday at the Cloudflare office in San Francisco. Pizza conversation and demos. Thanks to this meeting's panelists: @natrugrats @rajivayyangar and @Prafulfillment. And thanks to @CloudflareDev for letting us come hang out with the lava lamps. RSVP here:"
X Link 2024-01-29T02:14Z [----] followers, [----] engagements
"Really fun event last night. [--] people came out in the rain to talk about voice/real-time/AI topics. We threw up a little weather-appropriate AI-generated textures toy demo in the background on the big screen. Thanks to @fal_ai_data for the high-fps Stable Diffusion Turbo API endpoint. 🌧🔥"
X Link 2024-02-01T21:42Z [----] followers, [----] engagements
"dark mirror (real-time image-to-image thanks to @fal_ai_data)"
X Link 2024-02-03T02:30Z [----] followers, [----] engagements
"My dreams last night were Vision Pro punk. Crystalline rectangles fleched translucent and razor-lit stratified above muddy reality"
X Link 2024-02-08T17:16Z [----] followers, [---] engagements