#  @basetenco Baseten Baseten posts on X about inference, ai, model, compound the most. They currently have [------] followers and [---] posts still getting attention that total [-----] engagements in the last [--] hours. ### Engagements: [-----] [#](/creator/twitter::1375579341178818561/interactions)  - [--] Week [-------] -89% - [--] Month [---------] +17,091% - [--] Months [---------] +538% - [--] Year [---------] +827% ### Mentions: [--] [#](/creator/twitter::1375579341178818561/posts_active)  - [--] Week [--] no change - [--] Month [--] +600% - [--] Months [--] +50% - [--] Year [---] +108% ### Followers: [------] [#](/creator/twitter::1375579341178818561/followers)  - [--] Week [-----] +0.99% - [--] Month [-----] +12% - [--] Months [-----] +52% - [--] Year [-----] +163% ### CreatorRank: [---------] [#](/creator/twitter::1375579341178818561/influencer_rank)  ### Social Influence **Social category influence** [technology brands](/list/technology-brands) [stocks](/list/stocks) [finance](/list/finance) [cryptocurrencies](/list/cryptocurrencies) [vc firms](/list/vc-firms) [travel destinations](/list/travel-destinations) [social networks](/list/social-networks) [events](/list/events) [nhl](/list/nhl) [countries](/list/countries) **Social topic influence** [inference](/topic/inference) #164, [ai](/topic/ai), [model](/topic/model), [compound](/topic/compound), [gpu](/topic/gpu), [in the](/topic/in-the), [realtime](/topic/realtime), [stack](/topic/stack), [dynamo](/topic/dynamo), [6969](/topic/6969) **Top assets mentioned** [Cogito Finance (CGV)](/topic/cogito) [Microsoft Corp. (MSFT)](/topic/microsoft) [QUALCOMM, Inc. (QCOM)](/topic/qualcomm) [MongoDB, Inc. (MDB)](/topic/mongodb) [DeepSeek (DEEPSEEK)](/topic/deepseek) [Uber Technologies, Inc. (UBER)](/topic/uber) [1000X (1000X)](/topic/1000x) ### Top Social Posts Top posts by engagements in the last [--] hours "Former U.S. Chief Data Scientist @DevotedHealth board member & angel investor @dpatil sits down w/us to chat: β€ Finding passion from failure π‘ Defense Digital Service origins πͺ Maximizing impact as a data scientist https://buff.ly/3KXKFKH https://buff.ly/3KXKFKH" [X Link](https://x.com/basetenco/status/1524061726634881032) 2022-05-10T16:20Z [----] followers, [--] engagements "Brilliant search tech leveraging Pinecone by @MenloVentures a fave in the venture space. We love the VC + AI love" [X Link](https://x.com/basetenco/status/1626779613597958150) 2023-02-18T03:04Z [----] followers, [---] engagements "LangChain + Baseten = β₯ Build with LLMs like Falcon WizardLM and Alpaca in just a few lines of code using LangChain's Baseten integration" [X Link](https://x.com/basetenco/status/1674452179455729667) 2023-06-29T16:18Z [----] followers, [--] engagements "#SDXL Stable Diffusion XL [---] is here: the largest most capable open-source image generation model of its kind. You can deploy it in [--] clicks from our model library: Note the accuracy and detail in the face and hands of this kind old wizard:" [X Link](https://x.com/basetenco/status/1684314065575743491) 2023-07-26T21:25Z [----] followers, [--] engagements "Send us a prompt We'll reply with an awesome AI image generated by Stable Diffusion XL 1.0" [X Link](https://x.com/basetenco/status/1685005495470616576) 2023-07-28T19:13Z [----] followers, [--] engagements "Happy Monday We spent our weekend playing with the new Stable Diffusion XL ControlNet modules from @huggingface. Deploy it yourself today on Baseten π" [X Link](https://x.com/basetenco/status/1693659902655533484) 2023-08-21T16:22Z [----] followers, [---] engagements "Llama [--] + @chainlit_io = open-source ChatGPT" [X Link](https://x.com/basetenco/status/1694458606693826937) 2023-08-23T21:16Z [----] followers, [---] engagements "Look what you can do with @Twilio @Langchain and Baseten" [X Link](https://x.com/basetenco/status/1695158181263978805) 2023-08-25T19:36Z [----] followers, [---] engagements "We love when developers combine our infra with powerful platforms like Twilio and Langchain. Big shoutout to @lizziepika and the @TwilioDevs team for this post" [X Link](https://x.com/basetenco/status/1695158183516307598) 2023-08-25T19:36Z [----] followers, [---] engagements "Last week @varunshenoy_ felt the need the need for speed. He went deep on optimizing SDXL inference to squeeze every last drop of performance out of our GPUs. Heres what he did to get down to [----] seconds for SDXL and [----] seconds for Stable Diffusion [---] on an A100:" [X Link](https://x.com/basetenco/status/1697330482931712159) 2023-08-31T19:28Z [----] followers, [----] engagements "There's a new text embedding model by @JinaAI_ with some exciting properties π - 8192-token context window (embed chapters not pages) - Matches OpenAI's ada-002 on popular benchmarks Use jina-embeddings-v2 for search & recommendations and pair w/ LLMs like Mistral for RAG" [X Link](https://x.com/basetenco/status/1719486879404364009) 2023-10-31T22:49Z [----] followers, [---] engagements "Ready to try open source LLMs Switch from GPT to Mistral 7B in the smallest refactor you'll ever ship: just [--] tiny code changes. If you're making the jump DM us for $1000 in free credits" [X Link](https://x.com/basetenco/status/1727136875348169167) 2023-11-22T01:28Z [----] followers, [----] engagements "Upgrade to @LangChainAI version 0.0.353 to use a refactored Baseten integration featuring: - Support for production and development deployments. - Removal of the baseten client dependency. - All-new usage docs featuring Mistral 7B. Just run: pip install --upgrade langchain" [X Link](https://x.com/anyuser/status/1740845216289132570) 2023-12-29T21:20Z [--] followers, [---] engagements "Deploy ML models on L4 GPUs with Baseten The L4 is a 24GB VRAM card like the A10G. But theyre not interchangeable: - Use L4 for image generation models like Stable Diffusion XL - Use A10G for LLMs like Mistral 7B L4s start at $0.8484/hour (70% of A10 prices)" [X Link](https://x.com/basetenco/status/1745587655801295138) 2024-01-11T23:25Z [----] followers, [---] engagements "Launching today π Double your throughput or halve your latency for @MistralAI @StabilityAI + others Do both at 20% lower cost with @nvidia H100s on Baseten. Heres how π" [X Link](https://x.com/basetenco/status/1754939985734537459) 2024-02-06T18:47Z [----] followers, [----] engagements "40% lower latency and 70% higher throughput for Stable Diffusion XL Using NVIDIA TensorRT to optimize each component of the SDXL image generation pipeline we achieved these performance gains on an H100 GPU. Full results:" [X Link](https://x.com/basetenco/status/1760708692897353869) 2024-02-22T16:50Z [----] followers, [----] engagements "Another first π Unlock the power of @nvidia's Multi-Instance GPU (MIG) virtualization technology with H100mig GPUs on Baseten:" [X Link](https://x.com/basetenco/status/1770855104704291278) 2024-03-21T16:48Z [----] followers, [----] engagements "Stable Diffusion [--] is now available in our model library π Deploy it on an A100 optimized for production and generate highly detailed high-resolution images in seconds. https://www.baseten.co/library/stable-diffusion-3-medium/ https://www.baseten.co/library/stable-diffusion-3-medium/" [X Link](https://x.com/basetenco/status/1801311262288318841) 2024-06-13T17:50Z [----] followers, [---] engagements "Have you ever wondered which model to choose for generating images Check out @rapprach's latest article on our blog to learn the pros and cons of few-step text-to-image models like LCMs SDXL Turbo and SDXL Lightning and find out which model is best for your use case. There are dozens of text-to-image models but only a handful can generate images in real time. In my latest article Im comparing three of the most popular few-step image generation models: LCMs SDXL Turbo and SDXL Lightning. π https://t.co/Y7vkAz40Xu There are dozens of text-to-image models but only a handful can generate images" [X Link](https://x.com/basetenco/status/1801645975469019248) 2024-06-14T16:00Z [----] followers, [---] engagements "Stable Diffusion [--] running in under [--] minutes π€― @rapprach will show you how Try it out: https://www.baseten.co/library/stable-diffusion-3-medium/ https://www.baseten.co/library/stable-diffusion-3-medium/" [X Link](https://x.com/basetenco/status/1804169749472895221) 2024-06-21T15:09Z [----] followers, [---] engagements "Want to learn more about building compound AI systems with Baseten Chains π Don't miss our live webinar with Baseten CTO and co-founder @amiruci and Software Engineer Marius Killinger on July 18th RSVP now and secure your spot π https://buff.ly/4bwEuec https://buff.ly/4bwEuec" [X Link](https://x.com/basetenco/status/1806761524578443544) 2024-06-28T18:48Z [----] followers, [---] engagements "βLast week we introduced Chains a framework for building and orchestrating compound AI workflows. πInterested in how we built Chains or what makes it so powerful Learn more in our new technical deep-dive from Marius Killinger and @rapprach https://www.baseten.co/blog/baseten-chains-explained https://www.baseten.co/blog/baseten-chains-explained" [X Link](https://x.com/basetenco/status/1808161494195929233) 2024-07-02T15:31Z [----] followers, [---] engagements "Just [--] days away π Save your spot: Don't miss our live webinar and Q&A on Thursday July 18th Join our CTO and Co-Founder @amiruci and Software Engineer Marius Killinger to learn how you can build scalable compound AI systems with Baseten Chains. βπ https://buff.ly/4bwEuec https://buff.ly/4bwEuec" [X Link](https://x.com/basetenco/status/1811794058261459219) 2024-07-12T16:05Z [----] followers, [---] engagements "Using NVIDIA TensorRT @defpan and @philip_kiely achieved 40% lower latency and 70% higher throughput for Stable Diffusion XL (SDXL) inference. π₯ β‘ See how: Optimizing model inference for faster image generation delivers a better user experience while saving money on model hosting. π° Performance gains are greater for higher step counts and more powerful GPUs and the techniques used can be applied to similar image generation pipelines including SDXL Turbo. You can also launch SDXL and SDXL Turbo from our model library and enjoy blazing-fast inference times in just a few clicks π" [X Link](https://x.com/basetenco/status/1813216110013018159) 2024-07-16T14:16Z [----] followers, [---] engagements "Breaking news: Llama [---] 70B is a really good model. (p.s. inference will get much faster -- this is a minimally optimized demo on A100s)" [X Link](https://x.com/basetenco/status/1815834012340060187) 2024-07-23T19:38Z [----] followers, [---] engagements "Baseten Chains is a solution for reliable high-performance inference for workflows using multiple models and processing steps. In other words: Chains is built for compound AI systems. β Check out @rapprach's new post to learn more about compound AI: https://www.baseten.co/blog/compound-ai-systems-explained/ Compound AI systems are fueling the next generation of AI products. That's one of the (many) reasons we launched Chains a framework and SDK for compound AI. β So what are compound AI systems What's with all the hype π Here I break it down: https://t.co/XhcXYmVTwP https://t.co/8x3AdTPHwN" [X Link](https://x.com/basetenco/status/1820902128803528817) 2024-08-06T19:17Z [----] followers, [---] engagements "Playground vs. Stable Diffusion XL: which do you think is better π€ @philip_kiely compared them head-to-head: https://www.baseten.co/blog/playground-v2-vs-stable-diffusion-xl-1-0-for-text-to-image-generation/ https://www.baseten.co/blog/playground-v2-vs-stable-diffusion-xl-1-0-for-text-to-image-generation/" [X Link](https://x.com/basetenco/status/1822007443041497395) 2024-08-09T20:30Z [----] followers, [---] engagements "π‘ Want to make your ComfyUI workflow shareable while running it on a powerful GPU π Check out @het_trivedi05 & @philip_kiely's guide: If you're using custom nodes or model checkpoints @het_trivedi05 & @rapprach have a guide for that too π" [X Link](https://x.com/basetenco/status/1822339643687239990) 2024-08-10T18:30Z [----] followers, [--] engagements "π¨ Join us at [--] am PT to learn why you need async inference in production in our live webinar and Q&A We're so excited about this feature we're giving $100 in credits to attendees$200 in credits if you bring a friend π Don't miss it: https://buff.ly/4cnEGMZ https://buff.ly/4cnEGMZ" [X Link](https://x.com/basetenco/status/1824083927281094796) 2024-08-15T14:01Z [----] followers, [---] engagements "@philip_kiely got tired of waiting 8-10 sec for Stable Diffusion XL to generate images so he set out to make it faster. π Using [--] different optimizations he first made it 5x faster: Then using TensorRT he and @defpan further decreased latency by 40% π Take a look: https://buff.ly/4fZi6NH https://buff.ly/4dGFHkP https://buff.ly/4dGFHkP https://buff.ly/4fZi6NH https://buff.ly/4dGFHkP https://buff.ly/4dGFHkP" [X Link](https://x.com/basetenco/status/1827363031145177111) 2024-08-24T15:11Z [----] followers, [--] engagements "@usebland It's been awesome to be on the journey with you. Can't wait to see how what you all do next" [X Link](https://x.com/basetenco/status/1828883170152067451) 2024-08-28T19:51Z [----] followers, 20.6K engagements "LLM inference on GPUs has bottlenecks at two stages: GPU compute (in FLOPS) during prefill when the input is being processed to generate the first token. GPU memory (in GB/s) for the rest of inference when the autoregressive model generates each subsequent token. To prove this: Calculate the ops:byte ratio for a given GPU. Calculate the arithmetic intensity of various stages of LLM inference. Compare the two values to see where inference is compute-bound and where it is memory-bound. Follow the math for yourself: https://www.baseten.co/blog/llm-transformer-inference-guide/" [X Link](https://x.com/basetenco/status/1831708766447751487) 2024-09-05T14:59Z [----] followers, [---] engagements "8 days [--] awesome events coming up π π₯ @usebland + @basetenco AI happy hour (9/12 in SF) π€ @HackMIT [----] (9/14 at MIT) π₯ @AITinkerers Technical Founder BBQ (9/17 in NYC) π @PyTorch Conference [----] (9/18 in SF) Save your spot now if you haven't already" [X Link](https://x.com/basetenco/status/1833526499930214562) 2024-09-10T15:22Z [----] followers, [---] engagements "@usebland @HackMIT @AITinkerers @PyTorch π₯ Our AI happy hour with Bland AI: π₯ Technical Founder BBQ with AI Tinkerers: And don't miss us at HackMIT and PyTorch [----] https://nyc.aitinkerers.org/p/technical-founder-bbq https://lu.ma/j03r0ag1 https://nyc.aitinkerers.org/p/technical-founder-bbq https://lu.ma/j03r0ag1" [X Link](https://x.com/basetenco/status/1833526660467200308) 2024-09-10T15:23Z [----] followers, [---] engagements "@aiDotEngineer @nvidia Thanks for sharing Our team had a blast doing it" [X Link](https://x.com/basetenco/status/1834649248996245829) 2024-09-13T17:44Z [----] followers, [--] engagements "Deploying ML models on NVIDIA H100 GPUs offers the lowest latency and highest bandwidth inference for demanding ML workloads. But getting maximum performance takes more than just loading in a model and running inference" [X Link](https://x.com/basetenco/status/1835317790137258166) 2024-09-15T14:00Z [----] followers, [---] engagements "How are you benchmarking your image generation models π Before using a model like SDXL or FLUX.1 in production youll want to see some performance benchmarks. How fast can your model create images How many images can it create at that speed And how much will it cost π°" [X Link](https://x.com/basetenco/status/1837484430681645218) 2024-09-21T13:30Z [----] followers, [---] engagements "@AIatMeta @Arm @MediaTek @Qualcomm We're excited to bring dedicated deployments of these new Llama [---] models to our customers 90B vision looks especially powerful congrats to the entire Llama team" [X Link](https://x.com/basetenco/status/1839010083822280776) 2024-09-25T18:32Z [----] followers, [----] engagements "Do you know what MIG is To get the most performance possible given available hardware we can use a feature of H100 GPUs that allows us to split a single physical GPU across two model serving instances. Enter: MIG or Multi-Instance GPU" [X Link](https://x.com/basetenco/status/1839788304310354083) 2024-09-27T22:04Z [----] followers, [---] engagements "MIG lets us serve models on fractional GPUs: we get two H100 MIG instances each with about 1/2 the power. These instances often meet or beat A100 GPU performanceat 20% lower cost. π Learn more about using fractional H100 GPUs for efficient model serving: https://baseten.co/blog/using-fractional-h100-gpus-for-efficient-model-serving/ https://baseten.co/blog/using-fractional-h100-gpus-for-efficient-model-serving/" [X Link](https://x.com/basetenco/status/1839789452417282520) 2024-09-27T22:09Z [----] followers, [---] engagements "π¨ OpenAI just dropped a new open-source model π¨ Whisper V3 Turbo is a new Whisper model with: - 8x faster relative speed vs Whisper Large - 4x faster than Medium - 2x faster than Small - 809M parameters - Full multilingual support - Minimal degradation in accuracy" [X Link](https://x.com/basetenco/status/1840883111162155138) 2024-09-30T22:35Z [----] followers, [----] engagements "It's officially #SFTechWeek and we can't wait to see everyone at our talk today β β¨ Hear from our engineers how you can combine multiple ML models in production while minimizing latency + optimizing GPU utilization. πͺ Then join us for drinks food and networking πΈ RSVP if you haven't already (link in thread π)" [X Link](https://x.com/basetenco/status/1843331022596812913) 2024-10-07T16:42Z [----] followers, [---] engagements "β RSVP to Building Compound AI with Baseten Chains #SFTechWeek: https://lu.ma/vk73vrfq https://lu.ma/vk73vrfq" [X Link](https://x.com/basetenco/status/1843331226238890115) 2024-10-07T16:43Z [----] followers, [---] engagements "We're excited to announce our partnership with @MongoDB π€ Together we're enabling companies to build and deploy gen AI apps that scale infinitely and deliver optimal performance per dollar. Looking to run high-performance inference for your MongoDB-powered apps Reach out In September we welcomed [--] new #AI partners: @arizeai @basetenco @doppler @haizelabs @modal_labs @PortkeyAI and @RekaAILabs. Learn more: https://t.co/lYLil65dLG https://t.co/XTN64BZIwV In September we welcomed [--] new #AI partners: @arizeai @basetenco @doppler @haizelabs @modal_labs @PortkeyAI and @RekaAILabs. Learn more:" [X Link](https://x.com/basetenco/status/1844681528665677982) 2024-10-11T10:08Z [----] followers, [---] engagements "Looking to build high-performance RAG systems @philip_kiely recently took to our blog to break down how to eliminate bottlenecks for compound AI systems including RAG using Baseten and @MongoDB" [X Link](https://x.com/basetenco/status/1846531793995899332) 2024-10-16T12:41Z [----] followers, [---] engagements "We benchmarked the new NVIDIA H200 GPUs for LLM inference with @LambdaAPI π H200s crush long input sequences π H200s make huge batches more efficient (high throughput) π H100 GPUs are likely more cost-efficient for many inference workloads" [X Link](https://x.com/basetenco/status/1848804872365478252) 2024-10-22T19:13Z [----] followers, [---] engagements "After the team at @rimelabs trained astonishingly lifelike speech synthesis models with over [---] voices they needed fast reliable infra to bring their API to market. With Baseten they've maintained [---] ms p99 latency and 100% uptime through [----]. https://www.baseten.co/customers/rime-serves-speech-synthesis-api-with-stellar-uptime-using-baseten/ https://www.baseten.co/customers/rime-serves-speech-synthesis-api-with-stellar-uptime-using-baseten/" [X Link](https://x.com/basetenco/status/1849112758207353252) 2024-10-23T15:36Z [----] followers, [---] engagements "You can now deploy models with Baseten as part of @vercel's AI SDK π Run OpenAI-compatible LLMs in your Vercel workflows with our best-in-class model performance. Plus: access all our LLM features (including streaming)in any JS frameworkwith just a few lines of code. πͺ" [X Link](https://x.com/basetenco/status/1851398527009927203) 2024-10-29T22:59Z [----] followers, [----] engagements "Were excited to launch canary deployments on Baseten π¦ π Canary deployments let you gradually shift traffic to new model deployments with seamless rollback if needed. Learn more in our launch blog π" [X Link](https://x.com/basetenco/status/1852025274734719061) 2024-10-31T16:30Z [----] followers, [----] engagements "π Check out the launch blog: https://www.baseten.co/blog/canary-deployments-on-baseten/ https://www.baseten.co/blog/canary-deployments-on-baseten/" [X Link](https://x.com/basetenco/status/1852025279146873144) 2024-10-31T16:30Z [----] followers, [---] engagements "You can't always use open-source LLMs for specific use cases out of the box. To customize them you have optionsbut each varies in difficulty cost and customizability: π¬ Prompt engineering π§ Fine-tuning π RAG π Learn more in @philip_kiely's blog: https://buff.ly/3ximK89 https://buff.ly/3ximK89" [X Link](https://x.com/basetenco/status/1853126748986036224) 2024-11-03T17:27Z [----] followers, [---] engagements "Looking to generate high-res images in record time Launch models from @bfl_ml @StabilityAI and @BytedanceTalk in two clicks including: π FLUX.1 dev and schnell π SDXL (Turbo Lightning.) πͺ SD3 Medium π Start generating images: If you have a performance target in mind just reach out https://buff.ly/46T8pfT https://buff.ly/46T8pfT https://buff.ly/46T8pfT https://buff.ly/46T8pfT" [X Link](https://x.com/basetenco/status/1853576888796533080) 2024-11-04T23:15Z [----] followers, [---] engagements "Check out the latest from our lead DevRel @philip_kiely on @SoftwareHuddle diving into all things compound AI and inference optimization π§ New Episode Alert Discover the tech behind companies like Descript Bland & Robust Intelligence π Deep Dive into Inference Optimization for LLMs with Philip Kiely π Today we have Philip Kiely from @basetenco on the show. Baseten is a Series B startup focused on providing https://t.co/fBWleiIZSa New Episode Alert Discover the tech behind companies like Descript Bland & Robust Intelligence π Deep Dive into Inference Optimization for LLMs with Philip Kiely" [X Link](https://x.com/basetenco/status/1853912889448665488) 2024-11-05T21:30Z [----] followers, [---] engagements "Congrats to @waseem_s @MatanPaul @dorisjwo and the whole team at @Get_Writer on the $200M Series C Its been incredible getting a front-row seat to watching the team build such an incredible platform. π π We're excited to announce that we've raised $200M in Series C funding at a valuation of $1.9B to transform work with full-stack generative AI Today hundreds of corporate powerhouses like Mars @Qualcomm @Prudential and @Uber are using Writers full-stack platform to https://t.co/cwqZTjxMyl π We're excited to announce that we've raised $200M in Series C funding at a valuation of $1.9B to" [X Link](https://x.com/basetenco/status/1856463704646201624) 2024-11-12T22:26Z [----] followers, [----] engagements "Weve heard it from ComfyUI users time and again: our ComfyUI integration is best-in-class. πͺ Easily deploy custom ComfyUI workflows behind an API endpoint in minutes. π Check out @philip_kiely and @het_trivedi05's guide on serving ComfyUI models behind an API endpoint: π And @het_trivedi05 and @rapprach's guide on running custom nodes and model checkpoints: https://buff.ly/3WCYk1T https://buff.ly/49gILlk https://buff.ly/3WCYk1T https://buff.ly/49gILlk" [X Link](https://x.com/basetenco/status/1859352216295125248) 2024-11-20T21:44Z [----] followers, [---] engagements "Have you tried the new @Alibaba_Qwen models yet You can start coding with Qwen2.5 Coder on Baseten in minutesusers say the 32B version outperforms GPT-4o Claude and even the 70B version. π Launch the new 7B 14B and 32B Qwen Coder models: https://www.baseten.co/library/publisher/qwen/ https://www.baseten.co/library/publisher/qwen/" [X Link](https://x.com/basetenco/status/1859700066833105100) 2024-11-21T20:47Z [----] followers, [---] engagements "TheNVIDIA A10 GPUis an Ampere-series graphics card popular for common ML inference tasksbut what about the A10G Despite similar specs there may be slight performance differences in specific use cases. π Find out where in @philip_kiely's blog post: https://www.baseten.co/blog/nvidia-a10-vs-a10g-for-ml-model-inference/ https://www.baseten.co/blog/nvidia-a10-vs-a10g-for-ml-model-inference/" [X Link](https://x.com/basetenco/status/1861182195731214791) 2024-11-25T22:56Z [----] followers, [---] engagements "How can you outperform A100 GPUs at 20% lower cost Meet H100 MIGs. Beyond top-tier specs they offer more quantization options and GPU flexibility across clouds. Check out @MattDotHow Vlad Shulman @defpan and @philip_kiely's guide to understand how H100 MIGs work and what to expect when serving models on https://buff.ly/4gaYbLd https://buff.ly/4gaYbLd" [X Link](https://x.com/basetenco/status/1863269920449900718) 2024-12-01T17:12Z [----] followers, [---] engagements "Were excited to introduce Custom Servers on Baseten To run customers mission-critical inference workloads Baseten had to be great at [--] things: 1: Performance optimizations at the model level 2: Massive-scale infrastructure with cross-cloud horizontal scaling All wrapped in an expressive DevEx. We've built extensive tooling for performance optimizationslike our optimized Engine Builder in Truss. However some of our customers come to us with pre-optimized models and they mainly want to take advantage of the seamless autoscaling Baseten provides. Now they can with the launch of Custom Servers." [X Link](https://x.com/basetenco/status/1864727649760739699) 2024-12-05T17:44Z [----] followers, [---] engagements "Learn more in the launch blog: https://www.baseten.co/blog/deploy-production-model-servers-from-docker-images/ https://www.baseten.co/blog/deploy-production-model-servers-from-docker-images/" [X Link](https://x.com/basetenco/status/1864727806132748749) 2024-12-05T17:45Z [----] followers, [--] engagements "π New Generally Available Whisper drop: The fastest most accurate and cost-effective transcription with over 1000x real-time factor for production AI workloads. π Our new Generally Available Whisper implementation delivers: π Over 1000x real-time factor β¨ The lowest word error rate πͺ Production-grade reliability π§© Custom scaling and hardware per processing step π See how in our blog: Reach out to get record-breaking performance for your mission-critical AI workloads https://www.baseten.co/blog/the-fastest-most-accurate-and-cost-efficient-whisper-transcription/" [X Link](https://x.com/basetenco/status/1866931110711882025) 2024-12-11T19:40Z [----] followers, 10.7K engagements "We exist to make sure the best AI builders get the fastest and most reliable performance from their models. At #AWSreInvent we noticed three key things: [--] Nearly every company is using GenAI models to augment existing products and develop net-new apps. [--] Teams are concerned about how to keep Gen AI workloads compliant and customer privacy robust. [--] Organizations are worried about ensuring a quality UX with spiky traffic and high demand. If you need blazing-fast production performance that's reliable secure and elastic in scale book a meeting with our engineers: Thank you to everyone who made" [X Link](https://x.com/basetenco/status/1867619208223166720) 2024-12-13T17:14Z [----] followers, [---] engagements "DeepSeek-V3 dropped today and the LLM world just got turned upside down. Again. Early indicators are that this model completely transforms the closed and open-source model landscapes. Tl;Dr - OSS is now SOTA/Top3 again. Here are the key details to know: - Open source and licensed for commercial use - Beats Llamas Qwens GPT-4o Sonnet [---] - MoE w/ 671B params 37B active per token - 128K-token context window - Distilled o3-style reasoning Deeper dive in π§΅ This is one of the first models that need the horsepower of H200s GPUs so were getting them ready to go. If youre interested in running" [X Link](https://x.com/basetenco/status/1872402216247808056) 2024-12-26T22:00Z [----] followers, 65.9K engagements "DeepSeek-V3 is an incredibly exciting model combining multiple novel techniques including distilled o3-style Chain of Thought reasoning into a standard commercially-licensed open-source LLM" [X Link](https://x.com/basetenco/status/1872402387698344302) 2024-12-26T22:01Z [----] followers, [---] engagements "We run DeepSeek-V3 on NVIDIA H200 GPUs available by invitation on Baseten. Each H200 has 141GB of VRAM with [---] TB/s of bandwidth. Together [--] H200s have [----] GB enough to load all 671B parameters in FP8 plus a KV cache allocation. H200 benchmarks: https://www.baseten.co/blog/evaluating-nvidia-h200-gpus-for-llm-inference/ https://www.baseten.co/blog/evaluating-nvidia-h200-gpus-for-llm-inference/" [X Link](https://x.com/basetenco/status/1872402484314190237) 2024-12-26T22:01Z [----] followers, [---] engagements "To run DeepSeek-V3 with incredible performance and reliability we use the SGLang fast inference framework. https://x.com/zhyncs42/status/1872242567036977262 We're excited to announce SGLang @lmsysorg v0.4.1 which now supports DeepSeek @deepseek_ai V3 - currently the strongest open-source LLM even surpassing GPT-4o. https://t.co/a13AmSakEb https://x.com/zhyncs42/status/1872242567036977262 We're excited to announce SGLang @lmsysorg v0.4.1 which now supports DeepSeek @deepseek_ai V3 - currently the strongest open-source LLM even surpassing GPT-4o. https://t.co/a13AmSakEb" [X Link](https://x.com/basetenco/status/1872402646172340381) 2024-12-26T22:02Z [----] followers, [---] engagements "Contact our engineers today to get a dedicated deployment of DeepSeek-V3 on H200 GPUs. https://www.baseten.co/library/deepseek-v3/ https://www.baseten.co/library/deepseek-v3/" [X Link](https://x.com/basetenco/status/1872402713688019301) 2024-12-26T22:02Z [----] followers, [---] engagements "Our Co-founder @amiruci and Model Performance Engineer @zhyncs42 sat down with @latentspacepod to dive deep into DeepSeek-V3 and SGLang model performance scaling AI products and more. Listen to the full episode here π : Everything you need to run Mission Critical Inference (ft. DeepSeek v3 + SGLang) https://t.co/N67XXjHsHB We chat with @amiruci and @yinengzhang about the Chinese Whale Bro drop of 2024: - @deepseek_ai v3 - @lmsysorg's SGLang - the Three Pillars of Mission Critical : Everything you need to run Mission Critical Inference (ft. DeepSeek v3 + SGLang) https://t.co/N67XXjHsHB We" [X Link](https://x.com/basetenco/status/1881013293067944311) 2025-01-19T16:18Z [----] followers, [----] engagements "The open-source community is still reeling from @deepseek_ai's new R1 drop the new best-in-class reasoning model on par with o1. We're thrilled to have a close relationship with the DeepSeek team hosting DeepSeek-R1 (and V3) from day one (running on H200s). π Learn more about what makes DeepSeek unique and how to run it on @latentspacepod featuring our Co-founder @amiruci and Model Performance Engineer @zhyncs42" [X Link](https://x.com/basetenco/status/1882168379446104108) 2025-01-22T20:48Z [----] followers, [----] engagements "Huge congrats to the @riffusionai team on the launch of their new generative music model Try it out π Introducing FUZZ a generative music model like no other. Personalized full-length high-quality and infinite. Were making this instrument free for as long as our GPUs survive. The best of FUZZ in thread. https://t.co/GHtKphYHV5 Introducing FUZZ a generative music model like no other. Personalized full-length high-quality and infinite. Were making this instrument free for as long as our GPUs survive. The best of FUZZ in thread. https://t.co/GHtKphYHV5" [X Link](https://x.com/basetenco/status/1885072025623970147) 2025-01-30T21:06Z [----] followers, [---] engagements "2025 is the year of inference. We're thrilled to announce our $75m Series C co-led by @IVP and @sparkcapital with participation from @GreylockVC @conviction @basecasevc @southpkcommons and @lachy. We're also excited to add Dick Costolo and Adam Bain from @01Advisors as new investors. Check out our CEO Tuhin's blog to learn more. It's time to build" [X Link](https://x.com/basetenco/status/1892259130540179863) 2025-02-19T17:05Z [----] followers, 125.3K engagements "Last week was crazy. Thank you to everyone who celebrated our Series C with us and met us in person at @aiDotEngineer and our tech breakfast with @MorganBarrettX in NYC And congratulations to our friends at @AbridgeHQ @EvidenceOpen and @LambdaAPI for raising rounds last week too" [X Link](https://x.com/basetenco/status/1894166345933361213) 2025-02-24T23:23Z [----] followers, [----] engagements "You can start using the new @Alibaba_Qwen Qwen QwQ-32B from our model library in two clicks π" [X Link](https://x.com/basetenco/status/1897793344635318426) 2025-03-06T23:36Z [----] followers, [----] engagements "Don't miss live talks from the team: Field Notes on Scaling Real-time AI-Native Applications in Production Session EXS74242 Advanced Techniques for Inference Optimization with TensorRT-LLM Session S71693" [X Link](https://x.com/basetenco/status/1900554899353448745) 2025-03-14T14:29Z [----] followers, [---] engagements "Join Baseten @gmi_cloud and Bridge IT Consulting for: The San Jose Sharks vs Carolina Hurricanes game in our private suite (limited availability): Our Happy Hour on Thurs 03/19: https://buff.ly/eo43oSF https://lu.ma/Sharks https://buff.ly/eo43oSF https://lu.ma/Sharks" [X Link](https://x.com/basetenco/status/1900554901337375017) 2025-03-14T14:29Z [----] followers, [---] engagements "BEIs performance boosts arent an artifact of using more powerful hardware. BEI is even more memory-efficient than other toolkits meaning you can run it on smaller instance types and still get superior performance. (4/5 π§΅)" [X Link](https://x.com/basetenco/status/1905300386111430767) 2025-03-27T16:46Z [----] followers, [---] engagements "Learn more about BEI in our launch blog: Shoutout to @feilsystem on our model performance team for his work carefully optimizing BEI for production AI workloads (5/5 π§΅) https://www.baseten.co/blog/introducing-baseten-embeddings-inference-bei/ https://www.baseten.co/blog/introducing-baseten-embeddings-inference-bei/" [X Link](https://x.com/basetenco/status/1905300389030670356) 2025-03-27T16:46Z [----] followers, [---] engagements "The first Baseten bot is live on Poe It's very fast you can ask questions in your language of choice and get instant answers. We're excited to partner withQuorato power the fastest open-source models for the Poe community" [X Link](https://x.com/basetenco/status/1905660999987822706) 2025-03-28T16:39Z [----] followers, [----] engagements "New bots for Llama [--] Scout and Maverick are now live on Poe Get started with an 8M token context window for Scout (yes you read that right) and 1M for Maverick. We're thrilled to power the fastest open-source models for Quoramore to come" [X Link](https://x.com/basetenco/status/1909346329282695204) 2025-04-07T20:43Z [----] followers, [---] engagements "π We've been heads down for months and now it's finally launch week. Today were releasing our new brand. We believe inference is the foundation of all AI going forward. That's what our new look is all about: . . " [X Link](https://x.com/basetenco/status/1924585691524280470) 2025-05-19T21:59Z [----] followers, [----] engagements "Inference is everywhere. Come find us in San Francisco" [X Link](https://x.com/basetenco/status/1924977852304523690) 2025-05-20T23:57Z [----] followers, [----] engagements "let there be inference" [X Link](https://x.com/basetenco/status/1925582531182891012) 2025-05-22T16:00Z [----] followers, 330.8K engagements "Our secret sauce The Baseten Inference Stack. It consists of two core layers: the Inference Runtime and Inference-optimized Infrastructure. Our engineers break down all the levers we pull to optimize each layer in our new white paper" [X Link](https://x.com/basetenco/status/1927488286764757112) 2025-05-27T22:13Z [----] followers, [----] engagements "New DeepSeek just dropped. Proud to serve the fastest DeepSeek R1 [----] inference on OpenRouter (#1 on TTFT and TPS) with our Model APIs. π DeepSeek-R1-0528 is here πΉ Improved benchmark performance πΉ Enhanced front-end capabilities πΉ Reduced hallucinations πΉ Supports JSON output & function calling β Try it now: https://t.co/IMbTch8Pii π No change to API usage docs here: https://t.co/Qf97ASptDD π https://t.co/kXCGFg9Z5L π DeepSeek-R1-0528 is here πΉ Improved benchmark performance πΉ Enhanced front-end capabilities πΉ Reduced hallucinations πΉ Supports JSON output & function calling β " [X Link](https://x.com/basetenco/status/1928195639822700898) 2025-05-29T21:03Z [----] followers, [----] engagements "People think that GPUs + vLLM = production grade inference. We know that to not be true. With this you can get 80% of what you want the model to do and 95% reliability but formission critical inference - thats not enough" [X Link](https://x.com/basetenco/status/1928603165303185902) 2025-05-31T00:03Z [----] followers, [----] engagements "Hot take: Inference is not a commodity. There is strong complexity and differences between inference providers. Couldnt have said it better @CorinneMRiley" [X Link](https://x.com/basetenco/status/1934745654137627115) 2025-06-16T22:51Z [----] followers, [---] engagements ""Inference is more than just vibes" catch us on the Bay Bridge" [X Link](https://x.com/basetenco/status/1935111728570008035) 2025-06-17T23:06Z [----] followers, [----] engagements "RAISE Summit Paris kicks off tomorrow Find us at Booth [--] on July [--]. Come say hi to our team check out a demo and grab your Baseten "Artificially Intelligent" T-shirt. Don't miss @rapprach's talk on the AI landscape and open source on Wednesday July [--] at 11:40 AM" [X Link](https://x.com/basetenco/status/1942323930347474954) 2025-07-07T20:44Z [----] followers, [---] engagements "Voice is the next frontier for customer experience. Weve seen first hand how Voice can transform the way users interact with new products and existing brands. But we also know Voice is very hard to do well" [X Link](https://x.com/basetenco/status/1942657616079187973) 2025-07-08T18:50Z [----] followers, [---] engagements "This week we'll be deep diving into Voice AI. Well share how we tackle Voice inference how you can build with Voice and share a few exciting demos along the way. Stay tuned" [X Link](https://x.com/basetenco/status/1942657617144545284) 2025-07-08T18:50Z [----] followers, [--] engagements "Confession. Kimi K2 is one of our new favorite models for agentic use cases. Baseten is powering the fastest Kimi K2 available on Openrouter. Test it through our Model APIs today. Alsosay Kimi K2 10x fast. Thanks @Madisonkanna" [X Link](https://x.com/basetenco/status/1945266466476929505) 2025-07-15T23:37Z [----] followers, [----] engagements "Kimi K2 has arrived. You can deploy it on Baseten. Join as we briefly dig into why K2 is generating so much buzz. If youre building agents or utilizing LLMs its absolutely worth testing" [X Link](https://x.com/basetenco/status/1945602641289228483) 2025-07-16T21:53Z [----] followers, [---] engagements "Launch it here https://www.baseten.co/library/kimi-v2/ https://www.baseten.co/library/kimi-v2/" [X Link](https://x.com/basetenco/status/1945602642727899519) 2025-07-16T21:53Z [----] followers, [---] engagements "Baseten is growing Were always looking for determined humble people to join our team. Catch us at the Greylock Techfair on Thursday July [--] at Oracle Park. If you can't attend apply for a role online or share with someone you think would be a great fit" [X Link](https://x.com/basetenco/status/1945992631089234211) 2025-07-17T23:42Z [----] followers, [---] engagements "You can try Voxtral from our model library: https://www.baseten.co/library/voxtral-small-24b/ https://www.baseten.co/library/voxtral-small-24b/" [X Link](https://x.com/basetenco/status/1948101372957880732) 2025-07-23T19:22Z [----] followers, [---] engagements "Deploy Voxtral https://www.baseten.co/library/voxtral-small-24b/ https://www.baseten.co/library/voxtral-small-24b/" [X Link](https://x.com/basetenco/status/1948519985955090491) 2025-07-24T23:05Z [----] followers, [---] engagements "When@zeddotdev set out to build Edit Prediction they knew they wanted it to feel instantaneous. But their previous inference solution wasn't meeting the latency throughput or capacity they needed to meet that goal. Now Zed powers a hyper-responsive Edit Prediction experience and we're thrilled to play a part in that process" [X Link](https://x.com/basetenco/status/1948827060266213438) 2025-07-25T19:25Z [----] followers, [---] engagements "Many thanks to@darrylktaft and @thenewstack for highlighting our work together and everything Zed does to power the world's fastest AI code editor (our engineers have always been huge Zed fans). Read the full article on The New Stack here: https://thenewstack.io/how-rust-based-zed-built-worlds-fastest-ai-code-editor/ https://thenewstack.io/how-rust-based-zed-built-worlds-fastest-ai-code-editor/" [X Link](https://x.com/basetenco/status/1948827062300463602) 2025-07-25T19:25Z [----] followers, [---] engagements "Or check out our case study on how we optimized @zeddotdev's model performance for launch day: https://www.baseten.co/resources/customers/zed-industries-serves-2x-faster-code-completions-with-baseten/ https://www.baseten.co/resources/customers/zed-industries-serves-2x-faster-code-completions-with-baseten/" [X Link](https://x.com/basetenco/status/1948827064087576732) 2025-07-25T19:25Z [----] followers, [---] engagements "Building reliable agents requires a different tech stack: one that natively supports compound AI systems and evaluates quality along the full trajectory of agent behavior. We teamed up with @PatronusAI to break down what this stack looks like from infra and models to debuggers" [X Link](https://x.com/basetenco/status/1950265879369044159) 2025-07-29T18:43Z [----] followers, [---] engagements "@drishanarora It's a great day in open source AI Congrats on the launch. We're excited to be a launch partner with dedicated deployments of Cogito v2 on B200: https://www.baseten.co/library/cogito-v2-671b/ https://www.baseten.co/library/cogito-v2-671b/" [X Link](https://x.com/basetenco/status/1950972028091269593) 2025-07-31T17:29Z [----] followers, [----] engagements "Deep Cogito just dropped [--] new open LLMs -- each one is SOTA for its size. We're excited to be a launch partner for their Cogito v2 models which use a novel IDA mechanism to improve intuition and shorten reasoning chains. Today we are releasing [--] hybrid reasoning models of sizes 70B 109B MoE 405B 671B MoE under open license. These are some of the strongest LLMs in the world and serve as a proof of concept for a novel AI paradigm - iterative self-improvement (AI systems improving themselves). https://t.co/ZfmQIOgysv Today we are releasing [--] hybrid reasoning models of sizes 70B 109B MoE 405B" [X Link](https://x.com/basetenco/status/1950972729332769100) 2025-07-31T17:32Z [----] followers, [---] engagements "Cogito v2 is available as a dedicated deployment on B200 GPUs: https://www.baseten.co/library/cogito-v2-671b/ https://www.baseten.co/library/cogito-v2-671b/" [X Link](https://x.com/basetenco/status/1950972730825908570) 2025-07-31T17:32Z [----] followers, [---] engagements "To illustrate the impact of using BEI on B200s we ran benchmarks on the largest of the Qwen3 Embedding series. On another low query-throughput test (5 tokens/request) BEI handles 8.4x higher queries per second than vLLM and 1.6x higher than TEI" [X Link](https://x.com/basetenco/status/1952476513280094571) 2025-08-04T21:07Z [----] followers, [---] engagements "Read the full blog here: https://www.baseten.co/blog/run-qwen3-embedding-on-nvidia-blackwell-gpus/ https://www.baseten.co/blog/run-qwen3-embedding-on-nvidia-blackwell-gpus/" [X Link](https://x.com/basetenco/status/1952476516740464722) 2025-08-04T21:07Z [----] followers, [---] engagements ""The reality is that for each customer its a different story. Migrating to a new model isnt a small effort there are a lot of variables at playnone are trivial." Our CEO Tuhin spoke with Belle Lin at the WSJ about how gpt-oss stacks up and what businesses considering switching need to consider" [X Link](https://x.com/basetenco/status/1953528089142866326) 2025-08-07T18:46Z [----] followers, [----] engagements "@alphatozeta8148 man we are trying" [X Link](https://x.com/basetenco/status/1953583214469296181) 2025-08-07T22:25Z [----] followers, [--] engagements "@NVIDIAAIDev @OpenAI thank you" [X Link](https://x.com/basetenco/status/1953593451540623461) 2025-08-07T23:05Z [----] followers, [--] engagements "If you're at Ai4 in Vegas next week don't miss Philip Kiely's talk "Inference in the Wild: Lessons from Scaling Real-time AI in Production" on Aug. [--] at 3:25pm. It takes place in Room [---] on the AI Infrastructure & Scalability track" [X Link](https://x.com/basetenco/status/1953919401524441228) 2025-08-08T20:41Z [----] followers, [---] engagements "Day [--] of Ai4 Vegas is here Did you catch our lead DevRel Philip Kielys talk "Inference in the Wild: Lessons from Scaling Real-time AI in Production" yesterday" [X Link](https://x.com/basetenco/status/1955658273929171184) 2025-08-13T15:50Z [----] followers, [---] engagements "Swing by Booth [---] to meet @philip_kiely & the team at #AI4 today. See a live demo and snag your Artificially Intelligent shirt" [X Link](https://x.com/basetenco/status/1955658448961708363) 2025-08-13T15:51Z [----] followers, [---] engagements "When AI education needs to feel human latency matters. Praktika hit [---] ms transcription with 50% cost savings using Basetentransforming their language learning experience for millions of learners across [--] languages" [X Link](https://x.com/basetenco/status/1956044335453233544) 2025-08-14T17:24Z [----] followers, [---] engagements "See how @praktika_ai did it using Basetens Inference Stack: https://www.baseten.co/resources/customers/praktika/ https://www.baseten.co/resources/customers/praktika/" [X Link](https://x.com/basetenco/status/1956044337806221692) 2025-08-14T17:24Z [----] followers, [---] engagements "When it comes to open-source text-to-speech Orpheus should be your go-to model. We're seeing so much demand around real-time voice streaming our engineer Alex Ker wrote a blog on how to start streaming audio in real time with Orpheus in [--] minutes" [X Link](https://x.com/basetenco/status/1957569875855224907) 2025-08-18T22:26Z [----] followers, [----] engagements "Want to fine-tune gpt-oss-120b We teamed up with Axolotl to launch a new recipe to run fine-tuning out of the box multi-node training one-line deployments from the CLI and built-in observability included" [X Link](https://x.com/basetenco/status/1957877915737362437) 2025-08-19T18:50Z [----] followers, [----] engagements "Our first ever Inference Invitational is tomorrow 8/21 Join us for friendly pickleball & padel tournaments great conversation delicious food and drinks and your own custom pickleball paddle. Don't know how to play We'll have an instructor there to teach you" [X Link](https://x.com/basetenco/status/1958300076805435787) 2025-08-20T22:48Z [----] followers, [---] engagements "DeepSeek V3 + R1 (now V3.1 and R1 0528) proved that open-source models can rival closed ones at lower cost and with greater control. But running them in production That became the new hurdle. Our new guide breaks down: Why DeepSeek is powerful and hard to serve The infra + runtime hurdles youll hit Tools + techniques for performant reliable deployments" [X Link](https://x.com/basetenco/status/1960537279367160214) 2025-08-27T02:58Z [----] followers, [----] engagements "If youre building with DeepSeek this is your roadmap to performant reliable cost-efficient inference. Read our full guide here: https://www.baseten.co/resources/guide/the-complete-deepseek-model-guide/ https://www.baseten.co/resources/guide/the-complete-deepseek-model-guide/" [X Link](https://x.com/basetenco/status/1960537282055721093) 2025-08-27T02:58Z [----] followers, [---] engagements "Our team met Parsed a few months ago and we could not be more excited to see the inflection point they are a part of - customized models built for those high impact jobs. This is an incredible team and we're thrilled to power their inference. Congrats @parsedlabs. Let's build. Today were launching Parsed. We are incredibly lucky to live in a world where we stand on the shoulders of giants first in science and now in AI. Our heroes have gotten us to this point where we have brilliant general intelligence in our pocket. But this is a local minima. We https://t.co/R7cR3EGVHT Today were launching" [X Link](https://x.com/basetenco/status/1961113518348145128) 2025-08-28T17:07Z [----] followers, [----] engagements "We're excited to announce Fall into Inference: a multi-month deep dive into our cloud ecosystem and how we use Multi-cloud Capacity Management (MCM) to power fast reliable inference at scale. Over the next few months well showcase how we use MCM to power real-world AI use cases with partners including Google Cloud Amazon Web Services (AWS) OCI CoreWeave Nebius Vultr and NVIDIA. Stay tuned for weekly technical blogs case studies and deep dives with our partners" [X Link](https://x.com/basetenco/status/1963290554286154187) 2025-09-03T17:18Z [----] followers, [----] engagements "We raised a $150M Series D Thank you to all of our customers who trust us to power their inference. We're grateful to work with incredible companies like @Get_Writer @zeddotdev @clay_gtm @trymirage @AbridgeHQ @EvidenceOpen @MeetGamma @Sourcegraph and @usebland. This round was led by @bondcap with @jaysimons joining our Board. We're also thrilled to welcome @conviction and @CapitalG to the round alongside support from @01Advisors @IVP @sparkcapital @GreylockVC @ScribbleVC @BoxGroup and Premji Invest. Today were excited to announce our $150M Series D led by BOND with Jay Simons joining our" [X Link](https://x.com/basetenco/status/1963981711647379653) 2025-09-05T15:05Z [----] followers, 19.5K engagements "AI everywhere = Inference everywhere = Baseten everywhere IN NEWS: @basetenco raises a $150M series D round. @tuhinone (Founder & CEO Baseten) on the future of inference: I think the token price goes down and inference should get cheaper over time. And that really just means there is going to be more inference. Every time we https://t.co/oKplA7BIOY IN NEWS: @basetenco raises a $150M series D round. @tuhinone (Founder & CEO Baseten) on the future of inference: I think the token price goes down and inference should get cheaper over time. And that really just means there is going to be more" [X Link](https://x.com/basetenco/status/1964098651107532909) 2025-09-05T22:49Z [----] followers, [----] engagements "We just raised a $150M Series D and were growing If you're looking for your next opportunity take a look at our 30+ open roles across engineering and GTM" [X Link](https://x.com/basetenco/status/1965160538188775741) 2025-09-08T21:09Z [----] followers, [---] engagements "@Alibaba_Qwen (Gated) Attention is all you need. Excited to offer both Qwen3-Next models on dedicated deployments backed by 4xH100 GPUs. https://app.baseten.co/deploy/qwen_3_next_80B_A3_thinking https://www.baseten.co/library/qwen3-next-80b-a3b-instruct/ https://app.baseten.co/deploy/qwen_3_next_80B_A3_thinking https://www.baseten.co/library/qwen3-next-80b-a3b-instruct/" [X Link](https://x.com/basetenco/status/1966224960223158768) 2025-09-11T19:38Z [----] followers, [----] engagements "Qwen3 Next 80B A3B Thinking outperforms higher-cost and closed models like Gemini [---] Flash Thinking on benchmarks nearing Qwen's flagship model quality at a fraction the size. We have it ready to deploy in our model library running on @nvidia and the Baseten Inference Stack" [X Link](https://x.com/basetenco/status/1967688601640288288) 2025-09-15T20:34Z [----] followers, [----] engagements "The key is having good intuition being willing to go out on a limb building fast learning fast and killing things when you need to. Following our Series D raise our Co-founder and CTO @amiruci walks through why he bet early on inference how were scaling through generative model hypergrowth and his advice for fellow founders" [X Link](https://x.com/basetenco/status/1968009140896497950) 2025-09-16T17:48Z [----] followers, [----] engagements "Well be at SigSum SF this Thursday Sept [--] Catch: - @philip_kiely's talk "Inference Engineering for Hypergrowth" (1 PM) - @tuhinone on the panel "Breaking Building and Betting on AI" (3:30 PM) Visit us in the partner showcase to grab an "Artificially Intelligent" tee" [X Link](https://x.com/basetenco/status/1970255818563207200) 2025-09-22T22:36Z [----] followers, [---] engagements "@rohanpaul_ai someone needs to run the inference and make it fast. we can help with that" [X Link](https://x.com/basetenco/status/1970621011075936433) 2025-09-23T22:47Z [----] followers, [----] engagements "Were hosting our friends at @OpenRouterAI for a SF Tech Week breakfast talk Join us at Baseten HQ on October [--] at 10AM for Learnings from processing [--] Trillion Tokens" [X Link](https://x.com/basetenco/status/1972792659912830988) 2025-09-29T22:36Z [----] followers, [---] engagements "The team at OpenRouter will dive into: Closed vs. open model adoption Global usage trends from running inference at massive scale Tool calling & pricing shifts Seats are limited. Save yours here: https://partiful.com/e/q6l1SeDtPGU9kCQPArk6 https://partiful.com/e/q6l1SeDtPGU9kCQPArk6" [X Link](https://x.com/basetenco/status/1972792662295240964) 2025-09-29T22:36Z [----] followers, [---] engagements "From document processing and image recognition to drug discovery healthcare use cases are at the forefront of AI adoption. We partner with teams like Vultr to support these applications with fast reliable inference. With Multi-cloud Capacity Management and theBaseten Inference Stack we power near-limitless scale for healthcare AI teams on NVIDIA Blackwell GPUs" [X Link](https://x.com/basetenco/status/1973485892607062479) 2025-10-01T20:31Z [----] followers, [---] engagements "Embeddings power search RecSys and agents but making them performant in production requires satisfying two different traffic profiles. In our new guide we cover how to build embedding workflows that are both extremely high-throughput and low-latency from indexing millions of data points to serving individual search queries in milliseconds" [X Link](https://x.com/basetenco/status/1973840655001792785) 2025-10-02T20:00Z [----] followers, [---] engagements "Read it here: https://www.baseten.co/resources/guide/high-performance-embedding-model-inference https://www.baseten.co/resources/guide/high-performance-embedding-model-inference" [X Link](https://x.com/basetenco/status/1973840656541130821) 2025-10-02T20:00Z [----] followers, [---] engagements "Being fast for one customer isn't enough. Low-latency inference at scale requires the ability to recruit every GPU in the world" [X Link](https://x.com/basetenco/status/1975635330201264518) 2025-10-07T18:52Z [----] followers, [---] engagements "Fast models for our fast friends at Factory Deploy and serve custom models with enterprise-grade infrastructure on @basetenco. Special promo for Factory users: receive $500 Model API credits when you fill out this form. https://t.co/UI8NqfACDY Deploy and serve custom models with enterprise-grade infrastructure on @basetenco. Special promo for Factory users: receive $500 Model API credits when you fill out this form. https://t.co/UI8NqfACDY" [X Link](https://x.com/basetenco/status/1975692789838192955) 2025-10-07T22:40Z [----] followers, [----] engagements "Register here: https://events.redis.io/redis-released-london-2025 https://events.redis.io/redis-released-london-2025" [X Link](https://x.com/basetenco/status/1975953278832976009) 2025-10-08T15:55Z [----] followers, [---] engagements "Just in time for the new year Awesome job by our model performance team to hit the top of @ArtificialAnlys for GLM [---] try it here https://www.baseten.co/library/glm-4-7/ happy holidays we just dropped the fastest GLM 4.7: 400+ TPS as benchmarked by @ArtificialAnlys https://t.co/eRv47ok1sV https://www.baseten.co/library/glm-4-7/ happy holidays we just dropped the fastest GLM 4.7: 400+ TPS as benchmarked by @ArtificialAnlys https://t.co/eRv47ok1sV" [X Link](https://x.com/basetenco/status/2005373945143590985) 2025-12-28T20:23Z [----] followers, [----] engagements "We boosted acceptance rate by up to 40% with the Baseten Speculation Engine. How By combining Multi-Token Prediction (MTP) with Suffix Automaton (SA) decoding. This hybrid approach crushes production coding workloads delivering 30%+ longer acceptance lengths on code editing tasks with zero added overhead. An open source version for TensorRT-LLM is now available to the community. Read the full engineering deep dive: https://www.baseten.co/blog/boosting-mtp-acceptance-rates-in-baseten-speculation-engine/ https://www.baseten.co/blog/boosting-mtp-acceptance-rates-in-baseten-speculation-engine/" [X Link](https://x.com/basetenco/status/2016235945662808433) 2026-01-27T19:44Z [----] followers, 13.5K engagements "Were excited to launch Metas Llama [--] in our model library in both 8B and 70B π The newly introduced Llama [--] is a significant improvement over Llama [--] with increased tokens and reduced false refusal rates. These models deliver unparalleled performance showcasing significant advancements in efficiency and speed. Our Llama [--] 8B runs on A100s and Llama [--] 70B runs on H100s optimized for production. https://twitter.com/i/web/status/1781072277850714184 https://twitter.com/i/web/status/1781072277850714184" [X Link](https://x.com/basetenco/status/1781072277850714184) 2024-04-18T21:28Z [----] followers, 131.7K engagements "Meet the Baseten team at the @aiDotEngineer Summit in NYC this week π Booth G3 get a demo and grab some swag" [X Link](https://x.com/basetenco/status/1891991000244982045) 2025-02-18T23:19Z [----] followers, [---] engagements "@IVP @sparkcapital @GreylockVC @conviction @basecasevc @southpkcommons @Lachy @01Advisors https://www.baseten.co/blog/announcing-baseten-75m-series-c/ https://www.baseten.co/blog/announcing-baseten-75m-series-c/" [X Link](https://x.com/basetenco/status/1892259288673865781) 2025-02-19T17:05Z [----] followers, [----] engagements "Friendly reminder from @willreed_21 (Spark Capital): Your team's time is best spent on your product not the infrastructure it runs on" [X Link](https://x.com/basetenco/status/1937939408919126239) 2025-06-25T18:22Z [----] followers, [---] engagements "If you're in London catch Rachel Rapp with our friends from Tavily and cognee at Redis Released. From building and deploying the fastest agentic systems to industry trends they'll break down what the agentic tech stack looks like in a live panel this Thursday" [X Link](https://x.com/basetenco/status/1975953274454130750) 2025-10-08T15:55Z [----] followers, [---] engagements "We caught up with the one and only @thdxr on Opencode's newly launched Zen and his hot takes Zen isnt a for-profit thing. This is something we try to do at breakeven. As we grow we pool all of our resources together and negotiate discounted rates with providers. These cost savings flow back right down to everyone" [X Link](https://x.com/basetenco/status/1976732163619070461) 2025-10-10T19:30Z [----] followers, [----] engagements "From sketch to a 3D model in under [--] seconds with a 1B parameter model We built a flower card generator using Autodesks WaLa open-source AI and Baseten for scalable GPU deployment" [X Link](https://x.com/basetenco/status/1977851561842717041) 2025-10-13T21:38Z [----] followers, [---] engagements "Fast Company named Baseten one of the [--] Next Big Things in Tech [----] Were proud to be recognized for powering the fastest and most reliable inference for the fastest-growing AI companies like Abridge Clay OpenEvidence and many more" [X Link](https://x.com/basetenco/status/1978175416750772634) 2025-10-14T19:05Z [----] followers, [----] engagements "Powering inference for the fastest growing AI companies like OpenEvidence Writer and Clay means being the first to use bleeding-edge model performance tooling in production. That's why we were early adopters of NVIDIA Dynamo giving us 50% lower latency and 60%+ higher throughput with KV cache-aware routing. These results are the tip of the iceberg especially for our customers running large models with large context windows under heavy load" [X Link](https://x.com/basetenco/status/1978883986924634551) 2025-10-16T18:01Z [----] followers, [----] engagements "See the benchmarks in our blog by @aqaderb @feilsystem and @rapprach: https://www.baseten.co/blog/how-baseten-achieved-2x-faster-inference-with-nvidia-dynamo/#how-baseten-uses-nvidia-dynamo https://www.baseten.co/blog/how-baseten-achieved-2x-faster-inference-with-nvidia-dynamo/#how-baseten-uses-nvidia-dynamo" [X Link](https://x.com/basetenco/status/1978883989269307812) 2025-10-16T18:01Z [----] followers, [---] engagements "We unleashed our model performance team on GLM [---] and were very excited to be the fastest provider available today on Artificial analysis at [---] TPS (2x the next best TPS) and a less than [--] second TTFT (2x the next best TTFT). https://artificialanalysis.ai/models/glm-4-6-reasoning/providers https://artificialanalysis.ai/models/glm-4-6-reasoning/providers" [X Link](https://x.com/basetenco/status/1979299403828806053) 2025-10-17T21:32Z [----] followers, 11.2K engagements "We see the massive AWS outage. Baseten web app is down but inference new deploys training jobs and the model management APIs are unaffected" [X Link](https://x.com/basetenco/status/1980191414031138868) 2025-10-20T08:36Z [----] followers, [----] engagements "@DannieHerz @jeffbarg @ClayRunHQ The clay slackmoji in the b10 slack has been getting a lot of play recently" [X Link](https://x.com/basetenco/status/1981484896720978350) 2025-10-23T22:16Z [----] followers, [--] engagements "DeepSeek-OCR stunned the internet this week with 10x more efficient compression unlocking faster and cheaper intelligence. We rolled out performant inference support on day one of the model drop. Learn why compressions are so effective at making models smarter what applications you can build with DeepSeek-OCR and how to serve it on Baseten in under [--] minutes. Link in the replies" [X Link](https://x.com/basetenco/status/1981513042010489305) 2025-10-24T00:08Z [----] followers, [----] engagements "This week Baseten's model performance team unlocked the fastest TPS and TTFT for gpt-oss 120b on @nvidia hardware. When gpt-oss launched we sprinted to offer it at [---] TPS. now we've exceeded [---] TPS and [----] sec TTFT. and we'll keep working to keep raising the bar. We are proud to offer the best E2E latency available with near-limitless scale incredible performance and the highest uptime 99.99%" [X Link](https://x.com/basetenco/status/1981757270053494806) 2025-10-24T16:18Z [----] followers, 42.4K engagements "We are so excited to be a launch partner for @nvidia Nemotron Nano [--] VL today and offer day-zero support for this highly accurate and efficient vision language model alongside other models in the Nemotron family. To learn more read our blog here https://www.baseten.co/blog/high-performance-agents-for-financial-services-with-nvidia-nemotron-on-baseten/ https://www.baseten.co/blog/high-performance-agents-for-financial-services-with-nvidia-nemotron-on-baseten/" [X Link](https://x.com/basetenco/status/1983243273171845596) 2025-10-28T18:43Z [----] followers, [----] engagements "After months of feedback from our early customers and thousands of jobs completed Baseten Training is officially ready for everyone. π Access compute on demand train any model run multi-node jobs and deploy from checkpoints with cache-aware scheduling an ML Cookbook tool calling recipes and more" [X Link](https://x.com/basetenco/status/1983958807353934180) 2025-10-30T18:06Z [----] followers, 13.7K engagements "@james_weitzman @athleticKoder β" [X Link](https://x.com/basetenco/status/1985491806004666611) 2025-11-03T23:38Z [----] followers, [--] engagements "Fun fact - we asked people to describe their favorite agent in SF. We got suggestions for a bunch of new agentic apps to try. Our favorite agent Probably James Bond. If youre living in the new world of agentic AI check out our new deep dive on tool calling in inference. Check out the blog in the comments" [X Link](https://x.com/basetenco/status/1986211245268111827) 2025-11-05T23:17Z [----] followers, [---] engagements "Blog: https://www.baseten.co/blog/tool-calling-in-inference/utm_source=twitter&utm_medium=social&utm_campaign=education_tool_calling_blog_2025-11-05 https://www.baseten.co/blog/tool-calling-in-inference/utm_source=twitter&utm_medium=social&utm_campaign=education_tool_calling_blog_2025-11-05" [X Link](https://x.com/basetenco/status/1986211301782397437) 2025-11-05T23:17Z [----] followers, [---] engagements "Congratulations to @Kimi_Moonshot on their newest model drop Kimi-K2 Thinking one of the worlds most advanced open source models. Baseten is proud to offer Day [--] Support. Sign up with your business email address and get $100 in Model API credits" [X Link](https://x.com/basetenco/status/1986821080800190753) 2025-11-07T15:40Z [----] followers, 270.4K engagements "Heading to KubeCon next week Come visit the team at Booth #631 to test your AI knowledge. Top of the leaderboard gets prizes π" [X Link](https://x.com/basetenco/status/1986915753434718571) 2025-11-07T21:56Z [----] followers, [---] engagements "Excited to share this piece from @VentureBeat spotlighting how Baseten is redefining the AI infrastructure game: Baseten takes on hyperscalers with new AI training platform that lets you own your model weights. Thanks VentureBeat Read full article https://venturebeat.com/ai/baseten-takes-on-hyperscalers-with-new-ai-training-platform-that-lets-you https://venturebeat.com/ai/baseten-takes-on-hyperscalers-with-new-ai-training-platform-that-lets-you" [X Link](https://x.com/basetenco/status/1987943307532476746) 2025-11-10T17:59Z [----] followers, [----] engagements "At @KubeCon_ Swing by Booth #631 to test your inference knowledge and earn some swag" [X Link](https://x.com/basetenco/status/1988322193479192921) 2025-11-11T19:05Z [----] followers, [---] engagements "Congrats to the World Labs team on the launch today Marble lets you create 3D worlds from just a single image text prompt video or 3D layout. We couldn't be more excited to power the inference behind this. Can't wait to see what everyone makes. π₯ Introducing Marble by World Labs: a foundation for a spatially intelligent future. Create your world at https://t.co/V267VJu1H9 https://t.co/T00mtETmCA Introducing Marble by World Labs: a foundation for a spatially intelligent future. Create your world at https://t.co/V267VJu1H9 https://t.co/T00mtETmCA" [X Link](https://x.com/basetenco/status/1988662949083566349) 2025-11-12T17:39Z [----] followers, [----] engagements "Welcome to the new age Defense Against the Dark Arts. It's called fast inference (& Harry Potter would be jealous). Check out our deep dive on how the Baseten wizards (model performance team) optimized Kimi K2 Thinking (now faster and just as smart as GPT-5). https://www.baseten.co/blog/kimi-k2-thinking-at-140-tps-on-nvidia-blackwell/utm_source=twitter&utm_medium=social&utm_campaign=awareness_kimi-k2-thinking_performance_blog_2025-11-12" [X Link](https://x.com/basetenco/status/1988710905706680760) 2025-11-12T20:50Z [----] followers, [----] engagements "Baseten used @nvidia Dynamo to double inference speed for long-context code generation and increased throughput by 1.6x. Dynamo simplifies multi-node inference on Kubernetes helping us scale deployments while reducing costs. Read the full blog post belowπ β NVIDIA Dynamo is now available across major cloud providersincluding @awscloud @googlecloud @Azure and @OracleCloudto enable efficient multi-node inference on Kubernetes in the cloud. And Its already delivering results: @basetenco is seeing faster more cost-effective https://t.co/6efirNmK3r β NVIDIA Dynamo is now available across major" [X Link](https://x.com/basetenco/status/1989058852789317717) 2025-11-13T19:52Z [----] followers, [----] engagements "Working with the @GammaApp team never quite feels like work and thats how their product feels. "Criminally fun." We are honored to be long-term partners and power Gammas inference needs as they push the envelope on how we present ideas. Congratulations on the Series B" [X Link](https://x.com/basetenco/status/1989091556201218127) 2025-11-13T22:02Z [----] followers, 22.8K engagements "Shoutout to the incredible team at @oxen_ai Turning datasets deployed models like its light work. They build fast. We help them ship even faster. Thanks for the partnership @gregschoeninger Check out the story in the comments #AI #MLOps #Baseten" [X Link](https://x.com/basetenco/status/1990894920106680832) 2025-11-18T21:28Z [----] followers, [----] engagements "@drishanarora Congratulations on the launch Excited to support with dedicated deployments of Cogito V2.1. https://www.baseten.co/library/cogito-v2-1-671b/ https://www.baseten.co/library/cogito-v2-1-671b/" [X Link](https://x.com/basetenco/status/1991208966362140871) 2025-11-19T18:16Z [----] followers, [----] engagements "Congrats to our friends at Deep Cogito on launching the most powerful US-based OSS model. It turns out LLM self play produces shorter reasoning chains (low token consumption) while maintaining great performance Try it out on Baseten today: https://www.baseten.co/library/cogito-v2-1-671b/ Today we are releasing the best open-weight LLM by a US company: Cogito v2.1 671B. On most industry benchmarks and our internal evals the model performs competitively with frontier closed and open models while being ahead of any US open model (such as the best versions of https://t.co/F6eZnn8s2Q" [X Link](https://x.com/basetenco/status/1991249966958841950) 2025-11-19T20:59Z [----] followers, [----] engagements "If you need an adrenaline rush to wake up from your post-Thanksgiving stupor we got you. @deepseek_ai V3.2 dropped this week and is now available on Baseten. Its so smart your mother will ask why you can't be more like DeepSeek. V3.2 is currently on par with GPT-5 all whilst being multiples cheaper. V3.2 is now live on our Model APIs and on @OpenRouterAI and @ArtificialAnlys. Baseten is the fastest provider with [----] TTFT and [---] tps (thats 1.5x faster than the next guy). For a model this size its screaming. Get the brains without trading off performance" [X Link](https://x.com/basetenco/status/1996623218040254793) 2025-12-04T16:50Z [----] followers, [----] engagements "We're excited to partner with @getstream_io to help developers build fast production-ready Vision Agents. Together we combined Baseten-hosted Qwen3-VL with Streams real-time voice and video to create an Electronics Setup & Repair Assistant that can see understand and guide users in real time. Check out the full walkthrough and demo below Vision Agents just got better: @basetenco joins us to take multimodal capabilities even further. Our team worked together to create a guide on running models hosted on Baseten with Vision Agents. Check out our blog post where we use Baseten + Qwen 3-VL" [X Link](https://x.com/basetenco/status/1997024685192610238) 2025-12-05T19:26Z [----] followers, [----] engagements ""We want people to own their own intelligence and we now see a really straight shot to get there." @amiruci sits down with @mudithj_ and @charles0neill from the Parsed team. Check out the full fireside chat in the comments" [X Link](https://x.com/basetenco/status/1999240802992562624) 2025-12-11T22:12Z [----] followers, [----] engagements "Inference performance isnt just about the model. It relies on the entire inference stack. In our Inference Stack white paper we explain how Baseten uses @nvidia TensorRT LLM and Dynamo to reduce latency and increase throughput across model modalities. If you care about speed this is worth reading. https://www.baseten.co/resources/guide/the-baseten-inference-stack https://www.baseten.co/resources/guide/the-baseten-inference-stack" [X Link](https://x.com/basetenco/status/2009721846795546952) 2026-01-09T20:20Z [----] followers, [----] engagements "π We're thrilled to introduce the fastest most accurate and cost-efficient Whisper-powered transcription and diarization on the market: [----] RTFwith Whisper Large V3 Turbo Streaming transcriptionwith consistent low latency The most accurate real-time diarization 90% lower costdue to infra optimizations Used in production by companies like @NotionHQ π https://twitter.com/i/web/status/2012203547912245366 https://twitter.com/i/web/status/2012203547912245366" [X Link](https://x.com/basetenco/status/2012203547912245366) 2026-01-16T16:41Z [----] followers, [----] engagements "Want to learn about how to run high performance LLM inference at scaleOur Head of DevRel @philipkiely has the perfect talk for you during NVIDIA Dynamo Day on Jan [--]. Register here: https://nvevents.nvidia.com/dynamodayi=RNQf_gN5cXcdmLzfj_IevFS-tdC553CY https://nvevents.nvidia.com/dynamodayi=RNQf_gN5cXcdmLzfj_IevFS-tdC553CY" [X Link](https://x.com/basetenco/status/2013694085681553469) 2026-01-20T19:24Z [----] followers, [----] engagements "Were thrilled to be working with @LangChain to power the fastest way to generate production-ready agents without code. LangChains Agent Builder represents a way for non-technical knowledge workers and citizen developers to build useful things with AI. All with Baseten Inference backbone and GLM [---]. Weve written a tutorial for you to create your own in minutes" [X Link](https://x.com/basetenco/status/2014025036806627794) 2026-01-21T17:19Z [----] followers, [----] engagements "Tired of waiting for video generation Say less. We've optimized the Wan [---] runtime to hit: 3x faster inference on NVIDIA Blackwell 2.5x faster on Hopper 67% cost reduction. Read the full breakdown of our kernel optimizations and benchmarks here: https://www.baseten.co/blog/wan-2-2-video-generation-in-less-than-60-seconds/#benchmarking-methodology https://www.baseten.co/blog/wan-2-2-video-generation-in-less-than-60-seconds/#benchmarking-methodology" [X Link](https://x.com/basetenco/status/2014337303330926736) 2026-01-22T14:00Z [----] followers, [----] engagements "@lucas_dehaas so if people are wondering why there are baseten stickers tagged across sf they know you're the one to blame" [X Link](https://x.com/basetenco/status/2014776605860855930) 2026-01-23T19:05Z [----] followers, [---] engagements "@adambain grateful for your support and we're still so early π₯" [X Link](https://x.com/basetenco/status/2014796459909251154) 2026-01-23T20:24Z [----] followers, [---] engagements "LIVE Tune in to hear @tuhinone discuss our series E open source and the multi-model future on CNBC A Chinese AI model is having a real coding moment. and not just in China. Zhipu says its coding agent users are concentrated in the *US and China. @tuhinone CEO of @basetenco joins me on the back of his latest fundraise to discuss whats hype and whats real A Chinese AI model is having a real coding moment. and not just in China. Zhipu says its coding agent users are concentrated in the *US and China. @tuhinone CEO of @basetenco joins me on the back of his latest fundraise to discuss whats hype" [X Link](https://x.com/basetenco/status/2015868931928686780) 2026-01-26T19:26Z [----] followers, [----] engagements ""the best application layer companies set up the harness and how to use it for the problem that your user is trying to solve"" [X Link](https://x.com/basetenco/status/2014855297240797277) 2026-01-24T00:18Z [----] followers, [----] engagements "Thank you @BloombergTV for having our CEO and co-founder @tuhinone and day [--] investor @saranormous yesterday to discuss our latest fundraise the bet we're making with inference and how we're powering customers. Full interview here: https://www.bloomberg.com/news/videos/2026-01-26/ai-startup-baseten-raises-300-million-video https://www.bloomberg.com/news/videos/2026-01-26/ai-startup-baseten-raises-300-million-video" [X Link](https://x.com/basetenco/status/2016260125481435532) 2026-01-27T21:20Z [----] followers, [----] engagements "Nemotron [--] Nano NVFP4 is now available on Baseten + NVIDIA B200 BF16-level accuracy up to [--] higher throughput vs FP8 and faster inference powered by QAD + Blackwell running on the Baseten Inference Stack. https://www.baseten.co/library/nvidia-nemotron-3-nano/ https://www.baseten.co/library/nvidia-nemotron-3-nano/" [X Link](https://x.com/basetenco/status/2016569749635994028) 2026-01-28T17:51Z [----] followers, [----] engagements "Our CEO and co-founder @tuhinone sat down with Axios to discuss how we're using our latest funding to build an inference-native cloud that owns the full inference-data-eval-RL loop and why our recent acquisition of Parsed is just the beginning as we continue to pursue aligned talent and capabilities. Full interview here: https://www.axios.com/pro/enterprise-software-deals/2026/01/29/baseten-acquisitions-ai-inference https://www.axios.com/pro/enterprise-software-deals/2026/01/29/baseten-acquisitions-ai-inference" [X Link](https://x.com/basetenco/status/2018386781927075897) 2026-02-02T18:11Z [----] followers, [---] engagements "Who wants to take the 30b parameter Alpaca model for a ride Announcement coming tomorrow" [X Link](https://x.com/basetenco/status/1637633905527492610) 2023-03-20T01:55Z [----] followers, [----] engagements "π Were really excited to be announcing BaseTen today. BaseTen is the fastest way to build applications powered by machine learning. Check it out yourself https://www.baseten.co/blog https://www.baseten.co/blog" [X Link](https://x.com/anyuser/status/1395409150297874437) 2021-05-20T16:00Z [----] followers, [---] engagements "We're excited to announce that we've raised a $40M Series B to help power the next generation of AI-native products with performant reliable and scalable inference infrastructure. https://www.baseten.co/blog/announcing-our-series-b/ https://www.baseten.co/blog/announcing-our-series-b/" [X Link](https://x.com/anyuser/status/1764682602198216931) 2024-03-04T16:01Z [----] followers, 82.7K engagements "We're excited to introduce our new Engine Builder for TensorRT-LLM π Same great @nvidia TensorRT-LLM performance90% less effort. Check out our launch post to learn more: Or @philip_kiely's full video: We often use TensorRT-LLM to support custom models for teams like @Get_Writer. For their latest industry-leading Palmyra LLMs TensorRT-LLM inference engines deployed on Baseten achieved 60% higher tokens per second. We've used TensorRT-LLM to achieve results including: π 3x better throughput π 40% lowertime to first token π 35% lowercost per million tokens While TensorRT-LLM is incredibly" [X Link](https://x.com/anyuser/status/1819048091451859238) 2024-08-01T16:30Z [----] followers, [----] engagements "Another week another model drop Voxtral was released last week and you can now deploy it on Baseten. Transcription workloads are our bread and butter here at Baseten. Weve built a specific runtime for transcription workloads which now powers Voxtral" [X Link](https://x.com/basetenco/status/1947791177886863683) 2025-07-22T22:49Z [----] followers, [----] engagements "Only a handful of models dominated the ASR spaceuntil now. Voxtral has a 30-minute transcription range a 40-minute range for understanding plus built-in function calling for voice out of the box. @thealexker breaks down the technical details" [X Link](https://x.com/basetenco/status/1948101370894073980) 2025-07-23T19:22Z [----] followers, [----] engagements "Forget AI writing your code. AI can now control your home through voice. Weve had a blast putting Voxtral through the paces this week. Mistrals new model delivers see for yourself on Baseten. Our latest blog dives into whats so unique (and powerful) about Voxtral and how you can deploy it to build production-grade apps. https://twitter.com/i/web/status/1948519816312357326 https://twitter.com/i/web/status/1948519816312357326" [X Link](https://x.com/basetenco/status/1948519816312357326) 2025-07-24T23:05Z [----] followers, [----] engagements "We're thrilled to welcome Joey Zwicker as our new Head of Forward Deployed Engineering We've grown rapidly over the last few years and we're excited to have Joey lead the team into our next phase. We're hiring FDEs everywhere -- if you're interested reach out" [X Link](https://x.com/anyuser/status/1955005622749106426) 2025-08-11T20:37Z [----] followers, 10.3K engagements "Thanks @NVIDIAAI for inviting us to Dynamo Day We're active users of Dynamo iterating on it in production for performance gains like 50% lower TTFT and 34% lower TPOT and regularly shipping our work back to the community. Read some of our highlights from Dynamo Day and working with NVIDIA Dynamo here: https://www.baseten.co/blog/nvidia-dynamo-day-baseten-inference-stack/ https://www.baseten.co/blog/nvidia-dynamo-day-baseten-inference-stack/" [X Link](https://x.com/anyuser/status/2018740972658864598) 2026-02-03T17:38Z [----] followers, [----] engagements "The best OpenClawπ¦ setup is fully open-source. Kimi K2.5 on Baseten outperforms Opus [---] on agentic benchmarks at 8x lower cost. Faster inference same or better quality. Set up in [--] minutes here: https://www.baseten.co/blog/openclaw-kimi-k2-5-on-baseten-frontier-agent-performance-with-oss/ https://www.baseten.co/blog/openclaw-kimi-k2-5-on-baseten-frontier-agent-performance-with-oss/" [X Link](https://x.com/anyuser/status/2019138898245611617) 2026-02-04T20:00Z [----] followers, [----] engagements "Continuing this week with a case study β How did @sullyai return 30M+ clinical minutes to doctors By ditching closed-source models for a high-performance open-source stack on Baseten. Like many companies Sully faced inference challenges as they scaled with ballooning proprietary model costs and unpredictable latency. This was especially critical in Sully's case: in a live clinical setting a 70-second wait is an eternity. To solve this challenge we worked together to move to open-source models like GPT OSS 120b. With the Baseten inference stack Sully was live on NVIDIA HGX B200s just [--] days" [X Link](https://x.com/anyuser/status/2021268765141545080) 2026-02-10T17:03Z [----] followers, [----] engagements "We replicated Microsoft Research's Generative Adversarial Distillation (GAD) to distill Qwen3-4B from GPT-5.2. Standard black-box distillation teaches a student to copy teacher outputs but at inference the student generates from its own prefixes small errors compound and it drifts off the expert distribution. GAD reframes this as an on-policy distillation problem training a co-evolving discriminator that provides adaptive reward signals on the student's own generations. Exploring methods like this are how our post-training team surfaces new training patterns. Read here:" [X Link](https://x.com/basetenco/status/2022385713602609427) 2026-02-13T19:01Z [----] followers, [---] engagements "We replicated Microsoft Research's Generative Adversarial Distillation (GAD) to distill Qwen3-4B from GPT-5.2. Standard black-box distillation teaches a student to copy teacher outputs but at inference the student generates from its own prefixes small errors compound and it drifts off the expert distribution. GAD reframes this as an on-policy distillation problem training a co-evolving discriminator that provides adaptive reward signals on the student's own generations. Exploring methods like this are how our post-training team surfaces new training patterns. Read here:" [X Link](https://x.com/basetenco/status/2022392260386861210) 2026-02-13T19:27Z [----] followers, [--] engagements "Were thrilled to introduce Chains a framework for building multi-component AI workflows on Baseten βπ Chains enables users to build complex workflows as modular services in simple Python http://x.com/i/article/1805620705716801538 http://x.com/i/article/1805620705716801538" [X Link](https://x.com/basetenco/status/1806364068598432129) 2024-06-27T16:28Z [----] followers, 23.9K engagements "Welcome to Baseten @DannieHerz Were thrilled to announce that Dannie Herzberg has joined as our new President to lead Basetens GTM and operations. As @tuhinone shared: "Dannie is biased towards action dependable and long-term in her thinking and she knows that the customer experience is everything." Heres to building the next chapter of Baseten with you Dannie Read more from Tuhin about Dannie here https://www.baseten.co/blog/welcoming-dannie-herzberg-to-baseten/ https://www.baseten.co/blog/welcoming-dannie-herzberg-to-baseten/" [X Link](https://x.com/basetenco/status/1960825166264721862) 2025-08-27T22:02Z [----] followers, 97.2K engagements "LLMs are amnesiacs. Once context fills up they forget everything. To fight this means grappling with a core question: how do you update a neural network without breaking what it already knows In this piece @charles0neill and @part_harry_ argue that continual learning is inseparable from specialization. While there are various ideas to allow generalist models to learn everything without forgetting anything these ideas are fundamentally in tension with continual learning in general. What comes after monolith models A Cambrian explosion of specialists. Read more here:" [X Link](https://x.com/anyuser/status/2019831540709257325) 2026-02-06T17:52Z [----] followers, [----] engagements Limited data mode. Full metrics available with subscription: lunarcrush.com/pricing
@basetenco BasetenBaseten posts on X about inference, ai, model, compound the most. They currently have [------] followers and [---] posts still getting attention that total [-----] engagements in the last [--] hours.
Social category influence technology brands stocks finance cryptocurrencies vc firms travel destinations social networks events nhl countries
Social topic influence inference #164, ai, model, compound, gpu, in the, realtime, stack, dynamo, 6969
Top assets mentioned Cogito Finance (CGV) Microsoft Corp. (MSFT) QUALCOMM, Inc. (QCOM) MongoDB, Inc. (MDB) DeepSeek (DEEPSEEK) Uber Technologies, Inc. (UBER) 1000X (1000X)
Top posts by engagements in the last [--] hours
"Former U.S. Chief Data Scientist @DevotedHealth board member & angel investor @dpatil sits down w/us to chat: β€ Finding passion from failure π‘ Defense Digital Service origins πͺ Maximizing impact as a data scientist https://buff.ly/3KXKFKH https://buff.ly/3KXKFKH"
X Link 2022-05-10T16:20Z [----] followers, [--] engagements
"Brilliant search tech leveraging Pinecone by @MenloVentures a fave in the venture space. We love the VC + AI love"
X Link 2023-02-18T03:04Z [----] followers, [---] engagements
"LangChain + Baseten = β₯ Build with LLMs like Falcon WizardLM and Alpaca in just a few lines of code using LangChain's Baseten integration"
X Link 2023-06-29T16:18Z [----] followers, [--] engagements
"#SDXL Stable Diffusion XL [---] is here: the largest most capable open-source image generation model of its kind. You can deploy it in [--] clicks from our model library: Note the accuracy and detail in the face and hands of this kind old wizard:"
X Link 2023-07-26T21:25Z [----] followers, [--] engagements
"Send us a prompt We'll reply with an awesome AI image generated by Stable Diffusion XL 1.0"
X Link 2023-07-28T19:13Z [----] followers, [--] engagements
"Happy Monday We spent our weekend playing with the new Stable Diffusion XL ControlNet modules from @huggingface. Deploy it yourself today on Baseten π"
X Link 2023-08-21T16:22Z [----] followers, [---] engagements
"Llama [--] + @chainlit_io = open-source ChatGPT"
X Link 2023-08-23T21:16Z [----] followers, [---] engagements
"Look what you can do with @Twilio @Langchain and Baseten"
X Link 2023-08-25T19:36Z [----] followers, [---] engagements
"We love when developers combine our infra with powerful platforms like Twilio and Langchain. Big shoutout to @lizziepika and the @TwilioDevs team for this post"
X Link 2023-08-25T19:36Z [----] followers, [---] engagements
"Last week @varunshenoy_ felt the need the need for speed. He went deep on optimizing SDXL inference to squeeze every last drop of performance out of our GPUs. Heres what he did to get down to [----] seconds for SDXL and [----] seconds for Stable Diffusion [---] on an A100:"
X Link 2023-08-31T19:28Z [----] followers, [----] engagements
"There's a new text embedding model by @JinaAI_ with some exciting properties π - 8192-token context window (embed chapters not pages) - Matches OpenAI's ada-002 on popular benchmarks Use jina-embeddings-v2 for search & recommendations and pair w/ LLMs like Mistral for RAG"
X Link 2023-10-31T22:49Z [----] followers, [---] engagements
"Ready to try open source LLMs Switch from GPT to Mistral 7B in the smallest refactor you'll ever ship: just [--] tiny code changes. If you're making the jump DM us for $1000 in free credits"
X Link 2023-11-22T01:28Z [----] followers, [----] engagements
"Upgrade to @LangChainAI version 0.0.353 to use a refactored Baseten integration featuring: - Support for production and development deployments. - Removal of the baseten client dependency. - All-new usage docs featuring Mistral 7B. Just run: pip install --upgrade langchain"
X Link 2023-12-29T21:20Z [--] followers, [---] engagements
"Deploy ML models on L4 GPUs with Baseten The L4 is a 24GB VRAM card like the A10G. But theyre not interchangeable: - Use L4 for image generation models like Stable Diffusion XL - Use A10G for LLMs like Mistral 7B L4s start at $0.8484/hour (70% of A10 prices)"
X Link 2024-01-11T23:25Z [----] followers, [---] engagements
"Launching today π Double your throughput or halve your latency for @MistralAI @StabilityAI + others Do both at 20% lower cost with @nvidia H100s on Baseten. Heres how π"
X Link 2024-02-06T18:47Z [----] followers, [----] engagements
"40% lower latency and 70% higher throughput for Stable Diffusion XL Using NVIDIA TensorRT to optimize each component of the SDXL image generation pipeline we achieved these performance gains on an H100 GPU. Full results:"
X Link 2024-02-22T16:50Z [----] followers, [----] engagements
"Another first π Unlock the power of @nvidia's Multi-Instance GPU (MIG) virtualization technology with H100mig GPUs on Baseten:"
X Link 2024-03-21T16:48Z [----] followers, [----] engagements
"Stable Diffusion [--] is now available in our model library π Deploy it on an A100 optimized for production and generate highly detailed high-resolution images in seconds. https://www.baseten.co/library/stable-diffusion-3-medium/ https://www.baseten.co/library/stable-diffusion-3-medium/"
X Link 2024-06-13T17:50Z [----] followers, [---] engagements
"Have you ever wondered which model to choose for generating images Check out @rapprach's latest article on our blog to learn the pros and cons of few-step text-to-image models like LCMs SDXL Turbo and SDXL Lightning and find out which model is best for your use case. There are dozens of text-to-image models but only a handful can generate images in real time. In my latest article Im comparing three of the most popular few-step image generation models: LCMs SDXL Turbo and SDXL Lightning. π https://t.co/Y7vkAz40Xu There are dozens of text-to-image models but only a handful can generate images"
X Link 2024-06-14T16:00Z [----] followers, [---] engagements
"Stable Diffusion [--] running in under [--] minutes π€― @rapprach will show you how Try it out: https://www.baseten.co/library/stable-diffusion-3-medium/ https://www.baseten.co/library/stable-diffusion-3-medium/"
X Link 2024-06-21T15:09Z [----] followers, [---] engagements
"Want to learn more about building compound AI systems with Baseten Chains π Don't miss our live webinar with Baseten CTO and co-founder @amiruci and Software Engineer Marius Killinger on July 18th RSVP now and secure your spot π https://buff.ly/4bwEuec https://buff.ly/4bwEuec"
X Link 2024-06-28T18:48Z [----] followers, [---] engagements
"βLast week we introduced Chains a framework for building and orchestrating compound AI workflows. πInterested in how we built Chains or what makes it so powerful Learn more in our new technical deep-dive from Marius Killinger and @rapprach https://www.baseten.co/blog/baseten-chains-explained https://www.baseten.co/blog/baseten-chains-explained"
X Link 2024-07-02T15:31Z [----] followers, [---] engagements
"Just [--] days away π Save your spot: Don't miss our live webinar and Q&A on Thursday July 18th Join our CTO and Co-Founder @amiruci and Software Engineer Marius Killinger to learn how you can build scalable compound AI systems with Baseten Chains. βπ https://buff.ly/4bwEuec https://buff.ly/4bwEuec"
X Link 2024-07-12T16:05Z [----] followers, [---] engagements
"Using NVIDIA TensorRT @defpan and @philip_kiely achieved 40% lower latency and 70% higher throughput for Stable Diffusion XL (SDXL) inference. π₯ β‘ See how: Optimizing model inference for faster image generation delivers a better user experience while saving money on model hosting. π° Performance gains are greater for higher step counts and more powerful GPUs and the techniques used can be applied to similar image generation pipelines including SDXL Turbo. You can also launch SDXL and SDXL Turbo from our model library and enjoy blazing-fast inference times in just a few clicks π"
X Link 2024-07-16T14:16Z [----] followers, [---] engagements
"Breaking news: Llama [---] 70B is a really good model. (p.s. inference will get much faster -- this is a minimally optimized demo on A100s)"
X Link 2024-07-23T19:38Z [----] followers, [---] engagements
"Baseten Chains is a solution for reliable high-performance inference for workflows using multiple models and processing steps. In other words: Chains is built for compound AI systems. β Check out @rapprach's new post to learn more about compound AI: https://www.baseten.co/blog/compound-ai-systems-explained/ Compound AI systems are fueling the next generation of AI products. That's one of the (many) reasons we launched Chains a framework and SDK for compound AI. β So what are compound AI systems What's with all the hype π Here I break it down: https://t.co/XhcXYmVTwP https://t.co/8x3AdTPHwN"
X Link 2024-08-06T19:17Z [----] followers, [---] engagements
"Playground vs. Stable Diffusion XL: which do you think is better π€ @philip_kiely compared them head-to-head: https://www.baseten.co/blog/playground-v2-vs-stable-diffusion-xl-1-0-for-text-to-image-generation/ https://www.baseten.co/blog/playground-v2-vs-stable-diffusion-xl-1-0-for-text-to-image-generation/"
X Link 2024-08-09T20:30Z [----] followers, [---] engagements
"π‘ Want to make your ComfyUI workflow shareable while running it on a powerful GPU π Check out @het_trivedi05 & @philip_kiely's guide: If you're using custom nodes or model checkpoints @het_trivedi05 & @rapprach have a guide for that too π"
X Link 2024-08-10T18:30Z [----] followers, [--] engagements
"π¨ Join us at [--] am PT to learn why you need async inference in production in our live webinar and Q&A We're so excited about this feature we're giving $100 in credits to attendees$200 in credits if you bring a friend π Don't miss it: https://buff.ly/4cnEGMZ https://buff.ly/4cnEGMZ"
X Link 2024-08-15T14:01Z [----] followers, [---] engagements
"@philip_kiely got tired of waiting 8-10 sec for Stable Diffusion XL to generate images so he set out to make it faster. π Using [--] different optimizations he first made it 5x faster: Then using TensorRT he and @defpan further decreased latency by 40% π
Take a look: https://buff.ly/4fZi6NH https://buff.ly/4dGFHkP https://buff.ly/4dGFHkP https://buff.ly/4fZi6NH https://buff.ly/4dGFHkP https://buff.ly/4dGFHkP"
X Link 2024-08-24T15:11Z [----] followers, [--] engagements
"@usebland It's been awesome to be on the journey with you. Can't wait to see how what you all do next"
X Link 2024-08-28T19:51Z [----] followers, 20.6K engagements
"LLM inference on GPUs has bottlenecks at two stages: GPU compute (in FLOPS) during prefill when the input is being processed to generate the first token. GPU memory (in GB/s) for the rest of inference when the autoregressive model generates each subsequent token. To prove this: Calculate the ops:byte ratio for a given GPU. Calculate the arithmetic intensity of various stages of LLM inference. Compare the two values to see where inference is compute-bound and where it is memory-bound. Follow the math for yourself: https://www.baseten.co/blog/llm-transformer-inference-guide/"
X Link 2024-09-05T14:59Z [----] followers, [---] engagements
"8 days [--] awesome events coming up π π₯ @usebland + @basetenco AI happy hour (9/12 in SF) π€ @HackMIT [----] (9/14 at MIT) π₯ @AITinkerers Technical Founder BBQ (9/17 in NYC) π @PyTorch Conference [----] (9/18 in SF) Save your spot now if you haven't already"
X Link 2024-09-10T15:22Z [----] followers, [---] engagements
"@usebland @HackMIT @AITinkerers @PyTorch π₯ Our AI happy hour with Bland AI: π₯ Technical Founder BBQ with AI Tinkerers: And don't miss us at HackMIT and PyTorch [----] https://nyc.aitinkerers.org/p/technical-founder-bbq https://lu.ma/j03r0ag1 https://nyc.aitinkerers.org/p/technical-founder-bbq https://lu.ma/j03r0ag1"
X Link 2024-09-10T15:23Z [----] followers, [---] engagements
"@aiDotEngineer @nvidia Thanks for sharing Our team had a blast doing it"
X Link 2024-09-13T17:44Z [----] followers, [--] engagements
"Deploying ML models on NVIDIA H100 GPUs offers the lowest latency and highest bandwidth inference for demanding ML workloads. But getting maximum performance takes more than just loading in a model and running inference"
X Link 2024-09-15T14:00Z [----] followers, [---] engagements
"How are you benchmarking your image generation models π Before using a model like SDXL or FLUX.1 in production youll want to see some performance benchmarks. How fast can your model create images How many images can it create at that speed And how much will it cost π°"
X Link 2024-09-21T13:30Z [----] followers, [---] engagements
"@AIatMeta @Arm @MediaTek @Qualcomm We're excited to bring dedicated deployments of these new Llama [---] models to our customers 90B vision looks especially powerful congrats to the entire Llama team"
X Link 2024-09-25T18:32Z [----] followers, [----] engagements
"Do you know what MIG is To get the most performance possible given available hardware we can use a feature of H100 GPUs that allows us to split a single physical GPU across two model serving instances. Enter: MIG or Multi-Instance GPU"
X Link 2024-09-27T22:04Z [----] followers, [---] engagements
"MIG lets us serve models on fractional GPUs: we get two H100 MIG instances each with about 1/2 the power. These instances often meet or beat A100 GPU performanceat 20% lower cost. π Learn more about using fractional H100 GPUs for efficient model serving: https://baseten.co/blog/using-fractional-h100-gpus-for-efficient-model-serving/ https://baseten.co/blog/using-fractional-h100-gpus-for-efficient-model-serving/"
X Link 2024-09-27T22:09Z [----] followers, [---] engagements
"π¨ OpenAI just dropped a new open-source model π¨ Whisper V3 Turbo is a new Whisper model with: - 8x faster relative speed vs Whisper Large - 4x faster than Medium - 2x faster than Small - 809M parameters - Full multilingual support - Minimal degradation in accuracy"
X Link 2024-09-30T22:35Z [----] followers, [----] engagements
"It's officially #SFTechWeek and we can't wait to see everyone at our talk today β β¨ Hear from our engineers how you can combine multiple ML models in production while minimizing latency + optimizing GPU utilization. πͺ Then join us for drinks food and networking πΈ RSVP if you haven't already (link in thread π)"
X Link 2024-10-07T16:42Z [----] followers, [---] engagements
"β
RSVP to Building Compound AI with Baseten Chains #SFTechWeek: https://lu.ma/vk73vrfq https://lu.ma/vk73vrfq"
X Link 2024-10-07T16:43Z [----] followers, [---] engagements
"We're excited to announce our partnership with @MongoDB π€ Together we're enabling companies to build and deploy gen AI apps that scale infinitely and deliver optimal performance per dollar. Looking to run high-performance inference for your MongoDB-powered apps Reach out In September we welcomed [--] new #AI partners: @arizeai @basetenco @doppler @haizelabs @modal_labs @PortkeyAI and @RekaAILabs. Learn more: https://t.co/lYLil65dLG https://t.co/XTN64BZIwV In September we welcomed [--] new #AI partners: @arizeai @basetenco @doppler @haizelabs @modal_labs @PortkeyAI and @RekaAILabs. Learn more:"
X Link 2024-10-11T10:08Z [----] followers, [---] engagements
"Looking to build high-performance RAG systems @philip_kiely recently took to our blog to break down how to eliminate bottlenecks for compound AI systems including RAG using Baseten and @MongoDB"
X Link 2024-10-16T12:41Z [----] followers, [---] engagements
"We benchmarked the new NVIDIA H200 GPUs for LLM inference with @LambdaAPI π H200s crush long input sequences π H200s make huge batches more efficient (high throughput) π H100 GPUs are likely more cost-efficient for many inference workloads"
X Link 2024-10-22T19:13Z [----] followers, [---] engagements
"After the team at @rimelabs trained astonishingly lifelike speech synthesis models with over [---] voices they needed fast reliable infra to bring their API to market. With Baseten they've maintained [---] ms p99 latency and 100% uptime through [----]. https://www.baseten.co/customers/rime-serves-speech-synthesis-api-with-stellar-uptime-using-baseten/ https://www.baseten.co/customers/rime-serves-speech-synthesis-api-with-stellar-uptime-using-baseten/"
X Link 2024-10-23T15:36Z [----] followers, [---] engagements
"You can now deploy models with Baseten as part of @vercel's AI SDK π Run OpenAI-compatible LLMs in your Vercel workflows with our best-in-class model performance. Plus: access all our LLM features (including streaming)in any JS frameworkwith just a few lines of code. πͺ"
X Link 2024-10-29T22:59Z [----] followers, [----] engagements
"Were excited to launch canary deployments on Baseten π¦ π Canary deployments let you gradually shift traffic to new model deployments with seamless rollback if needed. Learn more in our launch blog π"
X Link 2024-10-31T16:30Z [----] followers, [----] engagements
"π Check out the launch blog: https://www.baseten.co/blog/canary-deployments-on-baseten/ https://www.baseten.co/blog/canary-deployments-on-baseten/"
X Link 2024-10-31T16:30Z [----] followers, [---] engagements
"You can't always use open-source LLMs for specific use cases out of the box. To customize them you have optionsbut each varies in difficulty cost and customizability: π¬ Prompt engineering π§ Fine-tuning π RAG π Learn more in @philip_kiely's blog: https://buff.ly/3ximK89 https://buff.ly/3ximK89"
X Link 2024-11-03T17:27Z [----] followers, [---] engagements
"Looking to generate high-res images in record time Launch models from @bfl_ml @StabilityAI and @BytedanceTalk in two clicks including: π FLUX.1 dev and schnell π SDXL (Turbo Lightning.) πͺ SD3 Medium π Start generating images: If you have a performance target in mind just reach out https://buff.ly/46T8pfT https://buff.ly/46T8pfT https://buff.ly/46T8pfT https://buff.ly/46T8pfT"
X Link 2024-11-04T23:15Z [----] followers, [---] engagements
"Check out the latest from our lead DevRel @philip_kiely on @SoftwareHuddle diving into all things compound AI and inference optimization π§ New Episode Alert Discover the tech behind companies like Descript Bland & Robust Intelligence π Deep Dive into Inference Optimization for LLMs with Philip Kiely π Today we have Philip Kiely from @basetenco on the show. Baseten is a Series B startup focused on providing https://t.co/fBWleiIZSa New Episode Alert Discover the tech behind companies like Descript Bland & Robust Intelligence π Deep Dive into Inference Optimization for LLMs with Philip Kiely"
X Link 2024-11-05T21:30Z [----] followers, [---] engagements
"Congrats to @waseem_s @MatanPaul @dorisjwo and the whole team at @Get_Writer on the $200M Series C Its been incredible getting a front-row seat to watching the team build such an incredible platform. π π We're excited to announce that we've raised $200M in Series C funding at a valuation of $1.9B to transform work with full-stack generative AI Today hundreds of corporate powerhouses like Mars @Qualcomm @Prudential and @Uber are using Writers full-stack platform to https://t.co/cwqZTjxMyl π We're excited to announce that we've raised $200M in Series C funding at a valuation of $1.9B to"
X Link 2024-11-12T22:26Z [----] followers, [----] engagements
"Weve heard it from ComfyUI users time and again: our ComfyUI integration is best-in-class. πͺ Easily deploy custom ComfyUI workflows behind an API endpoint in minutes. π Check out @philip_kiely and @het_trivedi05's guide on serving ComfyUI models behind an API endpoint: π And @het_trivedi05 and @rapprach's guide on running custom nodes and model checkpoints: https://buff.ly/3WCYk1T https://buff.ly/49gILlk https://buff.ly/3WCYk1T https://buff.ly/49gILlk"
X Link 2024-11-20T21:44Z [----] followers, [---] engagements
"Have you tried the new @Alibaba_Qwen models yet You can start coding with Qwen2.5 Coder on Baseten in minutesusers say the 32B version outperforms GPT-4o Claude and even the 70B version. π Launch the new 7B 14B and 32B Qwen Coder models: https://www.baseten.co/library/publisher/qwen/ https://www.baseten.co/library/publisher/qwen/"
X Link 2024-11-21T20:47Z [----] followers, [---] engagements
"TheNVIDIA A10 GPUis an Ampere-series graphics card popular for common ML inference tasksbut what about the A10G Despite similar specs there may be slight performance differences in specific use cases. π Find out where in @philip_kiely's blog post: https://www.baseten.co/blog/nvidia-a10-vs-a10g-for-ml-model-inference/ https://www.baseten.co/blog/nvidia-a10-vs-a10g-for-ml-model-inference/"
X Link 2024-11-25T22:56Z [----] followers, [---] engagements
"How can you outperform A100 GPUs at 20% lower cost Meet H100 MIGs. Beyond top-tier specs they offer more quantization options and GPU flexibility across clouds. Check out @MattDotHow Vlad Shulman @defpan and @philip_kiely's guide to understand how H100 MIGs work and what to expect when serving models on https://buff.ly/4gaYbLd https://buff.ly/4gaYbLd"
X Link 2024-12-01T17:12Z [----] followers, [---] engagements
"Were excited to introduce Custom Servers on Baseten To run customers mission-critical inference workloads Baseten had to be great at [--] things: 1: Performance optimizations at the model level 2: Massive-scale infrastructure with cross-cloud horizontal scaling All wrapped in an expressive DevEx. We've built extensive tooling for performance optimizationslike our optimized Engine Builder in Truss. However some of our customers come to us with pre-optimized models and they mainly want to take advantage of the seamless autoscaling Baseten provides. Now they can with the launch of Custom Servers."
X Link 2024-12-05T17:44Z [----] followers, [---] engagements
"Learn more in the launch blog: https://www.baseten.co/blog/deploy-production-model-servers-from-docker-images/ https://www.baseten.co/blog/deploy-production-model-servers-from-docker-images/"
X Link 2024-12-05T17:45Z [----] followers, [--] engagements
"π New Generally Available Whisper drop: The fastest most accurate and cost-effective transcription with over 1000x real-time factor for production AI workloads. π Our new Generally Available Whisper implementation delivers: π Over 1000x real-time factor β¨ The lowest word error rate πͺ Production-grade reliability π§© Custom scaling and hardware per processing step π See how in our blog: Reach out to get record-breaking performance for your mission-critical AI workloads https://www.baseten.co/blog/the-fastest-most-accurate-and-cost-efficient-whisper-transcription/"
X Link 2024-12-11T19:40Z [----] followers, 10.7K engagements
"We exist to make sure the best AI builders get the fastest and most reliable performance from their models. At #AWSreInvent we noticed three key things: [--] Nearly every company is using GenAI models to augment existing products and develop net-new apps. [--] Teams are concerned about how to keep Gen AI workloads compliant and customer privacy robust. [--] Organizations are worried about ensuring a quality UX with spiky traffic and high demand. If you need blazing-fast production performance that's reliable secure and elastic in scale book a meeting with our engineers: Thank you to everyone who made"
X Link 2024-12-13T17:14Z [----] followers, [---] engagements
"DeepSeek-V3 dropped today and the LLM world just got turned upside down. Again. Early indicators are that this model completely transforms the closed and open-source model landscapes. Tl;Dr - OSS is now SOTA/Top3 again. Here are the key details to know: - Open source and licensed for commercial use - Beats Llamas Qwens GPT-4o Sonnet [---] - MoE w/ 671B params 37B active per token - 128K-token context window - Distilled o3-style reasoning Deeper dive in π§΅ This is one of the first models that need the horsepower of H200s GPUs so were getting them ready to go. If youre interested in running"
X Link 2024-12-26T22:00Z [----] followers, 65.9K engagements
"DeepSeek-V3 is an incredibly exciting model combining multiple novel techniques including distilled o3-style Chain of Thought reasoning into a standard commercially-licensed open-source LLM"
X Link 2024-12-26T22:01Z [----] followers, [---] engagements
"We run DeepSeek-V3 on NVIDIA H200 GPUs available by invitation on Baseten. Each H200 has 141GB of VRAM with [---] TB/s of bandwidth. Together [--] H200s have [----] GB enough to load all 671B parameters in FP8 plus a KV cache allocation. H200 benchmarks: https://www.baseten.co/blog/evaluating-nvidia-h200-gpus-for-llm-inference/ https://www.baseten.co/blog/evaluating-nvidia-h200-gpus-for-llm-inference/"
X Link 2024-12-26T22:01Z [----] followers, [---] engagements
"To run DeepSeek-V3 with incredible performance and reliability we use the SGLang fast inference framework. https://x.com/zhyncs42/status/1872242567036977262 We're excited to announce SGLang @lmsysorg v0.4.1 which now supports DeepSeek @deepseek_ai V3 - currently the strongest open-source LLM even surpassing GPT-4o. https://t.co/a13AmSakEb https://x.com/zhyncs42/status/1872242567036977262 We're excited to announce SGLang @lmsysorg v0.4.1 which now supports DeepSeek @deepseek_ai V3 - currently the strongest open-source LLM even surpassing GPT-4o. https://t.co/a13AmSakEb"
X Link 2024-12-26T22:02Z [----] followers, [---] engagements
"Contact our engineers today to get a dedicated deployment of DeepSeek-V3 on H200 GPUs. https://www.baseten.co/library/deepseek-v3/ https://www.baseten.co/library/deepseek-v3/"
X Link 2024-12-26T22:02Z [----] followers, [---] engagements
"Our Co-founder @amiruci and Model Performance Engineer @zhyncs42 sat down with @latentspacepod to dive deep into DeepSeek-V3 and SGLang model performance scaling AI products and more. Listen to the full episode here π : Everything you need to run Mission Critical Inference (ft. DeepSeek v3 + SGLang) https://t.co/N67XXjHsHB We chat with @amiruci and @yinengzhang about the Chinese Whale Bro drop of 2024: - @deepseek_ai v3 - @lmsysorg's SGLang - the Three Pillars of Mission Critical : Everything you need to run Mission Critical Inference (ft. DeepSeek v3 + SGLang) https://t.co/N67XXjHsHB We"
X Link 2025-01-19T16:18Z [----] followers, [----] engagements
"The open-source community is still reeling from @deepseek_ai's new R1 drop the new best-in-class reasoning model on par with o1. We're thrilled to have a close relationship with the DeepSeek team hosting DeepSeek-R1 (and V3) from day one (running on H200s). π Learn more about what makes DeepSeek unique and how to run it on @latentspacepod featuring our Co-founder @amiruci and Model Performance Engineer @zhyncs42"
X Link 2025-01-22T20:48Z [----] followers, [----] engagements
"Huge congrats to the @riffusionai team on the launch of their new generative music model Try it out π Introducing FUZZ a generative music model like no other. Personalized full-length high-quality and infinite. Were making this instrument free for as long as our GPUs survive. The best of FUZZ in thread. https://t.co/GHtKphYHV5 Introducing FUZZ a generative music model like no other. Personalized full-length high-quality and infinite. Were making this instrument free for as long as our GPUs survive. The best of FUZZ in thread. https://t.co/GHtKphYHV5"
X Link 2025-01-30T21:06Z [----] followers, [---] engagements
"2025 is the year of inference. We're thrilled to announce our $75m Series C co-led by @IVP and @sparkcapital with participation from @GreylockVC @conviction @basecasevc @southpkcommons and @lachy. We're also excited to add Dick Costolo and Adam Bain from @01Advisors as new investors. Check out our CEO Tuhin's blog to learn more. It's time to build"
X Link 2025-02-19T17:05Z [----] followers, 125.3K engagements
"Last week was crazy. Thank you to everyone who celebrated our Series C with us and met us in person at @aiDotEngineer and our tech breakfast with @MorganBarrettX in NYC And congratulations to our friends at @AbridgeHQ @EvidenceOpen and @LambdaAPI for raising rounds last week too"
X Link 2025-02-24T23:23Z [----] followers, [----] engagements
"You can start using the new @Alibaba_Qwen Qwen QwQ-32B from our model library in two clicks π"
X Link 2025-03-06T23:36Z [----] followers, [----] engagements
"Don't miss live talks from the team: Field Notes on Scaling Real-time AI-Native Applications in Production Session EXS74242 Advanced Techniques for Inference Optimization with TensorRT-LLM Session S71693"
X Link 2025-03-14T14:29Z [----] followers, [---] engagements
"Join Baseten @gmi_cloud and Bridge IT Consulting for: The San Jose Sharks vs Carolina Hurricanes game in our private suite (limited availability): Our Happy Hour on Thurs 03/19: https://buff.ly/eo43oSF https://lu.ma/Sharks https://buff.ly/eo43oSF https://lu.ma/Sharks"
X Link 2025-03-14T14:29Z [----] followers, [---] engagements
"BEIs performance boosts arent an artifact of using more powerful hardware. BEI is even more memory-efficient than other toolkits meaning you can run it on smaller instance types and still get superior performance. (4/5 π§΅)"
X Link 2025-03-27T16:46Z [----] followers, [---] engagements
"Learn more about BEI in our launch blog: Shoutout to @feilsystem on our model performance team for his work carefully optimizing BEI for production AI workloads (5/5 π§΅) https://www.baseten.co/blog/introducing-baseten-embeddings-inference-bei/ https://www.baseten.co/blog/introducing-baseten-embeddings-inference-bei/"
X Link 2025-03-27T16:46Z [----] followers, [---] engagements
"The first Baseten bot is live on Poe It's very fast you can ask questions in your language of choice and get instant answers. We're excited to partner withQuorato power the fastest open-source models for the Poe community"
X Link 2025-03-28T16:39Z [----] followers, [----] engagements
"New bots for Llama [--] Scout and Maverick are now live on Poe Get started with an 8M token context window for Scout (yes you read that right) and 1M for Maverick. We're thrilled to power the fastest open-source models for Quoramore to come"
X Link 2025-04-07T20:43Z [----] followers, [---] engagements
"π We've been heads down for months and now it's finally launch week. Today were releasing our new brand. We believe inference is the foundation of all AI going forward. That's what our new look is all about: . . "
X Link 2025-05-19T21:59Z [----] followers, [----] engagements
"Inference is everywhere. Come find us in San Francisco"
X Link 2025-05-20T23:57Z [----] followers, [----] engagements
"let there be inference"
X Link 2025-05-22T16:00Z [----] followers, 330.8K engagements
"Our secret sauce The Baseten Inference Stack. It consists of two core layers: the Inference Runtime and Inference-optimized Infrastructure. Our engineers break down all the levers we pull to optimize each layer in our new white paper"
X Link 2025-05-27T22:13Z [----] followers, [----] engagements
"New DeepSeek just dropped. Proud to serve the fastest DeepSeek R1 [----] inference on OpenRouter (#1 on TTFT and TPS) with our Model APIs. π DeepSeek-R1-0528 is here πΉ Improved benchmark performance πΉ Enhanced front-end capabilities πΉ Reduced hallucinations πΉ Supports JSON output & function calling β
Try it now: https://t.co/IMbTch8Pii π No change to API usage docs here: https://t.co/Qf97ASptDD π https://t.co/kXCGFg9Z5L π DeepSeek-R1-0528 is here πΉ Improved benchmark performance πΉ Enhanced front-end capabilities πΉ Reduced hallucinations πΉ Supports JSON output & function calling β
"
X Link 2025-05-29T21:03Z [----] followers, [----] engagements
"People think that GPUs + vLLM = production grade inference. We know that to not be true. With this you can get 80% of what you want the model to do and 95% reliability but formission critical inference - thats not enough"
X Link 2025-05-31T00:03Z [----] followers, [----] engagements
"Hot take: Inference is not a commodity. There is strong complexity and differences between inference providers. Couldnt have said it better @CorinneMRiley"
X Link 2025-06-16T22:51Z [----] followers, [---] engagements
""Inference is more than just vibes" catch us on the Bay Bridge"
X Link 2025-06-17T23:06Z [----] followers, [----] engagements
"RAISE Summit Paris kicks off tomorrow Find us at Booth [--] on July [--]. Come say hi to our team check out a demo and grab your Baseten "Artificially Intelligent" T-shirt. Don't miss @rapprach's talk on the AI landscape and open source on Wednesday July [--] at 11:40 AM"
X Link 2025-07-07T20:44Z [----] followers, [---] engagements
"Voice is the next frontier for customer experience. Weve seen first hand how Voice can transform the way users interact with new products and existing brands. But we also know Voice is very hard to do well"
X Link 2025-07-08T18:50Z [----] followers, [---] engagements
"This week we'll be deep diving into Voice AI. Well share how we tackle Voice inference how you can build with Voice and share a few exciting demos along the way. Stay tuned"
X Link 2025-07-08T18:50Z [----] followers, [--] engagements
"Confession. Kimi K2 is one of our new favorite models for agentic use cases. Baseten is powering the fastest Kimi K2 available on Openrouter. Test it through our Model APIs today. Alsosay Kimi K2 10x fast. Thanks @Madisonkanna"
X Link 2025-07-15T23:37Z [----] followers, [----] engagements
"Kimi K2 has arrived. You can deploy it on Baseten. Join as we briefly dig into why K2 is generating so much buzz. If youre building agents or utilizing LLMs its absolutely worth testing"
X Link 2025-07-16T21:53Z [----] followers, [---] engagements
"Launch it here https://www.baseten.co/library/kimi-v2/ https://www.baseten.co/library/kimi-v2/"
X Link 2025-07-16T21:53Z [----] followers, [---] engagements
"Baseten is growing Were always looking for determined humble people to join our team. Catch us at the Greylock Techfair on Thursday July [--] at Oracle Park. If you can't attend apply for a role online or share with someone you think would be a great fit"
X Link 2025-07-17T23:42Z [----] followers, [---] engagements
"You can try Voxtral from our model library: https://www.baseten.co/library/voxtral-small-24b/ https://www.baseten.co/library/voxtral-small-24b/"
X Link 2025-07-23T19:22Z [----] followers, [---] engagements
"Deploy Voxtral https://www.baseten.co/library/voxtral-small-24b/ https://www.baseten.co/library/voxtral-small-24b/"
X Link 2025-07-24T23:05Z [----] followers, [---] engagements
"When@zeddotdev set out to build Edit Prediction they knew they wanted it to feel instantaneous. But their previous inference solution wasn't meeting the latency throughput or capacity they needed to meet that goal. Now Zed powers a hyper-responsive Edit Prediction experience and we're thrilled to play a part in that process"
X Link 2025-07-25T19:25Z [----] followers, [---] engagements
"Many thanks to@darrylktaft and @thenewstack for highlighting our work together and everything Zed does to power the world's fastest AI code editor (our engineers have always been huge Zed fans). Read the full article on The New Stack here: https://thenewstack.io/how-rust-based-zed-built-worlds-fastest-ai-code-editor/ https://thenewstack.io/how-rust-based-zed-built-worlds-fastest-ai-code-editor/"
X Link 2025-07-25T19:25Z [----] followers, [---] engagements
"Or check out our case study on how we optimized @zeddotdev's model performance for launch day: https://www.baseten.co/resources/customers/zed-industries-serves-2x-faster-code-completions-with-baseten/ https://www.baseten.co/resources/customers/zed-industries-serves-2x-faster-code-completions-with-baseten/"
X Link 2025-07-25T19:25Z [----] followers, [---] engagements
"Building reliable agents requires a different tech stack: one that natively supports compound AI systems and evaluates quality along the full trajectory of agent behavior. We teamed up with @PatronusAI to break down what this stack looks like from infra and models to debuggers"
X Link 2025-07-29T18:43Z [----] followers, [---] engagements
"@drishanarora It's a great day in open source AI Congrats on the launch. We're excited to be a launch partner with dedicated deployments of Cogito v2 on B200: https://www.baseten.co/library/cogito-v2-671b/ https://www.baseten.co/library/cogito-v2-671b/"
X Link 2025-07-31T17:29Z [----] followers, [----] engagements
"Deep Cogito just dropped [--] new open LLMs -- each one is SOTA for its size. We're excited to be a launch partner for their Cogito v2 models which use a novel IDA mechanism to improve intuition and shorten reasoning chains. Today we are releasing [--] hybrid reasoning models of sizes 70B 109B MoE 405B 671B MoE under open license. These are some of the strongest LLMs in the world and serve as a proof of concept for a novel AI paradigm - iterative self-improvement (AI systems improving themselves). https://t.co/ZfmQIOgysv Today we are releasing [--] hybrid reasoning models of sizes 70B 109B MoE 405B"
X Link 2025-07-31T17:32Z [----] followers, [---] engagements
"Cogito v2 is available as a dedicated deployment on B200 GPUs: https://www.baseten.co/library/cogito-v2-671b/ https://www.baseten.co/library/cogito-v2-671b/"
X Link 2025-07-31T17:32Z [----] followers, [---] engagements
"To illustrate the impact of using BEI on B200s we ran benchmarks on the largest of the Qwen3 Embedding series. On another low query-throughput test (5 tokens/request) BEI handles 8.4x higher queries per second than vLLM and 1.6x higher than TEI"
X Link 2025-08-04T21:07Z [----] followers, [---] engagements
"Read the full blog here: https://www.baseten.co/blog/run-qwen3-embedding-on-nvidia-blackwell-gpus/ https://www.baseten.co/blog/run-qwen3-embedding-on-nvidia-blackwell-gpus/"
X Link 2025-08-04T21:07Z [----] followers, [---] engagements
""The reality is that for each customer its a different story. Migrating to a new model isnt a small effort there are a lot of variables at playnone are trivial." Our CEO Tuhin spoke with Belle Lin at the WSJ about how gpt-oss stacks up and what businesses considering switching need to consider"
X Link 2025-08-07T18:46Z [----] followers, [----] engagements
"@alphatozeta8148 man we are trying"
X Link 2025-08-07T22:25Z [----] followers, [--] engagements
"@NVIDIAAIDev @OpenAI thank you"
X Link 2025-08-07T23:05Z [----] followers, [--] engagements
"If you're at Ai4 in Vegas next week don't miss Philip Kiely's talk "Inference in the Wild: Lessons from Scaling Real-time AI in Production" on Aug. [--] at 3:25pm. It takes place in Room [---] on the AI Infrastructure & Scalability track"
X Link 2025-08-08T20:41Z [----] followers, [---] engagements
"Day [--] of Ai4 Vegas is here Did you catch our lead DevRel Philip Kielys talk "Inference in the Wild: Lessons from Scaling Real-time AI in Production" yesterday"
X Link 2025-08-13T15:50Z [----] followers, [---] engagements
"Swing by Booth [---] to meet @philip_kiely & the team at #AI4 today. See a live demo and snag your Artificially Intelligent shirt"
X Link 2025-08-13T15:51Z [----] followers, [---] engagements
"When AI education needs to feel human latency matters. Praktika hit [---] ms transcription with 50% cost savings using Basetentransforming their language learning experience for millions of learners across [--] languages"
X Link 2025-08-14T17:24Z [----] followers, [---] engagements
"See how @praktika_ai did it using Basetens Inference Stack: https://www.baseten.co/resources/customers/praktika/ https://www.baseten.co/resources/customers/praktika/"
X Link 2025-08-14T17:24Z [----] followers, [---] engagements
"When it comes to open-source text-to-speech Orpheus should be your go-to model. We're seeing so much demand around real-time voice streaming our engineer Alex Ker wrote a blog on how to start streaming audio in real time with Orpheus in [--] minutes"
X Link 2025-08-18T22:26Z [----] followers, [----] engagements
"Want to fine-tune gpt-oss-120b We teamed up with Axolotl to launch a new recipe to run fine-tuning out of the box multi-node training one-line deployments from the CLI and built-in observability included"
X Link 2025-08-19T18:50Z [----] followers, [----] engagements
"Our first ever Inference Invitational is tomorrow 8/21 Join us for friendly pickleball & padel tournaments great conversation delicious food and drinks and your own custom pickleball paddle. Don't know how to play We'll have an instructor there to teach you"
X Link 2025-08-20T22:48Z [----] followers, [---] engagements
"DeepSeek V3 + R1 (now V3.1 and R1 0528) proved that open-source models can rival closed ones at lower cost and with greater control. But running them in production That became the new hurdle. Our new guide breaks down: Why DeepSeek is powerful and hard to serve The infra + runtime hurdles youll hit Tools + techniques for performant reliable deployments"
X Link 2025-08-27T02:58Z [----] followers, [----] engagements
"If youre building with DeepSeek this is your roadmap to performant reliable cost-efficient inference. Read our full guide here: https://www.baseten.co/resources/guide/the-complete-deepseek-model-guide/ https://www.baseten.co/resources/guide/the-complete-deepseek-model-guide/"
X Link 2025-08-27T02:58Z [----] followers, [---] engagements
"Our team met Parsed a few months ago and we could not be more excited to see the inflection point they are a part of - customized models built for those high impact jobs. This is an incredible team and we're thrilled to power their inference. Congrats @parsedlabs. Let's build. Today were launching Parsed. We are incredibly lucky to live in a world where we stand on the shoulders of giants first in science and now in AI. Our heroes have gotten us to this point where we have brilliant general intelligence in our pocket. But this is a local minima. We https://t.co/R7cR3EGVHT Today were launching"
X Link 2025-08-28T17:07Z [----] followers, [----] engagements
"We're excited to announce Fall into Inference: a multi-month deep dive into our cloud ecosystem and how we use Multi-cloud Capacity Management (MCM) to power fast reliable inference at scale. Over the next few months well showcase how we use MCM to power real-world AI use cases with partners including Google Cloud Amazon Web Services (AWS) OCI CoreWeave Nebius Vultr and NVIDIA. Stay tuned for weekly technical blogs case studies and deep dives with our partners"
X Link 2025-09-03T17:18Z [----] followers, [----] engagements
"We raised a $150M Series D Thank you to all of our customers who trust us to power their inference. We're grateful to work with incredible companies like @Get_Writer @zeddotdev @clay_gtm @trymirage @AbridgeHQ @EvidenceOpen @MeetGamma @Sourcegraph and @usebland. This round was led by @bondcap with @jaysimons joining our Board. We're also thrilled to welcome @conviction and @CapitalG to the round alongside support from @01Advisors @IVP @sparkcapital @GreylockVC @ScribbleVC @BoxGroup and Premji Invest. Today were excited to announce our $150M Series D led by BOND with Jay Simons joining our"
X Link 2025-09-05T15:05Z [----] followers, 19.5K engagements
"AI everywhere = Inference everywhere = Baseten everywhere IN NEWS: @basetenco raises a $150M series D round. @tuhinone (Founder & CEO Baseten) on the future of inference: I think the token price goes down and inference should get cheaper over time. And that really just means there is going to be more inference. Every time we https://t.co/oKplA7BIOY IN NEWS: @basetenco raises a $150M series D round. @tuhinone (Founder & CEO Baseten) on the future of inference: I think the token price goes down and inference should get cheaper over time. And that really just means there is going to be more"
X Link 2025-09-05T22:49Z [----] followers, [----] engagements
"We just raised a $150M Series D and were growing If you're looking for your next opportunity take a look at our 30+ open roles across engineering and GTM"
X Link 2025-09-08T21:09Z [----] followers, [---] engagements
"@Alibaba_Qwen (Gated) Attention is all you need. Excited to offer both Qwen3-Next models on dedicated deployments backed by 4xH100 GPUs. https://app.baseten.co/deploy/qwen_3_next_80B_A3_thinking https://www.baseten.co/library/qwen3-next-80b-a3b-instruct/ https://app.baseten.co/deploy/qwen_3_next_80B_A3_thinking https://www.baseten.co/library/qwen3-next-80b-a3b-instruct/"
X Link 2025-09-11T19:38Z [----] followers, [----] engagements
"Qwen3 Next 80B A3B Thinking outperforms higher-cost and closed models like Gemini [---] Flash Thinking on benchmarks nearing Qwen's flagship model quality at a fraction the size. We have it ready to deploy in our model library running on @nvidia and the Baseten Inference Stack"
X Link 2025-09-15T20:34Z [----] followers, [----] engagements
"The key is having good intuition being willing to go out on a limb building fast learning fast and killing things when you need to. Following our Series D raise our Co-founder and CTO @amiruci walks through why he bet early on inference how were scaling through generative model hypergrowth and his advice for fellow founders"
X Link 2025-09-16T17:48Z [----] followers, [----] engagements
"Well be at SigSum SF this Thursday Sept [--] Catch: - @philip_kiely's talk "Inference Engineering for Hypergrowth" (1 PM) - @tuhinone on the panel "Breaking Building and Betting on AI" (3:30 PM) Visit us in the partner showcase to grab an "Artificially Intelligent" tee"
X Link 2025-09-22T22:36Z [----] followers, [---] engagements
"@rohanpaul_ai someone needs to run the inference and make it fast. we can help with that"
X Link 2025-09-23T22:47Z [----] followers, [----] engagements
"Were hosting our friends at @OpenRouterAI for a SF Tech Week breakfast talk Join us at Baseten HQ on October [--] at 10AM for Learnings from processing [--] Trillion Tokens"
X Link 2025-09-29T22:36Z [----] followers, [---] engagements
"The team at OpenRouter will dive into: Closed vs. open model adoption Global usage trends from running inference at massive scale Tool calling & pricing shifts Seats are limited. Save yours here: https://partiful.com/e/q6l1SeDtPGU9kCQPArk6 https://partiful.com/e/q6l1SeDtPGU9kCQPArk6"
X Link 2025-09-29T22:36Z [----] followers, [---] engagements
"From document processing and image recognition to drug discovery healthcare use cases are at the forefront of AI adoption. We partner with teams like Vultr to support these applications with fast reliable inference. With Multi-cloud Capacity Management and theBaseten Inference Stack we power near-limitless scale for healthcare AI teams on NVIDIA Blackwell GPUs"
X Link 2025-10-01T20:31Z [----] followers, [---] engagements
"Embeddings power search RecSys and agents but making them performant in production requires satisfying two different traffic profiles. In our new guide we cover how to build embedding workflows that are both extremely high-throughput and low-latency from indexing millions of data points to serving individual search queries in milliseconds"
X Link 2025-10-02T20:00Z [----] followers, [---] engagements
"Read it here: https://www.baseten.co/resources/guide/high-performance-embedding-model-inference https://www.baseten.co/resources/guide/high-performance-embedding-model-inference"
X Link 2025-10-02T20:00Z [----] followers, [---] engagements
"Being fast for one customer isn't enough. Low-latency inference at scale requires the ability to recruit every GPU in the world"
X Link 2025-10-07T18:52Z [----] followers, [---] engagements
"Fast models for our fast friends at Factory Deploy and serve custom models with enterprise-grade infrastructure on @basetenco. Special promo for Factory users: receive $500 Model API credits when you fill out this form. https://t.co/UI8NqfACDY Deploy and serve custom models with enterprise-grade infrastructure on @basetenco. Special promo for Factory users: receive $500 Model API credits when you fill out this form. https://t.co/UI8NqfACDY"
X Link 2025-10-07T22:40Z [----] followers, [----] engagements
"Register here: https://events.redis.io/redis-released-london-2025 https://events.redis.io/redis-released-london-2025"
X Link 2025-10-08T15:55Z [----] followers, [---] engagements
"Just in time for the new year Awesome job by our model performance team to hit the top of @ArtificialAnlys for GLM [---] try it here https://www.baseten.co/library/glm-4-7/ happy holidays we just dropped the fastest GLM 4.7: 400+ TPS as benchmarked by @ArtificialAnlys https://t.co/eRv47ok1sV https://www.baseten.co/library/glm-4-7/ happy holidays we just dropped the fastest GLM 4.7: 400+ TPS as benchmarked by @ArtificialAnlys https://t.co/eRv47ok1sV"
X Link 2025-12-28T20:23Z [----] followers, [----] engagements
"We boosted acceptance rate by up to 40% with the Baseten Speculation Engine. How By combining Multi-Token Prediction (MTP) with Suffix Automaton (SA) decoding. This hybrid approach crushes production coding workloads delivering 30%+ longer acceptance lengths on code editing tasks with zero added overhead. An open source version for TensorRT-LLM is now available to the community. Read the full engineering deep dive: https://www.baseten.co/blog/boosting-mtp-acceptance-rates-in-baseten-speculation-engine/ https://www.baseten.co/blog/boosting-mtp-acceptance-rates-in-baseten-speculation-engine/"
X Link 2026-01-27T19:44Z [----] followers, 13.5K engagements
"Were excited to launch Metas Llama [--] in our model library in both 8B and 70B π The newly introduced Llama [--] is a significant improvement over Llama [--] with increased tokens and reduced false refusal rates. These models deliver unparalleled performance showcasing significant advancements in efficiency and speed. Our Llama [--] 8B runs on A100s and Llama [--] 70B runs on H100s optimized for production. https://twitter.com/i/web/status/1781072277850714184 https://twitter.com/i/web/status/1781072277850714184"
X Link 2024-04-18T21:28Z [----] followers, 131.7K engagements
"Meet the Baseten team at the @aiDotEngineer Summit in NYC this week π Booth G3 get a demo and grab some swag"
X Link 2025-02-18T23:19Z [----] followers, [---] engagements
"@IVP @sparkcapital @GreylockVC @conviction @basecasevc @southpkcommons @Lachy @01Advisors https://www.baseten.co/blog/announcing-baseten-75m-series-c/ https://www.baseten.co/blog/announcing-baseten-75m-series-c/"
X Link 2025-02-19T17:05Z [----] followers, [----] engagements
"Friendly reminder from @willreed_21 (Spark Capital): Your team's time is best spent on your product not the infrastructure it runs on"
X Link 2025-06-25T18:22Z [----] followers, [---] engagements
"If you're in London catch Rachel Rapp with our friends from Tavily and cognee at Redis Released. From building and deploying the fastest agentic systems to industry trends they'll break down what the agentic tech stack looks like in a live panel this Thursday"
X Link 2025-10-08T15:55Z [----] followers, [---] engagements
"We caught up with the one and only @thdxr on Opencode's newly launched Zen and his hot takes Zen isnt a for-profit thing. This is something we try to do at breakeven. As we grow we pool all of our resources together and negotiate discounted rates with providers. These cost savings flow back right down to everyone"
X Link 2025-10-10T19:30Z [----] followers, [----] engagements
"From sketch to a 3D model in under [--] seconds with a 1B parameter model We built a flower card generator using Autodesks WaLa open-source AI and Baseten for scalable GPU deployment"
X Link 2025-10-13T21:38Z [----] followers, [---] engagements
"Fast Company named Baseten one of the [--] Next Big Things in Tech [----] Were proud to be recognized for powering the fastest and most reliable inference for the fastest-growing AI companies like Abridge Clay OpenEvidence and many more"
X Link 2025-10-14T19:05Z [----] followers, [----] engagements
"Powering inference for the fastest growing AI companies like OpenEvidence Writer and Clay means being the first to use bleeding-edge model performance tooling in production. That's why we were early adopters of NVIDIA Dynamo giving us 50% lower latency and 60%+ higher throughput with KV cache-aware routing. These results are the tip of the iceberg especially for our customers running large models with large context windows under heavy load"
X Link 2025-10-16T18:01Z [----] followers, [----] engagements
"See the benchmarks in our blog by @aqaderb @feilsystem and @rapprach: https://www.baseten.co/blog/how-baseten-achieved-2x-faster-inference-with-nvidia-dynamo/#how-baseten-uses-nvidia-dynamo https://www.baseten.co/blog/how-baseten-achieved-2x-faster-inference-with-nvidia-dynamo/#how-baseten-uses-nvidia-dynamo"
X Link 2025-10-16T18:01Z [----] followers, [---] engagements
"We unleashed our model performance team on GLM [---] and were very excited to be the fastest provider available today on Artificial analysis at [---] TPS (2x the next best TPS) and a less than [--] second TTFT (2x the next best TTFT). https://artificialanalysis.ai/models/glm-4-6-reasoning/providers https://artificialanalysis.ai/models/glm-4-6-reasoning/providers"
X Link 2025-10-17T21:32Z [----] followers, 11.2K engagements
"We see the massive AWS outage. Baseten web app is down but inference new deploys training jobs and the model management APIs are unaffected"
X Link 2025-10-20T08:36Z [----] followers, [----] engagements
"@DannieHerz @jeffbarg @ClayRunHQ The clay slackmoji in the b10 slack has been getting a lot of play recently"
X Link 2025-10-23T22:16Z [----] followers, [--] engagements
"DeepSeek-OCR stunned the internet this week with 10x more efficient compression unlocking faster and cheaper intelligence. We rolled out performant inference support on day one of the model drop. Learn why compressions are so effective at making models smarter what applications you can build with DeepSeek-OCR and how to serve it on Baseten in under [--] minutes. Link in the replies"
X Link 2025-10-24T00:08Z [----] followers, [----] engagements
"This week Baseten's model performance team unlocked the fastest TPS and TTFT for gpt-oss 120b on @nvidia hardware. When gpt-oss launched we sprinted to offer it at [---] TPS. now we've exceeded [---] TPS and [----] sec TTFT. and we'll keep working to keep raising the bar. We are proud to offer the best E2E latency available with near-limitless scale incredible performance and the highest uptime 99.99%"
X Link 2025-10-24T16:18Z [----] followers, 42.4K engagements
"We are so excited to be a launch partner for @nvidia Nemotron Nano [--] VL today and offer day-zero support for this highly accurate and efficient vision language model alongside other models in the Nemotron family. To learn more read our blog here https://www.baseten.co/blog/high-performance-agents-for-financial-services-with-nvidia-nemotron-on-baseten/ https://www.baseten.co/blog/high-performance-agents-for-financial-services-with-nvidia-nemotron-on-baseten/"
X Link 2025-10-28T18:43Z [----] followers, [----] engagements
"After months of feedback from our early customers and thousands of jobs completed Baseten Training is officially ready for everyone. π Access compute on demand train any model run multi-node jobs and deploy from checkpoints with cache-aware scheduling an ML Cookbook tool calling recipes and more"
X Link 2025-10-30T18:06Z [----] followers, 13.7K engagements
"@james_weitzman @athleticKoder β"
X Link 2025-11-03T23:38Z [----] followers, [--] engagements
"Fun fact - we asked people to describe their favorite agent in SF. We got suggestions for a bunch of new agentic apps to try. Our favorite agent Probably James Bond. If youre living in the new world of agentic AI check out our new deep dive on tool calling in inference. Check out the blog in the comments"
X Link 2025-11-05T23:17Z [----] followers, [---] engagements
"Blog: https://www.baseten.co/blog/tool-calling-in-inference/utm_source=twitter&utm_medium=social&utm_campaign=education_tool_calling_blog_2025-11-05 https://www.baseten.co/blog/tool-calling-in-inference/utm_source=twitter&utm_medium=social&utm_campaign=education_tool_calling_blog_2025-11-05"
X Link 2025-11-05T23:17Z [----] followers, [---] engagements
"Congratulations to @Kimi_Moonshot on their newest model drop Kimi-K2 Thinking one of the worlds most advanced open source models. Baseten is proud to offer Day [--] Support. Sign up with your business email address and get $100 in Model API credits"
X Link 2025-11-07T15:40Z [----] followers, 270.4K engagements
"Heading to KubeCon next week Come visit the team at Booth #631 to test your AI knowledge. Top of the leaderboard gets prizes π"
X Link 2025-11-07T21:56Z [----] followers, [---] engagements
"Excited to share this piece from @VentureBeat spotlighting how Baseten is redefining the AI infrastructure game: Baseten takes on hyperscalers with new AI training platform that lets you own your model weights. Thanks VentureBeat Read full article https://venturebeat.com/ai/baseten-takes-on-hyperscalers-with-new-ai-training-platform-that-lets-you https://venturebeat.com/ai/baseten-takes-on-hyperscalers-with-new-ai-training-platform-that-lets-you"
X Link 2025-11-10T17:59Z [----] followers, [----] engagements
"At @KubeCon_ Swing by Booth #631 to test your inference knowledge and earn some swag"
X Link 2025-11-11T19:05Z [----] followers, [---] engagements
"Congrats to the World Labs team on the launch today Marble lets you create 3D worlds from just a single image text prompt video or 3D layout. We couldn't be more excited to power the inference behind this. Can't wait to see what everyone makes. π₯ Introducing Marble by World Labs: a foundation for a spatially intelligent future. Create your world at https://t.co/V267VJu1H9 https://t.co/T00mtETmCA Introducing Marble by World Labs: a foundation for a spatially intelligent future. Create your world at https://t.co/V267VJu1H9 https://t.co/T00mtETmCA"
X Link 2025-11-12T17:39Z [----] followers, [----] engagements
"Welcome to the new age Defense Against the Dark Arts. It's called fast inference (& Harry Potter would be jealous). Check out our deep dive on how the Baseten wizards (model performance team) optimized Kimi K2 Thinking (now faster and just as smart as GPT-5). https://www.baseten.co/blog/kimi-k2-thinking-at-140-tps-on-nvidia-blackwell/utm_source=twitter&utm_medium=social&utm_campaign=awareness_kimi-k2-thinking_performance_blog_2025-11-12"
X Link 2025-11-12T20:50Z [----] followers, [----] engagements
"Baseten used @nvidia Dynamo to double inference speed for long-context code generation and increased throughput by 1.6x. Dynamo simplifies multi-node inference on Kubernetes helping us scale deployments while reducing costs. Read the full blog post belowπ β NVIDIA Dynamo is now available across major cloud providersincluding @awscloud @googlecloud @Azure and @OracleCloudto enable efficient multi-node inference on Kubernetes in the cloud. And Its already delivering results: @basetenco is seeing faster more cost-effective https://t.co/6efirNmK3r β NVIDIA Dynamo is now available across major"
X Link 2025-11-13T19:52Z [----] followers, [----] engagements
"Working with the @GammaApp team never quite feels like work and thats how their product feels. "Criminally fun." We are honored to be long-term partners and power Gammas inference needs as they push the envelope on how we present ideas. Congratulations on the Series B"
X Link 2025-11-13T22:02Z [----] followers, 22.8K engagements
"Shoutout to the incredible team at @oxen_ai Turning datasets deployed models like its light work. They build fast. We help them ship even faster. Thanks for the partnership @gregschoeninger Check out the story in the comments #AI #MLOps #Baseten"
X Link 2025-11-18T21:28Z [----] followers, [----] engagements
"@drishanarora Congratulations on the launch Excited to support with dedicated deployments of Cogito V2.1. https://www.baseten.co/library/cogito-v2-1-671b/ https://www.baseten.co/library/cogito-v2-1-671b/"
X Link 2025-11-19T18:16Z [----] followers, [----] engagements
"Congrats to our friends at Deep Cogito on launching the most powerful US-based OSS model. It turns out LLM self play produces shorter reasoning chains (low token consumption) while maintaining great performance Try it out on Baseten today: https://www.baseten.co/library/cogito-v2-1-671b/ Today we are releasing the best open-weight LLM by a US company: Cogito v2.1 671B. On most industry benchmarks and our internal evals the model performs competitively with frontier closed and open models while being ahead of any US open model (such as the best versions of https://t.co/F6eZnn8s2Q"
X Link 2025-11-19T20:59Z [----] followers, [----] engagements
"If you need an adrenaline rush to wake up from your post-Thanksgiving stupor we got you. @deepseek_ai V3.2 dropped this week and is now available on Baseten. Its so smart your mother will ask why you can't be more like DeepSeek. V3.2 is currently on par with GPT-5 all whilst being multiples cheaper. V3.2 is now live on our Model APIs and on @OpenRouterAI and @ArtificialAnlys. Baseten is the fastest provider with [----] TTFT and [---] tps (thats 1.5x faster than the next guy). For a model this size its screaming. Get the brains without trading off performance"
X Link 2025-12-04T16:50Z [----] followers, [----] engagements
"We're excited to partner with @getstream_io to help developers build fast production-ready Vision Agents. Together we combined Baseten-hosted Qwen3-VL with Streams real-time voice and video to create an Electronics Setup & Repair Assistant that can see understand and guide users in real time. Check out the full walkthrough and demo below Vision Agents just got better: @basetenco joins us to take multimodal capabilities even further. Our team worked together to create a guide on running models hosted on Baseten with Vision Agents. Check out our blog post where we use Baseten + Qwen 3-VL"
X Link 2025-12-05T19:26Z [----] followers, [----] engagements
""We want people to own their own intelligence and we now see a really straight shot to get there." @amiruci sits down with @mudithj_ and @charles0neill from the Parsed team. Check out the full fireside chat in the comments"
X Link 2025-12-11T22:12Z [----] followers, [----] engagements
"Inference performance isnt just about the model. It relies on the entire inference stack. In our Inference Stack white paper we explain how Baseten uses @nvidia TensorRT LLM and Dynamo to reduce latency and increase throughput across model modalities. If you care about speed this is worth reading. https://www.baseten.co/resources/guide/the-baseten-inference-stack https://www.baseten.co/resources/guide/the-baseten-inference-stack"
X Link 2026-01-09T20:20Z [----] followers, [----] engagements
"π We're thrilled to introduce the fastest most accurate and cost-efficient Whisper-powered transcription and diarization on the market: [----] RTFwith Whisper Large V3 Turbo Streaming transcriptionwith consistent low latency The most accurate real-time diarization 90% lower costdue to infra optimizations Used in production by companies like @NotionHQ π https://twitter.com/i/web/status/2012203547912245366 https://twitter.com/i/web/status/2012203547912245366"
X Link 2026-01-16T16:41Z [----] followers, [----] engagements
"Want to learn about how to run high performance LLM inference at scaleOur Head of DevRel @philipkiely has the perfect talk for you during NVIDIA Dynamo Day on Jan [--]. Register here: https://nvevents.nvidia.com/dynamodayi=RNQf_gN5cXcdmLzfj_IevFS-tdC553CY https://nvevents.nvidia.com/dynamodayi=RNQf_gN5cXcdmLzfj_IevFS-tdC553CY"
X Link 2026-01-20T19:24Z [----] followers, [----] engagements
"Were thrilled to be working with @LangChain to power the fastest way to generate production-ready agents without code. LangChains Agent Builder represents a way for non-technical knowledge workers and citizen developers to build useful things with AI. All with Baseten Inference backbone and GLM [---]. Weve written a tutorial for you to create your own in minutes"
X Link 2026-01-21T17:19Z [----] followers, [----] engagements
"Tired of waiting for video generation Say less. We've optimized the Wan [---] runtime to hit: 3x faster inference on NVIDIA Blackwell 2.5x faster on Hopper 67% cost reduction. Read the full breakdown of our kernel optimizations and benchmarks here: https://www.baseten.co/blog/wan-2-2-video-generation-in-less-than-60-seconds/#benchmarking-methodology https://www.baseten.co/blog/wan-2-2-video-generation-in-less-than-60-seconds/#benchmarking-methodology"
X Link 2026-01-22T14:00Z [----] followers, [----] engagements
"@lucas_dehaas so if people are wondering why there are baseten stickers tagged across sf they know you're the one to blame"
X Link 2026-01-23T19:05Z [----] followers, [---] engagements
"@adambain grateful for your support and we're still so early π₯"
X Link 2026-01-23T20:24Z [----] followers, [---] engagements
"LIVE Tune in to hear @tuhinone discuss our series E open source and the multi-model future on CNBC A Chinese AI model is having a real coding moment. and not just in China. Zhipu says its coding agent users are concentrated in the *US and China. @tuhinone CEO of @basetenco joins me on the back of his latest fundraise to discuss whats hype and whats real A Chinese AI model is having a real coding moment. and not just in China. Zhipu says its coding agent users are concentrated in the *US and China. @tuhinone CEO of @basetenco joins me on the back of his latest fundraise to discuss whats hype"
X Link 2026-01-26T19:26Z [----] followers, [----] engagements
""the best application layer companies set up the harness and how to use it for the problem that your user is trying to solve""
X Link 2026-01-24T00:18Z [----] followers, [----] engagements
"Thank you @BloombergTV for having our CEO and co-founder @tuhinone and day [--] investor @saranormous yesterday to discuss our latest fundraise the bet we're making with inference and how we're powering customers. Full interview here: https://www.bloomberg.com/news/videos/2026-01-26/ai-startup-baseten-raises-300-million-video https://www.bloomberg.com/news/videos/2026-01-26/ai-startup-baseten-raises-300-million-video"
X Link 2026-01-27T21:20Z [----] followers, [----] engagements
"Nemotron [--] Nano NVFP4 is now available on Baseten + NVIDIA B200 BF16-level accuracy up to [--] higher throughput vs FP8 and faster inference powered by QAD + Blackwell running on the Baseten Inference Stack. https://www.baseten.co/library/nvidia-nemotron-3-nano/ https://www.baseten.co/library/nvidia-nemotron-3-nano/"
X Link 2026-01-28T17:51Z [----] followers, [----] engagements
"Our CEO and co-founder @tuhinone sat down with Axios to discuss how we're using our latest funding to build an inference-native cloud that owns the full inference-data-eval-RL loop and why our recent acquisition of Parsed is just the beginning as we continue to pursue aligned talent and capabilities. Full interview here: https://www.axios.com/pro/enterprise-software-deals/2026/01/29/baseten-acquisitions-ai-inference https://www.axios.com/pro/enterprise-software-deals/2026/01/29/baseten-acquisitions-ai-inference"
X Link 2026-02-02T18:11Z [----] followers, [---] engagements
"Who wants to take the 30b parameter Alpaca model for a ride Announcement coming tomorrow"
X Link 2023-03-20T01:55Z [----] followers, [----] engagements
"π Were really excited to be announcing BaseTen today. BaseTen is the fastest way to build applications powered by machine learning. Check it out yourself https://www.baseten.co/blog https://www.baseten.co/blog"
X Link 2021-05-20T16:00Z [----] followers, [---] engagements
"We're excited to announce that we've raised a $40M Series B to help power the next generation of AI-native products with performant reliable and scalable inference infrastructure. https://www.baseten.co/blog/announcing-our-series-b/ https://www.baseten.co/blog/announcing-our-series-b/"
X Link 2024-03-04T16:01Z [----] followers, 82.7K engagements
"We're excited to introduce our new Engine Builder for TensorRT-LLM π Same great @nvidia TensorRT-LLM performance90% less effort. Check out our launch post to learn more: Or @philip_kiely's full video: We often use TensorRT-LLM to support custom models for teams like @Get_Writer. For their latest industry-leading Palmyra LLMs TensorRT-LLM inference engines deployed on Baseten achieved 60% higher tokens per second. We've used TensorRT-LLM to achieve results including: π 3x better throughput π 40% lowertime to first token π 35% lowercost per million tokens While TensorRT-LLM is incredibly"
X Link 2024-08-01T16:30Z [----] followers, [----] engagements
"Another week another model drop Voxtral was released last week and you can now deploy it on Baseten. Transcription workloads are our bread and butter here at Baseten. Weve built a specific runtime for transcription workloads which now powers Voxtral"
X Link 2025-07-22T22:49Z [----] followers, [----] engagements
"Only a handful of models dominated the ASR spaceuntil now. Voxtral has a 30-minute transcription range a 40-minute range for understanding plus built-in function calling for voice out of the box. @thealexker breaks down the technical details"
X Link 2025-07-23T19:22Z [----] followers, [----] engagements
"Forget AI writing your code. AI can now control your home through voice. Weve had a blast putting Voxtral through the paces this week. Mistrals new model delivers see for yourself on Baseten. Our latest blog dives into whats so unique (and powerful) about Voxtral and how you can deploy it to build production-grade apps. https://twitter.com/i/web/status/1948519816312357326 https://twitter.com/i/web/status/1948519816312357326"
X Link 2025-07-24T23:05Z [----] followers, [----] engagements
"We're thrilled to welcome Joey Zwicker as our new Head of Forward Deployed Engineering We've grown rapidly over the last few years and we're excited to have Joey lead the team into our next phase. We're hiring FDEs everywhere -- if you're interested reach out"
X Link 2025-08-11T20:37Z [----] followers, 10.3K engagements
"Thanks @NVIDIAAI for inviting us to Dynamo Day We're active users of Dynamo iterating on it in production for performance gains like 50% lower TTFT and 34% lower TPOT and regularly shipping our work back to the community. Read some of our highlights from Dynamo Day and working with NVIDIA Dynamo here: https://www.baseten.co/blog/nvidia-dynamo-day-baseten-inference-stack/ https://www.baseten.co/blog/nvidia-dynamo-day-baseten-inference-stack/"
X Link 2026-02-03T17:38Z [----] followers, [----] engagements
"The best OpenClawπ¦ setup is fully open-source. Kimi K2.5 on Baseten outperforms Opus [---] on agentic benchmarks at 8x lower cost. Faster inference same or better quality. Set up in [--] minutes here: https://www.baseten.co/blog/openclaw-kimi-k2-5-on-baseten-frontier-agent-performance-with-oss/ https://www.baseten.co/blog/openclaw-kimi-k2-5-on-baseten-frontier-agent-performance-with-oss/"
X Link 2026-02-04T20:00Z [----] followers, [----] engagements
"Continuing this week with a case study β How did @sullyai return 30M+ clinical minutes to doctors By ditching closed-source models for a high-performance open-source stack on Baseten. Like many companies Sully faced inference challenges as they scaled with ballooning proprietary model costs and unpredictable latency. This was especially critical in Sully's case: in a live clinical setting a 70-second wait is an eternity. To solve this challenge we worked together to move to open-source models like GPT OSS 120b. With the Baseten inference stack Sully was live on NVIDIA HGX B200s just [--] days"
X Link 2026-02-10T17:03Z [----] followers, [----] engagements
"We replicated Microsoft Research's Generative Adversarial Distillation (GAD) to distill Qwen3-4B from GPT-5.2. Standard black-box distillation teaches a student to copy teacher outputs but at inference the student generates from its own prefixes small errors compound and it drifts off the expert distribution. GAD reframes this as an on-policy distillation problem training a co-evolving discriminator that provides adaptive reward signals on the student's own generations. Exploring methods like this are how our post-training team surfaces new training patterns. Read here:"
X Link 2026-02-13T19:01Z [----] followers, [---] engagements
"We replicated Microsoft Research's Generative Adversarial Distillation (GAD) to distill Qwen3-4B from GPT-5.2. Standard black-box distillation teaches a student to copy teacher outputs but at inference the student generates from its own prefixes small errors compound and it drifts off the expert distribution. GAD reframes this as an on-policy distillation problem training a co-evolving discriminator that provides adaptive reward signals on the student's own generations. Exploring methods like this are how our post-training team surfaces new training patterns. Read here:"
X Link 2026-02-13T19:27Z [----] followers, [--] engagements
"Were thrilled to introduce Chains a framework for building multi-component AI workflows on Baseten βπ Chains enables users to build complex workflows as modular services in simple Python http://x.com/i/article/1805620705716801538 http://x.com/i/article/1805620705716801538"
X Link 2024-06-27T16:28Z [----] followers, 23.9K engagements
"Welcome to Baseten @DannieHerz Were thrilled to announce that Dannie Herzberg has joined as our new President to lead Basetens GTM and operations. As @tuhinone shared: "Dannie is biased towards action dependable and long-term in her thinking and she knows that the customer experience is everything." Heres to building the next chapter of Baseten with you Dannie Read more from Tuhin about Dannie here https://www.baseten.co/blog/welcoming-dannie-herzberg-to-baseten/ https://www.baseten.co/blog/welcoming-dannie-herzberg-to-baseten/"
X Link 2025-08-27T22:02Z [----] followers, 97.2K engagements
"LLMs are amnesiacs. Once context fills up they forget everything. To fight this means grappling with a core question: how do you update a neural network without breaking what it already knows In this piece @charles0neill and @part_harry_ argue that continual learning is inseparable from specialization. While there are various ideas to allow generalist models to learn everything without forgetting anything these ideas are fundamentally in tension with continual learning in general. What comes after monolith models A Cambrian explosion of specialists. Read more here:"
X Link 2026-02-06T17:52Z [----] followers, [----] engagements
Limited data mode. Full metrics available with subscription: lunarcrush.com/pricing
/creator/x::basetenco