# @basetenco Baseten

Baseten posts on X most often about inference, AI, and Kimi. They currently have [-----] followers and [---] posts still getting attention, totaling [-----] engagements in the last [--] hours.

### Engagements: [-----] [#](/creator/twitter::1375579341178818561/interactions)

- [--] Week [-------] -91%
- [--] Month [---------] +17,259%
- [--] Months [---------] +629%
- [--] Year [---------] +826%

### Mentions: [--] [#](/creator/twitter::1375579341178818561/posts_active)

- [--] Week [--] +58%
- [--] Month [--] +760%
- [--] Months [---] +94%
- [--] Year [---] +149%

### Followers: [-----] [#](/creator/twitter::1375579341178818561/followers)

- [--] Week [-----] +0.95%
- [--] Month [-----] +12%
- [--] Months [-----] +52%
- [--] Year [-----] +163%

### CreatorRank: [---------] [#](/creator/twitter::1375579341178818561/influencer_rank)

### Social Influence

**Social category influence**

[technology brands](/list/technology-brands) 23%, [stocks](/list/stocks) 19%, [finance](/list/finance) 5%, [vc firms](/list/vc-firms) 3%, [cryptocurrencies](/list/cryptocurrencies) 2%, [travel destinations](/list/travel-destinations) 1%, [social networks](/list/social-networks) 1%, [countries](/list/countries) 1%

**Social topic influence**

[inference](/topic/inference) #193, [ai](/topic/ai) 11%, [in the](/topic/in-the) 5%, [kimi](/topic/kimi) #506, [realtime](/topic/realtime) #1832, [if you](/topic/if-you) 4%, [llm](/topic/llm) 4%, [ceo](/topic/ceo) 4%, [microsoft](/topic/microsoft) 3%, [generative](/topic/generative) 3%

**Top accounts mentioned or mentioned by**

[@nvidia](/creator/undefined), [@tuhinone](/creator/undefined), [@artificialanlys](/creator/undefined), [@greylockvc](/creator/undefined), [@nvidiaaidev](/creator/undefined), [@omar_or_ahmed](/creator/undefined), [@koylanai](/creator/undefined), [@dannieherz](/creator/undefined), [@madisonkanna](/creator/undefined), [@oxenai](/creator/undefined), [@gregschoeninger](/creator/undefined), [@charles0neill](/creator/undefined),
[@thealexker](/creator/undefined), [@getwriter](/creator/undefined), [@conviction](/creator/undefined), [@01advisors](/creator/undefined), [@ivp](/creator/undefined), [@sparkcapital](/creator/undefined), [@amiruci](/creator/undefined), [@philipkielys](/creator/undefined)

**Top assets mentioned**

[Microsoft Corp. (MSFT)](/topic/microsoft), [Cogito Finance (CGV)](/topic/cogito), [Alphabet Inc Class A (GOOGL)](/topic/$googl)

### Top Social Posts

Top posts by engagements in the last [--] hours

"Another week, another model drop. Voxtral was released last week, and you can now deploy it on Baseten. Transcription workloads are our bread and butter here at Baseten. We've built a specific runtime for transcription workloads, which now powers Voxtral" [X Link](https://x.com/basetenco/status/1947791177886863683) 2025-07-22T22:49Z [----] followers, [----] engagements

"@koylanai We love Sully. Thank you for trusting us with your inference" [X Link](https://x.com/basetenco/status/2022115530833113473) 2026-02-13T01:08Z [----] followers, [--] engagements

"We replicated Microsoft Research's Generative Adversarial Distillation (GAD) to distill Qwen3-4B from GPT-5.2. Standard black-box distillation teaches a student to copy teacher outputs, but at inference the student generates from its own prefixes; small errors compound and it drifts off the expert distribution. GAD reframes this as an on-policy distillation problem, training a co-evolving discriminator that provides adaptive reward signals on the student's own generations. Exploring methods like this is how our post-training team surfaces new training patterns. Read here:" [X Link](https://x.com/basetenco/status/2022385713602609427) 2026-02-13T19:01Z [----] followers, [---] engagements

"We replicated Microsoft Research's Generative Adversarial Distillation (GAD) to distill Qwen3-4B from GPT-5.2.
Standard black-box distillation teaches a student to copy teacher outputs, but at inference the student generates from its own prefixes; small errors compound and it drifts off the expert distribution. GAD reframes this as an on-policy distillation problem, training a co-evolving discriminator that provides adaptive reward signals on the student's own generations. Exploring methods like this is how our post-training team surfaces new training patterns. Read here:" [X Link](https://x.com/basetenco/status/2022392260386861210) 2026-02-13T19:27Z [----] followers, [--] engagements

"Welcome to Baseten @DannieHerz! We're thrilled to announce that Dannie Herzberg has joined as our new President to lead Baseten's GTM and operations. As @tuhinone shared: "Dannie is biased towards action, dependable, and long-term in her thinking, and she knows that the customer experience is everything." Here's to building the next chapter of Baseten with you, Dannie. Read more from Tuhin about Dannie here: https://www.baseten.co/blog/welcoming-dannie-herzberg-to-baseten/" [X Link](https://x.com/basetenco/status/1960825166264721862) 2025-08-27T22:02Z [----] followers, 97.2K engagements

"We boosted acceptance rate by up to 40% with the Baseten Speculation Engine. How? By combining Multi-Token Prediction (MTP) with Suffix Automaton (SA) decoding. This hybrid approach crushes production coding workloads, delivering 30%+ longer acceptance lengths on code editing tasks with zero added overhead. An open source version for TensorRT-LLM is now available to the community.
Read the full engineering deep dive: https://www.baseten.co/blog/boosting-mtp-acceptance-rates-in-baseten-speculation-engine/" [X Link](https://x.com/basetenco/status/2016235945662808433) 2026-01-27T19:44Z [----] followers, 13.5K engagements

"RT @koylanai: I've never been this excited for a mission Solving the healthcare crisis by freeing clinicians from routine tasks using mu" [X Link](https://x.com/basetenco/status/2022115604174651606) 2026-02-13T01:08Z [----] followers, [--] engagements

"Introducing Kimi K2.5 on Baseten's Model APIs, with the most performant TTFT (0.26 sec) and TPS (340) on Artificial Analysis. Even among a landscape of incredible open source models, Kimi K2.5 stands out with its multi-modal capabilities and its ability to accommodate an alarmingly large number of tool calls. Get the good stuff here: https://www.baseten.co/library/kimi-k25/" [X Link](https://x.com/anyuser/status/2021243980802031900) 2026-02-10T15:24Z [----] followers, 13.7K engagements

"Following up on yesterday's release: How did we build the fastest Kimi K2.5 inference? Custom EAGLE-3 speculator trained on a synthetic query dataset; INT4 to NVFP4 conversion to unlock Blackwell inference. Get the technical details: https://www.baseten.co/blog/how-we-built-the-fastest-kimi-k2-5-on-artificial-analysis/ Introducing Kimi K2.5 on Baseten's Model APIs, with the most performant TTFT (0.26 sec) and TPS (340) on Artificial Analysis.
Even among a landscape of incredible open source models, Kimi K2.5 stands out with its multi-modal capabilities and its ability to accommodate an" [X Link](https://x.com/anyuser/status/2021609618972020822) 2026-02-11T15:37Z [----] followers, [----] engagements

"Ready to cook? New training recipe in the Baseten ML Cookbook: GLM [---] and [---] Flash. Fine-tune the leading multimodal LLMs, which are great for customer-facing chatbots, real-time coding assistants, and more. What you get: config.py for infra setup, run.sh for training. Easily plug in with HF datasets and just launch the command: truss train push. GLM [---] recipe: GLM [---] Flash recipe here: How to cook: https://github.com/basetenlabs/ml-cookbook/tree/main#prerequisites https://github.com/basetenlabs/ml-cookbook/tree/main/examples/glm-4.7-flash-msswift/training" [X Link](https://x.com/basetenco/status/2018781931115356327) 2026-02-03T20:21Z [----] followers, [---] engagements

"MARS-Flash is now available on Baseten. If you know Baseten, you know we're obsessed with speed. Enter MARS-Flash. MARS-Flash is a TTS model from @useCamb_AI built for low-latency, real-time voice agents and assistants. The Camb.ai team built the MARS8 family of models to solve the specific pain points of TTS workloads, with models built for each specific use case. Get access here: https://www.baseten.co/library/mars8-flash/" [X Link](https://x.com/anyuser/status/2019123796892950725) 2026-02-04T19:00Z [----] followers, [---] engagements

"LLMs display human-like behavior, with Karpathy once describing them as stochastic "people spirits." This makes them notoriously hard to benchmark, leading most teams to skip this step entirely and ship a poorly optimized model. We wrote the two-part series on performance benchmarking we wish existed when we started. In this first chapter, learn how to run InferenceMAX and build your own benchmark tailored to your workload.
Read it here: https://www.baseten.co/blog/how-to-run-llm-performance-benchmarks-and-why-you-should/" [X Link](https://x.com/anyuser/status/2019440923479023959) 2026-02-05T16:00Z [----] followers, [---] engagements

"What's the connection between LLM fine-tuning and Hollywood? It turns out that there are many, from VFX tooling to branding and advertising. @Madisonkanna sits down with @oxen_ai founder and CEO @gregschoeninger to discuss dataset version control infrastructure, fine-tuning, and deploying custom models end-to-end. Read the interview here: http://baseten.co/blog/fine-tuning-models-ai-and-hollywood-a-conversation-with-oxen-s-founder-greg" [X Link](https://x.com/basetenco/status/2019456028346348014) 2026-02-05T17:00Z [----] followers, [---] engagements

"LLMs are amnesiacs. Once context fills up, they forget everything. To fight this means grappling with a core question: how do you update a neural network without breaking what it already knows? In this piece, @charles0neill and @part_harry_ argue that continual learning is inseparable from specialization. While there are various ideas to allow generalist models to learn everything without forgetting anything, these ideas are fundamentally in tension with continual learning in general. What comes after monolith models? A Cambrian explosion of specialists. Read more here:" [X Link](https://x.com/anyuser/status/2019831540709257325) 2026-02-06T17:52Z [----] followers, [----] engagements

"RL often throws away useful signal at intermediate steps, or as @karpathy put it, it's like "sucking supervision through a straw." MiniMax M2.5 solves this with per-token process rewards. The result is frontier coding performance at least 1/10th the cost of closed source. @thealexker breaks down how this mechanism works and how M2.5 excels in general knowledge work.
Read about it here: https://www.baseten.co/blog/minimax-m2-5-intelligence-too-cheap-to-meter-rl-process-rewards-real-world-produc/" [X Link](https://x.com/basetenco/status/2022456010049495213) 2026-02-13T23:41Z [----] followers, [----] engagements

"We're thrilled to announce that we have raised $300M at a $5B valuation. The round is led by IVP and CapitalG, both doubling down on their investment in Baseten, and joined by 01A, Altimeter, Battery Ventures, BOND, BoxGroup, Blackbird Ventures, Conviction, Greylock, and NVIDIA. Read more here: https://www.baseten.co/blog/announcing-baseten-s-300m-series-e/" [X Link](https://x.com/anyuser/status/2014755013344792595) 2026-01-23T17:40Z [----] followers, 266.2K engagements

"Thanks @NVIDIAAI for inviting us to Dynamo Day! We're active users of Dynamo, iterating on it in production for performance gains like 50% lower TTFT and 34% lower TPOT, and regularly shipping our work back to the community. Read some of our highlights from Dynamo Day and working with NVIDIA Dynamo here: https://www.baseten.co/blog/nvidia-dynamo-day-baseten-inference-stack/" [X Link](https://x.com/anyuser/status/2018740972658864598) 2026-02-03T17:38Z [----] followers, [----] engagements

"The best OpenClaw setup is fully open-source. Kimi K2.5 on Baseten outperforms Opus [---] on agentic benchmarks at 8x lower cost. Faster inference, same or better quality. Set up in [--] minutes here: https://www.baseten.co/blog/openclaw-kimi-k2-5-on-baseten-frontier-agent-performance-with-oss/" [X Link](https://x.com/anyuser/status/2019138898245611617) 2026-02-04T20:00Z [----] followers, [----] engagements

"We're living in the era of metric obsession. (How is your sleep score after Super Bowl weekend?) Now your metric obsession can extend to AI workloads.
As model quality converges, performance during inference has become a key differentiator. While AI users now expect fast responses by default, it can be challenging to make sense of different benchmarks. Our latest blog post aims to give you a quick download on AI model performance metrics. Namely: What is the difference between Time to First Token (TTFT), Tokens Per Second (TPS), and End-to-End Latency (E2E)? Why benchmarks can be misleading. How" [X Link](https://x.com/anyuser/status/2020936990334779815) 2026-02-09T19:05Z [----] followers, [----] engagements

"Continuing this week with a case study: How did @sullyai return 30M+ clinical minutes to doctors? By ditching closed-source models for a high-performance open-source stack on Baseten. Like many companies, Sully faced inference challenges as they scaled, with ballooning proprietary model costs and unpredictable latency. This was especially critical in Sully's case: in a live clinical setting, a 70-second wait is an eternity. To solve this challenge, we worked together to move to open-source models like GPT OSS 120b. With the Baseten inference stack, Sully was live on NVIDIA HGX B200s just [--] days" [X Link](https://x.com/basetenco/status/2021268765141545080) 2026-02-10T17:03Z [----] followers, [----] engagements

"Sully.ai is transforming healthcare efficiency with Baseten's Model APIs, running frontier open models like gpt-oss-120b. We delivered a 10x cost reduction and 65% faster responses for clinical note generation, thanks to the Baseten Inference Stack. Under the hood, it leverages NVFP4, components of TensorRT-LLM and Dynamo, and the Baseten Speculation Engine, all running on @NVIDIA Blackwell GPUs. The result: 30+ million minutes returned to physicians. It means more time for doctor-patient conversations and less for paperwork.
Read the blog to learn more" [X Link](https://x.com/anyuser/status/2021985164776419437) 2026-02-12T16:30Z [----] followers, [----] engagements

"We replicated Microsoft Research's Generative Adversarial Distillation (GAD) to distill Qwen3-4B from GPT-5.2. Standard black-box distillation teaches a student to copy teacher outputs, but at inference the student generates from its own prefixes; small errors compound and it drifts off the expert distribution. GAD reframes this as an on-policy distillation problem, training a co-evolving discriminator that provides adaptive reward signals on the student's own generations. Exploring methods like this is how our post-training team surfaces new training patterns. Read here:" [X Link](https://x.com/anyuser/status/2022393468405035422) 2026-02-13T19:32Z [----] followers, [----] engagements

"Just because it's a federal holiday doesn't mean we're slacking. MiniMax M2.5 is live on our Model APIs. Try it here: https://baseten.co/library/minimax-m2-5/" [X Link](https://x.com/anyuser/status/2023451899815686261) 2026-02-16T17:38Z [----] followers, [----] engagements

"If you're building with DeepSeek, this is your roadmap to performant, reliable, cost-efficient inference. Read our full guide here: https://www.baseten.co/resources/guide/the-complete-deepseek-model-guide/" [X Link](https://x.com/basetenco/status/1960537282055721093) 2025-08-27T02:58Z [----] followers, [---] engagements

"Our team met Parsed a few months ago, and we could not be more excited to see the inflection point they are a part of: customized models built for those high-impact jobs. This is an incredible team, and we're thrilled to power their inference. Congrats @parsedlabs. Let's build. Today we're launching Parsed. We are incredibly lucky to live in a world where we stand on the shoulders of giants, first in science and now in AI.
Our heroes have gotten us to this point where we have brilliant general intelligence in our pocket. But this is a local minima. We https://t.co/R7cR3EGVHT" [X Link](https://x.com/basetenco/status/1961113518348145128) 2025-08-28T17:07Z [----] followers, [----] engagements

"We're excited to announce Fall into Inference: a multi-month deep dive into our cloud ecosystem and how we use Multi-cloud Capacity Management (MCM) to power fast, reliable inference at scale. Over the next few months we'll showcase how we use MCM to power real-world AI use cases with partners including Google Cloud, Amazon Web Services (AWS), OCI, CoreWeave, Nebius, Vultr, and NVIDIA. Stay tuned for weekly technical blogs, case studies, and deep dives with our partners" [X Link](https://x.com/basetenco/status/1963290554286154187) 2025-09-03T17:18Z [----] followers, [----] engagements

"We raised a $150M Series D! Thank you to all of our customers who trust us to power their inference. We're grateful to work with incredible companies like @Get_Writer @zeddotdev @clay_gtm @trymirage @AbridgeHQ @EvidenceOpen @MeetGamma @Sourcegraph and @usebland. This round was led by @bondcap with @jaysimons joining our Board. We're also thrilled to welcome @conviction and @CapitalG to the round, alongside support from @01Advisors @IVP @sparkcapital @GreylockVC @ScribbleVC @BoxGroup and Premji Invest." [X Link](https://x.com/basetenco/status/1963981711647379653) 2025-09-05T15:05Z [----] followers, 19.5K engagements

"AI everywhere = Inference everywhere = Baseten everywhere. IN NEWS: @basetenco raises a $150M Series D round. @tuhinone (Founder & CEO, Baseten) on the future of inference: "I think the token price goes down and inference should get cheaper over time. And that really just means there is going to be more inference." https://t.co/oKplA7BIOY" [X Link](https://x.com/basetenco/status/1964098651107532909) 2025-09-05T22:49Z [----] followers, [----] engagements

"We just raised a $150M Series D, and we're growing! If you're looking for your next opportunity, take a look at our 30+ open roles across engineering and GTM" [X Link](https://x.com/basetenco/status/1965160538188775741) 2025-09-08T21:09Z [----] followers, [---] engagements

"@Alibaba_Qwen (Gated) Attention is all you need. Excited to offer both Qwen3-Next models on dedicated deployments backed by 4xH100 GPUs. https://app.baseten.co/deploy/qwen_3_next_80B_A3_thinking https://www.baseten.co/library/qwen3-next-80b-a3b-instruct/" [X Link](https://x.com/basetenco/status/1966224960223158768) 2025-09-11T19:38Z [----] followers, [----] engagements

"Qwen3 Next 80B A3B Thinking outperforms higher-cost and closed models like Gemini [---] Flash Thinking on benchmarks, nearing Qwen's flagship model quality at a fraction of the size. We have it ready to deploy in our model library, running on @nvidia and the Baseten Inference Stack" [X Link](https://x.com/basetenco/status/1967688601640288288) 2025-09-15T20:34Z [----] followers, [----] engagements

"The key is having good intuition, being willing to go out on a limb, building fast, learning fast, and killing things when you need to.
Following our Series D raise, our Co-founder and CTO @amiruci walks through why he bet early on inference, how we're scaling through generative model hypergrowth, and his advice for fellow founders" [X Link](https://x.com/basetenco/status/1968009140896497950) 2025-09-16T17:48Z [----] followers, [----] engagements

"We'll be at SigSum SF this Thursday, Sept [--]. Catch: @philip_kiely's talk "Inference Engineering for Hypergrowth" (1 PM) and @tuhinone on the panel "Breaking, Building, and Betting on AI" (3:30 PM). Visit us in the partner showcase to grab an "Artificially Intelligent" tee" [X Link](https://x.com/basetenco/status/1970255818563207200) 2025-09-22T22:36Z [----] followers, [---] engagements

"@rohanpaul_ai someone needs to run the inference and make it fast. we can help with that" [X Link](https://x.com/basetenco/status/1970621011075936433) 2025-09-23T22:47Z [----] followers, [----] engagements

"We're hosting our friends at @OpenRouterAI for a SF Tech Week breakfast talk. Join us at Baseten HQ on October [--] at 10AM for "Learnings from processing [--] Trillion Tokens"" [X Link](https://x.com/basetenco/status/1972792659912830988) 2025-09-29T22:36Z [----] followers, [---] engagements

"The team at OpenRouter will dive into: closed vs. open model adoption, global usage trends from running inference at massive scale, and tool calling & pricing shifts. Seats are limited. Save yours here: https://partiful.com/e/q6l1SeDtPGU9kCQPArk6" [X Link](https://x.com/basetenco/status/1972792662295240964) 2025-09-29T22:36Z [----] followers, [---] engagements

"From document processing and image recognition to drug discovery, healthcare use cases are at the forefront of AI adoption. We partner with teams like Vultr to support these applications with fast, reliable inference.
With Multi-cloud Capacity Management and the Baseten Inference Stack, we power near-limitless scale for healthcare AI teams on NVIDIA Blackwell GPUs" [X Link](https://x.com/basetenco/status/1973485892607062479) 2025-10-01T20:31Z [----] followers, [---] engagements

"Embeddings power search, RecSys, and agents, but making them performant in production requires satisfying two different traffic profiles. In our new guide, we cover how to build embedding workflows that are both extremely high-throughput and low-latency, from indexing millions of data points to serving individual search queries in milliseconds" [X Link](https://x.com/basetenco/status/1973840655001792785) 2025-10-02T20:00Z [----] followers, [---] engagements

"Read it here: https://www.baseten.co/resources/guide/high-performance-embedding-model-inference" [X Link](https://x.com/basetenco/status/1973840656541130821) 2025-10-02T20:00Z [----] followers, [---] engagements

"Being fast for one customer isn't enough. Low-latency inference at scale requires the ability to recruit every GPU in the world" [X Link](https://x.com/basetenco/status/1975635330201264518) 2025-10-07T18:52Z [----] followers, [---] engagements

"Fast models for our fast friends at Factory. Deploy and serve custom models with enterprise-grade infrastructure on @basetenco. Special promo for Factory users: receive $500 Model API credits when you fill out this form. https://t.co/UI8NqfACDY" [X Link](https://x.com/basetenco/status/1975692789838192955) 2025-10-07T22:40Z [----] followers, [----] engagements

"Register here: https://events.redis.io/redis-released-london-2025" [X Link](https://x.com/basetenco/status/1975953278832976009) 2025-10-08T15:55Z [----] followers, [---] engagements

"We're excited to launch Meta's Llama [--] in our model library in both 8B and 70B. The newly introduced Llama [--] is a significant improvement over Llama [--], with increased tokens and reduced false refusal rates. These models deliver unparalleled performance, showcasing significant advancements in efficiency and speed. Our Llama [--] 8B runs on A100s and Llama [--] 70B runs on H100s, optimized for production. https://twitter.com/i/web/status/1781072277850714184" [X Link](https://x.com/basetenco/status/1781072277850714184) 2024-04-18T21:28Z [----] followers, 131.7K engagements

"Meet the Baseten team at the @aiDotEngineer Summit in NYC this week: Booth G3. Get a demo and grab some swag" [X Link](https://x.com/basetenco/status/1891991000244982045) 2025-02-18T23:19Z [----] followers, [---] engagements

"@IVP @sparkcapital @GreylockVC @conviction @basecasevc @southpkcommons @Lachy @01Advisors https://www.baseten.co/blog/announcing-baseten-75m-series-c/" [X Link](https://x.com/basetenco/status/1892259288673865781) 2025-02-19T17:05Z [----] followers, [----] engagements

"Friendly reminder from @willreed_21 (Spark Capital): Your team's time is best spent on your product, not the infrastructure it runs on" [X Link](https://x.com/basetenco/status/1937939408919126239) 2025-06-25T18:22Z [----] followers, [---] engagements

"If you're in London, catch Rachel Rapp with our friends from Tavily and cognee at Redis Released.
From building and deploying the fastest agentic systems to industry trends, they'll break down what the agentic tech stack looks like in a live panel this Thursday" [X Link](https://x.com/basetenco/status/1975953274454130750) 2025-10-08T15:55Z [----] followers, [---] engagements

"We caught up with the one and only @thdxr on Opencode's newly launched Zen and his hot takes: "Zen isn't a for-profit thing. This is something we try to do at breakeven. As we grow, we pool all of our resources together and negotiate discounted rates with providers. These cost savings flow back right down to everyone"" [X Link](https://x.com/basetenco/status/1976732163619070461) 2025-10-10T19:30Z [----] followers, [----] engagements

"From sketch to a 3D model in under [--] seconds with a 1B parameter model. We built a flower card generator using Autodesk's WaLa open-source AI and Baseten for scalable GPU deployment" [X Link](https://x.com/basetenco/status/1977851561842717041) 2025-10-13T21:38Z [----] followers, [---] engagements

"Fast Company named Baseten one of the [--] Next Big Things in Tech [----]. We're proud to be recognized for powering the fastest and most reliable inference for the fastest-growing AI companies like Abridge, Clay, OpenEvidence, and many more" [X Link](https://x.com/basetenco/status/1978175416750772634) 2025-10-14T19:05Z [----] followers, [----] engagements

"Powering inference for the fastest growing AI companies like OpenEvidence, Writer, and Clay means being the first to use bleeding-edge model performance tooling in production. That's why we were early adopters of NVIDIA Dynamo, giving us 50% lower latency and 60%+ higher throughput with KV cache-aware routing.
These results are the tip of the iceberg, especially for our customers running large models with large context windows under heavy load" [X Link](https://x.com/basetenco/status/1978883986924634551) 2025-10-16T18:01Z [----] followers, [----] engagements

"See the benchmarks in our blog by @aqaderb @feilsystem and @rapprach: https://www.baseten.co/blog/how-baseten-achieved-2x-faster-inference-with-nvidia-dynamo/#how-baseten-uses-nvidia-dynamo" [X Link](https://x.com/basetenco/status/1978883989269307812) 2025-10-16T18:01Z [----] followers, [---] engagements

"We unleashed our model performance team on GLM [---], and we're very excited to be the fastest provider available today on Artificial Analysis at [---] TPS (2x the next best TPS) and a less than [--] second TTFT (2x the next best TTFT). https://artificialanalysis.ai/models/glm-4-6-reasoning/providers" [X Link](https://x.com/basetenco/status/1979299403828806053) 2025-10-17T21:32Z [----] followers, 11.2K engagements

"We see the massive AWS outage. The Baseten web app is down, but inference, new deploys, training jobs, and the model management APIs are unaffected" [X Link](https://x.com/basetenco/status/1980191414031138868) 2025-10-20T08:36Z [----] followers, [----] engagements

"@DannieHerz @jeffbarg @ClayRunHQ The clay slackmoji in the b10 slack has been getting a lot of play recently" [X Link](https://x.com/basetenco/status/1981484896720978350) 2025-10-23T22:16Z [----] followers, [--] engagements

"DeepSeek-OCR stunned the internet this week with 10x more efficient compression, unlocking faster and cheaper intelligence. We rolled out performant inference support on day one of the model drop.
Learn why compressions are so effective at making models smarter, what applications you can build with DeepSeek-OCR, and how to serve it on Baseten in under [--] minutes. Link in the replies" [X Link](https://x.com/basetenco/status/1981513042010489305) 2025-10-24T00:08Z [----] followers, [----] engagements

"This week, Baseten's model performance team unlocked the fastest TPS and TTFT for gpt-oss 120b on @nvidia hardware. When gpt-oss launched, we sprinted to offer it at [---] TPS. Now we've exceeded [---] TPS and [----] sec TTFT, and we'll keep working to raise the bar. We are proud to offer the best E2E latency available, with near-limitless scale, incredible performance, and the highest uptime: 99.99%" [X Link](https://x.com/basetenco/status/1981757270053494806) 2025-10-24T16:18Z [----] followers, 42.4K engagements

"We are so excited to be a launch partner for @nvidia Nemotron Nano [--] VL today and offer day-zero support for this highly accurate and efficient vision language model, alongside other models in the Nemotron family. To learn more, read our blog here: https://www.baseten.co/blog/high-performance-agents-for-financial-services-with-nvidia-nemotron-on-baseten/" [X Link](https://x.com/basetenco/status/1983243273171845596) 2025-10-28T18:43Z [----] followers, [----] engagements

"After months of feedback from our early customers and thousands of jobs completed, Baseten Training is officially ready for everyone.
Access compute on demand, train any model, run multi-node jobs, and deploy from checkpoints, with cache-aware scheduling, an ML Cookbook, tool calling recipes, and more" [X Link](https://x.com/basetenco/status/1983958807353934180) 2025-10-30T18:06Z [----] followers, 13.7K engagements

"@james_weitzman @athleticKoder" [X Link](https://x.com/basetenco/status/1985491806004666611) 2025-11-03T23:38Z [----] followers, [--] engagements

"Fun fact: we asked people to describe their favorite agent in SF. We got suggestions for a bunch of new agentic apps to try. Our favorite agent? Probably James Bond. If you're living in the new world of agentic AI, check out our new deep dive on tool calling in inference. Check out the blog in the comments" [X Link](https://x.com/basetenco/status/1986211245268111827) 2025-11-05T23:17Z [----] followers, [---] engagements

"Blog: https://www.baseten.co/blog/tool-calling-in-inference/?utm_source=twitter&utm_medium=social&utm_campaign=education_tool_calling_blog_2025-11-05" [X Link](https://x.com/basetenco/status/1986211301782397437) 2025-11-05T23:17Z [----] followers, [---] engagements

"Congratulations to @Kimi_Moonshot on their newest model drop, Kimi-K2 Thinking, one of the world's most advanced open source models. Baseten is proud to offer Day [--] Support. Sign up with your business email address and get $100 in Model API credits" [X Link](https://x.com/basetenco/status/1986821080800190753) 2025-11-07T15:40Z [----] followers, 270.4K engagements

"Heading to KubeCon next week? Come visit the team at Booth #631 to test your AI knowledge.
Top of the leaderboard gets prizes" [X Link](https://x.com/basetenco/status/1986915753434718571) 2025-11-07T21:56Z [----] followers, [---] engagements "Excited to share this piece from @VentureBeat spotlighting how Baseten is redefining the AI infrastructure game: "Baseten takes on hyperscalers with new AI training platform that lets you own your model weights." Thanks, VentureBeat. Read the full article: https://venturebeat.com/ai/baseten-takes-on-hyperscalers-with-new-ai-training-platform-that-lets-you" [X Link](https://x.com/basetenco/status/1987943307532476746) 2025-11-10T17:59Z [----] followers, [----] engagements "At @KubeCon_ Swing by Booth #631 to test your inference knowledge and earn some swag" [X Link](https://x.com/basetenco/status/1988322193479192921) 2025-11-11T19:05Z [----] followers, [---] engagements "Congrats to the World Labs team on the launch today! Marble lets you create 3D worlds from just a single image, text prompt, video, or 3D layout. We couldn't be more excited to power the inference behind this. Can't wait to see what everyone makes. Introducing Marble by World Labs: a foundation for a spatially intelligent future. Create your world at https://t.co/V267VJu1H9 https://t.co/T00mtETmCA" [X Link](https://x.com/basetenco/status/1988662949083566349) 2025-11-12T17:39Z [----] followers, [----] engagements "Welcome to the new-age Defense Against the Dark Arts. It's called fast inference (& Harry Potter would be jealous). Check out our deep dive on how the Baseten wizards (model performance team) optimized Kimi K2 Thinking (now faster and just as smart as GPT-5).
https://www.baseten.co/blog/kimi-k2-thinking-at-140-tps-on-nvidia-blackwell/utm_source=twitter&utm_medium=social&utm_campaign=awareness_kimi-k2-thinking_performance_blog_2025-11-12" [X Link](https://x.com/basetenco/status/1988710905706680760) 2025-11-12T20:50Z [----] followers, [----] engagements "Baseten used @nvidia Dynamo to double inference speed for long-context code generation and increased throughput by 1.6x. Dynamo simplifies multi-node inference on Kubernetes, helping us scale deployments while reducing costs. Read the full blog post below. NVIDIA Dynamo is now available across major cloud providers, including @awscloud @googlecloud @Azure and @OracleCloud, to enable efficient multi-node inference on Kubernetes in the cloud. And it's already delivering results: @basetenco is seeing faster, more cost-effective https://t.co/6efirNmK3r" [X Link](https://x.com/basetenco/status/1989058852789317717) 2025-11-13T19:52Z [----] followers, [----] engagements "Working with the @GammaApp team never quite feels like work, and that's how their product feels: "Criminally fun." We are honored to be long-term partners and power Gamma's inference needs as they push the envelope on how we present ideas. Congratulations on the Series B" [X Link](https://x.com/basetenco/status/1989091556201218127) 2025-11-13T22:02Z [----] followers, 22.8K engagements "Shoutout to the incredible team at @oxen_ai Turning datasets into deployed models like it's light work. They build fast. We help them ship even faster. Thanks for the partnership @gregschoeninger Check out the story in the comments #AI #MLOps #Baseten" [X Link](https://x.com/basetenco/status/1990894920106680832) 2025-11-18T21:28Z [----] followers, [----] engagements "@drishanarora Congratulations on the launch! Excited to support with dedicated deployments of Cogito V2.1.
https://www.baseten.co/library/cogito-v2-1-671b/" [X Link](https://x.com/basetenco/status/1991208966362140871) 2025-11-19T18:16Z [----] followers, [----] engagements "Congrats to our friends at Deep Cogito on launching the most powerful US-based OSS model. It turns out LLM self-play produces shorter reasoning chains (low token consumption) while maintaining great performance. Try it out on Baseten today: https://www.baseten.co/library/cogito-v2-1-671b/ Today we are releasing the best open-weight LLM by a US company: Cogito v2.1 671B. On most industry benchmarks and our internal evals the model performs competitively with frontier closed and open models while being ahead of any US open model (such as the best versions of https://t.co/F6eZnn8s2Q" [X Link](https://x.com/basetenco/status/1991249966958841950) 2025-11-19T20:59Z [----] followers, [----] engagements "If you need an adrenaline rush to wake up from your post-Thanksgiving stupor, we got you. @deepseek_ai V3.2 dropped this week and is now available on Baseten. It's so smart your mother will ask why you can't be more like DeepSeek. V3.2 is currently on par with GPT-5, all whilst being multiples cheaper. V3.2 is now live on our Model APIs and on @OpenRouterAI and @ArtificialAnlys. Baseten is the fastest provider with [----] TTFT and [---] TPS (that's 1.5x faster than the next guy). For a model this size, it's screaming. Get the brains without trading off performance" [X Link](https://x.com/basetenco/status/1996623218040254793) 2025-12-04T16:50Z [----] followers, [----] engagements "We're excited to partner with @getstream_io to help developers build fast, production-ready Vision Agents. Together we combined Baseten-hosted Qwen3-VL with Stream's real-time voice and video to create an Electronics Setup & Repair Assistant that can see, understand, and guide users in real time.
Check out the full walkthrough and demo below. Vision Agents just got better: @basetenco joins us to take multimodal capabilities even further. Our team worked together to create a guide on running models hosted on Baseten with Vision Agents. Check out our blog post where we use Baseten + Qwen 3-VL" [X Link](https://x.com/basetenco/status/1997024685192610238) 2025-12-05T19:26Z [----] followers, [----] engagements ""We want people to own their own intelligence and we now see a really straight shot to get there." @amiruci sits down with @mudithj_ and @charles0neill from the Parsed team. Check out the full fireside chat in the comments" [X Link](https://x.com/basetenco/status/1999240802992562624) 2025-12-11T22:12Z [----] followers, [----] engagements "Inference performance isn't just about the model. It relies on the entire inference stack. In our Inference Stack white paper, we explain how Baseten uses @nvidia TensorRT-LLM and Dynamo to reduce latency and increase throughput across model modalities. If you care about speed, this is worth reading.
https://www.baseten.co/resources/guide/the-baseten-inference-stack" [X Link](https://x.com/basetenco/status/2009721846795546952) 2026-01-09T20:20Z [----] followers, [----] engagements "We're thrilled to introduce the fastest, most accurate, and cost-efficient Whisper-powered transcription and diarization on the market: [----] RTF with Whisper Large V3 Turbo; streaming transcription with consistent low latency; the most accurate real-time diarization; 90% lower cost due to infra optimizations. Used in production by companies like @NotionHQ https://twitter.com/i/web/status/2012203547912245366" [X Link](https://x.com/basetenco/status/2012203547912245366) 2026-01-16T16:41Z [----] followers, [----] engagements "Want to learn how to run high-performance LLM inference at scale? Our Head of DevRel @philipkiely has the perfect talk for you during NVIDIA Dynamo Day on Jan [--]. Register here: https://nvevents.nvidia.com/dynamodayi=RNQf_gN5cXcdmLzfj_IevFS-tdC553CY" [X Link](https://x.com/basetenco/status/2013694085681553469) 2026-01-20T19:24Z [----] followers, [----] engagements "We're thrilled to be working with @LangChain to power the fastest way to generate production-ready agents without code. LangChain's Agent Builder represents a way for non-technical knowledge workers and citizen developers to build useful things with AI, all with the Baseten inference backbone and GLM [---]. We've written a tutorial for you to create your own in minutes" [X Link](https://x.com/basetenco/status/2014025036806627794) 2026-01-21T17:19Z [----] followers, [----] engagements "Tired of waiting for video generation? Say less. We've optimized the Wan [---] runtime to hit: 3x faster inference on NVIDIA Blackwell, 2.5x faster on Hopper, and a 67% cost reduction.
Read the full breakdown of our kernel optimizations and benchmarks here: https://www.baseten.co/blog/wan-2-2-video-generation-in-less-than-60-seconds/#benchmarking-methodology" [X Link](https://x.com/basetenco/status/2014337303330926736) 2026-01-22T14:00Z [----] followers, [----] engagements "@lucas_dehaas so if people are wondering why there are baseten stickers tagged across sf they know you're the one to blame" [X Link](https://x.com/basetenco/status/2014776605860855930) 2026-01-23T19:05Z [----] followers, [---] engagements "@adambain grateful for your support and we're still so early" [X Link](https://x.com/basetenco/status/2014796459909251154) 2026-01-23T20:24Z [----] followers, [---] engagements "LIVE: Tune in to hear @tuhinone discuss our Series E, open source, and the multi-model future on CNBC. A Chinese AI model is having a real coding moment, and not just in China. Zhipu says its coding agent users are concentrated in the US and China. @tuhinone, CEO of @basetenco, joins me on the back of his latest fundraise to discuss what's hype and what's real" [X Link](https://x.com/basetenco/status/2015868931928686780) 2026-01-26T19:26Z [----] followers, [----] engagements "Who wants to take the 30b parameter Alpaca model for a ride? Announcement coming tomorrow" [X Link](https://x.com/basetenco/status/1637633905527492610) 2023-03-20T01:55Z [----] followers, [----] engagements "Only a handful of models dominated the ASR space. Until now. Voxtral has a 30-minute transcription range, a 40-minute range for understanding, plus built-in function calling for voice out of the box.
@thealexker breaks down the technical details" [X Link](https://x.com/basetenco/status/1948101370894073980) 2025-07-23T19:22Z [----] followers, [----] engagements "Forget AI writing your code. AI can now control your home through voice. We've had a blast putting Voxtral through its paces this week. Mistral's new model delivers; see for yourself on Baseten. Our latest blog dives into what's so unique (and powerful) about Voxtral and how you can deploy it to build production-grade apps. https://twitter.com/i/web/status/1948519816312357326" [X Link](https://x.com/basetenco/status/1948519816312357326) 2025-07-24T23:05Z [----] followers, [----] engagements ""the best application layer companies set up the harness and how to use it for the problem that your user is trying to solve"" [X Link](https://x.com/basetenco/status/2014855297240797277) 2026-01-24T00:18Z [----] followers, [----] engagements "Thank you @BloombergTV for having our CEO and co-founder @tuhinone and day [--] investor @saranormous yesterday to discuss our latest fundraise, the bet we're making with inference, and how we're powering customers. Full interview here: https://www.bloomberg.com/news/videos/2026-01-26/ai-startup-baseten-raises-300-million-video" [X Link](https://x.com/basetenco/status/2016260125481435532) 2026-01-27T21:20Z [----] followers, [----] engagements "Nemotron [--] Nano NVFP4 is now available on Baseten + NVIDIA B200: BF16-level accuracy, up to [--] higher throughput vs FP8, and faster inference powered by QAD + Blackwell, running on the Baseten Inference Stack.
https://www.baseten.co/library/nvidia-nemotron-3-nano/" [X Link](https://x.com/basetenco/status/2016569749635994028) 2026-01-28T17:51Z [----] followers, [----] engagements "Our CEO and co-founder @tuhinone sat down with Axios to discuss how we're using our latest funding to build an inference-native cloud that owns the full inference-data-eval-RL loop, and why our recent acquisition of Parsed is just the beginning as we continue to pursue aligned talent and capabilities. Full interview here: https://www.axios.com/pro/enterprise-software-deals/2026/01/29/baseten-acquisitions-ai-inference" [X Link](https://x.com/basetenco/status/2018386781927075897) 2026-02-02T18:11Z [----] followers, [---] engagements "RT @NVIDIAAIDev: Thank you @basetenco for being an engaged and impactful contributor in the NVIDIA Dynamo ecosystem. By running Dynamo" [X Link](https://x.com/basetenco/status/2019829175000142299) 2026-02-06T17:42Z [----] followers, [--] engagements "We're really excited to be announcing BaseTen today. BaseTen is the fastest way to build applications powered by machine learning. Check it out yourself: https://www.baseten.co/blog" [X Link](https://x.com/anyuser/status/1395409150297874437) 2021-05-20T16:00Z [----] followers, [---] engagements "Introducing Baseten Chains. We're thrilled to introduce Chains, a framework for building multi-component AI workflows on Baseten. Chains enables users to build complex workflows as modular services in simple Python http://x.com/i/article/1805620705716801538" [X Link](https://x.com/anyuser/status/1806364068598432129) 2024-06-27T16:28Z [----] followers, 23.9K engagements "We're excited to introduce our new Engine Builder for TensorRT-LLM: the same great @nvidia TensorRT-LLM performance, with 90% less effort.
Check out our launch post to learn more, or @philip_kiely's full video. We often use TensorRT-LLM to support custom models for teams like @Get_Writer. For their latest industry-leading Palmyra LLMs, TensorRT-LLM inference engines deployed on Baseten achieved 60% higher tokens per second. We've used TensorRT-LLM to achieve results including: 3x better throughput, 40% lower time to first token, and 35% lower cost per million tokens. While TensorRT-LLM is incredibly" [X Link](https://x.com/anyuser/status/1819048091451859238) 2024-08-01T16:30Z [----] followers, [----] engagements "Just in time for the new year! Awesome job by our model performance team to hit the top of @ArtificialAnlys for GLM [---]. Try it here: https://www.baseten.co/library/glm-4-7/ happy holidays we just dropped the fastest GLM 4.7: 400+ TPS as benchmarked by @ArtificialAnlys https://t.co/eRv47ok1sV" [X Link](https://x.com/basetenco/status/2005373945143590985) 2025-12-28T20:23Z [----] followers, [----] engagements "We're excited to announce that we've raised a $40M Series B to help power the next generation of AI-native products with performant, reliable, and scalable inference infrastructure. https://www.baseten.co/blog/announcing-our-series-b/" [X Link](https://x.com/basetenco/status/1764682602198216931) 2024-03-04T16:01Z [----] followers, 82.7K engagements "We're thrilled to welcome Joey Zwicker as our new Head of Forward Deployed Engineering. We've grown rapidly over the last few years and we're excited to have Joey lead the team into our next phase.
We're hiring FDEs everywhere -- if you're interested reach out" [X Link](https://x.com/basetenco/status/1955005622749106426) 2025-08-11T20:37Z [----] followers, 10.3K engagements "Nobody knows what inference means but it's provocative" [X Link](https://x.com/basetenco/status/1895489142693429630) 2025-02-28T15:00Z [----] followers, 497.8K engagements "RT @DylanAbruscato: If you were a guest on TBPN your logo will air during the Super Bowl. Here's the final frame of the ad" [X Link](https://x.com/basetenco/status/2019097423151587684) 2026-02-04T17:15Z [----] followers, [--] engagements
Top posts by engagements in the last [--] hours
"Another week another model drop Voxtral was released last week and you can now deploy it on Baseten. Transcription workloads are our bread and butter here at Baseten. Weve built a specific runtime for transcription workloads which now powers Voxtral"
X Link 2025-07-22T22:49Z [----] followers, [----] engagements
"@koylanai We love Sully Thank you for trusting us with your inference"
X Link 2026-02-13T01:08Z [----] followers, [--] engagements
"We replicated Microsoft Research's Generative Adversarial Distillation (GAD) to distill Qwen3-4B from GPT-5.2. Standard black-box distillation teaches a student to copy teacher outputs but at inference the student generates from its own prefixes small errors compound and it drifts off the expert distribution. GAD reframes this as an on-policy distillation problem training a co-evolving discriminator that provides adaptive reward signals on the student's own generations. Exploring methods like this are how our post-training team surfaces new training patterns. Read here:"
X Link 2026-02-13T19:01Z [----] followers, [---] engagements
"We replicated Microsoft Research's Generative Adversarial Distillation (GAD) to distill Qwen3-4B from GPT-5.2. Standard black-box distillation teaches a student to copy teacher outputs but at inference the student generates from its own prefixes small errors compound and it drifts off the expert distribution. GAD reframes this as an on-policy distillation problem training a co-evolving discriminator that provides adaptive reward signals on the student's own generations. Exploring methods like this are how our post-training team surfaces new training patterns. Read here:"
X Link 2026-02-13T19:27Z [----] followers, [--] engagements
"Welcome to Baseten @DannieHerz Were thrilled to announce that Dannie Herzberg has joined as our new President to lead Basetens GTM and operations. As @tuhinone shared: "Dannie is biased towards action dependable and long-term in her thinking and she knows that the customer experience is everything." Heres to building the next chapter of Baseten with you Dannie Read more from Tuhin about Dannie here https://www.baseten.co/blog/welcoming-dannie-herzberg-to-baseten/ https://www.baseten.co/blog/welcoming-dannie-herzberg-to-baseten/"
X Link 2025-08-27T22:02Z [----] followers, 97.2K engagements
"We boosted acceptance rate by up to 40% with the Baseten Speculation Engine. How By combining Multi-Token Prediction (MTP) with Suffix Automaton (SA) decoding. This hybrid approach crushes production coding workloads delivering 30%+ longer acceptance lengths on code editing tasks with zero added overhead. An open source version for TensorRT-LLM is now available to the community. Read the full engineering deep dive: https://www.baseten.co/blog/boosting-mtp-acceptance-rates-in-baseten-speculation-engine/ https://www.baseten.co/blog/boosting-mtp-acceptance-rates-in-baseten-speculation-engine/"
X Link 2026-01-27T19:44Z [----] followers, 13.5K engagements
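The draft-and-verify idea behind suffix-based speculation can be illustrated with a toy. This is a sketch, not the Baseten Speculation Engine: here a cheap drafter proposes tokens by matching the longest repeated suffix of the context (a stand-in for the suffix-automaton component; the real engine pairs it with MTP draft heads), and the target model verifies them left to right. All function names are illustrative.

```python
def suffix_draft(context, k):
    # Find the longest suffix of `context` that re-occurs earlier, and
    # propose the k tokens that followed it at that earlier position.
    for n in range(min(8, len(context) - 1), 0, -1):
        pat = context[-n:]
        for i in range(len(context) - n - 1, -1, -1):
            if context[i:i + n] == pat:
                return context[i + n:i + n + k]
    return []

def speculative_step(context, target_next, k=4):
    # Verify drafted tokens left to right against the target model;
    # keep the accepted prefix plus one corrected/extra target token.
    draft = suffix_draft(context, k)
    accepted = []
    for tok in draft:
        if tok == target_next(context + accepted):
            accepted.append(tok)
        else:
            break
    accepted.append(target_next(context + accepted))
    return accepted

# Toy "target model": deterministically repeats the pattern a, b, c.
pattern = ["a", "b", "c"]
def target_next(seq):
    return pattern[len(seq) % 3]

ctx = ["a", "b", "c", "a", "b", "c", "a"]
out = speculative_step(ctx, target_next)
print(out)  # ['b', 'c', 'a', 'b'] -- 3 drafted tokens accepted + 1 target token
```

Repetitive contexts (like code edits) give long suffix matches and therefore long acceptance lengths, which is why this style of drafting shines on coding workloads.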
"RT @koylanai: I've never been this excited for a mission Solving the healthcare crisis by freeing clinicians from routine tasks using mu"
X Link 2026-02-13T01:08Z [----] followers, [--] engagements
"Introducing Kimi K2.5 on Basetens Model APIs with the most performant TTFT (0.26 sec) and TPS (340) on Artificial Analysis. Even among a landscape of incredible open source models Kimi K2.5 stands out with its multi-modal capabilities and it's ability to accommodate an alarmingly large number of tool calls. Get the good stuff here: https://www.baseten.co/library/kimi-k25/ https://www.baseten.co/library/kimi-k25/"
X Link 2026-02-10T15:24Z [----] followers, 13.7K engagements
"Following up on yesterday's release π¨ How did we build the fastest Kimi K2.5 inference Custom EAGLE-3 speculator trained on synthetic query dataset INT4 to NVFP4 conversion to unlock Blackwell inference Get the technical details: https://www.baseten.co/blog/how-we-built-the-fastest-kimi-k2-5-on-artificial-analysis/ Introducing Kimi K2.5 on Basetens Model APIs with the most performant TTFT (0.26 sec) and TPS (340) on Artificial Analysis. Even among a landscape of incredible open source models Kimi K2.5 stands out with its multi-modal capabilities and it's ability to accommodate an"
X Link 2026-02-11T15:37Z [----] followers, [----] engagements
"Ready to cook π³ New training recipe in the Baseten ML Cookbook: GLM [---] and [---] Flash Fine tune the leading multimodal LLMs which are great for customer-facing chatbots real-time coding assistants and more. What you get: config.py for infra setup run.sh for training Easily plug in with HF datasets and just launch the command: truss train push GLM [---] recipe: GLM [---] Flash recipe here: How to cook: https://github.com/basetenlabs/ml-cookbook/tree/main#prerequisites https://github.com/basetenlabs/ml-cookbook/tree/main/examples/glm-4.7-flash-msswift/training"
X Link 2026-02-03T20:21Z [----] followers, [---] engagements
"MARS-Flash is now available on Baseten. If you know Baseten you know were obsessed with speed. Enter MARS-Flash. MARS-Flash is a TTS model from @useCamb_AI built for low latency real-time voice agents and assistants. The Camb.ai team built the MARS8 family of models to solve the specific pain points of TTS workloads with models built for each specific use case. Get access here: https://www.baseten.co/library/mars8-flash/ https://www.baseten.co/library/mars8-flash/"
X Link 2026-02-04T19:00Z [----] followers, [---] engagements
"LLMs display human-like behavior with Karpathy once describing them as stochastic "people spirits." This makes them notoriously hard to benchmark leading most teams to skip this step entirely and ship a poorly optimized model. We wrote a two-part series on performance benchmarking we wish existed when we started. In this first chapter learn how to run InferenceMAX and build your own benchmark tailored to your workload. Read it here: https://www.baseten.co/blog/how-to-run-llm-performance-benchmarks-and-why-you-should/"
X Link 2026-02-05T16:00Z [----] followers, [---] engagements
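The core of a "benchmark tailored to your workload" is small: time requests on your own prompts, discard warmup runs, and report percentiles rather than a single mean. A minimal sketch (illustrative only; InferenceMAX itself is a separate tool, and `request_fn` stands in for whatever client call you use):

```python
import statistics
import time

def percentile(samples, q):
    # Nearest-rank percentile over sorted samples.
    s = sorted(samples)
    idx = min(len(s) - 1, max(0, round(q / 100 * (len(s) - 1))))
    return s[idx]

def benchmark(request_fn, prompts, warmup=2):
    # Warm up first so cold-start effects don't skew the numbers,
    # then time each request on the workload's own prompts.
    for p in prompts[:warmup]:
        request_fn(p)
    latencies = []
    for p in prompts:
        t0 = time.perf_counter()
        request_fn(p)
        latencies.append(time.perf_counter() - t0)
    return {
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "mean": statistics.mean(latencies),
    }

# Example with a stand-in request function (replace with a real client call).
stats = benchmark(lambda prompt: prompt.upper(), ["hello"] * 20)
print(sorted(stats))  # ['mean', 'p50', 'p95']
```

Reporting p95 alongside p50 matters because tail latency, not the average, is what users feel under load.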
"Whats the connection between LLM fine-tuning and Hollywood It turns out that there are many from VFX tooling to branding and advertising. @Madisonkanna sits down with @oxen_ai founder and CEO @gregschoeninger to discuss dataset version control infrastructure fine-tuning and deploying custom models end-to-end. Read the interview here: http://baseten.co/blog/fine-tuning-models-ai-and-hollywood-a-conversation-with-oxen-s-founder-greg http://baseten.co/blog/fine-tuning-models-ai-and-hollywood-a-conversation-with-oxen-s-founder-greg"
X Link 2026-02-05T17:00Z [----] followers, [---] engagements
"LLMs are amnesiacs. Once context fills up they forget everything. To fight this means grappling with a core question: how do you update a neural network without breaking what it already knows In this piece @charles0neill and @part_harry_ argue that continual learning is inseparable from specialization. While there are various ideas to allow generalist models to learn everything without forgetting anything these ideas are fundamentally in tension with continual learning in general. What comes after monolith models A Cambrian explosion of specialists. Read more here:"
X Link 2026-02-06T17:52Z [----] followers, [----] engagements
"RL often throws away useful signal at intermediate steps or as @karpathy put it it's like "sucking supervision through a straw." MiniMax M2.5 solves this with per-token process rewards. The result is frontier coding performance at least 1/10th the cost of closed source. @thealexker breaks down how this mechanism works and how M2.5 excels in general knowledge work. Read about it here: https://www.baseten.co/blog/minimax-m2-5-intelligence-too-cheap-to-meter-rl-process-rewards-real-world-produc/"
X Link 2026-02-13T23:41Z [----] followers, [----] engagements
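The contrast the post draws, one outcome reward smeared over a whole trajectory versus dense per-step rewards, can be made concrete with a few lines. This is a generic credit-assignment sketch, not MiniMax's actual reward scheme:

```python
def outcome_returns(num_steps, final_reward):
    # Outcome-only RL: every step inherits the same trajectory-level
    # signal -- supervision "sucked through a straw."
    return [final_reward] * num_steps

def process_returns(step_rewards, gamma=0.9):
    # Per-step process rewards: each step gets its own signal plus the
    # discounted value of what follows (reward-to-go), computed backwards.
    returns, g = [], 0.0
    for r in reversed(step_rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# A trajectory where step 2 helped (+1) and the final step hurt (-1):
sparse = outcome_returns(4, -1.0)               # every step blamed equally
dense = process_returns([0.0, 1.0, 0.0, -1.0])  # helpful step still credited
print(sparse)
print([round(x, 3) for x in dense])
```

With only the outcome signal, the helpful intermediate step is penalized along with everything else; with process rewards, its return stays positive, which is the denser learning signal the post is pointing at.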
"Were thrilled to announce that we have raised $300M at a $5B valuation. The round is led by IVP and CapitalG both doubling down on their investment in Baseten and joined by 01A Altimeter Battery Ventures BOND BoxGroup Blackbird Ventures Conviction Greylock and NVIDIA. Read more here: https://www.baseten.co/blog/announcing-baseten-s-300m-series-e/ https://www.baseten.co/blog/announcing-baseten-s-300m-series-e/"
X Link 2026-01-23T17:40Z [----] followers, 266.2K engagements
"Thanks @NVIDIAAI for inviting us to Dynamo Day We're active users of Dynamo iterating on it in production for performance gains like 50% lower TTFT and 34% lower TPOT and regularly shipping our work back to the community. Read some of our highlights from Dynamo Day and working with NVIDIA Dynamo here: https://www.baseten.co/blog/nvidia-dynamo-day-baseten-inference-stack/ https://www.baseten.co/blog/nvidia-dynamo-day-baseten-inference-stack/"
X Link 2026-02-03T17:38Z [----] followers, [----] engagements
"The best OpenClawπ¦ setup is fully open-source. Kimi K2.5 on Baseten outperforms Opus [---] on agentic benchmarks at 8x lower cost. Faster inference same or better quality. Set up in [--] minutes here: https://www.baseten.co/blog/openclaw-kimi-k2-5-on-baseten-frontier-agent-performance-with-oss/ https://www.baseten.co/blog/openclaw-kimi-k2-5-on-baseten-frontier-agent-performance-with-oss/"
X Link 2026-02-04T20:00Z [----] followers, [----] engagements
"We're living in the era of metric obsession. (How is your sleep score after Super Bowl weekend) π Now your metric obsession can extend to AI workloads. As model quality converges performance during inference has become a key differentiator. While AI users now expect fast responses by default it can be challenging to make sense of different benchmarks. Our latest blog post aims to give you a quick download on AI model performance metrics. Namely: What is the difference between Time to First Token (TTFT) Tokens Per Second (TPS) and End-to-End Latency (E2E) Why benchmarks can be misleading How"
X Link 2026-02-09T19:05Z [----] followers, [----] engagements
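The three metrics the post names are easy to make concrete, assuming you have client-side timestamps for each emitted token (with t=0 at request send). A minimal sketch:

```python
def latency_metrics(token_times):
    # token_times: wall-clock timestamps (seconds) of each streamed token,
    # measured from the moment the request was sent.
    ttft = token_times[0]        # Time To First Token: prefill + first decode
    e2e = token_times[-1]        # End-to-End latency: full response time
    # TPS is conventionally measured over the decode phase, i.e. tokens
    # generated after the first one divided by the time they took.
    decode_time = token_times[-1] - token_times[0]
    tps = (len(token_times) - 1) / decode_time if decode_time > 0 else 0.0
    return ttft, tps, e2e

# Five tokens: first arrives at 0.2 s, then one every 10 ms.
ttft, tps, e2e = latency_metrics([0.2, 0.21, 0.22, 0.23, 0.24])
print(ttft, round(tps), e2e)  # 0.2 100 0.24
```

Note how the same stream can look "fast" or "slow" depending on which number you quote: a low TTFT with a modest TPS feels snappy for chat, while batch workloads care mostly about E2E, which is one reason single-metric benchmarks mislead.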
"Continuing this week with a case study β How did @sullyai return 30M+ clinical minutes to doctors By ditching closed-source models for a high-performance open-source stack on Baseten. Like many companies Sully faced inference challenges as they scaled with ballooning proprietary model costs and unpredictable latency. This was especially critical in Sully's case: in a live clinical setting a 70-second wait is an eternity. To solve this challenge we worked together to move to open-source models like GPT OSS 120b. With the Baseten inference stack Sully was live on NVIDIA HGX B200s just [--] days"
X Link 2026-02-10T17:03Z [----] followers, [----] engagements
"Sully.ai is transforming healthcare efficiency with Basetens Model APIs running frontier open models like gpt-oss-120b. We delivered a 10x cost reduction and 65% faster responses for clinical note generation thanks to the Baseten Inference Stack. Under the hood it leverages NVFP4 components of TensorRT-LLM and Dynamo and the Baseten Speculation Engineall running on @NVIDIA Blackwell GPUs. The result: 30+ million minutes returned to physicians. It means more time for doctor-patient conversations and less for paperwork. Read the blog to learn more"
X Link 2026-02-12T16:30Z [----] followers, [----] engagements
"We replicated Microsoft Research's Generative Adversarial Distillation (GAD) to distill Qwen3-4B from GPT-5.2. Standard black-box distillation teaches a student to copy teacher outputs but at inference the student generates from its own prefixes small errors compound and it drifts off the expert distribution. GAD reframes this as an on-policy distillation problem training a co-evolving discriminator that provides adaptive reward signals on the student's own generations. Exploring methods like this are how our post-training team surfaces new training patterns. Read here:"
X Link 2026-02-13T19:32Z [----] followers, [----] engagements
"Just because it's a federal holiday doesn't mean we're slacking. MiniMax M2.5 is live on our Model APIs. Try it here: https://baseten.co/library/minimax-m2-5/ https://baseten.co/library/minimax-m2-5/"
X Link 2026-02-16T17:38Z [----] followers, [----] engagements
"If youre building with DeepSeek this is your roadmap to performant reliable cost-efficient inference. Read our full guide here: https://www.baseten.co/resources/guide/the-complete-deepseek-model-guide/ https://www.baseten.co/resources/guide/the-complete-deepseek-model-guide/"
X Link 2025-08-27T02:58Z [----] followers, [---] engagements
"Our team met Parsed a few months ago and we could not be more excited to see the inflection point they are a part of - customized models built for those high impact jobs. This is an incredible team and we're thrilled to power their inference. Congrats @parsedlabs. Let's build. Today were launching Parsed. We are incredibly lucky to live in a world where we stand on the shoulders of giants first in science and now in AI. Our heroes have gotten us to this point where we have brilliant general intelligence in our pocket. But this is a local minima. We https://t.co/R7cR3EGVHT Today were launching"
X Link 2025-08-28T17:07Z [----] followers, [----] engagements
"We're excited to announce Fall into Inference: a multi-month deep dive into our cloud ecosystem and how we use Multi-cloud Capacity Management (MCM) to power fast reliable inference at scale. Over the next few months well showcase how we use MCM to power real-world AI use cases with partners including Google Cloud Amazon Web Services (AWS) OCI CoreWeave Nebius Vultr and NVIDIA. Stay tuned for weekly technical blogs case studies and deep dives with our partners"
X Link 2025-09-03T17:18Z [----] followers, [----] engagements
"We raised a $150M Series D Thank you to all of our customers who trust us to power their inference. We're grateful to work with incredible companies like @Get_Writer @zeddotdev @clay_gtm @trymirage @AbridgeHQ @EvidenceOpen @MeetGamma @Sourcegraph and @usebland. This round was led by @bondcap with @jaysimons joining our Board. We're also thrilled to welcome @conviction and @CapitalG to the round alongside support from @01Advisors @IVP @sparkcapital @GreylockVC @ScribbleVC @BoxGroup and Premji Invest. Today were excited to announce our $150M Series D led by BOND with Jay Simons joining our"
X Link 2025-09-05T15:05Z [----] followers, 19.5K engagements
"AI everywhere = Inference everywhere = Baseten everywhere IN NEWS: @basetenco raises a $150M series D round. @tuhinone (Founder & CEO Baseten) on the future of inference: I think the token price goes down and inference should get cheaper over time. And that really just means there is going to be more inference. Every time we https://t.co/oKplA7BIOY IN NEWS: @basetenco raises a $150M series D round. @tuhinone (Founder & CEO Baseten) on the future of inference: I think the token price goes down and inference should get cheaper over time. And that really just means there is going to be more"
X Link 2025-09-05T22:49Z [----] followers, [----] engagements
"We just raised a $150M Series D and were growing If you're looking for your next opportunity take a look at our 30+ open roles across engineering and GTM"
X Link 2025-09-08T21:09Z [----] followers, [---] engagements
"@Alibaba_Qwen (Gated) Attention is all you need. Excited to offer both Qwen3-Next models on dedicated deployments backed by 4xH100 GPUs. https://app.baseten.co/deploy/qwen_3_next_80B_A3_thinking https://www.baseten.co/library/qwen3-next-80b-a3b-instruct/ https://app.baseten.co/deploy/qwen_3_next_80B_A3_thinking https://www.baseten.co/library/qwen3-next-80b-a3b-instruct/"
X Link 2025-09-11T19:38Z [----] followers, [----] engagements
"Qwen3 Next 80B A3B Thinking outperforms higher-cost and closed models like Gemini [---] Flash Thinking on benchmarks nearing Qwen's flagship model quality at a fraction the size. We have it ready to deploy in our model library running on @nvidia and the Baseten Inference Stack"
X Link 2025-09-15T20:34Z [----] followers, [----] engagements
"The key is having good intuition being willing to go out on a limb building fast learning fast and killing things when you need to. Following our Series D raise our Co-founder and CTO @amiruci walks through why he bet early on inference how were scaling through generative model hypergrowth and his advice for fellow founders"
X Link 2025-09-16T17:48Z [----] followers, [----] engagements
"Well be at SigSum SF this Thursday Sept [--] Catch: - @philip_kiely's talk "Inference Engineering for Hypergrowth" (1 PM) - @tuhinone on the panel "Breaking Building and Betting on AI" (3:30 PM) Visit us in the partner showcase to grab an "Artificially Intelligent" tee"
X Link 2025-09-22T22:36Z [----] followers, [---] engagements
"@rohanpaul_ai someone needs to run the inference and make it fast. we can help with that"
X Link 2025-09-23T22:47Z [----] followers, [----] engagements
"Were hosting our friends at @OpenRouterAI for a SF Tech Week breakfast talk Join us at Baseten HQ on October [--] at 10AM for Learnings from processing [--] Trillion Tokens"
X Link 2025-09-29T22:36Z [----] followers, [---] engagements
"The team at OpenRouter will dive into: Closed vs. open model adoption Global usage trends from running inference at massive scale Tool calling & pricing shifts Seats are limited. Save yours here: https://partiful.com/e/q6l1SeDtPGU9kCQPArk6 https://partiful.com/e/q6l1SeDtPGU9kCQPArk6"
X Link 2025-09-29T22:36Z [----] followers, [---] engagements
"From document processing and image recognition to drug discovery healthcare use cases are at the forefront of AI adoption. We partner with teams like Vultr to support these applications with fast reliable inference. With Multi-cloud Capacity Management and theBaseten Inference Stack we power near-limitless scale for healthcare AI teams on NVIDIA Blackwell GPUs"
X Link 2025-10-01T20:31Z [----] followers, [---] engagements
"Embeddings power search RecSys and agents but making them performant in production requires satisfying two different traffic profiles. In our new guide we cover how to build embedding workflows that are both extremely high-throughput and low-latency from indexing millions of data points to serving individual search queries in milliseconds"
X Link 2025-10-02T20:00Z [----] followers, [---] engagements
"Read it here: https://www.baseten.co/resources/guide/high-performance-embedding-model-inference https://www.baseten.co/resources/guide/high-performance-embedding-model-inference"
X Link 2025-10-02T20:00Z [----] followers, [---] engagements
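The two traffic profiles the embedding guide above describes can be sketched in a few lines of Python. This is an illustrative sketch, not Baseten's API: `embed_batch` is a hypothetical stand-in for any embedding endpoint, and the point is simply that bulk indexing favors large batches for throughput while query serving favors immediate single calls for latency.

```python
from typing import Callable, Iterator

Vector = list[float]

def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield fixed-size batches for high-throughput indexing."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def index_corpus(
    docs: list[str],
    embed_batch: Callable[[list[str]], list[Vector]],
    batch_size: int = 64,
) -> list[Vector]:
    """Throughput profile: embed a large corpus in big batches to keep
    GPU utilization high; per-request latency is irrelevant here."""
    vectors: list[Vector] = []
    for batch in chunked(docs, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors

def serve_query(query: str, embed_batch: Callable[[list[str]], list[Vector]]) -> Vector:
    """Latency profile: embed one query immediately with no batching
    delay, so a search result can come back in milliseconds."""
    return embed_batch([query])[0]
```

With a real client, `embed_batch` would wrap the provider's embeddings call; here any function with that shape works, which is what makes the two profiles easy to test independently.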
"Being fast for one customer isn't enough. Low-latency inference at scale requires the ability to recruit every GPU in the world"
X Link 2025-10-07T18:52Z [----] followers, [---] engagements
"Fast models for our fast friends at Factory Deploy and serve custom models with enterprise-grade infrastructure on @basetenco. Special promo for Factory users: receive $500 Model API credits when you fill out this form. https://t.co/UI8NqfACDY Deploy and serve custom models with enterprise-grade infrastructure on @basetenco. Special promo for Factory users: receive $500 Model API credits when you fill out this form. https://t.co/UI8NqfACDY"
X Link 2025-10-07T22:40Z [----] followers, [----] engagements
"Register here: https://events.redis.io/redis-released-london-2025 https://events.redis.io/redis-released-london-2025"
X Link 2025-10-08T15:55Z [----] followers, [---] engagements
"Were excited to launch Metas Llama [--] in our model library in both 8B and 70B π The newly introduced Llama [--] is a significant improvement over Llama [--] with increased tokens and reduced false refusal rates. These models deliver unparalleled performance showcasing significant advancements in efficiency and speed. Our Llama [--] 8B runs on A100s and Llama [--] 70B runs on H100s optimized for production. https://twitter.com/i/web/status/1781072277850714184 https://twitter.com/i/web/status/1781072277850714184"
X Link 2024-04-18T21:28Z [----] followers, 131.7K engagements
"Meet the Baseten team at the @aiDotEngineer Summit in NYC this week π Booth G3 get a demo and grab some swag"
X Link 2025-02-18T23:19Z [----] followers, [---] engagements
"@IVP @sparkcapital @GreylockVC @conviction @basecasevc @southpkcommons @Lachy @01Advisors https://www.baseten.co/blog/announcing-baseten-75m-series-c/ https://www.baseten.co/blog/announcing-baseten-75m-series-c/"
X Link 2025-02-19T17:05Z [----] followers, [----] engagements
"Friendly reminder from @willreed_21 (Spark Capital): Your team's time is best spent on your product not the infrastructure it runs on"
X Link 2025-06-25T18:22Z [----] followers, [---] engagements
"If you're in London catch Rachel Rapp with our friends from Tavily and cognee at Redis Released. From building and deploying the fastest agentic systems to industry trends they'll break down what the agentic tech stack looks like in a live panel this Thursday"
X Link 2025-10-08T15:55Z [----] followers, [---] engagements
"We caught up with the one and only @thdxr on Opencode's newly launched Zen and his hot takes Zen isnt a for-profit thing. This is something we try to do at breakeven. As we grow we pool all of our resources together and negotiate discounted rates with providers. These cost savings flow back right down to everyone"
X Link 2025-10-10T19:30Z [----] followers, [----] engagements
"From sketch to a 3D model in under [--] seconds with a 1B parameter model We built a flower card generator using Autodesks WaLa open-source AI and Baseten for scalable GPU deployment"
X Link 2025-10-13T21:38Z [----] followers, [---] engagements
"Fast Company named Baseten one of the [--] Next Big Things in Tech [----] Were proud to be recognized for powering the fastest and most reliable inference for the fastest-growing AI companies like Abridge Clay OpenEvidence and many more"
X Link 2025-10-14T19:05Z [----] followers, [----] engagements
"Powering inference for the fastest growing AI companies like OpenEvidence Writer and Clay means being the first to use bleeding-edge model performance tooling in production. That's why we were early adopters of NVIDIA Dynamo giving us 50% lower latency and 60%+ higher throughput with KV cache-aware routing. These results are the tip of the iceberg especially for our customers running large models with large context windows under heavy load"
X Link 2025-10-16T18:01Z [----] followers, [----] engagements
"See the benchmarks in our blog by @aqaderb @feilsystem and @rapprach: https://www.baseten.co/blog/how-baseten-achieved-2x-faster-inference-with-nvidia-dynamo/#how-baseten-uses-nvidia-dynamo https://www.baseten.co/blog/how-baseten-achieved-2x-faster-inference-with-nvidia-dynamo/#how-baseten-uses-nvidia-dynamo"
X Link 2025-10-16T18:01Z [----] followers, [---] engagements
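KV cache-aware routing, the Dynamo feature the benchmarks above credit, can be illustrated with a toy router: send each request to the replica that already holds the longest matching prompt prefix in its cache, so prefill work can be reused, and break ties toward the least-loaded replica. This is our own minimal sketch of the idea, not Dynamo's implementation; worker state here is just an in-memory list of previously seen token sequences.

```python
def common_prefix_len(a: list[int], b: list[int]) -> int:
    """Length of the shared token prefix between two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

class ToyCacheAwareRouter:
    """Route each request to the worker whose cached prompts overlap
    the incoming prompt the most, reusing prefill where possible."""

    def __init__(self, num_workers: int) -> None:
        self.caches: list[list[list[int]]] = [[] for _ in range(num_workers)]
        self.load: list[int] = [0] * num_workers

    def route(self, prompt_tokens: list[int]) -> int:
        best_worker, best_overlap = 0, -1
        for w, cache in enumerate(self.caches):
            overlap = max((common_prefix_len(prompt_tokens, c) for c in cache), default=0)
            # Prefer more cache overlap; break ties on lower load.
            if overlap > best_overlap or (
                overlap == best_overlap and self.load[w] < self.load[best_worker]
            ):
                best_worker, best_overlap = w, overlap
        self.caches[best_worker].append(prompt_tokens)
        self.load[best_worker] += 1
        return best_worker
```

A real system scores overlap against actual KV cache blocks and ages entries out, but the routing intuition is the same: shared prefixes pull requests toward the worker that has already computed them.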
"We unleashed our model performance team on GLM [---] and were very excited to be the fastest provider available today on Artificial analysis at [---] TPS (2x the next best TPS) and a less than [--] second TTFT (2x the next best TTFT). https://artificialanalysis.ai/models/glm-4-6-reasoning/providers https://artificialanalysis.ai/models/glm-4-6-reasoning/providers"
X Link 2025-10-17T21:32Z [----] followers, 11.2K engagements
"We see the massive AWS outage. Baseten web app is down but inference new deploys training jobs and the model management APIs are unaffected"
X Link 2025-10-20T08:36Z [----] followers, [----] engagements
"@DannieHerz @jeffbarg @ClayRunHQ The clay slackmoji in the b10 slack has been getting a lot of play recently"
X Link 2025-10-23T22:16Z [----] followers, [--] engagements
"DeepSeek-OCR stunned the internet this week with 10x more efficient compression unlocking faster and cheaper intelligence. We rolled out performant inference support on day one of the model drop. Learn why compressions are so effective at making models smarter what applications you can build with DeepSeek-OCR and how to serve it on Baseten in under [--] minutes. Link in the replies"
X Link 2025-10-24T00:08Z [----] followers, [----] engagements
"This week Baseten's model performance team unlocked the fastest TPS and TTFT for gpt-oss 120b on @nvidia hardware. When gpt-oss launched we sprinted to offer it at [---] TPS. now we've exceeded [---] TPS and [----] sec TTFT. and we'll keep working to keep raising the bar. We are proud to offer the best E2E latency available with near-limitless scale incredible performance and the highest uptime 99.99%"
X Link 2025-10-24T16:18Z [----] followers, 42.4K engagements
"We are so excited to be a launch partner for @nvidia Nemotron Nano [--] VL today and offer day-zero support for this highly accurate and efficient vision language model alongside other models in the Nemotron family. To learn more read our blog here https://www.baseten.co/blog/high-performance-agents-for-financial-services-with-nvidia-nemotron-on-baseten/ https://www.baseten.co/blog/high-performance-agents-for-financial-services-with-nvidia-nemotron-on-baseten/"
X Link 2025-10-28T18:43Z [----] followers, [----] engagements
"After months of feedback from our early customers and thousands of jobs completed Baseten Training is officially ready for everyone. π Access compute on demand train any model run multi-node jobs and deploy from checkpoints with cache-aware scheduling an ML Cookbook tool calling recipes and more"
X Link 2025-10-30T18:06Z [----] followers, 13.7K engagements
"@james_weitzman @athleticKoder β"
X Link 2025-11-03T23:38Z [----] followers, [--] engagements
"Fun fact - we asked people to describe their favorite agent in SF. We got suggestions for a bunch of new agentic apps to try. Our favorite agent Probably James Bond. If youre living in the new world of agentic AI check out our new deep dive on tool calling in inference. Check out the blog in the comments"
X Link 2025-11-05T23:17Z [----] followers, [---] engagements
"Blog: https://www.baseten.co/blog/tool-calling-in-inference/utm_source=twitter&utm_medium=social&utm_campaign=education_tool_calling_blog_2025-11-05 https://www.baseten.co/blog/tool-calling-in-inference/utm_source=twitter&utm_medium=social&utm_campaign=education_tool_calling_blog_2025-11-05"
X Link 2025-11-05T23:17Z [----] followers, [---] engagements
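Tool calling, the topic of the deep dive linked above, generally means handing the model a JSON schema per function and letting it emit structured call arguments. The snippet below assembles an OpenAI-style `tools` request body; the model id and the `get_weather` function are hypothetical placeholders for illustration, not Baseten specifics.

```python
import json

def make_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a function description in the OpenAI-style tool schema
    that most open-model chat servers accept for tool calling."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }

# Hypothetical tool: the model may request a weather lookup.
weather_tool = make_tool(
    "get_weather",
    "Look up current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

# Request body a chat-completions endpoint would receive.
request_body = {
    "model": "example-model",  # placeholder model id
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call a tool
}
payload = json.dumps(request_body)
```

When the model decides to call the tool, the response carries the function name plus JSON arguments matching the declared schema; the client executes the function and feeds the result back as a `tool` role message.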
"Congratulations to @Kimi_Moonshot on their newest model drop Kimi-K2 Thinking one of the worlds most advanced open source models. Baseten is proud to offer Day [--] Support. Sign up with your business email address and get $100 in Model API credits"
X Link 2025-11-07T15:40Z [----] followers, 270.4K engagements
"Heading to KubeCon next week Come visit the team at Booth #631 to test your AI knowledge. Top of the leaderboard gets prizes π"
X Link 2025-11-07T21:56Z [----] followers, [---] engagements
"Excited to share this piece from @VentureBeat spotlighting how Baseten is redefining the AI infrastructure game: Baseten takes on hyperscalers with new AI training platform that lets you own your model weights. Thanks VentureBeat Read full article https://venturebeat.com/ai/baseten-takes-on-hyperscalers-with-new-ai-training-platform-that-lets-you https://venturebeat.com/ai/baseten-takes-on-hyperscalers-with-new-ai-training-platform-that-lets-you"
X Link 2025-11-10T17:59Z [----] followers, [----] engagements
"At @KubeCon_ Swing by Booth #631 to test your inference knowledge and earn some swag"
X Link 2025-11-11T19:05Z [----] followers, [---] engagements
"Congrats to the World Labs team on the launch today Marble lets you create 3D worlds from just a single image text prompt video or 3D layout. We couldn't be more excited to power the inference behind this. Can't wait to see what everyone makes. π₯ Introducing Marble by World Labs: a foundation for a spatially intelligent future. Create your world at https://t.co/V267VJu1H9 https://t.co/T00mtETmCA Introducing Marble by World Labs: a foundation for a spatially intelligent future. Create your world at https://t.co/V267VJu1H9 https://t.co/T00mtETmCA"
X Link 2025-11-12T17:39Z [----] followers, [----] engagements
"Welcome to the new age Defense Against the Dark Arts. It's called fast inference (& Harry Potter would be jealous). Check out our deep dive on how the Baseten wizards (model performance team) optimized Kimi K2 Thinking (now faster and just as smart as GPT-5). https://www.baseten.co/blog/kimi-k2-thinking-at-140-tps-on-nvidia-blackwell/utm_source=twitter&utm_medium=social&utm_campaign=awareness_kimi-k2-thinking_performance_blog_2025-11-12"
X Link 2025-11-12T20:50Z [----] followers, [----] engagements
"Baseten used @nvidia Dynamo to double inference speed for long-context code generation and increased throughput by 1.6x. Dynamo simplifies multi-node inference on Kubernetes helping us scale deployments while reducing costs. Read the full blog post belowπ β NVIDIA Dynamo is now available across major cloud providersincluding @awscloud @googlecloud @Azure and @OracleCloudto enable efficient multi-node inference on Kubernetes in the cloud. And Its already delivering results: @basetenco is seeing faster more cost-effective https://t.co/6efirNmK3r β NVIDIA Dynamo is now available across major"
X Link 2025-11-13T19:52Z [----] followers, [----] engagements
"Working with the @GammaApp team never quite feels like work and thats how their product feels. "Criminally fun." We are honored to be long-term partners and power Gammas inference needs as they push the envelope on how we present ideas. Congratulations on the Series B"
X Link 2025-11-13T22:02Z [----] followers, 22.8K engagements
"Shoutout to the incredible team at @oxen_ai Turning datasets deployed models like its light work. They build fast. We help them ship even faster. Thanks for the partnership @gregschoeninger Check out the story in the comments #AI #MLOps #Baseten"
X Link 2025-11-18T21:28Z [----] followers, [----] engagements
"@drishanarora Congratulations on the launch Excited to support with dedicated deployments of Cogito V2.1. https://www.baseten.co/library/cogito-v2-1-671b/ https://www.baseten.co/library/cogito-v2-1-671b/"
X Link 2025-11-19T18:16Z [----] followers, [----] engagements
"Congrats to our friends at Deep Cogito on launching the most powerful US-based OSS model. It turns out LLM self play produces shorter reasoning chains (low token consumption) while maintaining great performance Try it out on Baseten today: https://www.baseten.co/library/cogito-v2-1-671b/ Today we are releasing the best open-weight LLM by a US company: Cogito v2.1 671B. On most industry benchmarks and our internal evals the model performs competitively with frontier closed and open models while being ahead of any US open model (such as the best versions of https://t.co/F6eZnn8s2Q"
X Link 2025-11-19T20:59Z [----] followers, [----] engagements
"If you need an adrenaline rush to wake up from your post-Thanksgiving stupor we got you. @deepseek_ai V3.2 dropped this week and is now available on Baseten. Its so smart your mother will ask why you can't be more like DeepSeek. V3.2 is currently on par with GPT-5 all whilst being multiples cheaper. V3.2 is now live on our Model APIs and on @OpenRouterAI and @ArtificialAnlys. Baseten is the fastest provider with [----] TTFT and [---] tps (thats 1.5x faster than the next guy). For a model this size its screaming. Get the brains without trading off performance"
X Link 2025-12-04T16:50Z [----] followers, [----] engagements
"We're excited to partner with @getstream_io to help developers build fast production-ready Vision Agents. Together we combined Baseten-hosted Qwen3-VL with Streams real-time voice and video to create an Electronics Setup & Repair Assistant that can see understand and guide users in real time. Check out the full walkthrough and demo below Vision Agents just got better: @basetenco joins us to take multimodal capabilities even further. Our team worked together to create a guide on running models hosted on Baseten with Vision Agents. Check out our blog post where we use Baseten + Qwen 3-VL"
X Link 2025-12-05T19:26Z [----] followers, [----] engagements
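A vision-language call like the Qwen3-VL integration above typically sends a chat message mixing text and image parts. The helper below composes that OpenAI-style multimodal message; the image URL and model id are placeholder assumptions for illustration, and the actual Stream/Baseten wiring lives in the linked guide.

```python
def vision_message(text: str, image_url: str) -> dict:
    """Build a single user message containing both a text part and an
    image part, the shape most VLM chat endpoints expect."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

# Hypothetical frame captured from a realtime video session.
msg = vision_message(
    "What cable is unplugged in this frame?",
    "https://example.com/frame.jpg",
)

# Request body a VLM chat-completions endpoint would receive.
request_body = {
    "model": "example-qwen3-vl",  # placeholder model id
    "messages": [msg],
    "max_tokens": 256,
}
```

In a realtime agent loop, the video layer would keep swapping in fresh frames as the image part while the text part carries the running conversation.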
""We want people to own their own intelligence and we now see a really straight shot to get there." @amiruci sits down with @mudithj_ and @charles0neill from the Parsed team. Check out the full fireside chat in the comments"
X Link 2025-12-11T22:12Z [----] followers, [----] engagements
"Inference performance isnt just about the model. It relies on the entire inference stack. In our Inference Stack white paper we explain how Baseten uses @nvidia TensorRT LLM and Dynamo to reduce latency and increase throughput across model modalities. If you care about speed this is worth reading. https://www.baseten.co/resources/guide/the-baseten-inference-stack https://www.baseten.co/resources/guide/the-baseten-inference-stack"
X Link 2026-01-09T20:20Z [----] followers, [----] engagements
"π We're thrilled to introduce the fastest most accurate and cost-efficient Whisper-powered transcription and diarization on the market: [----] RTFwith Whisper Large V3 Turbo Streaming transcriptionwith consistent low latency The most accurate real-time diarization 90% lower costdue to infra optimizations Used in production by companies like @NotionHQ π https://twitter.com/i/web/status/2012203547912245366 https://twitter.com/i/web/status/2012203547912245366"
X Link 2026-01-16T16:41Z [----] followers, [----] engagements
"Want to learn about how to run high performance LLM inference at scaleOur Head of DevRel @philipkiely has the perfect talk for you during NVIDIA Dynamo Day on Jan [--]. Register here: https://nvevents.nvidia.com/dynamodayi=RNQf_gN5cXcdmLzfj_IevFS-tdC553CY https://nvevents.nvidia.com/dynamodayi=RNQf_gN5cXcdmLzfj_IevFS-tdC553CY"
X Link 2026-01-20T19:24Z [----] followers, [----] engagements
"Were thrilled to be working with @LangChain to power the fastest way to generate production-ready agents without code. LangChains Agent Builder represents a way for non-technical knowledge workers and citizen developers to build useful things with AI. All with Baseten Inference backbone and GLM [---]. Weve written a tutorial for you to create your own in minutes"
X Link 2026-01-21T17:19Z [----] followers, [----] engagements
"Tired of waiting for video generation Say less. We've optimized the Wan [---] runtime to hit: 3x faster inference on NVIDIA Blackwell 2.5x faster on Hopper 67% cost reduction. Read the full breakdown of our kernel optimizations and benchmarks here: https://www.baseten.co/blog/wan-2-2-video-generation-in-less-than-60-seconds/#benchmarking-methodology https://www.baseten.co/blog/wan-2-2-video-generation-in-less-than-60-seconds/#benchmarking-methodology"
X Link 2026-01-22T14:00Z [----] followers, [----] engagements
"@lucas_dehaas so if people are wondering why there are baseten stickers tagged across sf they know you're the one to blame"
X Link 2026-01-23T19:05Z [----] followers, [---] engagements
"@adambain grateful for your support and we're still so early π₯"
X Link 2026-01-23T20:24Z [----] followers, [---] engagements
"LIVE Tune in to hear @tuhinone discuss our series E open source and the multi-model future on CNBC A Chinese AI model is having a real coding moment. and not just in China. Zhipu says its coding agent users are concentrated in the *US and China. @tuhinone CEO of @basetenco joins me on the back of his latest fundraise to discuss whats hype and whats real A Chinese AI model is having a real coding moment. and not just in China. Zhipu says its coding agent users are concentrated in the *US and China. @tuhinone CEO of @basetenco joins me on the back of his latest fundraise to discuss whats hype"
X Link 2026-01-26T19:26Z [----] followers, [----] engagements
"Who wants to take the 30b parameter Alpaca model for a ride Announcement coming tomorrow"
X Link 2023-03-20T01:55Z [----] followers, [----] engagements
"Only a handful of models dominated the ASR spaceuntil now. Voxtral has a 30-minute transcription range a 40-minute range for understanding plus built-in function calling for voice out of the box. @thealexker breaks down the technical details"
X Link 2025-07-23T19:22Z [----] followers, [----] engagements
"Forget AI writing your code. AI can now control your home through voice. Weve had a blast putting Voxtral through the paces this week. Mistrals new model delivers see for yourself on Baseten. Our latest blog dives into whats so unique (and powerful) about Voxtral and how you can deploy it to build production-grade apps. https://twitter.com/i/web/status/1948519816312357326 https://twitter.com/i/web/status/1948519816312357326"
X Link 2025-07-24T23:05Z [----] followers, [----] engagements
""the best application layer companies set up the harness and how to use it for the problem that your user is trying to solve""
X Link 2026-01-24T00:18Z [----] followers, [----] engagements
"Thank you @BloombergTV for having our CEO and co-founder @tuhinone and day [--] investor @saranormous yesterday to discuss our latest fundraise the bet we're making with inference and how we're powering customers. Full interview here: https://www.bloomberg.com/news/videos/2026-01-26/ai-startup-baseten-raises-300-million-video https://www.bloomberg.com/news/videos/2026-01-26/ai-startup-baseten-raises-300-million-video"
X Link 2026-01-27T21:20Z [----] followers, [----] engagements
"Nemotron [--] Nano NVFP4 is now available on Baseten + NVIDIA B200 BF16-level accuracy up to [--] higher throughput vs FP8 and faster inference powered by QAD + Blackwell running on the Baseten Inference Stack. https://www.baseten.co/library/nvidia-nemotron-3-nano/ https://www.baseten.co/library/nvidia-nemotron-3-nano/"
X Link 2026-01-28T17:51Z [----] followers, [----] engagements
"Our CEO and co-founder @tuhinone sat down with Axios to discuss how we're using our latest funding to build an inference-native cloud that owns the full inference-data-eval-RL loop and why our recent acquisition of Parsed is just the beginning as we continue to pursue aligned talent and capabilities. Full interview here: https://www.axios.com/pro/enterprise-software-deals/2026/01/29/baseten-acquisitions-ai-inference https://www.axios.com/pro/enterprise-software-deals/2026/01/29/baseten-acquisitions-ai-inference"
X Link 2026-02-02T18:11Z [----] followers, [---] engagements
"RT @NVIDIAAIDev: π Thank you @basetenco for being an engaged and impactful contributor in the NVIDIA Dynamo ecosystem. By running Dynamo"
X Link 2026-02-06T17:42Z [----] followers, [--] engagements
"π Were really excited to be announcing BaseTen today. BaseTen is the fastest way to build applications powered by machine learning. Check it out yourself https://www.baseten.co/blog https://www.baseten.co/blog"
X Link 2021-05-20T16:00Z [----] followers, [---] engagements
"Introducing Baseten Chains π Were thrilled to introduce Chains a framework for building multi-component AI workflows on Baseten βπ Chains enables users to build complex workflows as modular services in simple Python http://x.com/i/article/1805620705716801538 http://x.com/i/article/1805620705716801538"
X Link 2024-06-27T16:28Z [----] followers, 23.9K engagements
"We're excited to introduce our new Engine Builder for TensorRT-LLM π Same great @nvidia TensorRT-LLM performance90% less effort. Check out our launch post to learn more: Or @philip_kiely's full video: We often use TensorRT-LLM to support custom models for teams like @Get_Writer. For their latest industry-leading Palmyra LLMs TensorRT-LLM inference engines deployed on Baseten achieved 60% higher tokens per second. We've used TensorRT-LLM to achieve results including: π 3x better throughput π 40% lowertime to first token π 35% lowercost per million tokens While TensorRT-LLM is incredibly"
X Link 2024-08-01T16:30Z [----] followers, [----] engagements
"Just in time for the new year Awesome job by our model performance team to hit the top of @ArtificialAnlys for GLM [---] try it here https://www.baseten.co/library/glm-4-7/ happy holidays we just dropped the fastest GLM 4.7: 400+ TPS as benchmarked by @ArtificialAnlys https://t.co/eRv47ok1sV https://www.baseten.co/library/glm-4-7/ happy holidays we just dropped the fastest GLM 4.7: 400+ TPS as benchmarked by @ArtificialAnlys https://t.co/eRv47ok1sV"
X Link 2025-12-28T20:23Z [----] followers, [----] engagements
"We're excited to announce that we've raised a $40M Series B to help power the next generation of AI-native products with performant reliable and scalable inference infrastructure. https://www.baseten.co/blog/announcing-our-series-b/ https://www.baseten.co/blog/announcing-our-series-b/"
X Link 2024-03-04T16:01Z [----] followers, 82.7K engagements
"We're thrilled to welcome Joey Zwicker as our new Head of Forward Deployed Engineering We've grown rapidly over the last few years and we're excited to have Joey lead the team into our next phase. We're hiring FDEs everywhere -- if you're interested reach out"
X Link 2025-08-11T20:37Z [----] followers, 10.3K engagements
"Nobody knows what inference means but it's provocative"
X Link 2025-02-28T15:00Z [----] followers, 497.8K engagements
"RT @DylanAbruscato: If you were a guest on TBPN your logo will air during the Super Bowl. Here's the final frame of the ad π"
X Link 2026-02-04T17:15Z [----] followers, [--] engagements