# @corbtt Kyle Corbitt

Kyle Corbitt posts on X most often about model, ai, and open ai. They currently have [------] followers and [---] posts still getting attention, totaling [-----] engagements in the last [--] hours.

### Engagements: [-----] [#](/creator/twitter::823506858/interactions)

- [--] Week [-------] -21%
- [--] Month [-------] +60%
- [--] Months [---------] -61%
- [--] Year [----------] +46%

### Mentions: [--] [#](/creator/twitter::823506858/posts_active)

- [--] Months [---] -30%
- [--] Year [---] -7.20%

### Followers: [------] [#](/creator/twitter::823506858/followers)

- [--] Week [------] +7.20%
- [--] Month [------] +7.90%
- [--] Months [------] +19%
- [--] Year [------] +61%

### CreatorRank: [---------] [#](/creator/twitter::823506858/influencer_rank)

### Social Influence

**Social category influence** [technology brands](/list/technology-brands) [stocks](/list/stocks) [finance](/list/finance) [social networks](/list/social-networks) [countries](/list/countries) [travel destinations](/list/travel-destinations) [vc firms](/list/vc-firms)

**Social topic influence** [model](/topic/model), [ai](/topic/ai), [open ai](/topic/open-ai) #1455, [in the](/topic/in-the), [agents](/topic/agents), [if you](/topic/if-you), [art](/topic/art), [agentic](/topic/agentic), [inference](/topic/inference), [how to](/topic/how-to)

**Top accounts mentioned or mentioned by** [@openpipeai](/creator/undefined) [@coreweave](/creator/undefined) [@hamelhusain](/creator/undefined) [@mattshumer](/creator/undefined) [@officiallogank](/creator/undefined) [@weightsbiases](/creator/undefined) [@willccbb](/creator/undefined) [@aidanmclau](/creator/undefined) [@artificialanlys](/creator/undefined) [@unslothai](/creator/undefined) [@eugeneyan](/creator/undefined) [@hallerite](/creator/undefined) [@casperhansen](/creator/undefined) [@databricks](/creator/undefined) [@simonw](/creator/undefined) [@danielhanchen](/creator/undefined) [@yarvol](/creator/undefined) [@vikhyatk](/creator/undefined) [@jefrankle](/creator/undefined) [@huggingface](/creator/undefined)

**Top assets mentioned** [Alphabet Inc Class A (GOOGL)](/topic/$googl) [Microsoft Corp. (MSFT)](/topic/microsoft) [Frontier (FRONT)](/topic/frontier)

### Top Social Posts

Top posts by engagements in the last [--] hours

"The [----] crypto boom felt really scammy -- price mostly driven by ponzi-style promos and ICOs. There's still some of that today (memecoins most NFTs) but ecosystem feels way healthier overall. I'm now pretty bullish on crypto's long-term impact" [X Link](https://x.com/corbtt/status/1384556549272608770) 2021-04-20T17:16Z 10.4K followers, [--] engagements

"Gotta do a better job of sharing what we're working on. π Shipped last week: automatically convert your prompts between GPT Claude 1/2 and Llama [--] syntax. Sign up at https://openpipe.ai/ to convert and benchmark your own prompts. Curious about Llama [--] Here's a fun feature we shipped last week: automatically convert your GPT-3.5 prompt to the Llama [--] format with best practices Play with Llama [--] Claude [--] and GPT models at https://t.co/ySUpFCX8N4 π https://t.co/5AuYe69j6D" [X Link](https://x.com/corbtt/status/1687876309631041536) 2023-08-05T17:20Z 10.4K followers, [---] engagements

"Just officially launched OpenPipe as a YC company. DM me if you're interested in converting your expensive LLM prompt into a cheap reliable fine-tuned model. https://www.ycombinator.com/launches/JMa-openpipe-convert-expensive-llm-prompts-into-fast-cheap-fine-tuned-models" [X Link](https://x.com/corbtt/status/1696185686888796437) 2023-08-28T15:39Z 11.3K followers, 13.2K engagements

"@Altimor Sounds like we should talk. π https://openpipe.ai/source=x" [X Link](https://x.com/corbtt/status/1696309344151978321) 2023-08-28T23:50Z 10.4K followers, [---] engagements

"@tszzl The Saudis would like a word" [X Link](https://x.com/corbtt/status/1734074417821696152) 2023-12-11T04:55Z [----] followers, [---] engagements

"We ♥ @helicone_ai" [X Link](https://x.com/corbtt/status/1734434824184807678) 2023-12-12T04:47Z [----] followers, [---] engagements

"@arnabmanna619 Sure There are a lot of tricks but the basic idea is (1) run your prompt on several hundred inputs and store the outputs. (2) Use the inputs+outputs to fine-tune a LoRA (optionally on a smaller base model) and use that for future inference" [X Link](https://x.com/corbtt/status/1739715890260984309) 2023-12-26T18:32Z [----] followers, [---] engagements

"@TheRealAneesh @Teknium1 How does unsloth compare perf-wise to SFTTrainer/Axolotl Creator has big claims but haven't heard much external corroboration" [X Link](https://x.com/anyuser/status/1743906387628081513) 2024-01-07T08:04Z [--] followers, [---] engagements

"There's a lot of noise in the LLMOps space around tooling that doesn't solve a real problem. @OpenPipeAI does @josedlpuente Honestly it's almost all in house stuff. Nothing in LLM ops has really impressed me so far (except @OpenPipeAI ) and even if it was good it wasn't around when we started building" [X Link](https://x.com/corbtt/status/1746975069694509426) 2024-01-15T19:18Z [----] followers, [---] engagements

"Fine-tuning isn't the future anymore it's the present. We 10x'd in both unique customers and completions served in January. Now generating over 1M completions a day and ramping quickly" [X Link](https://x.com/corbtt/status/1757118240944427392) 2024-02-12T19:03Z 11.3K followers, [----] engagements

"@yar_vol @vikhyatk @OpenPipeAI We haven't posted our internal analysis of Mixtral yet but fine-tuned Mistral is much stronger than GPT-3.5 and Mixtral is stronger than Mistral. https://openpipe.ai/blog/mistral-7b-fine-tune-optimized" [X Link](https://x.com/corbtt/status/1761975917793231035) 2024-02-26T04:46Z 14.3K followers, [---] engagements

"@rishdotblog Productized on @OpenPipeAI https://docs.openpipe.ai/features/pruning-rules" [X Link](https://x.com/corbtt/status/1765913673607057539) 2024-03-08T01:33Z 10.4K followers, [---] engagements

"The formerly hard parts of fine-tuning are fully automated. Need a ton of data → Synthetic data generation Data prep/cleaning → Use LLM to filter and relabel Evaluations → LLM-as-judge This works. You can build a SOTA model for your task in an hour not a month" [X Link](https://x.com/corbtt/status/1768018784684835033) 2024-03-13T20:58Z [----] followers, [---] engagements

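The two posts above (the LoRA reply and the "formerly hard parts" post) describe the same basic recipe: capture prompt/output pairs from an expensive model, filter them, and fine-tune a LoRA on a smaller base. Below is a minimal sketch of that recipe, not OpenPipe's actual pipeline; it assumes recent versions of the openai, datasets, peft, and trl packages, and the model names are illustrative placeholders.

```python
# Sketch: capture teacher outputs, then fine-tune a small LoRA on them.
import json
from openai import OpenAI

client = OpenAI()
prompts = ["Summarize the ticket: ...", "Extract the company name: ..."]  # several hundred real inputs in practice

with open("distill.jsonl", "w") as f:
    for p in prompts:
        out = client.chat.completions.create(
            model="gpt-4o",  # example "teacher" model
            messages=[{"role": "user", "content": p}],
        )
        f.write(json.dumps({"prompt": p, "completion": out.choices[0].message.content}) + "\n")

# Fine-tune a LoRA on a smaller "student" base model over the captured pairs.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

ds = load_dataset("json", data_files="distill.jsonl", split="train")
ds = ds.map(lambda r: {"text": r["prompt"] + "\n" + r["completion"]})

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B",  # example student base model
    train_dataset=ds,
    args=SFTConfig(output_dir="lora-out", dataset_text_field="text"),
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```
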
"@khanshq @OpenPipeAI Yes" [X Link](https://x.com/corbtt/status/1770511639965815183) 2024-03-20T18:04Z [----] followers, [---] engagements

"HN putting out some real bangers this morning" [X Link](https://x.com/corbtt/status/1771555680626892859) 2024-03-23T15:12Z [----] followers, [----] engagements

"@MistralAI live-dropping their new 7B model 32K context and improved benchmarks" [X Link](https://x.com/corbtt/status/1771608690866434322) 2024-03-23T18:43Z 10.4K followers, [---] engagements

"Spoke to a Microsoft engineer on the GPT-6 training cluster project. He kvetched about the pain they're having provisioning infiniband-class links between GPUs in different regions. Me: "why not just colocate the cluster in one region" Him: "Oh yeah we tried that first. We can't put more than 100K H100s in a single state without bringing down the power grid." π€―" [X Link](https://x.com/anyuser/status/1772392525174620355) 2024-03-25T22:38Z 19K followers, 1.9M engagements

"@Jessassin my general understanding of the business model is "whoever builds agi first wins the whole game." you can agree with them or not but openai really does believe they're playing for all the marbles here" [X Link](https://x.com/anyuser/status/1772489878451663231) 2024-03-26T05:04Z 19K followers, 99.9K engagements

"@AnthropicAI be cooking. I'm old enough to remember when "why would anyone want to use the second-best AI assistant" was OpenAI's argument. π The king is dead RIP GPT-4 Claude opus #1 ELo Haiku beats GPT-4 [----] & Mistral large That's insane for how cheap & fast it is https://t.co/fAwzJScLTH" [X Link](https://x.com/corbtt/status/1772774888853639629) 2024-03-26T23:57Z [----] followers, [---] engagements

"@iammaestro04 @burny_tech openai is playing for all the marbles" [X Link](https://x.com/corbtt/status/1772776835849580854) 2024-03-27T00:05Z [----] followers, [---] engagements

""Different team members threw out ideas in Slack for how to use the remaining week of computer power. One idea was. a much smaller version for hobbyists to play with" There's still time @databricks Show us what a DBRX-7B can do @jefrankle https://www.wired.com/story/dbrx-inside-the-creation-of-the-worlds-most-powerful-open-source-ai-model/" [X Link](https://x.com/corbtt/status/1772997898466361487) 2024-03-27T14:43Z [----] followers, [----] engagements

"Lots of @databricks π on this I'm optimistic" [X Link](https://x.com/corbtt/status/1773199468499751286) 2024-03-28T04:04Z [----] followers, [---] engagements

"@jefrankle @mejia_petit @databricks @code_star @mvpatel2000 For our use cases 7B and 13B are definitely sweet spots. Not a huge need for 34B imho unless it can beat Mixtral at a range of tasks" [X Link](https://x.com/corbtt/status/1773220171374547085) 2024-03-28T05:26Z [----] followers, [---] engagements

"scoop: @jefrankle is going to give the @DbrxMosaicAI team the day off and hero-run a new state-of-the-art 7B model on the @databricks cluster. you heard it here first π" [X Link](https://x.com/corbtt/status/1773222964965564482) 2024-03-28T05:37Z [----] followers, [----] engagements

"@jefrankle @code_star @databricks A magnetized needle and a steady hand" [X Link](https://x.com/corbtt/status/1773226043807080588) 2024-03-28T05:50Z [----] followers, [--] engagements

"@OfficialLoganK has been a fantastic resource for us I highly recommend taking his money if you have the option π I've invested in 30+ AI startups in the last year and a half if you're working on something cool and want to chat my DMs are open. It's time to build" [X Link](https://x.com/corbtt/status/1773561133493871084) 2024-03-29T04:01Z 11.3K followers, [---] engagements

"An AI-empowered employee has a vastly higher skill floor than a non-AI-empowered employee. Just asked an engineer with [--] marketing experience to set up our Hubspot to send a product newsletter to all our current and future users. Pre-AI would not have been worth the ramp to get him familiar with Hubspot. Now he can just ask GPT-4 the top-level questions about best practices and get it working in an hour" [X Link](https://x.com/corbtt/status/1775264476859879703) 2024-04-02T20:50Z [----] followers, [----] engagements

"@consolelogwill @lgrammel Very likely yes our completions endpoint is OpenAI compatible. If you have any issues DM me and we'll take a look" [X Link](https://x.com/corbtt/status/1775594723287404985) 2024-04-03T18:42Z [----] followers, [---] engagements

"2025: it's now considered good manners to add subtle typos and grammar errors to your emails. signals a real human spent time on it. 2026: all frontier models are now RLHF'd to add typos and grammar errors" [X Link](https://x.com/corbtt/status/1776851359758725545) 2024-04-07T05:55Z [----] followers, [----] engagements

"Making tool calls a first-class citizen in the OpenAI API was a mistake. JSON mode can do everything tool calls can and more and is conceptually simpler" [X Link](https://x.com/corbtt/status/1777855642121928945) 2024-04-10T00:26Z [----] followers, [----] engagements

"This is how foundation model companies build a data flywheel that makes it hard to keep up. You'd better believe that OpenAI is using GPT-5 to filter and synthesize training data for GPT-6 already" [X Link](https://x.com/anyuser/status/1781010105032683849) 2024-04-18T17:21Z 19K followers, 41K engagements

"In a year you'll be able to directly prompt your social media feeds. "Yes I know I spent [--] seconds looking at that picture. No that doesn't mean I only want to see thirst traps for the next [--] days." If the platforms don't build this in directly someone will come in and build it on top" [X Link](https://x.com/corbtt/status/1782645257030787459) 2024-04-23T05:38Z [----] followers, [--] engagements

"In a year you'll be able to directly prompt your social media feeds. "Yes I know I spent [--] seconds looking at that picture. No that doesn't mean I only want to see thirst traps for the next [--] days." Platforms should build this directly or someone will build it on top" [X Link](https://x.com/corbtt/status/1782645354732958047) 2024-04-23T05:39Z [----] followers, [---] engagements

"@4evaBehindSOTA @abacaj So I will 100% grant that if you spend enough time with hparam sweeps and data filtering/augmentation you can get a better model than what we give you at openpipe. But I still think getting you 80% of the way there for 5% of the effort is hugely valuable. π€·" [X Link](https://x.com/corbtt/status/1787706275142447523) 2024-05-07T04:49Z 10.6K followers, [--] engagements

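The reply to @consolelogwill above mentions an "OpenAI compatible" completions endpoint. In practice that usually just means pointing the standard OpenAI client at a different base URL; the sketch below shows the pattern. The URL and model id are placeholders, not confirmed values for any specific provider; check the provider's docs.

```python
# Minimal sketch of using an OpenAI-compatible endpoint via the official client.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-finetune-host.invalid/v1",  # placeholder base URL
    api_key="YOUR_PROVIDER_KEY",
)

resp = client.chat.completions.create(
    model="your-fine-tuned-model-id",  # placeholder model id
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```
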
"@altryne @Teknium1 @sama More likely explanation for (1) is that the model developer saw sam's tweet and decided to use the name for the lulz. It's true that using the openai tokenizer is an unusual choice for a non-openai model though" [X Link](https://x.com/corbtt/status/1787932576654782508) 2024-05-07T19:48Z [----] followers, [---] engagements

"OpenAI demo day is really cool and I love gpt-4 at half the price. BUT This updates me more towards "openai has hit a wall on frontier model performance". You don't spend your time messing around with cost optimization if you could be releasing the smartest model in the world instead. Hoping I'm wrong and they've got a "one more thing" event coming soon" [X Link](https://x.com/anyuser/status/1790089023400325268) 2024-05-13T18:37Z 19K followers, 141K engagements

"@raydelvecc sure if they're doing both that's fine. I want to see the Big Iron model tho" [X Link](https://x.com/corbtt/status/1790108205424816508) 2024-05-13T19:53Z [----] followers, 10.5K engagements

"@0mniusprime there is always a market for the smartest model as long as it's cheaper per-token than a similarly-smart human. just deploy it at $0.01/token or whatever until you can get the prices down" [X Link](https://x.com/corbtt/status/1790108458148311523) 2024-05-13T19:54Z [----] followers, [----] engagements

"@strickvl @HamelHusain @dan_s_becker Do you know what stack you'll use for the fine-tuning yet Would love to see you on @OpenPipeAI" [X Link](https://x.com/corbtt/status/1797477986318610648) 2024-06-03T03:58Z 10.7K followers, [---] engagements

"Last month I predicted that OpenAI had hit a wall on frontier model performance and a lot of folks called it out as a bad take. Feeling pretty vindicated now -- rumors have shifted from "GPT-5 this summer" to "GPT-5 in December". Translation: whatever they tried in the last training run failed and they're starting over from scratch (with no guarantee the new run will work either). OpenAI demo day is really cool and I love gpt-4 at half the price. BUT This updates me more towards "openai has hit a wall on frontier model performance". You don't spend your time messing around with cost" [X Link](https://x.com/corbtt/status/1798043111266087330) 2024-06-04T17:24Z [----] followers, 21.2K engagements

"@swyx @HamelHusain bullish on I've used it for some lightweight data munging and works fairly well. Still reach for python for anything heavy though. I know @ankrgyl is a TS fan. Should we just quit our startups and spend [--] years building out the TS-data ecosystem π€π https://www.npmjs.com/package/nodejs-polars" [X Link](https://x.com/corbtt/status/1798170751314653689) 2024-06-05T01:51Z [----] followers, [---] engagements

"To be clear I don't think this is a permanent wall and I definitely think OpenAI will remain on the frontier. It's just turning out to be harder to make the next jump than a lot of us assumed" [X Link](https://x.com/corbtt/status/1798577948511113362) 2024-06-06T04:49Z [----] followers, [----] engagements

"My take as someone who doesn't fall into either the "Safety First" or e/acc camps: [--]. Leopold's essay is at a minimum a well-articulated case for taking the AI capability explosion seriously. Low-brow dismissals like "he's just drawing lines on a graph bro" don't do it justice. [--]. The basic thesis is at least plausible. So far adding OOMs seems to mostly be working. There's no guarantee that scaling will keep working but also no strong reason to believe it won't. I think assuming that 5-6 more OOMs get us to AGI is not unreasonable. [--]. That said. 5-6 more OOMs is a lot. This is probably the" [X Link](https://x.com/corbtt/status/1798853894896050625) 2024-06-06T23:06Z [----] followers, [----] engagements

"How can Apple outperform Phi-3 with a much smaller model Easy: by fine-tuning an adapter for each task This is the future. You won't ship prompts you'll ship adapters. Far better perf. @OpenPipeAI makes this as easy as prompting" [X Link](https://x.com/corbtt/status/1800533517727940671) 2024-06-11T14:20Z 11.3K followers, [----] engagements

"@JoshPurtell @OpenPipeAI Just use your task and try it out You can integrate OpenPipe and have a task-specific fine-tuned model to play around with in less than an hour of engineering time" [X Link](https://x.com/corbtt/status/1800900134689263805) 2024-06-12T14:37Z 10.6K followers, [--] engagements

"The modal use case for the largest most capable and most expensive models will be distilling training data for the smaller (but still very capable) models which will be what people actually use to get work done .@nvidia just released their own open-source model Nemotron-4 340B a family of open models that developers can use to generate synthetic data for training large language models (LLMs) for commercial applications across healthcare finance manufacturing retail and every" [X Link](https://x.com/corbtt/status/1801723105116885210) 2024-06-14T21:07Z [----] followers, [----] engagements

"@nickwalton00 STAY TUNED" [X Link](https://x.com/corbtt/status/1801755454156247252) 2024-06-14T23:15Z 10.4K followers, [--] engagements

"@mgoin_ @jeremyphoward @neuralmagic What is the perf impact If I have a base model in fp8 and deploy a bf16 LoRA on top does it need to dynamically convert the weights one way or the other so they're compatible data types" [X Link](https://x.com/corbtt/status/1803194960818426017) 2024-06-18T22:35Z [----] followers, [--] engagements

"The MoA architecture is simple: generate [--] initial GPT-4 completions have GPT-4 reflect on them and then have GPT-4 produce a final output based on its deliberations" [X Link](https://x.com/corbtt/status/1803813999043301504) 2024-06-20T15:35Z [----] followers, 13.8K engagements

"@Yossi_Dahan_ That comparison is specifically an OpenPipe-served Llama [--] 8B vs GPT-4 Turbo" [X Link](https://x.com/corbtt/status/1804308780789961037) 2024-06-22T00:21Z 11.7K followers, [---] engagements

"@morgymcg @altryne @james_y_zou @thursdai_pod Our AlpacaEval results are actually with MoA on GPT-4 Turbo the fine tuning was separate. We were able to achieve [----] (also note we didn't overfit on the benchmark; just ran it at the end after the flow was worked out) https://tatsu-lab.github.io/alpaca_eval/" [X Link](https://x.com/corbtt/status/1804636554159349986) 2024-06-22T22:04Z [----] followers, [--] engagements

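The MoA (mixture-of-agents) post above describes a simple three-step flow: draft several completions, have the model critique them, then synthesize a final answer. Below is a rough sketch of that flow, not OpenPipe's exact implementation; it assumes the openai package, and the model name and draft count are illustrative.

```python
# Sketch of a mixture-of-agents style flow: draft, reflect, synthesize.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4-turbo"  # example model

def mixture_of_agents(prompt: str, n_drafts: int = 3) -> str:
    # 1. Generate several independent drafts at a higher temperature.
    drafts = [
        client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
            temperature=1.0,
        ).choices[0].message.content
        for _ in range(n_drafts)
    ]

    # 2. Have the model reflect on the drafts.
    joined = "\n\n".join(f"Draft {i + 1}:\n{d}" for i, d in enumerate(drafts))
    critique = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": f"Task:\n{prompt}\n\n{joined}\n\nCritique the drafts."}],
    ).choices[0].message.content

    # 3. Produce a final answer informed by the drafts and the critique.
    final = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": f"Task:\n{prompt}\n\n{joined}\n\nCritique:\n{critique}\n\nWrite the best final answer."}],
    )
    return final.choices[0].message.content
```
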
"@mattzcarey @mattshumer_ @togethercompute We're actually serving it as a prepackaged endpoint https://openpipe.ai/blog/mixture-of-agents" [X Link](https://x.com/corbtt/status/1805315640728993899) 2024-06-24T19:02Z 12.1K followers, [--] engagements

"Interesting mini-result that you need to know if you're trying to create the highest-quality fine-tuned LoRAs: you should ignore the loss on input tokens when training the model and only train your model on the completion token loss. easiest way to accomplish this: use @winglian's @axolotl_ai and set train_on_inputs: false. By doing this you're allowing the model to concentrate entirely on learning to produce the output at the cost of not learning to produce the input. Most frameworks do not support this Eg. the Huggingface trainer doesn't let you combine this training strategy with sample" [X Link](https://x.com/corbtt/status/1806336011804484017) 2024-06-27T14:37Z 11.3K followers, 11.5K engagements

"Fantastic blog series from @strickvl who found his fine-tuned @OpenPipeAI model outperformed GPT-4 (as well as other fine-tuned models) on response quality for a tiny fraction of the cost. Love it when our users share their success. π€© https://mlops.systems/posts/2024-07-01-full-finetuned-model-evaluation.html" [X Link](https://x.com/corbtt/status/1807779307198230877) 2024-07-01T14:12Z [----] followers, [----] engagements

"Is anyone rebuilding Turbotax for the AI era I feel like the ideal UX is to just give it last year's tax return an unorganized pile of every vaguely tax-related document I've been mailed and then a conversational interface where I can brain-dump random thoughts and it can tell me what I missed. Easily achievable with today's tech" [X Link](https://x.com/corbtt/status/1809679595156959539) 2024-07-06T20:03Z [----] followers, [----] engagements

"Super excited that fine-tuning Claude is here BUT major caveat: unlike OpenAI and most open-source fine-tuning platforms to deploy a Claude fine-tune you have to pay by the GPU hour not by the token. Real bummer if you're just trying things out Why no S-LoRA You can now fine-tune Claude [--] Haiku, our fastest and most cost-effective model, in Amazon Bedrock. https://t.co/VUkiKs6daA" [X Link](https://x.com/corbtt/status/1811269830433218683) 2024-07-11T05:22Z [----] followers, [----] engagements

"If you ever felt the need for an Extremely Overpowered fine-tuned model. we now support training GPT-4o in @OpenPipeAI. Please use responsibly. π" [X Link](https://x.com/anyuser/status/1813018434822971556) 2024-07-16T01:10Z 19K followers, 29.5K engagements

"@aidan_mclau @OpenPipeAI you need beta access but if you contact your openai rep it should be pretty easy to get. π" [X Link](https://x.com/corbtt/status/1813020048895975499) 2024-07-16T01:17Z [----] followers, [----] engagements

"@ankrgyl @HamelHusain all [--] tasks are prompts that our customers have deployed at scale (first with gpt-4 or gpt-4 turbo and later with openpipe). I'd say they're "medium-engineered" -- good enough to get reasonable results and deploy at scale at least" [X Link](https://x.com/corbtt/status/1814147899649097788) 2024-07-19T03:58Z 11.3K followers, [----] engagements

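The LoRA post above recommends computing loss only on completion tokens (axolotl's train_on_inputs: false). The sketch below shows the underlying idea in generic Hugging Face terms: mask prompt tokens with -100 so they are ignored by the loss. This is an illustration of the technique, not axolotl's internals; the model name is an example.

```python
# Sketch: completion-only loss by masking prompt tokens in the labels.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")  # example; any causal-LM tokenizer works

def build_example(prompt: str, completion: str) -> dict:
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    completion_ids = tokenizer(completion, add_special_tokens=False)["input_ids"]
    input_ids = prompt_ids + completion_ids + [tokenizer.eos_token_id]
    # -100 is ignored by the cross-entropy loss, so the model only learns to
    # produce the completion, not to reproduce the prompt.
    labels = [-100] * len(prompt_ids) + completion_ids + [tokenizer.eos_token_id]
    return {"input_ids": input_ids, "labels": labels}
```
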
"@yar_vol @OpenPipeAI Yes we'll def support serverless training and inference just like we do with every model on OpenPipe" [X Link](https://x.com/corbtt/status/1815472408809357801) 2024-07-22T19:42Z 11.7K followers, [---] engagements

"A lot of folks don't realize that @OpenPipeAI supports fine-tuning all OpenAI models at cost. That means you can now fine-tune GPT-4o mini through OpenPipe for free with OpenAI's subsidy π" [X Link](https://x.com/corbtt/status/1815927848068342146) 2024-07-24T01:51Z 11.3K followers, 12.8K engagements

"@HamelHusain Typescript is the nicer language but I don't see the ML ecosystem migrating in the short term and in the long term what matters is what's more ergonomic for the AI not humans. Their idea of ergonomic might be very different than ours. π" [X Link](https://x.com/corbtt/status/1817090507190804931) 2024-07-27T06:51Z [----] followers, [---] engagements

"How did Apple make a 3B parameter model work extremely well Task-specific fine-tuned adapters. In the next few years the vast majority of LLM inference will happen on task-specific adapters not on prompted base models. Build with that in mind. Apple spilled the beans on Apple Intelligence Foundation Models (notes below): Architecture: Dense - decoder only transformer architecture RMSNorm & Query/ Key normalization GQA (w/ [--] KV heads) SwiGLU activation & RoPE (base_freq=500K for long context) Pre-training & https://t.co/7j5QVeTRXe" [X Link](https://x.com/corbtt/status/1818402679665865168) 2024-07-30T21:45Z 12K followers, 15.4K engagements

"Gemma 2B is π₯ Been waiting for more competition in the SLM model sizes. This size class is where most future inference will happen leaning on small fine-tuned adapters that give it equivalent task-specific perf to models 100x the size. My analysis+updates for Gemma-2 2b: [--]. 2T tokens distilled from an unnamed model [--]. Flash Attention has softcapping support O(N) memory instead of O(N2) for bf16 [--]. Reminder - edit head_dim to [---] from [---] [--]. Distillation +7% acc in ablations [--]. Scope + Shield Long form: https://t.co/LPrK1La4Ws" [X Link](https://x.com/corbtt/status/1818724940855886025) 2024-07-31T19:06Z 11.7K followers, [----] engagements

"I will be taking a sabbatical from OpenPipe until tomorrow at 7:30am" [X Link](https://x.com/corbtt/status/1820708233075691794) 2024-08-06T06:27Z 10.7K followers, [----] engagements

"Wait did OpenAI just drop the price of GPT-4o (the real one not mini) by 50% without even announcing it Like there is nothing on their blog/@OpenAIDevs just a new pricing page with $2.50/1M tokens listed. Bullish for OpenAI. π€―" [X Link](https://x.com/anyuser/status/1820910339388825762) 2024-08-06T19:50Z 19K followers, 53.6K engagements

"unpopular opinion: openai tool call syntax has no reason to exist as an API-level construct and should instead have been implemented by asking the model to output structured JSON with tool name and arguments. So much unnecessary complexity downstream from the way they did that" [X Link](https://x.com/corbtt/status/1820987388774248659) 2024-08-07T00:56Z [----] followers, [----] engagements

"@andrew_n_carr literally our main thing at openpipe. makes models so much better" [X Link](https://x.com/corbtt/status/1821077401574793544) 2024-08-07T06:54Z 10.7K followers, [--] engagements

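The "unpopular opinion" post above argues tool calling could just be structured JSON plus your own dispatch. A minimal sketch of that alternative is below, assuming the openai package and JSON mode; the tool registry, prompt, and model name are illustrative, not a recommended production schema.

```python
# Sketch: JSON-mode "tool calling" with manual dispatch instead of the tool-call API.
import json
from openai import OpenAI

client = OpenAI()
TOOLS = {"get_weather": lambda city: f"Sunny in {city}"}  # toy tool registry

resp = client.chat.completions.create(
    model="gpt-4o",  # example model
    response_format={"type": "json_object"},  # JSON mode
    messages=[
        {"role": "system", "content": 'Reply with JSON: {"tool": <name>, "arguments": {...}}. Available tool: get_weather(city).'},
        {"role": "user", "content": "What's the weather in Tokyo?"},
    ],
)

call = json.loads(resp.choices[0].message.content)
result = TOOLS[call["tool"]](**call["arguments"])
print(result)
```
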
"- Google drops Gemini [---] price to $0.075 / 1M input tokens - Charges the same for fine-tuned and base model inference. Gemini [---] Flash is super high quality and now the cheapest fine-tunable model available anywhere by a huge margin. Good news for @GoogleAI developers: - Gemini [---] Flash price is now 70% lower ($0.075 / 1M) - Gemini [---] Flash tuning available to all - Added support for 100+ new languages in the API - AI Studio is available to all workspace customers - Much more : ) https://t.co/LmN4KiDBWq" [X Link](https://x.com/anyuser/status/1821926307086868629) 2024-08-09T15:07Z 19K followers, 29.6K engagements

"So we've got [--] labs with models at the GPT-4 level and nobody has meaningfully surpassed it. OpenAI had to scrap/restart their GPT-5 training project bc this barrier has been harder to break than expected. For the next leap a lot of labs seem to be betting on something new. Lotta smart folks pivoting to reinforcement learning (the real thing not the RLHF weak-sauce version). You gotta let your model make contact with the real world test its ideas get bruised. The hard part here is making a good general-purpose reward model that isn't insanely sparse but maybe not impossible" [X Link](https://x.com/anyuser/status/1824567261065253306) 2024-08-16T22:01Z 19K followers, 111.1K engagements

"Now that everyone has figured out how to effectively train/use models with 128K+ context lengths can we please kill tokenization once and for all and start training on bytes directly at least for text Tokenization issues are the source of so many headaches. π" [X Link](https://x.com/corbtt/status/1826316327193329929) 2024-08-21T17:51Z 10.4K followers, 15.4K engagements

"@goodside Ok fine I will also accept training on bits" [X Link](https://x.com/corbtt/status/1826340737124073875) 2024-08-21T19:28Z 10.4K followers, [---] engagements

"@winglian @HamelHusain They position it as helping with multi-GPU training. Are the improvements relevant to single-GPU training jobs as well" [X Link](https://x.com/corbtt/status/1827371199145636109) 2024-08-24T15:43Z 10.4K followers, [---] engagements

"@asankhaya @OpenAI Love this. π Mind if we try fine-tuning some open source models on the same dataset as well I'm really curious to see how they'd compare" [X Link](https://x.com/corbtt/status/1828564046595215555) 2024-08-27T22:43Z 10.4K followers, [---] engagements

"If you're looking for an 8B prompted model I highly recommend checking out Llama [---] Storm 8B. Outperforms Llama [---] Instruct and Hermes [--] 8B on our evals and the improvement on vibes is even larger. @akjindal53244 and the team are legit. π -.-- has arrived A new 8B parameter LLM that outperforms @Meta -.-- and ---.- across diverse benchmarks Our new 8B LLM pushes the boundaries of what's possible with smaller language https://t.co/tCBdLfoGCu" [X Link](https://x.com/corbtt/status/1828893174313746847) 2024-08-28T20:31Z 10.4K followers, [----] engagements

"@aidan_mclau I think there are potential advantages. Like OpenAI fo sho is using prefix caching for ChatGPT to control costs but they don't expose it via the API so competitors who want to build a chatbot pay more" [X Link](https://x.com/corbtt/status/1829691557165072638) 2024-08-31T01:23Z 10.4K followers, [----] engagements

"People think OpenAI charging $2K a month for ChatGPT subscriptions would be crazy but I would pay $2K a month for a Cursor+Claude subscription if no cheaper alternatives were available. It isn't unreasonable if their new models are way better" [X Link](https://x.com/corbtt/status/1831753975420608814) 2024-09-05T17:59Z 10.4K followers, [----] engagements

"I am working with @mattshumer_ to get to the bottom of what happened with Reflection. He is providing access to all serving code and weights with the goal of replicating the strong reasoning performance @ArtificialAnlys was able to see over the weekend. I will report back" [X Link](https://x.com/anyuser/status/1833209248236601602) 2024-09-09T18:21Z 19K followers, 133K engagements

"Has anyone had good results using KTO to replace an SFT+DPO pipeline Seems promising but in practice we're seeing slightly worse results (relative to SFT+DPO) across most benchmarks" [X Link](https://x.com/corbtt/status/1833300786614571262) 2024-09-10T00:25Z 10.4K followers, [----] engagements

"@mattshumer_ @ArtificialAnlys Have a lot of great folks who spent all day trying to reproduce this or if it doesn't reproduce understand as much as possible what went wrong. So far no success" [X Link](https://x.com/corbtt/status/1833392533004816722) 2024-09-10T06:30Z 10.4K followers, [----] engagements

"@mattshumer_ @ArtificialAnlys Final report on Reflection-70B: after investigating I do not believe a model that achieved the claimed benchmarks ever existed. It's very unclear to me where those numbers came from and I hope that Sahil/Matt will shed more light on how this happened" [X Link](https://x.com/anyuser/status/1833633946644713582) 2024-09-10T22:29Z 19K followers, 22.6K engagements

"@ethayarajh How do I set lambda_d Not seeing anything by that name in KTOConfig https://huggingface.co/docs/trl/main/en/kto_trainer#using-the-ktotrainer" [X Link](https://x.com/corbtt/status/1833650427008893311) 2024-09-10T23:35Z 10.4K followers, [--] engagements

"@kadenbilyeu0 @CheatLayer @mattshumer_ @ArtificialAnlys Yep I offered to train a 405B version of their model using our OpenPipe platform under the assumption that there was something really special in the dataset they had put together. However they decided to release before I actually did any work on the project" [X Link](https://x.com/corbtt/status/1833673734127006175) 2024-09-11T01:07Z 10.5K followers, [---] engagements

"This is strong evidence that Llama [---] with multimodal support is close to dropping π Mistral released Pixtral 12B Vision Language Model π₯ Some notes on the release: [--]. Text backbone: Mistral Nemo 12B [--]. Vision Adapter: 400M [--]. Uses GeLU (for vision adapter) & 2D RoPE (for vision encoder) [--]. Larger vocabulary - [------] [--]. Three new special tokens - img https://t.co/7f21NAqvsV" [X Link](https://x.com/corbtt/status/1833849901656236180) 2024-09-11T12:47Z 10.4K followers, [----] engagements

Larger" [X Link](https://x.com/corbtt/status/1833849901656236180) 2024-09-11T12:47Z 10.4K followers, [----] engagements "We're adding o1-preview and o1-mini-preview to @OpenPipeAI as relabeling models to improve your dataset right now If you fine-tune a smol model on o1 outputs you'll be able to pick up much of the improved quality without paying any more for inference than before. π" [X Link](https://x.com/corbtt/status/1834280498832826606) 2024-09-12T17:18Z 10.4K followers, [---] engagements "@aidan_mclau https://openpipe.ai/blog/mixture-of-agents https://openpipe.ai/blog/mixture-of-agents" [X Link](https://x.com/corbtt/status/1834302433075626041) 2024-09-12T18:45Z 10.7K followers, [---] engagements "Founder friends: do you promote a writing culture for internal comms at your company Have you found it to be an unlock For context OpenPipe is a [--] eng in-person team and internal comms are mostly oral. Works for us but wondering if we're missing something" [X Link](https://x.com/corbtt/status/1836026622564294703) 2024-09-17T12:57Z 11.3K followers, [---] engagements "I will consider it a great failure if OpenPipe ever shows up on a list of "fastest growing startups by headcount." One of the many places where less is more (see also: lines of code). I put together a new list of [--] of the fastest-growing startups (based on recent hiring rates) backed by top-tier funds. All have [--] employees and are hiring: https://t.co/SOf0wC41sP I put together a new list of [--] of the fastest-growing startups (based on recent hiring rates) backed by top-tier funds. All have [--] employees and are hiring: https://t.co/SOf0wC41sP" [X Link](https://x.com/corbtt/status/1836140988697555002) 2024-09-17T20:31Z 10.7K followers, [----] engagements "Direct Criteria Steering is a new way to steer LLMs developed by the research team at @OpenPipeAI It improves compliance with arbitrary user-defined criteria by 60-90% vs prompting. Well release research and lots more info soon. (still cooking but too excited not to share π«’)" [X Link](https://x.com/corbtt/status/1836445608766550445) 2024-09-18T16:42Z 10.4K followers, [----] engagements "@Sanket_goyallll Nah you definitely still want to fine-tune Gemini Flash to make sure it actually understands your task before throwing it at 38M posts. (Flash support in OpenPipe coming soon. π)" [X Link](https://x.com/corbtt/status/1836463552317776080) 2024-09-18T17:53Z 11.3K followers, [---] engagements "OpenAI free fine tuning continues If you run it through @OpenPipeAI you can also compare your results to OSS models and Gemini Flash (coming soon). π" [X Link](https://x.com/corbtt/status/1838731531633987687) 2024-09-25T00:05Z 10.4K followers, [---] engagements "Llama [---] fine-tuning notes/initial results. This is for the new text models nothing to share on multimodal just yet - 1B/3B models are very self-hostable You can easily run these on CPU in your own infra alongside the rest of your code. - 3B outperforms GPT-4o post-fine-tuning. (1B is slightly worse) - Vs Quen2.5: perf is very similar at comparable parameter counts. 
"@deliprao Yes that's our thing Check out @OpenPipeAI https://openpipe.ai/" [X Link](https://x.com/corbtt/status/1839667919325888533) 2024-09-27T14:06Z 15.4K followers, [----] engagements

"Recently overheard a Groq employee: apparently their per-token costs are 1-2 orders of magnitude higher than what they charge and the new chip won't materially help. There's no credible plan to fix this. This is why they aren't raising rate limits. Very bearish" [X Link](https://x.com/anyuser/status/1839689546390495666) 2024-09-27T15:32Z 19K followers, 273.7K engagements

"@kitledru @yangcullinan @tenstorrent Yep much better architectures for transformers are definitely possible. I don't know much about tenstorrent specifically" [X Link](https://x.com/corbtt/status/1839704779158880471) 2024-09-27T16:32Z 10.4K followers, [--] engagements

"@j0hnparkhill probably not but some are further away than others" [X Link](https://x.com/corbtt/status/1839711861304144331) 2024-09-27T17:01Z 10.4K followers, [----] engagements

"OpenAI To Grant 7% to Mr. Altman; Becomes Latest Stage Company Ever to Accept the YC Standard Deal" [X Link](https://x.com/corbtt/status/1839918188982874417) 2024-09-28T06:40Z 10.4K followers, [----] engagements

"@simonw We support this for all open source models fine tuned on OpenPipe" [X Link](https://x.com/corbtt/status/1841235579947925870) 2024-10-01T21:55Z 10.5K followers, [---] engagements

"Extremely disappointed to learn that we are not on the list of companies OpenAI has blacklisted to their investors. There are many other less prestigious blacklists out there so of course we're evaluating our options but OpenAI's is still the primary goal" [X Link](https://x.com/anyuser/status/1841928457917497441) 2024-10-03T19:49Z 19K followers, 26.1K engagements

"@ankrgyl @cramforce This is 100% solved by the typescript-eslint/no-floating-promises rule what am I missing" [X Link](https://x.com/corbtt/status/1842237334991024248) 2024-10-04T16:16Z 10.4K followers, [---] engagements

"@emollick If you believe there's a 75% chance that AGI is [--] years away and a 25% chance we hit a ceiling before that. It still might make sense to invest for the 25% case iff you don't believe anything you do now can affect your standing in the post-AGI world anyway" [X Link](https://x.com/corbtt/status/1843763902687130073) 2024-10-08T21:22Z 10.4K followers, [---] engagements

"@pvncher yeah google really cooked with flash 8b" [X Link](https://x.com/corbtt/status/1844156197534040482) 2024-10-09T23:21Z 10.4K followers, [---] engagements

"There is space for much better recommendation algorithms that (1) use far larger models than anyone in the space is using right now that are (2) fine-tuned per-user on the fly. Either existing platforms will adopt this or new ones will come in on top. Excellent startup opp. The YouTube video I want to watch is any highly rated 1hr long information dense lecture on anything esoteric and the algorithm just doesn't get it. It's too content-driven and too narrow-minded" [X Link](https://x.com/corbtt/status/1844450734626177137) 2024-10-10T18:51Z [----] followers, [----] engagements

"I wouldn't say that I love it when the $150B incumbent announces a product that's almost identical to our product from [--] months ago but I wouldn't say I'm particularly worried by it either. We just keep competing (and keep winning) you have to compete with labs head-on perplexity is better than searchgpt; cursor is better than canvas; openpipe is better than distillation; pinecone is better than assistants; and so on victory *is* possible don't listen to them just beat them" [X Link](https://x.com/corbtt/status/1845601219995455654) 2024-10-13T23:03Z 11.3K followers, [----] engagements

"Nice work @danielhanchen and the Unsloth team Fix is already live in production on OpenPipe. π Today we're releasing a new method that improves the way everyone trains LLMs. There's a significant bug that causes loss miscalculations during training. Our Gradient Accumulation fix corrects the issue reducing L2 norm error by 10x. Blog details: https://t.co/GwOGPs62Yu https://t.co/BwCiBGTfmh" [X Link](https://x.com/corbtt/status/1846241125759349007) 2024-10-15T17:26Z 11.3K followers, [----] engagements

"@Teknium1 @huggingface @UnslothAI Fix is deployed in OpenPipe" [X Link](https://x.com/corbtt/status/1846268498642887070) 2024-10-15T19:14Z 11.3K followers, [---] engagements

"Spoke to a space startup. They're betting in the post-Starship world launching a datacenter into a sun-synchronous orbit (reliable [--] hour solar) will be cheaper than building the same datacenter on earth because of the permitting+battery costs. Feels unlikely but intriguing" [X Link](https://x.com/anyuser/status/1846297109303382388) 2024-10-15T21:08Z 19K followers, 33.9K engagements

"@paultoo Yeah I asked about that. The mass required for radiant cooling is always proportional to the mass of the solar array (since you just have to radiate out the waste heat from the electricity you collect) and with current tech is 1/3 the mass of the solar. Good enough" [X Link](https://x.com/corbtt/status/1846332197646786816) 2024-10-15T23:27Z [----] followers, [----] engagements

"@catheryn_li @Zach_Kamran "that's my quant. my quantitative" π Congrats Cat and Zach" [X Link](https://x.com/corbtt/status/1846998362211729590) 2024-10-17T19:34Z [----] followers, [--] engagements

"Why are all foundation model companies so good at making models yet so bad at naming things What's the next version going to be "Claude [---] (new_new)" Can we all just opt for sanity and call this "Claude 3.6" π I'm excited to share what we've been working on lately at Anthropic. - Computer use API - New Claude [---] Sonnet - Claude [---] Haiku Let's walk through everything: https://t.co/rpwHU6um4H" [X Link](https://x.com/anyuser/status/1848765392011018286) 2024-10-22T16:36Z 19K followers, 13.2K engagements

"Just launched agent.exe a free open-source Mac/Windows/Linux app that lets you use Claude [---] Sonnet to control your computer This was a fun little project to explore the API and see what the model can do. Computer use is really cool. I expect [----] will be the year of agents" [X Link](https://x.com/anyuser/status/1849124800838713844) 2024-10-23T16:24Z 19K followers, 643.8K engagements

"Here's agent.exe booking travel on Google Flights. Claude [---] definitely isn't perfect; note that it confidently chooses the wrong dates" [X Link](https://x.com/anyuser/status/1849126269398700541) 2024-10-23T16:30Z 19K followers, 43.5K engagements

"All the code as well as a (still minimal) README for running the app is available here with an open source Apache [--] license. This is definitely still research-project-quality but would love to see more development happening on top https://github.com/corbt/agent.exe" [X Link](https://x.com/anyuser/status/1849126899714555945) 2024-10-23T16:33Z 19K followers, 26.5K engagements

"As a side note the new Claude [---] is incredible for coding as well. This is my first Electron app and Claude+Cursor could consistently build complex functionality across multiple files in a single shot. First time I've felt more like a manager than an engineer while coding" [X Link](https://x.com/anyuser/status/1849127639866626171) 2024-10-23T16:35Z 19K followers, 21.8K engagements

"@keremk I was going to implement a "semi auto" mode where you have to manually approve each action but in practice it's mega slow to do anything so you can just hit the "stop" button if it seems like it's turning evil. πΏ" [X Link](https://x.com/corbtt/status/1849157034555707824) 2024-10-23T18:32Z 16.6K followers, [----] engagements

"@teortaxesTex Anthropic has poached enough high-level OpenAI employees that they definitely know how to build an o1 if they want to. I doubt centralization would speed things up fwiw. Google was the only player in town for about a decade and moved extremely slowly for reasons" [X Link](https://x.com/corbtt/status/1849259353616052464) 2024-10-24T01:19Z 16.6K followers, [---] engagements

"https://openpipe.ai/blog/hacker-news-rlhf-part-1" [X Link](https://x.com/corbtt/status/1851338058190467437) 2024-10-29T18:59Z 16.6K followers, [----] engagements

"If your application has human feedback (regenerations user choices etc.) please DM me and I'd love to chat about how we can use RLHF to improve your response quality significantly with the minimum marginal effort" [X Link](https://x.com/corbtt/status/1851338182463512659) 2024-10-29T18:59Z 11.3K followers, [----] engagements

"@paulg This is a convincing argument to not vote for Trump (and one that I agree with). However it's not a convincing argument to vote for Harris. If you live in one of the [--] states that will definitely not decide the presidential election why not just vote third-party" [X Link](https://x.com/corbtt/status/1851347390919532940) 2024-10-29T19:36Z 16.6K followers, [---] engagements

"@consolelogwill Ideally you decouple the "data" part from the "instruction" part of your prompt and just use the "data" part in your evals. Working on supporting this first-party in OpenPipe" [X Link](https://x.com/corbtt/status/1852146101945798846) 2024-11-01T00:30Z 11.3K followers, [--] engagements

"If I want to run Llama [---] 1B locally on CPU what is the absolutely fastest inference stack available Doesn't need batching but needs to use every trick in the book to minimize latency" [X Link](https://x.com/corbtt/status/1854236795736883398) 2024-11-06T18:57Z 10.4K followers, [----] engagements

"@OfficialLoganK I just went to try AI Studio again I clicked "Create API key" and got this error with no obvious next step to resolve. Any ideas" [X Link](https://x.com/corbtt/status/1854548752977293602) 2024-11-07T15:37Z 10.4K followers, [---] engagements

"Qwen [---] Coder 32B is a π ✅ Benchmarks at or above GPT-4 and Claude [---] ✅ Subjectively feels fantastic for code (been trying it) ✅ Fine-tunable on your own data on OpenPipe" [X Link](https://x.com/corbtt/status/1856838459081498750) 2024-11-13T23:16Z 11.7K followers, [----] engagements

"@stride_zone is another OpenPipe user Noticing a theme AI/Crypto companies seem to be growing quickly. π Introducing @echosdotfun a new app by Stride contributors π Echos are AI agents with access to a crypto wallet and X account. Users can launch an Echo deploy memecoins send them to Echos and trade them as the Echo posts on X. Check it out π https://t.co/FpW2MFKVUA" [X Link](https://x.com/corbtt/status/1857136119931961779) 2024-11-14T18:58Z 11.7K followers, [--] engagements

"@mattdsegal @vikhyatk Slightly more info in our docs https://docs.openpipe.ai/features/criteria/api#runtime-evaluation" [X Link](https://x.com/corbtt/status/1857894917626016021) 2024-11-16T21:14Z 16.6K followers, [---] engagements

"This may become an official Qwen-stan account. ✅ Open source SOTA on code ✅ Open source SOTA in general for 14B+ ✅ Almost SOTA 14B ✅ Works great for LM RM and classification tasks ✅ SOTA open source multimodal" [X Link](https://x.com/anyuser/status/1858923694971625888) 2024-11-19T17:22Z 19K followers, 16.8K engagements

"What is the current SOTA on language autoencoders Can you run lossy compression on a 20K-word Wikipedia article to give you an archive that's just a few KB in size but decompresses into text semantically indistinguishable from the original" [X Link](https://x.com/corbtt/status/1860091742655017255) 2024-11-22T22:43Z 10.4K followers, [----] engagements

"@bnjmn_marie Is this compatible with efficient serving libraries like vLLM If so how does it impact throughput/latency" [X Link](https://x.com/corbtt/status/1861498851745734916) 2024-11-26T19:54Z 10.4K followers, [--] engagements

"Ok I am terrible at sharing product updates here but we now support Llama [---] 1B and 3B (the best small LLMs) as well as Qwen [---] 72B and 32B Coder (the best open general and code-specific models) on OpenPipe" [X Link](https://x.com/corbtt/status/1864106265414074872) 2024-12-04T00:35Z 11.8K followers, [----] engagements

"One of the new features I'm most excited about at OpenPipe is "criteria distillation". This allows you to distill an expensive LLM-as-judge criteria into a super fast cheap low-latency reward model that approximates the LLM-as-judge's outputs. DM for access" [X Link](https://x.com/corbtt/status/1864379895498858997) 2024-12-04T18:43Z 11.7K followers, [----] engagements

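The "criteria distillation" post above describes approximating an expensive LLM-as-judge with a fast, cheap model. The toy sketch below illustrates that general idea only; it is not OpenPipe's method. It assumes the openai and scikit-learn packages, and the judge prompt, model name, and feature choice are all placeholders.

```python
# Toy illustration: label outputs once with an LLM judge, then fit a cheap model
# that approximates the judge for fast scoring later.
import json
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

client = OpenAI()

def judge(output: str) -> int:
    """Ask the expensive LLM judge whether the output meets the criterion (1) or not (0)."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # example judge model
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": f'Does this reply avoid hedging? Answer JSON {{"pass": true or false}}.\n\n{output}'}],
    )
    return int(json.loads(resp.choices[0].message.content)["pass"])

outputs = [...]  # historical model outputs (needs a reasonably large, mixed set)
labels = [judge(o) for o in outputs]  # expensive, done once offline

# Cheap stand-in "reward model": TF-IDF features + logistic regression.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(outputs)
reward_model = LogisticRegression().fit(features, labels)

def fast_score(output: str) -> float:
    return reward_model.predict_proba(vectorizer.transform([output]))[0, 1]
```
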
"@_xjdr @jxmnop Has anyone on the team run public benchmarks and shared the results If not why not" [X Link](https://x.com/corbtt/status/1864427212486693315) 2024-12-04T21:51Z 10.5K followers, [----] engagements

"------fine-tuning platform to kill openpipe actually doesn't end up killing any startups I don't actually think OpenAI's goal is to kill OpenPipe here but if it is they're doing a terrible job. π anthropic: has the single best model by wide margin clean chat interface; no bells and whistles claude is now berkeley's most eligible bachelor openai: has no leading model o1 too expensive; gpt-4o sucks puts tons of effort into killing startups ------fine-tuning platform" [X Link](https://x.com/corbtt/status/1864429556649607171) 2024-12-04T22:00Z 11.8K followers, [----] engagements

"SUPER PUMPED to announce that Gemini fine-tuning is available to all OpenPipe users Gemini Flash provides the lowest cost fine-tuning of any model in its quality class. Comparable to gpt-4o-mini but 4x cheaper inference and FREE fine-tuning" [X Link](https://x.com/corbtt/status/1864710832044573177) 2024-12-05T16:38Z 11.8K followers, [----] engagements

"Meta just released Llama [---] 70B; they claim benchmarks similar to Llama [--] 405B but in a model 20% the size. It's already available as a base model on OpenPipe and we'll release benchmarks as a fine-tuning base model soon. π«‘ As we continue to explore new post-training techniques today we're releasing Llama [---] a new open source model that delivers leading performance and quality across text-based use cases such as synthetic data generation at a fraction of the inference cost. https://t.co/BNoV2czGKL" [X Link](https://x.com/corbtt/status/1865108818134470962) 2024-12-06T18:59Z 11.7K followers, [----] engagements

"OpenAI's Reinforcement Fine-Tuning (RFT) is far more data efficient than SFT -- can generalize from 10-20 labeled examples. Huge deal bc as compute costs drop to [--] the pain of gathering high-quality training data is the biggest barrier to deploying AI. RFT needs much less of it" [X Link](https://x.com/anyuser/status/1865150484446876123) 2024-12-06T21:45Z 19K followers, 68.1K engagements

"@NighttrekETH watch this space π" [X Link](https://x.com/corbtt/status/1865206104642457961) 2024-12-07T01:26Z 10.5K followers, [----] engagements

"Btw you can view your training loss across open source models AND Gemini models on OpenPipe" [X Link](https://x.com/corbtt/status/1866145788344074255) 2024-12-09T15:40Z 11.8K followers, [----] engagements

"Qwen [---] is trainable on OpenPipe Benchmarks of providers of Qwen2.5 a leading open-source model family π @alibaba_cloud's Qwen2.5 family of models includes Qwen2.5 72B Qwen2.5 Coder 32B and a range of smaller models including 1.5B and 0.5B models for edge use-cases. Qwen2.5 72B the flagship model is https://t.co/S8K8EZUKP5" [X Link](https://x.com/corbtt/status/1866258492044288095) 2024-12-09T23:07Z 11.7K followers, [----] engagements

"What is A100/H100 availability like on the big clouds these days Possible to get an on-demand instance on AWS/GCP/Azure" [X Link](https://x.com/corbtt/status/1866958841080152265) 2024-12-11T21:30Z 10.6K followers, [----] engagements

"We will out-ship the big labs because in this house we have BOTH an OpenAI and a Claude subscription. Sam Altman says when ChatGPT went down yesterday he had to work for [--] hours without it and he realized how reliant we are becoming on AI systems as a form of critical infrastructure https://t.co/IRnlb8AbCl" [X Link](https://x.com/corbtt/status/1867460924862607811) 2024-12-13T06:45Z 10.6K followers, [----] engagements

"There is an opportunity for someone to build a fantastic ChatGPT clone that is provider-agnostic. Your launch window is this week: Gemini [--] Pro launches next week and will be SOTA but no one is in the habit of using Gemini's first-party web UI yet" [X Link](https://x.com/anyuser/status/1867668555803898147) 2024-12-13T20:30Z 19K followers, 30.1K engagements

"If you're building this you should def. include a side-by-side comparison view UI. Your marketing strategy is sharing screenshots of interesting outputs by new models contrasted with boring outputs (for the same prompt) for old models" [X Link](https://x.com/corbtt/status/1867684648232661058) 2024-12-13T21:34Z 11.8K followers, [----] engagements

"@simonw We did an automated analysis of the tasks our users use OpenPipe for using LLMs to categorize each customer task. Will write this up as a blog post at some point. Note this may or may not be representative of the wider ecosystem" [X Link](https://x.com/anyuser/status/1868759955421249954) 2024-12-16T20:47Z 19K followers, 18.8K engagements

"@aidan_mclau if o1+cursor agents can do this successfully I will merge" [X Link](https://x.com/corbtt/status/1869506567827255588) 2024-12-18T22:14Z 10.7K followers, [---] engagements

"Now we know why OpenAI sat on strawberry/o1 for so long before releasing. It turns out that once you've seen the trick reproducing the results isn't so hard. Breaking news from Chatbot Arena @GoogleDeepMind's Gemini-2.0-Flash-Thinking debuts as #1 across ALL categories The leap from Gemini-2.0-Flash: - Overall: #3 → #1 - Overall (Style Control): #4 → #1 - Math: #2 → #1 - Creative Writing: #2 → #1 - Hard Prompts: #1 → #1 https://t.co/cq2MRMbWZ1" [X Link](https://x.com/corbtt/status/1869799291205890366) 2024-12-19T17:37Z 10.7K followers, 10.1K engagements

""This shows that it's still feasible to create unsaturated interesting benchmarks that are easy for humans yet impossible for AI . We will have AGI when creating such evals becomes outright impossible." This is a reasonable testable definition of AGI imo. So is this AGI While the new model is very impressive and represents a big milestone on the way towards AGI I don't believe this is AGI -- there's still a fair number of very easy ARC-AGI-1 tasks that o3 can't solve and we have early indications that ARC-AGI-2 will remain" [X Link](https://x.com/anyuser/status/1870174254194720812) 2024-12-20T18:27Z 19K followers, 48.9K engagements

So is this AGI While the new model is very impressive and represents a big milestone on the way towards AGI I don't believe this is AGI -- there's still a fair number of very easy ARC-AGI-1 tasks that o3 can't solve and we have early indications that ARC-AGI-2 will remain So is this AGI While the new model is very impressive and represents a" [X Link](https://x.com/anyuser/status/1870174254194720812) 2024-12-20T18:27Z 19K followers, 48.9K engagements "@MSROakRidge @vikhyatk Source" [X Link](https://x.com/corbtt/status/1870182646703440257) 2024-12-20T19:01Z 10.7K followers, [--] engagements "@aidan_mclau The OpenPipe criteria feature could be defined as a workflow to force you to write instructions that unambiguously explain the properties you're looking for in an output to an LLM-as-judge. (And yes we're working on using RL to optimize against them) https://docs.openpipe.ai/features/criteria/overview https://docs.openpipe.ai/features/criteria/overview" [X Link](https://x.com/corbtt/status/1872470260613570976) 2024-12-27T02:31Z 16.6K followers, [---] engagements "@danfaggella Nah. We still play chess draw pictures and throw balls despite living in a world with machines that do all of those infinitely better. There will be a short-term adjustment and then people will keep doing things that make them feel happy and fulfilled" [X Link](https://x.com/corbtt/status/1872768021036712406) 2024-12-27T22:14Z 10.7K followers, [---] engagements "@aaron_defazio cc @bnjmn_marie I believe you tested it out some time ago" [X Link](https://x.com/corbtt/status/1872770214515097982) 2024-12-27T22:23Z 11.2K followers, [---] engagements "A few weeks ago OpenAI announced Reinforcement Fine-Tuning (RFT)a new way to adapt LLMs to complex tasks with very little training data. Heres a quick rundown of how it works why its a big deal and when you should use it. π§΅" [X Link](https://x.com/anyuser/status/1873864746023477482) 2024-12-30T22:52Z 19K followers, 266.9K engagements "Sometimes RFT can be a stepping stone: [--] Label 50-100 examples by hand train an RFT model. [--] Use that RFT model to machine-label more data. [--] Then fine-tune a simpler faster LLM via SFT. Best of both worlds" [X Link](https://x.com/corbtt/status/1873865838408982653) 2024-12-30T22:56Z 11.2K followers, [----] engagements "@girlbossintech Not confirmed but seems likely given how widely PPO is known to be used internally at OpenAI. They may have made tweaks to improve stability" [X Link](https://x.com/corbtt/status/1873891668279386390) 2024-12-31T00:39Z 11.2K followers, [----] engagements "@chiki_champat Yep still closed beta will go GA in January. Unsure whether Azure will host or not; they have rights to the research but may or may not want to build out the product" [X Link](https://x.com/corbtt/status/1873975965396984227) 2024-12-31T06:14Z 11.2K followers, [---] engagements "We are looking to hire a few more really strong engineers on either the ML or systems side. In just over a year we have built the world's best fine-tuning platform. We were the first to launch self-service preference tuning the first to integrate evals/data prep/fine-tuning into a cohesive experience and the first to build self-service learning from human feedback. We have the best customer list in the business from fast-growing startups like @WisprFlow to enormous enterprises. And we've done all this with a technical team of only [--] (including my co-founder and me). 
If you are a future" [X Link](https://x.com/corbtt/status/1874159180032205310) 2024-12-31T18:22Z 11.2K followers, 13.6K engagements "@OfficialLoganK Thanks @OfficialLoganK has been great working with you I should add to the list that we're the ONLY fine-tuning provider that lets you train OpenAI/open-source/Gemini models through one consistent interface" [X Link](https://x.com/corbtt/status/1874164452020674572) 2024-12-31T18:43Z 11.2K followers, [---] engagements "When I worked at @ycombinator I'd make a point of chatting with very successful founders and getting the "real" backstory not the polished PR one. There were always big mistakes self-doubt long periods lost in the wilderness. Success is only inevitable in retrospect. Marc Andreessen: Every innovator eventually starts to like the taste of their own blood Once something works the stories get retconned and adapted to say it was inevitable all along everybody always knew this was a good idea. The person has won all these awards and https://t.co/iPF63swp7A Marc Andreessen: Every innovator" [X Link](https://x.com/corbtt/status/1877818375407366365) 2025-01-10T20:42Z 11.2K followers, [----] engagements "It's clear in retrospect why OpenAI sat on strawberry/o1 for so long without publishing. Now that everyone has seen the trick replications are coming fast. Qwen AllenAI academic work like PRIME and now Microsoft are all getting impressive results. Microsoft presents rStar-Math Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking On the MATH benchmark it improves Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4% surpassing o1-preview by +4.5% and +0.9%. On the USA Math Olympiad https://t.co/eQtDaZWe5z Microsoft presents rStar-Math Small LLMs Can Master" [X Link](https://x.com/corbtt/status/1877822459178815702) 2025-01-10T20:58Z 11.3K followers, [----] engagements "Sharing an important lesson learned from working with hundreds of customers: theres a big difference in the right way to evaluate and fine-tune LLMs depending on whether your task has one right answer or many. RFT DPO RLHF evals all downstream of this π§΅" [X Link](https://x.com/anyuser/status/1879628481442836845) 2025-01-15T20:35Z 19K followers, 40.5K engagements "On the other hand freeform tasks have infinitely many correct outputsthink: - Summaries - Email drafts - Chatbots Here correctness is more subjective. Theres no single right answer and that affects how we measure success" [X Link](https://x.com/corbtt/status/1879628653925192066) 2025-01-15T20:36Z 12K followers, [----] engagements "To see how often each type appears in practice I analyzed [----] recent datasets on OpenPipe. 63% were freeform 37% were deterministic" [X Link](https://x.com/corbtt/status/1879628787815727447) 2025-01-15T20:36Z 12K followers, [----] engagements "Ok lets talk about why this matters Key difference #1: Ideal temperature settings. Deterministic tasks usually need temperature=0 for consistent correct outputs. Freeform tasks can benefit from higher temperatures (0.71.0) to foster creativity and variety. (h/t @eugeneyan wrote about this briefly in https://eugeneyan.com/writing/prompting/#selecting-a-temperature https://eugeneyan.com/writing/prompting/#selecting-a-temperature" [X Link](https://x.com/corbtt/status/1879629001414902110) 2025-01-15T20:37Z 11.3K followers, [----] engagements "Key difference #2: Evaluations. - Deterministic tasks can leverage golden datasets with known-correct outputs for straightforward scoring. 
- Freeform tasks often rely on vibe checks LLM-as-judge user feedback or business metrics to gauge quality" [X Link](https://x.com/corbtt/status/1879629133145317769) 2025-01-15T20:38Z 11.3K followers, [----] engagements "Ok annoyingly X doesn't allow polls in replies So I guess respond or dm me with your answers pls π From the diagram above: [--]. Do you understand what OpenPipe does [--]. Would you use OpenPipe" [X Link](https://x.com/corbtt/status/1880044034984669330) 2025-01-17T00:06Z 11.9K followers, [---] engagements "I mean arguably if you are naming your product literal "Sauron" you should have read a little more LOTR and re-thought that" [X Link](https://x.com/anyuser/status/1880056793319231817) 2025-01-17T00:57Z 19K followers, [----] engagements "Reasoning is a general purpose technology. Deepseek-R1 was trained to reason on math and code problems. But it improved on [--] / [--] benchmarks including the majority of non math/code tasks We can expect reasoning improvements to help across the board" [X Link](https://x.com/anyuser/status/1881507480578257175) 2025-01-21T01:01Z 19K followers, 31.3K engagements "Recently got pitched by a startup building a "copilot Twitter influencer" that suggests posts you can make on X to build influence and gain followers. I'm trying to triangulate the Overton window here. Would you use a product like this no ick yes cool (show results) no ick yes cool (show results)" [X Link](https://x.com/corbtt/status/1882204810990276709) 2025-01-22T23:12Z 11.8K followers, [----] engagements "Big news: we've figured out how to train models 80-90% cheaper than before. Cheaper than renting your own GPUs. Cheaper than any other service. And [--] quality regression. Super proud of the team on this one. New pricing is now live" [X Link](https://x.com/anyuser/status/1882477398534336880) 2025-01-23T17:16Z 19K followers, 91.9K engagements "@xlr8harder Europe's secret plan is to be the natural wildlife preserve for homo sapiens when ASI paperclips the rest of the planet. Genius strategy tbh" [X Link](https://x.com/corbtt/status/1882503636527587431) 2025-01-23T19:00Z 11.8K followers, [---] engagements "@stochasticchasm Yeah fair. RFT as presented by OpenAI uses verifiable rewards but there's no reason it has to. The overlap is in the technique of allowing an unconstrained chain of thought and then assessing the reward on just a separate final output" [X Link](https://x.com/corbtt/status/1882844636869439772) 2025-01-24T17:35Z 11.8K followers, [--] engagements "@bradthilton @jxmnop Yeah this seems the obvious next thing kinda surprised Deepseek didn't even bring the idea up in their paper" [X Link](https://x.com/corbtt/status/1882886189054890237) 2025-01-24T20:20Z 11.8K followers, [--] engagements "@winglian @natolambert Yep that's exactly what process reward models are supposed to do. Although there's some question more recently about whether they're actually necessary given DeepSeek's sucess without one. I suspect rewarding on outcomes with a mild length penalty should work in practice" [X Link](https://x.com/corbtt/status/1885826302013567264) 2025-02-01T23:03Z 11.8K followers, [---] engagements "@willccbb @hallerite I actually made a PR to verl last week that I *think* should get you most of what you want I agree that a python API is much more convenient than bash flags or a yaml config dunno why the later two seem so much more popular. 
https://github.com/volcengine/verl/pull/162 https://github.com/volcengine/verl/pull/162" [X Link](https://x.com/corbtt/status/1885828652879024210) 2025-02-01T23:12Z 11.8K followers, [---] engagements "@BigLawNoMaw @sc_cath An asset doesn't have to be liquid to have an NPV You can calculate NPV by looking at the cost of buying an annuity with a similar payout schedule" [X Link](https://x.com/corbtt/status/1887209998742397126) 2025-02-05T18:41Z 11.8K followers, [--] engagements "I have never been more excited by our product roadmap. Every agent is going to be trained using RL. And the best ones (outside the frontier labs) are going to be trained on OpenPipe" [X Link](https://x.com/corbtt/status/1887301377799635042) 2025-02-06T00:44Z 12K followers, [----] engagements "@RobertHaisfield @togethercompute @FireworksAI_HQ collect R1 outputs - distill into smolR1 with OpenPipe π" [X Link](https://x.com/corbtt/status/1887327783715770678) 2025-02-06T02:29Z 11.9K followers, [---] engagements "π΅ Can smaller open-weight models match state-of-the-art reasoning performance We investigated using GRPO on "Temporal Clue" surpassing R1 o1 and o3-miniand nearly matching Sonnet [---] at over 100x lower inference cost. Here's how: π (1/6)" [X Link](https://x.com/anyuser/status/1897735437340627405) 2025-03-06T19:46Z 19K followers, 55.3K engagements ""Temporal Clue" is a challenging logic puzzle inspired by the classic board game Clueexpanded to include "when" and "why." Perfectly suited for benchmarking LLM reasoning skills it exposed strengths and weaknesses in top models. (3/6)" [X Link](https://x.com/corbtt/status/1897735685131714738) 2025-03-06T19:47Z 11.9K followers, [----] engagements "By iteratively refining models using GRPO and torchtune we improved accuracy to approach Claude Sonnet [---] significantly outperforming popular models like DeepSeek R1 OpenAI's o1 and o3-mini. (4/6)" [X Link](https://x.com/corbtt/status/1897735927763812428) 2025-03-06T19:47Z 11.9K followers, [----] engagements "@htahir111 put together a really thoughtful guide on how to use ZenML with OpenPipe to build really high-quality models. Excited to partner up @zenml_io π@OpenPipeAI integration for #LLM fine-tuning in production After meeting the OpenPipe team in NY at the AI Engineer Summit (@swyx) last month I was inspired by their vision for making LLM fine-tuning accessible. Shout out to @corbtt @ReidMayo @dvdcrbt What https://t.co/hsqRJlhFDw @zenml_io π@OpenPipeAI integration for #LLM fine-tuning in production After meeting the OpenPipe team in NY at the AI Engineer Summit (@swyx) last month I was" [X Link](https://x.com/corbtt/status/1902049963129106571) 2025-03-18T17:30Z 12K followers, [---] engagements "If you're fine-tuning LLMs Gemma [--] is the new π and it's not close. Gemma [--] trounces Qwen/Llama models at every size - Gemma [--] 4B beats 7B/8B competition - Gemma [--] 27B matches 70B competiton Vision benchmarks coming soon" [X Link](https://x.com/anyuser/status/1903121177490686042) 2025-03-21T16:27Z 19K followers, 37.3K engagements "2 GPU efficiency: Typical RL rollouts can leave GPUs idle waiting for external tasks. ART separates frontend (rollouts reward logic) and backend (inference training) allowing parallelized execution and higher GPU utilization" [X Link](https://x.com/corbtt/status/1911848107668897834) 2025-04-14T18:24Z 12.1K followers, [---] engagements "3 Seamless integration: Other RL trainers require substantial refactoring to fit existing codebases. 
ART is designed for plug-and-play compatibility easing integration with tools like CrewAI and the OpenAI Agents SDK" [X Link](https://x.com/corbtt/status/1911848229861466610) 2025-04-14T18:25Z 12.1K followers, [---] engagements "@benderville @vllm_project @huggingface @UnslothAI this is an open source project not currently available on our managed service on http://openpipe.ai http://openpipe.ai" [X Link](https://x.com/corbtt/status/1912650859177230566) 2025-04-16T23:34Z 16.6K followers, [--] engagements "πMeet ARTEour open-source RL-trained email research agent that searches your inbox and answers questions more accurately faster and cheaper than o3. Let's go deeper on how we built it. π§΅" [X Link](https://x.com/anyuser/status/1917269992363680054) 2025-04-29T17:29Z 19K followers, 161.4K engagements "The results exceeded expectations: ARTE surpasses o3 on accuracy slashes latency [--] and cuts costs [--]. Turns out RL works really well" [X Link](https://x.com/corbtt/status/1917270273591833040) 2025-04-29T17:30Z 12.7K followers, [----] engagements "Whered the data come from [---] K Enron emails π. We sampled inboxes and used GPT-4.1 to spin up realistic Q/A pairsbecause the perfect dataset didnt exist so we made it up synthetically π" [X Link](https://x.com/corbtt/status/1917270387223846935) 2025-04-29T17:30Z 12.6K followers, [----] engagements "@TheZachMueller interesting to see how it benchmarks I assume that's how they implement the thinking_budget that they benchmark with but not sure. Unfortunately for prod use vllm or similar is probably a requirement" [X Link](https://x.com/corbtt/status/1917562379740815600) 2025-04-30T12:51Z 14.4K followers, [--] engagements "Agentic RAG works better than semantic search RAG and it's not even close. Will post results soon" [X Link](https://x.com/anyuser/status/1920247369448436199) 2025-05-07T22:40Z 19K followers, 138.2K engagements "why is o3 still so confidently wrong" [X Link](https://x.com/corbtt/status/1924349246805053595) 2025-05-19T06:19Z 13K followers, [----] engagements "RL twitter has anyone reproduced the positive results from clip-higher in DAPO Just got around to trying it but using their epsilon values of 0.2/0.28 gets me significantly worse results than the baseline 0.2/0.2. @willccbb @kalomaze @casper_hansen_ @rosmine_b @hallerite @danielhanchen @QGallouedec @brendanh0gan" [X Link](https://x.com/corbtt/status/1925236439593492985) 2025-05-21T17:05Z 13.1K followers, [----] engagements ""RL from a single example works" "RL with random rewards works" "Base model pass@256 can match RL model pass@1" "RL updates a small % of params" Recent papers all point in the same direction: RL is mostly just eliciting latent behavior already learned in pretraining not teaching new behavior. Yann Lecun's "RL is the cherry on top" was right after all. Is this bearish for RL Perhaps not Maybe we should think about RL as the last mile of on-the-job training for your specific task. When you hire a customer support rep you start with someone who's already smart and capable. But you *also* watch" [X Link](https://x.com/anyuser/status/1927428584257261994) 2025-05-27T18:15Z 19K followers, 105.2K engagements "At a recent dinner I met a very senior engineer at one of the Big Four tech cos. His team develops tooling for a 0-engineer future. They're not allowed to tell anyone internally what they're working on to avoid mass panic. 
He figures mega layoffs start in [--] months" [X Link](https://x.com/anyuser/status/1927821116057309685) 2025-05-28T20:15Z 19K followers, 1.3M engagements "@willccbb @UnslothAI @danielhanchen @natolambert Oh for serving an autoregressive language model with many-token outputs you're probably still better off with vllm. This result just applies to sequence classification models (and I guess autoregressive models that you sample a single token from as well)" [X Link](https://x.com/corbtt/status/1932962370529472837) 2025-06-12T00:45Z 16K followers, [---] engagements "@garybasin I tried [---] o3 o3 pro opus [--] and gemini [---] pro before tweeting π maybe I should make a meta post showing all their threads once I've posted my hand-created one" [X Link](https://x.com/corbtt/status/1933201495010484415) 2025-06-12T16:35Z 14.1K followers, [---] engagements "We've used RL to train hundreds of models on dozens of tasks at @OpenPipeAI. Here's everything I know about "reward hacking": π§΅" [X Link](https://x.com/anyuser/status/1933235820955381789) 2025-06-12T18:51Z 19K followers, 23.6K engagements "@VictorTaelin the right technical solution here is for everyone to walk around with a camera strapped to their face that records and transcribes everything you see and lets you query it with natural language. unclear how many years (generations) before socially acceptable tho" [X Link](https://x.com/corbtt/status/1934363083973238886) 2025-06-15T21:31Z 14.2K followers, [----] engagements "Hot take: at current token prices you should *always* ask your LLM-as-judge to explain its CoT first before answering. Makes them way easier to debug when it inevitably make a judgement you disagree with" [X Link](https://x.com/corbtt/status/1935061614149128616) 2025-06-17T19:46Z 14.1K followers, [----] engagements "An LLM should be able to change a diaper plan an invasion butcher a hog conn a ship design a building write a sonnet balance accounts build a wall set a bone comfort the dying take orders give orders cooperate act alone solve equations analyze a new problem pitch manure program a computer cook a tasty meal fight efficiently die gallantly. Specialization is for insects. if this resonates with you you are probably not trying to do something complex enough for it to matter. which is fine actually. but there are edgy edge cases that live among harder problems. and when it comes to those some" [X Link](https://x.com/corbtt/status/1935806625945907625) 2025-06-19T21:07Z 14.1K followers, [----] engagements "GRPO quirk that contradicted my intuition: If you train on a group with rewards [--] [--] [--] [--] And then you train on another group with rewards [----] [----] [----] [--] Because of how GRPO normalizes within groups the last trajectory will be equally reinforced in both cases" [X Link](https://x.com/anyuser/status/1935810380850511945) 2025-06-19T21:22Z 19K followers, 85.2K engagements Limited data mode. Full metrics available with subscription: lunarcrush.com/pricing
"The [----] crypto boom felt really scammy -- price mostly driven by ponzi-style promos and ICOs. There's still some of that today (memecoins most NFTs) but ecosystem feels way healthier overall. I'm now pretty bullish on crypto's long-term impact"
X Link 2021-04-20T17:16Z 10.4K followers, [--] engagements
"Gotta do a better job of sharing what we're working on. π Shipped last week: automatically convert your prompts between GPT Claude 1/2 and Llama [--] syntax. Sign up at to convert and benchmark your own prompts https://openpipe.ai/ Curious about Llama [--] Here's a fun feature we shipped last week: automatically convert your GPT-3.5 prompt to the Llama [--] format with best practices Play with Llama [--] Claude [--] and GPT models at https://t.co/ySUpFCX8N4 π https://t.co/5AuYe69j6D https://openpipe.ai/ Curious about Llama [--] Here's a fun feature we shipped last week: automatically convert your GPT-3.5"
X Link 2023-08-05T17:20Z 10.4K followers, [---] engagements
"Just officially launched OpenPipe as a YC company. DM me if you're interested in converting your expensive LLM prompt into a cheap reliable fine-tuned model. https://www.ycombinator.com/launches/JMa-openpipe-convert-expensive-llm-prompts-into-fast-cheap-fine-tuned-models https://www.ycombinator.com/launches/JMa-openpipe-convert-expensive-llm-prompts-into-fast-cheap-fine-tuned-models"
X Link 2023-08-28T15:39Z 11.3K followers, 13.2K engagements
"@Altimor Sounds like we should talk. π https://openpipe.ai/source=x https://openpipe.ai/source=x"
X Link 2023-08-28T23:50Z 10.4K followers, [---] engagements
"@tszzl The Saudis would like a word"
X Link 2023-12-11T04:55Z [----] followers, [---] engagements
"We β₯ @helicone_ai"
X Link 2023-12-12T04:47Z [----] followers, [---] engagements
"@arnabmanna619 Sure There are a lot of tricks but the basic idea is (1) run your prompt on several hundred inputs and store the outputs. (2) Use the inputs+outputs to fine-tune a LoRA (optionally on a smaller base model) and use that for future inference"
X Link 2023-12-26T18:32Z [----] followers, [---] engagements
"@TheRealAneesh @Teknium1 How does unsloth compare perf-wise to SFTTrainer/Axolotl Creator has big claims but haven't heard much external corroboration"
X Link 2024-01-07T08:04Z [--] followers, [---] engagements
"There's a lot of noise in the LLMOps space around tooling that doesn't solve a real problem. @OpenPipeAI does @josedlpuente Honestly its almost all in house stuff. Nothing in LLM ops has really impressed me so far (except @OpenPipeAI ) and even if it was good it wasnt around when we started building @josedlpuente Honestly its almost all in house stuff. Nothing in LLM ops has really impressed me so far (except @OpenPipeAI ) and even if it was good it wasnt around when we started building"
X Link 2024-01-15T19:18Z [----] followers, [---] engagements
"Fine-tuning isnt the future anymore its the present. We 10xd in both unique customers and completions served in January. Now generating over 1M completions a day and ramping quickly"
X Link 2024-02-12T19:03Z 11.3K followers, [----] engagements
"@yar_vol @vikhyatk @OpenPipeAI We haven't posted our internal analysis of Mixtral yet but fine-tuned Mistral is much stronger than GPT-3.5 and Mixtral is stronger than Mistral. https://openpipe.ai/blog/mistral-7b-fine-tune-optimized https://openpipe.ai/blog/mistral-7b-fine-tune-optimized"
X Link 2024-02-26T04:46Z 14.3K followers, [---] engagements
"@rishdotblog Productized on @OpenPipeAI https://docs.openpipe.ai/features/pruning-rules https://docs.openpipe.ai/features/pruning-rules"
X Link 2024-03-08T01:33Z 10.4K followers, [---] engagements
"The formerly hard parts of fine-tuning are fully automated. Need a ton of data β
Synthetic data generation Data prep/cleaning β
Use LLM to filter and relabel Evaluations β
LLM-as-judge This works. You can build a SOTA model for your task in an hour not a month"
X Link 2024-03-13T20:58Z [----] followers, [---] engagements
"@khanshq @OpenPipeAI Yes"
X Link 2024-03-20T18:04Z [----] followers, [---] engagements
"HN putting out some real bangers this morning"
X Link 2024-03-23T15:12Z [----] followers, [----] engagements
"@MistralAI live-dropping their new 7B model 32K context and improved benchmarks"
X Link 2024-03-23T18:43Z 10.4K followers, [---] engagements
"Spoke to a Microsoft engineer on the GPT-6 training cluster project. He kvetched about the pain they're having provisioning infiniband-class links between GPUs in different regions. Me: "why not just colocate the cluster in one region" Him: "Oh yeah we tried that first. We can't put more than 100K H100s in a single state without bringing down the power grid." π€―"
X Link 2024-03-25T22:38Z 19K followers, 1.9M engagements
"@Jessassin my general understanding of the business model is "whoever builds agi first wins the whole game." you can agree with them or not but openai really does believe they're playing for all the marbles here"
X Link 2024-03-26T05:04Z 19K followers, 99.9K engagements
"@AnthropicAI be cooking. I'm old enough to remember when "why would anyone want to use the second-best AI assistant" was OpenAI's argument. π The king is dead RIP GPT-4 Claude opus #1 ELo Haiku beats GPT-4 [----] & Mistral large Thats insane for how cheap & fast it is https://t.co/fAwzJScLTH The king is dead RIP GPT-4 Claude opus #1 ELo Haiku beats GPT-4 [----] & Mistral large Thats insane for how cheap & fast it is https://t.co/fAwzJScLTH"
X Link 2024-03-26T23:57Z [----] followers, [---] engagements
"@iammaestro04 @burny_tech openai is playing for all the marbles"
X Link 2024-03-27T00:05Z [----] followers, [---] engagements
""Different team members threw out ideas in Slack for how to use the remaining week of computer power. One idea was. a much smaller version for hobbyists to play with" There's still time @databricks Show us what a DBRX-7B can do @jefrankle https://www.wired.com/story/dbrx-inside-the-creation-of-the-worlds-most-powerful-open-source-ai-model/ https://www.wired.com/story/dbrx-inside-the-creation-of-the-worlds-most-powerful-open-source-ai-model/"
X Link 2024-03-27T14:43Z [----] followers, [----] engagements
"Lots of @databricks π on this I'm optimistic"
X Link 2024-03-28T04:04Z [----] followers, [---] engagements
"@jefrankle @mejia_petit @databricks @code_star @mvpatel2000 For our use cases 7B and 13B are definitely sweet spots. Not a huge need for 34B imho unless it can beat Mixtral at a range of tasks"
X Link 2024-03-28T05:26Z [----] followers, [---] engagements
"scoop: @jefrankle is going to give the @DbrxMosaicAI team the day off and hero-run a new state-of-the-art 7B model on the @databricks cluster. you heard it here first π"
X Link 2024-03-28T05:37Z [----] followers, [----] engagements
"@jefrankle @code_star @databricks A magnetized needle and a steady hand"
X Link 2024-03-28T05:50Z [----] followers, [--] engagements
"@OfficialLoganK has been a fantastic resource for us I highly recommend taking his money if you have the option π Ive invested in 30+ AI startups in the last year and a half if youre working on something cool and want to chat my DMs are open. Its time to build Ive invested in 30+ AI startups in the last year and a half if youre working on something cool and want to chat my DMs are open. Its time to build"
X Link 2024-03-29T04:01Z 11.3K followers, [---] engagements
"An AI-empowered employee has a vastly higher skill floor than a non-AI-empowered employee. Just asked an engineer with [--] marketing experience to set up our Hubspot so to send a product newsletter to all our current and future users. Pre-AI would not have been worth the ramp to get him familiar with Hubspot. Now he can just ask GPT-4 the top-level questions about best practices and get it working in an hour"
X Link 2024-04-02T20:50Z [----] followers, [----] engagements
"@consolelogwill @lgrammel Very likely yes our completions endpoint is OpenAI compatible. If you have any issues DM me and we'll take a look"
X Link 2024-04-03T18:42Z [----] followers, [---] engagements
"2025: it's now considered good manners to add subtle typos and grammar errors to your emails. signals a real human spent time on it. 2026: all frontier models are now RLHF'd to add typos and grammar errors"
X Link 2024-04-07T05:55Z [----] followers, [----] engagements
"Making tool calls a first-class citizen in the OpenAI API was a mistake. JSON mode can do everything tool calls can and more and is conceptually simpler"
X Link 2024-04-10T00:26Z [----] followers, [----] engagements
"This is how foundation model companies build a data flywheel that makes it hard to keep up. You'd better believe that OpenAI is using GPT-5 to filter and synthesize training data for GPT-6 already"
X Link 2024-04-18T17:21Z 19K followers, 41K engagements
"In a year you'll be able to directly prompt your social media feeds. "Yes I know I spent [--] seconds looking at that picture. No that doesn't mean I only want to see thirst traps for the next [--] days." If the platforms don't build this in directly someone will come in and build it on top"
X Link 2024-04-23T05:38Z [----] followers, [--] engagements
"In a year you'll be able to directly prompt your social media feeds. "Yes I know I spent [--] seconds looking at that picture. No that doesn't mean I only want to see thirst traps for the next [--] days." Platforms should build this directly or someone will build it on top"
X Link 2024-04-23T05:39Z [----] followers, [---] engagements
"@4evaBehindSOTA @abacaj So I will 100% grant that if you spend enough time with hparam sweeps and data filtering/augmentation you can get a better model than what we give you at openpipe. But I still think getting you 80% of the way there for 5% of the effort is hugely valuable. π€·"
X Link 2024-05-07T04:49Z 10.6K followers, [--] engagements
"@altryne @Teknium1 @sama More likely explanation for (1) is that the model developer saw sam's tweet and decided to use the name for the lulz. It's true that using the openai tokenizer is an unusual choice for a non-openai model though"
X Link 2024-05-07T19:48Z [----] followers, [---] engagements
"OpenAI demo day is really cool and I love gpt-4 at half the price. BUT This updates me more towards "openai has hit a wall on frontier model performance". You don't spend your time messing around with cost optimization if you could be releasing the smartest model in the world instead. Hoping I'm wrong and they've got a "one more thing" event coming soon"
X Link 2024-05-13T18:37Z 19K followers, 141K engagements
"@raydelvecc sure if they're doing both that's fine. I want to see the Big Iron model tho"
X Link 2024-05-13T19:53Z [----] followers, 10.5K engagements
"@0mniusprime there is always a market for the smartest model as long as it's cheaper per-token than a similarly-smart human. just deploy it at $0.01/token or whatever until you can get the prices down"
X Link 2024-05-13T19:54Z [----] followers, [----] engagements
"@strickvl @HamelHusain @dan_s_becker Do you know what stack you'll use for the fine-tuning yet Would love to see you on @OpenPipeAI"
X Link 2024-06-03T03:58Z 10.7K followers, [---] engagements
"Last month I predicted that OpenAI had hit a wall on frontier model performance and a lot of folks called it out as a bad take. Feeling pretty vindicated now -- rumors have shifted from "GPT-5 this summer" to "GPT-5 in December". Translation: whatever they tried in the last training run failed and they're starting over from scratch (with no guarantee the new run will work either). OpenAI demo day is really cool and I love gpt-4 at half the price. BUT This updates me more towards "openai has hit a wall on frontier model performance". You don't spend your time messing around with cost"
X Link 2024-06-04T17:24Z [----] followers, 21.2K engagements
"@swyx @HamelHusain bullish on I've used it for some lightweight data munging and works fairly well. Still reach for python for anything heavy though. I know @ankrgyl is a TS fan. Should we just quit our startups and spend [--] years building out the TS-data ecosystem π€π https://www.npmjs.com/package/nodejs-polars https://www.npmjs.com/package/nodejs-polars"
X Link 2024-06-05T01:51Z [----] followers, [---] engagements
"To be clear I don't think this is a permanent wall and I definitely think OpenAI will remain on the frontier. It's just turning out to be harder to make the next jump than a lot of us assumed"
X Link 2024-06-06T04:49Z [----] followers, [----] engagements
"My take as someone who doesn't fall into either the "Safety First" or e/acc camps: [--]. Leopold's essay is at a minimum a well-articulated case for taking a the AI capability explosion seriously. Low-brow dismissals like "he's just drawing lines on a graph bro" don't do it justice. [--]. The basic thesis is at least plausible. So far adding OOMs seems to mostly be working. There's no guarantee that scaling will keep working but also no strong reason to believe it won't. I think assuming that 5-6 more OOMs get us to AGI is not unreasonable. [--]. That said. 5-6 more OOMs is a lot. This is probably the"
X Link 2024-06-06T23:06Z [----] followers, [----] engagements
"How can Apple outperform Phi-3 with a much smaller model Easyby fine-tuning an adapter for each task This is the future. You won't ship prompts you'll ship adapters. Far better perf. @OpenPipeAI makes this as easy as prompting"
X Link 2024-06-11T14:20Z 11.3K followers, [----] engagements
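The adapter-shipping workflow described in the post above can be illustrated with a minimal sketch using Hugging Face's peft library; the base model id and adapter repo name below are placeholders, and this is just one common way to load a task-specific LoRA, not Apple's or OpenPipe's actual stack.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.1-8B-Instruct"   # example base model
adapter_id = "your-org/summarizer-lora"        # hypothetical task-specific adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Swap task-specific behavior by loading a small LoRA adapter on top of the
# shared base weights, instead of shipping a new prompt (or a whole new model).
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("Summarize: LoRA adapters are small.", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```

Because an adapter is typically a few megabytes, switching tasks means switching adapters while the base weights stay resident.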
"@JoshPurtell @OpenPipeAI Just use your task and try it out You can integrate OpenPipe and have a task-specific fine-tuned model to play around with in less than an hour of engineering time"
X Link 2024-06-12T14:37Z 10.6K followers, [--] engagements
"The modal use case for the largest most capable and most expensive models will be distilling training data for the smaller (but still very capable) models which will be what people actually use to get work done .@nvidia just released their own open-source model Nemotron-4 340B a family of open models that developers can use to generate synthetic data for training large language models (LLMs) for commercial applications across healthcare finance manufacturing retail and every .@nvidia just released their own open-source model Nemotron-4 340B a family of open models that developers can use to"
X Link 2024-06-14T21:07Z [----] followers, [----] engagements
"@nickwalton00 STAY TUNED"
X Link 2024-06-14T23:15Z 10.4K followers, [--] engagements
"@mgoin_ @jeremyphoward @neuralmagic What is the perf impact If I have a base model in fp8 and deploy a bf16 LoRA on top does it need to dynamically convert the weights one way or the other so they're compatible data types"
X Link 2024-06-18T22:35Z [----] followers, [--] engagements
"The MoA architecture is simple: generate [--] initial GPT-4 completions have GPT-4 reflect on them and then have GPT-4 produce a final output based on its deliberations"
X Link 2024-06-20T15:35Z [----] followers, 13.8K engagements
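A minimal sketch of the generate-reflect-synthesize loop described in the post above, using the openai Python client. The model name and the number of initial drafts are illustrative assumptions (the post's exact count is elided), and the prompts are simplified.

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4-turbo"  # stand-in model name

def ask(messages):
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content

def mixture_of_agents(prompt: str, n_initial: int = 3) -> str:
    # 1) Generate several independent first-pass completions.
    drafts = [ask([{"role": "user", "content": prompt}]) for _ in range(n_initial)]
    numbered = "\n\n".join(f"Draft {i + 1}:\n{d}" for i, d in enumerate(drafts))
    # 2) Have the model reflect on the drafts.
    critique = ask([{"role": "user", "content": f"Task: {prompt}\n\n{numbered}\n\nCompare the drafts and list their strengths and weaknesses."}])
    # 3) Produce a final answer informed by the drafts and the critique.
    return ask([{"role": "user", "content": f"Task: {prompt}\n\nDrafts:\n{numbered}\n\nCritique:\n{critique}\n\nWrite the single best final answer."}])

print(mixture_of_agents("Explain gradient accumulation in two sentences."))
```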
"@Yossi_Dahan_ That comparison is specifically an OpenPipe-served Llama [--] 8B vs GPT-4 Turbo"
X Link 2024-06-22T00:21Z 11.7K followers, [---] engagements
"@morgymcg @altryne @james_y_zou @thursdai_pod Our AlpacaEval results are actually with MoA on GPT-4 Turbo the fine tuning was separate. We were able to achieve [----] (also note we didn't overfit on the benchmark; just ran it at the end after the flow was worked out) https://tatsu-lab.github.io/alpaca_eval/ https://tatsu-lab.github.io/alpaca_eval/"
X Link 2024-06-22T22:04Z [----] followers, [--] engagements
"@mattzcarey @mattshumer_ @togethercompute We're actually serving it as a prepackaged endpoint https://openpipe.ai/blog/mixture-of-agents https://openpipe.ai/blog/mixture-of-agents"
X Link 2024-06-24T19:02Z 12.1K followers, [--] engagements
"Interesting mini-result that you need to know if you're trying to create the highest-quality fine-tuned LoRAs: you should ignore the loss on input tokens when training the model and only train your model on the completion token loss. easiest way to accomplish this: use @winglian's @axolotl_ai and set train_on_inputs: false. By doing this you're allowing the model to concentrate entirely on learning to produce the output at the cost of not learning to produce the input. Most frameworks do not support this Eg. the Huggingface trainer doesn't let you combine this training strategy with sample"
X Link 2024-06-27T14:37Z 11.3K followers, 11.5K engagements
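For readers outside axolotl, here is a rough sketch of what ignoring input-token loss amounts to in plain Hugging Face terms: prompt positions get the ignore index (-100) so only completion tokens contribute to the cross-entropy. The tokenizer name is a placeholder; this illustrates the idea rather than reproducing axolotl's or any specific trainer's implementation.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")  # placeholder base model

def build_example(prompt: str, completion: str):
    """Mask prompt tokens so loss is computed on completion tokens only
    (the same effect as axolotl's `train_on_inputs: false`)."""
    prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids
    completion_ids = tokenizer(completion, add_special_tokens=False).input_ids
    input_ids = prompt_ids + completion_ids + [tokenizer.eos_token_id]
    labels = [-100] * len(prompt_ids) + completion_ids + [tokenizer.eos_token_id]
    return {"input_ids": input_ids, "labels": labels}

ex = build_example("Translate to French: cheese", " fromage")
```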
"Fantastic blog series from @strickvl who found his fine-tuned @OpenPipeAI model outperformed GPT-4 (as well as other fine-tuned models) on response quality for a tiny fraction of the cost. Love it when our users share their success. π€© https://mlops.systems/posts/2024-07-01-full-finetuned-model-evaluation.html https://mlops.systems/posts/2024-07-01-full-finetuned-model-evaluation.html"
X Link 2024-07-01T14:12Z [----] followers, [----] engagements
"Is anyone rebuilding Turbotax for the AI era I feel like the ideal UX is to just give it last year's tax return an unorganized pile of every vaguely tax-related document I've been mailed and then a conversational interface where I can brain-dump random thoughts and it can tell me what I missed. Easily achievable with today's tech"
X Link 2024-07-06T20:03Z [----] followers, [----] engagements
"Super excited that fine-tuning Claude is here BUT major caveat: unlike OpenAI and most open-source fine-tuning platforms to deploy a Claude fine-tune you have to pay by the GPU hour not by the token. Real bummer if you're just trying things out Why no S-LoRA You can now fine-tune Claude [--] Haikuour fastest and most cost-effective modelin Amazon Bedrock. https://t.co/VUkiKs6daA You can now fine-tune Claude [--] Haikuour fastest and most cost-effective modelin Amazon Bedrock. https://t.co/VUkiKs6daA"
X Link 2024-07-11T05:22Z [----] followers, [----] engagements
"If you ever felt the need for an Extremely Overpowered fine-tuned model. we now support training GPT-4o in @OpenPipeAI. Please use responsibly. π"
X Link 2024-07-16T01:10Z 19K followers, 29.5K engagements
"@aidan_mclau @OpenPipeAI you need beta access but if you contact your openai rep it should be pretty easy to get. π"
X Link 2024-07-16T01:17Z [----] followers, [----] engagements
"@ankrgyl @HamelHusain all [--] tasks are prompts that our customers have deployed at scale (first with gpt-4 or gpt-4 turbo and later with openpipe). I'd say they're "medium-engineered"good enough to get reasonable results and deploy at scale at least"
X Link 2024-07-19T03:58Z 11.3K followers, [----] engagements
"@yar_vol @OpenPipeAI Yes we'll def support serverless training and inference just like we do with every model on OpenPipe"
X Link 2024-07-22T19:42Z 11.7K followers, [---] engagements
"A lot of folks don't realize that @OpenPipeAI supports fine-tuning all OpenAI models at cost. That means you can now fine-tune GPT-4o mini through OpenPipe for free with OpenAI's subsidy π"
X Link 2024-07-24T01:51Z 11.3K followers, 12.8K engagements
"@HamelHusain Typescript is the nicer language but I don't see the ML ecosystem migrating in the short term and in the long term what matters is what's more ergonomic for the AI not humans. Their idea of ergonomic might be very different than ours. π"
X Link 2024-07-27T06:51Z [----] followers, [---] engagements
"How did Apple make a 3B parameter model work extremely well Task-specific fine-tuned adapters. In the next few years the vast majority of LLM inference will happen on task-specific adapters not on prompted base models. Build with that in mind. Apple spilled the beans on Apple Intelligence Foundation Models (notes below): Architecture: Dense - decoder only transformer architecture RMSNorm & Query/ Key normalization GQA (w/ [--] KV heads) SwiGLU activation & RoPE (base_freq=500K for long context) Pre-training & https://t.co/7j5QVeTRXe Apple spilled the beans on Apple Intelligence Foundation Models"
X Link 2024-07-30T21:45Z 12K followers, 15.4K engagements
"Gemma 2B is π₯ Been waiting for more competition in the SLM model sizes. This size class is where most future inference will happen leaning on small fine-tuned adapters that give it equivalent task-specific perf to models 100x the size. My analysis+updates for Gemma-2 2b: [--]. 2T tokens distilled from an unnamed model [--]. Flash Attention has softcapping support O(N) memory instead of O(N2) for bf16 [--]. Reminder - edit head_dim to [---] from [---] [--]. Distillation +7% acc in ablations [--]. Scope + Shield Long form: https://t.co/LPrK1La4Ws My analysis+updates for Gemma-2 2b: [--]. 2T tokens distilled from an"
X Link 2024-07-31T19:06Z 11.7K followers, [----] engagements
"I will be taking a sabbatical from OpenPipe until tomorrow at 7:30am"
X Link 2024-08-06T06:27Z 10.7K followers, [----] engagements
"Wait did OpenAI just drop the price of GPT-4o (the real one not mini) by 50% without even announcing it Like there is nothing on their blog/@OpenAIDevs just a new pricing page with $2.50/1M tokens listed. Bullish for OpenAI. π€―"
X Link 2024-08-06T19:50Z 19K followers, 53.6K engagements
"unpopular opinion: openai tool call syntax has no reason to exist as an API-level construct and should instead have been implemented by asking the model to output structured JSON with tool name and arguments. So much unnecessary complexity downstream from the way they did that"
X Link 2024-08-07T00:56Z [----] followers, [----] engagements
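A hedged sketch of the alternative being argued for in the post above: instead of the tool-call API surface, ask the model for structured JSON (via JSON mode) naming a tool and its arguments, then dispatch locally. The get_weather tool, system prompt, and model name are made up for illustration.

```python
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    """Hypothetical local 'tool' the model can request."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

SYSTEM = (
    "You may call tools by replying with JSON only, shaped like "
    '{"tool": "<name>", "arguments": {...}}. Available tool: get_weather(city). '
    'If no tool is needed, reply with {"tool": null, "answer": "..."}.'
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},  # JSON mode
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "What's the weather in Tokyo?"},
    ],
)

call = json.loads(resp.choices[0].message.content)
if call.get("tool"):
    print(TOOLS[call["tool"]](**call["arguments"]))
else:
    print(call.get("answer"))
```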
"@andrew_n_carr literally our main thing at openpipe. makes models so much better"
X Link 2024-08-07T06:54Z 10.7K followers, [--] engagements
"- Google drops Gemini [---] price to $0.075 / 1M input tokens - Charges the same for fine-tuned and base model inference. Gemini [---] Flash is super high quality and now the cheapest fine-tunable model available anywhere by a huge margin. Good news for @GoogleAI developers: - Gemini [---] Flash price is now 70% lower ($0.075 / 1M) - Gemini [---] Flash tuning available to all - Added support for 100+ new languages in the API - AI Studio is available to all workspace customers - Much more : ) https://t.co/LmN4KiDBWq Good news for @GoogleAI developers: - Gemini [---] Flash price is now 70% lower ($0.075"
X Link 2024-08-09T15:07Z 19K followers, 29.6K engagements
"So weve got [--] labs with models at the GPT-4 level and nobody has meaningfully surpassed it. OpenAI had to scrap/restart their GPT-5 training project bc this barrier has been harder to break than expected. For the next leap a lot of labs seem to be betting on something new. Lotta smart folks pivoting to reinforcement learning (the real thing not the RLHF weak-sauce version). You gotta let your model make contact with the real world test its ideas get bruised. The hard part here is making a good general-purpose reward model that isnt insanely sparse but maybe not impossible"
X Link 2024-08-16T22:01Z 19K followers, 111.1K engagements
"Now that everyone has figured out how to effectively train/use models with 128K+ context lengths can we please kill tokenization once and for all and start training on bytes directly at least for text Tokenization issues are the source of so many headaches. π"
X Link 2024-08-21T17:51Z 10.4K followers, 15.4K engagements
"@goodside Ok fine I will also accept training on bits"
X Link 2024-08-21T19:28Z 10.4K followers, [---] engagements
"@winglian @HamelHusain They position it as helping with multi-GPU training. Are the improvements relevant to single-GPU training jobs as well"
X Link 2024-08-24T15:43Z 10.4K followers, [---] engagements
"@asankhaya @OpenAI Love this. π Mind if we try fine-tuning some open source models on the same dataset as well I'm really curious to see how they'd compare"
X Link 2024-08-27T22:43Z 10.4K followers, [---] engagements
"If you're looking for an 8B prompted model I highly recommend checking out Llama [---] Storm 8B. Outperforms Llama [---] Instruct and Hermes [--] 8B on our evals and the improvement on vibes is even larger. @akjindal53244 and the team are legit. π -.-- has arrived A new 8B parameter LLM that outperforms @Meta -.-- and ---.- across diverse benchmarks Our new 8B LLM pushes the boundaries of what's possible with smaller language https://t.co/tCBdLfoGCu π -.-- has arrived A new 8B parameter LLM that outperforms @Meta -.-- and ---.- across diverse benchmarks Our new 8B LLM pushes the boundaries of"
X Link 2024-08-28T20:31Z 10.4K followers, [----] engagements
"@aidan_mclau I think there are potential advantages. Like OpenAI fo sho is using prefix caching for ChatGPT to control costs but they don't expose it via the API so competitors who want to build a chatbot pay more"
X Link 2024-08-31T01:23Z 10.4K followers, [----] engagements
"People think OpenAI charging $2K a month for ChatGPT subscriptions would be crazy but I would pay $2K a month for a Cursor+Claude subscription if no cheaper alternatives were available. It isn't unreasonable if their new models are way better"
X Link 2024-09-05T17:59Z 10.4K followers, [----] engagements
"I am working with @mattshumer_ to get to the bottom of what happened with Reflection. He is providing access to all serving code and weights with the goal of replicating the strong reasoning performance @ArtificialAnlys was able to see over the weekend. I will report back"
X Link 2024-09-09T18:21Z 19K followers, 133K engagements
"Has anyone had good results using KTO to replace an SFT+DPO pipeline Seems promising but in practice we're seeing slightly worse results (relative to SFT+DPO) across most benchmarks"
X Link 2024-09-10T00:25Z 10.4K followers, [----] engagements
"@mattshumer_ @ArtificialAnlys Have a lot of great folks who spent all day trying to reproduce this or if it doesn't reproduce understand as much as possible what went wrong. So far no success"
X Link 2024-09-10T06:30Z 10.4K followers, [----] engagements
"@mattshumer_ @ArtificialAnlys Final report on Reflection-70B: after investigating I do not believe the a model that achieved the claimed benchmarks ever existed. It' very unclear to me where those numbers came from and I hope that Sahil/Matt will shed more light how this happened"
X Link 2024-09-10T22:29Z 19K followers, 22.6K engagements
"@ethayarajh How do I set lambda_d Not seeing anything by that name in KTOConfig https://huggingface.co/docs/trl/main/en/kto_trainer#using-the-ktotrainer https://huggingface.co/docs/trl/main/en/kto_trainer#using-the-ktotrainer"
X Link 2024-09-10T23:35Z 10.4K followers, [--] engagements
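For context on the lambda_d question above, TRL's KTO trainer appears to expose the paper's desirable/undesirable weights as desirable_weight and undesirable_weight on KTOConfig; the sketch below shows roughly where they would be set. The model name and dataset are toy placeholders, and exact argument names (e.g. processing_class vs tokenizer) vary across TRL versions, so treat this as a sketch rather than a verified recipe.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

# Tiny illustrative dataset in KTO's prompt/completion/label format.
train = Dataset.from_list([
    {"prompt": "Summarize: the cat sat on the mat.", "completion": "A cat sat on a mat.", "label": True},
    {"prompt": "Summarize: the cat sat on the mat.", "completion": "Dogs are great.", "label": False},
])

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder small model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

args = KTOConfig(
    output_dir="kto-out",
    beta=0.1,
    # These appear to correspond to the paper's lambda_D / lambda_U weights.
    desirable_weight=1.0,
    undesirable_weight=1.0,
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

# Older TRL versions take `tokenizer=` instead of `processing_class=`.
trainer = KTOTrainer(model=model, args=args, train_dataset=train, processing_class=tokenizer)
trainer.train()
```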
"@kadenbilyeu0 @CheatLayer @mattshumer_ @ArtificialAnlys Yep I offered to train a 405B version of their model using our OpenPipe platform under the assumption that there was something really special in the dataset they had put together. However they decided to release before I actually did any work on the project"
X Link 2024-09-11T01:07Z 10.5K followers, [---] engagements
"This is strong evidence that Llama [---] with multimodal support is close to dropping π Mistral released Pixtral 12B Vision Language Model π₯ Some notes on the release: [--]. Text backbone: Mistral Nemo 12B [--]. Vision Adapter: 400M [--]. Uses GeLU (for vision adapter) & 2D RoPE (for vision encoder) [--]. Larger vocabulary - [------] [--]. Three new special tokens - img https://t.co/7f21NAqvsV Mistral released Pixtral 12B Vision Language Model π₯ Some notes on the release: [--]. Text backbone: Mistral Nemo 12B [--]. Vision Adapter: 400M [--]. Uses GeLU (for vision adapter) & 2D RoPE (for vision encoder) [--]. Larger"
X Link 2024-09-11T12:47Z 10.4K followers, [----] engagements
"We're adding o1-preview and o1-mini-preview to @OpenPipeAI as relabeling models to improve your dataset right now If you fine-tune a smol model on o1 outputs you'll be able to pick up much of the improved quality without paying any more for inference than before. π"
X Link 2024-09-12T17:18Z 10.4K followers, [---] engagements
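A minimal sketch of the relabel-then-distill loop described in the post above: run existing prompts through o1-preview and save the outputs in the standard chat-format JSONL used for fine-tuning a smaller model. The prompts and file name are illustrative; this is the general shape, not OpenPipe's pipeline.

```python
import json
from openai import OpenAI

client = OpenAI()

prompts = [
    "Classify the sentiment of: 'the battery life is disappointing'",
    "Classify the sentiment of: 'setup took two minutes, love it'",
]

# Relabel existing inputs with the stronger model, then write them out in the
# chat-format JSONL commonly used for supervised fine-tuning.
with open("distill.jsonl", "w") as f:
    for p in prompts:
        resp = client.chat.completions.create(
            model="o1-preview",
            messages=[{"role": "user", "content": p}],
        )
        answer = resp.choices[0].message.content
        f.write(json.dumps({"messages": [
            {"role": "user", "content": p},
            {"role": "assistant", "content": answer},
        ]}) + "\n")
```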
"@aidan_mclau https://openpipe.ai/blog/mixture-of-agents https://openpipe.ai/blog/mixture-of-agents"
X Link 2024-09-12T18:45Z 10.7K followers, [---] engagements
"Founder friends: do you promote a writing culture for internal comms at your company Have you found it to be an unlock For context OpenPipe is a [--] eng in-person team and internal comms are mostly oral. Works for us but wondering if we're missing something"
X Link 2024-09-17T12:57Z 11.3K followers, [---] engagements
"I will consider it a great failure if OpenPipe ever shows up on a list of "fastest growing startups by headcount." One of the many places where less is more (see also: lines of code). I put together a new list of [--] of the fastest-growing startups (based on recent hiring rates) backed by top-tier funds. All have [--] employees and are hiring: https://t.co/SOf0wC41sP I put together a new list of [--] of the fastest-growing startups (based on recent hiring rates) backed by top-tier funds. All have [--] employees and are hiring: https://t.co/SOf0wC41sP"
X Link 2024-09-17T20:31Z 10.7K followers, [----] engagements
"Direct Criteria Steering is a new way to steer LLMs developed by the research team at @OpenPipeAI It improves compliance with arbitrary user-defined criteria by 60-90% vs prompting. Well release research and lots more info soon. (still cooking but too excited not to share π«’)"
X Link 2024-09-18T16:42Z 10.4K followers, [----] engagements
"@Sanket_goyallll Nah you definitely still want to fine-tune Gemini Flash to make sure it actually understands your task before throwing it at 38M posts. (Flash support in OpenPipe coming soon. π)"
X Link 2024-09-18T17:53Z 11.3K followers, [---] engagements
"OpenAI free fine tuning continues If you run it through @OpenPipeAI you can also compare your results to OSS models and Gemini Flash (coming soon). π"
X Link 2024-09-25T00:05Z 10.4K followers, [---] engagements
"Llama [---] fine-tuning notes/initial results. This is for the new text models nothing to share on multimodal just yet - 1B/3B models are very self-hostable You can easily run these on CPU in your own infra alongside the rest of your code. - 3B outperforms GPT-4o post-fine-tuning. (1B is slightly worse) - Vs Quen2.5: perf is very similar at comparable parameter counts. Qwen2.5 is maybe slightly better but I expect the ecosystem around Llama [---] to be stronger so will probably standardize on that"
X Link 2024-09-25T22:39Z 10.4K followers, [----] engagements
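As a rough illustration of how self-hostable the 1B model is, a plain transformers script runs it on CPU with no extra serving stack; the model id below assumes the gated meta-llama/Llama-3.2-1B-Instruct repo and is shown only as an example, not a performance recommendation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # gated repo; assumes access is granted

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # stays on CPU with no accelerator

messages = [{"role": "user", "content": "Give me one sentence about LoRA."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
out = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```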
"@deliprao Yes that's our thing Check out @OpenPipeAI https://openpipe.ai/ https://openpipe.ai/"
X Link 2024-09-27T14:06Z 15.4K followers, [----] engagements
"Recently overheard a Groq employee: apparently their per-token costs are 1-2 orders of magnitude higher than what they charge and the new chip won't materially help. There's no credible plan to fix this. This is why they aren't raising rate limits. Very bearish"
X Link 2024-09-27T15:32Z 19K followers, 273.7K engagements
"@kitledru @yangcullinan @tenstorrent Yep much better architectures for transformers are definitely possible. I don't know much about tenstorrent specifically"
X Link 2024-09-27T16:32Z 10.4K followers, [--] engagements
"@j0hnparkhill probably not but some are further away than others"
X Link 2024-09-27T17:01Z 10.4K followers, [----] engagements
"OpenAI To Grant 7% to Mr. Altman; Becomes Latest Stage Company Ever to Accept the YC Standard Deal"
X Link 2024-09-28T06:40Z 10.4K followers, [----] engagements
"@simonw We support this for all open source models fine tuned on OpenPipe"
X Link 2024-10-01T21:55Z 10.5K followers, [---] engagements
"Extremely disappointed to learn that we are not on the list of companies OpenAI has blacklisted to their investors. There are many other less prestigious blacklists out there so of course we're evaluating our options but OpenAI's is still the primary goal"
X Link 2024-10-03T19:49Z 19K followers, 26.1K engagements
"@ankrgyl @cramforce This is 100% solved by the typescript-eslint/no-floating-promises rule what am I missing"
X Link 2024-10-04T16:16Z 10.4K followers, [---] engagements
"@emollick If you believe there's a 75% chance that AGI is [--] years away and a 25% chance we hit a ceiling before that. It still might make sense to invest for the 25% case iff you don't believe anything you do now can affect your standing in the post-AGI world anyway"
X Link 2024-10-08T21:22Z 10.4K followers, [---] engagements
"@pvncher yeah google really cooked with flash 8b"
X Link 2024-10-09T23:21Z 10.4K followers, [---] engagements
"There is space for much better recommendation algorithms that (1) use far larger models than anyone in the space is using right now that are (2) fine-tuned per-user on the fly. Either existing platforms will adopt this or new ones will come in on top. Excellent startup opp. The YouTube video I want to watch is any highly rated 1hr long information dense lecture on anything esoteric and the algorithm just doesnt get it. Its too content-driven and too narrow-minded The YouTube video I want to watch is any highly rated 1hr long information dense lecture on anything esoteric and the algorithm"
X Link 2024-10-10T18:51Z [----] followers, [----] engagements
"I wouldn't say that I love it when the $150B incumbent announces a product that's almost identical to our product from [--] months ago but I wouldn't say I'm particularly worried by it either. We just keep competing (and keep winning) you have to compete with labs head-on perplexity is better than searchgpt; cursor is better than canvas; openpipe is better than distillation; pinecone is better than assistants; and so on victory is possible dont listen to them just beat them you have to compete with labs head-on perplexity is better than searchgpt; cursor is better than canvas; openpipe is"
X Link 2024-10-13T23:03Z 11.3K followers, [----] engagements
"Nice work @danielhanchen and the Unsloth team Fix is already live in production on OpenPipe. π Today were releasing a new method that improves the way everyone trains LLMs. There's a significant bug that causes loss miscalculations during training. Our Gradient Accumulation fix corrects the issue reducing L2 norm error by 10x. Blog details: https://t.co/GwOGPs62Yu https://t.co/BwCiBGTfmh Today were releasing a new method that improves the way everyone trains LLMs. There's a significant bug that causes loss miscalculations during training. Our Gradient Accumulation fix corrects the issue"
X Link 2024-10-15T17:26Z 11.3K followers, [----] engagements
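For readers wondering what the loss miscalculation looks like in practice, here is a hedged sketch of the underlying issue: averaging each micro-batch's mean loss over-weights short sequences, whereas summing token losses and dividing by the total token count keeps accumulation equivalent to one large batch. This illustrates the general fix, not Unsloth's or OpenPipe's actual patch.

```python
import torch.nn.functional as F

def accumulation_step(model, optimizer, micro_batches):
    """One optimizer step over several micro-batches, normalizing by the total
    number of target tokens rather than averaging per-micro-batch means."""
    # Count non-masked target tokens across the whole accumulation window.
    total_tokens = sum(int((mb["labels"][:, 1:] != -100).sum()) for mb in micro_batches)
    optimizer.zero_grad()
    for mb in micro_batches:
        logits = model(input_ids=mb["input_ids"]).logits
        shift_logits = logits[:, :-1, :]      # position t predicts token t+1
        shift_labels = mb["labels"][:, 1:]
        loss_sum = F.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_labels.reshape(-1),
            ignore_index=-100,
            reduction="sum",
        )
        (loss_sum / total_tokens).backward()  # gradients add up across micro-batches
    optimizer.step()
```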
"@Teknium1 @huggingface @UnslothAI Fix is deployed in OpenPipe"
X Link 2024-10-15T19:14Z 11.3K followers, [---] engagements
"Spoke to a space startup. They're betting in the post-Starship world launching a datacenter into a sun-synchronous orbit (reliable [--] hour solar) will be cheaper than building the same datacenter on earth because of the permitting+battery costs. Feels unlikely but intriguing"
X Link 2024-10-15T21:08Z 19K followers, 33.9K engagements
"@paultoo Yeah I asked about that. The mass required for radiant cooling is always proportional to the mass of the solar array (since you just have to radiate out the waste heat from the electricity you collect) and with current tech is 1/3 the mass of the solar. Good enough"
X Link 2024-10-15T23:27Z [----] followers, [----] engagements
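A back-of-envelope version of that scaling argument (illustrative assumptions, not figures from the conversation): the radiator must reject roughly the power the array collects, so for fixed areal densities radiator mass tracks solar-array mass.

```latex
% A solar array of area A_s with efficiency \eta collects P \approx \eta S A_s,
% essentially all of which must eventually be radiated away as waste heat:
\[
P_{\text{waste}} \approx \eta S A_s, \qquad
P_{\text{rad}} = \varepsilon \sigma T^4 A_r
\;\Rightarrow\;
A_r \approx \frac{\eta S}{\varepsilon \sigma T^4}\, A_s .
\]
% With fixed areal densities for panels and radiators, radiator mass is a
% fixed multiple of solar-array mass, which is the proportionality claimed.
```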
"@catheryn_li @Zach_Kamran "that's my quant. my quantitative" π Congrats Cat and Zach"
X Link 2024-10-17T19:34Z [----] followers, [--] engagements
"Why are all foundation model companies so good at making models yet so bad at naming things What's the next version going to be "Claude [---] (new_new)" Can we all just opt for sanity and call this "Claude 3.6" π I'm excited to share what we've been working on lately at Anthropic. - Computer use API - New Claude [---] Sonnet - Claude [---] Haiku Let's walk through everything: https://t.co/rpwHU6um4H I'm excited to share what we've been working on lately at Anthropic. - Computer use API - New Claude [---] Sonnet - Claude [---] Haiku Let's walk through everything: https://t.co/rpwHU6um4H"
X Link 2024-10-22T16:36Z 19K followers, 13.2K engagements
"Just launched agent.exe a free open-source Mac/Windows/Linux app that lets you use Claude [---] Sonnet to control your computer This was a fun little project to explore the API and see what the model can do. Computer use is really coolI expect [----] will be the year of agents"
X Link 2024-10-23T16:24Z 19K followers, 643.8K engagements
"Here's agent.exe booking travel on Google Flights. βClaude [---] definitely isn't perfectnote that it confidently chooses the wrong dates"
X Link 2024-10-23T16:30Z 19K followers, 43.5K engagements
"All the code as well as a (still minimal) README for running the app is available here with an open source Apache [--] license. This is definitely still research-project-quality but would love to see more development happening on top https://github.com/corbt/agent.exe https://github.com/corbt/agent.exe"
X Link 2024-10-23T16:33Z 19K followers, 26.5K engagements
"As a side note the new Claude [---] is incredible for coding as well. This is my first Electron app and Claude +Cursor could consistently build complex functionality across multiple files in a single shot. First time I've felt more like a manager than an engineer while coding"
X Link 2024-10-23T16:35Z 19K followers, 21.8K engagements
"@keremk I was going to implement a "semi auto" mode where you have to manually approve each action but in practice it's mega slow to do anything so you can just hit the "stop" button if it seems like it's turning evil. πΏ"
X Link 2024-10-23T18:32Z 16.6K followers, [----] engagements
"@teortaxesTex Anthropic has poached enough high-level OpenAI employees that they definitely know how to build an o1 if they want to. I doubt centralization would speed things up fwiw. Google was the only player in town for about a decade and moved extremely slowly for reasons"
X Link 2024-10-24T01:19Z 16.6K followers, [---] engagements
"https://openpipe.ai/blog/hacker-news-rlhf-part-1 https://openpipe.ai/blog/hacker-news-rlhf-part-1"
X Link 2024-10-29T18:59Z 16.6K followers, [----] engagements
"If your application has human feedback (regenerations user choices etc.) please DM me and Id love to chat about how we can use RLHF to improve your response quality significantly with the minimum marginal effort"
X Link 2024-10-29T18:59Z 11.3K followers, [----] engagements
"@paulg This is a convincing argument to not vote for Trump (and one that I agree with). However it's not a convincing argument to vote for Harris. If you live in one of the [--] states that will definitely not decide the presidential election why not just vote third-party"
X Link 2024-10-29T19:36Z 16.6K followers, [---] engagements
"@consolelogwill Ideally you decouple the "data" part from the "instruction" part of your prompt and just use the "data" part in your evals. Working on supporting this first-party in OpenPipe"
X Link 2024-11-01T00:30Z 11.3K followers, [--] engagements
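A minimal sketch of the decoupling described in the post above, assuming a chat-style message format; the instruction text, ticket examples, and helper names are hypothetical:

```python
# Hypothetical sketch: keep the "instruction" part of a prompt fixed and
# parameterize only the "data" part, so eval sets store just the data payloads.
INSTRUCTION = (
    "Summarize the support ticket below in one sentence, "
    "then label its urgency as low, medium, or high."
)

def build_messages(ticket_text: str) -> list[dict]:
    """Combine the fixed instruction with a single data record."""
    return [
        {"role": "system", "content": INSTRUCTION},
        {"role": "user", "content": ticket_text},
    ]

# An eval set is then just data payloads plus expected labels, independent of
# how the instruction happens to be worded today.
eval_cases = [
    {"data": "Checkout page returns a 500 error for all users.", "expected_urgency": "high"},
    {"data": "Typo on the pricing page footer.", "expected_urgency": "low"},
]

for case in eval_cases:
    messages = build_messages(case["data"])
    # send `messages` to whichever model/version is being evaluated, then
    # compare its answer against case["expected_urgency"]
```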
"If I want to run Llama [---] 1B locally on CPU what is the absolutely fastest inference stack available Doesn't need batching but needs to use every trick in the book to minimize latency"
X Link 2024-11-06T18:57Z 10.4K followers, [----] engagements
"@OfficialLoganK I just went to try AI Studio again I clicked "Create API key" and got this error with no obvious next step to resolve. Any ideas"
X Link 2024-11-07T15:37Z 10.4K followers, [---] engagements
"Qwen [---] Coder 32B is a π β
Benchmarks at or above GPT-4 and Claude [---] β
Subjectively feels fantastic for code (been trying it) β
Fine-tunable on your own data on OpenPipe"
X Link 2024-11-13T23:16Z 11.7K followers, [----] engagements
"@stride_zone is another OpenPipe user Noticing a theme AI/Crypto companies seem to be growing quickly. π Introducing @echosdotfun a new app by Stride contributors π Echos are AI agents with access to a crypto wallet and X account. Users can launch an Echo deploy memecoins send them to Echos and trade them as the Echo posts on X. Check it out π https://t.co/FpW2MFKVUA Echos Introducing @echosdotfun a new app by Stride contributors π Echos are AI agents with access to a crypto wallet and X account. Users can launch an Echo deploy memecoins send them to Echos and trade them as the Echo posts"
X Link 2024-11-14T18:58Z 11.7K followers, [--] engagements
"@mattdsegal @vikhyatk Slightly more info in our docs https://docs.openpipe.ai/features/criteria/api#runtime-evaluation https://docs.openpipe.ai/features/criteria/api#runtime-evaluation"
X Link 2024-11-16T21:14Z 16.6K followers, [---] engagements
"This may become an official Qwen-stan account. β
Open source SOTA on code β
Open source SOTA in general for 14B+ β
Almost SOTA 14B β
Works great for LM RM and classification tasks β
SOTA open source multimodal"
X Link 2024-11-19T17:22Z 19K followers, 16.8K engagements
"What is the current SOTA on language autoencoders Can you run lossy compression on a 20K-word Wikipedia article to give you an archive that's just a few KB in size but decompresses into text semantically indistinguishable from the original"
X Link 2024-11-22T22:43Z 10.4K followers, [----] engagements
"@bnjmn_marie Is this compatible with efficient serving libraries like vLLM If so how does it impact throughput/latency"
X Link 2024-11-26T19:54Z 10.4K followers, [--] engagements
"Ok I am terrible at sharing product updates here but we now support Llama [---] 1B and 3B (the best small LLMs) as well as Qwen [---] 72B and 32B Coder (the best open general and code-specific models) on OpenPipe"
X Link 2024-12-04T00:35Z 11.8K followers, [----] engagements
"One of the new features I'm most excited about at OpenPipe is "criteria distillation". This allows you to distill an expensive LLM-as-judge criteria into a super fast cheap low-latency reward model that approximates the LLM-as-judge's outputs. DM for access"
X Link 2024-12-04T18:43Z 11.7K followers, [----] engagements
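An illustrative outline of criteria distillation under stated assumptions: the judge verdicts are presumed to have been collected offline by running the LLM-as-judge over historical outputs, and a TF-IDF + logistic-regression stand-in replaces the small reward model purely to keep the sketch self-contained. This is not OpenPipe's actual implementation:

```python
# Hedged sketch of "criteria distillation": approximate an expensive
# LLM-as-judge with a small, fast model trained on the judge's own verdicts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# (output_text, judge_verdict) pairs; 1 = judge said the output met the criterion
judge_labeled = [
    ("The refund was processed and the customer was notified.", 1),
    ("I don't know, ask someone else.", 0),
    ("Order #123 has been cancelled; a confirmation email is on its way.", 1),
    ("asdf lorem ipsum", 0),
]
texts, labels = zip(*judge_labeled)

# Tiny stand-in reward model that predicts the judge's verdict directly, so
# scoring a new output is milliseconds instead of an LLM call.
reward_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
reward_model.fit(texts, labels)

score = reward_model.predict_proba(
    ["Your refund is confirmed and will arrive in 3-5 days."]
)[0][1]
print(f"approximate judge score: {score:.2f}")
```

In practice the distilled model would typically be a small fine-tuned LM rather than a bag-of-words classifier; the shape of the pipeline is the point.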
"@_xjdr @jxmnop Has anyone on the team run public benchmarks and shared the results If not why not"
X Link 2024-12-04T21:51Z 10.5K followers, [----] engagements
"------fine-tuning platform to kill openpipe actually doesn't end up killing any startups I don't actually think OpenAI's goal is to kill OpenPipe here but if it is they're doing a terrible job. π anthropic: has the single best model by wide margin clean chat interface; no bells and whistles claude is now berkeley's most eligible bachelor openai: has no leading model o1 too expensive; gpt-4o sucks puts tons of effort into killing startups ------fine-tuning platform anthropic: has the single best model by wide margin clean chat interface; no bells and whistles claude is now berkeley's most"
X Link 2024-12-04T22:00Z 11.8K followers, [----] engagements
"SUPER PUMPED to announce that Gemini fine-tuning is available to all OpenPipe users Gemini Flash provides the lowest cost fine-tuning of any model in its quality class. Comparable to gpt-4o-mini but 4x cheaper inference and FREE fine-tuning"
X Link 2024-12-05T16:38Z 11.8K followers, [----] engagements
"Meta just released Llama [---] 70Bthey claim benchmarks similar to Llama [--] 405B but in a model 20% the size. It's already available as a base model on OpenPipe and we'll release benchmarks as a fine-tuning base model soon. π«‘ As we continue to explore new post-training techniques today we're releasing Llama [---] a new open source model that delivers leading performance and quality across text-based use cases such as synthetic data generation at a fraction of the inference cost. https://t.co/BNoV2czGKL As we continue to explore new post-training techniques today we're releasing Llama [---] a new"
X Link 2024-12-06T18:59Z 11.7K followers, [----] engagements
"OpenAI's Reinforcement Fine-Tuning (RFT) is far more data efficient than SFTcan generalize from 10-20 labeled examples. Huge deal bc as compute costs drop to [--] the pain of gathering high-quality training data is the biggest barrier to deploying AI. RFT needs much less of it"
X Link 2024-12-06T21:45Z 19K followers, 68.1K engagements
"@NighttrekETH watch this space π"
X Link 2024-12-07T01:26Z 10.5K followers, [----] engagements
"Btw you can view your training loss across open source models AND Gemini models on OpenPipe"
X Link 2024-12-09T15:40Z 11.8K followers, [----] engagements
"Qwen [---] is trainable on OpenPipe Benchmarks of providers of Qwen2.5 a leading open-source model family π @alibaba_cloud's Qwen2.5 family of models includes Qwen2.5 72B Qwen2.5 Coder 32B and a range of smaller models including 1.5B and 0.5B models for edge use-cases. Qwen2.5 72B the flagship model is https://t.co/S8K8EZUKP5 Benchmarks of providers of Qwen2.5 a leading open-source model family π @alibaba_cloud's Qwen2.5 family of models includes Qwen2.5 72B Qwen2.5 Coder 32B and a range of smaller models including 1.5B and 0.5B models for edge use-cases. Qwen2.5 72B the flagship model is"
X Link 2024-12-09T23:07Z 11.7K followers, [----] engagements
"What is A100/H100 availability like on the big clouds these days Possible to get an on-demand instance on AWS/GCP/Azure"
X Link 2024-12-11T21:30Z 10.6K followers, [----] engagements
"We will out-ship the big labs because in this house we have BOTH an OpenAI and a Claude subscription. Sam Altman says when ChatGPT went down yesterday he had to work for [--] hours without it and he realized how reliant we are becoming on AI systems as a form of critical infrastructure https://t.co/IRnlb8AbCl Sam Altman says when ChatGPT went down yesterday he had to work for [--] hours without it and he realized how reliant we are becoming on AI systems as a form of critical infrastructure https://t.co/IRnlb8AbCl"
X Link 2024-12-13T06:45Z 10.6K followers, [----] engagements
"There is an opportunity for someone to build a fantastic ChatGPT clone that is provider-agnostic. Your launch window is this week: Gemini [--] Pro launches next week and will be SOTA but no one is in the habit of using Gemini's first-party web UI yet"
X Link 2024-12-13T20:30Z 19K followers, 30.1K engagements
"If you're building this you should def. include a side-by-side comparison view UI. Your marketing strategy is sharing screenshots of interesting outputs by new models contrasted with boring outputs (for the same prompt) for old models"
X Link 2024-12-13T21:34Z 11.8K followers, [----] engagements
"@simonw We did an automated analysis of the tasks our users use OpenPipe for using LLMs to categorize each customer task. Will write this up as a blog post at some point. Note this may or may not be representative of the wider ecosystem"
X Link 2024-12-16T20:47Z 19K followers, 18.8K engagements
"@aidan_mclau if o1+cursor agents can do this successfully I will merge"
X Link 2024-12-18T22:14Z 10.7K followers, [---] engagements
"Now we know why OpenAI sat on strawberry/o1 for so long before releasing. It turns out that once you've seen the trick reproducing the results isn't so hard. Breaking news from Chatbot Arenaβ‘π€ @GoogleDeepMind's Gemini-2.0-Flash-Thinking debuts as #1 across ALL categories The leap from Gemini-2.0-Flash: - Overall: #3 #1 - Overall (Style Control): #4 #1 - Math: #2 #1 - Creative Writing: #2 #1 - Hard Prompts: #1 #1 https://t.co/cq2MRMbWZ1 Breaking news from Chatbot Arenaβ‘π€ @GoogleDeepMind's Gemini-2.0-Flash-Thinking debuts as #1 across ALL categories The leap from Gemini-2.0-Flash: - Overall:"
X Link 2024-12-19T17:37Z 10.7K followers, 10.1K engagements
""This shows that it's still feasible to create unsaturated interesting benchmarks that are easy for humans yet impossible for AI . We will have AGI when creating such evals becomes outright impossible." This is a reasonable testable definition of AGI imo. So is this AGI While the new model is very impressive and represents a big milestone on the way towards AGI I don't believe this is AGI -- there's still a fair number of very easy ARC-AGI-1 tasks that o3 can't solve and we have early indications that ARC-AGI-2 will remain So is this AGI While the new model is very impressive and represents a"
X Link 2024-12-20T18:27Z 19K followers, 48.9K engagements
"@MSROakRidge @vikhyatk Source"
X Link 2024-12-20T19:01Z 10.7K followers, [--] engagements
"@aidan_mclau The OpenPipe criteria feature could be defined as a workflow to force you to write instructions that unambiguously explain the properties you're looking for in an output to an LLM-as-judge. (And yes we're working on using RL to optimize against them) https://docs.openpipe.ai/features/criteria/overview https://docs.openpipe.ai/features/criteria/overview"
X Link 2024-12-27T02:31Z 16.6K followers, [---] engagements
"@danfaggella Nah. We still play chess draw pictures and throw balls despite living in a world with machines that do all of those infinitely better. There will be a short-term adjustment and then people will keep doing things that make them feel happy and fulfilled"
X Link 2024-12-27T22:14Z 10.7K followers, [---] engagements
"@aaron_defazio cc @bnjmn_marie I believe you tested it out some time ago"
X Link 2024-12-27T22:23Z 11.2K followers, [---] engagements
"A few weeks ago OpenAI announced Reinforcement Fine-Tuning (RFT)a new way to adapt LLMs to complex tasks with very little training data. Heres a quick rundown of how it works why its a big deal and when you should use it. π§΅"
X Link 2024-12-30T22:52Z 19K followers, 266.9K engagements
"Sometimes RFT can be a stepping stone: [--] Label 50-100 examples by hand train an RFT model. [--] Use that RFT model to machine-label more data. [--] Then fine-tune a simpler faster LLM via SFT. Best of both worlds"
X Link 2024-12-30T22:56Z 11.2K followers, [----] engagements
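A rough sketch of that three-step pipeline; every function here is a hypothetical stub standing in for whichever training and inference stack you actually use, and only the overall shape follows the post:

```python
# Minimal sketch of the RFT -> SFT "stepping stone" pipeline described above.

def train_rft_model(seed_examples):
    """Step 1 (stub): RFT on a small hand-labeled set; returns a labeling fn."""
    return lambda prompt: "machine label for: " + prompt

def fine_tune_sft(model_name, dataset):
    """Step 3 (stub): ordinary SFT of a smaller, faster model on the larger set."""
    print(f"fine-tuning {model_name} on {len(dataset)} examples")

hand_labeled = [("prompt 1", "label 1"), ("prompt 2", "label 2")]  # ~50-100 in practice
rft_model = train_rft_model(hand_labeled)

# Step 2: use the RFT model to machine-label a much larger pool of prompts.
unlabeled_prompts = ["prompt 3", "prompt 4", "prompt 5"]
machine_labeled = [(p, rft_model(p)) for p in unlabeled_prompts]

fine_tune_sft("small-fast-model", hand_labeled + machine_labeled)
```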
"@girlbossintech Not confirmed but seems likely given how widely PPO is known to be used internally at OpenAI. They may have made tweaks to improve stability"
X Link 2024-12-31T00:39Z 11.2K followers, [----] engagements
"@chiki_champat Yep still closed beta will go GA in January. Unsure whether Azure will host or not; they have rights to the research but may or may not want to build out the product"
X Link 2024-12-31T06:14Z 11.2K followers, [---] engagements
"We are looking to hire a few more really strong engineers on either the ML or systems side. In just over a year we have built the world's best fine-tuning platform. We were the first to launch self-service preference tuning the first to integrate evals/data prep/fine-tuning into a cohesive experience and the first to build self-service learning from human feedback. We have the best customer list in the business from fast-growing startups like @WisprFlow to enormous enterprises. And we've done all this with a technical team of only [--] (including my co-founder and me). If you are a future"
X Link 2024-12-31T18:22Z 11.2K followers, 13.6K engagements
"@OfficialLoganK Thanks @OfficialLoganK has been great working with you I should add to the list that we're the ONLY fine-tuning provider that lets you train OpenAI/open-source/Gemini models through one consistent interface"
X Link 2024-12-31T18:43Z 11.2K followers, [---] engagements
"When I worked at @ycombinator I'd make a point of chatting with very successful founders and getting the "real" backstory not the polished PR one. There were always big mistakes self-doubt long periods lost in the wilderness. Success is only inevitable in retrospect. Marc Andreessen: Every innovator eventually starts to like the taste of their own blood Once something works the stories get retconned and adapted to say it was inevitable all along everybody always knew this was a good idea. The person has won all these awards and https://t.co/iPF63swp7A Marc Andreessen: Every innovator"
X Link 2025-01-10T20:42Z 11.2K followers, [----] engagements
"It's clear in retrospect why OpenAI sat on strawberry/o1 for so long without publishing. Now that everyone has seen the trick replications are coming fast. Qwen AllenAI academic work like PRIME and now Microsoft are all getting impressive results. Microsoft presents rStar-Math Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking On the MATH benchmark it improves Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4% surpassing o1-preview by +4.5% and +0.9%. On the USA Math Olympiad https://t.co/eQtDaZWe5z Microsoft presents rStar-Math Small LLMs Can Master"
X Link 2025-01-10T20:58Z 11.3K followers, [----] engagements
"Sharing an important lesson learned from working with hundreds of customers: theres a big difference in the right way to evaluate and fine-tune LLMs depending on whether your task has one right answer or many. RFT DPO RLHF evals all downstream of this π§΅"
X Link 2025-01-15T20:35Z 19K followers, 40.5K engagements
"On the other hand freeform tasks have infinitely many correct outputsthink: - Summaries - Email drafts - Chatbots Here correctness is more subjective. Theres no single right answer and that affects how we measure success"
X Link 2025-01-15T20:36Z 12K followers, [----] engagements
"To see how often each type appears in practice I analyzed [----] recent datasets on OpenPipe. 63% were freeform 37% were deterministic"
X Link 2025-01-15T20:36Z 12K followers, [----] engagements
"Ok lets talk about why this matters Key difference #1: Ideal temperature settings. Deterministic tasks usually need temperature=0 for consistent correct outputs. Freeform tasks can benefit from higher temperatures (0.71.0) to foster creativity and variety. (h/t @eugeneyan wrote about this briefly in https://eugeneyan.com/writing/prompting/#selecting-a-temperature https://eugeneyan.com/writing/prompting/#selecting-a-temperature"
X Link 2025-01-15T20:37Z 11.3K followers, [----] engagements
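A small illustration of the temperature guidance above, assuming the OpenAI Python SDK (v1-style client); the model name and prompts are placeholders:

```python
# Deterministic vs. freeform tasks differ mainly in the sampling temperature.
from openai import OpenAI

client = OpenAI()

# Deterministic task (e.g. structured extraction): temperature=0 for consistency.
extraction = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0,
    messages=[{"role": "user", "content": "Extract the invoice total from: 'Total due: $42.17'"}],
)

# Freeform task (e.g. email drafting): a higher temperature for variety.
draft = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0.8,
    messages=[{"role": "user", "content": "Draft a friendly follow-up email to a customer."}],
)
```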
"Key difference #2: Evaluations. - Deterministic tasks can leverage golden datasets with known-correct outputs for straightforward scoring. - Freeform tasks often rely on vibe checks LLM-as-judge user feedback or business metrics to gauge quality"
X Link 2025-01-15T20:38Z 11.3K followers, [----] engagements
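A short sketch contrasting the two evaluation styles; the golden-set examples are made up, and the judge call is left as a placeholder for whichever LLM-as-judge you use:

```python
# Deterministic tasks: score against a golden dataset with exact matches.
def exact_match_score(predictions: list[str], golden: list[str]) -> float:
    return sum(p.strip() == g.strip() for p, g in zip(predictions, golden)) / len(golden)

# Freeform tasks: delegate quality judgment to an LLM-as-judge (placeholder).
def judge_score(prediction: str) -> float:
    raise NotImplementedError("call your LLM-as-judge of choice here, return a 0-1 score")

golden = ["42", "Paris"]
preds = ["42", "Lyon"]
print(exact_match_score(preds, golden))  # 0.5
```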
"Ok annoyingly X doesn't allow polls in replies So I guess respond or dm me with your answers pls π From the diagram above: [--]. Do you understand what OpenPipe does [--]. Would you use OpenPipe"
X Link 2025-01-17T00:06Z 11.9K followers, [---] engagements
"I mean arguably if you are naming your product literal "Sauron" you should have read a little more LOTR and re-thought that"
X Link 2025-01-17T00:57Z 19K followers, [----] engagements
"Reasoning is a general purpose technology. Deepseek-R1 was trained to reason on math and code problems. But it improved on [--] / [--] benchmarks including the majority of non math/code tasks We can expect reasoning improvements to help across the board"
X Link 2025-01-21T01:01Z 19K followers, 31.3K engagements
"Recently got pitched by a startup building a "copilot Twitter influencer" that suggests posts you can make on X to build influence and gain followers. I'm trying to triangulate the Overton window here. Would you use a product like this no ick yes cool (show results) no ick yes cool (show results)"
X Link 2025-01-22T23:12Z 11.8K followers, [----] engagements
"Big news: we've figured out how to train models 80-90% cheaper than before. Cheaper than renting your own GPUs. Cheaper than any other service. And [--] quality regression. Super proud of the team on this one. New pricing is now live"
X Link 2025-01-23T17:16Z 19K followers, 91.9K engagements
"@xlr8harder Europe's secret plan is to be the natural wildlife preserve for homo sapiens when ASI paperclips the rest of the planet. Genius strategy tbh"
X Link 2025-01-23T19:00Z 11.8K followers, [---] engagements
"@stochasticchasm Yeah fair. RFT as presented by OpenAI uses verifiable rewards but there's no reason it has to. The overlap is in the technique of allowing an unconstrained chain of thought and then assessing the reward on just a separate final output"
X Link 2025-01-24T17:35Z 11.8K followers, [--] engagements
"@bradthilton @jxmnop Yeah this seems the obvious next thing kinda surprised Deepseek didn't even bring the idea up in their paper"
X Link 2025-01-24T20:20Z 11.8K followers, [--] engagements
"@winglian @natolambert Yep that's exactly what process reward models are supposed to do. Although there's some question more recently about whether they're actually necessary given DeepSeek's sucess without one. I suspect rewarding on outcomes with a mild length penalty should work in practice"
X Link 2025-02-01T23:03Z 11.8K followers, [---] engagements
"@willccbb @hallerite I actually made a PR to verl last week that I think should get you most of what you want I agree that a python API is much more convenient than bash flags or a yaml config dunno why the later two seem so much more popular. https://github.com/volcengine/verl/pull/162 https://github.com/volcengine/verl/pull/162"
X Link 2025-02-01T23:12Z 11.8K followers, [---] engagements
"@BigLawNoMaw @sc_cath An asset doesn't have to be liquid to have an NPV You can calculate NPV by looking at the cost of buying an annuity with a similar payout schedule"
X Link 2025-02-05T18:41Z 11.8K followers, [--] engagements
"I have never been more excited by our product roadmap. Every agent is going to be trained using RL. And the best ones (outside the frontier labs) are going to be trained on OpenPipe"
X Link 2025-02-06T00:44Z 12K followers, [----] engagements
"@RobertHaisfield @togethercompute @FireworksAI_HQ collect R1 outputs - distill into smolR1 with OpenPipe π"
X Link 2025-02-06T02:29Z 11.9K followers, [---] engagements
"π΅ Can smaller open-weight models match state-of-the-art reasoning performance We investigated using GRPO on "Temporal Clue" surpassing R1 o1 and o3-miniand nearly matching Sonnet [---] at over 100x lower inference cost. Here's how: π (1/6)"
X Link 2025-03-06T19:46Z 19K followers, 55.3K engagements
""Temporal Clue" is a challenging logic puzzle inspired by the classic board game Clueexpanded to include "when" and "why." Perfectly suited for benchmarking LLM reasoning skills it exposed strengths and weaknesses in top models. (3/6)"
X Link 2025-03-06T19:47Z 11.9K followers, [----] engagements
"By iteratively refining models using GRPO and torchtune we improved accuracy to approach Claude Sonnet [---] significantly outperforming popular models like DeepSeek R1 OpenAI's o1 and o3-mini. (4/6)"
X Link 2025-03-06T19:47Z 11.9K followers, [----] engagements
"@htahir111 put together a really thoughtful guide on how to use ZenML with OpenPipe to build really high-quality models. Excited to partner up @zenml_io π@OpenPipeAI integration for #LLM fine-tuning in production After meeting the OpenPipe team in NY at the AI Engineer Summit (@swyx) last month I was inspired by their vision for making LLM fine-tuning accessible. Shout out to @corbtt @ReidMayo @dvdcrbt What https://t.co/hsqRJlhFDw @zenml_io π@OpenPipeAI integration for #LLM fine-tuning in production After meeting the OpenPipe team in NY at the AI Engineer Summit (@swyx) last month I was"
X Link 2025-03-18T17:30Z 12K followers, [---] engagements
"If you're fine-tuning LLMs Gemma [--] is the new π and it's not close. Gemma [--] trounces Qwen/Llama models at every size - Gemma [--] 4B beats 7B/8B competition - Gemma [--] 27B matches 70B competiton Vision benchmarks coming soon"
X Link 2025-03-21T16:27Z 19K followers, 37.3K engagements
"2 GPU efficiency: Typical RL rollouts can leave GPUs idle waiting for external tasks. ART separates frontend (rollouts reward logic) and backend (inference training) allowing parallelized execution and higher GPU utilization"
X Link 2025-04-14T18:24Z 12.1K followers, [---] engagements
"3 Seamless integration: Other RL trainers require substantial refactoring to fit existing codebases. ART is designed for plug-and-play compatibility easing integration with tools like CrewAI and the OpenAI Agents SDK"
X Link 2025-04-14T18:25Z 12.1K followers, [---] engagements
"@benderville @vllm_project @huggingface @UnslothAI this is an open source project not currently available on our managed service on http://openpipe.ai http://openpipe.ai"
X Link 2025-04-16T23:34Z 16.6K followers, [--] engagements
"πMeet ARTEour open-source RL-trained email research agent that searches your inbox and answers questions more accurately faster and cheaper than o3. Let's go deeper on how we built it. π§΅"
X Link 2025-04-29T17:29Z 19K followers, 161.4K engagements
"The results exceeded expectations: ARTE surpasses o3 on accuracy slashes latency [--] and cuts costs [--]. Turns out RL works really well"
X Link 2025-04-29T17:30Z 12.7K followers, [----] engagements
"Whered the data come from [---] K Enron emails π. We sampled inboxes and used GPT-4.1 to spin up realistic Q/A pairsbecause the perfect dataset didnt exist so we made it up synthetically π"
X Link 2025-04-29T17:30Z 12.6K followers, [----] engagements
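A rough sketch of that synthetic Q/A step, assuming the OpenAI Python SDK; this illustrates the idea rather than the actual ARTE data pipeline, and the prompt wording and JSON parsing are optimistic assumptions:

```python
# Generate synthetic question/answer pairs grounded in a single email body.
import json
from openai import OpenAI

client = OpenAI()

def make_qa_pairs(email_body: str) -> list[dict]:
    """Ask a strong model to invent realistic questions answerable from one email."""
    resp = client.chat.completions.create(
        model="gpt-4.1",
        temperature=0.7,
        messages=[{
            "role": "user",
            "content": (
                "Write two realistic questions a user might ask about their inbox that "
                "this email answers, plus the answers, as a JSON list of "
                '{"question": ..., "answer": ...} objects. Output only the JSON.\n\n'
                + email_body
            ),
        }],
    )
    # Optimistic: assumes the model returns clean JSON; real pipelines validate/retry.
    return json.loads(resp.choices[0].message.content)
```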
"@TheZachMueller interesting to see how it benchmarks I assume that's how they implement the thinking_budget that they benchmark with but not sure. Unfortunately for prod use vllm or similar is probably a requirement"
X Link 2025-04-30T12:51Z 14.4K followers, [--] engagements
"Agentic RAG works better than semantic search RAG and it's not even close. Will post results soon"
X Link 2025-05-07T22:40Z 19K followers, 138.2K engagements
"why is o3 still so confidently wrong"
X Link 2025-05-19T06:19Z 13K followers, [----] engagements
"RL twitter has anyone reproduced the positive results from clip-higher in DAPO Just got around to trying it but using their epsilon values of 0.2/0.28 gets me significantly worse results than the baseline 0.2/0.2. @willccbb @kalomaze @casper_hansen_ @rosmine_b @hallerite @danielhanchen @QGallouedec @brendanh0gan"
X Link 2025-05-21T17:05Z 13.1K followers, [----] engagements
""RL from a single example works" "RL with random rewards works" "Base model pass@256 can match RL model pass@1" "RL updates a small % of params" Recent papers all point in the same direction: RL is mostly just eliciting latent behavior already learned in pretraining not teaching new behavior. Yann Lecun's "RL is the cherry on top" was right after all. Is this bearish for RL Perhaps not Maybe we should think about RL as the last mile of on-the-job training for your specific task. When you hire a customer support rep you start with someone who's already smart and capable. But you also watch"
X Link 2025-05-27T18:15Z 19K followers, 105.2K engagements
"At a recent dinner I met a very senior engineer at one of the Big Four tech cos. His team develops tooling for a 0-engineer future. They're not allowed to tell anyone internally what they're working on to avoid mass panic. He figures mega layoffs start in [--] months"
X Link 2025-05-28T20:15Z 19K followers, 1.3M engagements
"@willccbb @UnslothAI @danielhanchen @natolambert Oh for serving an autoregressive language model with many-token outputs you're probably still better off with vllm. This result just applies to sequence classification models (and I guess autoregressive models that you sample a single token from as well)"
X Link 2025-06-12T00:45Z 16K followers, [---] engagements
"@garybasin I tried [---] o3 o3 pro opus [--] and gemini [---] pro before tweeting π maybe I should make a meta post showing all their threads once I've posted my hand-created one"
X Link 2025-06-12T16:35Z 14.1K followers, [---] engagements
"We've used RL to train hundreds of models on dozens of tasks at @OpenPipeAI. Here's everything I know about "reward hacking": π§΅"
X Link 2025-06-12T18:51Z 19K followers, 23.6K engagements
"@VictorTaelin the right technical solution here is for everyone to walk around with a camera strapped to their face that records and transcribes everything you see and lets you query it with natural language. unclear how many years (generations) before socially acceptable tho"
X Link 2025-06-15T21:31Z 14.2K followers, [----] engagements
"Hot take: at current token prices you should always ask your LLM-as-judge to explain its CoT first before answering. Makes them way easier to debug when it inevitably make a judgement you disagree with"
X Link 2025-06-17T19:46Z 14.1K followers, [----] engagements
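A hypothetical judge prompt in the spirit of that advice: reasoning first, verdict on the last line, so disagreements are easy to inspect. The prompt wording and parsing helper are assumptions, not any specific product's format:

```python
# Judge template that elicits chain of thought before the final verdict.
JUDGE_PROMPT = """You are grading a model response against the criterion below.

Criterion: {criterion}
Response: {response}

First, think step by step about whether the response meets the criterion.
Then, on the final line, output exactly PASS or FAIL."""

def parse_verdict(judge_output: str) -> tuple[str, str]:
    """Split the judge's chain of thought from its final verdict line."""
    *reasoning, verdict = judge_output.strip().splitlines()
    return "\n".join(reasoning), verdict.strip()
```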
"An LLM should be able to change a diaper plan an invasion butcher a hog conn a ship design a building write a sonnet balance accounts build a wall set a bone comfort the dying take orders give orders cooperate act alone solve equations analyze a new problem pitch manure program a computer cook a tasty meal fight efficiently die gallantly. Specialization is for insects. if this resonates with you you are probably not trying to do something complex enough for it to matter. which is fine actually. but there are edgy edge cases that live among harder problems. and when it comes to those some"
X Link 2025-06-19T21:07Z 14.1K followers, [----] engagements
"GRPO quirk that contradicted my intuition: If you train on a group with rewards [--] [--] [--] [--] And then you train on another group with rewards [----] [----] [----] [--] Because of how GRPO normalizes within groups the last trajectory will be equally reinforced in both cases"
X Link 2025-06-19T21:22Z 19K followers, 85.2K engagements
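A small numerical illustration of that quirk. The reward values here are invented (the original numbers aren't shown); the point is that the usual group-relative normalization, advantage = (reward - group mean) / group std, gives the standout trajectory the same advantage in both groups regardless of the absolute reward scale:

```python
# GRPO-style within-group advantage normalization.
import statistics

def grpo_advantages(rewards: list[float]) -> list[float]:
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / std for r in rewards]

group_a = [0.0, 0.0, 0.0, 1.0]      # one clearly better trajectory among poor ones
group_b = [0.70, 0.70, 0.70, 0.71]  # one marginally better trajectory among good ones

print(grpo_advantages(group_a)[-1])  # ~1.73
print(grpo_advantages(group_b)[-1])  # ~1.73 -- identical normalized advantage
```

(Many implementations add a small epsilon to the denominator for stability; that detail is omitted here.)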