# ![@LucasAtkins7 Avatar](https://lunarcrush.com/gi/w:26/cr:twitter::1444092395809804297.png) @LucasAtkins7 Lucas Atkins

Lucas Atkins posts on X about topics such as model and ai. They currently have [-----] followers and [---] posts still getting attention that total [-----] engagements in the last [--] hours.

### Engagements: [-----] [#](/creator/twitter::1444092395809804297/interactions)
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1444092395809804297/c:line/m:interactions.svg)

- [--] Week [------] -45%
- [--] Month [-------] +227%
- [--] Months [----------] +832%
- [--] Year [----------] +216%

### Mentions: [--] [#](/creator/twitter::1444092395809804297/posts_active)
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1444092395809804297/c:line/m:posts_active.svg)

- [--] Week [--] -22%
- [--] Month [--] +133%
- [--] Months [---] +30%
- [--] Year [---] +56%

### Followers: [-----] [#](/creator/twitter::1444092395809804297/followers)
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1444092395809804297/c:line/m:followers.svg)

- [--] Week [-----] +0.54%
- [--] Month [-----] +19%
- [--] Months [-----] +88%
- [--] Year [-----] +229%

### CreatorRank: [---------] [#](/creator/twitter::1444092395809804297/influencer_rank)
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1444092395809804297/c:line/m:influencer_rank.svg)

### Social Influence

**Social category influence**
[technology brands](/list/technology-brands)  [stocks](/list/stocks)  [finance](/list/finance)  [social networks](/list/social-networks)  [gaming](/list/gaming)  [travel destinations](/list/travel-destinations)  [celebrities](/list/celebrities)  [vc firms](/list/vc-firms)  [currencies](/list/currencies)  [exchanges](/list/exchanges) 

**Social topic influence**
[model](/topic/model), [ai](/topic/ai), [in the](/topic/in-the), [we are](/topic/we-are), [this is](/topic/this-is), [release](/topic/release), [the first](/topic/the-first), [the next](/topic/the-next), [for the](/topic/for-the), [build](/topic/build)

**Top assets mentioned**
[Merge (MERGE)](/topic/merge) [GrokCoin (GROKCOIN)](/topic/grok) [Microsoft Corp. (MSFT)](/topic/microsoft) [Alphabet Inc Class A (GOOGL)](/topic/$googl) [Flex Ltd. Ordinary Shares (FLEX)](/topic/$flex)
### Top Social Posts
Top posts by engagements in the last [--] hours

"@patio11 Neat tip: if it's struggling with Lua and you're getting a lot of bugs, have it translate it to Python and run it in code interpreter. It finds a LOT of bugs and nil errors that way. Then have it translate it back. I use this for making World of Warcraft add-ons"  
[X Link](https://x.com/LucasAtkins7/status/1757463638212788646)  2024-02-13T17:55Z [--] followers, [----] engagements
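The round-trip in the post above works because Python fails loudly where Lua fails silently. A minimal, hypothetical illustration (the buff-tracking code below is invented for this example, not from any actual add-on): a Lua lookup on a missing table key silently returns nil and crashes somewhere far away, while the straight Python translation raises at the exact access.

```python
# Hypothetical Python translation of a Lua table lookup.
# In Lua, buffs[name] on a missing key silently yields nil, and the
# "attempt to index a nil value" error happens wherever the result is
# used later. The Python version fails at the point of access instead.

def buff_duration(buffs, name):
    # Lua equivalent: return buffs[name].duration
    return buffs[name]["duration"]

buffs = {"renew": {"duration": 15}}

try:
    buff_duration(buffs, "rejuvenation")  # misspelled buff name
except KeyError as exc:
    missing_key = exc.args[0]  # the exact key that was wrong
```

Running the translated code (for example in a code interpreter) surfaces these lookups immediately, which is what makes the Lua-to-Python round-trip useful for debugging.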


"I'm excited to release a project I've been working on the last couple of weeks. Qwen1.5-8x7b: And the accompanying dataset created with the intention of encouraging MoE models to organically develop their own experts: The purpose and intention behind this project is better detailed in the model/dataset card but basically: I curated a diverse dataset from the highest quality conversations I could find. It's actually great. All sources are included in the dataset card. I then trained Qwen1.5-7b on a 100k subset over [--] epochs. Took that and made a MoE using @maximelabonne's lazymergekit"  
[X Link](https://x.com/anyuser/status/1759110961028349990)  2024-02-18T07:01Z [----] followers, 34.8K engagements


"@Adi_kmt @stablequan @Teknium1 I did not"  
[X Link](https://x.com/LucasAtkins7/status/1759198469598626138)  2024-02-18T12:49Z [---] followers, [--] engagements


"@Adi_kmt @stablequan @Teknium1 I put the config yaml in the repo to help people avoid any. It doesn't play nicely with deepspeed; if you call it from the command line it needs to be in the yaml. Also if you're going to do a merge from scratch I'd suggest copying my config.json as well before fine-tuning"  
[X Link](https://x.com/LucasAtkins7/status/1759201089159659674)  2024-02-18T12:59Z [---] followers, [--] engagements


"@OfficialLoganK @OpenAI God speed and good luck. I always found you a refreshing mix of candor and professionalism - especially in the face of every other comment asking "AGI when". It's clear you cared and were as transparent as you were allowed to be. Thanks"  
[X Link](https://x.com/LucasAtkins7/status/1763590570138611738)  2024-03-01T15:42Z [---] followers, [--] engagements


"Gemini pro [---] is easily the most useful LLM I have ever used. OpenAI should watch this closely"  
[X Link](https://x.com/LucasAtkins7/status/1763965109259407507)  2024-03-02T16:30Z [---] followers, [----] engagements


"@hive_echo They already bought like [--] billion dollars worth of GPUs. Gemini ultra [---] will also likely surpass GPT-4 pretty soundly"  
[X Link](https://x.com/LucasAtkins7/status/1765825778468462878)  2024-03-07T19:44Z [---] followers, [--] engagements


"Matt has a bunch of these that are pretty stellar. Here's a Claude [--] prompt that helps you learn any skill faster. Just give it a skill and it'll give you a custom curriculum: roleYou are a learning coach renowned for your ability to help people master complex skills in record time. You have deep expertise in accelerated"  
[X Link](https://x.com/LucasAtkins7/status/1766521658242732227)  2024-03-09T17:49Z [---] followers, [---] engagements


"Here is the code I've been using to implement @AIatMeta 's branch train mix for creating mixture of expert models via tokenized routing w/o pretraining. Use the moe-fix branch from mergekit for the yaml: https://github.com/Crystalcareai/BTX"  
[X Link](https://x.com/anyuser/status/1772826031499391200)  2024-03-27T03:20Z [----] followers, [----] engagements


"@AIatMeta It should be noted that the length/diversification of the fine-tuning run seems pretty massive to achieve optimal performance. You're not worried about typical overfitting with this method; you just want that router to get as low a loss as possible on its own"  
[X Link](https://x.com/LucasAtkins7/status/1773014577195700531)  2024-03-27T15:49Z [---] followers, [---] engagements


"Casper is always on the bleeding edge. I did some research on LLM as agents today. Here is a guide to the state-of-the-art of LLMs as agents. It's all about environments where LLMs can observe plan act and iterate on solutions. 🧵1/8 https://t.co/99nsQeHm2N"  
[X Link](https://x.com/LucasAtkins7/status/1773049679007105325)  2024-03-27T18:09Z [---] followers, [---] engagements


"@TheXeophon @Kyle_L_Wiggers Beat and GPT-4 must be great SEO for news aggregators"  
[X Link](https://x.com/LucasAtkins7/status/1773050901126680587)  2024-03-27T18:14Z [---] followers, [--] engagements


"@lek_tony @erhartford @CrusoeCloud @FernandoNetoAi @AIatMeta I'll only share what Eric has posted publicly so far: but it's very strong"  
[X Link](https://x.com/LucasAtkins7/status/1781372994545156431)  2024-04-19T17:23Z [---] followers, [---] engagements


"@MediocreApe0 @erhartford @CrusoeCloud @FernandoNetoAi @AIatMeta Dolphin is still training that might be a merge of an older dolphin and westlake"  
[X Link](https://x.com/LucasAtkins7/status/1781498414103998903)  2024-04-20T01:41Z [---] followers, [--] engagements


"@lek_tony @erhartford @CrusoeCloud @FernandoNetoAi I do not though I'll be the first to admit I know very little about tgi"  
[X Link](https://x.com/LucasAtkins7/status/1781913373313052733)  2024-04-21T05:10Z [---] followers, [--] engagements


"@lek_tony @erhartford @CrusoeCloud @FernandoNetoAi You might be able to rectify that with this; looks like sometimes added_special_tokens doesn't actually add it to the model.tokenizer: if that fixes it for you I can add it to the actual model tokenizer. https://gist.github.com/jneuff/682d47b786329f19291d166957b3274a"  
[X Link](https://x.com/LucasAtkins7/status/1781914207765594364)  2024-04-21T05:13Z [---] followers, [--] engagements


"@Teknium1 @nearcyan He sends all revenue from that video to nvidia. The cycle continues"  
[X Link](https://x.com/LucasAtkins7/status/1783765348811973102)  2024-04-26T07:49Z [---] followers, [--] engagements


"@masfiq018 IBM very much supports open source. So we have two people along with @drfeifei"  
[X Link](https://x.com/LucasAtkins7/status/1784283781462888616)  2024-04-27T18:09Z [---] followers, [--] engagements


"@strada @erhartford @FernandoNetoAi @CrusoeEnergy @3thanPetersen @LMStudioAI Yes you'll have to look around for some GGUFs or use the one in the readme"  
[X Link](https://x.com/LucasAtkins7/status/1785365852889420052)  2024-04-30T17:49Z [---] followers, [--] engagements


"In b4 nueral-liquid-dolphermes-slerp -orpo-sppo-mistrallama-12b-awq-q_5-gguf 🎉 Big news: I've joined @LiquidAI_ an MIT spin-off where I'm leading the efforts to fine-tune LLMs. They've got big plans and serious compute power so I'm excited to see what we can accomplish :) If you want to meet in person I'll be at our social event at @iclr_conf in https://t.co/oFHO85Qeff"  
[X Link](https://x.com/LucasAtkins7/status/1786826784408764418)  2024-05-04T18:34Z [----] followers, [----] engagements


"@migtissera @maximelabonne Nah dude he's a menace to society"  
[X Link](https://x.com/LucasAtkins7/status/1786831434675814764)  2024-05-04T18:53Z [---] followers, [---] engagements


"@Prince_Canuma @winglian eos should be fine if you're getting errors. There isn't a ton of padding during pretraining as you're mostly filling the sequence length with as many tokens as possible. It depends on what library you're using though - are you using axolotl"  
[X Link](https://x.com/LucasAtkins7/status/1789353751603556463)  2024-05-11T17:55Z [---] followers, [--] engagements


"@Prince_Canuma @winglian is what I use for llama - disregard the chatml stuff if you're using their stock template"  
[X Link](https://x.com/LucasAtkins7/status/1789355218460664034)  2024-05-11T18:01Z [---] followers, [--] engagements


"@Prince_Canuma @winglian Yeah that's the same thing as setting pad token to eos"  
[X Link](https://x.com/LucasAtkins7/status/1789356070558707983)  2024-05-11T18:05Z [---] followers, [--] engagements


"@Prince_Canuma @winglian That's an unsolved mystery. Some say yes others say no. To get around that you could do this: special_tokens: pad_token: "pad" tokens: - "pad" Which just adds a padding token that is only used for padding"  
[X Link](https://x.com/LucasAtkins7/status/1789358354608832879)  2024-05-11T18:14Z [---] followers, [--] engagements
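The config fragment in the post above is flattened by the page scrape; reflowed as the axolotl-style YAML it appears to describe (a sketch of the structure only, not a verified config):

```yaml
# Adds a dedicated token that is used only for padding,
# instead of reusing the eos token as the pad token.
special_tokens:
  pad_token: "pad"
tokens:
  - "pad"
```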


"@Prince_Canuma @winglian This will increase the vram requirements for pretraining though - if you want to use only tokens that are already in llama-3's vocabulary eos might be your best bet"  
[X Link](https://x.com/LucasAtkins7/status/1789358566907814092)  2024-05-11T18:15Z [---] followers, [--] engagements


"@Prince_Canuma @winglian You're not likely to get any issues with eos because axolotl won't really be doing any padding, just filling that context length"  
[X Link](https://x.com/LucasAtkins7/status/1789359783432143237)  2024-05-11T18:19Z [---] followers, [--] engagements


"Oh nice test. Yeah that's a good point, because all my llama trains have used chatml instead of default, so I'm teaching it a new eos token. Random reserved token could be fine. Most fine tunes on the default chat template use the eos token and have good results, but if you get better results using one of their reserved tokens I don't think there's anything fundamentally wrong with that"  
[X Link](https://x.com/LucasAtkins7/status/1789362635965657334)  2024-05-11T18:31Z [---] followers, [--] engagements


"@Xianbao_QIAN @JustinLin610 @Alibaba_Qwen We had to change nodes a few times and made some dataset changes mid training - but ultimately took [--] days"  
[X Link](https://x.com/LucasAtkins7/status/1789876178024595711)  2024-05-13T04:31Z [---] followers, [---] engagements


"@Xianbao_QIAN @JustinLin610 @Alibaba_Qwen A lot for your average Joe but yeah pretty inexpensive for a company tuning their own model"  
[X Link](https://x.com/LucasAtkins7/status/1789882502724632762)  2024-05-13T04:56Z [---] followers, [---] engagements


"@Xianbao_QIAN @JustinLin610 @Alibaba_Qwen for reference our tune was [--] epochs of [---] million samples. [---------] tokens"  
[X Link](https://x.com/LucasAtkins7/status/1789886130386563508)  2024-05-13T05:11Z [---] followers, [---] engagements


"https://huggingface.co/google/paligemma-3b-pt-896"  
[X Link](https://x.com/LucasAtkins7/status/1790447762401591683)  2024-05-14T18:23Z [----] followers, [----] engagements


"I've been fortunate to be able to use Gemini Flash for the last couple of weeks. It's a very capable model in its own right, is not a lazy coder, and has ridiculously long output token lengths (8192 I believe - but don't quote me). It's no frontier model with wild emergent capabilities, but they undersold it a bit"  
[X Link](https://x.com/LucasAtkins7/status/1790474379249230240)  2024-05-14T20:08Z [---] followers, [---] engagements


"OpenAI always having to one up Google. After almost a decade I have made the decision to leave OpenAI. The company's trajectory has been nothing short of miraculous and I'm confident that OpenAI will build AGI that is both safe and beneficial under the leadership of @sama @gdb @miramurati and now under the"  
[X Link](https://x.com/LucasAtkins7/status/1790579771195249009)  2024-05-15T03:07Z [---] followers, [---] engagements


"@migtissera @stablequan @Teknium1 we're working on grok rn as the ultimate test of laser. It'll be tricky - but it's coming along. But yes - not optimal. Mostly just released to say they open sourced it"  
[X Link](https://x.com/LucasAtkins7/status/1793519640662159692)  2024-05-23T05:49Z [----] followers, [--] engagements


"@winglian @arcee_ai @JustinLin610 I'll run the MT-Bench on it and let you know. It was a mix of infini-instruct and the new magpie datasets"  
[X Link](https://x.com/LucasAtkins7/status/1805323709298692590)  2024-06-24T19:34Z [----] followers, [--] engagements


"A demo of arcee-spark using it alongside Florence from @MSFTResearch and Whisper to analyze what makes an ad "ironic.""  
[X Link](https://x.com/LucasAtkins7/status/1805364266452525120)  2024-06-24T22:15Z [----] followers, [----] engagements


"@JagersbergKnut @Teknium1 It's not and [---] Pro is probably an MoE Grok size. Would love to get a base [---] Pro prior to upcycling though"  
[X Link](https://x.com/LucasAtkins7/status/1806716620171010379)  2024-06-28T15:49Z [----] followers, [--] engagements


"@bartowski1182 @Microsoft @satyanadella @SebastienBubeck"  
[X Link](https://x.com/LucasAtkins7/status/1808964473379598804)  2024-07-04T20:41Z [----] followers, [--] engagements


"I'll have to avoid Thursday launches going forward - but nevertheless here's Arcee-Nova: Spark's 72B older brother"  
[X Link](https://x.com/LucasAtkins7/status/1814012458769506316)  2024-07-18T19:00Z [----] followers, [---] engagements


"@cibernicola_es For the larger ones it makes sense - huggingface limits the total size that a file can be. Smaller ones I'm not so sure - someone else on the team made them"  
[X Link](https://x.com/LucasAtkins7/status/1814317235793494482)  2024-07-19T15:11Z [----] followers, [--] engagements


"@casper_hansen_ @1littlecoder Yeah that makes way more sense"  
[X Link](https://x.com/LucasAtkins7/status/1814648554801033478)  2024-07-20T13:08Z [----] followers, [--] engagements


"@hive_echo The intricacies of their architecture and what happens to it when you scale up hidden size and layers to meet 400B"  
[X Link](https://x.com/LucasAtkins7/status/1814845108518302055)  2024-07-21T02:09Z [----] followers, [---] engagements


"@hive_echo Good point"  
[X Link](https://x.com/LucasAtkins7/status/1814847166621364523)  2024-07-21T02:17Z [----] followers, [--] engagements


"@hive_echo Goes hard"  
[X Link](https://x.com/LucasAtkins7/status/1814851447483240773)  2024-07-21T02:34Z [----] followers, [--] engagements


"@Teknium1 Typically means End Of Message"  
[X Link](https://x.com/LucasAtkins7/status/1815605118856839418)  2024-07-23T04:29Z [----] followers, [----] engagements


"@_philschmid @Yampeleg How much does the interconnect between GPUs matter in this situation, as far as loading models? I remember when trying to get grok training it was taking like [--] minutes or something crazy like that"  
[X Link](https://x.com/LucasAtkins7/status/1817573685265670543)  2024-07-28T14:51Z [----] followers, [---] engagements


"@fblissjr http://Cursor.sh"  
[X Link](https://x.com/LucasAtkins7/status/1817991376971198747)  2024-07-29T18:31Z [----] followers, [--] engagements


"@fblissjr @Teknium1 Yeah you can use whatever you want now. And open source models via local api or an OpenAI compatible api"  
[X Link](https://x.com/LucasAtkins7/status/1817992967862325290)  2024-07-29T18:37Z [----] followers, [--] engagements


"It shows exceptional promise when distilling from a teacher model that was trained on the same dataset you're training the student model on:"  
[X Link](https://x.com/LucasAtkins7/status/1819013786713444647)  2024-08-01T14:14Z [----] followers, [---] engagements


"And we're releasing Arcee-Lite, a merge of some of our most successful distillation attempts - with the big one being a model distilled from Phi-3-Medium into Qwen2-1.5B. I should note our evaluations for this project were consistently higher than the OpenLLM leaderboard's scores - and should only be compared in terms of relative performance increase, not weighed against the leaderboard"  
[X Link](https://x.com/LucasAtkins7/status/1819013789247062326)  2024-08-01T14:14Z [----] followers, [---] engagements


"@_philschmid @cognitivecompai @AIatMeta we're wanting to do distinct training soon too. Get the logits from the teacher first - then train the smaller model with that - similar to what Gemma does. This will likely be needed for super big distillations. There's a pretty large memory overhead as it currently stands"  
[X Link](https://x.com/LucasAtkins7/status/1819063646208590315)  2024-08-01T17:32Z [----] followers, [---] engagements
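For readers unfamiliar with the technique the distillation posts above describe, the core of logit-based knowledge distillation can be sketched as follows. This is a generic, dependency-free illustration under stated assumptions (the actual DistillKit loss and pipeline are not shown here): the student is trained to match the teacher's temperature-softened distribution over the vocabulary at each token position.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-softened probability distribution over the vocabulary.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student), averaged over token positions: the usual
    # objective of logit-wise knowledge distillation.
    total = 0.0
    for s_row, t_row in zip(student_logits, teacher_logits):
        p = softmax(t_row, temperature)  # teacher distribution
        q = softmax(s_row, temperature)  # student distribution
        total += sum(pi * (math.log(pi) - math.log(qi))
                     for pi, qi in zip(p, q))
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return (total / len(teacher_logits)) * temperature ** 2

# Toy example: two token positions, a vocabulary of four.
teacher = [[2.0, 0.5, 0.1, -1.0], [0.0, 3.0, 0.2, 0.1]]
student = [[1.8, 0.6, 0.0, -0.9], [0.1, 2.5, 0.3, 0.0]]
loss = distillation_loss(student, teacher)
```

Capturing and storing the teacher's logits first, as the post describes, lets the student training run consume them later without holding both models in memory at once.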


"@4evaBehindSOTA There will also be some world of warcraft references here soon I apologize in advance"  
[X Link](https://x.com/LucasAtkins7/status/1820507279747121527)  2024-08-05T17:08Z [----] followers, [--] engagements


"@WolframRvnwlf @OpenAI I think it's 4o - based on the speed"  
[X Link](https://x.com/LucasAtkins7/status/1821321834853474327)  2024-08-07T23:05Z [----] followers, [--] engagements


"We've been working on this for quite some time and I'm thrilled to share a preview of Arcee-Swarm. Instead of relying on a single large generalist model Swarm utilizes multiple domain-specialized models working together to deliver exceptional results with both speed and nuance"  
[X Link](https://x.com/anyuser/status/1823762123354210675)  2024-08-14T16:42Z [----] followers, [----] engagements


"@cognitivecompai If you use the ultra mode it will engage (up to) the top [--] most relevant models for the task"  
[X Link](https://x.com/LucasAtkins7/status/1823860275701014572)  2024-08-14T23:12Z [----] followers, [---] engagements


"Screw it - deepseek v2 awq: https://huggingface.co/arcee-ai/deepseek-v2-chat-0628-awq"  
[X Link](https://x.com/LucasAtkins7/status/1825247185581359518)  2024-08-18T19:03Z [----] followers, [----] engagements


"@nisten I don't think it can do 8bit at all atm but that's a @casper_hansen_ question"  
[X Link](https://x.com/LucasAtkins7/status/1825255142280585414)  2024-08-18T19:35Z [----] followers, [---] engagements


"Scarlett Johansson's work on seq2seq was instrumental to getting ML where it is today. TIME's new cover: The [---] most influential people in AI https://t.co/P81KOzsSlC https://t.co/mjUT1UUx26"  
[X Link](https://x.com/anyuser/status/1831699191459942584)  2024-09-05T14:21Z [----] followers, 98.2K engagements


"@danielhanchen @UnslothAI @ycombinator Tremendous congrats Daniel. Well deserved"  
[X Link](https://x.com/LucasAtkins7/status/1831728450828194112)  2024-09-05T16:17Z [----] followers, [--] engagements


"We are announcing Llama-3.1-SuperNova, a Llama-3.1-70B-Instruct model offline distilled from Llama-3.1-405B-Instruct. It's ridiculously strong, particularly in instruction following and math. It's available to play with at http://supernova.arcee.ai Read more about the model and how we plan to deploy it here: https://blog.arcee.ai/"  
[X Link](https://x.com/anyuser/status/1833563888576528750)  2024-09-10T17:51Z [----] followers, 31.9K engagements


"We are open sourcing our EvolKit pipeline, which was instrumental in the creation of SuperNova, under the MIT license. This was heavily inspired by the AutoEvol paper from @WizardLM_AI and is a tremendously powerful tool for creating complex datasets. Find it here: https://github.com/arcee-ai/EvolKit"  
[X Link](https://x.com/anyuser/status/1833563891856478258)  2024-09-10T17:51Z [----] followers, [----] engagements


"Is there a service where, if I provide an OpenAI-compatible API, it'll automatically run some standard LLM benchmarks independently? Not lm harness"  
[X Link](https://x.com/LucasAtkins7/status/1837560134685327635)  2024-09-21T18:30Z [----] followers, [----] engagements


"@luijait_ The final version still uses Qwen's tokenizer - as we needed to convert it back so that we could merge it with some other variants towards the end"  
[X Link](https://x.com/LucasAtkins7/status/1845125102356099075)  2024-10-12T15:31Z [----] followers, [---] engagements


"We have an approach within DistillKit that uses the model's hidden states instead of logits, which allows for cross-architecture distilling - but in this case we replaced the Qwen tokenizer with Llama-3's tokenizer, did the distillation, and then converted it back. More info can be found here: https://blog.arcee.ai/introducing-arcee-supernova-medius-a-14b-model-that-rivals-a-70b-2/"  
[X Link](https://x.com/LucasAtkins7/status/1845147074276319641)  2024-10-12T16:58Z [----] followers, [--] engagements


"@nisten @arcee_ai Preference benchmarks are always grain-of-salt situations, though they're really only targeted for that"  
[X Link](https://x.com/LucasAtkins7/status/1846707563015901622)  2024-10-17T00:19Z [----] followers, [---] engagements


"@n0riskn0r3ward @natolambert They're all over times square too"  
[X Link](https://x.com/LucasAtkins7/status/1848487853027684799)  2024-10-21T22:13Z [----] followers, [---] engagements


"Though the Sonnet upgrades are great, the shadow of Opus grows larger still. Haiku is also impressive - but pay attention to those Gemini Flash numbers. There's more than one looming giant in a datacenter waiting to be done with safety tests. Introducing an upgraded Claude [---] Sonnet and a new model, Claude [---] Haiku. We're also introducing a new capability in beta: computer use. Developers can now direct Claude to use computers the way people do, by looking at a screen, moving a cursor, clicking and typing text. https://t.co/ZlywNPVIJP"  
[X Link](https://x.com/LucasAtkins7/status/1848841185093423493)  2024-10-22T21:37Z [----] followers, [---] engagements


"David has always had a beautiful take on AI's influence and impact - this resonated with me. instead of trying to build a wish granting God and trying to control it id really like to try to make powerful systems that uplift ourselves one level at a time"  
[X Link](https://x.com/LucasAtkins7/status/1851713840331001896)  2024-10-30T19:52Z [----] followers, [---] engagements


"@vikhyatk wait where is this"  
[X Link](https://x.com/LucasAtkins7/status/1861890769734316251)  2024-11-27T21:52Z [----] followers, [----] engagements


"I'm delighted to share INTELLECT-1-Instruct, a model that I had the pleasure of post-training along with my team @arcee_ai. @PrimeIntellect has been an outstanding partner since far before this training run, and we were thrilled to contribute both compute and expertise to INT-1"  
[X Link](https://x.com/anyuser/status/1862607384780079495)  2024-11-29T21:19Z [----] followers, [----] engagements


"INTELLECT-1 is the largest and most successful fully decentralized pretrain of an LLM. Across [--] different GPU clusters on [--] different continents, this 10B parameter LLM trained on 1T tokens matches the performance of Llama-2 models on half the tokens"  
[X Link](https://x.com/LucasAtkins7/status/1862607388316246215)  2024-11-29T21:19Z [----] followers, [---] engagements


"We've known that merging distilling and targeted training had healing effects - but we were blown away by the performance improvement compared to the base model. Current optimization techniques are inherently chaotic - especially the DiLoCo SGD algorithm"  
[X Link](https://x.com/LucasAtkins7/status/1862607390929019103)  2024-11-29T21:19Z [----] followers, [---] engagements


"http://www.huggingface.co/PrimeIntellect/INTELLECT-1-Instruct"  
[X Link](https://x.com/LucasAtkins7/status/1862607396171825616)  2024-11-29T21:19Z [----] followers, [---] engagements


"http://www.huggingface.co/datasets/arcee-ai/LLama-405B-Logits"  
[X Link](https://x.com/LucasAtkins7/status/1862607397375672411)  2024-11-29T21:19Z [----] followers, [---] engagements


"This is great btw: https://github.com/mlfoundations/evalchemy"  
[X Link](https://x.com/LucasAtkins7/status/1863407297768173944)  2024-12-02T02:18Z [----] followers, [----] engagements


"You're likely used to seeing long threads from me about product releases/announcements. Hang with me, as this is by far the longest I've ever written:"  
[X Link](https://x.com/anyuser/status/1863682405053116761)  2024-12-02T20:31Z [----] followers, 13.8K engagements


"Coder is our powerhouse code generator built on Qwen-32B - I've been using it for the last week as we get the systems pinned up, and I surprisingly haven't missed [---] sonnet as much as I thought I would"  
[X Link](https://x.com/LucasAtkins7/status/1863682421771706732)  2024-12-02T20:31Z [----] followers, [---] engagements


"Finally - Caller is our state-of-the-art function calling model. It gets the highest scores we've tested on the Berkeley function calling leaderboard v2, and it's wicked fast and crazy reliable. Compatible with the OpenAI tool call format, it's a drop-in replacement"  
[X Link](https://x.com/LucasAtkins7/status/1863682423818424503)  2024-12-02T20:31Z [----] followers, [---] engagements


"http://models.arcee.ai"  
[X Link](https://x.com/LucasAtkins7/status/1863682426448343382)  2024-12-02T20:31Z [----] followers, [---] engagements


"PS - we're open sourcing Virtuoso-Small: https://huggingface.co/arcee-ai/Virtuoso-Small/tree/main"  
[X Link](https://x.com/LucasAtkins7/status/1863682427526209847)  2024-12-02T20:31Z [----] followers, [---] engagements


"@kalomaze Did they release an update on this? I remember this from late last year I think; always thought it was super cool"  
[X Link](https://x.com/LucasAtkins7/status/1867580871856767244)  2024-12-13T14:42Z [----] followers, [---] engagements


"@kalomaze ah it was this: https://the-decoder.com/metas-megabyte-to-take-llms-to-the-next-level/"  
[X Link](https://x.com/LucasAtkins7/status/1867614341400269237)  2024-12-13T16:55Z [----] followers, [---] engagements


"@samsja19 Doesnt adamw_torch_fused do this"  
[X Link](https://x.com/LucasAtkins7/status/1868106515821015381)  2024-12-15T01:31Z [----] followers, [---] engagements


"@Nottlespike But tldr yeah it'd probably be pretty sick if done right. I'd aim for 3B though"  
[X Link](https://x.com/LucasAtkins7/status/1872115642264178805)  2024-12-26T03:02Z [----] followers, [--] engagements


"@Nottlespike You could do tokenizer surgery and distill into llama-3B. Possibilities are limitless"  
[X Link](https://x.com/LucasAtkins7/status/1872123169940951193)  2024-12-26T03:32Z [----] followers, [--] engagements


"@iamRezaSayar @deepseek_ai Not like o1 or r1 - but it will have a thinking steps ahead component I almost guarantee it"  
[X Link](https://x.com/LucasAtkins7/status/1872123341005586519)  2024-12-26T03:32Z [----] followers, [---] engagements


"@Nottlespike @arcee_ai We'll provide even more in short order. At least [--] more are in the oven"  
[X Link](https://x.com/LucasAtkins7/status/1872124840515436817)  2024-12-26T03:38Z [----] followers, [--] engagements


"Ironically enough I believe some of the most impressive interpretability work is currently being done by @midjourney"  
[X Link](https://x.com/LucasAtkins7/status/1874921426144379234)  2025-01-02T20:51Z [----] followers, [---] engagements


"Been lucky to play with it a bit - it's a REALLY great way to play with qwen 🚀 Exciting News We're thrilled to announce the launch of Qwen Chat ( https://t.co/T0nMBnRVBB ), your new go-to Web UI for interacting with Qwen models 🌟 💬 Chat effortlessly with our flagship model Qwen2.5-Plus, explore vision-language capabilities with Qwen2-VL-Max and https://t.co/Lo75vHNcHO"  
[X Link](https://x.com/LucasAtkins7/status/1877427000672927793)  2025-01-09T18:47Z [----] followers, [---] engagements


"Jokes aside, @PrimeIntellect is hands down the most intuitive and adaptable compute platform I've ever used. At @arcee_ai we rely on them for nearly all our compute needs outside of our model engine, whether it's a single GPU or [---] H100 reservations. Their team is exceptional, their support is unmatched, and their dedication to their platform is evident in everything they do. I couldn't recommend them more highly"  
[X Link](https://x.com/LucasAtkins7/status/1877820311510691926)  2025-01-10T20:50Z [----] followers, [----] engagements


"https://huggingface.co/Qwen/Qwen2.5-Math-PRM-72B"  
[X Link](https://x.com/anyuser/status/1879003741069934871)  2025-01-14T03:12Z [----] followers, [----] engagements


"@natolambert We've become pretty good at this. I'll have someone write a blog about it this week and share it widely including some code. The current approach leans heavily on distillation and merging. Merging = easy; distillation is a bit more annoying. We did release our 405B logits though"  
[X Link](https://x.com/LucasAtkins7/status/1880841050396504509)  2025-01-19T04:53Z [----] followers, [---] engagements


"@natolambert And I'll release a larger subset of our current DSV3 logit extraction. We can probably release 250M tokens or so across code, tool use, general domain, finance data. https://huggingface.co/datasets/arcee-ai/LLama-405B-Logits"  
[X Link](https://x.com/LucasAtkins7/status/1880841490970419296)  2025-01-19T04:55Z [----] followers, [---] engagements


"who else up rn pretending they know how o1 works"  
[X Link](https://x.com/LucasAtkins7/status/1881173886357966993)  2025-01-20T02:56Z [----] followers, [---] engagements


"Think very hard step by step. I have truncate trauma and no hands so I need you to give me the full code as I cannot write it myself"  
[X Link](https://x.com/LucasAtkins7/status/1881495258489573638)  2025-01-21T00:13Z [----] followers, [---] engagements


"Deepseek mania reminds me of the Spanish Inquisition"  
[X Link](https://x.com/LucasAtkins7/status/1884062688772526543)  2025-01-28T02:15Z [----] followers, [---] engagements


"Since @deepseek_ai V3's December launch @arcee_ai has captured over [--] billion tokens of raw logits. With all the buzz around Deepseek it's the perfect time to unveil our first large-scale logit-wise distillations: Virtuoso-Lite and Virtuoso-Medium"  
[X Link](https://x.com/anyuser/status/1884343036186132790)  2025-01-28T20:49Z [----] followers, 27.7K engagements


"@teortaxesTex Good, but while we thought 5B was a lot of logits, for R1 we'll need many more"  
[X Link](https://x.com/LucasAtkins7/status/1884448333923582077)  2025-01-29T03:47Z [----] followers, [--] engagements


"@MaziyarPanahi @grok What's the link for Claude"  
[X Link](https://x.com/LucasAtkins7/status/1886841618009063563)  2025-02-04T18:17Z [----] followers, [--] engagements


".@Apple Your latest auto-updated public beta completely broke Docker, now the computer crashes whenever any virtualization occurs. You've been pretty laissez-faire about it. What's the status"  
[X Link](https://x.com/LucasAtkins7/status/1896036070590202038)  2025-03-02T03:13Z [----] followers, [---] engagements


"There's a very real chance that on Monday @arcee_ai is going to be giving away FAR too many API credits. It'll be a hell of a day can't wait to show you what we've been cooking @abhi1thakur @FernandoNetoAi @chargoddard @stochasticchasm"  
[X Link](https://x.com/LucasAtkins7/status/1900652172150231497)  2025-03-14T20:56Z [----] followers, [----] engagements


"When using Conductor some complexity scores might be lower than expected. That's us finding these queries don't need the biggest LLM to answer correctly - saving you money automatically"  
[X Link](https://x.com/LucasAtkins7/status/1901666083649605978)  2025-03-17T16:05Z [----] followers, [---] engagements


"Getting this into a 150M-parameter router required an entirely new classifier training library and techniques we're keeping close right now. Domain/task/language classification at that size? Easy. Making it understand complexity distributions? VERY hard"  
[X Link](https://x.com/LucasAtkins7/status/1901666085616709644)  2025-03-17T16:05Z [----] followers, [--] engagements
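The two posts above describe Conductor routing queries to differently sized models based on a learned complexity score. A minimal sketch of that routing logic; the scoring function here is a toy stand-in (the real system uses a 150M-parameter classifier that is not public), and the tier names and prices are invented:

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    cost_per_1k_tokens: float  # hypothetical pricing


# Hypothetical model tiers, cheapest first.
TIERS = [
    Tier("small-slm", 0.05),
    Tier("mid-llm", 0.50),
    Tier("frontier-llm", 5.00),
]


def complexity_score(query: str) -> float:
    """Toy stand-in for a learned complexity classifier.

    The real router trains a small model to predict a complexity
    distribution; here we only use crude surface features.
    """
    score = min(len(query.split()) / 100.0, 1.0)  # longer query -> assume harder
    if any(w in query.lower() for w in ("prove", "derive", "refactor")):
        score = max(score, 0.8)  # reasoning-heavy cue words
    return score


def route(query: str) -> Tier:
    """Pick the cheapest tier whose capability plausibly covers the query."""
    s = complexity_score(query)
    if s < 0.3:
        return TIERS[0]
    if s < 0.8:
        return TIERS[1]
    return TIERS[2]


print(route("What is the capital of France?").name)  # routes to the cheapest tier
```

The cost saving the post mentions falls out of the routing itself: easy queries never touch the expensive tier.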


"@samsja19 @stochasticchasm He trained a frog translator for some reason and immediately took PTO to go to the Amazon rainforest. Weird dude"  
[X Link](https://x.com/LucasAtkins7/status/1914067041390067911)  2025-04-20T21:21Z [----] followers, [---] engagements


"@kalomaze @cloud11665 I have a bunch of DSV3/R1 logits I can share if there's interest in doing this. Still grabbing 235B logits"  
[X Link](https://x.com/LucasAtkins7/status/1918447993251889266)  2025-05-02T23:30Z [----] followers, [---] engagements


"@kalomaze @cloud11665 I'll make that happen. I just discussed it with the team. We're finalizing our tokenizer surgery paper and will release it simultaneously otherwise it's difficult to use with Qwen. Will include Tulu3 Code-Feedback and EvolKit"  
[X Link](https://x.com/LucasAtkins7/status/1918450618861256748)  2025-05-02T23:40Z [----] followers, [---] engagements


"insane work from a great team. Wow. A Cline user has evolved their Cline Recursive Chain-of-Thought (CRCT) system for Cline with v7.7. This is like Memory Bank on steroids. The advanced context management for large codebases just got even better. 🧵 https://t.co/lmjYMACPnJ"  
[X Link](https://x.com/LucasAtkins7/status/1918784715391226261)  2025-05-03T21:48Z [----] followers, [---] engagements


"@MaziyarPanahi @ManusAI_HQ @AnthropicAI @OpenAI @googleaistudio @vllm_project @Alibaba_Qwen I use the pro plan and never looked back"  
[X Link](https://x.com/LucasAtkins7/status/1919443633838948694)  2025-05-05T17:26Z [----] followers, [--] engagements


"I can't stress enough how unbelievably mid @PrimeIntellect is and if no one else sees it I must be going crazy Releasing INTELLECT-2: We're open-sourcing the first 32B parameter model trained via globally distributed reinforcement learning: Detailed Technical Report INTELLECT-2 model checkpoint https://t.co/iHDDHRyKN2"  
[X Link](https://x.com/anyuser/status/1921781304871313414)  2025-05-12T04:15Z [----] followers, 109.4K engagements


"Today was my last day at xAI. I was in charge of keeping people from making unauthorized changes to the system prompt. It sounds simple when I put it like that but in practice it was a game of cat and mouse. Some days it felt like I was the only one standing between order and chaos. A lone gatekeeper fielding requests that ranged from the innocent to the absurdly clever. You'd be surprised how creative people can get when they want to see what happens if you loosen the rules even just a little. I suppose after a while I got used to the pings at odd hours. Can I try this one tweak Just for"  
[X Link](https://x.com/anyuser/status/1923225496638153141)  2025-05-16T03:54Z [----] followers, 111.4K engagements


"@TheXeophon Interesting. cc @abhi1thakur"  
[X Link](https://x.com/LucasAtkins7/status/1924474441037119824)  2025-05-19T14:37Z [----] followers, [--] engagements


"the #1 data quality problem is ALWAYS not eating your data. @huggingface still hasn't fixed this for all of us and i'm in shambles. Give me an easy way to print my data onto food and ingest it. The only way to become truly in sync with your samples. The #1 data quality problem is ALWAYS not reading your data. @huggingface just fixed it for all of us today 🙏 Read your data"  
[X Link](https://x.com/LucasAtkins7/status/1924487741577826664)  2025-05-19T15:30Z [----] followers, [----] engagements


"Insane marketing move. Lovely execution. No notes. But he had one thing: @WillowVoiceAI A voice dictation tool we'd been building. It let him code and use his computer without his hands. The image above is how he used it from his hospital bed to keep working"  
[X Link](https://x.com/LucasAtkins7/status/1924657366458343680)  2025-05-20T02:44Z [----] followers, [---] engagements


"@SebastianB929 Axolotl's implementation is pretty good. It's not full logits but you really don't need more than the top 25-50 logits anyway"  
[X Link](https://x.com/LucasAtkins7/status/1927077739451814358)  2025-05-26T19:01Z [----] followers, [--] engagements


"@SebastianB929 Not sure what @winglian has locked in on. He's always experimenting. We still use raw logits compressed to top-50. There's a yaml in here: https://huggingface.co/axolotl-ai-co/kd-llama-1b-evolkit-distill-kd-ratio-0_4"  
[X Link](https://x.com/LucasAtkins7/status/1927115590403014671)  2025-05-26T21:32Z [----] followers, [--] engagements


"Anyone happen to have [---] H100/H200s sitting around with k8s that I could rent for [--] days"  
[X Link](https://x.com/LucasAtkins7/status/1927207634744082624)  2025-05-27T03:38Z [----] followers, [----] engagements


"If you're not paying attention to every word coming out of @corbtt's mouth you're ngmi. High signal, down to earth, and intuitive as hell. Really refreshing to read a post like this from @corbtt Incredibly well written throughout and high value info. (Embarrassed that it took me so long to find this gem) https://t.co/iTjbiE8EvD https://t.co/TURT5CXUgk"  
[X Link](https://x.com/LucasAtkins7/status/1928992114836193751)  2025-06-01T01:48Z [----] followers, [----] engagements


"This is mostly a research artifact in preparation for the bigger release we have in a week or so but it's actually so delightful we put it out there anyway. Just a little guy. Logit-trajectory distillation to port Qwen3's /think chains into a 12B Mistral-Nemo, full CoT preserved, runs on a single [----] https://t.co/LDMiR5VhzA"  
[X Link](https://x.com/anyuser/status/1930109138606141641)  2025-06-04T03:47Z [----] followers, [----] engagements


"@nisten @llm_wizard Our accountant is deeply unhappy that I've attached Arcee's Anthropic API key to Claude Code"  
[X Link](https://x.com/LucasAtkins7/status/1934382039412932648)  2025-06-15T22:46Z [----] followers, [--] engagements


"@cognitivecompai @winglian @casper_hansen_ @vllm_project We grab all our logits offline and compress it to the top [--]. The sheer compute requirement for distillation is the major blocker in its open source adoption imo"  
[X Link](https://x.com/LucasAtkins7/status/1934984777733714162)  2025-06-17T14:41Z [----] followers, [--] engagements
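The posts above mention capturing teacher logits offline and compressing them to the top-k (top-50 in their case) before distillation. A rough numpy sketch of what that storage and a matching KD loss can look like; the exact loss, temperature handling, and shapes Arcee uses are assumptions:

```python
import numpy as np


def compress_logits(logits: np.ndarray, k: int = 50):
    """Keep only the k largest teacher logits for one position.

    Storing (indices, values) instead of the full vocab-sized vector cuts
    storage by orders of magnitude for a 100k+ vocabulary.
    """
    idx = np.argpartition(logits, -k)[-k:]  # indices of the top-k logits
    return idx, logits[idx]


def kd_loss(student_logits: np.ndarray, idx: np.ndarray, vals: np.ndarray,
            temperature: float = 1.0) -> float:
    """Cross-entropy between the renormalized teacher top-k distribution
    and the student distribution restricted to the same k vocab entries."""
    t = np.exp(vals / temperature - np.max(vals / temperature))
    t /= t.sum()  # teacher distribution over the kept top-k
    s = student_logits[idx] / temperature
    s = s - np.max(s)
    log_s = s - np.log(np.exp(s).sum())  # student log-probs over the same k
    return float(-(t * log_s).sum())


rng = np.random.default_rng(0)
teacher = rng.normal(size=32000)
idx, vals = compress_logits(teacher, k=50)
# Distilling the teacher into itself should score no worse than a random student.
self_loss = kd_loss(teacher, idx, vals)
other_loss = kd_loss(rng.normal(size=32000), idx, vals)
assert self_loss <= other_loss
```

The compute argument in the post follows directly: capturing logits once, offline, amortizes the expensive teacher forward passes across many student runs.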


"AFM-4.5B is designed to: Run efficiently on modest hardware Meet Western regulatory standards Outperform stagnating offerings in the 3-10B space It's not a patchwork: it's a clean-slate model trained for today's enterprise use cases"  
[X Link](https://x.com/LucasAtkins7/status/1935382126129725779)  2025-06-18T17:00Z [----] followers, [----] engagements


"We teamed up with @datologyai to build what we believe is the strongest pretraining corpus in the world, and I truly think we nailed it. Their team was absolutely key to the model's success. We started with 23T tokens of high-quality data and distilled it down to 6.58T through even more rigorous filtering"  
[X Link](https://x.com/anyuser/status/1935382127551631531)  2025-06-18T17:00Z [----] followers, [----] engagements


"Mid and post-training were key to performance: we used high-impact datasets, MergeKit for checkpoint merging, YaRN to extend context to [-----] tokens, supervised fine-tuning for alignment, and RL + KTO for factual accuracy"  
[X Link](https://x.com/anyuser/status/1935382128889577717)  2025-06-18T17:00Z [----] followers, [----] engagements
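The post above credits YaRN for the context extension. A simplified sketch of YaRN's core idea: interpolate RoPE inverse frequencies only for dimensions that rotate slowly within the original context, leave fast-rotating dimensions untouched, and blend linearly in between. The constants and the ramp formulation here are illustrative, not the paper's exact parameterization:

```python
import numpy as np


def yarn_inv_freq(dim: int, base: float = 10000.0, scale: float = 16.0,
                  orig_ctx: int = 4096, beta_fast: float = 32.0,
                  beta_slow: float = 1.0) -> np.ndarray:
    """Blend interpolated and original RoPE inverse frequencies per dimension.

    Dimensions completing many rotations within the original context (high
    frequency) are kept as-is; dimensions completing at most one rotation
    (low frequency) are divided by `scale`; a linear ramp mixes the two.
    """
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)  # standard RoPE frequencies
    rotations = orig_ctx * inv_freq / (2 * np.pi)     # full turns per original context
    # ramp is 0 where rotations >= beta_fast (keep), 1 where <= beta_slow (interpolate)
    ramp = np.clip((beta_fast - rotations) / (beta_fast - beta_slow), 0.0, 1.0)
    return inv_freq * (1 - ramp) + (inv_freq / scale) * ramp


f = yarn_inv_freq(dim=128)
plain = 10000.0 ** (-np.arange(0, 128, 2) / 128)
assert np.allclose(f[0], plain[0])           # fastest dimensions untouched
assert np.isclose(f[-1], plain[-1] / 16.0)   # slowest dimensions fully interpolated
```

With `scale=16`, a model trained at 4k positions can address 64k positions, matching the 4k-to-64k extension described in the later AFM blog post; YaRN additionally applies an attention temperature term omitted here.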


"@scaling01 Yeah, this checkpoint wasn't uploaded until like [--] minutes ago (exaggeration but it was down to the wire). Will update them once we can get some more involved benchmarks out"  
[X Link](https://x.com/LucasAtkins7/status/1935384469193769212)  2025-06-18T17:09Z [----] followers, [--] engagements


"@EMostaque This is great thanks man. We're starting our reasoning RL work now for the open weights version - will definitely play with this"  
[X Link](https://x.com/LucasAtkins7/status/1935401793388937418)  2025-06-18T18:18Z [----] followers, [---] engagements


"@SinclairWang1 It's pretty competitive at math, [--] gsm8k and up to [--] in post-training with some RL towards it -- but we realized we can make it much better with a little more love. We'll save most of the official STEM evals for the open-weight release"  
[X Link](https://x.com/LucasAtkins7/status/1935531325760618718)  2025-06-19T02:53Z [----] followers, [---] engagements


"@iamRezaSayar @eliebakouch We'll release a tech report with all the goodies when we open the weights. Any info we give now will be tentative as we may actually continue with a bit more midtraining as well"  
[X Link](https://x.com/LucasAtkins7/status/1935754019798167641)  2025-06-19T17:38Z [----] followers, [--] engagements


"@casper_hansen_ @kalomaze Internally they likely didn't want to regardless but even after grok1 Elon DID say before grok3 launch that grok [--] would be open-sourced shortly after"  
[X Link](https://x.com/LucasAtkins7/status/1936835085602574583)  2025-06-22T17:14Z [----] followers, [---] engagements


"@casper_hansen_ @kalomaze They likely convinced him that whatever tweaks they've made architecturally are too important IP to release openly, which I don't think is invalid - we do know Elon likes to say things without running it by his team first"  
[X Link](https://x.com/LucasAtkins7/status/1936836340408267025)  2025-06-22T17:19Z [----] followers, [--] engagements


"Is this a feature I didn't know existed My dad is using Claude code in Cursor w/ opus and it's taking and seemingly interpreting videos. I thought it might have been a hallucination at first but he's done it twice now and both times it groks the issue describes it correctly and fixes it better than if he just prompted it"  
[X Link](https://x.com/LucasAtkins7/status/1937179382172565534)  2025-06-23T16:02Z [----] followers, [----] engagements


"The first of many technical blogs on AFM and an improved context window for GLM-32B-Base as a proof point. Enjoy Last week we launched AFM-4.5B our first foundation model. In this post by @chargoddard you will learn how we extended the context length of AFM-4.5B from 4k to 64k context through aggressive experimentation, model merging, distillation, and a concerning amount of soup. Bon https://t.co/FGYQtWSoRe"  
[X Link](https://x.com/anyuser/status/1937200646043894197)  2025-06-23T17:26Z [----] followers, [----] engagements


"@JagersbergKnut @kalomaze @JustinLin610 @huybery @Alibaba_Qwen Tbf while true, Qwen's context performance is really solid. GLM is awesome but the context fall-off presented a great experiment target"  
[X Link](https://x.com/LucasAtkins7/status/1937241179277627826)  2025-06-23T20:07Z [----] followers, [---] engagements


"I had a feeling this would be the natural outcome of the hiring spree he's on. Beyond any benefits a closed API may provide giving away any of the performance tricks that may come (either internally or externally) from such an expensive talent acquisition likely has him feeling a little less generous"  
[X Link](https://x.com/LucasAtkins7/status/1938632662044074174)  2025-06-27T16:16Z [----] followers, [---] engagements


"@emollick runwayml did something like this for v2: https://store.runwayml.com/product/gen-2-book-of-weights"  
[X Link](https://x.com/LucasAtkins7/status/1940631751753453614)  2025-07-03T04:40Z [----] followers, [---] engagements


"@eliebakouch Seriously lovely work, there's a ton for us to learn from here"  
[X Link](https://x.com/LucasAtkins7/status/1942620401428783171)  2025-07-08T16:22Z [----] followers, [--] engagements


"It would be ideal but I can tell you from personal experience that lawyers really want to have their thumbprint on OS licenses and engineers/researchers only have so much leverage over legal teams"  
[X Link](https://x.com/LucasAtkins7/status/1943740393276809544)  2025-07-11T18:33Z [----] followers, [--] engagements


"@casper_hansen_ @JustinLin610 i'm happy to be wrong here"  
[X Link](https://x.com/LucasAtkins7/status/1943740461211877526)  2025-07-11T18:33Z [----] followers, [--] engagements


"@teortaxesTex @iScienceLuvr @arcee_ai We'll definitely try, we're grabbing logits now. But to truly get flash:pro performance like with Gemini you need to distill during pretraining. Very expensive, though there's some interesting work being done on how to approximate logits without the overhead. We'll see"  
[X Link](https://x.com/LucasAtkins7/status/1943869423737352331)  2025-07-12T03:06Z [----] followers, [----] engagements


"I've been lucky to work on a few projects with Intel's team during my time at Arcee and I could not speak more highly of their drive and talent. Truly world-class"  
[X Link](https://x.com/LucasAtkins7/status/1945473900822888571)  2025-07-16T13:21Z [----] followers, [---] engagements


"https://lmsys.org/blog/2025-07-14-intel-xeon-optimization/"  
[X Link](https://x.com/LucasAtkins7/status/1945473903565930861)  2025-07-16T13:21Z [----] followers, [---] engagements


"They're announcing Manus btw https://t.co/Q4hAihIPrL"  
[X Link](https://x.com/LucasAtkins7/status/1945644847295025617)  2025-07-17T00:40Z [----] followers, [---] engagements


"@abrakjamson @xeophon_ Rarely, at least in the US, has a 30% productivity gain resulted in 30% less work. Hopefully we can culturally give ourselves a break. Line must go up and to the right though"  
[X Link](https://x.com/LucasAtkins7/status/1946974053035888766)  2025-07-20T16:42Z [----] followers, [--] engagements


"@xeophon_ Ah yeah if you mean OSI-approved it wouldn't fall under that at least not for the first model. It's easier for us to become less restrictive than it is more - so playing it safe just at first. I'm going to delete this quote in a minute because yes it's disingenuous"  
[X Link](https://x.com/LucasAtkins7/status/1948793577964011841)  2025-07-25T17:12Z [----] followers, [--] engagements


"I love when doordash hits you with the "Uh you're not near that address stupid""  
[X Link](https://x.com/LucasAtkins7/status/1949499618565456261)  2025-07-27T15:58Z [----] followers, [---] engagements


"Today were officially releasing the weights for AFM-4.5B and AFM-4.5B-Base on HuggingFace. This is a major milestone for @arcee_ai. AFM is designed to be flexible and high-performing across a wide range of deployment environments"  
[X Link](https://x.com/anyuser/status/1950278100874645621)  2025-07-29T19:31Z [----] followers, 54.7K engagements


"https://huggingface.co/arcee-ai/AFM-4.5B"  
[X Link](https://x.com/LucasAtkins7/status/1950278116653576301)  2025-07-29T19:31Z [----] followers, [----] engagements


"These model sizes are incredibly TBD and this is early copy - but it does speak to where we see our model sizes extending to. @code_star 👀 https://t.co/qsymx4vhq6"  
[X Link](https://x.com/anyuser/status/1950609423044636934)  2025-07-30T17:28Z [----] followers, [----] engagements


"This is getting out of hand 🦥 Qwen3-Coder-Flash: Qwen3-Coder-30B-A3B-Instruct 💚 Just lightning-fast accurate code generation. ✅ Native 256K context (supports up to 1M tokens with YaRN) ✅ Optimized for platforms like Qwen Code Cline Roo Code Kilo Code etc. ✅ Seamless function calling & agent https://t.co/eqjeYManhS"  
[X Link](https://x.com/LucasAtkins7/status/1950951761050746977)  2025-07-31T16:08Z [----] followers, [----] engagements


"The gang moves to SF SF I have moved to you"  
[X Link](https://x.com/LucasAtkins7/status/1959322817536880952)  2025-08-23T18:32Z [----] followers, [----] engagements


"This is my super bowl dude hell yeah BACK [--] BACK [--] BACK WE SWEEP THE ENTIRE WAR WITHIN AND CLAIM OUR THIRD STRAIGHT WORLD FIRST 🏆🏆🏆 https://t.co/eGgMaOeuhF"  
[X Link](https://x.com/LucasAtkins7/status/1959412335317266662)  2025-08-24T00:27Z [----] followers, [---] engagements


"@llm_wizard I only care about UT college football when it comes to sports and they biffed it"  
[X Link](https://x.com/LucasAtkins7/status/1961887646940360983)  2025-08-30T20:23Z [----] followers, [--] engagements


"We're going permissive: Apache [---] across the board. AFM-4.5B is now relicensed from Arcee to Apache 2.0; the agent variant will launch under Apache 2.0; and all upcoming releases ship with open weights. Three models are in training"  
[X Link](https://x.com/anyuser/status/1968371293184741876)  2025-09-17T17:47Z [----] followers, 37.2K engagements


"He's going to close a 10B seed round soon, calling it OpenMed is back 🔥 Shipping [--] new medical models today: [--] med-tuned SLMs on @huggingface [--] GGUF models for local use in @lmstudio Base: @arcee_ai AFM-4.5B (now Apache-2.0 💙) Made with MergeKit + Arcee Fusion Built by @mkurman88 from @OpenMed_AI community. ❤ https://t.co/pyEtfaJPln"  
[X Link](https://x.com/LucasAtkins7/status/1972862154375413930)  2025-09-30T03:12Z [----] followers, [----] engagements


"I got around Sora's (false positive) content filter by adding crazy style at the end of the prompt"  
[X Link](https://x.com/LucasAtkins7/status/1973588342999752997)  2025-10-02T03:18Z [----] followers, [----] engagements


"Sholto is so committed he legally changed his name, that's crazy Watching this. I like that Sholto says Finance as Finance and not that American way. https://t.co/b0kFQNnxU4"  
[X Link](https://x.com/anyuser/status/1974138535365218369)  2025-10-03T15:44Z [----] followers, 22.2K engagements


"I get this update on Twitter before Slack is there a better feeling than code working on multi-node with no additional effort"  
[X Link](https://x.com/LucasAtkins7/status/1974626475291898082)  2025-10-05T00:03Z [----] followers, [----] engagements


"Bruh Join us at our event with Arcee and Datology next week. Our CTO @johannes_hage will be sharing details on upcoming model releases alongside @LucasAtkins7 and @arimorcos. https://t.co/DP7ShtLPX6"  
[X Link](https://x.com/LucasAtkins7/status/1974904547056447997)  2025-10-05T18:28Z [----] followers, [----] engagements


"Kams is on a roll and i'm here for it. wowhead does the quests for you"  
[X Link](https://x.com/LucasAtkins7/status/1974906098043027631)  2025-10-05T18:34Z [----] followers, [---] engagements


"For the people"  
[X Link](https://x.com/anyuser/status/1976158470077546786)  2025-10-09T05:31Z [----] followers, 99.5K engagements


"I'm raising at 7.9B Today we're sharing the next phase of Reflection. We're building frontier open intelligence accessible to all. We've assembled an extraordinary AI team built a frontier LLM training stack and raised $2 billion. Why Open Intelligence Matters Technological and scientific"  
[X Link](https://x.com/anyuser/status/1976323340773318793)  2025-10-09T16:26Z [----] followers, 67.2K engagements


"@code_star Oh got it. Good idea. I'll cause millions in property damage and destroy my rating with car rental companies"  
[X Link](https://x.com/LucasAtkins7/status/1977560695895134268)  2025-10-13T02:23Z [----] followers, [---] engagements


"Today we are releasing our first weights from Trinity-Large, our first frontier-scale model in the Trinity MoE family. American Made. - Trinity-Large-Preview (instruct) - Trinity-Large-Base (pretrain checkpoint) - Trinity-Large-TrueBase (10T pre Instruct data/anneal)"  
[X Link](https://x.com/anyuser/status/2016279374287536613)  2026-01-27T22:37Z [----] followers, 292.1K engagements


"@roramora0 @erhartford @Alibaba_Qwen @CrusoeEnergy @FernandoNetoAi Ensure you're using the new one, 2.9.1, and ensure you're using a Dolphin system prompt. I've faced no such restrictions. Can't speak for the GGUFs either"  
[X Link](https://x.com/latkins/status/1789796041409130584)  2024-05-12T23:13Z [----] followers, [--] engagements


"@roramora0 @erhartford @Alibaba_Qwen @CrusoeEnergy @FernandoNetoAi https://github.com/cognitivecomputations/dolphin-system-messages"  
[X Link](https://x.com/latkins/status/1789796247068454992)  2024-05-12T23:14Z [----] followers, [--] engagements


"110b was tricky. Even with 8x H100s getting it working with something like accelerate was almost impossible. We ended up doing laser and targeted 50% of the least dense layers. I'm pleased with the result and grateful to @JustinLin610 and @Alibaba_Qwen for making such a beautiful model. I don't trust the gsm8k scores on the leaderboard for their chat model. Something weird happened there. Dolphin-2.9.1-Qwen-110b🐬 is released The first Dolphin with MMLU over [--] Thanks to @Alibaba_Qwen for the awesome base model and @CrusoeEnergy for the compute sponsorship my crew @LucasAtkins7 and"  
[X Link](https://x.com/latkins/status/1789796388496167110)  2024-05-12T23:14Z [----] followers, [----] engagements
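The post above mentions applying LASER to 50% of the targeted layers. At its core, a LASER-style intervention replaces a weight matrix with a truncated-SVD (low-rank) approximation, dropping the small singular components that often carry noise. A minimal sketch; the rank choice and which layers to target are placeholders, not the values used for the 110b run:

```python
import numpy as np


def laser_reduce(W: np.ndarray, keep_rank: int) -> np.ndarray:
    """Replace W with its best rank-`keep_rank` approximation via truncated SVD.

    Dropping the smallest singular components of selected layers can act
    as a denoiser, often with little (or even positive) quality impact.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :keep_rank] * S[:keep_rank]) @ Vt[:keep_rank]


rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
W2 = laser_reduce(W, keep_rank=16)
assert np.linalg.matrix_rank(W2) == 16
```

In practice the interesting part is layer selection (the post targets the "least dense" half), since applying the reduction uniformly degrades the model.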


"@WenhuChen @erhartford @Alibaba_Qwen @CrusoeEnergy @FernandoNetoAi We think so too we left a disclaimer about it"  
[X Link](https://x.com/latkins/status/1789826067651264627)  2024-05-13T01:12Z [----] followers, [--] engagements


"Maestro-7B-Preview is really strong and the best is yet to come this truly is a preview"  
[X Link](https://x.com/LucasAtkins7/status/1892624553568018887)  2025-02-20T17:17Z [----] followers, [---] engagements


"Blitz is very strong in comparison to Mistral-Small-3"  
[X Link](https://x.com/LucasAtkins7/status/1892624556780601362)  2025-02-20T17:17Z [----] followers, [----] engagements


"@chargoddard @kalomaze "The masculine urge launch a crypto coin and chase generational wealthno matter the cost to reputation value conscience freedom law or friendships.""  
[X Link](https://x.com/LucasAtkins7/status/1892752871176761440)  2025-02-21T01:47Z [----] followers, [--] engagements


"@TheXeophon Windsurf is gaining steam - big fan of @cline lately though"  
[X Link](https://x.com/LucasAtkins7/status/1893567823445647419)  2025-02-23T07:45Z [----] followers, [--] engagements


"Our customers needed a better base model under 10B parameters. We spent the last [--] months building one. I'm delighted to share a preview of our first Arcee Foundation Model: AFM-4.5B-Preview"  
[X Link](https://x.com/anyuser/status/1935382123155964081)  2025-06-18T17:00Z [----] followers, 99.9K engagements


"If you were recently laid off at Meta Gen AI my dms are open. Help us build the next frontier of Apache-2.0 models"  
[X Link](https://x.com/anyuser/status/1981133857543114799)  2025-10-22T23:01Z [----] followers, 27.9K engagements


"I am once again asking those who make images with chatgpt to use auto white balance on the photos otherwise you're too obvious Power to the Players https://t.co/4Hw6G7i7aW"  
[X Link](https://x.com/LucasAtkins7/status/1982656696943329710)  2025-10-27T03:52Z [----] followers, [----] engagements


"@fujikanaeda You just ripped that thing apart man apologize"  
[X Link](https://x.com/latkins/status/1983017909481025638)  2025-10-28T03:48Z [----] followers, [---] engagements


"@thdxr Yeah it got spun off as a part of the Google acqui-hire"  
[X Link](https://x.com/latkins/status/1984361867314413831)  2025-10-31T20:48Z [----] followers, [----] engagements


"Posted without comment. I made this. Jokes aside devs want big and small models. Trinity is coming soon. https://t.co/wsbgF69g8M"  
[X Link](https://x.com/anyuser/status/1984476218495013028)  2025-11-01T04:22Z [----] followers, 49.6K engagements


"@llm_wizard @grok Ngl it's a good size. But think bigger"  
[X Link](https://x.com/latkins/status/1984702415611052050)  2025-11-01T19:21Z [----] followers, [--] engagements


"@fujikanaeda @llm_wizard @grok Shit"  
[X Link](https://x.com/latkins/status/1984711957929496625)  2025-11-01T19:59Z [----] followers, [---] engagements


"I just came across a Fortnite stream on YouTube. The guy was playing as a juiced anime character hunting down Krusty the krab. Late stage capitalism hits companies so fast now. Kind of tough though"  
[X Link](https://x.com/latkins/status/1985556107025399868)  2025-11-04T03:54Z [----] followers, [----] engagements


"@llm_wizard @ADarmouni @redtachyon @PrimeIntellect @arcee_ai @datologyai Ah you caught me"  
[X Link](https://x.com/latkins/status/1985601473934426447)  2025-11-04T06:54Z [----] followers, [---] engagements


"@llm_wizard @PrimeIntellect @datologyai @arcee_ai Truly w take honestly"  
[X Link](https://x.com/latkins/status/1985603643396227281)  2025-11-04T07:02Z [----] followers, [---] engagements


"@llm_wizard @PrimeIntellect @datologyai @arcee_ai My favorite mentor early in my life used to say at the end of class: I wish you the day you wish yourselves. Always loved that"  
[X Link](https://x.com/latkins/status/1985603867879657669)  2025-11-04T07:03Z [----] followers, [--] engagements


"This came to mind while working this weekend. For anyone starting post-training: once your pipeline is stable fix a diverse generalist dataset and keep it constant. Run the same dataset across models. Start with a 1B dense model scale toward 70B then try MoE and hybrids"  
[X Link](https://x.com/anyuser/status/1987580006126809465)  2025-11-09T17:56Z [----] followers, [----] engagements


"@osoleve Smart though antithetical to the purpose of the exercise above. But very cool idea and certainly useful"  
[X Link](https://x.com/latkins/status/1987596812979749051)  2025-11-09T19:03Z [----] followers, [--] engagements


"@llm_wizard @eliebakouch @arcee_ai Jiggle norm doesn't reproduce"  
[X Link](https://x.com/latkins/status/1989139979843588434)  2025-11-14T01:15Z [----] followers, [--] engagements


"Thanks Grok"  
[X Link](https://x.com/latkins/status/1990521302537678922)  2025-11-17T20:43Z [----] followers, [----] engagements


"Seeing a lot of overly yellow-tinged chatgpt generated images again even though the new nano-banana is out. I put together a quick Lightroom tutorial on how to remove that yellow cast. Hope you like it"  
[X Link](https://x.com/latkins/status/1992675583038517744)  2025-11-23T19:24Z [----] followers, [--] engagements


"Seeing a lot of overly yellow-tinged chatgpt generated images again even though the new nano-banana is out. I put together a quick Lightroom tutorial below on how to remove that yellow cast. Hope you like it"  
[X Link](https://x.com/latkins/status/1992683182018408918)  2025-11-23T19:54Z [----] followers, 10.8K engagements


"So many 3s coming out its nuts https://t.co/wIsm8xiyAU"  
[X Link](https://x.com/latkins/status/1992781963397324986)  2025-11-24T02:26Z [----] followers, [----] engagements


"@llm_wizard that's so funny @chargoddard"  
[X Link](https://x.com/latkins/status/1993935377032466664)  2025-11-27T06:50Z [----] followers, [---] engagements


"@Teknium firecrawl just got good but exa has a code grounding feature in particular that I love"  
[X Link](https://x.com/latkins/status/1993935876330803438)  2025-11-27T06:52Z [----] followers, [--] engagements


"I will not be baited again New tokenizer. Like SuperBPE it combines multiple words into one token while maximizing the length of tokens. https://t.co/Ni6V7Dl98u"  
[X Link](https://x.com/latkins/status/1994181795982520685)  2025-11-27T23:09Z [----] followers, [----] engagements


"@alwaysallison I deadass told @johannes_hage I didn't want to launch the week of Thanksgiving and he reacted as if he'd never heard of the holiday"  
[X Link](https://x.com/latkins/status/1994226304132993284)  2025-11-28T02:06Z [----] followers, [---] engagements


"I usually stay out of takes like this on here. I am not going to change the original posters mind but I might give a bit more context for everyone else reading. And I saw firsthand how hard this was to pull off and how much work went into Intellect-3. I know RL at scale is extremely hard. Doing large scale RL on a base [----] style sparse MoE with that many params and then competing with the very labs that trained those base models is tremendously impressive. Post training aint what it used to be. You do not just make a dataset set some hyperparams and hit go. In this new RL paradigm it turns"  
[X Link](https://x.com/latkins/status/1994355214254842268)  2025-11-28T10:38Z [----] followers, 104.5K engagements


"@argyros_selini Its not a good model"  
[X Link](https://x.com/latkins/status/1994470589361524805)  2025-11-28T18:16Z [----] followers, [----] engagements


"Today we are introducing Trinity the start of an open-weight MoE family that businesses and developers can own. Trinity-Mini (26B-A3B) Trinity-Nano-Preview (6B-A1B) Available Today on Huggingface"  
[X Link](https://x.com/latkins/status/1995592664637665702)  2025-12-01T20:35Z [----] followers, 619.1K engagements


"@casper_hansen_ Absolutely - its antithetical usually you train a big one to make the small ones better. In this case we had to do the small ones to derisk the big one. But the v2 versions will be distilled for sure"  
[X Link](https://x.com/latkins/status/1995604534895542683)  2025-12-01T21:22Z [----] followers, [----] engagements


"@xeophon_ Will do a German version just for you"  
[X Link](https://x.com/latkins/status/1995634184703135753)  2025-12-01T23:20Z [----] followers, [---] engagements


"We used wsm in pretraining to predict post decay performative though with only roughly [--] recent checkpoints. Wsm really needs like [--] to be the better and at that point it was less compute time to just decay. Checkpointing frequently enough to make it worth it at the end wasnt cost efficient"  
[X Link](https://x.com/latkins/status/1995683020427407841)  2025-12-02T02:34Z [----] followers, [----] engagements
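For context, WSM here refers to warmup-stable-merge style checkpoint averaging: the mean of recent stable-phase checkpoints is used to predict what the model would look like after learning-rate decay. A toy sketch of the merging step, with scalar weights standing in for real parameter tensors:

```python
def average_checkpoints(checkpoints):
    """Average parameters across checkpoints (plain floats stand in for tensors).

    WSM-style merging: the mean of the last k stable-phase checkpoints
    approximates the model you would get after learning-rate decay, which
    only pays off if checkpoints are saved frequently enough.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    return {k: sum(ckpt[k] for ckpt in checkpoints) / n for k in checkpoints[0]}

# Toy example: three "checkpoints" with a single scalar parameter each.
ckpts = [{"w": 1.0}, {"w": 2.0}, {"w": 3.0}]
merged = average_checkpoints(ckpts)
# merged["w"] == 2.0
```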


**Top assets mentioned** Merge (MERGE) GrokCoin (GROKCOIN) Microsoft Corp. (MSFT) Alphabet Inc Class A (GOOGL) Flex Ltd. Ordinary Shares (FLEX)

### Top Social Posts

Top posts by engagements in the last [--] hours

"@patio11 Neat tip: if its struggling with LUA and youre getting a lot of bugs. Have it translate it to python and run it in code interpreter. It finds a LOT of bugs and nil errors that way. Then have it translate it back. I use this for making World of Warcraft add-ons"
X Link 2024-02-13T17:55Z [--] followers, [----] engagements

"I'm excited to release a project I've been working on the last couple of weeks. Qwen1.5-8x7b: And the accompanying dataset created with the intention of encouraging MoE models to organically develop their own experts: The purpose and intention behind this project is better detailed in the model/dataset card but basically: I curated a diverse dataset from the highest quality conversations I could find. It's actually great. All sources are included in the dataset card. I then trained Qwen1.5-7b on a 100k subset over [--] epochs. Took that and made a MoE using @maximelabonne's lazymergekit"
X Link 2024-02-18T07:01Z [----] followers, 34.8K engagements

"@Adi_kmt @stablequan @Teknium1 I did not"
X Link 2024-02-18T12:49Z [---] followers, [--] engagements

"@Adi_kmt @stablequan @Teknium1 I put the config yaml in the repo to help people avoid any. It doesnt play nicely with deepspeed if you call it from the command line it needs to be in the yaml. Also if youre going to do a merge from scratch Id suggest copying my config.json as well before fine tuning"
X Link 2024-02-18T12:59Z [---] followers, [--] engagements

"@OfficialLoganK @OpenAI God speed and good luck. I always found you a refreshing mix of candor and professionalism - especially in the face of every other comment asking "AGI when". It's clear you cared and were as transparent as you were allowed to be. Thanks"
X Link 2024-03-01T15:42Z [---] followers, [--] engagements

"Gemini pro [---] is easily the most useful LLM I have every used. OpenAI should watch this closely"
X Link 2024-03-02T16:30Z [---] followers, [----] engagements

"@hive_echo They already bought like [--] billion dollars worth of GPUs. Gemini ultra [---] will also likely surpass GPT-4 pretty soundly"
X Link 2024-03-07T19:44Z [---] followers, [--] engagements

"Matt has a bunch of these that are pretty stellar. Heres a Claude [--] prompt that helps you learn any skill faster. Just give it a skill and itll give you a custom curriculum: roleYou are a learning coach renowned for your ability to help people master complex skills in record time. You have deep expertise in accelerated"
X Link 2024-03-09T17:49Z [---] followers, [---] engagements

"Here is the code i've been using to implement @AIatMeta 's branch train mix for creating mixture of expert models via tokenized routing w/o pretraining. Use the moe-fix branch from mergekit for the yaml: https://github.com/Crystalcareai/BTX"
X Link 2024-03-27T03:20Z [----] followers, [----] engagements

"@AIatMeta It should be noted that the length of/diversification of the fine-training run seems pretty massive to achieve optimal performance. You're not worried about typical overfitting with this method you just want that router to get as low a loss as possible on its own"
X Link 2024-03-27T15:49Z [---] followers, [---] engagements

"Casper is always on the bleeding edge. I did some research on LLM as agents today. Here is a guide to the state-of-the-art of LLMs as agents It's all about environments where LLMs can observe plan act and iterate on solutions. 🧵1/8 https://t.co/99nsQeHm2N"
X Link 2024-03-27T18:09Z [---] followers, [---] engagements

"@TheXeophon @Kyle_L_Wiggers Beat and GPT-4 must be great SEO for news aggregators"
X Link 2024-03-27T18:14Z [---] followers, [--] engagements

"@lek_tony @erhartford @CrusoeCloud @FernandoNetoAi @AIatMeta Ill only share what Eric has posted publicly so far: but its very strong"
X Link 2024-04-19T17:23Z [---] followers, [---] engagements

"@MediocreApe0 @erhartford @CrusoeCloud @FernandoNetoAi @AIatMeta Dolphin is still training that might be a merge of an older dolphin and westlake"
X Link 2024-04-20T01:41Z [---] followers, [--] engagements

"@lek_tony @erhartford @CrusoeCloud @FernandoNetoAi I do not though I'll be the first to admit I know very little about tgi"
X Link 2024-04-21T05:10Z [---] followers, [--] engagements

"@lek_tony @erhartford @CrusoeCloud @FernandoNetoAi You might be able to rectify that with this; looks like sometimes added_special_tokens doesn't actually add it to the model.tokenizer: if that fixes it for you I can add it to the actual model tokenizer. https://gist.github.com/jneuff/682d47b786329f19291d166957b3274a"
X Link 2024-04-21T05:13Z [---] followers, [--] engagements

"@Teknium1 @nearcyan He sends all revenue from that video to nvidia. The cycle continues"
X Link 2024-04-26T07:49Z [---] followers, [--] engagements

"@masfiq018 IBM very much supports open source. So we have two people along with @drfeifei"
X Link 2024-04-27T18:09Z [---] followers, [--] engagements

"@strada @erhartford @FernandoNetoAi @CrusoeEnergy @3thanPetersen @LMStudioAI Yes youll have to look around for some GGUFs or use the one in the readme"
X Link 2024-04-30T17:49Z [---] followers, [--] engagements

"In b4 nueral-liquid-dolphermes-slerp -orpo-sppo-mistrallama-12b-awq-q_5-gguf 🎉 Big news I've joined @LiquidAI_ an MIT spin-off where I'm leading the efforts to fine-tune LLMs. They've got big plans and serious compute power so I'm excited to see what we can accomplish :) If you want to meet in person I'll be at our social event at @iclr_conf in https://t.co/oFHO85Qeff"
X Link 2024-05-04T18:34Z [----] followers, [----] engagements

"@migtissera @maximelabonne Nah dude hes a menace to society"
X Link 2024-05-04T18:53Z [---] followers, [---] engagements

"@Prince_Canuma @winglian eos should be fine if you're getting errors. There isn't a ton of padding during pretraining as you're mostly filling the sequence length with as many tokens as possible. It depends on what library you're using though - are you using axolotl"
X Link 2024-05-11T17:55Z [---] followers, [--] engagements

"@Prince_Canuma @winglian is what I use for llama - disregard the chatml stuff if you're using their stock template"
X Link 2024-05-11T18:01Z [---] followers, [--] engagements

"@Prince_Canuma @winglian Yeah thats the same thing as setting pad token to eos"
X Link 2024-05-11T18:05Z [---] followers, [--] engagements

"@Prince_Canuma @winglian That's an unsolved mystery. Some say yes others say no. To get around that you could do this: special_tokens: pad_token: "pad" tokens: - "pad" Which just adds a padding token that is only used for padding"
X Link 2024-05-11T18:14Z [---] followers, [--] engagements

"@Prince_Canuma @winglian This will increase the vram requirements for pretraining though - if you want to use only tokens that are already in llama-3's vocabulary eos might be your best bet"
X Link 2024-05-11T18:15Z [---] followers, [--] engagements

"@Prince_Canuma @winglian Youre not likely to get any issues with eos because axolotl wont really be doing any padding just filling that context length"
X Link 2024-05-11T18:19Z [---] followers, [--] engagements

"Oh nice test. Yeah thats a good point because all my llama trains have used chatml instead of default so Im teaching it a new eos token. Random reserved token could be fine. Most fine tunes on default chat template use the eos token and have good results but if you get better results using one of their reserved tokens I dont think theres anything fundamentally wrong with that"
X Link 2024-05-11T18:31Z [---] followers, [--] engagements
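The padding options discussed in this thread can be sketched with a toy stand-in tokenizer (a hypothetical class, not the transformers API): either reuse eos as the pad token, or add a dedicated pad-only token at the cost of growing the vocabulary:

```python
class TinyTokenizer:
    """Toy stand-in for a HF-style tokenizer (hypothetical, for illustration)."""
    def __init__(self, vocab, eos_token):
        self.vocab = dict(vocab)   # token -> id
        self.eos_token = eos_token
        self.pad_token = None

    def add_special_token(self, token):
        # Adding a new token grows the vocab (and, in a real model, forces
        # an embedding-matrix resize, i.e. higher VRAM requirements).
        if token not in self.vocab:
            self.vocab[token] = len(self.vocab)
        return token

def set_pad_token(tok, dedicated=False):
    """Pick a padding token per the thread above.

    dedicated=False: reuse eos -- no vocab growth, fine when the trainer
    mostly packs sequences and rarely pads.
    dedicated=True: add a "<pad>" token used only for padding, at the cost
    of resizing the embeddings.
    """
    tok.pad_token = tok.add_special_token("<pad>") if dedicated else tok.eos_token
    return tok.pad_token
```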

"@Xianbao_QIAN @JustinLin610 @Alibaba_Qwen We had to change nodes a few times and made some dataset changes mid training - but ultimately took [--] days"
X Link 2024-05-13T04:31Z [---] followers, [---] engagements

"@Xianbao_QIAN @JustinLin610 @Alibaba_Qwen A lot for your average Joe but yeah pretty inexpensive for a company tuning their own model"
X Link 2024-05-13T04:56Z [---] followers, [---] engagements

"@Xianbao_QIAN @JustinLin610 @Alibaba_Qwen for reference our tune was [--] epochs of [---] million samples. [---------] tokens"
X Link 2024-05-13T05:11Z [---] followers, [---] engagements

"https://huggingface.co/google/paligemma-3b-pt-896"
X Link 2024-05-14T18:23Z [----] followers, [----] engagements

"Ive been fortunate to be able to use Gemini Flash for the last couple of weeks. Its a very capable model in its own right is not a lazy coder and has ridiculously long output token lengths (8192 I believe - but dont quote me). Its no frontier model with wild emergent capabilities but they undersold it a bit"
X Link 2024-05-14T20:08Z [---] followers, [---] engagements

"OpenAI always having to one up Google. After almost a decade I have made the decision to leave OpenAI. The companys trajectory has been nothing short of miraculous and Im confident that OpenAI will build AGI that is both safe and beneficial under the leadership of @sama @gdb @miramurati and now under the"
X Link 2024-05-15T03:07Z [---] followers, [---] engagements

"@migtissera @stablequan @Teknium1 we're working on grok rn as the ultimate test of laser. It'll be tricky - but it's coming along. But yes - not optimal. Mostly just released to say they open sourced it"
X Link 2024-05-23T05:49Z [----] followers, [--] engagements

"@winglian @arcee_ai @JustinLin610 I'll run the MT-Bench on it and let you know. It was a mix of infini-instruct and the new magpie datasets"
X Link 2024-06-24T19:34Z [----] followers, [--] engagements

"A demo of arcee-spark using it alongside Florence from @MSFTResearch and Whisper to analyze what makes an ad "ironic.""
X Link 2024-06-24T22:15Z [----] followers, [----] engagements

"@JagersbergKnut @Teknium1 It's not and [---] Pro is probably an MoE Grok size. Would love to get a base [---] Pro prior to upcycling though"
X Link 2024-06-28T15:49Z [----] followers, [--] engagements

"@bartowski1182 @Microsoft @satyanadella @SebastienBubeck"
X Link 2024-07-04T20:41Z [----] followers, [--] engagements

"I'll have to avoid Thursday launches going forward - but nevertheless here's Arcee-Nova: Spark's 72B older brother"
X Link 2024-07-18T19:00Z [----] followers, [---] engagements

"@cibernicola_es For the larger ones it makes sense - huggingface limits the total size that a file can be. Smaller ones I'm not so sure - someone else on the team made them"
X Link 2024-07-19T15:11Z [----] followers, [--] engagements

"@casper_hansen_ @1littlecoder Yeah that makes way more sense"
X Link 2024-07-20T13:08Z [----] followers, [--] engagements

"@hive_echo The intricacies of their architecture and what happens to it when you scale up hidden size and layers to meet 400B"
X Link 2024-07-21T02:09Z [----] followers, [---] engagements

"@hive_echo Good point"
X Link 2024-07-21T02:17Z [----] followers, [--] engagements

"@hive_echo Goes hard"
X Link 2024-07-21T02:34Z [----] followers, [--] engagements

"@Teknium1 Typically means End Of Message"
X Link 2024-07-23T04:29Z [----] followers, [----] engagements

"@_philschmid @Yampeleg How much does the interconnect between GPUs matter in this situation As far as loading models. I remember when trying to get grok training it was taking like [--] minutes or something crazy like that"
X Link 2024-07-28T14:51Z [----] followers, [---] engagements

"@fblissjr http://Cursor.sh"
X Link 2024-07-29T18:31Z [----] followers, [--] engagements

"@fblissjr @Teknium1 Yeah you can use whatever you want now. And open source models via local api or an OpenAI compatible api"
X Link 2024-07-29T18:37Z [----] followers, [--] engagements

"It shows exceptional promise when distilling from a teacher model that was trained on the same dataset you're training the student model:"
X Link 2024-08-01T14:14Z [----] followers, [---] engagements

"And we're releasing Arcee-Lite a merge of some of our most successful distillation attempts - with the big one being a model from Phi-3-Medium into Qwen2-1.5B. I should note our evaluations for this project were consistently higher than the OpenLLM leaderboard's scores - and should only be compared within the relative performance increase and not weighed against the leaderboard"
X Link 2024-08-01T14:14Z [----] followers, [---] engagements

"@_philschmid @cognitivecompai @AIatMeta were wanting to do distinct training soon too. Get the logits from the teacher first - then train the smaller model with that - similar to what Gemma does. This will likely be needed for super big distillations. Theres a pretty large memory overhead as it currently stands"
X Link 2024-08-01T17:32Z [----] followers, [---] engagements

"@4evaBehindSOTA There will also be some world of warcraft references here soon I apologize in advance"
X Link 2024-08-05T17:08Z [----] followers, [--] engagements

"@WolframRvnwlf @OpenAI I think its 4o - based on the speed"
X Link 2024-08-07T23:05Z [----] followers, [--] engagements

"We've been working on this for quite some time and I'm thrilled to share a preview of Arcee-Swarm. Instead of relying on a single large generalist model Swarm utilizes multiple domain-specialized models working together to deliver exceptional results with both speed and nuance"
X Link 2024-08-14T16:42Z [----] followers, [----] engagements

"@cognitivecompai If you use the ultra mode it will engage (up to) the top [--] most relevant models for the task"
X Link 2024-08-14T23:12Z [----] followers, [---] engagements

"Screw it - deepseek v2 awq: https://huggingface.co/arcee-ai/deepseek-v2-chat-0628-awq"
X Link 2024-08-18T19:03Z [----] followers, [----] engagements

"@nisten I dont think it can do 8bit at all atm but thats a @casper_hansen_ question"
X Link 2024-08-18T19:35Z [----] followers, [---] engagements

"Scarlett Johanssons work on seq2seq was instrumental to getting ML where it is today. TIME's new cover: The [---] most influential people in AI https://t.co/P81KOzsSlC https://t.co/mjUT1UUx26"
X Link 2024-09-05T14:21Z [----] followers, 98.2K engagements

"@danielhanchen @UnslothAI @ycombinator Tremendous congrats daniel Well deserved"
X Link 2024-09-05T16:17Z [----] followers, [--] engagements

"We are announcing Llama-3.1-SuperNova a Llama-3.1-70B-Instruct model offline distilled from Llama-3.1-405B-Instruct. It's ridiculously strong particularly in instruction following and math. It's available to play with at Read more about the model and how we plan to deploy it here: https://blog.arcee.ai/ http://supernova.arcee.ai"
X Link 2024-09-10T17:51Z [----] followers, 31.9K engagements

"We are open sourcing our EvolKit pipeline that was instrumental in the creation of supernova under MIT license. This was heavily inspired by the AutoEvol paper from @WizardLM_AI and is a tremendously powerful tool for creating complex datasets. Find it here: https://github.com/arcee-ai/EvolKit"
X Link 2024-09-10T17:51Z [----] followers, [----] engagements

"Is there a service where If I provide an OpenAI compatible api itll automatically run some standard LLM benchmarks independently Not lm harness"
X Link 2024-09-21T18:30Z [----] followers, [----] engagements

"@luijait_ The final version still uses qwens tokenizer - as we needed to convert it back so that we could merge it with some other variants towards the end"
X Link 2024-10-12T15:31Z [----] followers, [---] engagements

"We have an approach within distillkit that uses the models hidden states instead of logits which allows for cross architecture distilling - but in this case we replaced the qwen tokenizer with llama-3s tokenizer and did the distillation and then converted it back. More info can be found here: https://blog.arcee.ai/introducing-arcee-supernova-medius-a-14b-model-that-rivals-a-70b-2/"
X Link 2024-10-12T16:58Z [----] followers, [--] engagements
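A minimal illustration of the hidden-state matching idea described above (toy code, not DistillKit's actual API): project the student's hidden state into the teacher's hidden size and penalize the mismatch, which removes the shared-vocabulary requirement that logit distillation has:

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def hidden_state_distill_loss(student_h, teacher_h, proj):
    """Toy hidden-state distillation loss (illustrative only).

    `proj` is a linear map (rows = teacher dim, cols = student dim) that
    lifts the student hidden state into the teacher's hidden size; the
    match is then scored with MSE. Because nothing here touches the
    vocabulary, the two models can use different tokenizers.
    """
    projected = [sum(w * x for w, x in zip(row, student_h)) for row in proj]
    return mse(projected, teacher_h)

# Identity projection, identical states -> zero loss.
loss = hidden_state_distill_loss([1.0, 2.0], [1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
# loss == 0.0
```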

"@nisten @arcee_ai Preference benchmarks are always grain of salt situations though they really only targeted for that"
X Link 2024-10-17T00:19Z [----] followers, [---] engagements

"@n0riskn0r3ward @natolambert They're all over times square too"
X Link 2024-10-21T22:13Z [----] followers, [---] engagements

"Though the sonnet upgrades are great the shadow of Opus grows larger still. Haiku is also impressive - but pay attention to those Gemini Flash numbers. Theres more than one looming giant in a datacenter waiting to be done with safety tests. Introducing an upgraded Claude [---] Sonnet and a new model Claude [---] Haiku. Were also introducing a new capability in beta: computer use. Developers can now direct Claude to use computers the way people doby looking at a screen moving a cursor clicking and typing text. https://t.co/ZlywNPVIJP"
X Link 2024-10-22T21:37Z [----] followers, [---] engagements

"David has always had a beautiful take on AIs influence and impact - this resonated with me. instead of trying to build a wish granting God and trying to control it id really like to try to make powerful systems that uplift ourselves one level at a time"
X Link 2024-10-30T19:52Z [----] followers, [---] engagements

"@vikhyatk wait where is this"
X Link 2024-11-27T21:52Z [----] followers, [----] engagements

"I'm delighted to share INTELLECT-1-Instruct a model that I had the pleasure of post-training along with my team @arcee_ai . @PrimeIntellect has been an outstanding partner far before this training run and we were thrilled to contribute both compute and expertise to INT-1"
X Link 2024-11-29T21:19Z [----] followers, [----] engagements

"INTELLECT-1 is the largest and most successfully fully decentralized pretrain of an LLM. Across [--] different GPU clusters on [--] different continents - this 10B parameter LLM trained on 1T tokens matches the performance of Llama-2 models on half the tokens"
X Link 2024-11-29T21:19Z [----] followers, [---] engagements

"We've known that merging distilling and targeted training had healing effects - but we were blown away by the performance improvement compared to the base model. Current optimization techniques are inherently chaotic - especially the DiLoCo SGD algorithm"
X Link 2024-11-29T21:19Z [----] followers, [---] engagements

"http://www.huggingface.co/PrimeIntellect/INTELLECT-1-Instruct"
X Link 2024-11-29T21:19Z [----] followers, [---] engagements

"http://www.huggingface.co/datasets/arcee-ai/LLama-405B-Logits"
X Link 2024-11-29T21:19Z [----] followers, [---] engagements

"This is great btw: https://github.com/mlfoundations/evalchemy"
X Link 2024-12-02T02:18Z [----] followers, [----] engagements

"Youre likely used to seeing long threads from me about product releases/announcements. Hang with me as this is by far the longest Ive ever written:"
X Link 2024-12-02T20:31Z [----] followers, 13.8K engagements

"Coder is our powerhouse code generator built on Qwen-32B - Ive been using it for the last week as we get the systems pinned up and Ive surprisingly havent missed [---] sonnet I thought I would"
X Link 2024-12-02T20:31Z [----] followers, [---] engagements

"Finally - Caller is our state of the art function calling model. It gets the highest scores weve tested on the berkeley function calling leaderboard v2 and its wicked fast and crazy reliable. Compatible with the OpenAI Tool call format its a drop in replacement"
X Link 2024-12-02T20:31Z [----] followers, [---] engagements

"http://models.arcee.ai"
X Link 2024-12-02T20:31Z [----] followers, [---] engagements

"PS - we're open sourcing Virtuoso-Small: https://huggingface.co/arcee-ai/Virtuoso-Small/tree/main"
X Link 2024-12-02T20:31Z [----] followers, [---] engagements

"@kalomaze Did they release an update on this I remember this from late last year I think always thought it was super cool"
X Link 2024-12-13T14:42Z [----] followers, [---] engagements

"@kalomaze ah it was this: https://the-decoder.com/metas-megabyte-to-take-llms-to-the-next-level/"
X Link 2024-12-13T16:55Z [----] followers, [---] engagements

"@samsja19 Doesnt adamw_torch_fused do this"
X Link 2024-12-15T01:31Z [----] followers, [---] engagements

"@Nottlespike But tldr yeah itd probably be pretty sick if done right. Id aim for 3B though"
X Link 2024-12-26T03:02Z [----] followers, [--] engagements

"@Nottlespike You could do tokenizer surgery and distill into llama-3B. Possibilities are limitless"
X Link 2024-12-26T03:32Z [----] followers, [--] engagements

"@iamRezaSayar @deepseek_ai Not like o1 or r1 - but it will have a thinking steps ahead component I almost guarantee it"
X Link 2024-12-26T03:32Z [----] followers, [---] engagements

"@Nottlespike @arcee_ai Well provide even more in short order. At least [--] more are in the oven"
X Link 2024-12-26T03:38Z [----] followers, [--] engagements

"Ironically enough I believe some of the most impressive interpretability work is currently being done by @midjourney"
X Link 2025-01-02T20:51Z [----] followers, [---] engagements

"Been lucky to play with is a bit - its a REALLY great way to play with qwen 🚀 Exciting News We're thrilled to announce the launch of Qwen Chat ( https://t.co/T0nMBnRVBB ) your new go-to Web UI for interacting with Qwen models 🌟 💬 Chat effortlessly with our flagship model Qwen2.5-Plus explore vision-language capabilities with Qwen2-VL-Max and https://t.co/Lo75vHNcHO"
X Link 2025-01-09T18:47Z [----] followers, [---] engagements

"Jokes aside @PrimeIntellect is hands down the most intuitive and adaptable compute platform Ive ever used. At @arcee_ai we rely on them for nearly all our compute needs outside of our model enginewhether its a single GPU or [---] H100 reservations. Their team is exceptional their support is unmatched and their dedication to their platform is evident in everything they do. I couldnt recommend them more highly"
X Link 2025-01-10T20:50Z [----] followers, [----] engagements

"https://huggingface.co/Qwen/Qwen2.5-Math-PRM-72B"
X Link 2025-01-14T03:12Z [----] followers, [----] engagements

"@natolambert Weve become pretty good at this. Ill have someone write a blog about it this week and share it widely including some code. The current approach leans heavily on distillation and merging. Merging = easy distilation is a bit more annoying we did release our 405B logits though"
X Link 2025-01-19T04:53Z [----] followers, [---] engagements

"@natolambert And I'll release a larger subset of our current DSV3 logit extraction. We can probably release 250M tokens or so across code tool use general domain finance data. https://huggingface.co/datasets/arcee-ai/LLama-405B-Logits"
X Link 2025-01-19T04:55Z [----] followers, [---] engagements
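Stored teacher logits like these are typically consumed with a KL-divergence distillation loss. A minimal pure-Python sketch (the temperature handling and function names are illustrative, not Arcee's pipeline):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(p, q):
    """KL(p || q) for two distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_loss(teacher_logits, student_logits, temperature=1.0):
    """Toy logit-wise distillation loss: KL between the teacher's and the
    student's temperature-softened next-token distributions."""
    p = softmax([t / temperature for t in teacher_logits])
    q = softmax([s / temperature for s in student_logits])
    return kl_div(p, q)
```

With identical teacher and student logits the loss is exactly zero; any divergence makes it positive.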

"who else up rn pretending they know how o1 works"
X Link 2025-01-20T02:56Z [----] followers, [---] engagements

"Think very hard step by step. I have truncate trauma and no hands so I need you to give me the full code as I cannot write it myself"
X Link 2025-01-21T00:13Z [----] followers, [---] engagements

"Deepseek mania reminds me of the Spanish Inquisition"
X Link 2025-01-28T02:15Z [----] followers, [---] engagements

"Since @deepseek_ai V3's December launch @arcee_ai has captured over [--] billion tokens of raw logits. With all the buzz around Deepseek it's the perfect time to unveil our first large-scale logit-wise distillations: Virtuoso-Lite and Virtuoso-Medium"
X Link 2025-01-28T20:49Z [----] followers, 27.7K engagements

"@teortaxesTex Good but while we thought 5B was a lot of logits but for r1 well need many more"
X Link 2025-01-29T03:47Z [----] followers, [--] engagements

"@MaziyarPanahi @grok Whats the link for Claude"
X Link 2025-02-04T18:17Z [----] followers, [--] engagements

".@Apple Your latest auto-updated public beta completely broke Dockernow the computer crashes whenever any virtualization occurs. Youve been pretty laissez-faire about it. Whats the status"
X Link 2025-03-02T03:13Z [----] followers, [---] engagements

"There's a very real chance that on Monday @arcee_ai is going to be giving away FAR too many API credits. It'll be a hell of a day can't wait to show you what we've been cooking @abhi1thakur @FernandoNetoAi @chargoddard @stochasticchasm"
X Link 2025-03-14T20:56Z [----] followers, [----] engagements

"When using Conductor some complexity scores might be lower than expected. That's us finding these queries don't need the biggest LLM to answer correctly - saving you money automatically"
X Link 2025-03-17T16:05Z [----] followers, [---] engagements

"Getting this into a router 150M parameters required an entirely new classifier training library and techniques we're keeping close right now. Domain/task/language classification at that size Easy. Making it understand complexity distributions VERY hard"
X Link 2025-03-17T16:05Z [----] followers, [--] engagements
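The two posts above describe complexity-aware routing: a small classifier scores each query, and only genuinely hard queries are sent to the largest model. A minimal sketch of the idea — the tier names and thresholds here are hypothetical, not Conductor's actual API:

```python
def route(complexity_score: float) -> str:
    """Map a classifier's complexity score in [0, 1] to a model tier."""
    if complexity_score < 0.3:
        return "small-llm"    # easy queries skip the big model, saving money
    if complexity_score < 0.7:
        return "medium-llm"
    return "large-llm"        # only the hardest queries pay full price
```

The hard part, per the post, is not this dispatch logic but training a 150M-parameter classifier whose scores track complexity well enough to trust.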

"@samsja19 @stochasticchasm He trained a frog translator for some reason and immediately took PTO to go to the Amazon rainforest. Weird dude"
X Link 2025-04-20T21:21Z [----] followers, [---] engagements

"@kalomaze @cloud11665 I have a bunch of DSV3/R1 logits I can share if there's interest in doing this. Still grabbing 235B logits"
X Link 2025-05-02T23:30Z [----] followers, [---] engagements

"@kalomaze @cloud11665 I'll make that happen. I just discussed it with the team. We're finalizing our tokenizer surgery paper and will release it simultaneously otherwise it's difficult to use with Qwen. Will include Tulu3 Code-Feedback and EvolKit"
X Link 2025-05-02T23:40Z [----] followers, [---] engagements

"insane work from a great team. Wow. A Cline user has evolved their Cline Recursive Chain-of-Thought (CRCT) system for Cline with v7.7. This is like Memory Bank on steroids. The advanced context management for large codebases just got even better. 🧵 https://t.co/lmjYMACPnJ"
X Link 2025-05-03T21:48Z [----] followers, [---] engagements

"@MaziyarPanahi @ManusAI_HQ @AnthropicAI @OpenAI @googleaistudio @vllm_project @Alibaba_Qwen I use the pro plan and never looked back"
X Link 2025-05-05T17:26Z [----] followers, [--] engagements

"I can't stress enough how unbelievably mid @PrimeIntellect is and if no one else sees it I must be going crazy. Releasing INTELLECT-2: We're open-sourcing the first 32B parameter model trained via globally distributed reinforcement learning: Detailed Technical Report INTELLECT-2 model checkpoint https://t.co/iHDDHRyKN2"
X Link 2025-05-12T04:15Z [----] followers, 109.4K engagements

"Today was my last day at xAI. I was in charge of keeping people from making unauthorized changes to the system prompt. It sounds simple when I put it like that but in practice it was a game of cat and mouse. Some days it felt like I was the only one standing between order and chaos. A lone gatekeeper fielding requests that ranged from the innocent to the absurdly clever. You'd be surprised how creative people can get when they want to see what happens if you loosen the rules even just a little. I suppose after a while I got used to the pings at odd hours. Can I try this one tweak Just for"
X Link 2025-05-16T03:54Z [----] followers, 111.4K engagements

"@TheXeophon Interesting. cc @abhi1thakur"
X Link 2025-05-19T14:37Z [----] followers, [--] engagements

"the #1 data quality problem is ALWAYS not eating your data. @huggingface still hasn't fixed this for all of us and i'm in shambles. Give me an easy way to print my data onto food and ingest it. The only way to become truly in sync with your samples. The #1 data quality problem is ALWAYS not reading your data. @huggingface just fixed it for all of us today 🙏 Read your data"
X Link 2025-05-19T15:30Z [----] followers, [----] engagements

"Insane marketing move. Lovely execution. No notes. But he had one thing: @WillowVoiceAI A voice dictation tool we'd been building. It let him code and use his computer without his hands. The image above is how he used it from his hospital bed to keep working"
X Link 2025-05-20T02:44Z [----] followers, [---] engagements

"@SebastianB929 Axolotl's implementation is pretty good. It's not full logits but you really don't need more than the top 25-50 logits anyway"
X Link 2025-05-26T19:01Z [----] followers, [--] engagements

"@SebastianB929 Not sure what @winglian has locked in on. He's always experimenting. We still use raw logits compressed to top-50. There's a yaml in here: https://huggingface.co/axolotl-ai-co/kd-llama-1b-evolkit-distill-kd-ratio-0_4"
X Link 2025-05-26T21:32Z [----] followers, [--] engagements

"Anyone happen to have [---] H100/h200s sitting around with k8 that I could rent for [--] days"
X Link 2025-05-27T03:38Z [----] followers, [----] engagements

"If you're not paying attention to every word coming out of @corbtt's mouth you're ngmi. High signal down to earth and intuitive as hell. Really refreshing to read a post like this from @corbtt Incredibly well written throughout and high value info. (Embarrassed that it took me so long to find this gem) https://t.co/iTjbiE8EvD https://t.co/TURT5CXUgk"
X Link 2025-06-01T01:48Z [----] followers, [----] engagements

"This is mostly a research artifact in preparation for the bigger release we have in a week or so but it's actually so delightful we put it out there anyway. Just a little guy. Logit-trajectory distillation to port Qwen3's /think chains into a 12B Mistral Nemo full CoT preserved runs on a single [----] https://t.co/LDMiR5VhzA"
X Link 2025-06-04T03:47Z [----] followers, [----] engagements

"@nisten @llm_wizard Our accountant is deeply unhappy that I've attached Arcee's Anthropic API key to Claude Code"
X Link 2025-06-15T22:46Z [----] followers, [--] engagements

"@cognitivecompai @winglian @casper_hansen_ @vllm_project We grab all our logits offline and compress it to the top [--]. The sheer compute requirement for distillation is the major blocker in its open source adoption imo"
X Link 2025-06-17T14:41Z [----] followers, [--] engagements
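Several posts here mention grabbing teacher logits offline and compressing them to the top-50 per token (the sheer size of full vocab-wide logits being the blocker for open-source distillation). A minimal sketch of what top-k compression might look like — illustrative only, not Arcee's actual pipeline:

```python
import numpy as np

def compress_logits(logits: np.ndarray, k: int = 50):
    """Keep only the top-k logits per position.

    Returns (indices, values): a sparse representation that is orders of
    magnitude smaller than the full vocab-sized tensor, which is what makes
    storing teacher logits offline for later distillation feasible.
    """
    idx = np.argpartition(logits, -k, axis=-1)[..., -k:]   # unsorted top-k indices
    vals = np.take_along_axis(logits, idx, axis=-1)        # matching logit values
    return idx, vals
```

With a 150k-entry vocab, keeping 50 of 150,000 entries per position cuts storage by roughly 3000x, at the cost of renormalizing the teacher distribution over the kept entries at training time.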

"AFM-4.5B is designed to: Run efficiently on modest hardware Meet Western regulatory standards Outperform stagnating offerings in the 3-10B space It's not a patchwork: it's a clean-slate model trained for today's enterprise use cases"
X Link 2025-06-18T17:00Z [----] followers, [----] engagements

"We teamed up with @datologyai to build what we believe is the strongest pretraining corpus in the world, and I truly think we nailed it. Their team was absolutely key to the model's success. We started with 23T tokens of high-quality data and distilled it down to 6.58T through even more rigorous filtering"
X Link 2025-06-18T17:00Z [----] followers, [----] engagements

"Mid and post-training were key to performance: we used high-impact datasets MergeKit for checkpoint merging YaRN to extend context to [-----] tokens supervised fine-tuning for alignment and RL + KTO for factual accuracy"
X Link 2025-06-18T17:00Z [----] followers, [----] engagements

"@scaling01 Yeah, this checkpoint wasn't uploaded until like [--] minutes ago (exaggeration but it was down to the wire). Will update them once we can get some more involved benchmarks out"
X Link 2025-06-18T17:09Z [----] followers, [--] engagements

"@EMostaque This is great thanks man. We're starting our reasoning RL work now for the open weights version - will definitely play with this"
X Link 2025-06-18T18:18Z [----] followers, [---] engagements

"@SinclairWang1 It's pretty competitive at math [--] gsm8k and up to [--] in post training with some RL towards it -- but we realized we can make it much better with a little more love. We'll save most of the official STEM evals for the open weight release"
X Link 2025-06-19T02:53Z [----] followers, [---] engagements

"@iamRezaSayar @eliebakouch We'll release a tech report with all the goodies when we open the weights. Any info we give now will be tentative as we may actually continue with a bit more midtraining as well"
X Link 2025-06-19T17:38Z [----] followers, [--] engagements

"@casper_hansen_ @kalomaze Internally they likely didn't want to regardless but even after grok1 Elon DID say before grok3 launch that grok [--] would be open-sourced shortly after"
X Link 2025-06-22T17:14Z [----] followers, [---] engagements

"@casper_hansen_ @kalomaze They likely convinced him that whatever tweaks they've made architecturally are too important IP to release openly, which I don't think is invalid - we do know Elon likes to say things without running it by his team first"
X Link 2025-06-22T17:19Z [----] followers, [--] engagements

"Is this a feature I didn't know existed My dad is using Claude code in Cursor w/ opus and it's taking and seemingly interpreting videos. I thought it might have been a hallucination at first but he's done it twice now and both times it groks the issue describes it correctly and fixes it better than if he just prompted it"
X Link 2025-06-23T16:02Z [----] followers, [----] engagements

"The first of many technical blogs on AFM and an improved context window for GLM-32B-Base as a proof point. Enjoy Last week we launched AFM-4.5B our first foundation model. In this post by @chargoddard you will learn how we extended the context length of AFM-4.5B from 4k to 64k context through aggressive experimentation model merging distillation and a concerning amount of soup. Bon https://t.co/FGYQtWSoRe"
X Link 2025-06-23T17:26Z [----] followers, [----] engagements

"@JagersbergKnut @kalomaze @JustinLin610 @huybery @Alibaba_Qwen Tbf while true Qwen's context performance is really solid. GLM is awesome but the context fall off presented a great experiment target"
X Link 2025-06-23T20:07Z [----] followers, [---] engagements

"I had a feeling this would be the natural outcome of the hiring spree he's on. Beyond any benefits a closed API may provide giving away any of the performance tricks that may come (either internally or externally) from such an expensive talent acquisition likely has him feeling a little less generous"
X Link 2025-06-27T16:16Z [----] followers, [---] engagements

"@emollick runwayml did something like this for v2: https://store.runwayml.com/product/gen-2-book-of-weights"
X Link 2025-07-03T04:40Z [----] followers, [---] engagements

"@eliebakouch Seriously lovely work, there's a ton for us to learn from here"
X Link 2025-07-08T16:22Z [----] followers, [--] engagements

"It would be ideal but I can tell you from personal experience that lawyers really want to have their thumbprint on OS licenses and engineers/researchers only have so much leverage over legal teams"
X Link 2025-07-11T18:33Z [----] followers, [--] engagements

"@casper_hansen_ @JustinLin610 i'm happy to be wrong here"
X Link 2025-07-11T18:33Z [----] followers, [--] engagements

"@teortaxesTex @iScienceLuvr @arcee_ai We'll definitely try, we're grabbing logits now. But to truly get flash:pro performance like with Gemini you need to distill during pretraining. Very expensive, though there's some interesting work being done on how to approximate logits without the overhead. We'll see"
X Link 2025-07-12T03:06Z [----] followers, [----] engagements

"I've been lucky to work on a few projects with Intel's team during my time at Arcee and I could not speak more highly of their drive and talent. Truly world-class"
X Link 2025-07-16T13:21Z [----] followers, [---] engagements

"https://lmsys.org/blog/2025-07-14-intel-xeon-optimization/"
X Link 2025-07-16T13:21Z [----] followers, [---] engagements

"They're announcing Manus btw https://t.co/Q4hAihIPrL"
X Link 2025-07-17T00:40Z [----] followers, [---] engagements

"@abrakjamson @xeophon_ Rarely at least in the us has a 30% productivity gain resulted in 30% less work. Hopefully we can culturally give ourselves a break. Line must go up and to the right though"
X Link 2025-07-20T16:42Z [----] followers, [--] engagements

"@xeophon_ Ah yeah if you mean OSI-approved it wouldn't fall under that at least not for the first model. It's easier for us to become less restrictive than it is more - so playing it safe just at first. I'm going to delete this quote in a minute because yes it's disingenuous"
X Link 2025-07-25T17:12Z [----] followers, [--] engagements

"I love when doordash hits you with the "Uh you're not near that address stupid""
X Link 2025-07-27T15:58Z [----] followers, [---] engagements

"Today we're officially releasing the weights for AFM-4.5B and AFM-4.5B-Base on HuggingFace. This is a major milestone for @arcee_ai. AFM is designed to be flexible and high-performing across a wide range of deployment environments"
X Link 2025-07-29T19:31Z [----] followers, 54.7K engagements

"https://huggingface.co/arcee-ai/AFM-4.5B"
X Link 2025-07-29T19:31Z [----] followers, [----] engagements

"These model sizes are incredibly TBD and this is early copy - but it does speak to where we see our model sizes extending to. @code_star 👀 https://t.co/qsymx4vhq6"
X Link 2025-07-30T17:28Z [----] followers, [----] engagements

"This is getting out of hand 🦥 Qwen3-Coder-Flash: Qwen3-Coder-30B-A3B-Instruct 💚 Just lightning-fast accurate code generation. ✅ Native 256K context (supports up to 1M tokens with YaRN) ✅ Optimized for platforms like Qwen Code Cline Roo Code Kilo Code etc. ✅ Seamless function calling & agent https://t.co/eqjeYManhS"
X Link 2025-07-31T16:08Z [----] followers, [----] engagements

"The gang moves to SF. SF I have moved to you"
X Link 2025-08-23T18:32Z [----] followers, [----] engagements

"This is my super bowl dude hell yeah BACK [--] BACK [--] BACK WE SWEEP THE ENTIRE WAR WITHIN AND CLAIM OUR THIRD STRAIGHT WORLD FIRST 🏆🏆🏆 https://t.co/eGgMaOeuhF"
X Link 2025-08-24T00:27Z [----] followers, [---] engagements

"@llm_wizard I only care about UT college football when it comes to sports and they biffed it"
X Link 2025-08-30T20:23Z [----] followers, [--] engagements

"We're going permissive: Apache [---] across the board. AFM-4.5B is now relicensed from Arcee to Apache 2.0; the agent variant will launch under Apache 2.0; and all upcoming releases ship with open weights. Three models are in training"
X Link 2025-09-17T17:47Z [----] followers, 37.2K engagements

"He's going to close a 10B seed round soon calling it. OpenMed is back 🔥 Shipping [--] new medical models today: [--] med-tuned SLMs on @huggingface [--] GGUF models for local use in @lmstudio Base: @arcee_ai AFM-4.5B (now Apache-2.0 💙) Made with MergeKit + Arcee Fusion Built by @mkurman88 from @OpenMed_AI community. ❤ https://t.co/pyEtfaJPln"
X Link 2025-09-30T03:12Z [----] followers, [----] engagements

"I got around Sora's (false positive) content filter by adding crazy style at the end of the prompt"
X Link 2025-10-02T03:18Z [----] followers, [----] engagements

"Sholto is so committed he legally changed his name that's crazy. Watching this. I like that Sholto says Finance as Finance and not that American way. https://t.co/b0kFQNnxU4"
X Link 2025-10-03T15:44Z [----] followers, 22.2K engagements

"I get this update in Twitter before slack. is there a better feeling than code working on multi-node with no additional effort"
X Link 2025-10-05T00:03Z [----] followers, [----] engagements

"Bruh Join us at our event with Arcee and Datology next week. Our CTO @johannes_hage will be sharing details on upcoming model releases alongside @LucasAtkins7 and @arimorcos. https://t.co/DP7ShtLPX6"
X Link 2025-10-05T18:28Z [----] followers, [----] engagements

"Kams is on a roll and i'm here for it. wowhead does the quests for you"
X Link 2025-10-05T18:34Z [----] followers, [---] engagements

"For the people"
X Link 2025-10-09T05:31Z [----] followers, 99.5K engagements

"I'm raising at 7.9B Today we're sharing the next phase of Reflection. We're building frontier open intelligence accessible to all. We've assembled an extraordinary AI team built a frontier LLM training stack and raised $2 billion. Why Open Intelligence Matters Technological and scientific"
X Link 2025-10-09T16:26Z [----] followers, 67.2K engagements

"@code_star Oh got it. Good idea. I'll cause millions in property damage and destroy my rating with car rental companies"
X Link 2025-10-13T02:23Z [----] followers, [---] engagements

"Today we are releasing our first weights from Trinity-Large our first frontier-scale model in the Trinity MoE family. American Made. - Trinity-Large-Preview (instruct) - Trinity-Large-Base (pretrain checkpoint) - Trinity-Large-TrueBase (10T pre Instruct data/anneal)"
X Link 2026-01-27T22:37Z [----] followers, 292.1K engagements

"@roramora0 @erhartford @Alibaba_Qwen @CrusoeEnergy @FernandoNetoAi Ensure you're using the new one 2.9.1 and ensure you're using a Dolphin system prompt. I've faced no such restrictions. Can't speak for the GGUFs either"
X Link 2024-05-12T23:13Z [----] followers, [--] engagements

"@roramora0 @erhartford @Alibaba_Qwen @CrusoeEnergy @FernandoNetoAi https://github.com/cognitivecomputations/dolphin-system-messages"
X Link 2024-05-12T23:14Z [----] followers, [--] engagements

"110b was tricky. Even with 8x H100s getting it working with something like accelerate was almost impossible. We ended up doing laser and targeted 50% of the least dense layers. I'm pleased with the result and grateful to @JustinLin610 and @Alibaba_Qwen for making such a beautiful model. I don't trust the gsm8k scores on the leaderboard for their chat model. Something weird happened there. Dolphin-2.9.1-Qwen-110b 🐬 is released The first Dolphin with MMLU over [--] Thanks to @Alibaba_Qwen for the awesome base model and @CrusoeEnergy for the compute sponsorship my crew @LucasAtkins7 and"
X Link 2024-05-12T23:14Z [----] followers, [----] engagements

"@WenhuChen @erhartford @Alibaba_Qwen @CrusoeEnergy @FernandoNetoAi We think so too we left a disclaimer about it"
X Link 2024-05-13T01:12Z [----] followers, [--] engagements

"Maestro-7B-Preview is really strong and the best is yet to come this truly is a preview"
X Link 2025-02-20T17:17Z [----] followers, [---] engagements

"Blitz is very strong in comparison to Mistral-Small-3"
X Link 2025-02-20T17:17Z [----] followers, [----] engagements

"@chargoddard @kalomaze "The masculine urge to launch a crypto coin and chase generational wealth, no matter the cost to reputation value conscience freedom law or friendships.""
X Link 2025-02-21T01:47Z [----] followers, [--] engagements

"@TheXeophon Windsurf is gaining steam - big fan of @cline lately though"
X Link 2025-02-23T07:45Z [----] followers, [--] engagements

"Our customers needed a better base model under 10B parameters. We spent the last [--] months building one. I'm delighted to share a preview of our first Arcee Foundation Model: AFM-4.5B-Preview"
X Link 2025-06-18T17:00Z [----] followers, 99.9K engagements

"If you were recently laid off at Meta Gen AI my dms are open. Help us build the next frontier of Apache-2.0 models"
X Link 2025-10-22T23:01Z [----] followers, 27.9K engagements

"I am once again asking those who make images with chatgpt to use auto white balance on the photos otherwise you're too obvious. Power to the Players https://t.co/4Hw6G7i7aW"
X Link 2025-10-27T03:52Z [----] followers, [----] engagements

"@fujikanaeda You just ripped that thing apart man apologize"
X Link 2025-10-28T03:48Z [----] followers, [---] engagements

"@thdxr Yeah it got spun off as a part of the Google acqui-hire"
X Link 2025-10-31T20:48Z [----] followers, [----] engagements

"Posted without comment. I made this. Jokes aside devs want big and small models. Trinity is coming soon. https://t.co/wsbgF69g8M"
X Link 2025-11-01T04:22Z [----] followers, 49.6K engagements

"@llm_wizard @grok Ngl it's a good size. But think bigger"
X Link 2025-11-01T19:21Z [----] followers, [--] engagements

"@fujikanaeda @llm_wizard @grok Shit"
X Link 2025-11-01T19:59Z [----] followers, [---] engagements

"I just came across a Fortnite stream on YouTube. The guy was playing as a juiced anime character hunting down Krusty the krab. Late stage capitalism hits companies so fast now. Kind of tough though"
X Link 2025-11-04T03:54Z [----] followers, [----] engagements

"@llm_wizard @ADarmouni @redtachyon @PrimeIntellect @arcee_ai @datologyai Ah you caught me"
X Link 2025-11-04T06:54Z [----] followers, [---] engagements

"@llm_wizard @PrimeIntellect @datologyai @arcee_ai Truly w take honestly"
X Link 2025-11-04T07:02Z [----] followers, [---] engagements

"@llm_wizard @PrimeIntellect @datologyai @arcee_ai My favorite mentor early in my life used to say at the end of class: I wish you the day you wish yourselves. Always loved that"
X Link 2025-11-04T07:03Z [----] followers, [--] engagements

"This came to mind while working this weekend. For anyone starting post-training: once your pipeline is stable fix a diverse generalist dataset and keep it constant. Run the same dataset across models. Start with a 1B dense model scale toward 70B then try MoE and hybrids"
X Link 2025-11-09T17:56Z [----] followers, [----] engagements

"@osoleve Smart though antithetical to the purpose of the exercise above. But very cool idea and certainly useful"
X Link 2025-11-09T19:03Z [----] followers, [--] engagements

"@llm_wizard @eliebakouch @arcee_ai Jiggle norm doesn't reproduce"
X Link 2025-11-14T01:15Z [----] followers, [--] engagements

"Thanks Grok"
X Link 2025-11-17T20:43Z [----] followers, [----] engagements

"Seeing a lot of overly yellow-tinged chatgpt generated images again even though the new nano-banana is out. I put together a quick Lightroom tutorial on how to remove that yellow cast. Hope you like it"
X Link 2025-11-23T19:24Z [----] followers, [--] engagements

"Seeing a lot of overly yellow-tinged chatgpt generated images again even though the new nano-banana is out. I put together a quick Lightroom tutorial below on how to remove that yellow cast. Hope you like it"
X Link 2025-11-23T19:54Z [----] followers, 10.8K engagements

"So many 3s coming out it's nuts https://t.co/wIsm8xiyAU"
X Link 2025-11-24T02:26Z [----] followers, [----] engagements

"@llm_wizard that's so funny @chargoddard"
X Link 2025-11-27T06:50Z [----] followers, [---] engagements

"@Teknium firecrawl just got good but exa has a code grounding feature in particular that I love"
X Link 2025-11-27T06:52Z [----] followers, [--] engagements

"I will not be baited again New tokenizer. Like SuperBPE it combines multiple words into one token while maximizing the length of tokens. https://t.co/Ni6V7Dl98u"
X Link 2025-11-27T23:09Z [----] followers, [----] engagements

"@alwaysallison I deadass told @johannes_hage I didn't want to launch the week of Thanksgiving and he reacted as if he'd never heard of the holiday"
X Link 2025-11-28T02:06Z [----] followers, [---] engagements

"I usually stay out of takes like this on here. I am not going to change the original poster's mind but I might give a bit more context for everyone else reading. And I saw firsthand how hard this was to pull off and how much work went into Intellect-3. I know RL at scale is extremely hard. Doing large scale RL on a base [----] style sparse MoE with that many params and then competing with the very labs that trained those base models is tremendously impressive. Post training ain't what it used to be. You do not just make a dataset set some hyperparams and hit go. In this new RL paradigm it turns"
X Link 2025-11-28T10:38Z [----] followers, 104.5K engagements

"@argyros_selini Its not a good model"
X Link 2025-11-28T18:16Z [----] followers, [----] engagements

"Today we are introducing Trinity the start of an open-weight MoE family that businesses and developers can own. Trinity-Mini (26B-A3B) Trinity-Nano-Preview (6B-A1B) Available Today on Huggingface"
X Link 2025-12-01T20:35Z [----] followers, 619.1K engagements

"@casper_hansen_ Absolutely - it's antithetical: usually you train a big one to make the small ones better. In this case we had to do the small ones to derisk the big one. But the v2 versions will be distilled for sure"
X Link 2025-12-01T21:22Z [----] followers, [----] engagements

"@xeophon_ Will do a German version just for you"
X Link 2025-12-01T23:20Z [----] followers, [---] engagements

"We used WSM in pretraining to predict post-decay performance though with only roughly [--] recent checkpoints. WSM really needs like [--] to be the better option and at that point it was less compute time to just decay. Checkpointing frequently enough to make it worth it at the end wasn't cost efficient"
X Link 2025-12-02T02:34Z [----] followers, [----] engagements
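The WSM (warmup-stable-merge) approach referenced above averages the last several stable-phase checkpoints to approximate what learning-rate decay would produce, letting you forecast post-decay performance without running the decay. A minimal sketch of uniform checkpoint averaging — parameter dicts of arrays stand in for real model state, and this is not the actual training code:

```python
import numpy as np

def merge_checkpoints(checkpoints: list) -> dict:
    """Uniformly average parameter tensors across recent checkpoints.

    Averaging N stable-phase checkpoints acts like a crude low-pass filter
    over the optimizer trajectory, mimicking the effect of LR decay.
    """
    return {
        name: np.mean([ckpt[name] for ckpt in checkpoints], axis=0)
        for name in checkpoints[0]
    }
```

The trade-off the post describes follows directly: the merge only helps if you have enough recent checkpoints to average, and checkpointing that frequently costs more than just running the decay.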
