[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

# @cloneofsimo Simo Ryu

Simo Ryu posts on X about gradient, 6969, bound, llama the most. They currently have XXXXXX followers and XX posts still getting attention that total XXXXX engagements in the last XX hours.

### Engagements: XXXXX [#](/creator/twitter::1529156127006392320/interactions)

- X Week XXXXXXX +3.70%
- X Month XXXXXXX +151%
- X Months XXXXXXXXX +38%
- X Year XXXXXXXXX +968%

### Mentions: X [#](/creator/twitter::1529156127006392320/posts_active)

- X Week XX +5.30%
- X Month XX +176%
- X Months XXX -XX%
- X Year XXX +493%

### Followers: XXXXXX [#](/creator/twitter::1529156127006392320/followers)

- X Week XXXXXX +0.93%
- X Month XXXXXX +3.80%
- X Months XXXXXX +33%
- X Year XXXXXX +194%

### CreatorRank: XXXXXXXXX [#](/creator/twitter::1529156127006392320/influencer_rank)

### Social Influence [#](/creator/twitter::1529156127006392320/influence)

---

**Social category influence** [finance](/list/finance) X%, [stocks](/list/stocks) X%, [technology brands](/list/technology-brands) X%

**Social topic influence** [gradient](/topic/gradient) 10%, [6969](/topic/6969) 5%, [bound](/topic/bound) 5%, [llama](/topic/llama) 5%, [happened](/topic/happened) 5%, [o3](/topic/o3) 5%, [threshold](/topic/threshold) 5%, [imo](/topic/imo) 5%, [$googl](/topic/$googl) 5%, [prob](/topic/prob) X%

**Top accounts mentioned or mentioned by** [@youjiacheng](/creator/undefined) [@teortaxestex](/creator/undefined) [@fal](/creator/undefined) [@noahgsolomon](/creator/undefined) [@laz4rz](/creator/undefined) [@bitreducer](/creator/undefined) [@stochasticchasm](/creator/undefined) [@voxmenthe](/creator/undefined) [@algobaker](/creator/undefined) [@michaeljlutz](/creator/undefined) [@giffmana](/creator/undefined) [@enricoshippole](/creator/undefined) [@xeophon_](/creator/undefined) [@speido0815](/creator/undefined) [@oozn](/creator/undefined) [@tednonumbers](/creator/undefined) [@palmik](/creator/undefined) [@kaiokendev1](/creator/undefined) [@vitaliychiley](/creator/undefined) [@xzai259](/creator/undefined)

**Top assets mentioned** [Alphabet Inc Class A (GOOGL)](/topic/$googl)

### Top Social Posts [#](/creator/twitter::1529156127006392320/posts)

---

Top posts by engagements in the last XX hours

"Here is what people mean by "residual network makes your gradient happy" + intuition on depth muP Gradients vanish if activations are not unit-scaled. But thats not issue if you are using residual connection But if you don't scale down branch your activations / backward blow up"
[@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1948850696725692493) 2025-07-25 20:59:48 UTC 14.5K followers, 11.8K engagements

"@YouJiacheng But I dont know whats the optimal bound for my task Surely its somewhere X bf16_range I cant just go with my gut and say XXX or whatever So I dont know t and I would need to sweep"
[@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1944342934184460509) 2025-07-13 10:27:34 UTC 14.5K followers, 1017 engagements

"We've seen what happened with stabilityai (sd3.5) Llama series (above post) And grok2/3 (never released) Executive words are just that : words empty as ur fragmented memory"
[@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1944905602695836142) 2025-07-14 23:43:24 UTC 14.5K followers, 4773 engagements

"@Michael_J_Lutz Not sure we are on the same page but I dont think it does You do attention first and prv layer MLP is added as residual So its not same thing"
[@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1947104076841541835) 2025-07-21 01:19:22 UTC 14.5K followers, 2107 engagements

"Even for the Chinese groups its irrational to assume they will keep benefit us with their divine blessing of open weights. They are after all using fuck ton of compute and resources to train these. They *will not* open source these models once they reach o3 / gemini / claude level model"
[@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1944904899428581617) 2025-07-14 23:40:37 UTC 14.5K followers, 42.4K engagements

"@YouJiacheng . with two extra hyperparameters and dynamic threshold"
[@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1944171204400885820) 2025-07-12 23:05:10 UTC 14.5K followers, 1130 engagements

"Tax is unironically more complicated than IMO problems This is literally true btw Its not "difficult" its unprincipledly 'randomly' complex"
[@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1948425634369146956) 2025-07-24 16:50:45 UTC 14.5K followers, 7097 engagements

"There is some secret sauce that isnt "lets RL on set of narrow env" But what. If the results are not cherrypicked (whatever that could be) its so over"
[@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1946530396201746684) 2025-07-19 11:19:45 UTC 14.4K followers, 4103 engagements

"Wtf google is squeezing SSHS graduates to get IMO-gold hahahahahahha"
[@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1947342789152268751) 2025-07-21 17:07:55 UTC 14.5K followers, 26.3K engagements

"Fuck so *this* is what artists felt when they saw midjourney and dalle2. Existential crisis from my soul wishful thinking that my job was any different from everyone else"
[@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1946543772399247587) 2025-07-19 12:12:55 UTC 14.5K followers, 17.3K engagements

"What happens if you QKV = mlp(x).split(3) instead of linear(x).split(3) Anyone tried this"
[@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1946999721337512230) 2025-07-20 18:24:41 UTC 14.5K followers, 88.9K engagements

"MuonClip. so many tricks to make maximum logits bounded during training. Gets me wondering why dont people try LASER (and maybe z-loss )"
[@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1944163666200604934) 2025-07-12 22:35:13 UTC 14.5K followers, 28.5K engagements
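
The first post in the list above says that activations blow up in a residual network if the branch is not scaled down. A minimal NumPy sketch of that claim follows. It is not taken from the post: the function name `residual_rms`, the purely linear random branch, and the `1/sqrt(depth)` branch multiplier (one commonly used depth-scaling choice) are all assumptions made for illustration, and only the forward pass is shown.

```python
import numpy as np

def residual_rms(depth, width, branch_scale, seed=0):
    """Return the RMS of the activations after `depth` residual blocks.

    Each block computes x <- x + branch_scale * (W @ x), where W is a fresh
    random matrix. The linear branch is a stand-in chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)  # input with RMS close to 1
    for _ in range(depth):
        # Entries ~ N(0, 1/width), so W @ x has roughly the same RMS as x.
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        x = x + branch_scale * (w @ x)
    return float(np.sqrt(np.mean(x ** 2)))

depth, width = 32, 256
unscaled = residual_rms(depth, width, branch_scale=1.0)
scaled = residual_rms(depth, width, branch_scale=1.0 / np.sqrt(depth))
print(f"branch_scale = 1:             rms = {unscaled:.3g}")
print(f"branch_scale = 1/sqrt(depth): rms = {scaled:.3g}")
```

With an unscaled branch, each block roughly doubles the squared norm in expectation, so the RMS grows exponentially with depth. With the assumed `1/sqrt(depth)` multiplier, each block multiplies the squared norm by about `1 + 1/depth`, which stays bounded as depth increases.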