[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

# ![@cloneofsimo Avatar](https://lunarcrush.com/gi/w:26/cr:twitter::1529156127006392320.png) @cloneofsimo Simo Ryu

Simo Ryu posts on X most often about gradient, 6969, bound, and llama. They currently have XXXXXX followers, and XX posts are still getting attention, totaling XXXXX engagements in the last XX hours.

### Engagements: XXXXX [#](/creator/twitter::1529156127006392320/interactions)
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1529156127006392320/c:line/m:interactions.svg)

- X Week XXXXXXX +3.70%
- X Month XXXXXXX +151%
- X Months XXXXXXXXX +38%
- X Year XXXXXXXXX +968%

### Mentions: X [#](/creator/twitter::1529156127006392320/posts_active)
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1529156127006392320/c:line/m:posts_active.svg)

- X Week XX +5.30%
- X Month XX +176%
- X Months XXX -XX%
- X Year XXX +493%

### Followers: XXXXXX [#](/creator/twitter::1529156127006392320/followers)
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1529156127006392320/c:line/m:followers.svg)

- X Week XXXXXX +0.93%
- X Month XXXXXX +3.80%
- X Months XXXXXX +33%
- X Year XXXXXX +194%

### CreatorRank: XXXXXXXXX [#](/creator/twitter::1529156127006392320/influencer_rank)
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1529156127006392320/c:line/m:influencer_rank.svg)

### Social Influence [#](/creator/twitter::1529156127006392320/influence)
---

**Social category influence**
[finance](/list/finance)  X% [stocks](/list/stocks)  X% [technology brands](/list/technology-brands)  X%

**Social topic influence**
[gradient](/topic/gradient) 10%, [6969](/topic/6969) 5%, [bound](/topic/bound) 5%, [llama](/topic/llama) 5%, [happened](/topic/happened) 5%, [o3](/topic/o3) 5%, [threshold](/topic/threshold) 5%, [imo](/topic/imo) 5%, [$googl](/topic/$googl) 5%, [prob](/topic/prob) X%

**Top accounts mentioned or mentioned by**
[@youjiacheng](/creator/undefined) [@teortaxestex](/creator/undefined) [@fal](/creator/undefined) [@noahgsolomon](/creator/undefined) [@laz4rz](/creator/undefined) [@bitreducer](/creator/undefined) [@stochasticchasm](/creator/undefined) [@voxmenthe](/creator/undefined) [@algobaker](/creator/undefined) [@michaeljlutz](/creator/undefined) [@giffmana](/creator/undefined) [@enricoshippole](/creator/undefined) [@xeophon_](/creator/undefined) [@speido0815](/creator/undefined) [@oozn](/creator/undefined) [@tednonumbers](/creator/undefined) [@palmik](/creator/undefined) [@kaiokendev1](/creator/undefined) [@vitaliychiley](/creator/undefined) [@xzai259](/creator/undefined)

**Top assets mentioned**
[Alphabet Inc Class A (GOOGL)](/topic/$googl)

### Top Social Posts [#](/creator/twitter::1529156127006392320/posts)
---
Top posts by engagements in the last XX hours

"Here is what people mean by "residual network makes your gradient happy" + intuition on depth muP Gradients vanish if activations are not unit-scaled. But thats not issue if you are using residual connection But if you don't scale down branch your activations / backward blow up"  
![@cloneofsimo Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::1529156127006392320.png) [@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1948850696725692493) 2025-07-25 20:59:48 UTC 14.5K followers, 11.8K engagements
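
The claim in the post above can be checked numerically with a small toy model. The sketch below is not from the post: it assumes each residual branch is a fresh random linear map with entries drawn from N(0, 1/d), tracks only the forward activations, and takes 1/sqrt(L) as the branch scale (one common choice; the post does not say which scale it has in mind). The names `residual_growth` and `branch_scale` are made up for illustration.

```python
import math
import random

def rand_matrix(rng, d):
    # Entries ~ N(0, 1/d), so W @ x keeps the squared norm of x on average.
    s = 1.0 / math.sqrt(d)
    return [[rng.gauss(0.0, s) for _ in range(d)] for _ in range(d)]

def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def sq_norm(x):
    return sum(v * v for v in x)

def residual_growth(depth, branch_scale, d=32, seed=0):
    """Return ||x_L||^2 / ||x_0||^2 for the update x <- x + branch_scale * (W_l @ x)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    start = sq_norm(x)
    for _ in range(depth):
        branch = matvec(rand_matrix(rng, d), x)
        x = [xi + branch_scale * bi for xi, bi in zip(x, branch)]
    return sq_norm(x) / start

depth = 16
unscaled = residual_growth(depth, 1.0)                    # branch added as-is
scaled = residual_growth(depth, 1.0 / math.sqrt(depth))   # assumed 1/sqrt(L) scaling
print(f"branch scale 1:         growth {unscaled:.1f}x")
print(f"branch scale 1/sqrt(L): growth {scaled:.2f}x")
```

In this toy setup the squared norm roughly doubles per layer in expectation when the branch is unscaled, so it grows exponentially with depth. With the 1/sqrt(L) scale each layer multiplies it by about (1 + 1/L), which stays bounded as L grows. Because the branches are linear here, the backward pass scales the same way.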


"@YouJiacheng But I dont know whats the optimal bound for my task Surely its somewhere X bf16_range I cant just go with my gut and say XXX or whatever So I dont know t and I would need to sweep"  
![@cloneofsimo Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::1529156127006392320.png) [@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1944342934184460509) 2025-07-13 10:27:34 UTC 14.5K followers, 1017 engagements


"We've seen what happened with stabilityai (sd3.5) Llama series (above post) And grok2/3 (never released) Executive words are just that : words empty as ur fragmented memory"  
![@cloneofsimo Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::1529156127006392320.png) [@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1944905602695836142) 2025-07-14 23:43:24 UTC 14.5K followers, 4773 engagements


"@Michael_J_Lutz Not sure we are on the same page but I dont think it does You do attention first and prv layer MLP is added as residual So its not same thing"  
![@cloneofsimo Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::1529156127006392320.png) [@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1947104076841541835) 2025-07-21 01:19:22 UTC 14.5K followers, 2107 engagements


"Even for the Chinese groups its irrational to assume they will keep benefit us with their divine blessing of open weights. They are after all using fuck ton of compute and resources to train these. They *will not* open source these models once they reach o3 / gemini / claude level model"  
![@cloneofsimo Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::1529156127006392320.png) [@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1944904899428581617) 2025-07-14 23:40:37 UTC 14.5K followers, 42.4K engagements


"@YouJiacheng . with two extra hyperparameters and dynamic threshold"  
![@cloneofsimo Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::1529156127006392320.png) [@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1944171204400885820) 2025-07-12 23:05:10 UTC 14.5K followers, 1130 engagements


"Tax is unironically more complicated than IMO problems This is literally true btw Its not "difficult" its unprincipledly 'randomly' complex"  
![@cloneofsimo Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::1529156127006392320.png) [@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1948425634369146956) 2025-07-24 16:50:45 UTC 14.5K followers, 7097 engagements


"There is some secret sauce that isnt "lets RL on set of narrow env" But what. If the results are not cherrypicked (whatever that could be) its so over"  
![@cloneofsimo Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::1529156127006392320.png) [@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1946530396201746684) 2025-07-19 11:19:45 UTC 14.4K followers, 4103 engagements


"Wtf google is squeezing SSHS graduates to get IMO-gold hahahahahahha"  
![@cloneofsimo Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::1529156127006392320.png) [@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1947342789152268751) 2025-07-21 17:07:55 UTC 14.5K followers, 26.3K engagements


"Fuck so *this* is what artists felt when they saw midjourney and dalle2. Existential crisis from my soul wishful thinking that my job was any different from everyone else"  
![@cloneofsimo Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::1529156127006392320.png) [@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1946543772399247587) 2025-07-19 12:12:55 UTC 14.5K followers, 17.3K engagements


"What happens if you QKV = mlp(x).split(3) instead of linear(x).split(3) Anyone tried this"  
![@cloneofsimo Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::1529156127006392320.png) [@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1946999721337512230) 2025-07-20 18:24:41 UTC 14.5K followers, 88.9K engagements
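
To make the question in the post above concrete, here is a minimal pure-Python sketch of the two projections for a single token vector. Everything in it is an assumption for illustration: the helper names (`qkv_linear`, `qkv_mlp`, `split3`), the ReLU nonlinearity, the hidden width, and the lack of biases. Note that the post's `.split(3)` reads as shorthand for "split into three equal parts"; in PyTorch, `split(3)` would produce chunks of size 3, and `chunk(3, dim=-1)` is the call that yields three parts.

```python
import random

def linear(x, w):
    """Compute x @ w for a vector x (len d_in) and a d_in x d_out matrix w."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]

def relu(v):
    return [max(0.0, a) for a in v]

def split3(v):
    # Split a vector into three equal-length parts: Q, K, V.
    n = len(v) // 3
    return v[:n], v[n:2 * n], v[2 * n:]

def rand_matrix(rng, d_in, d_out):
    s = d_in ** -0.5
    return [[rng.gauss(0.0, s) for _ in range(d_out)] for _ in range(d_in)]

def qkv_linear(x, w_qkv):
    # Standard fused projection: one linear map, then split.
    return split3(linear(x, w_qkv))

def qkv_mlp(x, w_up, w_down):
    # Variant from the post: a two-layer MLP produces Q, K, V instead.
    return split3(linear(relu(linear(x, w_up)), w_down))

rng = random.Random(0)
d_model, d_hidden = 8, 32  # hypothetical sizes
x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
w_qkv = rand_matrix(rng, d_model, 3 * d_model)
w_up = rand_matrix(rng, d_model, d_hidden)
w_down = rand_matrix(rng, d_hidden, 3 * d_model)

q1, k1, v1 = qkv_linear(x, w_qkv)
q2, k2, v2 = qkv_mlp(x, w_up, w_down)
print(len(q1), len(k1), len(v1), len(q2), len(k2), len(v2))
```

Both variants return Q, K, and V of width `d_model`, so the attention code after the split is unchanged. The only difference is that Q, K, and V become nonlinear functions of `x`, at the cost of the extra `w_up` parameters.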


"MuonClip. so many tricks to make maximum logits bounded during training. Gets me wondering why dont people try LASER (and maybe z-loss )"  
![@cloneofsimo Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::1529156127006392320.png) [@cloneofsimo](/creator/x/cloneofsimo) on [X](/post/tweet/1944163666200604934) 2025-07-12 22:35:13 UTC 14.5K followers, 28.5K engagements
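
For readers unfamiliar with the z-loss mentioned in the post above, here is a minimal sketch of the usual definition: an auxiliary penalty proportional to the squared log-partition function, added to the cross-entropy. The function names and the coefficient `1e-4` are assumptions for illustration, and the sketch applies the penalty to a plain softmax over output logits; whether it works as a substitute for the attention-logit tricks the post refers to is the post's open question, not something this example settles.

```python
import math

def logsumexp(logits):
    # Numerically stable log(sum(exp(logits))).
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits))

def cross_entropy_with_z_loss(logits, target, z_coef=1e-4):
    """Return (cross_entropy, z_loss) for one example.

    z_loss = z_coef * log(Z)^2 with Z = sum(exp(logits)); it pushes log Z
    toward 0, which discourages the logits from drifting to large magnitudes.
    """
    log_z = logsumexp(logits)
    ce = log_z - logits[target]
    z_loss = z_coef * log_z ** 2
    return ce, z_loss

ce, z = cross_entropy_with_z_loss([0.0, 0.0], target=0)
big_ce, big_z = cross_entropy_with_z_loss([50.0, 50.0], target=0)
print(f"small logits: ce={ce:.4f}, z_loss={z:.6f}")
print(f"large logits: ce={big_ce:.4f}, z_loss={big_z:.6f}")
```

Shifting all logits by a constant leaves the cross-entropy unchanged, so the cross-entropy alone gives no signal against that drift. The z-loss term grows with the shift and is what penalizes it.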
