[Post Link](https://x.com/cloneofsimo/status/1948850696725692493)
Simo Ryu @cloneofsimo on X, 14.6K followers
Created: 2025-07-25 20:59:48 UTC
Here is what people mean by "residual network makes your gradient happy", plus some intuition on depth muP. Gradients vanish if activations are not unit-scaled. That's not an issue if you are using residual connections! But if you don't scale down the residual branch, your activations and backward pass blow up!
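To make the three regimes concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the setup used in the post. The layers are purely linear with i.i.d. Gaussian weights, the "not unit-scaled" case is modeled by an arbitrary gain of 0.5, and the branch is scaled down by 1/sqrt(depth), which is one common choice associated with depth muP. The helper name `signal_ratio` and all parameter values are hypothetical.

```python
import math
import random


def signal_ratio(depth, width, branch_scale, residual, backward=False, seed=0):
    """Return RMS(out) / RMS(in) after `depth` random linear layers.

    Each weight matrix has i.i.d. N(0, 1/width) entries, so W @ x keeps the
    RMS of x on average. With backward=True the transposed matrices are
    applied in reverse order, which is how a gradient flows through the
    same linear stack.
    """
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(width)
    weights = [
        [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        for _ in range(depth)
    ]
    if backward:
        # Transpose every matrix and walk the layers from last to first.
        weights = [[list(col) for col in zip(*w)] for w in reversed(weights)]

    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    rms_in = math.sqrt(sum(v * v for v in x) / width)
    for w in weights:
        # Branch output: branch_scale * (W @ x).
        branch = [branch_scale * sum(wij * xj for wij, xj in zip(row, x)) for row in w]
        # Residual layers add the branch to the stream; plain layers replace it.
        x = [xi + bi for xi, bi in zip(x, branch)] if residual else branch
    rms_out = math.sqrt(sum(v * v for v in x) / width)
    return rms_out / rms_in


depth, width = 32, 32  # assumed sizes, small enough for pure Python
plain = signal_ratio(depth, width, branch_scale=0.5, residual=False)
unscaled = signal_ratio(depth, width, branch_scale=1.0, residual=True)
scaled = signal_ratio(depth, width, branch_scale=1 / math.sqrt(depth), residual=True)
print(f"plain stack, gain 0.5:         {plain:.3e}")
print(f"residual, unscaled branch:     {unscaled:.3e}")
print(f"residual, branch / sqrt(depth): {scaled:.3e}")
```

In this linear toy model the expected squared RMS is multiplied per layer by 0.25 in the plain stack, by 2 with an unscaled residual branch, and by 1 + 1/depth with the scaled branch. Over the full depth these give 0.25^depth (vanishing), 2^depth (exploding), and (1 + 1/depth)^depth, which stays below e. Passing `backward=True` pushes a vector through the transposed layers instead and shows the same behavior for gradients.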