[@chanwoopark20](/creator/twitter/chanwoopark20)
"Many thoughts after reading this blog post. Learning from someone with deep real-world experience in an unfamiliar domain is genuinely exciting; it really helps spark new research ideas."
[X Link](https://x.com/chanwoopark20/status/1980065695208784340) [@chanwoopark20](/creator/x/chanwoopark20) 2025-10-20T00:17Z 1602 followers, 22.3K engagements
"(1/6) I just realized that Muon naturally emerges from the principles of mirror descent. Of course, there are many possible mirror descent formulations, but Muon represents one of the most natural and elegant ways to apply it."
[X Link](https://x.com/chanwoopark20/status/1980840931285963028) [@chanwoopark20](/creator/x/chanwoopark20) 2025-10-22T03:37Z 1602 followers, XXX engagements
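For readers unfamiliar with Muon, its update is commonly described as heavy-ball momentum on the gradient, followed by orthogonalization of the resulting matrix. Orthogonalization means replacing the matrix with the factor U Vᵀ from its SVD. The sketch below assumes that description and makes several further assumptions:

- The names `orthogonalize` and `muon_step` are hypothetical.
- The values `lr=0.02` and `beta=0.95` are illustrative only.
- Plain (non-Nesterov) momentum is used.
- An exact SVD stands in for the Newton-Schulz iteration that practical implementations use.

The sketch does not reproduce the author's mirror-descent derivation, which the posts do not spell out.

```python
import numpy as np


def orthogonalize(g: np.ndarray) -> np.ndarray:
    """Return U @ V^T from the reduced SVD of g (all singular values set to 1).

    Assumption: exact SVD is used for clarity; practical Muon code
    approximates this factor with Newton-Schulz iterations instead.
    """
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return u @ vt


def muon_step(w, grad, momentum_buf, lr=0.02, beta=0.95):
    """One hypothetical Muon-style step for a single 2-D weight matrix w."""
    momentum_buf = beta * momentum_buf + grad  # plain heavy-ball momentum
    update = orthogonalize(momentum_buf)       # spectral norm of update is 1
    return w - lr * update, momentum_buf
```

Every singular value of the update is 1, so its spectral norm is 1 regardless of the gradient's magnitude. As a result, `lr` alone sets the size of each step.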
"(2/6) In particular, the 2-norm is the only norm that is self-dual under the Euclidean inner product. Why is self-duality important? Because every neural network is essentially a concatenation of linear layers, where the output of one layer becomes the input of the next."
[X Link](https://x.com/chanwoopark20/status/1980840932737184179) [@chanwoopark20](/creator/x/chanwoopark20) 2025-10-22T03:37Z 1594 followers, XX engagements
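Within the ℓp family, the self-duality claim can be checked numerically. The dual of the p-norm, ‖x‖_* = max over ‖y‖_p ≤ 1 of ⟨x, y⟩, is the q-norm with 1/p + 1/q = 1, so the two coincide only at p = 2. The sketch below evaluates the dual norm through its known closed-form maximizer. The function names are hypothetical, and the sketch assumes 1 < p < ∞ and a nonzero x. It illustrates the ℓp case only and does not prove the stronger statement about all norms.

```python
import numpy as np


def conjugate_exponent(p: float) -> float:
    """Return q with 1/p + 1/q = 1 (Hölder conjugate), assuming 1 < p < inf."""
    return p / (p - 1.0)


def dual_norm_via_maximizer(x: np.ndarray, p: float) -> float:
    """Evaluate max_{||y||_p <= 1} <x, y> using the closed-form maximizer.

    For l_p the maximizer is proportional to sign(x_i) * |x_i|**(q - 1);
    rescaling it to unit p-norm and taking <x, y> yields ||x||_q.
    Assumes x is nonzero.
    """
    q = conjugate_exponent(p)
    y = np.sign(x) * np.abs(x) ** (q - 1)
    y = y / np.linalg.norm(y, ord=p)  # normalize so that ||y||_p = 1
    return float(x @ y)
```

For p = 2 the dual norm returned equals ‖x‖₂ itself. For p = 3 it equals ‖x‖₁.₅, which differs from ‖x‖₃.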
"(4/6) Therefore, p = q = 2 makes everything easier, because you do not need to control two different norms (the p-norm and the q-norm)."
[X Link](https://x.com/chanwoopark20/status/1980840936004632841) [@chanwoopark20](/creator/x/chanwoopark20) 2025-10-22T03:37Z 1594 followers, XX engagements
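Post (3/6), which presumably defines p and q, is not included here. The sketch below assumes they are the norms measured on a layer's input and output. Under that assumption, it shows the practical payoff of p = q = 2:

- Each layer's induced 2→2 operator norm is simply its largest singular value.
- Per-layer bounds chain through a stack of layers without any conversion between norms.

The matrix shapes and the helper `spectral_norm` are illustrative.

```python
import numpy as np


def spectral_norm(w: np.ndarray) -> float:
    """Induced 2->2 operator norm of w, i.e. its largest singular value."""
    return float(np.linalg.norm(w, ord=2))


rng = np.random.default_rng(0)
w1 = rng.standard_normal((8, 4))  # layer 1 maps R^4 -> R^8
w2 = rng.standard_normal((3, 8))  # layer 2 maps R^8 -> R^3
x = rng.standard_normal(4)

# Both layers use the 2-norm on input and output, so the output norm of
# layer 1 is exactly the input norm of layer 2 and the bounds multiply:
# ||w2 w1 x|| <= ||w2|| * ||w1|| * ||x||.
lhs = np.linalg.norm(w2 @ w1 @ x)
rhs = spectral_norm(w2) * spectral_norm(w1) * np.linalg.norm(x)
print(lhs <= rhs)
```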
"(5/6) If we want each intermediate layer to maintain the same scale, the only norm that preserves consistent scaling under this self-duality is the RMS norm. In other words, Muon can be interpreted as performing mirror descent using the RMS norm."
[X Link](https://x.com/chanwoopark20/status/1980840937388667131) [@chanwoopark20](/creator/x/chanwoopark20) 2025-10-22T03:37Z 1602 followers, XXX engagements
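The RMS-norm reading can be made concrete in one common way. This is offered as an assumption, not as the author's derivation, and the helper names are hypothetical.

- With ‖x‖_RMS = ‖x‖₂/√n, the induced RMS→RMS norm of a d_out × d_in matrix W is √(d_in/d_out)·σ_max(W).
- An orthogonalized update has σ_max = 1, so rescaling it by √(d_out/d_in) gives it unit RMS→RMS norm.
- An update with unit RMS→RMS norm cannot increase the RMS scale of any input vector.

```python
import numpy as np


def rms_norm(x: np.ndarray) -> float:
    """RMS norm of a vector: ||x||_2 / sqrt(n)."""
    return float(np.linalg.norm(x) / np.sqrt(x.size))


def rms_to_rms_norm(w: np.ndarray) -> float:
    """Induced RMS->RMS operator norm of a (d_out, d_in) matrix."""
    d_out, d_in = w.shape
    return float(np.sqrt(d_in / d_out) * np.linalg.norm(w, ord=2))


def rms_scaled_orthogonal_update(g: np.ndarray) -> np.ndarray:
    """Orthogonalize g via SVD, then rescale so its RMS->RMS norm is 1.

    Assumes g has full rank, so every singular value of U @ V^T is 1.
    """
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    d_out, d_in = g.shape
    return np.sqrt(d_out / d_in) * (u @ vt)
```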