[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

# ![@saagnikkk Avatar](https://lunarcrush.com/gi/w:26/cr:twitter::1617896838950187009.png) @saagnikkk Sagnik

Sagnik posts on X most often about momentum, llm, r1, and deep dive. They currently have XXX followers and XX posts still getting attention, totaling XXX engagements in the last XX hours.

### Engagements: XXX [#](/creator/twitter::1617896838950187009/interactions)
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1617896838950187009/c:line/m:interactions.svg)

- X Week XXXXX -XX%
- X Months XXXXXXX +255%
- X Year XXXXXXX +1,867%

### Mentions: XX [#](/creator/twitter::1617896838950187009/posts_active)
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1617896838950187009/c:line/m:posts_active.svg)

- X Months XX +1,100%
- X Year XX +1,100%

### Followers: XXX [#](/creator/twitter::1617896838950187009/followers)
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1617896838950187009/c:line/m:followers.svg)

- X Week XXX +0.77%
- X Months XXX +83%

### CreatorRank: XXXXXXXXX [#](/creator/twitter::1617896838950187009/influencer_rank)
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1617896838950187009/c:line/m:influencer_rank.svg)

### Social Influence

**Social category influence**
[currencies](/list/currencies) 

**Social topic influence**
[momentum](/topic/momentum), [llm](/topic/llm), [r1](/topic/r1), [deep dive](/topic/deep-dive), [deep](/topic/deep), [rl](/topic/rl), [math](/topic/math)

**Top accounts mentioned or mentioned by**
[@pavanjayasinha](/creator/undefined) [@lifanyuan](/creator/undefined) [@dilekhakkanitur](/creator/undefined) [@haopenguiuc](/creator/undefined) [@mrtnm](/creator/undefined) [@haqueishfaq](/creator/undefined) [@tmpethick](/creator/undefined) [@creativemath](/creator/undefined) [@hayousoufiane](/creator/undefined) [@johnschulman2](/creator/undefined)
### Top Social Posts
Top posts by engagements in the last XX hours

"🚨New Blog Alert: Is AdamW an overkill for RLVR We found that vanilla SGD is X. As performant as AdamW X. 36x more parameter efficient naturally. (much more than a rank X lora) 🤯 Looks like a "free lunch". Maybe Its time to rethink the optimizers for RLVR 🧵"  
[X Link](https://x.com/saagnikkk/status/1995186197145022944)  2025-11-30T17:40Z XXX followers, 166.8K engagements


"The assumption: You need adaptive optimizers because the LLM loss landscape is complex. Our earlier observation: RL finetunes a small subnetwork. Maybe the loss landscape is simpler than we think Maybe we dont need momentum"  
[X Link](https://x.com/saagnikkk/status/1995186199787417861)  2025-11-30T17:40Z XXX followers, 7343 engagements
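For context, "momentum" here refers to the velocity buffer that heavy-ball SGD (and Adam-style optimizers) carry between steps; vanilla SGD drops it. The textbook update rules, shown here for reference and not taken from the post ($\eta$ is the learning rate, $\mu$ the momentum coefficient, $g_t$ the gradient):

$$\text{SGD with momentum:}\quad v_t = \mu v_{t-1} + g_t,\qquad \theta_{t+1} = \theta_t - \eta\, v_t$$

$$\text{vanilla SGD:}\quad \theta_{t+1} = \theta_t - \eta\, g_t$$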


"We did controlled experiments to compare SGD against AdamW on math reasoning. The result X. Reward convergence was identical. X. Downstream benchmark accuracy was identical. SGD and AdamW performance is similar while SGD uses less memory. 💾"  
[X Link](https://x.com/saagnikkk/status/1995186201993576624)  2025-11-30T17:40Z XXX followers, 3879 engagements


"SGD Does it for cheap: While performance is comparable the cost is not. AdamW requires maintaining X extra states per parameter (momentum + variance). SGD requires zero. By switching we slash the optimizer memory footprint"  
[X Link](https://x.com/saagnikkk/status/1995186206527643931)  2025-11-30T17:40Z XXX followers, 3184 engagements
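A quick way to see the state difference in practice is to inspect PyTorch's optimizer state directly. This is an illustrative sketch, not code from the blog; the layer size is an arbitrary stand-in:

```python
# Compare the per-parameter state kept by AdamW vs. vanilla SGD (illustrative sketch).
import torch

model = torch.nn.Linear(4096, 4096)              # stand-in for one model layer
model(torch.randn(2, 4096)).sum().backward()     # populate .grad

adamw = torch.optim.AdamW(model.parameters(), lr=1e-5)
sgd = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.0)
adamw.step()
sgd.step()

def state_keys(opt):
    # Collect the names of all per-parameter buffers the optimizer allocated.
    return sorted({k for per_param in opt.state.values() for k in per_param})

print("AdamW state:", state_keys(adamw))  # ['exp_avg', 'exp_avg_sq', 'step'] -- two full-size buffers
print("SGD state:  ", state_keys(sgd))    # [] -- no extra buffers when momentum=0
```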


"The most surprising finding: The Sparsity. 🤯 With SGD we achieved full fine-tuning performance while only XXXX% (yes you read that right) of parameters had meaningful updates. This is actually less than a Rank-1-LoRA"  
[X Link](https://x.com/saagnikkk/status/1995186208209588525)  2025-11-30T17:40Z XXX followers, 21.6K engagements
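A rough way to reproduce this kind of measurement (illustrative only; `tol` is an arbitrary threshold, and the rank-1 LoRA count uses the usual $d_\text{out} + d_\text{in}$ per weight matrix; neither is taken from the blog):

```python
# Estimate update sparsity after fine-tuning and compare it to a rank-1 LoRA budget (sketch).
import torch

def update_sparsity(before: torch.nn.Module, after: torch.nn.Module, tol: float = 1e-6) -> float:
    """Fraction of parameters whose value moved by more than `tol` during training."""
    changed = total = 0
    for p0, p1 in zip(before.parameters(), after.parameters()):
        changed += (p1.detach() - p0.detach()).abs().gt(tol).sum().item()
        total += p0.numel()
    return changed / total

def rank1_lora_fraction(model: torch.nn.Module) -> float:
    """Trainable fraction a rank-1 LoRA would add: (d_out + d_in) per 2-D weight matrix."""
    lora = sum(sum(p.shape) for p in model.parameters() if p.dim() == 2)
    total = sum(p.numel() for p in model.parameters())
    return lora / total

# usage sketch: compare update_sparsity(base_model, sgd_finetuned_model)
# against rank1_lora_fraction(base_model)
```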


"Why this matters for the "GPU Poor": Memory. 💾 AdamW maintains X extra states. SGD maintains X. By switching you slash your memory footprint instantly. Allows larger batch size or longer context on same hardware. The update sparsity of XXXXX% opens up doors to efficiency gains"  
[X Link](https://x.com/saagnikkk/status/1995186209996308855)  2025-11-30T17:40Z XXX followers, 3162 engagements
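To make the memory claim concrete, a back-of-the-envelope sketch (assumed setup: a 7B-parameter model with fp32 optimizer state; the numbers are illustrative, not from the post):

```python
# Optimizer-state memory: AdamW's two extra buffers vs. plain SGD's zero (illustrative).
params = 7e9              # assumed model size
bytes_per_value = 4       # fp32 optimizer state
adamw_buffers = 2         # first moment (momentum) + second moment (variance)

adamw_gib = params * adamw_buffers * bytes_per_value / 2**30
print(f"AdamW optimizer state: ~{adamw_gib:.0f} GiB")   # ~52 GiB
print("SGD optimizer state:   0 GiB")                   # freed up for batch size / context length
```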


"This aligns with recent thoughts from @johnschulman2 on how RLVR works well with parameter-efficient methods like LoRA. It turns out SGD might be the "naturally sparse" alternative weve been sleeping on"  
[X Link](https://x.com/saagnikkk/status/1995186211611173027)  2025-11-30T17:40Z XXX followers, 2689 engagements


"We leave with some concluding remarks"  
[X Link](https://x.com/saagnikkk/status/1995186213313970351)  2025-11-30T17:40Z XXX followers, 3048 engagements


"Read full blog and detailed findings here - Work done with amazing team and advisors @lifan__yuan @pavanjayasinha @dilekhakkanitur @haopeng_uiuc"  
[X Link](https://x.com/saagnikkk/status/1995186215599911094)  2025-11-30T17:40Z XXX followers, 2981 engagements


"@HaqueIshfaq @pavanjayasinha @lifan__yuan @dilekhakkanitur @haopeng_uiuc The learning rates are actually not comparable in SGD/AdamW actually since AdamW uses an adaptive learning rate. SGD typically needs a much higher LR than AdamW"  
[X Link](https://x.com/saagnikkk/status/1998653870986129764)  2025-12-10T07:19Z XXX followers, XX engagements
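The reason the learning rates aren't directly comparable follows from the standard update rules (textbook formulas, not from the post; $\hat m_t$, $\hat v_t$ are the bias-corrected moments and $\lambda$ the decoupled weight decay):

$$\text{SGD:}\quad \theta_{t+1} = \theta_t - \eta\, g_t \qquad\qquad \text{AdamW:}\quad \theta_{t+1} = \theta_t - \eta\left(\frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon} + \lambda\,\theta_t\right)$$

Because AdamW divides by $\sqrt{\hat v_t}$, its per-coordinate step is roughly of size $\eta$ regardless of gradient scale, while SGD's step scales with the raw gradient, so SGD typically needs a much larger $\eta$ to move parameters by a comparable amount.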
