[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]
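For reference, here is a minimal sketch of an authenticated request that would unlock the full data, using Python's `requests` library. The endpoint path is an assumption inferred from this page's `/creator/twitter::824609918` links, not a confirmed route; see https://lunarcrush.ai/auth for the documented API.

```python
# Minimal sketch of an authenticated LunarCrush request.
# NOTE: the endpoint path below is an assumption inferred from this page's
# /creator/... links; consult https://lunarcrush.ai/auth for the real API.
import os
import requests

API_KEY = os.environ["LUNARCRUSH_API_KEY"]  # keep keys out of source code

# Hypothetical creator endpoint for @WenSun1 (twitter::824609918).
url = "https://lunarcrush.com/api4/public/creator/twitter/WenSun1/v1"

resp = requests.get(url, headers={"Authorization": f"Bearer {API_KEY}"}, timeout=30)
resp.raise_for_status()
print(resp.json())
```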

# ![@WenSun1 Avatar](https://lunarcrush.com/gi/w:26/cr:twitter::824609918.png) @WenSun1 Wen Sun

Wen Sun posts on X most often about wen, gradient, o3, and lays. They currently have XXX followers, and X of their posts are still getting attention, totaling XXX engagements in the last XX hours.

### Engagements: XXX [#](/creator/twitter::824609918/interactions)
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::824609918/c:line/m:interactions.svg)

- X Months: XXXXX (-XX%)

### Mentions: X [#](/creator/twitter::824609918/posts_active)
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::824609918/c:line/m:posts_active.svg)


### Followers: XXX [#](/creator/twitter::824609918/followers)
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::824609918/c:line/m:followers.svg)

- X Months: XXX (+61%)

### CreatorRank: XXXXXXXXX [#](/creator/twitter::824609918/influencer_rank)
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::824609918/c:line/m:influencer_rank.svg)

### Social Influence [#](/creator/twitter::824609918/influence)
---

**Social topic influence**
[wen](/topic/wen) #140, [gradient](/topic/gradient), [o3](/topic/o3), [lays](/topic/lays), [inference](/topic/inference)
### Top Social Posts [#](/creator/twitter::824609918/posts)
---
Top posts by engagements in the last XX hours

"Does RL actually learn positively under random rewards when optimizing Qwen on MATH Is Qwen really that magical such that even RLing on random rewards can make it reason better Following prior work on spurious rewards on RL we ablated algorithms. It turns out that if you deploy algorithms like Reinforce and REBEL (a generalization of Natural Policy Gradient) RL does not learn under random rewards. These two simple algorithms simply behave as we would expect in this case. GRPO and PPO indeed can behave strangely. They can learn positively or negatively depending on different random seeds. The"  
![@WenSun1 Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::824609918.png) [@WenSun1](/creator/x/WenSun1) on [X](/post/tweet/1945313845804724697) 2025-07-16 02:45:37 UTC XXX followers, 12.4K engagements
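A quick way to see why plain REINFORCE should not move under rewards drawn independently of the model's outputs (stated here as supporting context, not as the thread's own derivation): the score function has zero mean, so the expected policy gradient vanishes.

```latex
% Expected REINFORCE gradient when the reward r is independent of the trajectory \tau:
\[
\mathbb{E}_{\tau \sim \pi_\theta,\, r}\!\left[ r \, \nabla_\theta \log \pi_\theta(\tau) \right]
= \mathbb{E}[r] \; \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(\tau) \right]
= \mathbb{E}[r] \cdot 0
= 0 .
\]
```

In expectation the update is zero, so any movement is gradient noise; GRPO's group normalization and PPO's clipping can break this symmetry, which is consistent with the seed-dependent drift the thread reports.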


"How can small LLMs match or even surpass frontier models like DeepSeek R1 and o3 Mini in math competition (AIME & HMMT) reasoning Prior work seems to suggest that ideas like PRMs do not really work or scale well for long context reasoning. @kaiwenw_ai will reveal how a novel value model (not the usual PRMs) can be trained to enable massive search at inference time"  
![@WenSun1 Avatar](https://lunarcrush.com/gi/w:16/cr:twitter::824609918.png) [@WenSun1](/creator/x/WenSun1) on [X](/post/tweet/1946043864097169620) 2025-07-18 03:06:27 UTC XXX followers, 5078 engagements
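This post is about using a trained value model to drive large-scale search at inference time. As an illustrative sketch only, with hypothetical placeholder callables (`generate_candidates`, `score_value`) rather than the authors' actual method, value-guided best-of-N selection can be expressed as:

```python
# Hedged sketch of value-guided best-of-N selection at inference time.
# `generate_candidates` and `score_value` are hypothetical placeholders;
# the thread's actual value model and search procedure may differ.
from typing import Callable, List

def best_of_n(
    prompt: str,
    generate_candidates: Callable[[str, int], List[str]],  # e.g. sampled LLM completions
    score_value: Callable[[str, str], float],              # learned value-model score
    n: int = 64,
) -> str:
    """Sample n candidate solutions and return the one the value model rates highest."""
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=lambda c: score_value(prompt, c))
```

Scaling `n` trades compute for accuracy; the thread's claim is that a well-trained value model makes this kind of massive search pay off where PRMs reportedly do not scale for long-context reasoning.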
