@yifan_zhang_ Yifan Zhang @ NeurIPS posts on X about rpg, llm, muon, and 1b the most. They currently have XXXXX followers and XX posts still getting attention, totaling XXXXXX engagements in the last XX hours.
Social category influence: currencies
Social topic influence: rpg #203, llm, muon, 1b, the official, strong
Top posts by engagements in the last XX hours
"RPG (KL-regularized Policy Gradients) is a second-order optimizer that uses Hessian information (Fisher information matrices). TLDR: PG is like SGD; RPG is like Muon/Shampoo 😃"
X Link 2025-12-11T01:38Z 3116 followers, 12.4K engagements
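The post ties RPG's curvature information to Fisher information matrices. As a minimal illustration of that standard link only (not the RPG algorithm itself), the sketch below assumes a single softmax policy over four actions with hypothetical logits and checks numerically that the KL divergence between two nearby policies matches the quadratic form 1/2 * delta^T F delta, i.e. that the Fisher matrix F = diag(pi) - pi pi^T is the Hessian of the KL term.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for two categorical distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def fisher_softmax(probs):
    """Fisher information of a softmax policy w.r.t. its logits: diag(pi) - pi pi^T."""
    n = len(probs)
    return [
        [(probs[i] if i == j else 0.0) - probs[i] * probs[j] for j in range(n)]
        for i in range(n)
    ]


def quad_form(mat, v):
    """Compute v^T M v."""
    n = len(v)
    return sum(v[i] * mat[i][j] * v[j] for i in range(n) for j in range(n))


# Hypothetical logits for a 4-action policy and a small logit perturbation.
theta = [0.5, -1.0, 2.0, 0.0]
delta = [1e-3, -2e-3, 0.5e-3, 1.5e-3]

p = softmax(theta)
q = softmax([t + d for t, d in zip(theta, delta)])

exact_kl = kl(p, q)                                    # KL(pi_theta || pi_{theta+delta})
quadratic = 0.5 * quad_form(fisher_softmax(p), delta)  # second-order (Fisher) approximation
print(f"KL = {exact_kl:.4e}, 0.5 * delta^T F delta = {quadratic:.4e}")
```

The two numbers agree to leading order because F is the Hessian of the KL at delta = 0. Preconditioning a gradient by a damped or pseudo-inverse of F (it is singular for softmax logits) gives the natural gradient.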
"🚀 DeepSeek V3.2 officially utilized our corrected KL regularization term in their training objective. On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning. See also: It will be even better if they can properly cite our work 😀"
X Link 2025-12-01T13:47Z 3106 followers, 143K engagements
"@jasondeanlee True, but frontier labs really use Muon and Shampoo"
X Link 2025-12-11T03:14Z 3072 followers, XXX engagements
"@jasondeanlee Any ideas worth 1B?"
X Link 2025-12-11T03:14Z 3081 followers, XXX engagements
"🚀 Introducing GRAPE: Group Representational Position Encoding. Embracing the General Relative Law of Position Encoding, unifying and improving Multiplicative and Additive Position Encoding such as RoPE and ALiBi. Better performance with a clear theoretical formulation. Project Page: Paper: Devoted to the frontier of superintelligence; hope you will enjoy it"
X Link 2025-12-08T21:05Z 3116 followers, 28.2K engagements
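GRAPE's own group-theoretic formulation is not given here, so the sketch below only illustrates the two families the post names: a multiplicative encoding (standard RoPE, which rotates query and key pairs by position-dependent angles) and an additive one (an ALiBi-style linear distance bias on the attention logit). The query/key vectors and the bias slope are hypothetical. In both cases the score depends only on the offset m - n, which is the relative property the post refers to.

```python
import math


def rope(vec, pos, base=10000.0):
    """Standard RoPE: rotate each (x, y) pair of vec by pos * base**(-i/d), i = 0, 2, 4, ..."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rope_score(q, k, m, n):
    """Multiplicative encoding: rotate query and key by their own positions, then dot."""
    return dot(rope(q, m), rope(k, n))


def alibi_score(q, k, m, n, slope=0.5):
    """Additive encoding (ALiBi-style): plain dot product plus a linear distance penalty.
    slope=0.5 is a hypothetical value; ALiBi uses a fixed slope per attention head."""
    return dot(q, k) - slope * abs(m - n)


# Hypothetical 4-dimensional query and key vectors.
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Same offset m - n = 3 at different absolute positions gives the same score.
print(rope_score(q, k, 5, 2), rope_score(q, k, 13, 10))
print(alibi_score(q, k, 5, 2), alibi_score(q, k, 13, 10))
```

RoPE gets relativity from rotations composing (R(m)^T R(n) = R(n - m)), while ALiBi gets it by construction of the bias; a unifying framework has to account for both mechanisms.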
"Mixture of Parrots: Experts improve memorization more than reasoning"
X Link 2025-12-09T16:56Z 3115 followers, 70.6K engagements
"🚀 Newly updated RPG (KL-Regularized Policy Gradient) is available on arXiv: X. Our trained model beats the official checkpoint of Qwen3-4B-Instruct. We extend our experiments to an 8K context length and find that RPG-REINFORCE with RPG-Style Clip achieves XX% accuracy on AIME25, surpassing the official Qwen3-4B-Instruct model (47%) and outperforming strong baselines. X. REINFORCE ESTIMATOR IS ALL YOU NEED: RPG-REINFORCE consistently outperforms PPO/GRPO-style gradient estimators. X. RPG is a second-order optimizer: it utilizes second-order information and has a clear connection to the Natural"
X Link 2025-12-12T16:49Z 3117 followers, 21.9K engagements
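The post credits a REINFORCE-style estimator but does not spell it out here. The sketch below shows only the textbook score-function (REINFORCE) gradient for a reverse-KL-regularized objective J = E_pi[r] - beta * KL(pi || pi_ref) on a single softmax policy; it is not the paper's RPG-REINFORCE (no clipping, baselines, or importance weights). The rewards, reference policy, and beta are hypothetical. The Monte Carlo estimate is compared against a finite-difference gradient of J.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def objective(theta, rewards, ref, beta):
    """J(theta) = E_pi[r] - beta * KL(pi || pi_ref) for a softmax policy."""
    pi = softmax(theta)
    expected_reward = sum(p * r for p, r in zip(pi, rewards))
    kl = sum(p * math.log(p / q) for p, q in zip(pi, ref))
    return expected_reward - beta * kl


def finite_diff_grad(theta, rewards, ref, beta, eps=1e-6):
    """Central-difference gradient of J, used as ground truth."""
    grad = []
    for j in range(len(theta)):
        up, down = theta[:], theta[:]
        up[j] += eps
        down[j] -= eps
        diff = objective(up, rewards, ref, beta) - objective(down, rewards, ref, beta)
        grad.append(diff / (2 * eps))
    return grad


def reinforce_grad(theta, rewards, ref, beta, num_samples, rng):
    """Score-function estimate with the KL folded into the reward:
    mean over a ~ pi of grad log pi(a) * (r(a) - beta * log(pi(a) / pi_ref(a)))."""
    pi = softmax(theta)
    n_actions = len(theta)
    grad = [0.0] * n_actions
    for a in rng.choices(range(n_actions), weights=pi, k=num_samples):
        shaped = rewards[a] - beta * math.log(pi[a] / ref[a])
        for j in range(n_actions):
            score = (1.0 if j == a else 0.0) - pi[j]  # d log softmax(theta)[a] / d theta_j
            grad[j] += score * shaped
    return [g / num_samples for g in grad]


# Hypothetical 3-action problem.
theta = [0.2, -0.5, 1.0]
rewards = [1.0, 0.0, 0.3]
ref = [1 / 3] * 3
beta = 0.1

exact = finite_diff_grad(theta, rewards, ref, beta)
estimate = reinforce_grad(theta, rewards, ref, beta, 100_000, random.Random(0))
print("finite-difference:", [round(g, 4) for g in exact])
print("REINFORCE:        ", [round(g, 4) for g in estimate])
```

Folding -beta * log(pi / pi_ref) into the reward keeps the estimator unbiased here because the extra term from differentiating log pi, -beta * E_pi[grad log pi], is zero.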