Yuda Song posts on X about rl, stacking, faster, and bound the most. They currently have XXX followers and X posts still getting attention, totaling X engagements in the last XX hours.
Social category influence: finance XX%
Social topic influence: rl 37.5%, stacking 12.5%, faster 12.5%, bound 12.5%, drew 12.5%, san diego XXXX%
Top accounts mentioned or mentioned by: @max_simchowitz
Top posts by engagements in the last XX hours
"🔹 Theory says: in perturbed BMDPs belief contraction error decays exponentially with the frame-stack length. 👉 This explains why RL with enough stacking works in locomotion. (6/n)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements
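A schematic rendering of the claim in (6/n) (the notation and constants here are illustrative assumptions, not quoted from the paper): writing \epsilon_{\mathrm{bc}}(k) for the belief contraction error with a frame stack of length k, an exponential decay bound has the form

    \epsilon_{\mathrm{bc}}(k) \;\le\; C\,\gamma^{k}, \qquad 0 < \gamma < 1,

so a moderate stack length already drives the error below any fixed tolerance, which is consistent with frame-stacked RL behaving as if the locomotion task were fully observed.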
"This choice is fundamental: Distillation can be much faster but has well-documented failure cases. RL with longer history can usually succeed but at huge compute cost. We wanted a framework (theory + experiments) to predict which wins in practice. (2/n)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements
"We first identify two key quantities: X. Decodability error: how stochastic the belief (the posterior of the underlying state given observations) is. X. Belief contraction error GMR 23: how much old observations affect the belief. (3/n)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements
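One hedged way to make the two quantities in (3/n) concrete (the symbols and norms are chosen here for illustration, not taken from the paper): with b_t = P(s_t \mid o_{1:t}) the belief over the latent state,

    \text{decodability error} \;\approx\; \mathbb{E}\big[\,1 - \max_{s} b_t(s)\,\big], \qquad
    \text{belief contraction error}(k) \;\approx\; \mathbb{E}\,\big\| P(s_t \mid o_{1:t}) - P(s_t \mid o_{t-k+1:t}) \big\|_{1},

i.e. how far the belief is from a point mass, and how much truncating the history to the last k observations changes it.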
"Our main insight: 🔹 RL succeeds if belief contraction error is small. 🔸 Distillation succeeds if decodability error and belief contraction error are small. The remaining question: how do these quantities compare (4/n)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements
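Stated schematically (this inequality shape is an editorial sketch of the thread's own description, not the paper's theorem statement): with \epsilon_{\mathrm{dec}} and \epsilon_{\mathrm{bc}}(k) the decodability and belief contraction errors, the insight in (4/n) corresponds to guarantees of the form

    J^{\star} - J_{\mathrm{RL}} \;\lesssim\; \mathrm{poly}(H)\,\epsilon_{\mathrm{bc}}(k), \qquad
    J^{\star} - J_{\mathrm{distill}} \;\lesssim\; \mathrm{poly}(H)\,\big(\epsilon_{\mathrm{dec}} + \epsilon_{\mathrm{bc}}(k)\big),

so whether distillation is safe hinges on the extra decodability term.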
"As an instructive model we propose the perturbed Block MDP: a Block MDP with small emission noise. This models robotics settings where states are largely decodable but not perfectly (e.g. occlusion sensor noise). (5/n)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements
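One concrete way to write such a perturbation (this mixture form is an illustrative assumption, not necessarily the paper's exact definition): take a Block MDP emission kernel O_0(o \mid s) whose observation supports are disjoint across states, and mix in a small amount of noise,

    O_{\varepsilon}(o \mid s) \;=\; (1-\varepsilon)\,O_0(o \mid s) \;+\; \varepsilon\,\nu(o \mid s),

so each state remains decodable from its observation up to an O(\varepsilon) error, matching the robotics picture of mostly clean sensing with occasional occlusion or sensor noise.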
"But when the dynamics are stochastic we prove a lower bound: expert distillation can be arbitrarily bad. Empirically the gap between distillation and RL grows as environments get more stochastic. (8/n)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements
"So the tl;dr: under stochastic dynamics avoid distillation and pay the compute tax for RL. But can distillation be rescued Yesour theory shows it benefits from a smooth expert and indeed experts trained with moderate motor noise transfer better (cf. DART) (9/n)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements
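For the DART-style noise injection mentioned in (9/n), here is a minimal Python sketch of collecting demonstrations from an expert whose executed actions are perturbed by motor noise, assuming a classic Gym-style environment (reset returns an observation, step returns a 4-tuple); the function and parameter names are hypothetical, not from the paper's code.

    import numpy as np

    def collect_noisy_expert_rollout(env, expert_policy, noise_std=0.1, horizon=1000):
        # Roll out the expert while injecting Gaussian motor noise into the executed
        # action. The clean expert action is stored as the supervision label, so the
        # distilled policy is trained on states visited under a noisier, smoother expert.
        obs = env.reset()
        dataset = []
        for _ in range(horizon):
            clean_action = np.asarray(expert_policy(obs), dtype=np.float64)
            noisy_action = clean_action + np.random.normal(0.0, noise_std, size=clean_action.shape)
            dataset.append((obs, clean_action))
            obs, reward, done, info = env.step(noisy_action)
            if done:
                break
        return dataset

Only the data-collection step changes; the behavior-cloning fit on the returned (observation, clean action) pairs is unchanged.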
"This is joint work with the amazing Dhruv Rohatgi and my advisors Aarti Singh and Drew Bagnell. Check out our arXiv if you are interested: Our paper is accepted to NeurIPS 2025 with a spotlight. See you in San Diego (10/10)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements