Yuda Song posts on X most often about rl, bound, drew, and san diego. They currently have XXX followers, and XX posts still getting attention, totaling XX engagements in the last XX hours.
Social category influence: finance
Social topic influence: rl, bound, drew, san diego, stacking, faster
Top posts by engagements in the last XX hours
"So the tl;dr: under stochastic dynamics, avoid distillation and pay the compute tax for RL. But can distillation be rescued? Yes: our theory shows it benefits from a smooth expert, and indeed experts trained with moderate motor noise transfer better (cf. DART). (9/n)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements
"As an instructive model we propose the perturbed Block MDP: a Block MDP with small emission noise. This models robotics settings where states are largely but not perfectly decodable (e.g. occlusion, sensor noise). (5/n)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements
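The emission model in the post above can be illustrated with a toy sketch. Everything here (the embedding table, noise level, and nearest-neighbor decoder) is a hypothetical illustration of "decodable up to small emission noise," not the paper's construction:

```python
import numpy as np

# Toy perturbed Block MDP emission (hypothetical parameters): each
# latent state emits an observation that is decodable up to a small
# noise level SIGMA.

rng = np.random.default_rng(0)
NUM_STATES, OBS_DIM, SIGMA = 4, 8, 0.05

# One fixed embedding per latent state; observations cluster around it.
STATE_EMBED = rng.normal(size=(NUM_STATES, OBS_DIM))

def emit(state: int) -> np.ndarray:
    """Observation = state embedding + small emission noise."""
    return STATE_EMBED[state] + SIGMA * rng.normal(size=OBS_DIM)

def decode(obs: np.ndarray) -> int:
    """Nearest-embedding decoding; exact as SIGMA -> 0."""
    dists = np.linalg.norm(STATE_EMBED - obs, axis=1)
    return int(np.argmin(dists))

# With small SIGMA, decoding almost always recovers the latent state.
hits = sum(decode(emit(s)) == s for s in rng.integers(0, NUM_STATES, 1000))
```

As SIGMA grows, `decode` starts to fail, which is the regime where the posts below argue history (and RL over it) becomes necessary.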
"But when the dynamics are stochastic, we prove a lower bound: expert distillation can be arbitrarily bad. Empirically, the gap between distillation and RL grows as environments get more stochastic. (8/n)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements
"This is joint work with the amazing Dhruv Rohatgi and my advisors Aarti Singh and Drew Bagnell. Check out our arXiv if you are interested. Our paper is accepted to NeurIPS 2025 with a spotlight. See you in San Diego! (10/10)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements
"Our main insight: 🔹 RL succeeds if belief contraction error is small. 🔸 Distillation succeeds if decodability error and belief contraction error are small. The remaining question: how do these quantities compare? (4/n)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements
"We first identify two key quantities: 1. Decodability error: how stochastic the belief (the posterior of the underlying state given observations) is. 2. Belief contraction error [GMR 23]: how much old observations affect the belief. (3/n)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements
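The "belief" in the post above is just a Bayesian posterior over latent states. A minimal two-state sketch (hypothetical numbers, not the paper's formal definitions) of how a belief is formed and how concentrated it is:

```python
import numpy as np

# Minimal belief sketch: the belief is the posterior over latent states
# given the observation. Decodability error is small when this posterior
# concentrates on a single state.

# Hypothetical emission likelihoods p(obs | state) for one observation.
likelihood = np.array([0.9, 0.1])   # state 0 explains the obs far better
prior = np.array([0.5, 0.5])        # uniform prior over the two states

belief = likelihood * prior
belief /= belief.sum()              # Bayes: posterior over the states

# One crude concentration measure: 1 - max posterior probability
# (0 for a point mass, close to 0.5 for a uniform two-state belief).
decodability_gap = 1.0 - belief.max()
```

With near-deterministic emissions the gap is near zero (distillation-friendly); noisier emissions push the belief toward uniform and the gap up.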
"🔹 Theory says: in perturbed BMDPs, belief contraction error decays exponentially with the frame-stack length. 👉 This explains why RL with enough stacking works in locomotion. (6/n)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements
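Frame stacking as used in the post above can be sketched in a few lines. This is a generic wrapper of the kind common in deep RL pipelines (the class and parameters are hypothetical, not the paper's code): the policy conditions on the last K observations instead of only the current one.

```python
from collections import deque
import numpy as np

K = 4  # frame-stack length; theory above says error decays exponentially in K

class FrameStack:
    """Keep the last K observations and expose them as one flat input."""

    def __init__(self, k: int, obs_dim: int):
        # Pad with zero frames until K real observations have arrived.
        self.frames = deque([np.zeros(obs_dim)] * k, maxlen=k)

    def push(self, obs: np.ndarray) -> np.ndarray:
        """Append the newest frame and return the stacked policy input."""
        self.frames.append(obs)        # deque drops the oldest frame
        return np.concatenate(self.frames)

stack = FrameStack(K, obs_dim=3)
for t in range(6):
    stacked = stack.push(np.full(3, float(t)))
# After 6 pushes the stack holds the frames for t = 2, 3, 4, 5.
```

Increasing K trades a larger input (and more compute) for a belief that depends less on observations outside the window, which is the contraction effect the post describes.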
"This choice is fundamental: Distillation can be much faster but has well-documented failure cases. RL with longer history can usually succeed, but at huge compute cost. We wanted a framework (theory + experiments) to predict which wins in practice. (2/n)"
X Link @yus167 2025-10-15T03:02Z XXX followers, XXX engagements