Omar Khattab (@lateinteraction) on X, 22.7K followers
Created: 2025-07-28 19:11:13 UTC
@hallerite @AlexGDimakis These are multi-step tasks, e.g. HoVer makes ~8 LLM calls iirc
Anyway, I'd expect better performance from a well-designed reflective optimizer than from GRPO in particular for long-horizon problems.
Credit assignment is rather easy for an LLM that actually sees the trajectory
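
To make the credit-assignment point concrete, here is a minimal sketch of how a GRPO-style, outcome-only update spreads credit over a multi-step rollout. It assumes the standard group-relative advantage (reward minus group mean, divided by group standard deviation) with one scalar reward per rollout. The reward values, the group size of 4, and the helper names are hypothetical; only the 8-call figure comes from the post.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantage: one scalar per rollout, (r - mean) / std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]

def per_step_credit(advantage, num_steps):
    """With an outcome-only reward, every step inherits the same scalar."""
    return [advantage] * num_steps

# Hypothetical group of 4 rollouts, each making 8 LLM calls, scored 0/1 at the end.
rewards = [1.0, 0.0, 0.0, 1.0]
NUM_STEPS = 8

advantages = grpo_advantages(rewards)
credit = [per_step_credit(a, NUM_STEPS) for a in advantages]

for r, c in zip(rewards, credit):
    print(f"reward={r}: per-step credit={[round(x, 2) for x in c]}")
```

In this sketch all 8 calls in a rollout receive identical credit, so the update cannot tell which call caused a failure. A reflective optimizer that reads the trajectory can, in principle, point at a specific step.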