Omar Khattab (@lateinteraction) on X, 22.7K followers. Created: 2025-07-28 19:11:13 UTC

@hallerite @AlexGDimakis These are multi-step tasks, e.g. HoVer makes ~8 LLM calls iirc

Anyway, I'd expect better performance from a well-designed reflective optimizer than from GRPO in particular for long-horizon problems.

Credit assignment is rather easy for an LLM that actually sees the trajectory
