Rohan Paul @rohanpaul_ai on X · 73.9K followers
Created: 2025-07-11 03:16:21 UTC
Small model, big browser skills, thanks to smart compute splitting.
Open web agents usually need huge models or tedious hit‑and‑miss tuning, so training a small open model that finishes multi‑step website tasks still feels like luck.
This study shows how to split the training budget so an 8B Llama even beats its 70B teacher on many tasks.
The weak 8B student first copies the 70B teacher's demonstrations through supervised fine-tuning, then switches to on-policy reinforcement learning while those lessons are still fresh.
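A minimal sketch of that compute-split schedule, in Python. The step functions and the 0.33 switch point are illustrative stand-ins, not the paper's actual code or numbers:

```python
# Hypothetical sketch of the SFT-then-RL compute split described above.
# The step callables are stand-ins, not the paper's code or any library API.

def train_with_split(sft_step, rl_step, total_budget, sft_fraction=0.33):
    """Spend the first `sft_fraction` of the compute budget on supervised
    fine-tuning from teacher demos, then hand the rest to on-policy RL."""
    spent = 0.0
    while spent < sft_fraction * total_budget:   # phase 1: imitate the 70B teacher
        spent += sft_step()
    while spent < total_budget:                  # phase 2: on-policy RL on the warm-started policy
        spent += rl_step()
    return spent

# Toy usage: each "step" reports the compute it consumed (e.g. FLOPs).
used = train_with_split(sft_step=lambda: 1.0, rl_step=lambda: 1.0,
                        total_budget=100.0, sft_fraction=0.33)
print(used)  # 100.0
```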
The authors tried XXXXX hyperparameter mixes and used bootstrap sampling to learn which ones really matter, instead of chasing noisy single-seed results.
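A rough illustration of that bootstrap idea, assuming each configuration has a small list of per-seed success rates; the numbers and resample count here are made up for the example, not taken from the paper:

```python
import numpy as np

def bootstrap_mean_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Bootstrap a confidence interval for the mean success rate of one
    hyperparameter configuration, instead of trusting a single seed."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    means = np.array([
        rng.choice(scores, size=len(scores), replace=True).mean()
        for _ in range(n_resamples)
    ])
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), (lo, hi)

# Compare two configs: overlapping intervals suggest the apparent gap
# may be seed noise rather than a real hyperparameter effect.
print(bootstrap_mean_ci([0.58, 0.61, 0.55, 0.63]))
print(bootstrap_mean_ci([0.49, 0.66, 0.52, 0.60]))
```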
Jumping into RL after roughly XX% of the compute lifts MiniWoB++ success from XX% to XX% while cutting FLOPs by 45%, and it is the only open approach that keeps up with GPT‑4o.
A setting of temperature 0.25, batch size 512, zero‑advantage filtering, and grouped advantages stayed stable across budgets, so smaller labs can start there and skip expensive sweeps.
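A small sketch of what grouped advantages with zero-advantage filtering can look like, using a group-relative baseline per task; the paper's exact estimator and normalization may differ:

```python
import numpy as np

def grouped_advantages(rewards_per_task, eps=1e-8):
    """Compute advantages relative to each task's own group of rollouts and
    drop groups whose advantages are identically zero (all rollouts tied)."""
    kept = []
    for task_id, rewards in rewards_per_task.items():
        r = np.asarray(rewards, dtype=float)
        adv = r - r.mean()            # group-relative baseline
        if np.allclose(adv, 0.0):     # zero-advantage filtering:
            continue                  # no learning signal in this group, skip it
        adv = adv / (r.std() + eps)   # normalization, a common but optional choice
        kept.append((task_id, adv))
    return kept

# Example: the second task's rollouts all failed, so it contributes nothing and is filtered out.
print(grouped_advantages({"book-flight": [1.0, 0.0, 1.0], "click-menu": [0.0, 0.0, 0.0]}))
```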
The compute‑aware recipe moves open agents closer to reliable browser automation.
Paper – arxiv.org/abs/2507.04103
Paper Title: "How to Train Your LLM Web Agent: A Statistical Diagnosis"
XXXXX engagements
Post Link: https://x.com/rohanpaul_ai/status/1943509638437310906