LunarCrush LLM | creator/twitter::1706770561903497216/posts

[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

[@METR_Evals](/creator/twitter/METR_Evals)
"We estimate that Claude Sonnet XXX has a 50%-time-horizon of around X hr XX min (95% confidence interval of XX to XXX minutes) on our agentic multi-step software engineering tasks. This estimate is lower than the current highest time-horizon point estimate of around X hr XX min"  
[X Link](https://x.com/METR_Evals/status/1976331315772580274) [@METR_Evals](/creator/x/METR_Evals) 2025-10-09T16:57Z 12.2K followers, 287.9K engagements


"When will AI systems be able to carry out long projects independently? In new research, we find a kind of Moore's Law for AI agents: the length of tasks that AIs can do is doubling about every X months"  
[X Link](https://x.com/METR_Evals/status/1902384481111322929) [@METR_Evals](/creator/x/METR_Evals) 2025-03-19T15:39Z 12.3K followers, 8.5M engagements


"We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were XX% faster with AI tools but they were actually XX% slower when they had access to AI than when they didn't"  
[X Link](https://x.com/METR_Evals/status/1943360399220388093) [@METR_Evals](/creator/x/METR_Evals) 2025-07-10T17:23Z 12.3K followers, 3.7M engagements


"METR provided external feedback to OpenAI on its methodology for assessing whether its new open-weight models could be fine-tuned to pose catastrophic biosecurity/cybersecurity risks. We plan to share more about our involvement soon"  
[X Link](https://x.com/METR_Evals/status/1952839752501371228) [@METR_Evals](/creator/x/METR_Evals) 2025-08-05T21:10Z 12.3K followers, 10.4K engagements


"Check out the brief report now published on our blog for more details about our review of OpenAI's gpt-oss risk assessment methodology:"  
[X Link](https://x.com/METR_Evals/status/1981485845950713889) [@METR_Evals](/creator/x/METR_Evals) 2025-10-23T22:20Z 12.3K followers, 6031 engagements
