Dark | Light
[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

![rohanpaul_ai Avatar](https://lunarcrush.com/gi/w:24/cr:twitter::2588345408.png) Rohan Paul [@rohanpaul_ai](/creator/twitter/rohanpaul_ai) on x 73.7K followers
Created: 2025-07-17 18:29:45 UTC

🖥️ OpenAI rolled out “agent mode,” in ChatGPT.

Lets the model click around a virtual computer, run code, and finish multistep jobs on its own, hitting XXXX% on Humanity’s Last Exam while handling chores like building slide decks or buying groceries. 

It reaches XXXX% accuracy on Humanity’s Last Exam (HLE), while older baselines like OpenAI o3 without tools sit at XXXX% and deep-research with browsing reaches 26.6%.

The HLE exam spans XXXXX expert-level questions across 100+ subjects that were crowdsourced specifically to stump modern language models.

So coubling the previous best pass@1 score signals a jump in broad reasoning skill, not just memorization.

The leap comes from giving the model its own virtual computer with a browser, terminal, and API hooks so it can fetch data, run code, and decide which tool to use on the fly.

OpenAI also says that the agent ran up to X attempts in parallel and picked the answer the model felt most confident about, got the score pushed to XXXX%

So overall, agents that can act as well as reason are starting to close the gap with human experts.

![](https://pbs.twimg.com/media/GwFGRFtXYAASyYk.png)

XXXXX engagements

![Engagements Line Chart](https://lunarcrush.com/gi/w:600/p:tweet::1945913831370453072/c:line.svg)

**Related Topics**
[groceries](/topic/groceries)
[virtual](/topic/virtual)
[open ai](/topic/open-ai)

[Post Link](https://x.com/rohanpaul_ai/status/1945913831370453072)

[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

rohanpaul_ai Avatar Rohan Paul @rohanpaul_ai on x 73.7K followers Created: 2025-07-17 18:29:45 UTC

🖥️ OpenAI rolled out “agent mode,” in ChatGPT.

Lets the model click around a virtual computer, run code, and finish multistep jobs on its own, hitting XXXX% on Humanity’s Last Exam while handling chores like building slide decks or buying groceries.

It reaches XXXX% accuracy on Humanity’s Last Exam (HLE), while older baselines like OpenAI o3 without tools sit at XXXX% and deep-research with browsing reaches 26.6%.

The HLE exam spans XXXXX expert-level questions across 100+ subjects that were crowdsourced specifically to stump modern language models.

So coubling the previous best pass@1 score signals a jump in broad reasoning skill, not just memorization.

The leap comes from giving the model its own virtual computer with a browser, terminal, and API hooks so it can fetch data, run code, and decide which tool to use on the fly.

OpenAI also says that the agent ran up to X attempts in parallel and picked the answer the model felt most confident about, got the score pushed to XXXX%

So overall, agents that can act as well as reason are starting to close the gap with human experts.

XXXXX engagements

Engagements Line Chart

Related Topics groceries virtual open ai

Post Link

post/tweet::1945913831370453072
/post/tweet::1945913831370453072