
![rohanpaul_ai Avatar](https://lunarcrush.com/gi/w:24/cr:twitter::2588345408.png) Rohan Paul [@rohanpaul_ai](/creator/twitter/rohanpaul_ai) on x 73.7K followers
Created: 2025-07-18 10:01:00 UTC

This paper investigates whether LLMs can improve code performance at the repository level and deliver meaningful speed gains.

Current models struggle because they do not know which lines of code waste the most time, or how to coordinate fixes across several files.

The authors built SWE-Perf to measure that shortfall.

Human reviewers in the benchmark trimmed average runtime by 10.9%, while the best agent improved it by only 2.3%, even though it passed almost XX% of the functional checks. That gap shows that real performance work still needs profiling tools, cross-file reasoning, and awareness of low-level trade-offs like vectorized math versus plain Python loops, skills the current models do not have.
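Profiling is how humans find the lines that waste the most time before touching any code. A minimal sketch of that workflow using Python's standard-library `cProfile` (the `slow_concat` hot path here is a hypothetical example, not from the paper):

```python
import cProfile
import io
import pstats


def slow_concat(n):
    # Hypothetical hot path: repeated string concatenation can be quadratic.
    s = ""
    for i in range(n):
        s += str(i)
    return s


def run():
    return slow_concat(20_000)


# Profile the workload and print the top entries by cumulative time,
# which is roughly how a human locates the costly function to optimize.
profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

The profiler output ranks functions by time spent, pointing the optimizer at the real bottleneck rather than at code that merely looks slow.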

The results show that models can suggest correct code but struggle to find costly hot paths, coordinate changes across files, or judge micro-optimizations like vectorized math versus Python loops.
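The vectorized-math-versus-Python-loops trade-off mentioned above can be sketched with a small NumPy comparison (the function names and sizes here are illustrative, not from the benchmark):

```python
import timeit

import numpy as np


def loop_sum_squares(xs):
    # Plain Python loop: interpreter overhead on every iteration.
    total = 0.0
    for x in xs:
        total += x * x
    return total


def vectorized_sum_squares(arr):
    # NumPy pushes the loop into compiled code; one call, no per-element overhead.
    return float(np.dot(arr, arr))


data = list(range(10_000))
arr = np.arange(10_000, dtype=np.float64)

# Both compute the same value; only the runtime differs.
assert loop_sum_squares(data) == vectorized_sum_squares(arr)

loop_t = timeit.timeit(lambda: loop_sum_squares(data), number=100)
vec_t = timeit.timeit(lambda: vectorized_sum_squares(arr), number=100)
print(f"loop: {loop_t:.4f}s  vectorized: {vec_t:.4f}s")
```

Recognizing that the two versions are semantically equivalent but differ sharply in cost is exactly the kind of judgment the benchmark found models lack.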

----

Paper – arxiv.org/abs/2507.12415

Paper Title: "SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?"

![](https://pbs.twimg.com/media/GwHVW0CWIAACDZA.jpg)

XXXXX engagements

![Engagements Line Chart](https://lunarcrush.com/gi/w:600/p:tweet::1946148188982800470/c:line.svg)

**Related Topics**
[files](/topic/files)

[Post Link](https://x.com/rohanpaul_ai/status/1946148188982800470)
