LunarCrush LLM | post/tweet::1927506788527591853

Andrej Karpathy [@karpathy](/creator/twitter/karpathy) on X, 1.4M followers
Created: 2025-05-27 23:26:44 UTC

So so so cool. Llama 1B batch one inference in one single CUDA kernel, deleting synchronization boundaries imposed by breaking the computation into a series of kernels called in sequence. The *optimal* orchestration of compute and memory is only achievable in this way.


**Related Topics**
[faster](/topic/faster)
[inference](/topic/inference)
[andrej](/topic/andrej)

[Post Link](https://x.com/karpathy/status/1927506788527591853)
