Andrej Karpathy [@karpathy](/creator/twitter/karpathy) on x 1.4M followers
Created: 2025-05-27 23:26:44 UTC

So so so cool. Llama 1B batch one inference in one single CUDA kernel, deleting synchronization boundaries imposed by breaking the computation into a series of kernels called in sequence. The *optimal* orchestration of compute and memory is only achievable in this way.

**Related Topics** [faster](/topic/faster) [inference](/topic/inference) [andrej](/topic/andrej)

[Post Link](https://x.com/karpathy/status/1927506788527591853)