
![karpathy Avatar](https://lunarcrush.com/gi/w:24/cr:twitter::33836629.png) Andrej Karpathy [@karpathy](/creator/twitter/karpathy) on X · 1.4M followers
Created: 2025-02-27 01:31:24 UTC

This is interesting as a first large diffusion-based LLM.

Most of the LLMs you've been seeing are ~clones as far as the core modeling approach goes. They're all trained "autoregressively", i.e. predicting tokens from left to right. Diffusion is different - it doesn't go left to right, but all at once. You start with noise and gradually denoise into a token stream.
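For intuition, here is a toy sketch of the two decoding styles. It is purely illustrative: the "model" stand-ins just pick random tokens, and none of the names correspond to any real autoregressive or diffusion LLM's API.

```python
# Toy contrast between autoregressive decoding and diffusion-style denoising.
# Both "models" below are random stand-ins, not real language models.
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]
MASK = "<mask>"

def toy_predict_next(prefix):
    # Stand-in for an autoregressive LM: given the tokens so far,
    # produce one next token.
    return random.choice(VOCAB)

def toy_denoise(tokens):
    # Stand-in for one diffusion step: look at every position in parallel
    # and replace some of the noisy/masked tokens with predictions.
    return [random.choice(VOCAB) if t == MASK and random.random() < 0.5 else t
            for t in tokens]

# Autoregressive: build the sequence strictly left to right, one token at a time.
seq = []
for _ in range(5):
    seq.append(toy_predict_next(seq))
print("autoregressive:", seq)

# Diffusion: start from pure "noise" (all masks) and iteratively denoise
# the whole sequence at once until no masks remain.
seq = [MASK] * 5
while MASK in seq:
    seq = toy_denoise(seq)
print("diffusion:     ", seq)
```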

Most of the image / video generation AI tools actually work this way and use Diffusion, not Autoregression. It's only text (and sometimes audio!) that have resisted. So it's been a bit of a mystery to me and many others why, for some reason, text prefers Autoregression, but images/videos prefer Diffusion. This turns out to be a fairly deep rabbit hole that has to do with the distribution of information and noise and our own perception of them, in these domains. If you look close enough, a lot of interesting connections emerge between the two as well.

All that to say that this model has the potential to be different, and possibly showcase new, unique psychology, or new strengths and weaknesses. I encourage people to try it out!


XXXXXXX engagements

![Engagements Line Chart](https://lunarcrush.com/gi/w:600/p:tweet::1894923254864978091/c:line.svg)

**Related Topics**
[token](/topic/token)
[llm](/topic/llm)

[Post Link](https://x.com/karpathy/status/1894923254864978091)
