[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

# ![@mike64_t Avatar](https://lunarcrush.com/gi/w:26/cr:twitter::1581044963114209286.png) @mike64_t mike64_t

mike64_t posts on X about minecraft, theta, fps, and mib the most. They currently have XXXXX followers, and XX of their posts are still receiving attention, totaling XXXXX engagements in the last XX hours.

### Engagements: XXXXX [#](/creator/twitter::1581044963114209286/interactions)
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1581044963114209286/c:line/m:interactions.svg)

- X Week XXXXXXX -XX%
- X Month XXXXXXX +959%
- X Months XXXXXXX +2,551%
- X Year XXXXXXX +11%

### Mentions: XX [#](/creator/twitter::1581044963114209286/posts_active)
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1581044963114209286/c:line/m:posts_active.svg)

- X Months XX +143%
- X Year XX +86%

### Followers: XXXXX [#](/creator/twitter::1581044963114209286/followers)
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1581044963114209286/c:line/m:followers.svg)

- X Week XXXXX +17%
- X Month XXXXX +55%
- X Months XXXXX +165%
- X Year XXXXX +237%

### CreatorRank: XXXXXXX [#](/creator/twitter::1581044963114209286/influencer_rank)
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1581044963114209286/c:line/m:influencer_rank.svg)

### Social Influence [#](/creator/twitter::1581044963114209286/influence)
---

**Social category influence**
[currencies](/list/currencies)  XXXX% [technology brands](/list/technology-brands)  XXXX% [stocks](/list/stocks)  XXXX% [gaming](/list/gaming)  XXXX%

**Social topic influence**
[minecraft](/topic/minecraft) 6.45%, [theta](/topic/theta) 3.23%, [fps](/topic/fps) 3.23%, [mib](/topic/mib) 3.23%, [meta](/topic/meta) 3.23%, [$googl](/topic/$googl) 3.23%, [make your](/topic/make-your) 3.23%, [rl](/topic/rl) 3.23%, [wrt](/topic/wrt) 3.23%, [loop](/topic/loop) XXXX%

**Top accounts mentioned or mentioned by**
@samsja19 @agilejebrim @redmondai @kalomaze @dcower @scottjmaddox @3thanpetersen @chrszegedy @mike64t @xxshaurizardxx @kylemarieb @sebaaltonen @ar_douillard @hitysam @tinygrad @anushelangovan @infogulch @mikasenghaas @ffmpeg @primeintellect

**Top assets mentioned**
[Alphabet Inc Class A (GOOGL)](/topic/$googl)

### Top Social Posts [#](/creator/twitter::1581044963114209286/posts)
---
Top posts by engagements in the last XX hours

"SGD usually means you are performing some sequence of updates W_n+1 = W_n - theta_n(W_n). Note how theta is a function of the current weights. To build the gradients at a point in time it has to be a function of the weights. Therefore theta_n being computed implies W_n is already computed. If given some sequence of tokens x_1 x_2 . x_n we expect a given function f(x_1 x_2 . x_n) to compute n such steps that would imply realizing n such computations of theta and W. In the worst case we can make no assumptions about the form of the function computing theta_n(.). In this case these N"  
[X Link](https://x.com/mike64_t/status/1979535382065864858) [@mike64_t](/creator/x/mike64_t) 2025-10-18T13:09Z 5954 followers, XXX engagements
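The sequential-dependence argument in the post can be spelled out as an equation. This is a plain-SGD reading using the post's own symbols; \(\eta\) and \(L\) (step size and loss) are my notation, not the author's:

```latex
W_{n+1} = W_n - \theta_n(W_n),
\qquad
\theta_n(W) = \eta \, \nabla L(W)
```

Because each \(\theta_n\) must be evaluated at \(W_n\), computing \(n\) updates forces \(n\) sequential evaluations: without further assumptions about the form of \(\theta_n\), the steps cannot be collapsed or parallelized.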


"Deleted my previous response because it wasn't entirely accurate. The minimum viable architecture that realizes this update relation is linear attention as far as I can tell. Having said that yes under the paper's definitions you get low-rank block-local "fast-weight updates". Two things: - Increasing sequence length mainly adds more low-rank directions not "steps". If "inference time compute = more tokens" you are not actually running SGD w.r.t. to W_t-1 but W_0. - The important part would be to show that this ICL-loss actually manifests if you aren't synthetically inducing it by doing mse"  
[X Link](https://x.com/mike64_t/status/1979492090422948013) [@mike64_t](/creator/x/mike64_t) 2025-10-18T10:17Z 5954 followers, 2887 engagements
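The "low-rank fast-weight update" the post refers to can be illustrated with the rank-1 state update that linear attention realizes per token. This is a minimal sketch under my own naming and dimensions, not code from the post:

```python
import numpy as np

def linear_attention_step(S: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Advance the fast-weight state S by one token: S <- S + v k^T (a rank-1 update)."""
    return S + np.outer(v, k)

d = 4
S = np.zeros((d, d))          # fast-weight matrix, starts empty
rng = np.random.default_rng(0)
for _ in range(3):            # three tokens -> three rank-1 updates
    k = rng.normal(size=d)    # key for this token
    v = rng.normal(size=d)    # value for this token
    S = linear_attention_step(S, k, v)

# After n tokens, rank(S) <= n: longer sequences add low-rank
# directions to S, which is the post's point about sequence length.
print(np.linalg.matrix_rank(S))
```

Each token contributes one outer product, so more tokens mean more low-rank directions in `S` rather than more optimization "steps" on an evolving weight.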


"Perks of writing your training code by rolling your own tensorlib: XXX fps on a 4090 @ batch size XX pytorch impl reaches 1000 fps @ batch size XXX on a 8xH100 node And yet you can use Triton and Cutlass while shipping a X MiB executable while still supporting AMD"  
[X Link](https://x.com/mike64_t/status/1979893414507471343) [@mike64_t](/creator/x/mike64_t) 2025-10-19T12:52Z 5955 followers, 27.1K engagements


"@3thanPetersen Genie X is still strictly within what I'd define as a "walking simulator". rigid body/soft body dynamics factorio factories minecraft redstone etc. is what I mean. Simple rules with emergent properties that are *exploitable* via design"  
[X Link](https://x.com/mike64_t/status/1977866913595220145) [@mike64_t](/creator/x/mike64_t) 2025-10-13T22:39Z 5831 followers, XXX engagements


"Its true that the compounding non linear effect is lost. The ICL sgd could be argued to be caused by meta optimization but I think that evidence that this actually happens is shaky at best. It just happens to be expressible but that doesnt mean its the path of least resistance actually taken by the optimizer"  
[X Link](https://x.com/mike64_t/status/1979914937360347240) [@mike64_t](/creator/x/mike64_t) 2025-10-19T14:18Z 5954 followers, XX engagements


"just so you guys know the bottleneck for getting data out of Minecraft is literally FFmpeg and I can guarantee this model is both slower and worse than actually being ingame. The timer speed can be increased frame capture can happen in game at scaled rate and the thing that will make your system come to a crawl is FFmpeg and there's nothing you can do about it. You can't encode video at 1200 fps with current technology unless you're recording at like 240p or you have a 20PB SSD somewhere and decide to use avi"  
[X Link](https://x.com/mike64_t/status/1975737784557314378) [@mike64_t](/creator/x/mike64_t) 2025-10-08T01:39Z 5956 followers, 108.3K engagements
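The encoding-bottleneck claim can be sanity-checked with back-of-the-envelope arithmetic. The resolutions and the 3-bytes-per-pixel raw RGB assumption below are mine, chosen only to illustrate the data rates involved:

```python
def raw_gbps(width: int, height: int, fps: int) -> float:
    """Raw (unencoded) video data rate in GB/s, assuming 3 bytes per pixel."""
    return width * height * 3 * fps / 1e9

# 1080p at 1200 fps: roughly 7.5 GB/s of raw frames to encode or store.
print(f"1080p @ 1200 fps: {raw_gbps(1920, 1080, 1200):.2f} GB/s")

# 240p at 1200 fps is about 20x less, which is why dropping resolution
# (or writing a near-raw container like AVI) sidesteps the encoder.
print(f" 240p @ 1200 fps: {raw_gbps(426, 240, 1200):.2f} GB/s")
```

At ~7.5 GB/s sustained, either the encoder or the storage becomes the limiting factor long before the game loop does, consistent with the post's argument.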


"I'm fairly convinced RL will not get us to end-to-end implementation of huge projects. Codex still has zero smell of "anticipating the future". We will likely have to revisit pre-training on *long*-running agentic data before attempting RL again. And even if we do "get there" through RL whenever natively agentic models get their dose of RL they will make them utterly obsolete"  
[X Link](https://x.com/mike64_t/status/1977793685111845248) [@mike64_t](/creator/x/mike64_t) 2025-10-13T17:48Z 5954 followers, 117.3K engagements


"W.r.t world models it would be unwise to expect anything more than a fancy walking simulator with good graphics. A world model which cannot properly simulate interaction dynamics and complex emergent properties from simple rules is insufficient for exhaustive exploration. It was never about the graphics it was always about *systems*. And I think classical Physics Engines are close to the minimum program representation possible. It only gets slower and less consistent if you use neural techniques. There is no Free Lunch for Parallelism and Physics is inherently serial. You *cannot* expect a"  
[X Link](https://x.com/mike64_t/status/1977811478498640369) [@mike64_t](/creator/x/mike64_t) 2025-10-13T18:59Z 5952 followers, 13.3K engagements


"I think the notion that "Transformers do Gradient Descent in Context" is quite misleading. The least charitable interpretation of the statement is ofc. "does the forward pass contain its own backward pass" which is of course non-sensical and thus obviously false. It's of course very possible to expect state to be "advanced" w.r.t to new information and for simple problems where the architecture happens to express associative state updates but this is all heavily coincidental on input representations. What I would care a lot more about is *non-linear* updates to the context state i.e. a"  
[X Link](https://x.com/mike64_t/status/1979359743018963392) [@mike64_t](/creator/x/mike64_t) 2025-10-18T01:31Z 5954 followers, 12.5K engagements


"The intended meaning is of course a bit more subtle. The notion that there is some weaker sub-optimization process that emerges from the outer loop is plausible to me--but that has nothing to do with SGD"  
[X Link](https://x.com/mike64_t/status/1979364017383837838) [@mike64_t](/creator/x/mike64_t) 2025-10-18T01:48Z 5954 followers, 1324 engagements


"Unpopular opinion: the green button is good because it forces you to think about how you will deploy your application. If you actually have to reason about a run configuration as a primitive your future self will thank you as soon as you deploy that thing"  
[X Link](https://x.com/mike64_t/status/1980087251846758683) [@mike64_t](/creator/x/mike64_t) 2025-10-20T01:42Z 5954 followers, 2705 engagements
