
[@tambetm](/creator/twitter/tambetm)
"@karpathy Potential alternative paradigm for neural networks - do gradient descent at inference time. Instead of y=f(x) its f(x)=argmin_y L(xy). Here the loss itself is learned (possibly during sleep). Basically its energy based models that Yann LeCun has been promoting all the time"  
[X Link](https://x.com/tambetm/status/1979766621519372349) [@tambetm](/creator/x/tambetm) 2025-10-19T04:28Z XXX followers, XXX engagements
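
A minimal sketch of the argmin formulation (my illustration, not from the tweet; `energy_net` and the tensor sizes are assumed toy values): the learned loss L(x, y) is held fixed at inference time, and the output y is found by running gradient descent on it rather than by a single forward pass.

```python
import torch
import torch.nn as nn

# Learned loss/energy L(x, y): a scalar score over (input, candidate output) pairs.
# The 8-dim x and 4-dim y are arbitrary placeholders.
energy_net = nn.Sequential(nn.Linear(8 + 4, 64), nn.ReLU(), nn.Linear(64, 1))

def infer(x, steps=100, lr=0.1):
    """Inference as optimization: y = argmin_y L(x, y), found by gradient descent."""
    y = torch.zeros(x.shape[0], 4, requires_grad=True)    # arbitrary starting point
    opt = torch.optim.SGD([y], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        energy = energy_net(torch.cat([x, y], dim=-1)).sum()
        energy.backward()          # gradients w.r.t. y; the optimizer only updates y
        opt.step()
    return y.detach()

y_hat = infer(torch.randn(2, 8))
```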


"@karpathy Still the caveat of advetsarial examples applies - as long as the L(xy) or Q(sa) is neural network you can find adversarial example that minimizes or maximizes it"  
[X Link](https://x.com/tambetm/status/1979771860666786108) [@tambetm](/creator/x/tambetm) 2025-10-19T04:49Z XXX followers, XX engagements
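
The caveat can be made concrete with a small sketch (my construction, not from the thread; `q_net` and its sizes are assumed): the same gradient procedure that does inference can just as easily exploit spurious peaks of the learned network.

```python
import torch
import torch.nn as nn

# A learned critic Q(s, a); architecture and sizes are arbitrary placeholders.
q_net = nn.Sequential(nn.Linear(6 + 2, 32), nn.ReLU(), nn.Linear(32, 1))

s = torch.randn(1, 6)
a = torch.zeros(1, 2, requires_grad=True)
opt = torch.optim.Adam([a], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    (-q_net(torch.cat([s, a], dim=-1)).sum()).backward()  # maximize Q by minimizing -Q
    opt.step()
# `a` now maximizes the learned Q, but nothing constrains it to be a sensible action:
# the optimizer is free to exploit whatever spurious maxima the network happens to have.
```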


"@karpathy It is often overlooked that the key component of RL is credit assignment and that is usually implemented as naive exponential discounting. We need more sophisticated possibly learned credit assignment. Updating system prompt is one example of that but limited to LLMs"  
[X Link](https://x.com/tambetm/status/1979772943870906492) [@tambetm](/creator/x/tambetm) 2025-10-19T04:53Z XXX followers, XX engagements
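
For reference, the "naive exponential discounting" being criticized is the standard discounted-return computation (a minimal sketch with toy numbers, added here for context):

```python
def discounted_returns(rewards, gamma=0.99):
    """Naive exponential discounting: a reward at step t+k is credited to the
    action at step t with weight gamma**k, whether or not that action caused it."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

print(discounted_returns([0.0, 0.0, 1.0]))  # [0.9801, 0.99, 1.0]
```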


"@karpathy Alternative (and maybe more logical) formulation for RL: pi(s)=argmax_a Q(sa). Here Q(sa) can be learned off-policy possibly by observing others"  
[X Link](https://x.com/tambetm/status/1979771733516402710) [@tambetm](/creator/x/tambetm) 2025-10-19T04:48Z XXX followers, XX engagements
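
A minimal sketch of that formulation (my illustration; the tabular Q and toy transitions are assumptions, standing in for a learned critic): the policy is just the argmax over a value function, and the value function is updated off-policy.

```python
from collections import defaultdict

Q = defaultdict(float)          # tabular Q(s, a), a toy stand-in for a learned critic
ACTIONS = [0, 1]

def policy(s):
    # pi(s) = argmax_a Q(s, a)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy target: the max over actions, not the action actually taken next,
    # so transitions can come from any behavior policy (including watching others).
    target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

q_update(s=0, a=1, r=1.0, s_next=1)
print(policy(0))   # -> 1 after the single update above
```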


"@karpathy Maybe slightly better formalism is to also learn guess function yhat = g(x) that provides the starting point for gradient descent. It is common practice to provide coarse-grained first guess and fine-tune it with search e.g. gradient descent"  
[X Link](https://x.com/tambetm/status/1979945591850377446) [@tambetm](/creator/x/tambetm) 2025-10-19T16:19Z XXX followers, XX engagements
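
Extending the earlier inference-as-optimization sketch (again my illustration with assumed toy networks `guess_net` and `energy_net`): the learned guess g(x) initializes y, and a few gradient steps on the learned loss refine it.

```python
import torch
import torch.nn as nn

# Assumed toy networks: g(x) produces a coarse guess, L(x, y) scores candidates.
guess_net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
energy_net = nn.Sequential(nn.Linear(8 + 4, 32), nn.ReLU(), nn.Linear(32, 1))

def infer_with_guess(x, steps=20, lr=0.05):
    y = guess_net(x).detach().requires_grad_(True)   # coarse first guess yhat = g(x)
    opt = torch.optim.SGD([y], lr=lr)
    for _ in range(steps):                           # fine-tune the guess by descent on L
        opt.zero_grad()
        energy_net(torch.cat([x, y], dim=-1)).sum().backward()
        opt.step()
    return y.detach()

y_hat = infer_with_guess(torch.randn(2, 8))
```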


"@karpathy Why it might be a good idea: - STDP that implements gradient descent in the brain also works at inference time - loss function might be easier to express than the constructive function similar to how verifying the solution is easier than constructing it with NP-complete problems"  
[X Link](https://x.com/tambetm/status/1979768198892601484) [@tambetm](/creator/x/tambetm) 2025-10-19T04:34Z XXX followers, XX engagements


"@karpathy The context of the thread was the claim in the interview that sometimes in-context learning performs gradient descent. Then why not make gradient the first-class citizen of inference then And then you naturally arrive at the energy-based models"  
[X Link](https://x.com/tambetm/status/1979943973855637818) [@tambetm](/creator/x/tambetm) 2025-10-19T16:13Z XXX followers, XX engagements


"@karpathy For example: - evolution provides starting point for brain connectivity and life experience takes it from there - in AlphaGo policy provides probable moves but MCTS chooses the best move - network predicts initial trajectory for self-driving car but optimization ensures safety"  
[X Link](https://x.com/tambetm/status/1979947030597582965) [@tambetm](/creator/x/tambetm) 2025-10-19T16:25Z XXX followers, XX engagements
