[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

[@jesse_hoogland](/creator/twitter/jesse_hoogland)
"How does training data shape model behavior Well its complicated 1/10"  
[X Link](https://x.com/jesse_hoogland/status/1981034189282742780) [@jesse_hoogland](/creator/x/jesse_hoogland) 2025-10-22T16:25Z 1857 followers, 92.7K engagements


"Unfortunately classical influence functions are fundamentally limited: X. Theoretically IFs assume a unique isolated global minimum. This is never true for NNs. X. Practically the Hessian dependency poses a severe memory bottleneck that explodes with model size. 3/10"  
[X Link](https://x.com/jesse_hoogland/status/1981034195163185376) [@jesse_hoogland](/creator/x/jesse_hoogland) 2025-10-22T16:25Z 1857 followers, 3946 engagements
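
For reference, the classical influence function in its standard first-order form (notation mine, stated here for context, not quoted from the thread) estimates the effect of upweighting a training example $z_m$ on the loss at a query $z_n$ via the inverse Hessian at the trained parameters $\hat w$:

```latex
\mathcal{I}(z_m, z_n) = -\,\nabla_w \ell(z_n, \hat w)^{\top}\, H_{\hat w}^{-1}\, \nabla_w \ell(z_m, \hat w),
\qquad
H_{\hat w} = \nabla_w^2 \,\frac{1}{n}\sum_{i=1}^{n} \ell(z_i, \hat w).
```

Both limitations named above are visible here: $H_{\hat w}^{-1}$ exists only at an isolated, non-degenerate minimum, and storing or inverting $H_{\hat w}$ scales at least quadratically in parameter count.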


"Still theres hope. The Bayesian influence function (BIF) addresses both issues. The idea: X. Study influence not on a single minimum but on the distribution of low-loss solutions. X. Skip the Hessian inversion. Compute covariance over this distribution. 4/10"  
[X Link](https://x.com/jesse_hoogland/status/1981034198191513978) [@jesse_hoogland](/creator/x/jesse_hoogland) 2025-10-22T16:25Z 1857 followers, 3573 engagements
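
The "covariance over this distribution" idea can be sketched concretely: given per-example losses recorded at each posterior draw, influence is the sample covariance between a training example's loss and a query example's loss. A minimal numpy sketch; the function name and array shapes are my assumptions, not the paper's API:

```python
import numpy as np

def bayesian_influence(train_losses, query_losses):
    """Estimate Bayesian influence as the covariance of per-example losses
    over draws from the (local) posterior.

    train_losses: (n_draws, n_train) losses of training examples per draw.
    query_losses: (n_draws, n_query) losses of query examples per draw.
    Returns an (n_train, n_query) influence matrix.
    """
    t = train_losses - train_losses.mean(axis=0, keepdims=True)
    q = query_losses - query_losses.mean(axis=0, keepdims=True)
    # Unbiased sample covariance between each train/query loss pair.
    return t.T @ q / (train_losses.shape[0] - 1)
```

Note there is no Hessian anywhere: the estimator needs only forward-pass losses at each draw, which is what makes it batchable.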


"In our new paper we solve both problems by introducing: X. A local version of the BIF that applies to individual NN checkpoints. X. A scalable stochastic-gradient MCMC estimator. 6/10"  
[X Link](https://x.com/jesse_hoogland/status/1981034202712994038) [@jesse_hoogland](/creator/x/jesse_hoogland) 2025-10-22T16:25Z 1857 followers, 2571 engagements
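
The thread doesn't spell out the SGMCMC estimator, but one standard choice is stochastic-gradient Langevin dynamics with a quadratic localization term that keeps samples near the given checkpoint. A hypothetical sketch; `eps`, `gamma`, and `beta` are illustrative hyperparameters of my own, not the paper's settings:

```python
import numpy as np

def sgld_local_samples(grad_loss, w0, n_steps, eps=0.1, gamma=1.0, beta=1.0, seed=0):
    """Draw parameter samples near checkpoint w0 via localized SGLD.

    grad_loss: function w -> gradient of the (mini-batch) loss at w.
    The gamma * (w - w0) term restrains the chain to the checkpoint's
    neighborhood, giving a *local* posterior rather than a global one.
    """
    rng = np.random.default_rng(seed)
    w = w0.copy()
    draws = []
    for _ in range(n_steps):
        drift = beta * grad_loss(w) + gamma * (w - w0)
        # Langevin step: half-step gradient descent plus Gaussian noise.
        w = w - 0.5 * eps * drift + rng.normal(0.0, np.sqrt(eps), size=w.shape)
        draws.append(w.copy())
    return np.array(draws)
```

Per-example losses evaluated along these draws are the inputs the covariance estimator needs.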


"To validate the BIF we test it on a standard retraining benchmark via the Linear Datamodeling Score (LDS). We find that it is competitive with leading IF-based approximations especially in the small-dataset regime. 8/10"  
[X Link](https://x.com/jesse_hoogland/status/1981034208601751825) [@jesse_hoogland](/creator/x/jesse_hoogland) 2025-10-22T16:25Z 1855 followers, 2139 engagements
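
As I understand the LDS setup, influence scores are judged by how well their sums linearly predict a query's loss after retraining on random training subsets, measured by rank correlation. A minimal sketch with my own function names, not the benchmark's code:

```python
import numpy as np

def linear_datamodeling_score(influence, subsets, retrained_losses):
    """Spearman correlation between additive influence predictions and
    actual retrained losses.

    influence: (n_train,) estimated influence of each training example.
    subsets: (n_subsets, n_train) boolean masks of retraining subsets.
    retrained_losses: (n_subsets,) query loss after retraining on each subset.
    """
    preds = subsets.astype(float) @ influence  # additive prediction per subset

    def rank(x):
        r = np.empty(len(x))
        r[np.argsort(x)] = np.arange(len(x))
        return r

    a, b = rank(preds), rank(retrained_losses)
    a -= a.mean()
    b -= b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))
```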


"At first glance this looks like a step backwards: computing a covariance over the full Bayesian posterior is much more intractable than computing the Hessian And we typically care about influence for a specific checkpoint not aggregated over all possible solutions. 5/10"  
[X Link](https://x.com/jesse_hoogland/status/1981034200720675197) [@jesse_hoogland](/creator/x/jesse_hoogland) 2025-10-22T16:25Z 1856 followers, 2761 engagements


"The local BIF bypasses the Hessian bottleneck and is well-defined even for degenerate models. It can be batched and scales to billions of parameters. One of the best perks is that we get fine-grained per-token influence functions for no extra compute cost. 7/10"  
[X Link](https://x.com/jesse_hoogland/status/1981034205761962398) [@jesse_hoogland](/creator/x/jesse_hoogland) 2025-10-22T16:25Z 1856 followers, 2313 engagements
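
The per-token claim follows naturally from the covariance form: a forward pass already yields per-token losses, so recording them at each draw gives token-level influence with the same estimator. A hypothetical sketch, shapes and names mine:

```python
import numpy as np

def per_token_influence(train_loss, token_losses):
    """Influence of one training example on each token of a query sequence.

    train_loss: (n_draws,) loss of the training example at each posterior draw.
    token_losses: (n_draws, seq_len) per-token losses of the query sequence.
    Returns a (seq_len,) vector of token-level influence scores.
    """
    t = train_loss - train_loss.mean()
    q = token_losses - token_losses.mean(axis=0, keepdims=True)
    # Same covariance estimator as the example-level BIF, just finer-grained.
    return t @ q / (len(train_loss) - 1)
```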


"There are caveats: the BIF exhibits worse scaling with dataset size were still in the early days of understanding the role of SGMCMC hyperparameters and generally more investigation is needed But we see straightforward ways to make progress on these problems. 9/10"  
[X Link](https://x.com/jesse_hoogland/status/1981034210774155483) [@jesse_hoogland](/creator/x/jesse_hoogland) 2025-10-22T16:25Z 1856 followers, 2303 engagements


"@BhattGantavya @bfpill @FurmanZach Thank you Gantavya We have a library for MCMC sampling which is easy to adapt for the BIF (you just need to compute forward passes over reference samples at each MCMC draw). A native BIF integration is on the roadmap"  
[X Link](https://x.com/jesse_hoogland/status/1981160047129612394) [@jesse_hoogland](/creator/x/jesse_hoogland) 2025-10-23T00:45Z 1854 followers, XX engagements
