LunarCrush LLM | post/tweet::1919389344617414824

[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

![scaling01 Avatar](https://lunarcrush.com/gi/w:24/cr:twitter::1825243643529027584.png) Lisan al Gaib [@scaling01](/creator/twitter/scaling01) on x 17.5K followers
Created: 2025-05-05 13:50:54 UTC

I'm back and Gemini XXX Pro is still the king (no glaze)

I did some more manual data cleaning and scrapped the shitty "average scaled score" and replaced it with the Glicko-2 rating system, with params:
INITIAL_RATING = 1500
INITIAL_RD     = XXX
INITIAL_VOL    = XXXX
TAU (τ)        = XXX
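
For readers unfamiliar with Glicko-2, here is a minimal sketch of one rating-period update following Glickman's published algorithm. The post's actual parameter values are scrambled above, so the numbers in the usage lines are those of Glickman's own worked example, not the author's settings. How benchmark results were converted into pairwise wins and losses is not stated in the post; the `results` format and the function name `glicko2_update` are assumptions for illustration.

```python
import math

SCALE = 173.7178  # conversion factor between rating points and the internal Glicko-2 scale


def glicko2_update(rating, rd, vol, results, tau, eps=1e-6):
    """One Glicko-2 rating-period update for a single player.

    results: list of (opponent_rating, opponent_rd, score), score in {0, 0.5, 1}.
    Returns (new_rating, new_rd, new_volatility).
    """
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not results:
        # No games in the period: only the rating deviation grows.
        return rating, SCALE * math.sqrt(phi ** 2 + vol ** 2), vol

    v_inv, score_sum = 0.0, 0.0
    for opp_rating, opp_rd, score in results:
        mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
        g = 1.0 / math.sqrt(1.0 + 3.0 * phi_j ** 2 / math.pi ** 2)
        e = 1.0 / (1.0 + math.exp(-g * (mu - mu_j)))  # expected score
        v_inv += g * g * e * (1.0 - e)
        score_sum += g * (score - e)
    v = 1.0 / v_inv            # estimated variance from game outcomes
    delta = v * score_sum      # estimated rating improvement

    # Solve for the new volatility with the Illinois variant of regula falsi.
    a = math.log(vol ** 2)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta ** 2 - phi ** 2 - v - ex)
        return num / (2.0 * (phi ** 2 + v + ex) ** 2) - (x - a) / tau ** 2

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_vol = math.exp(A / 2.0)

    # Update the deviation and rating, then convert back to rating points.
    phi_star = math.sqrt(phi ** 2 + new_vol ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * score_sum
    return SCALE * new_mu + 1500.0, SCALE * new_phi, new_vol


# Glickman's worked example (NOT the post's scrambled parameters):
# a 1500-rated player with RD 200 and volatility 0.06, tau = 0.5,
# who wins once and loses twice.
example = glicko2_update(
    1500.0, 200.0, 0.06,
    [(1400.0, 30.0, 1.0), (1550.0, 100.0, 0.0), (1700.0, 300.0, 0.0)],
    tau=0.5,
)
```

On Glickman's example this yields a rating near 1464 with an RD near 151.5, and the volatility stays just under 0.06.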

Furthermore, I increased the minimum number of appearances from X to XX benchmarks to make it more stable.
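
A minimum-appearance filter of this kind could be sketched as follows. The row layout `(model, benchmark, score)`, the function name, and the threshold of 2 in the usage lines are all hypothetical; the post's real threshold is scrambled and its data format is not shown.

```python
from collections import Counter


def filter_by_min_appearances(rows, min_appearances):
    """Keep only rows for models that appear in at least `min_appearances`
    distinct benchmarks. `rows` is a list of (model, benchmark, score)."""
    seen = set()
    counts = Counter()
    for model, benchmark, _score in rows:
        # Count each (model, benchmark) pair once, even if it is listed twice.
        if (model, benchmark) not in seen:
            seen.add((model, benchmark))
            counts[model] += 1
    return [row for row in rows if counts[row[0]] >= min_appearances]


# Hypothetical data: "model-b" appears in a single benchmark and is dropped.
rows = [
    ("model-a", "bench-1", 0.9),
    ("model-a", "bench-2", 0.8),
    ("model-b", "bench-1", 0.7),
]
kept = filter_by_min_appearances(rows, min_appearances=2)
```

Raising the threshold trades coverage for stability: models with few results have wide rating uncertainty, so excluding them removes the noisiest entries from the ranking.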

The labels show the lower XX% ratings (a conservative lower skill estimate) and in brackets the number of benchmarks the model appeared in.
Below this post I attached the full table with mu, sigma, lower XX% ratings and number of appearances.
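
One common way to get a conservative lower estimate from mu and sigma is to subtract a multiple of sigma. The sketch below assumes the rating is treated as normally distributed and that the label is the lower end of a central interval; the post's actual percentage is scrambled, and whether the author used a one-sided or two-sided bound is not stated, so the 0.95 level and the table rows here are purely illustrative.

```python
from statistics import NormalDist


def conservative_lower_bound(mu, sigma, level):
    """Lower end of the central `level` interval of Normal(mu, sigma).
    `level` must lie strictly between 0 and 1."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mu - z * sigma


# Hypothetical rows: (model, mu, sigma, number of benchmark appearances).
table = [
    ("model-a", 1600.0, 80.0, 12),
    ("model-b", 1650.0, 150.0, 10),
]

# Rank by the lower bound rather than by mu, so uncertain ratings are penalised.
ranked = sorted(
    table,
    key=lambda row: conservative_lower_bound(row[1], row[2], 0.95),
    reverse=True,
)
```

In this made-up table, "model-b" has the higher mu but also the larger sigma, so "model-a" ranks first once the bound is applied. That is the sense in which the lower rating is a conservative skill estimate.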

![](https://pbs.twimg.com/media/GqMDTNOW0AAhvaa.png)

XXXXXX engagements

![Engagements Line Chart](https://lunarcrush.com/gi/w:600/p:tweet::1919389344617414824/c:line.svg)

**Related Topics**
[llm](/topic/llm)
[tau](/topic/tau)
[cleaning](/topic/cleaning)
[gaib](/topic/gaib)

[Post Link](https://x.com/scaling01/status/1919389344617414824)
