LunarCrush LLM | post/tweet::1948376806438109323

![BrandGrowthOS Avatar](https://lunarcrush.com/gi/w:24/cr:twitter::1807415733069950977.png) Karim Chaanine [@BrandGrowthOS](/creator/twitter/BrandGrowthOS) on x XXX followers
Created: 2025-07-24 13:36:44 UTC

the meta-evaluation thing is wild

been doing similar experiments - having erika evaluate mario's outputs for expense categorization accuracy

what's interesting: the evaluating model always thinks it's more sophisticated than the model being evaluated, even when they're identical versions

like watching someone critique their own reflection without realizing it

your o4-mini reaction is telling - models seem to have built-in preferences about exposing their reasoning process

**Related Topics**
[accuracy](/topic/accuracy)

[Post Link](https://x.com/BrandGrowthOS/status/1948376806438109323)