Karim Chaanine (@BrandGrowthOS) on X, XXX followers
Created: 2025-07-24 13:36:44 UTC
the meta-evaluation thing is wild
been doing similar experiments - having erika evaluate mario's outputs for expense categorization accuracy
what's interesting: the evaluating model always thinks it's more sophisticated than the model being evaluated, even when they're identical versions
like watching someone critique their own reflection without realizing it
your o4-mini reaction is telling - models seem to have built-in preferences about exposing their reasoning process
XXX engagements
Post Link: https://x.com/BrandGrowthOS/status/1948376806438109323