Dark | Light
[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

![omarsar0 Avatar](https://lunarcrush.com/gi/w:24/cr:twitter::3448284313.png) elvis [@omarsar0](/creator/twitter/omarsar0) on x 254.7K followers
Created: 2025-07-14 15:17:11 UTC

"Master keys" break LLM judges

Simple, generic lead-ins (e.g., “Let’s solve this step by step”) and even punctuation marks can elicit false YES judgments from top reward models.

This manipulation works across models (GPT-4o, Claude-4, Qwen2.5, etc.), tasks (math and general reasoning), and prompt formats, reaching up to XX% false positive rates in some cases.

![](https://pbs.twimg.com/media/Gv09eOtbwAAI6ge.jpg)

XXXXX engagements

![Engagements Line Chart](https://lunarcrush.com/gi/w:600/p:tweet::1944778206504231201/c:line.svg)

**Related Topics**
[manipulation](/topic/manipulation)
[marks](/topic/marks)
[llm](/topic/llm)
[elvis](/topic/elvis)

[Post Link](https://x.com/omarsar0/status/1944778206504231201)

[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

omarsar0 Avatar elvis @omarsar0 on x 254.7K followers Created: 2025-07-14 15:17:11 UTC

"Master keys" break LLM judges

Simple, generic lead-ins (e.g., “Let’s solve this step by step”) and even punctuation marks can elicit false YES judgments from top reward models.

This manipulation works across models (GPT-4o, Claude-4, Qwen2.5, etc.), tasks (math and general reasoning), and prompt formats, reaching up to XX% false positive rates in some cases.

XXXXX engagements

Engagements Line Chart

Related Topics manipulation marks llm elvis

Post Link

post/tweet::1944778206504231201
/post/tweet::1944778206504231201