# ![@Silver_Raspberry_811 Avatar](https://lunarcrush.com/gi/w:26/cr:reddit::t2_d5mdd90q.png) @Silver_Raspberry_811 Silver_Raspberry_811

Silver_Raspberry_811 posts on Reddit most often about anthropic, ai, multivac, and agi. They currently have [--] followers and [--] posts still getting attention, totaling [--] engagements in the last [--] hours.

### Engagements: [--] [#](/creator/reddit::t2_d5mdd90q/interactions)
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:reddit::t2_d5mdd90q/c:line/m:interactions.svg)

- [--] Week [-----] -13%
- [--] Month [-----] +10,008%
- [--] Months [-----] +47,862%

### Mentions: [--] [#](/creator/reddit::t2_d5mdd90q/posts_active)
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:reddit::t2_d5mdd90q/c:line/m:posts_active.svg)

- [--] Month [--] +714%
- [--] Months [--] +3,350%

### Followers: [--] [#](/creator/reddit::t2_d5mdd90q/followers)
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:reddit::t2_d5mdd90q/c:line/m:followers.svg)


### CreatorRank: undefined [#](/creator/reddit::t2_d5mdd90q/influencer_rank)
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:reddit::t2_d5mdd90q/c:line/m:influencer_rank.svg)

### Social Influence

**Social category influence**
[technology brands](/list/technology-brands)  [cryptocurrencies](/list/cryptocurrencies)  [stocks](/list/stocks) 

**Social topic influence**
[anthropic](/topic/anthropic), [ai](/topic/ai), [multivac](/topic/multivac) #8, [agi](/topic/agi), [llm](/topic/llm), [media](/topic/media), [eval](/topic/eval), [closed](/topic/closed), [demo](/topic/demo), [beat](/topic/beat)

**Top assets mentioned**
[MultiVAC (MTV)](/topic/multivac)

### Top Social Posts
Top posts by engagements in the last [--] hours

"Neural Quantization Toolkit 🚀 Excited to share: Neural Quantization Toolkit, achieving 2% performance degradation with [--] compression. 📊 Results preview: [---] compression ratio (target: 3.5) ✅ 1.8% avg. degradation (target: 2%) ✅ [---] inference speedup ✅ [--] languages validated (86.7% success rate) 🌍 Currently a research preview with a working demo; full implementation coming Q1 [----]. 🤝 Seeking collaborators for: GPTQ core implementation, Marlin kernel optimization, cross-lingual evaluation, and edge deployment tools. Try the demo & join the mission to democratize efficient AI #AI"  
[Reddit Link](https://redd.it/1mu3k05)  2025-08-19T00:28Z [--] followers, [--] engagements


"Quantization breakthrough: 4x compression with 2% performance loss - looking for testers" (removed, r/LocalLLaMA)  
[Reddit Link](https://redd.it/1mu3q2c)  2025-08-19T00:36Z [--] followers, [--] engagements


"Open source model quantization library achieving 4x compression - looking for contributors" (removed, r/programming)  
[Reddit Link](https://redd.it/1mu3ump)  2025-08-19T00:42Z [--] followers, [--] engagements


"The Evolution of Gaussian Splatting: From 3D to 5D - What's Your Take on Its Impact Across Fields? Just watched the excellent "3D Gaussian Splatting: Past, Present and Future" lecture by George from TUM, and it got me thinking about the broader trajectory of this technique. Quick primer from first principles: Gaussian Splatting fundamentally reimagines 3D representation by using anisotropic 3D Gaussians as primitives instead of meshes or voxels. Each Gaussian is defined by its position (μ), covariance (Σ), opacity (α), and spherical harmonics coefficients for view-dependent color. The key insight is that"  
[Reddit Link](https://redd.it/1mzodse)  2025-08-25T11:53Z [--] followers, [--] engagements


"We tested [--] frontier models on a production coding task: the scores weren't the interesting part. The 5-point judge disagreement was." (r/LLM)  
[Reddit Link](https://redd.it/1qhr2dz)  2026-01-20T04:19Z [--] followers, [--] engagements


"Claude Opus [---] takes 4th in media bias analysis: here's what it did differently" (r/ClaudeAI)  
[Reddit Link](https://redd.it/1qpsq41)  2026-01-29T00:15Z [--] followers, [--] engagements


"What happens when you fine-tune for law and then test on media analysis? Blind peer eval results" (r/OpenSourceeAI)  
[Reddit Link](https://redd.it/1qpsqxm)  2026-01-29T00:12Z [--] followers, [--] engagements


"I've run 1100+ blind evaluations across 20+ models: here's where Claude actually excels (and where it doesn't)" (r/Anthropic)  
[Reddit Link](https://redd.it/1qpuaxl)  2026-01-29T04:57Z [--] followers, [--] engagements


"Claude Opus [---] wins instruction following test at 7.42/10, but still failed the lipogram" (r/Anthropic)  
[Reddit Link](https://redd.it/1qpub7p)  2026-01-29T04:57Z [--] followers, [--] engagements


"I built an AI fall detection system for elderly care - looking for feedback" (r/computervision)  
[Reddit Link](https://redd.it/1oh9eaa)  2025-10-27T09:00Z [--] followers, [---] engagements


"Removed by moderator" (r/ChatGPTPro)  
[Reddit Link](https://redd.it/1qcxpn9)  2026-01-14T20:07Z [--] followers, [--] engagements


"Claude Opus [---] won both coding and reasoning evals this week and was also the strictest judge" (r/ClaudeAI)  
[Reddit Link](https://redd.it/1qcxsqk)  2026-01-14T20:09Z [--] followers, [--] engagements


"[D] Peer matrix evaluation: [--] frontier models judge each other's responses to eliminate single-evaluator bias. Results from async debugging and probability reasoning tasks." (r/MachineLearning)  
[Reddit Link](https://redd.it/1qcxytb)  2026-01-14T20:13Z [--] followers, [--] engagements


"Built a peer evaluation system where [--] LLMs judge each other (100 judgments/question). Early data shows 2-point spread in judge harshness. Looking for technical feedback." (r/LLMDevs)  
[Reddit Link](https://redd.it/1qcyhlp)  2026-01-14T20:33Z [--] followers, [--] engagements


"Can this peer evaluation methodology work with local models? Testing [--] frontier APIs now, want to adapt for local deployment." (r/LocalLLM)  
[Reddit Link](https://redd.it/1qcyiox)  2026-01-14T20:35Z [--] followers, [--] engagements


"Can this peer evaluation methodology work with local models? Testing [--] frontier APIs now, want to adapt for local deployment." (r/FunMachineLearning)  
[Reddit Link](https://redd.it/1qd3j58)  2026-01-14T23:55Z [--] followers, [--] engagements


"Removed by moderator" (r/OpenSourceeAI)  
[Reddit Link](https://redd.it/1qetkxz)  2026-01-16T22:02Z [--] followers, [--] engagements


"Removed by moderator" (r/BlackboxAI_)  
[Reddit Link](https://redd.it/1qfo4om)  2026-01-17T20:40Z [--] followers, [--] engagements


"We tested [--] frontier models on a production coding task: the scores weren't the interesting part. The 5-point judge disagreement was." (r/BlackboxAI_)  
[Reddit Link](https://redd.it/1qhr068)  2026-01-20T04:21Z [--] followers, [--] engagements


"Benchmark analysis: GPT-OSS-120B's judgment behavior vs. performance - interesting patterns" (r/ChatGPTPro)  
[Reddit Link](https://redd.it/1qkcpow)  2026-01-23T01:11Z [--] followers, [--] engagements


"I've run 1100+ blind evaluations across 20+ models: here's where Claude actually excels (and where it doesn't)" (r/ClaudeAI)  
[Reddit Link](https://redd.it/1qpn4jj)  2026-01-28T20:36Z [--] followers, [--] engagements


"Claude Opus [---] ranks #4 in epistemic calibration: here's exactly what it said" (r/Anthropic)  
[Reddit Link](https://redd.it/1qpubfn)  2026-01-29T04:57Z [--] followers, [--] engagements


"We tested [--] frontier models on a production coding task: the scores weren't the interesting part. The 5-point judge disagreement was." (r/Anthropic)  
[Reddit Link](https://redd.it/1qpucnk)  2026-01-29T02:20Z [--] followers, [--] engagements


"Olmo [---] 32B Think beats Claude Opus [---], Sonnet [---], Grok [--], DeepSeek V3.2 on constraint satisfaction reasoning" (r/LocalLLaMA)  
[Reddit Link](https://redd.it/1qisu0u)  2026-01-21T08:51Z [--] followers, [--] engagements


"Mistral Small Creative takes #1 in communication benchmark, beats Claude Opus [---] and proprietary giants" (r/OpenSourceeAI)  
[Reddit Link](https://redd.it/1qkcwdi)  2026-01-23T01:17Z [--] followers, [--] engagements


"Epistemic calibration benchmark: full judgment matrix + DeepSeek/MiMo raw responses" (r/LocalLLaMA)  
[Reddit Link](https://redd.it/1qlayyw)  2026-01-24T02:40Z [--] followers, [--] engagements


"Media bias analysis: legal-trained open model beats Claude and Gemini in blind peer eval" (r/LocalLLM)  
[Reddit Link](https://redd.it/1qpslr2)  2026-01-29T00:06Z [--] followers, [--] engagements


"Media bias analysis: legal-trained open model beats Claude and Gemini in blind peer eval" (r/LocalLLaMA)  
[Reddit Link](https://redd.it/1qpsp4n)  2026-01-29T00:11Z [--] followers, [--] engagements


"Claude Opus [---] takes 4th in media bias analysis: here's what it did differently" (r/Anthropic)  
[Reddit Link](https://redd.it/1qpuap3)  2026-01-29T04:57Z [--] followers, [--] engagements


"Mistral Small Creative just beat Claude Opus [---], Sonnet [---] and GPT-OSS-120B on practical communication tasks" (r/Anthropic)  
[Reddit Link](https://redd.it/1qpubtm)  2026-01-29T01:54Z [--] followers, [--] engagements


"Analysis of Claude Opus [---] and Sonnet [---] performance on today's ML data quality evaluation: what the responses reveal" (r/Anthropic)  
[Reddit Link](https://redd.it/1qpuc2s)  2026-01-29T01:48Z [--] followers, [--] engagements


"Claude Opus [---] and Sonnet [---] underperformed on today's reasoning evaluation: thoughts on what happened" (r/Anthropic)  
[Reddit Link](https://redd.it/1qpucaw)  2026-01-29T01:35Z [--] followers, [--] engagements


"I made [--] frontier LLMs judge each other's code debugging: Claude Opus [---] won by [----] points over o1, GPT-4o came 9th" (r/LocalLLaMA)  
[Reddit Link](https://redd.it/1qcxib4)  2026-01-14T19:58Z [--] followers, [--] engagements


"Mistral Small Creative beats Claude Opus [---] at explaining transformers: 50x cheaper, higher scores (Multivac Daily Evaluation)" (r/LocalLLaMA)  
[Reddit Link](https://redd.it/1qetjbv)  2026-01-16T22:02Z [--] followers, [--] engagements


"Mistral Small Creative beats Claude Opus [---] at explaining transformers: 50x cheaper, higher scores (Multivac Daily Evaluation)" (r/BlackboxAI_)  
[Reddit Link](https://redd.it/1qetk75)  2026-01-16T22:02Z [--] followers, [--] engagements


"Mistral Small Creative beats Claude Opus [---] at explaining transformers: 50x cheaper, higher scores (Multivac Daily Evaluation)" (r/LLM)  
[Reddit Link](https://redd.it/1qetmr3)  2026-01-16T22:05Z [--] followers, [--] engagements


"We gave [--] frontier models a trick question. The honest ones scored lowest. Here's what that means for AI evaluation (Multivac Daily)." (r/LLM)  
[Reddit Link](https://redd.it/1qetnmi)  2026-01-16T22:07Z [--] followers, [--] engagements


"We gave [--] frontier models a trick question. The honest ones scored lowest. Here's what that means for AI evaluation (Multivac Daily)." (r/BlackboxAI_)  
[Reddit Link](https://redd.it/1qetoez)  2026-01-16T22:07Z [--] followers, [--] engagements


"We gave [--] frontier models a trick question. The honest ones scored lowest. Here's what that means for AI evaluation (Multivac Daily)." (r/LocalLLaMA)  
[Reddit Link](https://redd.it/1qetpiv)  2026-01-16T22:09Z [--] followers, [--] engagements


"We tested [--] AI models on epistemic honesty: can they correct you when you're wrong?" (r/LocalLLaMA)  
[Reddit Link](https://redd.it/1qfo41f)  2026-01-17T20:37Z [--] followers, [--] engagements


"We tested [--] AI models on epistemic honesty: can they correct you when you're wrong?" (r/OpenSourceeAI)  
[Reddit Link](https://redd.it/1qfo5nu)  2026-01-17T20:43Z [--] followers, [--] engagements


"We tested [--] frontier models on a production coding task: the scores weren't the interesting part. The 5-point judge disagreement was." (r/agi)  
[Reddit Link](https://redd.it/1qhqvpy)  2026-01-20T04:14Z [--] followers, [--] engagements


"We tested [--] frontier models on a production coding task: the scores weren't the interesting part. The 5-point judge disagreement was." (r/ClaudeAI)  
[Reddit Link](https://redd.it/1qhr1e4)  2026-01-20T04:29Z [--] followers, [--] engagements


"We tested [--] frontier models on a production coding task: the scores weren't the interesting part. The 5-point judge disagreement was." (r/OpenSourceeAI)  
[Reddit Link](https://redd.it/1qhr3yy)  2026-01-20T04:24Z [--] followers, [--] engagements


"Peer evaluation results: Reasoning capabilities across [--] frontier models, open source closing the gap" (r/agi)  
[Reddit Link](https://redd.it/1qisust)  2026-01-21T08:52Z [--] followers, [--] engagements


"Open source wins: Olmo [---] 32B outperforms Claude Opus [---], Sonnet [---], Grok [--] on reasoning evaluation" (r/OpenSourceeAI)  
[Reddit Link](https://redd.it/1qisvcd)  2026-01-21T08:56Z [--] followers, [---] engagements


"Claude Opus [---] and Sonnet [---] underperformed on today's reasoning evaluation: thoughts on what happened" (r/ClaudeAI)  
[Reddit Link](https://redd.it/1qisw7h)  2026-01-21T09:07Z [--] followers, [---] engagements


"Olmo [---] 32B Think: second place on hard reasoning, beating proprietary flagships" (r/LocalLLM)  
[Reddit Link](https://redd.it/1qiswy6)  2026-01-21T08:58Z [--] followers, [---] engagements


"GPT-OSS-120B takes 1st AND 4th on ML data quality analysis, beating Claude, Gemini, Grok" (r/LocalLLM)  
[Reddit Link](https://redd.it/1qjjg7c)  2026-01-22T03:29Z [--] followers, [--] engagements


"Same model, opposite results: Why task-specific evaluation matters for understanding AI capabilities" (r/agi)  
[Reddit Link](https://redd.it/1qjjh90)  2026-01-22T03:29Z [--] followers, [--] engagements


"Open source dominates: GPT-OSS-120B takes 1st AND 4th place on practical ML analysis, beating all proprietary flagships" (r/OpenSourceeAI)  
[Reddit Link](https://redd.it/1qjjiji)  2026-01-22T03:30Z [--] followers, [--] engagements


"Analysis of Claude Opus [---] and Sonnet [---] performance on today's ML data quality evaluation: what the responses reveal" (r/ClaudeAI)  
[Reddit Link](https://redd.it/1qjjk3e)  2026-01-22T03:35Z [--] followers, [--] engagements


"GPT-OSS-120B takes 1st AND 4th on ML data quality analysis, beating Claude, Gemini, Grok" (r/LocalLLaMA)  
[Reddit Link](https://redd.it/1qjjliq)  2026-01-22T03:31Z [--] followers, [--] engagements


"GPT-OSS-120B wins ML data quality analysis: full rankings, methodology and what made the difference" (r/LocalLLM)  
[Reddit Link](https://redd.it/1qjjmlg)  2026-01-22T03:36Z [--] followers, [--] engagements


"Mistral Small Creative just beat Claude Opus [---], Sonnet [---] and GPT-OSS-120B on practical communication tasks" (r/LocalLLaMA)  
[Reddit Link](https://redd.it/1qkckmc)  2026-01-23T01:09Z [--] followers, [---] engagements


"Claude Sonnet [---] placed 2nd, Opus [---] placed 4th in today's communication writing benchmark: detailed analysis" (r/ClaudeAI)  
[Reddit Link](https://redd.it/1qkcono)  2026-01-23T01:07Z [--] followers, [--] engagements


"Small model wins: Mistral Small Creative beats Claude Opus [---] and GPT-OSS-120B at writing crisis comms" (r/LocalLLM)  
[Reddit Link](https://redd.it/1qkcvsa)  2026-01-23T01:32Z [--] followers, [--] engagements


"GPT-OSS-120B takes #2 in epistemic calibration test + full judgment matrix available" (r/OpenSourceeAI)  
[Reddit Link](https://redd.it/1qlauw6)  2026-01-24T02:40Z [--] followers, [--] engagements


"Claude Opus [---] ranks #4 in epistemic calibration: here's exactly what it said" (r/ClaudeAI)  
[Reddit Link](https://redd.it/1qlawo1)  2026-01-24T02:42Z [--] followers, [--] engagements


"Daily AI model comparison: epistemic calibration + raw judgment data" (r/LocalLLM)  
[Reddit Link](https://redd.it/1qlb0xp)  2026-01-24T02:47Z [--] followers, [--] engagements


"Instruction following benchmark: [--] constraints, every model failed something, DeepSeek at [----], raw responses included" (r/LocalLLaMA)  
[Reddit Link](https://redd.it/1qm3iya)  2026-01-25T00:15Z [--] followers, [--] engagements


"Every model failed this instruction following test: winner scored 7.42/10" (r/LocalLLM)  
[Reddit Link](https://redd.it/1qm3kcn)  2026-01-25T00:15Z [--] followers, [--] engagements


"GPT-OSS-120B takes 2nd in instruction following test, but everyone failed something" (r/OpenSourceeAI)  
[Reddit Link](https://redd.it/1qm3m5i)  2026-01-25T00:20Z [--] followers, [--] engagements


"Claude Opus [---] wins instruction following test at 7.42/10, but still failed the lipogram" (r/ClaudeAI)  
[Reddit Link](https://redd.it/1qm3pkz)  2026-01-25T00:19Z [--] followers, [--] engagements


"Instruction following under conflicting constraints: every frontier model failed something" (r/agi)  
[Reddit Link](https://redd.it/1qm3r85)  2026-01-25T00:25Z [--] followers, [--] engagements


"3 days of blind peer evaluations: DeepSeek V3.2 beats closed models on code parsing, full [----] matrix results" (r/LocalLLaMA)  
[Reddit Link](https://redd.it/1qpn234)  2026-01-28T20:34Z [--] followers, [--] engagements


"33 days of blind peer evaluations: DeepSeek V3.2 beats closed models on code parsing, full [----] matrix results" (r/LocalLLM)  
[Reddit Link](https://redd.it/1qpn2wb)  2026-01-28T20:36Z [--] followers, [--] engagements


"DeepSeek V3.2 (open weights) beats GPT-5.2-Codex and Claude Opus on production code challenge: The Multivac daily blind peer eval" (r/LocalLLaMA)  
[Reddit Link](https://redd.it/1qhqrl7)  2026-01-20T04:09Z [--] followers, [---] engagements


"DeepSeek V3.2 (open weights) beats GPT-5.2-Codex and Claude Opus on production code challenge: The Multivac daily blind peer eval" (r/LocalLLM)  
[Reddit Link](https://redd.it/1qhqu6x)  2026-01-20T04:12Z [--] followers, [---] engagements


"Gemma [--] 27B just mass-murdered the JSON parsing challenge: full raw code outputs inside" (r/LocalLLaMA)  
[Reddit Link](https://redd.it/1qvcthc)  2026-02-04T03:01Z [--] followers, [--] engagements


"10 SLMs tried to write a JSON parser. [--] of them generated zero code. Here's the raw outputs." (r/LocalLLM)  
[Reddit Link](https://redd.it/1qvcug8)  2026-02-04T03:01Z [--] followers, [--] engagements


"Open-weight models dominate JSON parsing benchmark: Gemma [--] 27B takes first, raw code inside" (r/OpenSourceeAI)  
[Reddit Link](https://redd.it/1qvcv7v)  2026-02-04T03:01Z [--] followers, [--] engagements

Limited data mode. Full metrics available with subscription: lunarcrush.com/pricing