
![rohanpaul_ai Avatar](https://lunarcrush.com/gi/w:24/cr:twitter::2588345408.png) Rohan Paul [@rohanpaul_ai](/creator/twitter/rohanpaul_ai) on x 74.3K followers
Created: 2025-07-15 13:06:00 UTC

Most apps pick one large language model, then hope it can do every job.

FusionBench shows that mixing models, via smart routing, shared thoughts, or distillation, beats any solo model.

FusionBench gathers 103M tokens of queries, answers, and thought sketches from XX open models that range from 8B to 671B parameters.

It covers XX familiar tasks in math, code, commonsense, world knowledge, and reading, so the tests feel realistic.

Each query carries two answer styles, a straight reply and a detailed reasoning path, plus a judge score and a cost tag.
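
As a rough picture, one log entry might look like the sketch below; the field names are illustrative guesses, not the dataset's actual schema.

```python
from dataclasses import dataclass

# Hypothetical layout of one FusionBench routing record.
# Field names are illustrative, not the released schema.
@dataclass
class RoutingRecord:
    query: str            # the original task prompt
    model: str            # which pool model produced the answer
    direct_answer: str    # the straight reply
    reasoning_path: str   # the detailed chain-of-thought variant
    judge_score: float    # LLM-judge quality rating
    cost: float           # cost tag, e.g. output tokens or a dollar estimate
```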

FusionFactory then tries three fusion tricks.

Query-level fusion trains a tiny router that picks a model per request while balancing quality against token cost.
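
A minimal sketch of that routing rule, assuming small learned predictors for quality and token count (hypothetical helpers trained on the logs, not the paper's exact router):

```python
def route(query, models, predict_quality, predict_tokens, cost_weight=0.01):
    """Pick one model per request, trading predicted quality against token cost.

    predict_quality(query, model) and predict_tokens(query, model) stand in
    for small predictors fit on the routing logs; their form is an assumption.
    """
    def utility(model):
        # Higher expected quality is good, expected token spend is penalized.
        return predict_quality(query, model) - cost_weight * predict_tokens(query, model)

    return max(models, key=utility)
```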

Thought-level fusion distills the best chains of thought into reusable templates, plugs them in as few-shot hints, and lifts accuracy the most.
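
A toy version of that template reuse, where picking the top-k entries by judge score is an illustrative choice rather than the paper's exact selection rule:

```python
def build_few_shot_prompt(query, routing_log, k=3):
    """Reuse the highest-judged chains of thought as few-shot hints.

    routing_log is a list of dicts shaped like the RoutingRecord sketch above;
    the top-k-by-judge-score rule and prompt wording are assumptions.
    """
    best = sorted(routing_log, key=lambda r: r["judge_score"], reverse=True)[:k]
    hints = "\n\n".join(
        f"Example question: {r['query']}\nWorked reasoning: {r['reasoning_path']}"
        for r in best
    )
    return f"{hints}\n\nNow answer:\n{query}"
```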

Model-level fusion fine-tunes a base 8B model on the top-scoring answers, which helps some domains yet often overfits.
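
A hedged sketch of how the fine-tuning set could be carved out of the routing logs; the score threshold and prompt/response JSONL format are assumptions, not the paper's recipe:

```python
import json

def export_sft_dataset(routing_log, path, min_score=0.8):
    """Keep only high-judge-score answers and write a generic prompt/response
    JSONL file for supervised fine-tuning of the 8B base model."""
    with open(path, "w") as f:
        for record in routing_log:
            if record["judge_score"] >= min_score:
                f.write(json.dumps({
                    "prompt": record["query"],
                    "response": record["direct_answer"],
                }) + "\n")
```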

Across XX benchmarks the best fusion choice outscores the strongest individual model and often cuts cost, with thought fusion leading on accuracy and query fusion offering the best speed-cost mix.

World-knowledge and strict math tasks see smaller gains because they punish even tiny noise.

In short, routing logs are not trash; they form a training set that lets different models cover each other's gaps.

----

Paper – arxiv.org/abs/2507.10540

Paper Title: "Fusing LLM Capabilities with Routing Data"

![](https://pbs.twimg.com/media/Gv4kU35XQAATWhI.png)

XXXXX engagements

![Engagements Line Chart](https://lunarcrush.com/gi/w:600/p:tweet::1945107581409632328/c:line.svg)

[Post Link](https://x.com/rohanpaul_ai/status/1945107581409632328)
