
![rohanpaul_ai Avatar](https://lunarcrush.com/gi/w:24/cr:twitter::2588345408.png) Rohan Paul [@rohanpaul_ai](/creator/twitter/rohanpaul_ai) on x 73.6K followers
Created: 2025-06-27 17:46:51 UTC

These guys literally burned the transformer architecture into their silicon. 🤯

And built the world's fastest chip ever for the transformer architecture.

XXXXXXX tokens per second of Llama 70B throughput. 🤯

World’s first specialized chip (ASIC) for transformers: Sohu

One 8xSohu server replaces XXX H100 GPUs.

And raised $120mn to build it.

🚀 The Big Bet

@Etched froze the transformer recipe into silicon.

Burning the transformer architecture into the chip means it can't run many traditional AI models: CNNs, RNNs, or LSTMs. It also can't run the DLRMs powering Instagram ads, protein-folding models like AlphaFold 2, or older image models like Stable Diffusion X.
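What it can run is essentially one computation. Here's a minimal NumPy sketch of that core op, scaled dot-product attention, purely for illustration of the math a transformer ASIC would hard-wire (this is not Etched's actual implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V -- the core transformer op.
    A fixed-function chip can hard-wire this dataflow instead of
    scheduling it as generic GPU kernels."""
    d = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d)  # (batch, seq, seq) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted sum of values

# Toy usage: batch of 2 sequences, length 4, head dim 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((2, 4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 4, 8)
```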

But for transformers, Sohu lets you build products impossible on GPUs.

HOW ❓❓

Because Sohu can only run one algorithm, the vast majority of control flow logic can be removed, freeing die area for many more math blocks.

As a result, Sohu boasts over XX% FLOPS utilization (compared to ~30% on a GPU with TRT-LLM).
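The utilization gap is easy to sanity-check with back-of-envelope math: decoding a dense transformer costs roughly 2 FLOPs per parameter per generated token, so throughput ≈ peak FLOPS × utilization ÷ (2 × params). A rough sketch with made-up numbers (the post doesn't give Sohu's peak FLOPS, so both hardware figures below are assumptions, not Etched's or NVIDIA's specs):

```python
def tokens_per_second(peak_flops: float, utilization: float, params: float) -> float:
    """Back-of-envelope decode throughput for a dense transformer.
    Rule of thumb: ~2 FLOPs per parameter per generated token."""
    flops_per_token = 2 * params
    return peak_flops * utilization / flops_per_token

# Illustrative comparison for a 70B-parameter model (hypothetical 1 PFLOPS chip):
gpu  = tokens_per_second(peak_flops=1e15, utilization=0.30, params=70e9)
asic = tokens_per_second(peak_flops=1e15, utilization=0.90, params=70e9)
print(f"GPU-like:  {gpu:,.0f} tokens/s")
print(f"ASIC-like: {asic:,.0f} tokens/s ({asic / gpu:.1f}x from utilization alone)")
```

With identical raw FLOPS, the gain here comes entirely from the utilization term; a higher clock or more math blocks would multiply on top of that.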

![](https://pbs.twimg.com/media/Gud3GvaXEAAZYEF.png)

XXXXXXX engagements

![Engagements Line Chart](https://lunarcrush.com/gi/w:600/p:tweet::1938655279173792025/c:line.svg)

**Related Topics**
[$120mn](/topic/$120mn)
[specialized](/topic/specialized)
[llama](/topic/llama)
[world of](/topic/world-of)

[Post Link](https://x.com/rohanpaul_ai/status/1938655279173792025)
