Rohan Paul @rohanpaul_ai on X, 73.6K followers
Created: 2025-06-27 17:46:51 UTC
These guys literally burned the transformer architecture into their silicon. 🤯
And built the world's fastest chip ever for the transformer architecture.
XXXXXXX tokens per second of Llama 70B throughput. 🤯
World’s first specialized chip (ASIC) for transformers: Sohu
One 8xSohu server replaces XXX H100 GPUs.
And raised $120mn to build it.
🚀 The Big Bet
@Etched froze the transformer recipe into silicon.
Burning the transformer architecture into the chip means it can't run many traditional AI models: CNNs, RNNs, or LSTMs. It also can't run the DLRMs powering Instagram ads, protein-folding models like AlphaFold 2, or older image models like Stable Diffusion X.
But for transformers, Sohu lets you build products impossible on GPUs.
HOW ❓❓
Because Sohu can only run one algorithm, the vast majority of control-flow logic can be removed, leaving room for many more math blocks.
As a result, Sohu boasts over XX% FLOPS utilization (compared to ~30% on a GPU running TRT-LLM).
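For intuition on what "FLOPS utilization" means here, a minimal back-of-envelope sketch (the numbers below are my own illustrative assumptions, not figures from the post or from Etched): autoregressive decoding costs roughly 2 FLOPs per model parameter per generated token, so utilization is achieved FLOPS divided by the hardware's peak.

```python
# Back-of-envelope MFU (model FLOPS utilization) estimate for transformer decode.
# All numbers below are illustrative assumptions, not Etched's published specs.

def decode_mfu(tokens_per_sec: float, n_params: float, peak_flops: float) -> float:
    """Generating one token costs roughly 2 FLOPs per model parameter,
    so achieved FLOPS ~= tokens/sec * 2 * params."""
    achieved_flops = tokens_per_sec * 2 * n_params
    return achieved_flops / peak_flops

# Hypothetical example: a 70B-parameter model on hardware with 1 PFLOP/s peak.
print(f"{decode_mfu(tokens_per_sec=2_000, n_params=70e9, peak_flops=1e15):.0%}")
# -> 28%, in the ballpark of the ~30% GPU figure cited above
```

By this accounting, higher utilization at fixed peak FLOPS translates directly into proportionally more tokens per second, which is the whole pitch of a transformer-only ASIC.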
[Post Link](https://x.com/rohanpaul_ai/status/1938655279173792025)