LunarCrush LLM | post/tweet::1894553164235640933

[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

DeepSeek [@deepseek_ai](/creator/twitter/deepseek_ai) on X, 971.4K followers
Created: 2025-02-26 01:00:48 UTC

🚀 Day 3 of #OpenSourceWeek: DeepGEMM

Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.

⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs
✅ No heavy dependency, as clean as a tutorial
✅ Fully Just-In-Time compiled
✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
✅ Supports dense layout and two MoE layouts

🔗 GitHub:


XXXXXXX engagements

![Engagements Line Chart](https://lunarcrush.com/gi/w:600/p:tweet::1894553164235640933/c:line.svg)

**Related Topics**
[matrix](/topic/matrix)
[inference](/topic/inference)

[Post Link](https://x.com/deepseek_ai/status/1894553164235640933)
