
[@ziming_mao](/creator/twitter/ziming_mao)
"A couple of months ago we were perplexed by slow MoE communication perf on cloud (e.g. EFA with Perplexity kernels). So we built UCCL-EP, an efficient GPU-driven EP library that runs on public clouds (e.g. AWS EFA) and heterogeneous GPUs/NICs, with the same APIs as DeepEP"  
[X Link](https://x.com/ziming_mao/status/1983265339418038546) [@ziming_mao](/creator/x/ziming_mao) 2025-10-28T20:11Z XXX followers, 1101 engagements
"Cool work! Diffs from UCCL-EP: (1) UCCL-EP runs on AMD and Broadcom beyond EFA; (2) UCCL-EP has better perf with larger # tokens (e.g. 4096), e.g. 2.1 ms for dispatch and 4.9 ms for combine at EP32; (3) API-compatible with DeepEP, no code changes needed"  
[X Link](https://x.com/ziming_mao/status/1986235756789289347) [@ziming_mao](/creator/x/ziming_mao) 2025-11-06T00:54Z XXX followers, 1457 engagements
