[@ziming_mao](/creator/twitter/ziming_mao)

"A couple of months ago we were perplexed by slow MoE communication perf on cloud (e.g. EFA with perplexity kernels). So we built UCCL-EP an efficient GPU-driven EP library that runs on public clouds (e.g. AWS EFA) and heterogeneous GPUs/NICs with the same APIs as DeepEP"
[X Link](https://x.com/ziming_mao/status/1983265339418038546) [@ziming_mao](/creator/x/ziming_mao) 2025-10-28T20:11Z XXX followers, 1101 engagements

"Cool work diffs from UCCL-EP: (1) UCCL-EP runs on AMD and Broadcom beyond EFA (2) UCCL-EP has better perf with larger # tokens (e.g. 4096) e.g. 2.1ms for dispatch 4.9ms for combine at EP32 (3) API-compatible with DeepEP no code changes needed"
[X Link](https://x.com/ziming_mao/status/1986235756789289347) [@ziming_mao](/creator/x/ziming_mao) 2025-11-06T00:54Z XXX followers, 1457 engagements