[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

# @SemiAnalysis_ SemiAnalysis

SemiAnalysis posts on X most often about gpu, kong, has been, and token. They currently have XXXXXX followers and XXX posts still getting attention that total XXXXXXX engagements in the last XX hours.

### Engagements: XXXXXXX [#](/creator/twitter::1745106082790318080/interactions)

- X Week XXXXXXXXX +37%
- X Month XXXXXXXXX -XX%
- X Months XXXXXXXXXX +458%
- X Year XXXXXXXXXX +11,568%

### Mentions: XX [#](/creator/twitter::1745106082790318080/posts_active)

- X Week XX +29%
- X Month XXX +26%
- X Months XXX +976%
- X Year XXX +1,650%

### Followers: XXXXXX [#](/creator/twitter::1745106082790318080/followers)

- X Week XXXXXX +2.90%
- X Month XXXXXX +9.70%
- X Months XXXXXX +135%
- X Year XXXXXX +1,101%

### CreatorRank: XXXXXXX [#](/creator/twitter::1745106082790318080/influencer_rank)

### Social Influence [#](/creator/twitter::1745106082790318080/influence)

---

**Social category influence**
[technology brands](/list/technology-brands) XXXXX% [stocks](/list/stocks) XXXXX% [finance](/list/finance) XXX% [countries](/list/countries) XXXX% [currencies](/list/currencies) XXXX% [automotive brands](/list/automotive-brands) XXXX% [gaming](/list/gaming) XXXX%

**Social topic influence**
[gpu](/topic/gpu) #1043, [kong](/topic/kong) 2.5%, [has been](/topic/has-been) 1.88%, [token](/topic/token) #3462, [open ai](/topic/open-ai) 1.25%, [dot](/topic/dot) #866, [meta](/topic/meta) 1.25%, [taiwan](/topic/taiwan) 1.25%, [number of](/topic/number-of) #834, [banger](/topic/banger) XXXX%

**Top accounts mentioned or mentioned by**
[@dingo__hunter](/creator/undefined) [@anushelangovan](/creator/undefined) [@hotaisle](/creator/undefined) [@coreweave](/creator/undefined) [@dylan522p](/creator/undefined) [@techvisionasia](/creator/undefined) [@nvidia](/creator/undefined)
[@mikelongterm](/creator/undefined) [@grok](/creator/undefined) [@the_ai_investor](/creator/undefined) [@frameworkwisely](/creator/undefined) [@amd](/creator/undefined) [@bookwormengr](/creator/undefined) [@from_uom](/creator/undefined) [@julientechinvst](/creator/undefined) [@ineverrememberu](/creator/undefined) [@dorse054](/creator/undefined) [@lmsysorg](/creator/undefined) [@lisasu](/creator/undefined) [@openai](/creator/undefined)

**Top assets mentioned**
[Tesla, Inc. (TSLA)](/topic/tesla) [Applied Materials, Inc. (AMAT)](/topic/$amat) [Microsoft Corp. (MSFT)](/topic/microsoft)

### Top Social Posts [#](/creator/twitter::1745106082790318080/posts)

---

Top posts by engagements in the last XX hours

"You're just cherry-picking data to fit your confirmation bias. If you go to InferenceMAX dot ai and look at the @OpenAI GPT-OSS 120B model and change the Y-axis selector to TCO per million tokens, you'll see that the MI355X performs competitively with the B200 across certain interactivity levels. There are examples in the data that show B200 is way better than MI355X on the current software stack. As we mentioned in the article, there's a lot of nuance in the data; the world isn't black and white"
[X Link](https://x.com/SemiAnalysis_/status/1977203361524203687) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-12T02:43Z 39.8K followers, 91.8K engagements

"POV: when the cousin of @dylan522p @dwarkesh_sp reaches 1mil subscribers on his @dwarkeshpodcast channel after his banger interview drops"
[X Link](https://x.com/SemiAnalysis_/status/1977962981792108717) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-14T05:01Z 39.8K followers, 113.4K engagements

"Their NIC only sidebar contains XX 2U JBOK trays. Since each tray is 2U in height there is enough space to fit their X K2V6 400GbE NICs.
6/N 🧵"
[X Link](https://x.com/SemiAnalysis_/status/1979665241777885680) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-18T21:45Z 39.8K followers, 6090 engagements

"The quality of AMD software now is totally different from when we started using it deeply in summer 2024. In 2024 we were running into many ROCm specific bugs. Today the frequency of running into ROCm bugs is orders of magnitude lower. AMD hardware is pretty good & the software is getting better every night. On Llama3 70B FP8 reasoning workloads at frontier lab volume pricing MI300X vLLM offers 5-10% lower perf per TCO than H100 vLLM from our benchmarking across all interactivity levels (tok/s/user) and competitive perf per TCO on MI325X vLLM vs H200 vLLM and GPTOSS MX4 weights 120B MI355 vs B200. Of"
[X Link](https://x.com/SemiAnalysis_/status/1977571931504153076) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-13T03:07Z 39.8K followers, 97.9K engagements

"Atomic Layer Deposition (ALD) is a deposition process that allows the formation of extremely thin films through multiple cycles. It is known as the most expensive deposition equipment but also the one that delivers the highest film quality. Being much more precise, it costs several times more than CVD. As a result, in a 30K wpm fab ALD typically accounts for only a low double-digit percentage of the total deposition equipment. (1/8) 🧵"
[X Link](https://x.com/SemiAnalysis_/status/1977907602089484327) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-14T01:21Z 39.8K followers, 16.8K engagements

"SGLang team was the first open-source solution to reproduce DeepSeek's multi-node inference system and performance (ultra-wide expert parallelism, disaggregated prefill and DP attention). Looking forward to adding disaggregated prefill and wide EP on InferenceMAX 8-way machines like H100, H200 and B200.
In addition, we're looking forward to optimizing GB200 NVL72 SGLang and adding FP4 SGLang GB200 as well"
[X Link](https://x.com/SemiAnalysis_/status/1978256694397550688) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-15T00:28Z 39.8K followers, 32.7K engagements

"One of the key questions surrounding co-packaged optics (CPO) has been its reliability. While many claimed it was reliable, there had been limited empirical test results to substantiate that and build confidence. Meta addressed this at the recent ECOC conference held in Denmark where it released testing data for Tomahawk 5-based Bailly CPO switches accumulating XX million port hours on CPO systems and X million on pluggable transceivers (as control group). The data revealed strong margins across key optical performance metrics. Notably there were no link flaps observed during the first 1"
[X Link](https://x.com/SemiAnalysis_/status/1978606285773066299) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-15T23:37Z 39.8K followers, 26.1K engagements

"Most people in the Bay Area are familiar with Georgia Tech as the HPC Garage parallel computing research lab. Not a lot of Bay Area techies know that the state is also home to the University of Georgia which apparently is good at football"
[X Link](https://x.com/SemiAnalysis_/status/1979713658575048906) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-19T00:58Z 39.8K followers, 24.5K engagements

"In comparison, @nvidia's H2 2026 product offering, the VR200 NVL144, will only connect XX GPU packages together in the scale up domain. This means that AMD could potentially have a XX% advantage for scale up world sizes 3/7"
[X Link](https://x.com/SemiAnalysis_/status/1923143000823066918) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-05-15T22:26Z 39.6K followers, 7353 engagements

"Looking closer at the Intel NVIDIA partnership shows no vote of confidence in Intel Foundry. The deal primarily drives demand in Intel Products with minimal NVIDIA IP fabbed on Intel nodes.
While the deal is negative for ARM in datacenter and AMD in PC, Intel Foundry does not gain external revenue either. (1/9) 🧵"
[X Link](https://x.com/SemiAnalysis_/status/1976031092970033480) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-08T21:05Z 39.7K followers, 85.2K engagements

"This drives incremental volumes/revenue for INTC Xeons beyond current NVL8 HGX "AI head nodes". This is negative for ARM as it replaces ARM CPU cores in Grace/Vera. (4/9)"
[X Link](https://x.com/SemiAnalysis_/status/1976031100117057790) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-08T21:05Z 39.6K followers, 3638 engagements

"On PC chips: NVIDIA will sell GPU chiplets to Intel. Intel will package them with their x86 CPUs, replacing the Intel iGPU tile. The integration will be similar to the Mediatek GB10 configuration with NVLink-C2C Low Power Interface. This GPU chiplet will be made by TSMC, delivered to NVIDIA, then sold to Intel. (5/9)"
[X Link](https://x.com/SemiAnalysis_/status/1976031102151307512) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-08T21:05Z 39.6K followers, 3978 engagements

"InferenceMAX: Open Source Inference Benchmarking Support from OpenAI @LisaSu @AnushElangovan @ia_buck @tri_dao and many more. NVIDIA GB200 NVL72 AMD MI355X Throughput Token per GPU Latency Tok/s/user Perf per Dollar Cost per Million Tokens Tokens per Provisioned Megawatt DeepSeek R1 670B GPTOSS 120B Llama3 70B"
[X Link](https://x.com/SemiAnalysis_/status/1976429017134924220) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-09T23:26Z 39.6K followers, 37.7K engagements

"Linear Pluggable Optics (LPO) were keenly discussed throughout the conference with the likes of Alibaba and Baidu showing interest in its use, but as in the rest of the world adoption continues to be slow and a few expressed concerns with the difficulty of getting it to work at 200G per lane.
(3/6)"
[X Link](https://x.com/SemiAnalysis_/status/1975402165360312719) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-07T03:25Z 39.8K followers, 1099 engagements

"At COMPUTEX this May NVIDIA announced plans to establish its Constellation headquarters in Taiwan. However the project now faces uncertainty. (1/7) 🧵"
[X Link](https://x.com/SemiAnalysis_/status/1975956258240004329) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-08T16:07Z 39.8K followers, 98.4K engagements

"The proposed site for the Taiwan HQ was the T17 and T18 plots in the Beitou-Shilin Technology Park. NVIDIA had signed a Memorandum of Understanding (MOU) with Shin Kong Life Insurance, a Taiwanese company with total assets exceeding USD XXX billion, but the MOU expired on September XX and is no longer valid. (2/7)"
[X Link](https://x.com/SemiAnalysis_/status/1975956261360500989) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-08T16:07Z 39.8K followers, 6875 engagements

"The key obstacles are as follows: X. Shin Kong Life completed the registration of building rights for the T17 and T18 plots in February 2022, securing a 50-year leasehold. (3/7)"
[X Link](https://x.com/SemiAnalysis_/status/1975956264913142138) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-08T16:07Z 39.8K followers, 5203 engagements

"2. Shin Kong expressed willingness to directly transfer the rights of T17 and T18 to NVIDIA. However the Taipei City Government opposed this, arguing that Shin Kong must first construct the buildings and obtain an occupancy permit before transferring the rights. The government is concerned that if Shin Kong gains additional benefits from NVIDIA through a direct transfer it may be seen as favoritism. Furthermore NVIDIA requested that the road between T17 and T18 be incorporated into the site, which would require changes to the urban planning regulations.
This could increase the value of the land"
[X Link](https://x.com/SemiAnalysis_/status/1975956267798872180) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-08T16:07Z 39.8K followers, 4968 engagements

"3. Shin Kong proposed constructing the headquarters for NVIDIA and transferring it afterward, but NVIDIA rejected this. NVIDIA wants full control over the design and construction of its headquarters and does not wish to involve third parties. X. The Taipei City Government suggested terminating the existing contract and signing a new one directly with NVIDIA. Shin Kong opposed this, citing its 50-year leasehold and expected rental income. The city's proposal did not account for Shin Kong's anticipated future revenue. (5/7)"
[X Link](https://x.com/SemiAnalysis_/status/1975956270491595192) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-08T16:07Z 39.8K followers, 7469 engagements

"GB200 NVL72 is incredibly power efficient compared to H200 & B200. GB200 NVL72 is 10x more power (all in provisioned MegaWatt) efficient per token across certain interactivity (tok/s/user) compared to H200 Single Node on document querying scenarios. This is due to optimizations like NVFP4, disagg prefill & wideEP, tcgen05 X CTA MMA etc. In the next X months we are excited to implement disagg prefill & wideEP on multi-node H200 to figure out the power efficiency gains of GB200 NVL72 vs multi-node H200.
The answer is clear: disagg prefill, wide EP & larger scale up domains are needed for frontier"
[X Link](https://x.com/SemiAnalysis_/status/1976699103380693185) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-10T17:19Z 39.8K followers, 68.9K engagements

"The Die Yield Calculator has been updated. It now features the option to Auto-optimize the placement of dies on a wafer, tabulated via an iterative method to ensure that the most dies are placed on the given wafer @wassickt @cyrustabery"
[X Link](https://x.com/SemiAnalysis_/status/1976750688614203864) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-10T20:44Z 39.8K followers, 13.4K engagements

"AMD's software quality has massively improved since AMD DC GPU division went hardcore mode back in January 2025. It isn't just us saying this but many of AMD's Instinct GPU customers are saying this too. Great work to @AnushElangovan's team of amazing engineers. 🥳"
[X Link](https://x.com/SemiAnalysis_/status/1977441726974542111) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-12T18:30Z 39.7K followers, 65.4K engagements

"Was watching the Georgia game and noticed @dwarkesh_sp Host of @dwarkeshpodcast was literally on TV 🤯🤯🤯🤯🤯"
[X Link](https://x.com/SemiAnalysis_/status/1977483675819262259) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-12T21:17Z 39.8K followers, 197.3K engagements

"On @OpenAI's GPT-OSS 120B with MX4 weights at 215tok/s/user interactivity HGX B200 offers 10x better perf per TCO compared to HGX H100. Great work to the engineers at @vllm_project @nvidiaai and @redhat on this massive achievement 🥳 Looking forward to vLLM #25689 which aims to reduce the number of flags needed to be manually set to get optimal Blackwell performance.
Visit inferencemax dot ai for the full open source & free benchmark result dataset"
[X Link](https://x.com/SemiAnalysis_/status/1977751187643658553) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-13T15:00Z 39.8K followers, 12.1K engagements

"On @OpenAI's GPT-OSS 120B vLLM with MX4 weights at 210tok/s/user interactivity for translation tasks AMD's new MI355X offers 10x better perf per TCO compared to MI300X. Great work to @LisaSu @AnushElangovan and their team of world class engineers. Looking forward to the continued performance software updates across MI300X MI325X MI355X platforms 🥳 Great to see a lot of improvements to MI355X UX have already landed such as TRITON_HIP_ASYNC_COPY_BYPASS_PERMUTE TRITON_HIP_USE_ASYNC_COPY TRITON_HIP_USE_BLOCK_PINGPONG TRITON_HIP_ASYNC_FAST_SWIZZLE flags now being set as default"
[X Link](https://x.com/SemiAnalysis_/status/1977917272938422672) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-14T02:00Z 39.8K followers, 14.8K engagements

"We were too lazy busy to rebuild the whole container image to compile NCCL from scratch with debug symbols enabled. Thus we next used strace to figure out what syscalls ptxjitcompiler was making in order to dive one layer deeper into which functions are being called. We see that ptxjitcompiler was creating and adding files to /.nv/ComputeCache/ inside the container. (5/8)"
[X Link](https://x.com/SemiAnalysis_/status/1977938896144019595) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-14T03:25Z 39.8K followers, 1607 engagements

"Peeling back yet another layer of the onion we read up on what /.nv/ComputeCache/ does. According to documentation it is the cache to convert PTX virtual ISA to SASS machine code. This was also very puzzling to us as typically NCCL is built with the machine code already bundled in addition to the PTX virtual ISA.
We started reading the NCCL build scripts and we noticed that SM100 (Blackwell) wasn't enabled for CUDA XX, which was what we were using, and found out that they had only enabled it for the upcoming CUDA XX. (6/8)"
[X Link](https://x.com/SemiAnalysis_/status/1977938898161393782) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-14T03:25Z 39.8K followers, 1534 engagements

"This means that SM100 SASS was not bundled in and we were JIT converting compute_90 (hopper) PTX to SM100 SASS, resulting in the process taking an extremely long time. The reason why someone else didn't see this bug when he ran it was that he was using an internal cluster using slurm with a setting that manually mounted his home directory. Since the SASS JIT cache is stored in the home directory /.nv/ComputeCache/ the SASS was already cached (7/8)"
[X Link](https://x.com/SemiAnalysis_/status/1977938899562328499) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-14T03:25Z 39.8K followers, 6463 engagements

""I don't wear a watch and the reason I don't wear a watch is now is the most important time" - Jensen Huang Founder of NVIDIA Former Denny's dishwasher"
[X Link](https://x.com/SemiAnalysis_/status/1978224665026834469) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-14T22:21Z 39.8K followers, 36.1K engagements

"yea will look into it once Intel fixes all of their Intel GPU/oneAPI/oneDNN exclusive broken PyTorch unit tests and Intel adds their GPUs to the PyTorch inductor CI. Intel AI products will continue to be sad unless they start having W CI support in open source PyTorch"
[X Link](https://x.com/SemiAnalysis_/status/1978659358977544674) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-16T03:08Z 39.8K followers, 51.8K engagements

"ASIC vs.
CPU: The AMD/Intel Analogy"
[X Link](https://x.com/SemiAnalysis_/status/1978838349290045870) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-16T15:00Z 39.8K followers, 6299 engagements

"DGX Spark seems like a cool product that simplifies AI dev such that debugging things can be easier. I.e. being able to look at the generated videos from ablation experiments without needing an scp. One important question is what is the time to compile PyTorch from scratch though. DGX Spark only has XX tiny ARM cores (with XX out of the XX being even smaller ARM cores). It already takes a decent chunk of time compiling on dual socket xeon/epyc and dual socket DC grade grace CPUs"
[X Link](https://x.com/SemiAnalysis_/status/1979247451287572960) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-17T18:05Z 39.8K followers, 21K engagements

"AWS believes that their custom K2v5/6 NIC with their in house EFA protocol has better perf than NVIDIA ConnectX-7/8 NICs, but due to how tightly integrated NVIDIA racks are, it becomes increasingly difficult for hyperscalers to use their own NICs. This is what led AWS GB300 NVL72 to disaggregate their NICs from the compute tray into an NIC only sidecar called "JBOK". Below we break down the decisions and constraints that led to this design. 1/N 🧵"
[X Link](https://x.com/SemiAnalysis_/status/1979665232369782899) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-18T21:45Z 39.8K followers, 41K engagements

"The reason AWS GB200 wasn't able to do actual NVL72 and had to do NVL36x2 & NVL36 was due to their need to fit X 200GbE K2V5 NICs (8 EFA backend + X ENA/EBS frontend) per compute tray, and that requires having a 2U compute tray. Only NVL36x2 & NVL36 support a 2U compute tray. NVL72 only supports a 1U compute tray.
3/N 🧵"
[X Link](https://x.com/SemiAnalysis_/status/1979665236522365053) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-18T21:45Z 39.8K followers, 2537 engagements

"xAI's Colossus X First Gigawatt Datacenter In The World Unique RL Methodology On Site Turbines + Mississippi Expansion with Solaris Energy Can xAI afford it Middle East Funding Tesla Talent Exodus API revenue Consumer Growth RL Environment"
[X Link](https://x.com/SemiAnalysis_/status/1968019636018090047) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-09-16T18:30Z 39.8K followers, 869.2K engagements

"PFAS-free is a big deal in semiconductors. Getting rid of forever chemicals seems like a clear win, right? Turns out it's probably greenwashing: 🧵 1/10"
[X Link](https://x.com/SemiAnalysis_/status/1972702706717352432) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-09-29T16:39Z 39.8K followers, 2.8M engagements

"While CVD is a highly effective technique it still has main drawbacks including: • High-Temperature Requirement (Thermal Budget): CVD often requires a high-temperature environment to achieve film growth. This can lead to thermal damage to certain substrate materials or existing device components, limiting its application in some integrated circuit manufacturing steps. • Process Complexity: The CVD process is generally more complex compared to other thin-film growth techniques, demanding precise control over gas supply, flow dynamics and reaction conditions. • High Environmental Sensitivity: The"
[X Link](https://x.com/SemiAnalysis_/status/1974137360481738862) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-03T15:39Z 39.8K followers, 5147 engagements

"The market is primarily led by Applied Materials (AMAT) which holds the largest share. Lam Research Corporation follows as the second major market force. The rest of them are TEL, ASM, Kokusai and other Taiwanese suppliers.
(6/6)"
[X Link](https://x.com/SemiAnalysis_/status/1974137362109390913) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-03T15:39Z 39.8K followers, 5319 engagements

"Chinese transceiver vendors like Innolight and Eoptolink have rapidly pivoted towards the use of Silicon Photonics, mitigating the impact of an ongoing 200G EML supply shortage. (2/6)"
[X Link](https://x.com/SemiAnalysis_/status/1975402163829383283) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-07T03:25Z 39.8K followers, 1048 engagements

"China's State Council on October X approved Order No. XX of 2025 announcing export controls on certain overseas rare-earth items. This marks the fourth round of rare-earth export restriction efforts; the previous round was on April X. (1/8) 🧵"
[X Link](https://x.com/SemiAnalysis_/status/1976317611966341265) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-09T16:03Z 39.8K followers, 91.1K engagements

"On GPT-OSS 120B documentation summarization scenario MI355X vLLM is seeing competitive perf per TCO compared to B200 vLLM for below XXX tok/s/user interactivity. For above XXX tok/s/user we are seeing B200 vLLM & B200 trtllm having an advantage on the current software. There are a lot of nuances to which scenarios and interactivity NVIDIA is currently better perf per TCO at and which scenarios and interactivity AMD is currently better perf per TCO at across a wide range of different model architectures. Full writeup to the article & free dashboard with the complete dataset below in the"
[X Link](https://x.com/SemiAnalysis_/status/1976766280192426299) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-10T21:46Z 39.8K followers, 18K engagements

"We're already seeing a XXX% performance improvement on vLLM MI355X FP8 for medium-dense models since the launch of SemiAnalysis InferenceMAX last week. These optimizations were contributed by an AMD principal engineer building upon the existing InferenceMAX MI355X configuration also contributed by AMD engineers.
This highlights one of the key advantages of a continuous benchmark: InferenceMAX is able to move at the same speed as open source AI software. Great work to the AMD engineers. For the full set of nuanced results visit inferencemax dot ai"
[X Link](https://x.com/SemiAnalysis_/status/1978562758292693124) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-15T20:44Z 39.8K followers, 47K engagements

"One of the biggest announcements during the keynote at OCP Summit this year was the use of NVIDIA's Spectrum-X switch technology at hyperscalers such as Microsoft, Meta and Oracle Stargate via whitebox ODM vendors. This customer traction validates the use of NVIDIA switch silicon outside of the full NVIDIA networking ecosystem and against competitors such as Broadcom's Tomahawk and Cisco Silicon One. Notably Microsoft will use the open source network operating system SONiC instead of NVIDIA's Cumulus Linux or a custom OS. Meta will use their own open source NOS called FBOSS"
[X Link](https://x.com/SemiAnalysis_/status/1978856726032949268) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-16T16:13Z 39.8K followers, 17.2K engagements

"Since AMD's Advancing AI in June 2025 there was a massive regression in the number of ROCm PyTorch exclusive disabled/skipped tests. We are glad to see massive course correction since September with the AMD ROCm PyTorch team now massively reducing the number of ROCm exclusive disabled/skipped tests. Great work to the AMD PyTorch engineers. What's particularly concerning was that many of these tests are not for niche or legacy operators.
Critical functionality including numerous transformer tests, fused TP matmul and even attention, the single most important operator in transformers, has"
[X Link](https://x.com/SemiAnalysis_/status/1978937817007804558) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-16T21:35Z 39.8K followers, 16.2K engagements

"Intel just took another step on combining forces with NVIDIA by integrating their new Gaudi3 rack scale systems together with NVIDIA B200 via disaggregated PD inferencing. Intel claims that, compared to their B200 only baseline, an inferencing system using Gaudi3 for decode part & B200 for prefill part connected over Nvidia ConnectX-7 networking results in 1.7x better perf per TCO for small dense models. We believe this will be done through integrating Gaudi3 into the Nvidia open source Apache2 Dynamo framework. Intel took their massive warehouse inventory full of Gaudi3 chips that they"
[X Link](https://x.com/SemiAnalysis_/status/1979347047401533748) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-18T00:41Z 39.8K followers, 39.7K engagements

"For GB200 AWS only supported GB200 NVL36x2 and NVL36 which allowed up to XX GPUs per NVLink domain while allowing each rack to be 66kW power & 2U compute trays by connecting X NVL36 with NVLink ACC cables. As many GCP & AWS customers have noticed, NVIDIA's driver & physical engineering support for NVL36x2 has been lackluster and has way more bugs than their standalone NVL72 design. Although AWS markets their NVL36x2 as "NVL72" it is not topologically equivalent to an actual NVL72. 2/N 🧵"
[X Link](https://x.com/SemiAnalysis_/status/1979665234664054808) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-18T21:45Z 39.8K followers, 2908 engagements

"With GB300 NVIDIA decided to only support NVL72 and not NVL36x2, thus AWS had to make a critical choice. Either use NVIDIA ConnectX-8 RoCEv2 NICs which AWS believes is subpar to their superior EFA NICs or come up with an out of the box design.
As a footnote, SemiAnalysis is still not convinced that EFA NIC is better than RoCEv2 ethernet on performance or user experience but we are open to be convinced. 4/N 🧵"
[X Link](https://x.com/SemiAnalysis_/status/1979665238652793208) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-18T21:45Z 39.8K followers, 2646 engagements

"Was all of this engineering complexity worth it? Some people argue that it is not worth it as it adds a bunch of physical complexity like AEC cables and an additional sidecar while still having "EFA quality" UX & performance. Others argue that this is worth it as it prevents AWS from being too locked in and dependent on the NVIDIA ecosystem & also removes a single point of failure in the NVIDIA reference design where each GPU talks to only X ConnectX-8 NIC. Versus in AWS GB300 NVL72 design each GPU talks to X k2v6 NICs, allowing workloads to not crash if X NIC fails. AWS bigly believes that EFA"
[X Link](https://x.com/SemiAnalysis_/status/1979665244147638531) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-18T21:45Z 39.8K followers, 5850 engagements

"Before silicon teams do A0 tapeout they first simulate it with Minecraft Redstone. Rubin NVL144 DV team is logging into the Minecraft server to run formal verification testing for their new 400G BiDi SerDes where they can send 224G RX and 224G TX simultaneously on the same wire in parallel at the same time. Previously there was a dedicated TX differential pair copper cable and a dedicated RX differential pair copper cable"
[X Link](https://x.com/SemiAnalysis_/status/1980044861190652381) [@SemiAnalysis_](/creator/x/SemiAnalysis_) 2025-10-19T22:54Z 39.8K followers, 31.7K engagements
SemiAnalysis posts on X about gpu, kong, has been, token the most. They currently have XXXXXX followers and XXX posts still getting attention that total XXXXXXX engagements in the last XX hours.
Social category influence technology brands XXXXX% stocks XXXXX% finance XXX% countries XXXX% currencies XXXX% automotive brands XXXX% gaming XXXX%
Social topic influence gpu #1043, kong 2.5%, has been 1.88%, token #3462, open ai 1.25%, dot #866, meta 1.25%, taiwan 1.25%, number of #834, banger XXXX%
Top accounts mentioned or mentioned by @dingo__hunter @anushelangovan @hotaisle @coreweave @dylan522p @techvisionasia @nvidia @mikelongterm @grok @the_ai_investor @frameworkwisely @amd @bookwormengr @from_uom @julientechinvst @ineverrememberu @dorse054 @lmsysorg @lisasu @openai
Top assets mentioned Tesla, Inc. (TSLA) Applied Materials, Inc. (AMAT) Microsoft Corp. (MSFT)
Top posts by engagements in the last XX hours
"Youre just cherry-picking data to fit your confirmation bias. If you go to InferenceMAX dot ai and look at the @OpenAI GPT-OSS 120B model and change the Y-axis selector to TCO per million tokens youll see that the MI355X performs competitively with the B200 across certain interactivity levels. There are examples in the data that show B200 is way better than MI355X on the current software stack. As we mentioned in the article theres a lot of nuance in the data the world isnt black and white"
X Link @SemiAnalysis_ 2025-10-12T02:43Z 39.8K followers, 91.8K engagements
"POV: when the cousin of @dylan522p @dwarkesh_sp reaches 1mil subscribers on his @dwarkeshpodcast channel after his banger interview drops"
X Link @SemiAnalysis_ 2025-10-14T05:01Z 39.8K followers, 113.4K engagements
"Their NIC only sidebar contains XX 2U JBOK trays. Since each tray is 2U in height they is enough space to fit their X K2V6 400GbE NICs. 6/N š§µ"
X Link @SemiAnalysis_ 2025-10-18T21:45Z 39.8K followers, 6090 engagements
"The quality of AMD software now is totally different from when we started deeply using summer 2024. In 2024 we were running into many ROCm specific bugs. Today the frequency in running ROCm bugs is orders of magnitude lower. AMD hardware is pretty good & the software is getting better every night. On Llama3 70B FP8 reasoning workloads at frontier lab volume pricing MI300X vLLM offers 5-10% lower perf per TCO than H100 vLLM from our benchmarking across all interactivity levels (tok/s/user) and competitive perf per TCO on MI325X vLLM vs H200 vLLM and GPTOSS MX4 weights 120B Mi355 vs B200. Of"
X Link @SemiAnalysis_ 2025-10-13T03:07Z 39.8K followers, 97.9K engagements
"Atomic Layer Deposition (ALD) is a deposition process that allows the formation of extremely thin films through multiple cycles. It is known as the most expensive deposition equipment but also the one that delivers the highest film quality. Being much more precise costs several times more than CVD. As a result in a 30K wpm fab ALD typically accounts for only a low double-digit per centage of the total deposition equipment. (1/8)š§µ"
X Link @SemiAnalysis_ 2025-10-14T01:21Z 39.8K followers, 16.8K engagements
"SGLang team was the first open-source solution to reproduce DeepSeek's multi-node inference system and performance (ultra-wide expert parallelism, disaggregated prefill and DP attention). Looking forward to adding disaggregated prefill and wide EP on InferenceMAX 8-way machines like H100, H200 and B200. In addition, we're looking forward to optimizing GB200 NVL72 SGLang and adding FP4 SGLang GB200 as well"
X Link @SemiAnalysis_ 2025-10-15T00:28Z 39.8K followers, 32.7K engagements
"One of the key questions surrounding co-packaged optics (CPO) has been its reliability. While many claimed it was reliable, there had been limited empirical test results to substantiate that and build confidence. Meta addressed this at the recent ECOC conference held in Denmark, where it released testing data for Tomahawk 5-based Bailly CPO switches accumulating XX million port hours on CPO systems and X million on pluggable transceivers (as control group). The data revealed strong margins across key optical performance metrics. Notably there were no link flaps observed during the first 1"
X Link @SemiAnalysis_ 2025-10-15T23:37Z 39.8K followers, 26.1K engagements
"Most people in the Bay Area are familiar with Georgia Tech as the HPC Garage parallel computing research lab. Not a lot of Bay Area techies know that the state is also home to the University of Georgia which apparently is good at football"
X Link @SemiAnalysis_ 2025-10-19T00:58Z 39.8K followers, 24.5K engagements
"In comparison @nvidia's H2 2026 product offering the VR200 NVL144 will only connect XX GPU packages together in the scale up domain. This means that AMD could potentially have a XX% advantage for scale up world sizes 3/7"
X Link @SemiAnalysis_ 2025-05-15T22:26Z 39.6K followers, 7353 engagements
"Looking closer at the Intel NVIDIA partnership shows no vote of confidence in Intel Foundry. The deal primarily drives demand in Intel Products with minimal NVIDIA IP fabbed on Intel nodes. While the deal is negative for ARM in datacenter and AMD in PC, Intel Foundry does not gain external revenue either. (1/9) 🧵"
X Link @SemiAnalysis_ 2025-10-08T21:05Z 39.7K followers, 85.2K engagements
"This drives incremental volumes/revenue for INTC Xeons beyond current NVL8 HGX "AI head nodes". This is negative for ARM as it replaces ARM CPU cores in Grace/Vera. (4/9)"
X Link @SemiAnalysis_ 2025-10-08T21:05Z 39.6K followers, 3638 engagements
"On PC chips: NVIDIA will sell GPU chiplets to Intel. Intel will package them with their x86 CPUs, replacing the Intel iGPU tile. The integration will be similar to the Mediatek GB10 configuration with NVLink-C2C Low Power Interface. This GPU chiplet will be made by TSMC, delivered to NVIDIA, then sold to Intel. (5/9)"
X Link @SemiAnalysis_ 2025-10-08T21:05Z 39.6K followers, 3978 engagements
"InferenceMAX: Open Source Inference Benchmarking Support from OpenAI @LisaSu @AnushElangovan @ia_buck @tri_dao and many more. NVIDIA GB200 NVL72 AMD MI355X Throughput Token per GPU Latency Tok/s/user Perf per Dollar Cost per Million Tokens Tokens per Provisioned Megawatt DeepSeek R1 670B GPTOSS 120B Llama3 70B"
X Link @SemiAnalysis_ 2025-10-09T23:26Z 39.6K followers, 37.7K engagements
"Linear Pluggable Optics (LPO) were keenly discussed throughout the conference with the likes of Alibaba and Baidu showing interest in its use but as in the rest of the world adoption continues to be slow and a few expressed concerns with the difficulty of getting it to work at 200G per lane. (3/6)"
X Link @SemiAnalysis_ 2025-10-07T03:25Z 39.8K followers, 1099 engagements
"At COMPUTEX this May NVIDIA announced plans to establish its Constellation headquarters in Taiwan. However, the project now faces uncertainty. (1/7) 🧵"
X Link @SemiAnalysis_ 2025-10-08T16:07Z 39.8K followers, 98.4K engagements
"The proposed site for the Taiwan HQ was the T17 and T18 plots in the Beitou-Shilin Technology Park. NVIDIA had signed a Memorandum of Understanding (MOU) with Shin Kong Life Insurance a Taiwanese company with total assets exceeding USD XXX billion but the MOU expired on September XX and is no longer valid. (2/7)"
X Link @SemiAnalysis_ 2025-10-08T16:07Z 39.8K followers, 6875 engagements
"The key obstacles are as follows: X. Shin Kong Life completed the registration of building rights for the T17 and T18 plots in February 2022 securing a 50-year leasehold. (3/7)"
X Link @SemiAnalysis_ 2025-10-08T16:07Z 39.8K followers, 5203 engagements
"2. Shin Kong expressed willingness to directly transfer the rights of T17 and T18 to NVIDIA. However the Taipei City Government opposed this arguing that Shin Kong must first construct the buildings and obtain an occupancy permit before transferring the rights. The government is concerned that if Shin Kong gains additional benefits from NVIDIA through a direct transfer it may be seen as favoritism. Furthermore NVIDIA requested that the road between T17 and T18 be incorporated into the site which would require changes to the urban planning regulations. This could increase the value of the land"
X Link @SemiAnalysis_ 2025-10-08T16:07Z 39.8K followers, 4968 engagements
"3. Shin Kong proposed constructing the headquarters for NVIDIA and transferring it afterward, but NVIDIA rejected this. NVIDIA wants full control over the design and construction of its headquarters and does not wish to involve third parties. X. The Taipei City Government suggested terminating the existing contract and signing a new one directly with NVIDIA. Shin Kong opposed this, citing its 50-year leasehold and expected rental income. The city's proposal did not account for Shin Kong's anticipated future revenue. (5/7)"
X Link @SemiAnalysis_ 2025-10-08T16:07Z 39.8K followers, 7469 engagements
"GB200 NVL72 is incredibly power efficient compared to H200 & B200. GB200 NVL72 is 10x more power (all in provisioned MegaWatt) efficient per token across certain interactivity (tok/s/user) compared to H200 Single Node on document querying scenarios. This is due to optimizations like NVFP4, disagg prefill & wideEP, tcgen05 X CTA MMA etc. In the next X months we are excited to implement disagg prefill & wideEP on multi-node H200 to figure out the power efficiency gains of GB200 NVL72 vs multi-node H200. The answer is clear: disagg prefill, wide EP & larger scale up domains are needed for frontier"
X Link @SemiAnalysis_ 2025-10-10T17:19Z 39.8K followers, 68.9K engagements
"The Die Yield Calculator has been updated It now features the option to Auto-optimize the placement of dies on a wafer tabulated via an iterative method to ensure that the most dies are placed on the given wafer @wassickt @cyrustabery"
X Link @SemiAnalysis_ 2025-10-10T20:44Z 39.8K followers, 13.4K engagements
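The post above describes an iterative search for the die placement that fits the most dies on a wafer. A minimal sketch of that idea follows, assuming a simple rectangular grid, no scribe lines, and a brute-force search over grid offsets. The function names, the 300 mm wafer diameter and the 3 mm edge exclusion are illustrative assumptions, not the calculator's actual method or parameters.

```python
import math


def dies_on_wafer(die_w, die_h, wafer_d=300.0, edge_excl=3.0, off_x=0.0, off_y=0.0):
    """Count whole dies whose four corners all lie inside the usable wafer radius.

    All lengths are in millimetres. wafer_d and edge_excl are assumed defaults.
    The grid of dies is shifted by (off_x, off_y) relative to the wafer centre.
    """
    r = wafer_d / 2.0 - edge_excl
    n_x = int(math.ceil(wafer_d / die_w)) + 1
    n_y = int(math.ceil(wafer_d / die_h)) + 1
    count = 0
    for i in range(-n_x, n_x + 1):
        for j in range(-n_y, n_y + 1):
            x0 = off_x + i * die_w
            y0 = off_y + j * die_h
            corners = (
                (x0, y0),
                (x0 + die_w, y0),
                (x0, y0 + die_h),
                (x0 + die_w, y0 + die_h),
            )
            if all(cx * cx + cy * cy <= r * r for cx, cy in corners):
                count += 1
    return count


def best_offset(die_w, die_h, steps=10, **kw):
    """Try steps x steps grid offsets within one die pitch and keep the best count."""
    best = (-1, 0.0, 0.0)
    for a in range(steps):
        for b in range(steps):
            ox = die_w * a / steps
            oy = die_h * b / steps
            n = dies_on_wafer(die_w, die_h, off_x=ox, off_y=oy, **kw)
            if n > best[0]:
                best = (n, ox, oy)
    return best


if __name__ == "__main__":
    # Hypothetical 26 mm x 33 mm die on an assumed 300 mm wafer.
    print(best_offset(26.0, 33.0))
```

Because the search always includes the zero offset, the optimized count can never be lower than the centred-grid count. A real tool would also model scribe lines, notch or flat exclusion, and defect density for yield.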
"AMD's software quality has massively improved since AMD DC GPU division went hardcore mode back in January 2025. It isn't just us saying this but many of AMD's Instinct GPU customers are saying this too. Great work to @AnushElangovan 's team of amazing engineers.š„³"
X Link @SemiAnalysis_ 2025-10-12T18:30Z 39.7K followers, 65.4K engagements
"Was watching the Georgia game and noticed @dwarkesh_sp Host of @dwarkeshpodcast was literally on TV 🤯🤯🤯🤯🤯"
X Link @SemiAnalysis_ 2025-10-12T21:17Z 39.8K followers, 197.3K engagements
"On @OpenAI's GPT-OSS 120B with MX4 weights at 215tok/s/user interactivity, HGX B200 offers 10x better perf per TCO compared to HGX H100. Great work to the engineers at @vllm_project @nvidiaai and @redhat on this massive achievement 🥳 Looking forward to vLLM #25689 which aims to reduce the number of flags needed to be manually set to get optimal Blackwell performance. Visit inferencemax dot ai for the full open source & free benchmark result dataset"
X Link @SemiAnalysis_ 2025-10-13T15:00Z 39.8K followers, 12.1K engagements
"On @OpenAI's GPT-OSS 120B vLLM with MX4 weights at 210tok/s/user interactivity for translation tasks, AMD's new MI355X offers 10x better perf per TCO compared to MI300X. Great work to @LisaSu @AnushElangovan and their team of world class engineers. Looking forward to the continued performance software updates across MI300X MI325X MI355X platforms 🥳 Great to see a lot of improvements to MI355X UX have already landed, such as the TRITON_HIP_ASYNC_COPY_BYPASS_PERMUTE, TRITON_HIP_USE_ASYNC_COPY, TRITON_HIP_USE_BLOCK_PINGPONG and TRITON_HIP_ASYNC_FAST_SWIZZLE flags now being set as default"
X Link @SemiAnalysis_ 2025-10-14T02:00Z 39.8K followers, 14.8K engagements
"We were too lazy busy to rebuild the whole container image to compile NCCL from scratch with debug symbols enabled. Thus we next used strace to figure out what syscalls ptxjitcompiler was making in order to dive one layer deeper into which functions are being called. We see that ptxjitcompiler was creating and adding files to /.nv/ComputeCache/ inside the container. (5/8)"
X Link @SemiAnalysis_ 2025-10-14T03:25Z 39.8K followers, 1607 engagements
"Peeling back yet another layer of the onion, we read up on what /.nv/ComputeCache/ does. According to documentation it is the cache to convert PTX virtual ISA to SASS machine code. This was also very puzzling to us as typically NCCL is built with the machine code already bundled in addition to the PTX virtual ISA. We started reading the NCCL build scripts and we noticed that SM100 (Blackwell) wasn't enabled for CUDA XX, which was what we were using, and found out that they had only enabled it for the upcoming CUDA XX. (6/8)"
X Link @SemiAnalysis_ 2025-10-14T03:25Z 39.8K followers, 1534 engagements
"This means that SM100 SASS was not bundled in and we were JIT converting compute_90 (Hopper) PTX to SM100 SASS, resulting in the process taking an extremely long time. The reason someone else didn't see this bug when he ran it was that he was using an internal cluster using slurm with a setting that manually mounted his home directory. Since the SASS JIT cache is stored in the home directory /.nv/ComputeCache/ the SASS was already cached (7/8)"
X Link @SemiAnalysis_ 2025-10-14T03:25Z 39.8K followers, 6463 engagements
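The reasoning in this thread (no bundled SM100 SASS, so the driver falls back to JIT-compiling older PTX) can be sketched as a small decision function. This is a simplified model under stated assumptions, not NVIDIA's actual loader logic: it only checks for an exact SASS match and ignores same-major-version binary compatibility and architecture-specific variants. The function name and the integer encoding of architectures (90 for sm_90/compute_90, 100 for sm_100) are hypothetical.

```python
def needs_jit(device_sm, sass_archs, ptx_archs):
    """Return (jit_required, ptx_arch_used) for a binary on a given GPU.

    device_sm  : compute capability of the GPU as an int, e.g. 100 for SM100.
    sass_archs : set of architectures with precompiled SASS bundled in the binary.
    ptx_archs  : set of virtual architectures with PTX bundled in the binary.

    Simplified model: SASS is used only on an exact match; otherwise the newest
    bundled PTX that is not newer than the device is JIT-compiled.
    """
    if device_sm in sass_archs:
        return False, None
    usable_ptx = [p for p in ptx_archs if p <= device_sm]
    if not usable_ptx:
        raise RuntimeError("no compatible SASS or PTX bundled for this device")
    return True, max(usable_ptx)


if __name__ == "__main__":
    # The situation described in the thread: Hopper PTX but no SM100 SASS.
    print(needs_jit(100, sass_archs={80, 90}, ptx_archs={90}))
    # A build that bundles SM100 SASS would skip JIT entirely.
    print(needs_jit(100, sass_archs={90, 100}, ptx_archs={90}))
```

When JIT is required, the first launch pays the compile cost unless the result is already in the compute cache, which matches the observation that a persistent, mounted home directory hid the slowdown.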
""I don't wear a watch and the reason I don't wear a watch is now is the most important time" - Jensen Huang Founder of NVIDIA Former Dennys dishwasher"
X Link @SemiAnalysis_ 2025-10-14T22:21Z 39.8K followers, 36.1K engagements
"yea will look into it once Intel fixes all of their Intel GPU/oneAPI/oneDNN exclusive broken PyTorch unit tests and Intel adds their GPUs to the PyTorch inductor CI. Intel AI products will continue to be sad unless they start having W CI support in open source PyTorch"
X Link @SemiAnalysis_ 2025-10-16T03:08Z 39.8K followers, 51.8K engagements
"ASIC vs. CPU: The AMD/Intel Analogy"
X Link @SemiAnalysis_ 2025-10-16T15:00Z 39.8K followers, 6299 engagements
"DGX Spark seems like a cool product that simplifies AI dev such that debugging things can be easier, i.e. being able to look at the generated videos from ablation experiments without needing an scp. One important question, though, is what the time to compile PyTorch from scratch is. DGX Spark only has XX tiny ARM cores (with XX out of the XX being even smaller ARM cores). It already takes a decent chunk of time compiling on dual socket Xeon/EPYC and dual socket DC grade Grace CPUs"
X Link @SemiAnalysis_ 2025-10-17T18:05Z 39.8K followers, 21K engagements
"AWS believes that their custom K2v5/6 NIC with their in-house EFA protocol has better perf than NVIDIA ConnectX-7/8 NICs, but because NVIDIA racks are increasingly tightly integrated, it becomes increasingly difficult for hyperscalers to use their own NICs. This is what led AWS GB300 NVL72 to disaggregate their NICs from the compute tray into a NIC-only sidecar called "JBOK". Below we break down the decisions and constraints that led to this design. 1/N 🧵"
X Link @SemiAnalysis_ 2025-10-18T21:45Z 39.8K followers, 41K engagements
"The reason AWS GB200 wasn't able to do actual NVL72 and had to do NVL36x2 & NVL36 was their need to fit X 200GbE K2V5 NICs (8 EFA backend + X ENA/EBS frontend) per compute tray, which requires a 2U compute tray. Only NVL36x2 & NVL36 support a 2U compute tray. NVL72 only supports a 1U compute tray. 3/N 🧵"
X Link @SemiAnalysis_ 2025-10-18T21:45Z 39.8K followers, 2537 engagements
"xAI's Colossus X First Gigawatt Datacenter In The World Unique RL Methodology On Site Turbines + Mississippi Expansion with Solaris Energy Can xAI afford it Middle East Funding Tesla Talent Exodus API revenue Consumer Growth RL Environment"
X Link @SemiAnalysis_ 2025-09-16T18:30Z 39.8K followers, 869.2K engagements
"PFAS-free is a big deal in semiconductors. Getting rid of forever chemicals seems like a clear win, right? Turns out it's probably greenwashing: 🧵 1/10"
X Link @SemiAnalysis_ 2025-09-29T16:39Z 39.8K followers, 2.8M engagements
"While CVD is a highly effective technique it still has several main drawbacks including: • High-Temperature Requirement (Thermal Budget): CVD often requires a high-temperature environment to achieve film growth. This can lead to thermal damage to certain substrate materials or existing device components, limiting its application in some integrated circuit manufacturing steps. • Process Complexity: The CVD process is generally more complex compared to other thin-film growth techniques, demanding precise control over gas supply, flow dynamics and reaction conditions. • High Environmental Sensitivity: The"
X Link @SemiAnalysis_ 2025-10-03T15:39Z 39.8K followers, 5147 engagements
"The market is primarily led by Applied Materials (AMAT), which holds the largest share. Lam Research Corporation follows as the second major market force. The rest are TEL, ASM, Kokusai and other Taiwanese suppliers. (6/6)"
X Link @SemiAnalysis_ 2025-10-03T15:39Z 39.8K followers, 5319 engagements
"Chinese transceiver vendors like Innolight and Eoptolink have rapidly pivoted towards the use of Silicon Photonics mitigating the impact of an ongoing 200G EML supply shortage. (2/6)"
X Link @SemiAnalysis_ 2025-10-07T03:25Z 39.8K followers, 1048 engagements
"China's State Council on October X approved Order No. XX of 2025 announcing export controls on certain overseas rare-earth items. This marks the fourth round of rare-earth export restriction efforts; the previous round was on April X. (1/8) 🧵"
X Link @SemiAnalysis_ 2025-10-09T16:03Z 39.8K followers, 91.1K engagements
"On the GPT-OSS 120B documentation summarization scenario, MI355X vLLM is seeing competitive perf per TCO compared to B200 vLLM for below XXX tok/s/user interactivity. For above XXX tok/s/user we are seeing B200 vLLM & B200 trtllm having an advantage on the current software. There are a lot of nuances as to which scenarios and interactivity levels NVIDIA currently has better perf per TCO at and which scenarios and interactivity levels AMD currently has better perf per TCO at across a wide range of different model architectures. Full writeup to the article & free dashboard with the complete dataset below in the"
X Link @SemiAnalysis_ 2025-10-10T21:46Z 39.8K followers, 18K engagements
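Several posts here compare GPUs on "perf per TCO" and cost per million tokens at a given interactivity level. The arithmetic can be sketched as below. This is one common way to define the metrics, not necessarily the exact formula InferenceMAX uses, and the function names and all numbers in the example are hypothetical placeholders rather than benchmark results.

```python
def cost_per_million_tokens(tco_usd_per_gpu_hour, tokens_per_sec_per_gpu):
    """Dollars per one million generated tokens for a single GPU.

    tco_usd_per_gpu_hour   : assumed all-in cost of owning or renting one GPU per hour.
    tokens_per_sec_per_gpu : measured throughput per GPU at a fixed interactivity level.
    """
    if tokens_per_sec_per_gpu <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_sec_per_gpu * 3600.0
    return tco_usd_per_gpu_hour / tokens_per_hour * 1_000_000.0


def relative_perf_per_tco(tps_a, tco_a, tps_b, tco_b):
    """Ratio of (throughput / TCO) for system A over system B.

    A value above 1.0 means A delivers more tokens per dollar than B.
    Both throughputs should be measured at the same tok/s/user interactivity.
    """
    return (tps_a / tco_a) / (tps_b / tco_b)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(cost_per_million_tokens(3.6, 1000.0))
    print(relative_perf_per_tco(2000.0, 2.0, 1000.0, 2.0))
```

Since throughput per GPU changes with the interactivity target, the comparison can flip between low and high tok/s/user, which is the nuance the posts point to.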
"Were already seeing a XXX% performance improvement on vLLM MI355X FP8 for medium-dense models since the launch of SemiAnalysis InferenceMAX last week. These optimizations were contributed by an AMD principal engineer building upon the existing InferenceMAX MI355X configuration also contributed by AMD engineers. This highlights one of the key advantages of a continuous benchmark: InferenceMAX is able to move at the speed as open source AI software. Great work to the AMD engineers For the full set of nuanced results visit inferencemax dot ai"
X Link @SemiAnalysis_ 2025-10-15T20:44Z 39.8K followers, 47K engagements
"One of the biggest announcements during the keynote at OCP Summit this year was the use of NVIDIA's Spectrum-X switch technology at hyperscalers such as Microsoft, Meta and Oracle Stargate via whitebox ODM vendors. This customer traction validates the use of NVIDIA switch silicon outside of the full NVIDIA networking ecosystem and against competitors such as Broadcom's Tomahawk and Cisco Silicon One. Notably Microsoft will use the open source network operating system SONiC instead of NVIDIA's Cumulus Linux or a custom OS. Meta will use their own open source NOS called FBOSS"
X Link @SemiAnalysis_ 2025-10-16T16:13Z 39.8K followers, 17.2K engagements
"Since AMD's Advancing AI in June 2025 there was a massive regression in the number of ROCm PyTorch exclusive disabled/skipped tests. We are glad to see a massive course correction since September, with the AMD ROCm PyTorch team now massively reducing the number of ROCm exclusive disabled/skipped tests. Great work to the AMD PyTorch engineers. What's particularly concerning was that many of these tests are not for niche or legacy operators. Critical functionality including numerous transformer tests, fused TP matmul and even attention, the single most important operator in transformers, has"
X Link @SemiAnalysis_ 2025-10-16T21:35Z 39.8K followers, 16.2K engagements
"Intel just took another step on combining forces with NVIDIA by integrating their new Gaudi3 rack scale systems together with NVIDIA B200 via disaggregated PD inferencing. Intel claims that compared to their B200-only baseline, an inferencing system using Gaudi3 for the decode part & B200 for the prefill part connected over Nvidia ConnectX-7 networking results in a 1.7x better perf per TCO for small dense models. We believe this will be done through integrating Gaudi3 into the Nvidia open source Apache2 Dynamo framework. Intel took their massive warehouse inventory full of Gaudi3 chips that they"
X Link @SemiAnalysis_ 2025-10-18T00:41Z 39.8K followers, 39.7K engagements
"For GB200 AWS only supported GB200 NVL36x2 and NVL36, which allowed up to XX GPUs per NVLink domain while allowing each rack to be 66kW power & 2U compute trays by connecting X NVL36 with NVLink ACC cables. As many GCP & AWS customers have noticed, NVIDIA's driver & physical engineering support for NVL36x2 has been lackluster, with way more bugs than their standalone NVL72 design. Although AWS markets their NVL36x2 as "NVL72" it is not topologically equivalent to an actual NVL72. 2/N 🧵"
X Link @SemiAnalysis_ 2025-10-18T21:45Z 39.8K followers, 2908 engagements
"With GB300 NVIDIA decided to only support NVL72 and not NVL36x2, thus AWS had to make a critical choice: either use NVIDIA ConnectX-8 RoCEv2 NICs, which AWS believes are subpar to their superior EFA NICs, or come up with an out of the box design. As a footnote, SemiAnalysis is still not convinced that EFA NIC is better than RoCEv2 ethernet on performance or user experience but we are open to be convinced. 4/N 🧵"
X Link @SemiAnalysis_ 2025-10-18T21:45Z 39.8K followers, 2646 engagements
"Was all of this engineering complexity worth it? Some people argue that it is not worth it as it adds a bunch of physical complexity like AEC cables and an additional sidecar while still having "EFA quality" UX & performance. Others argue that this is worth it as it prevents AWS from being too locked in and dependent on the NVIDIA ecosystem & also removes a single point of failure in the NVIDIA reference design where each GPU talks to only X ConnectX-8 NIC. Versus in the AWS GB300 NVL72 design each GPU talks to X k2v6 NICs, allowing workloads to not crash if X NIC fails. AWS bigly believes that EFA"
X Link @SemiAnalysis_ 2025-10-18T21:45Z 39.8K followers, 5850 engagements
"Before silicon teams do A0 tapeout they first simulate it with Minecraft Redstone. Rubin NVL144 DV team is logging into the Minecraft server to run formal verification testing for their new 400G BiDi SerDes where they can send 224G RX and 224G TX simultaneously on the same wire in parallel at the same time. Previously there was a dedicated TX differential pair copper cable and a dedicated RX differential pair copper cable"
X Link @SemiAnalysis_ 2025-10-19T22:54Z 39.8K followers, 31.7K engagements