[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

[@ptrschmdtnlsn](/creator/twitter/ptrschmdtnlsn)
"My cofounder and I are making an FPGA-accelerated server for high-memory high-bandwidth workloads. We're looking for one or two companies to partner with; we'll do the work to port your application to our hardware. Please DM me if you've got a tricky workload"  
[X Link](https://x.com/ptrschmdtnlsn/status/1981021789774893176) [@ptrschmdtnlsn](/creator/x/ptrschmdtnlsn) 2025-10-22T15:36Z 8973 followers, 74.2K engagements


"@idoccor The host Linux system is an AMD Epyc platform. The FPGAs are from Xilinx. These are raw speeds of the various interfaces not compressed speeds. The flash speed is that we have XXX separate NVMe interfaces so re: rand reads picture the performance characteristics of XXX SSDs"  
[X Link](https://x.com/ptrschmdtnlsn/status/1981085894603133052) [@ptrschmdtnlsn](/creator/x/ptrschmdtnlsn) 2025-10-22T19:50Z 8973 followers, 1387 engagements


"Many years ago I wrote an LD_PRELOAD that would sneakily turn every cudaMalloc in pytorch into a cudaMallocManaged. It actually just kind of worked for running huge VRAM models. I keep expecting this to be a standard feature or someone else to have done this. Is this a thing"  
[X Link](https://x.com/ptrschmdtnlsn/status/1981640730708767106) [@ptrschmdtnlsn](/creator/x/ptrschmdtnlsn) 2025-10-24T08:35Z 8974 followers, 4711 engagements


"Like for modern ASICs presumably "make guesses about the cells in the PDK" isn't even the right approach because you'd hand-optimize these circuits so I should reason on circuits of sum-of-product gates (basically what CMOS gives you) directly. I think this should be doable"  
[X Link](https://x.com/ptrschmdtnlsn/status/1976351456711315619) [@ptrschmdtnlsn](/creator/x/ptrschmdtnlsn) 2025-10-09T18:18Z 8971 followers, XXX engagements


"@i2cjak Wait this is literally from a board I had made with a 676-ball UltraScale+ part from Xilinx. I think I was a little excessive but I was serious"  
[X Link](https://x.com/ptrschmdtnlsn/status/1978588524221006052) [@ptrschmdtnlsn](/creator/x/ptrschmdtnlsn) 2025-10-15T22:27Z 8972 followers, XXX engagements


"Okay this is 50x oversubscribed. I think I should maybe just do a big group order of new boards at cost (probably $XX without tariffs maybe $XXX with tariffs) and charge everyone with a tech job 1.5x cost to subsidize giving them out for free to all the students"  
[X Link](https://x.com/ptrschmdtnlsn/status/1978836690853593434) [@ptrschmdtnlsn](/creator/x/ptrschmdtnlsn) 2025-10-16T14:53Z 8958 followers, 31.8K engagements


"@HotAisle Super reasonable questions. I'd love to get into it more. The core answer is basically that we're targeting different applications (in particular lower arithmetic intensity ones with sparser memory access patterns like search/vector DBs) where the MI355X loses on DRAM/$"  
[X Link](https://x.com/ptrschmdtnlsn/status/1981052301524816092) [@ptrschmdtnlsn](/creator/x/ptrschmdtnlsn) 2025-10-22T17:37Z 8971 followers, XXX engagements


"@CChef1980 Yeah I mean I can't say I've done enough testing to be able to know. But I have the room and it's free so why not I've seen some folks surround high-speed diff pair layer changes with a constellation of as many vias as they can"  
[X Link](https://x.com/ptrschmdtnlsn/status/1981086477208748353) [@ptrschmdtnlsn](/creator/x/ptrschmdtnlsn) 2025-10-22T19:53Z 8970 followers, XXX engagements


"Dry erase manufacturers could double marker lifespan with a single change: Make the caps white and make the *back* of the marker be its drawing color. Then folks would instinctually store them tip-down in jars which *dramatically* extends their life"  
[X Link](https://x.com/ptrschmdtnlsn/status/1961331149394870505) [@ptrschmdtnlsn](/creator/x/ptrschmdtnlsn) 2025-08-29T07:32Z 8973 followers, 3.6M engagements


"Saving a note for myself: I want to invent the best 4-bit to 6-bit quantization schemes by optimizing the list of numbers in the quantization scheme directly and having a reasonable model of tech mapping to frontier processes and doing my wacky SAT solver exact synthesis stuff"  
[X Link](https://x.com/ptrschmdtnlsn/status/1976350775959032009) [@ptrschmdtnlsn](/creator/x/ptrschmdtnlsn) 2025-10-09T18:15Z 8970 followers, 1199 engagements
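
For readers unfamiliar with the idea of "optimizing the list of numbers in the quantization scheme directly," here is a minimal sketch of one simple way to do it. It is not the author's method: it swaps in plain Lloyd-Max iteration (1-D k-means) on synthetic Gaussian samples, and it ignores the tech-mapping and SAT-based exact-synthesis parts of the post entirely. The function names (`quantize`, `mse`, `lloyd_max`) and the Gaussian stand-in for weights are assumptions made for illustration.

```python
import random


def quantize(x, levels):
    """Map x to the nearest entry in the codebook `levels`."""
    return min(levels, key=lambda q: abs(q - x))


def mse(samples, levels):
    """Mean squared error of quantizing `samples` with `levels`."""
    return sum((x - quantize(x, levels)) ** 2 for x in samples) / len(samples)


def lloyd_max(samples, bits=4, iters=50):
    """Fit 2**bits levels by alternating nearest-level assignment
    and centroid updates, starting from a uniform grid."""
    n = 2 ** bits
    lo, hi = min(samples), max(samples)
    levels = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    for _ in range(iters):
        buckets = [[] for _ in levels]
        for x in samples:
            idx = min(range(n), key=lambda i: abs(levels[i] - x))
            buckets[idx].append(x)
        # Move each level to the mean of its bucket; keep it if the bucket is empty.
        levels = sorted(
            sum(b) / len(b) if b else q for b, q in zip(buckets, levels)
        )
    return levels


# Hypothetical data: Gaussian samples standing in for model weights.
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(2000)]

lo, hi = min(weights), max(weights)
uniform = [lo + (hi - lo) * i / 15 for i in range(16)]  # 4-bit uniform grid
learned = lloyd_max(weights, bits=4)

print("uniform MSE:", mse(weights, uniform))
print("learned MSE:", mse(weights, learned))
```

Because the iteration starts from the uniform grid and each step can only lower the squared error, the learned 16-entry list should do at least as well as the uniform one on these samples. A hardware-aware version of the idea would add a cost term for how expensive each candidate list is to implement in logic, which this sketch does not attempt.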


"@AksharVastarpar Right now this project is using UltraScale+ parts from Xilinx"  
[X Link](https://x.com/ptrschmdtnlsn/status/1981129390244450337) [@ptrschmdtnlsn](/creator/x/ptrschmdtnlsn) 2025-10-22T22:43Z 8968 followers, XXX engagements


"I have a ton of old custom FPGA boards and I'm trying to clear out my office. Anyone in the Bay Area want some FPGA boards for free Here's a "build your own GPU" board I designed with working PCIe half a gigabyte of "VRAM" 63k LUTs and 1080p 60Hz HDMI output. I have two"  
[X Link](https://x.com/ptrschmdtnlsn/status/1978582368140083556) [@ptrschmdtnlsn](/creator/x/ptrschmdtnlsn) 2025-10-15T22:02Z 8974 followers, 99.6K engagements


"I know many people have made this observation but it's so funny that the assumptions about high-end computing needs has gone from overwhelmingly FP64 ("FP32 is just for gamers serious HPC is FP64") to now being FP16 or even 8-bit to 4-bit for ML. It's a complete reversal"  
[X Link](https://x.com/ptrschmdtnlsn/status/1981790376345448797) [@ptrschmdtnlsn](/creator/x/ptrschmdtnlsn) 2025-10-24T18:30Z 8975 followers, 13.2K engagements
