
@shubham_arora_0 "curious if I can replicate results with using a single rtx 5070 working XX hours"
X Link @shubham_arora_0 2025-10-13T16:22Z XXX followers, XX engagements

"yo apple's press release for m5 showcases LMstudio running qwen3-coder-30b local models ftw"
X Link @shubham_arora_0 2025-10-15T17:50Z XXX followers, XX engagements
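
For anyone wanting to poke at the LMstudio setup mentioned above: LM Studio can serve a locally loaded model over an OpenAI-compatible HTTP endpoint. A minimal Python sketch, assuming the default local base URL http://localhost:1234/v1 and a model id of qwen3-coder-30b; both are assumptions, so take the real values from LM Studio's own server/developer tab.

from openai import OpenAI

# Assumes LM Studio's local server is running on its default port with a
# qwen3-coder-30b build already loaded; URL and model id are assumptions.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="qwen3-coder-30b",  # hypothetical id; use whatever LM Studio lists
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
    max_tokens=256,
)
print(response.choices[0].message.content)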

"@danielmerja m3 max mxfp4 gpt-oss-20b with mlx you get about XX tok/s"
X Link @shubham_arora_0 2025-10-15T04:40Z XXX followers, 5170 engagements

"on my m3 max running the gpt-oss-20b using the MXFP4 quant running via MLX (via LMstudio) I get XX tok/sec chat performance 0.46s to first token"
X Link @shubham_arora_0 2025-10-15T04:39Z XXX followers, 5447 engagements
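
For context on those numbers, here is a minimal sketch of running an MXFP4-quantized gpt-oss-20b through MLX directly with the mlx-lm Python package, rather than through LM Studio's UI (the post above says LM Studio is driving MLX underneath). The checkpoint id is an assumption; verbose=True is used because it reports generation speed, comparable to the tok/sec figure quoted.

from mlx_lm import load, generate

# Assumed local path / repo id for an MXFP4 conversion of gpt-oss-20b;
# substitute whichever checkpoint you actually have downloaded.
model, tokenizer = load("gpt-oss-20b-MXFP4")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain MXFP4 quantization in two sentences."}],
    add_generation_prompt=True,
    tokenize=False,
)

# verbose=True prints prompt processing and generation speed (tok/sec),
# the kind of chat-performance numbers being compared above.
text = generate(model, tokenizer, prompt=prompt, max_tokens=200, verbose=True)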

"@DanAdvantage this is the model that comes to mind. it is actually great for learning local inference experimentation. i use mac with huge ram for basically for the same reason. wouldn't recommend for work though"
X Link @shubham_arora_0 2025-10-15T18:15Z XXX followers, X engagements

"so I guess the conclusion to draw is just slower memory bandwidth XXX GB/s (spark) vs XXX GB/s (m3 max)"
X Link @shubham_arora_0 2025-10-15T18:04Z XXX followers, XX engagements
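
The bandwidth point is the crux for single-stream decoding: each generated token has to stream the model's active weights out of memory, so peak tok/s is roughly memory bandwidth divided by bytes read per token. A minimal sketch of that arithmetic, using purely illustrative numbers (a hypothetical 4-bit model with 4B active parameters, 250 vs 500 GB/s), not the redacted figures above.

# Upper bound on tokens/sec when decoding is memory-bandwidth bound:
# every token requires reading the active weights once from memory.
def decode_ceiling(bandwidth_gb_s: float, active_params_billion: float, bits_per_weight: float) -> float:
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Illustrative values only: 4B active parameters at 4 bits per weight.
for bw in (250, 500):
    print(f"{bw} GB/s -> ~{decode_ceiling(bw, 4, 4):.0f} tok/s ceiling")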