@modellingReason "Qwen and Kimi are battling it out in real world and pushing the open source frontiers. Meanwhile Deepmind and ClosedAI use a small country's GDP worth of thinking tokens to solve math puzzles. We don't need agents that book flights we need capable tool callers"
@modellingReason on X 2025-07-22 20:44:28 UTC XXX followers, XXX engagements
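
To make "capable tool caller" concrete, here is a minimal sketch of the kind of declaration a model has to target, written in the common OpenAI-style "tools" format; the flight-search tool itself is hypothetical. A capable tool caller is simply a model that reliably emits valid calls against schemas like this.

```python
# Hypothetical tool declaration in the common OpenAI-style "tools" format.
# A capable tool caller must emit a call such as
#   {"name": "search_flights", "arguments": {"origin": "FRA", ...}}
# whose arguments validate against the JSON Schema below.
search_flights_tool = {
    "type": "function",
    "function": {
        "name": "search_flights",
        "description": "Find flights between two airports on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string", "description": "IATA code, e.g. FRA"},
                "destination": {"type": "string", "description": "IATA code"},
                "date": {"type": "string", "description": "ISO date, e.g. 2025-07-22"},
            },
            "required": ["origin", "destination", "date"],
        },
    },
}
```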

"I have never seen anyone talk about grok3-mini even though all benchmarks say it's a price/perf king. Has anyone done experiments with it for high volume tasks"
@modellingReason on X 2025-07-13 22:50:58 UTC XXX followers, XXX engagements

"@larsencc @Cloudflare why would it be do they even have positive free cash flow"
@modellingReason on X 2025-07-19 19:29:18 UTC XXX followers, XX engagements

"@sbeastwindy @rasbt There definitely is a gap of documented research but I think we can infer a lot from the published models. Qwen published two MoE models for Qwen3: 235B A22B and 30B A3B. The rest are dense models. I think if it would have made sense for smaller models they would have done it"
@modellingReason on X 2025-07-22 15:02:14 UTC XXX followers, XX engagements
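
For intuition, a rough sketch of MoE parameter accounting, showing why "235B A22B" means 235B parameters total but only ~22B active per token. The 8-of-128 expert routing matches Qwen3's published config; the split between shared and expert parameters below is an assumption for illustration.

```python
# Rough MoE parameter accounting (illustrative, not Qwen's actual code).
def active_params(expert_params: float, n_experts: int,
                  top_k: int, shared_params: float) -> float:
    """Parameters used per token when each token routes to top_k of n_experts."""
    return shared_params + expert_params * top_k / n_experts

TOTAL = 235e9    # total parameters of Qwen3-235B-A22B
SHARED = 8e9     # non-expert params (attention, embeddings) -- an assumption
EXPERTS = TOTAL - SHARED

# Qwen3 MoE models route each token to 8 of 128 experts (published config).
print(f"~{active_params(EXPERTS, 128, 8, SHARED) / 1e9:.0f}B active per token")  # ~22B
```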

"@sbeastwindy @rasbt it could also be the case that splitting into experts doesn't work for specialized models. In that case you would probably rather fine tune multiple small models"
@modellingReason on X 2025-07-22 11:27:24 UTC XXX followers, XX engagements

"@lowvram I stopped abusing my 3080 for training after I found out that an H100 has 30x as many f32 flops. I think I will really just start renting one a few hours a month to do my silly little experiments"
@modellingReason on X 2025-07-20 14:00:40 UTC XXX followers, 1335 engagements
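
A quick back-of-envelope check of that ~30x figure, using approximate numbers from public NVIDIA datasheets (assumptions, not measurements): the ratio only reaches ~30x if you compare the H100's TF32 tensor-core throughput (with 2:4 sparsity) against the 3080's plain FP32 pipeline; the plain-FP32 ratio is closer to 2x.

```python
# Back-of-envelope check of the ~30x claim. All throughput numbers are
# approximate values from public NVIDIA datasheets (assumptions, not measured).
RTX_3080_FP32_TFLOPS = 29.8        # standard FP32 shader throughput
H100_FP32_TFLOPS = 67.0            # H100 SXM, non-tensor FP32
H100_TF32_SPARSE_TFLOPS = 989.0    # TF32 tensor cores with 2:4 sparsity

print(f"plain FP32:  {H100_FP32_TFLOPS / RTX_3080_FP32_TFLOPS:.1f}x")         # ~2.2x
print(f"TF32 tensor: {H100_TF32_SPARSE_TFLOPS / RTX_3080_FP32_TFLOPS:.1f}x")  # ~33x
```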

"@kalomaze Did you try this just for the sake of it or do you have a use for a dense 405B llama model"
@modellingReason on X 2025-07-17 12:17:51 UTC XXX followers, XXX engagements

"Software devs too often forget that there is actual research and engineering behind the AI models they wrap in their SaaS tools"
@modellingReason on X 2025-07-19 15:40:26 UTC XXX followers, XX engagements

"Has anyone done experiments with using XML instead of JSON for structured LLM outputs The big AI labs often use XML in their CoT examples but only provide structured outputs via JSON. Curious"
@modellingReason on X 2025-07-19 11:39:15 UTC XXX followers, XXX engagements
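
To make the comparison concrete, a minimal sketch of the same structured output in both formats, parsed with the Python standard library; the field names are hypothetical. One common argument for XML in CoT contexts is that tags delimit free-form text without the string escaping JSON requires.

```python
# Same hypothetical structured output as JSON and as XML, both parsed
# with the standard library.
import json
import xml.etree.ElementTree as ET

json_output = '{"answer": "42", "confidence": 0.9}'
xml_output = "<response><answer>42</answer><confidence>0.9</confidence></response>"

parsed_json = json.loads(json_output)

root = ET.fromstring(xml_output)
parsed_xml = {child.tag: child.text for child in root}

print(parsed_json["answer"], parsed_xml["answer"])  # 42 42
```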

"Everyday I am grateful to be part of the last batch of SWEs who didn't have access to LLMs in college. From what I've heard everything went to shit as soon as ChatGPT was good enough to do the homework in "Intro to Programming" for you"
@modellingReason on X 2025-07-17 10:35:08 UTC XXX followers, 33.2K engagements

"Is it just me or do you also assume that someone is an idiot as soon as they start talking about AWS Lambdas"
@modellingReason on X 2025-07-19 12:35:42 UTC XXX followers, XXX engagements

"@robiscoding German CS programs are very heavy on homework. Lots of proofs and theoretical compsci. This entire system breaks apart if people can just let ChatGPT do the homework for them"
@modellingReason on X 2025-07-17 18:19:42 UTC XXX followers, XXX engagements