[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

@giffmana Lucas Beyer (bl16)

Lucas Beyer (bl16) posts on X about llm, open source, #ai, hint the most. They currently have XXXXXXX followers and XX posts still getting attention that total XXXXX engagements in the last XX hours.

Engagements: XXXXX #

X Week XXXXXXX -XX%
X Month XXXXXXXXX +89%
X Months XXXXXXXXX +8.90%
X Year XXXXXXXXXX +124%

Mentions: XX #

X Week XX -XX%
X Month XXX +98%
X Months XXX +31%
X Year XXX +113%

Followers: XXXXXXX #

X Week XXXXXXX +0.94%
X Month XXXXXXX +11%
X Months XXXXXXX +30%
X Year XXXXXXX +65%

CreatorRank: XXXXXXX #

Social Influence #

Social category influence technology brands XXXX% stocks XXXX% finance XXXX% celebrities XXXX% currencies XXXX%

Social topic influence llm #37, open source 3.64%, #ai 1.82%, hint 1.82%, has been 1.82%, hum 1.82%, wsj 1.82%, xai 1.82%, spacex 1.82%, $15b XXXX%

Top accounts mentioned or mentioned by @casperhansen @alibabaqwen @ivanfioravanti @mrtnm @casper_hansen_ @bozavlado @shaneguml @cloneofsimo @bitreducer @samsja19 @corbtt @chongzitazhang @goldmagikarp42 @riskwhale @xeophon_ @opus_genesis @miramurati @gabrielilharco @tenderizzation @norapom04

Top assets mentioned Alphabet Inc Class A (GOOGL)

Top Social Posts #

Top posts by engagements in the last XX hours

"ARCHITECTURE They are vague but mention "sparse moe" and having made (specifically)architecture improvements in general and in long-context and image input specifically"
@giffmana on X 2025-06-17 20:58:49 UTC 103.2K followers, 20.3K engagements

"@casper_hansen_ @Alibaba_Qwen Do you have a pointer or concrete example regarding that nightmare by chance I don't see it because i never used this"
@giffmana on X 2025-07-21 19:09:10 UTC 103.1K followers, 1470 engagements

"@tenderizzation @Norapom04 I have a maybe naive question: why go through all this pain (I see it's reverted) and massive amount of code instead of just using torch.compile of is about the same speed"
@giffmana on X 2025-07-21 18:43:11 UTC 103.1K followers, 1782 engagements

"After talking with the community and thinking it through we decided to stop using hybrid thinking mode. Is there a write up about this decision somewhere @Alibaba_Qwen But also curious about people's thoughts in general"
@giffmana on X 2025-07-21 18:49:39 UTC 103.2K followers, 44.8K engagements

"@shawshank_v @abursuc @y_m_asano @v_pariza @MrzSalehi @SpyrosGidaris @LukasKnobel1 @EliasRamzi27714 @valeoai @FunAILab I believe these statements are contradicting each other do you mind clarifying I cannot make the math work out also not when using 613M as the number of examples. I must be missing something"
@giffmana on X 2025-07-21 18:08:01 UTC 103.1K followers, 2417 engagements

"Definitely has nontrivial hint that differs per problem. Although they are still broad enough that you could imagine having a bench full of them and then if the verifier is good enough it's fine"
@giffmana on X 2025-07-22 19:24:39 UTC 103.2K followers, 10.9K engagements

"It's beyond me how almost everyone in open source and apparently even in big labs (hum Gemma3n hum) never even check getting close outputs for same inputs. This has been happening for years now. It's just wild how careless most people seem to be and everyone else seems perfectly happy with that"
@giffmana on X 2025-07-04 18:38:10 UTC 103.2K followers, 11.4K engagements

"HAHAHAHA yeah sure. Unrelated but Satya knows that I invented ConvNets right"
@giffmana on X 2025-07-17 05:29:23 UTC 103.2K followers, 117.6K engagements

"@YouJiacheng And i guess it should be ij not ii or else it would only look at the diagonal logits which seems odd"
@giffmana on X 2025-07-11 19:54:22 UTC 103.1K followers, XXX engagements

"AKA data augmentation. The numbers actually match my experience exactly. This is something i think LLM people will slowly rediscover from vision people. Not sure how they can write up the whole paper and not even once think of running the AR with augmentation or dropout"
@giffmana on X 2025-07-22 18:46:32 UTC 103.2K followers, 88.8K engagements

"@ivanfioravanti @casper_hansen_ @Alibaba_Qwen Why Just use the template of the move you fine tune Or maybe even no template in my experience "mode switches" are trivially rewired during fine-tuning"
@giffmana on X 2025-07-21 19:24:38 UTC 103.1K followers, XXX engagements

"@_xjdr How OpenAI likely got IMO gold and XX lessons this teaches us about b2b saas sales a 🧵"
@giffmana on X 2025-07-19 19:45:58 UTC 103K followers, 9703 engagements

"@corbtt But since you go back to using llm as judge you're back to having to worry about reward hacking eventually. Though i guess that's always the case for non verifiable tasks"
@giffmana on X 2025-07-11 19:50:38 UTC 103.1K followers, 4099 engagements

"@ivanfioravanti @casper_hansen_ @Alibaba_Qwen Did your fine tuning examples contain tuning blocks or not"
@giffmana on X 2025-07-21 19:29:34 UTC 103.1K followers, XXX engagements

"@GoldMagikarp42 Why are you not Solid though Missed opportunity"
@giffmana on X 2025-07-10 21:12:01 UTC 103K followers, XXX engagements

"TL;DR: Qwen series finetuned on 5M reasoning traces from DeepSeek R1 0528 671B i.e. hard distillation"
@giffmana on X 2025-07-21 18:25:49 UTC 103.2K followers, 60.6K engagements

"@ivanfioravanti @casper_hansen_ @Alibaba_Qwen I doubt it. If you fine tune on no thinking it will quickly adapt not to think"
@giffmana on X 2025-07-21 19:37:14 UTC 103.1K followers, XXX engagements

"OPTIMIZATION specifically mention stability signal propagation and optimization as three things that improved. And distillation for the smaller models mentioning storing teacher logits as only "k" logits per token. I think this implies offline distillation and hence teacher-forcing (suboptimal but easier infra)"
@giffmana on X 2025-06-17 20:58:50 UTC 103.2K followers, 12.2K engagements

"PSA: I'm getting these phishing emails almost daily now. Don't fall for it guys why do so many fall for it Just ignore it"
@giffmana on X 2025-07-20 09:30:49 UTC 103.2K followers, 15.5K engagements

"This paper is pretty cool; through careful tuning they show: - you can train LLMs with batch-size as small as X just need smaller lr. - even plain SGD works at small batch. - Fancy optims mainly help at larger batch. (This reconciles discrepancy with past ResNet research.) - At small batch optim hparams are very insensitive I find this cool for two reasons: 1) When we did ScalingViT I also surprisingly found (but never published) that pure SGD works much better than expected. However a small gap always remained so we dropped it in favour of (our variant of) AdaFactor. The results here confirm"
@giffmana on X 2025-07-10 19:00:01 UTC 103.2K followers, 105.2K engagements