You Jiacheng posts on X most often about open ai, bound, vibe, and specialized. They currently have XXXXX followers and XXX posts still receiving attention, totaling XXXXX engagements in the last XX hours.
Social category influence: technology brands
Social topic influence: open ai, bound, vibe, specialized, rl, token, damn, gamma
Top posts by engagements in the last XX hours
"@euxenus so-called "340M parameters setting" doesn't guarantee the parameters counts for new architectures are 340M. You can check their model gallery" @YouJiacheng on X 2025-07-25 15:10:56 UTC 7949 followers, XXX engagements
"@vikhyatk You can modify flash attention by a bit its online softmax should give max logits for free" @YouJiacheng on X 2025-07-20 15:35:12 UTC 7951 followers, XXX engagements
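The post above notes that the online softmax inside a Flash-Attention-style kernel already tracks a running maximum, so the max attention logits come "for free". A minimal sketch of that idea, assuming illustrative names (`online_softmax` and its return convention are not from any real kernel):

```python
import math

def online_softmax(logits):
    """Single streaming pass over `logits`: returns (softmax weights, max logit).

    The running max `m` is maintained anyway for numerical stability,
    which is why exposing the maximum logit costs nothing extra.
    """
    m = float("-inf")   # running maximum
    s = 0.0             # running sum of exp(logit - m)
    for x in logits:
        m_new = max(m, x)
        # rescale the partial sum whenever the running max is updated
        s = s * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    weights = [math.exp(x - m) / s for x in logits]
    return weights, m
```

A real fused kernel does this blockwise over tiles of the attention matrix, but the rescaling recurrence is the same.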
"So OpenAI has X advantages except the size of team (OpenAI team is claimed to be smaller)" @YouJiacheng on X 2025-07-21 17:12:33 UTC 7950 followers, 74.8K engagements
"@euxenus if so why there are 466.71M parameters" @YouJiacheng on X 2025-07-25 20:21:46 UTC 7950 followers, XX engagements
"damn I always have a mental model that an action of a LM should be a sequence (a turn or until a tool call) instead of a token but people keep telling me that token-level loss is better Thank Qwen team for verifying my mental model now it makes much more sense" @YouJiacheng on X 2025-07-26 10:11:35 UTC 7953 followers, 17.9K engagements
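The sequence-vs-token distinction in the post above can be made concrete with importance ratios in a policy-gradient setting: token-level methods compute one ratio per token, while treating the whole turn as the action gives a single (length-normalized) ratio per sequence. A hedged toy sketch, not the Qwen team's implementation; function names and the normalization choice are illustrative:

```python
import math

def token_level_ratios(logp_new, logp_old):
    """One importance ratio per token: exp(logp_new_t - logp_old_t)."""
    return [math.exp(n - o) for n, o in zip(logp_new, logp_old)]

def sequence_level_ratio(logp_new, logp_old):
    """One ratio for the whole action (turn), length-normalized
    so sequences of different lengths are comparable."""
    T = len(logp_new)
    return math.exp((sum(logp_new) - sum(logp_old)) / T)
```

With per-token log-probs `[-1.0, -2.0]` (new) vs `[-1.5, -1.5]` (old), the token-level ratios disagree in direction while the sequence-level ratio is exactly 1: the turn as a whole is equally likely under both policies.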
"@cloneofsimo So you don't need to tune t at all if you have a max logits target. In contrast in softcap you need to control the "softness" in addition to the bound. In QK-Norm you have to tune weight decay & lr of the gamma to indirectly control max logits" @YouJiacheng on X 2025-07-12 23:12:01 UTC 7951 followers, XXX engagements
"False they didn't control the number of parameters when comparing architectures" @YouJiacheng on X 2025-07-25 14:21:43 UTC 7941 followers, 2709 engagements
"@cloneofsimo lol just =0.5 I think its complexity is on-par or simpler than QK-Norm or tanh logits softcap" @YouJiacheng on X 2025-07-12 23:07:43 UTC 7951 followers, XXX engagements
"@cloneofsimo "make maximum logits bounded during training" as long as you know the bound you need you know t. Actually max logits will automatically become smaller and stable at XX after about XX% of training" @YouJiacheng on X 2025-07-13 09:50:13 UTC 7951 followers, XXX engagements
"@euxenus If "340M parameters setting" means 340M parameters what does "filtered out architectures with excessive complexity or parameter counts" mean You can't have both 340M parameters and excessive parameter count" @YouJiacheng on X 2025-07-25 15:12:31 UTC 7950 followers, XXX engagements
"Congrats but i still don't know why a research team needs a PM" @YouJiacheng on X 2025-07-23 01:17:37 UTC 7939 followers, 57K engagements
"correction: actually there is a clamp_max on . (equivalently rescaling only happens if max(qk) > t)" @YouJiacheng on X 2025-07-11 18:26:55 UTC 7951 followers, 10.1K engagements
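The thread above (bound t, the "=0.5" exponent, clamp_max, rescaling only when max(qk) exceeds t) describes keeping pre-softmax logits bounded by shrinking q and k. A minimal reconstruction under those assumptions; the function name, shapes, and even splitting of the factor are illustrative, not the author's exact formulation:

```python
import numpy as np

def qk_clip(q, k, t=30.0):
    """Rescale q and k so that max(q @ k.T) is at most t.

    The rescaling factor is clamped at 1 (a clamp_max), so nothing
    happens unless the max logit actually exceeds the bound t.
    """
    logits = q @ k.T
    max_qk = float(logits.max())
    # only shrink, never grow: gamma = min(1, t / max_qk)
    gamma = min(1.0, t / max_qk) if max_qk > 0 else 1.0
    # split the factor evenly between q and k (exponent 0.5)
    return q * gamma**0.5, k * gamma**0.5
```

Because gamma is clamped at 1, the transformation is a no-op whenever max(qk) <= t, which matches the correction in the post: rescaling only fires when the bound is exceeded.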