
@YouJiacheng (You Jiacheng)

You Jiacheng posts on X most often about the topics open ai, bound, vibe, and specialized. They currently have XXXXX followers and XXX posts still receiving attention, totaling XXXXX engagements in the last XX hours.

Engagements: XXXXX

Mentions: XX

Followers: XXXXX

CreatorRank: XXXXXXX

Social Influence

Social category influence: technology brands

Social topic influence: open ai, bound, vibe, specialized, rl, token, damn, gamma

Top Social Posts

Top posts by engagements in the last XX hours

"@euxenus so-called "340M parameters setting" doesn't guarantee the parameters counts for new architectures are 340M. You can check their model gallery"
@YouJiacheng on X 2025-07-25 15:10:56 UTC 7949 followers, XXX engagements

"@vikhyatk You can modify flash attention a bit; its online softmax should give max logits for free"
@YouJiacheng on X 2025-07-20 15:35:12 UTC 7951 followers, XXX engagements
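The online-softmax point in the post above can be sketched in a few lines. This is a generic one-pass softmax in NumPy, my illustration rather than FlashAttention's actual kernel: the running maximum `m` that the recurrence maintains ends up equal to the maximum logit, so it is available as a free by-product.

```python
import numpy as np

def online_softmax(logits):
    """One-pass (online) softmax over a 1-D list of logits.

    Tracks a running max m and a rescaled running sum s, the same trick
    used inside streaming attention kernels. The final m is the maximum
    logit, obtained at no extra cost.
    """
    m = -np.inf  # running maximum of the logits seen so far
    s = 0.0      # running sum of exp(logit - m)
    for x in logits:
        m_new = max(m, x)
        # rescale the old sum to the new max, then add the new term
        s = s * np.exp(m - m_new) + np.exp(x - m_new)
        m = m_new
    probs = np.exp(np.asarray(logits) - m) / s
    return probs, m

probs, max_logit = online_softmax([1.0, 3.0, 2.0])
```

Here `max_logit` is 3.0 and `probs` matches an ordinary two-pass softmax, illustrating why tracking max logits costs nothing extra in an online-softmax loop.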

"So OpenAI has X advantages except the size of team (OpenAI team is claimed to be smaller)"
@YouJiacheng on X 2025-07-21 17:12:33 UTC 7950 followers, 74.8K engagements

"@euxenus if so, why are there 466.71M parameters"
@YouJiacheng on X 2025-07-25 20:21:46 UTC 7950 followers, XX engagements

"damn I always have a mental model that an action of a LM should be a sequence (a turn or until a tool call) instead of a token but people keep telling me that token-level loss is better. Thanks to the Qwen team for verifying my mental model, now it makes much more sense"
@YouJiacheng on X 2025-07-26 10:11:35 UTC 7953 followers, 17.9K engagements

"@cloneofsimo So you don't need to tune t at all if you have a max logits target. In contrast in softcap you need to control the "softness" in addition to the bound. In QK-Norm you have to tune weight decay & lr of the gamma to indirectly control max logits"
@YouJiacheng on X 2025-07-12 23:12:01 UTC 7951 followers, XXX engagements

"False they didn't control the number of parameters when comparing architectures"
@YouJiacheng on X 2025-07-25 14:21:43 UTC 7941 followers, 2709 engagements

"@cloneofsimo lol just =0.5 I think its complexity is on-par or simpler than QK-Norm or tanh logits softcap"
@YouJiacheng on X 2025-07-12 23:07:43 UTC 7951 followers, XXX engagements

"@cloneofsimo "make maximum logits bounded during training" as long as you know the bound you need you know t. Actually max logits will automatically become smaller and stable at XX after about XX% of training"
@YouJiacheng on X 2025-07-13 09:50:13 UTC 7951 followers, XXX engagements

"@euxenus If "340M parameters setting" means 340M parameters what does "filtered out architectures with excessive complexity or parameter counts" mean You can't have both 340M parameters and excessive parameter count"
@YouJiacheng on X 2025-07-25 15:12:31 UTC 7950 followers, XXX engagements

"Congrats but i still don't know why a research team needs a PM"
@YouJiacheng on X 2025-07-23 01:17:37 UTC 7939 followers, 57K engagements

"correction: actually there is a clamp_max on . (equivalently, rescaling only happens if max(qk) > t)"
@YouJiacheng on X 2025-07-11 18:26:55 UTC 7951 followers, 10.1K engagements
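The logit-bounding comparison running through these posts (tanh softcap, which couples the bound to a "softness", versus rescaling attention logits against a single threshold t) can be sketched as below. This is a hedged reading of the clamp/rescale trick the posts describe, not the author's actual implementation; both function names are made up for illustration.

```python
import numpy as np

def tanh_softcap(qk, cap):
    """Tanh logit softcap: smoothly maps logits into (-cap, cap).

    Note the bound and the 'softness' are coupled through the single
    parameter cap, which is the tuning burden the posts mention.
    """
    return cap * np.tanh(qk / cap)

def rescale_to_bound(qk, t):
    """Threshold-based rescaling (illustrative): leave a row of qk
    logits untouched unless its max exceeds t; otherwise shrink the
    whole row so its max equals t. With this scheme the max logit is
    bounded by t directly, with no extra softness knob to tune.
    """
    m = qk.max(axis=-1, keepdims=True)
    # only shrink when m > t; never amplify (scale capped at 1.0)
    scale = np.minimum(t / np.maximum(m, 1e-12), 1.0)
    return qk * scale

qk = np.array([[1.0, 5.0],    # max 5 > t, row gets rescaled
               [0.5, 0.2]])   # max 0.5 <= t, row untouched
bounded = rescale_to_bound(qk, t=2.0)
```

After the call, the first row becomes [0.4, 2.0] (scaled by 2/5) while the second is unchanged, so every row's max logit is at most t.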