# ![@YouJiacheng Avatar](https://lunarcrush.com/gi/w:26/cr:twitter::3426190841.png) @YouJiacheng YouJiacheng

YouJiacheng posts on X about china, open ai, ai, gpus the most. They currently have [------] followers and [----] posts still getting attention that total [-------] engagements in the last [--] hours.

### Engagements: [-------] [#](/creator/twitter::3426190841/interactions)
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::3426190841/c:line/m:interactions.svg)

- [--] Week [---------] +781%
- [--] Month [---------] +464%
- [--] Months [----------] +250%
- [--] Year [----------] +378%

### Mentions: [--] [#](/creator/twitter::3426190841/posts_active)
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::3426190841/c:line/m:posts_active.svg)

- [--] Week [--] +106%
- [--] Month [---] +99%
- [--] Months [---] +206%
- [--] Year [---] +7.20%

### Followers: [------] [#](/creator/twitter::3426190841/followers)
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::3426190841/c:line/m:followers.svg)

- [--] Week [------] +1.70%
- [--] Month [------] +5.50%
- [--] Months [------] +22%
- [--] Year [------] +91%

### CreatorRank: [-------] [#](/creator/twitter::3426190841/influencer_rank)
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::3426190841/c:line/m:influencer_rank.svg)

### Social Influence

**Social category influence**
[technology brands](/list/technology-brands)  [countries](/list/countries)  [stocks](/list/stocks)  [finance](/list/finance)  [currencies](/list/currencies)  [social networks](/list/social-networks)  [celebrities](/list/celebrities)  [travel destinations](/list/travel-destinations)  [automotive brands](/list/automotive-brands)  [us election](/list/us-election) 

**Social topic influence**
[china](/topic/china), [open ai](/topic/open-ai), [ai](/topic/ai), [gpus](/topic/gpus), [bytedance](/topic/bytedance), [gpu](/topic/gpu), [huawei](/topic/huawei), [money](/topic/money) #5054, [microsoft](/topic/microsoft), [agi](/topic/agi)

**Top assets mentioned**
[Microsoft Corp. (MSFT)](/topic/microsoft) [Robot Consulting Co., Ltd. (LAWR)](/topic/robot) [StarShip (STARSHIP)](/topic/starship) [Alphabet Inc Class A (GOOGL)](/topic/$googl) [Frontier (FRONT)](/topic/frontier) [NVIDIA Corp. (NVDA)](/topic/$nvda) [DeepSeek (DEEPSEEK)](/topic/deepseek) [Tesla, Inc. (TSLA)](/topic/tesla) [StarLink (STARL)](/topic/starlink) [GrokCoin (GROKCOIN)](/topic/grok) [Grin (GRIN)](/topic/grin)
### Top Social Posts
Top posts by engagements in the last [--] hours

"@JesseFarebro Loss and parameterization are separate problems. Using logits (of category distribution) to parameterize a scalar is a widely adopted method. IMO there is another regression baseline:"  
[X Link](https://x.com/YouJiacheng/status/1765804102955769951)  2024-03-07T18:17Z [----] followers, [---] engagements


"@JesseFarebro BTW I think there are at least two possible MSE+Softmax: [--]. (sum(softmax(logits) * bin_value) - y) ** [--] [--]. sum(softmax(logits) * (bin_value - y) ** 2) Clearly [--]. is not a parameterization but seems to be a maximum mean discrepancy between softmax(logits) and one-hot"  
[X Link](https://x.com/YouJiacheng/status/1765820286996185102)  2024-03-07T19:22Z [----] followers, [--] engagements
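
The two "MSE + softmax" readings in the post above can be compared numerically. The following is a minimal sketch, assuming the redacted exponent in the first form is 2; the logits, bin values, and target are hypothetical illustration values, not taken from the thread.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mse_of_expectation(logits, bin_values, y):
    """Form 1: squared error of the softmax-weighted mean (logits parameterize a scalar)."""
    p = softmax(logits)
    mean = sum(pi * b for pi, b in zip(p, bin_values))
    return (mean - y) ** 2

def expected_squared_error(logits, bin_values, y):
    """Form 2: softmax-weighted average of the per-bin squared errors."""
    p = softmax(logits)
    return sum(pi * (b - y) ** 2 for pi, b in zip(p, bin_values))

def predictive_variance(logits, bin_values):
    """Variance of the bin values under softmax(logits)."""
    p = softmax(logits)
    mean = sum(pi * b for pi, b in zip(p, bin_values))
    return sum(pi * (b - mean) ** 2 for pi, b in zip(p, bin_values))

# Hypothetical example values.
logits = [0.2, 1.5, -0.3, 0.7]
bin_values = [0.0, 1.0, 2.0, 3.0]
y = 1.25

loss1 = mse_of_expectation(logits, bin_values, y)
loss2 = expected_squared_error(logits, bin_values, y)
var = predictive_variance(logits, bin_values)
```

Form 2 equals form 1 plus the variance of the predicted distribution, so it also penalizes spread and pushes the softmax toward a one-hot on the bin nearest `y`, which fits the post's remark that it is not a pure parameterization.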


"@qinzytech @OpenAI @Meta JetMoE doesn't use stick-breaking attention as ModuleFormer right"  
[X Link](https://x.com/YouJiacheng/status/1776031154887606649)  2024-04-04T23:36Z [--] followers, [---] engagements


"@aaron_defazio It seems z_t+1=z_t- g_t will cause z_t+1 explode when t At least z does not converge like SGD without lr decay. Express x_T in terms of g_t we can get: It seems that x preserves too much stale information"  
[X Link](https://x.com/YouJiacheng/status/1776618816107118810)  2024-04-06T14:31Z [--] followers, [---] engagements


"Impressed by the performance vs. size: 104B model [----] 32–35B model [----]. Disappointed by the price: R=$1.5/$0.5 RPlus=$15/$3 (Azure) Qwen1.5-32B=$0.8 (Together AI) per 1M tokens (out/in). Given Qwen1.5-72B=$0.9 I would expect 32–35B=$0.5–0.6 100B=$2–3. Exciting news - the latest Arena result are out @cohere's Command R+ has climbed to the 6th spot matching GPT-4-0314 level by 13K+ human votes It's undoubtedly the **best** open model on the leaderboard now🔥 Big congrats to @cohere's incredible work & valuable contribution https://t.co/5PzpPolC9F"  
[X Link](https://x.com/YouJiacheng/status/1777907867653452091)  2024-04-10T03:54Z [--] followers, [--] engagements


"@Stone_Tao Using UMI or something similar we can easily collect [--] demos in [--] worker*hour cost 5$ (in China or Africa/India/Vietnam). So 100M$=1B demos. Each demo should be viewed as 1k tokens (semantic [--] demo=1 conversation) or 30k (compute cost) that is 1T–30T tokens"  
[X Link](https://x.com/YouJiacheng/status/1779726925289898204)  2024-04-15T04:22Z [--] followers, [--] engagements


"Interesting can we mitigate this issue by einsum('...i,...j->...ij', x @ weight_A, x @ weight_B) or the method proposed by [----------] or other nonlinearities (6/10) This problem is in fact very much related with the softmax bottleneck issue (https://t.co/gOQZbM6t9e) Basically we try to map "low" dimensional contextual representations to potentially high-dimensional contextual probability manifolds using a simple linear layer: https://t.co/HlAslGUfVM"  
[X Link](https://x.com/YouJiacheng/status/1779923500217807022)  2024-04-15T17:23Z [----] followers, [--] engagements


"@aNoobonaJourney @teortaxesTex @dylan522p NVIDIA report its gross profit margin is 75.97%. You can guess the margin of H100"  
[X Link](https://x.com/YouJiacheng/status/1782384218565411110)  2024-04-22T12:21Z [--] followers, [---] engagements


"@aNoobonaJourney @Geronimo_AI Where can we get [--] usd / hour H100"  
[X Link](https://x.com/YouJiacheng/status/1782458698566336584)  2024-04-22T17:17Z [--] followers, [--] engagements


"@iamhitarth @bitlgeuse @GroqInc @OpenAI @AnthropicAI @Meta Since +21600 will 8x the capacity GroqCloud has only [----] LPUs. The cost can be estimated by: design cost (hundreds of millions $) + 14nm tape-out cost (millions $) + [----] × $500. The marginal cost is very low. https://twitter.com/JonathanRoss321/status/1782921857928401091 @YouJiacheng Just the 21600"  
[X Link](https://x.com/YouJiacheng/status/1782945752870006920)  2024-04-24T01:32Z [--] followers, [--] engagements


"I guess @GroqInc has spent $100M–$200M for chip design. But the marginal cost is only $100–$500 per chip. 70B model needs [------] chips to run costs $40000–$250000 roughly equivalent to [-------] hours of p5.48xlarge (8 H100) on AWS. https://wow.groq.com/groq-closes-300-million-fundraise/"  
[X Link](https://x.com/YouJiacheng/status/1782965366270116111)  2024-04-24T02:50Z [--] followers, [--] engagements


"@preminstrel Let me clarify what I am talking about: JF68M drafts _1=2 tokens then 7B with partial cache needs to compute the logits of these [--] tokens in a single forward pass. So there are [--] q*k scores for each chunk. How to rank all chunks"  
[X Link](https://x.com/YouJiacheng/status/1783424508579467449)  2024-04-25T09:15Z [----] followers, [--] engagements


"@preminstrel IIUC when the model with full cache computes the logits (for verification) we can get the q*k score with nearly [--] cost (Ofc there is engineering cost). Keeping avg k cache and computing q*avg_k might cost more than avg(q*k) (before softmax) since q*k is free"  
[X Link](https://x.com/YouJiacheng/status/1783426080596267140)  2024-04-25T09:21Z [----] followers, [--] engagements


"@preminstrel 🤔And if you maintain full KV cache for layer0&1 you can reuse the feature from draft model and skip layer0&1 when computing logits for verification😂 sounds like a 6% extra speedup"  
[X Link](https://x.com/YouJiacheng/status/1783429214890492187)  2024-04-25T09:33Z [----] followers, [--] engagements


"@LChoshen @AIatMeta @FabianGloeckle @byoubii @b_roziere @dfpazr @syhw Medusa lol. Also ACT (Action Chunking Transformer) used in robotics"  
[X Link](https://x.com/YouJiacheng/status/1786065195879792834)  2024-05-02T16:08Z [--] followers, [--] engagements


"@giffmana BTW SigLIP only compared with EVA-CLIP but EVA-02-CLIP (about [--] year later) still cannot surpass SigLIP. What a STRONG result"  
[X Link](https://x.com/YouJiacheng/status/1787058556765958339)  2024-05-05T09:55Z [--] followers, [--] engagements


"Inplace-grad has been implemented in TriDao's fused softmaxCE (lm_head computation is not chunked) Chunk can be easily achieved by checkpoint/remat (not [--] cost). BUT Fused fwd bwd is MAGIC: it achieves [--] or even NEGATIVE cost inplace-grad + chunk. It only load logits ONCE @karpathy For fused cross-entropy could reduce peak memory by 1) blocking the logit computation to avoid materialising the full logits tensor and 2) overwriting the logits with their grad during training. Can save a lot of memory especially when vocab size dim https://t.co/ZucNOj3XT3"  
[X Link](https://x.com/YouJiacheng/status/1787065642639888645)  2024-05-05T10:23Z [----] followers, [--] engagements
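
The "inplace-grad + chunk" idea in the post above can be illustrated in plain Python. This is only a sketch of the general technique, not the fused kernel the post refers to: `fused_linear_ce` and its arguments are hypothetical names. Logits are computed one chunk of rows at a time, each logits buffer is overwritten with its own gradient, and that gradient is consumed immediately, so the full logits tensor is never held in memory.

```python
import math

def fused_linear_ce(hidden, weight, targets, chunk_size):
    """Mean cross-entropy of (hidden @ weight^T) against targets.

    Returns (loss, grad_hidden, grad_weight). Only one chunk of logits
    exists at a time, and it is reused as the gradient buffer.
    """
    n, d, v = len(hidden), len(hidden[0]), len(weight)
    loss = 0.0
    grad_hidden = [[0.0] * d for _ in range(n)]
    grad_weight = [[0.0] * d for _ in range(v)]
    for start in range(0, n, chunk_size):
        rows = range(start, min(start + chunk_size, n))
        # Forward: logits for this chunk only.
        chunk = [[sum(h * w for h, w in zip(hidden[i], weight[j]))
                  for j in range(v)] for i in rows]
        for buf, i in zip(chunk, rows):
            m = max(buf)
            z = sum(math.exp(x - m) for x in buf)
            loss += (m + math.log(z) - buf[targets[i]]) / n
            # Backward: overwrite the logits with d(loss)/d(logits) in place.
            for j in range(v):
                buf[j] = math.exp(buf[j] - m) / z / n
            buf[targets[i]] -= 1.0 / n
            # Consume the gradient right away, then the buffer can be dropped.
            for j in range(v):
                g = buf[j]
                for k in range(d):
                    grad_hidden[i][k] += g * weight[j][k]
                    grad_weight[j][k] += g * hidden[i][k]
    return loss, grad_hidden, grad_weight

# Hypothetical toy problem: 3 tokens, hidden size 2, vocabulary of 4.
hidden = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0]]
weight = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6], [0.7, 0.8]]
targets = [0, 2, 3]
loss, grad_hidden, grad_weight = fused_linear_ce(hidden, weight, targets, chunk_size=2)
```

The result does not depend on `chunk_size`; the chunk size only controls how many rows of logits are alive at once.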


"Teleoperation is insufficient in an ideal world. However in the real world US people can hire kenya people to teleoperate a robot in their house to do house work at a very low price and without the risk of being burglarized. agree on the last part: Teleoperation is necessary but insufficient my opinion is still that we need sim + generative AI to scale data to a sufficient scale for useful generalization. ps check out our just released GPU acc robotics simulator https://t.co/3rCRKP1TbN"  
[X Link](https://x.com/YouJiacheng/status/1787388835191812369)  2024-05-06T07:48Z [--] followers, [---] engagements


"The problem may come from the weight gradient accumulation what precision do you use Fused cross-entropy loss with Llama-3 is very promising in terms of VRAM savings but the loss is ever so slightly off. Perhaps @danielhanchen can help track down where it's off Comparison below b/w unsloth (cel optimization only) standard pytorch and fused cel. https://t.co/KuxDN7FLBI"  
[X Link](https://x.com/YouJiacheng/status/1788188148947984720)  2024-05-08T12:44Z [--] followers, [--] engagements


"@teortaxesTex @main_horse @_xjdr CPU is all you need. Sapphire Rapids have 7.5TOPS/core INT8 matrix compute i.e. 240TOPS for a 32-core CPU. This is much higher than its memory bandwidth so speculative decoding is viable"  
[X Link](https://x.com/YouJiacheng/status/1788354322146967612)  2024-05-08T23:44Z [--] followers, [--] engagements


"@burkov Cloud Service to compete with AWS and Azure"  
[X Link](https://x.com/YouJiacheng/status/1789233802415571117)  2024-05-11T09:59Z [--] followers, [---] engagements


"DeepSeek-V2 decoding with 4K context requires more MACs for Attention (only SDPA part projections are excluded) than for Linear (includes projections in Self-Attention Layer). 128 × (512+64+512) × 4K = 544M; × [--] layers = 31.875G MACs; 21B activated parameters = 21G MACs"  
[X Link](https://x.com/YouJiacheng/status/1789365888812208498)  2024-05-11T18:44Z [----] followers, [---] engagements
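
The arithmetic in the post above can be checked directly. This sketch assumes the "M" and "G" in the attention figures are binary units (2^20 and 2^30), which is what makes 544M come out exactly, and reads 128 as the head count with 512+64 dimensions for the query-key scores and 512 for the value-weighted sum; both are interpretations, not statements from the post. The layer count is redacted in the source, so the code only derives the value implied by the two stated totals.

```python
MI = 2 ** 20   # assumed meaning of "M" in the post
GI = 2 ** 30   # assumed meaning of "G" in the post

heads = 128
per_head_dims = 512 + 64 + 512   # score dims (512 + 64) plus value dims (512)
context_tokens = 4 * 1024        # "4K" context

# Multiply-accumulates spent in attention per layer for one decoded token.
attn_macs_per_layer = heads * per_head_dims * context_tokens

# Layer count implied by the stated total of 31.875G MACs.
stated_attn_total = 31.875 * GI
implied_layers = stated_attn_total / attn_macs_per_layer

# One MAC per activated parameter for the linear layers, as stated in the post.
linear_macs = 21e9

attention_dominates = stated_attn_total > linear_macs
```

Under these assumptions the attention total exceeds the linear total whether the 21G figure is read as decimal or binary.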


"@adcock_brett They didn't build the "scaling law" with variable controlled. Given fixed delivery deadline the faster the development the better the quality. Low quality actually comes from near delivery deadline"  
[X Link](https://x.com/YouJiacheng/status/1789370575598858451)  2024-05-11T19:02Z [--] followers, [--] engagements


"@Mankaran32 @chichengcc Good. Do you test the SLAM accuracy GoPro can achieve 3mm & 1"  
[X Link](https://x.com/YouJiacheng/status/1789616824461705541)  2024-05-12T11:21Z [--] followers, [--] engagements


"That's why I said a humanoid robot will be much cheaper than a car soon. I said it will be $5000 when Elon Musk said $20000. Now it is $14000 (100000 RMB = $13822.66). Unitree Introducing Unitree G1 Humanoid Agent AI Avatar Price from $16K 🤩 Unlock unlimited sports potential(Extra large joint movement angle [----] joints) Force control of dexterous hands manipulation of all things Imitation & reinforcement learning driven #Unitree #AI https://t.co/Dv1yGaGpoJ"  
[X Link](https://x.com/YouJiacheng/status/1789962901828120797)  2024-05-13T10:16Z [--] followers, [---] engagements


"GPT-4o is blazingly FAST. Web 70token/s API 100token/s. 35x speed of GPT-4 Turbo (47x for Chinese). 1/3 of @GroqInc (300 token/s Llama [--] 70B)"  
[X Link](https://x.com/YouJiacheng/status/1790152494565425204)  2024-05-13T22:49Z [----] followers, [---] engagements


"@bfspector wait does TK only support Hopper GPU's wgmma"  
[X Link](https://x.com/YouJiacheng/status/1790177497608561020)  2024-05-14T00:29Z [--] followers, [--] engagements


"@terryclothBuyer @teortaxesTex Do we live in a different world There isn't a decrease in FX reserve. The real problem is deflation: M2 increased 10% but CPI decrease"  
[X Link](https://x.com/YouJiacheng/status/1790491787242701144)  2024-05-14T21:18Z [--] followers, [---] engagements


"@giffmana GPT-4o can recognize a 1px defect in 1024x1024 image that is INSANE. Can you guess how they achieve this"  
[X Link](https://x.com/YouJiacheng/status/1790594226163786064)  2024-05-15T04:05Z [--] followers, [---] engagements


"So why not just allow PRC to build advanced chips in Mainland China(allow them buy advanced devices from ASML AMAT Lam Research etc.) Then the world will feel chip overcapacity so AI will accelerate. The real reason is "national security". @teortaxesTex Someone's got to get China to cool off on Taiwan. Disrupting the chip supply is too big of a threat to ignore"  
[X Link](https://x.com/YouJiacheng/status/1791103324554682837)  2024-05-16T13:48Z [---] followers, [--] engagements


"@teortaxesTex Nope. Unitree build electric robot before BD build electric robot. BD's decades of research is mainly on hydraulic robot. The relationship is like SpaceX vs. Boeing Tesla v.s Ford"  
[X Link](https://x.com/YouJiacheng/status/1791110737966010450)  2024-05-16T14:17Z [---] followers, [---] engagements


"@teortaxesTex I actually want to say that Unitree doesn't "catch up" someone they are pioneer in industry"  
[X Link](https://x.com/YouJiacheng/status/1791114471609627048)  2024-05-16T14:32Z [---] followers, [--] engagements


"@teortaxesTex For the market yes the tech is not important but the price is. Tech-wise Unitree is a pioneer in electric robot. Market-wise Unitree is a pioneer in affordable robot (even excluding the manufacturing factors). BD didn't make new tech/design to reduce the cost"  
[X Link](https://x.com/YouJiacheng/status/1791123135548547352)  2024-05-16T15:06Z [---] followers, [--] engagements


"@yaroslavvb @CFGeek @deliprao I didn't see your tweet about Yi or NVIDIA paper"  
[X Link](https://x.com/YouJiacheng/status/1791266143279497601)  2024-05-17T00:35Z [---] followers, [--] engagements


"@b_c_p_source @0xpangolin @airkatakana Did you notice that all these objects are well separated Of course todays computer vision can get similar accuracy in much complex environments. I have to admit if the grasping can be solved by the sucker then things become a pure CV task can be done [--] week"  
[X Link](https://x.com/YouJiacheng/status/1792336672384008695)  2024-05-19T23:28Z [---] followers, [--] engagements


"@adcock_brett "one of the biggest weeks for AI and Robotics of the entire year" -- so far. Haha"  
[X Link](https://x.com/YouJiacheng/status/1792943517708488710)  2024-05-21T15:40Z [---] followers, [--] engagements


"@teortaxesTex I think the reunification has nothing to do with "the trend of the world". But there should be an end of the Chinese civil war (there even isn't a ceasefire). For Chinese it is sth like "UK intervene the American civil war so the Confederate States still exists in 1930s""  
[X Link](https://x.com/YouJiacheng/status/1793051197139800340)  2024-05-21T22:48Z [---] followers, [--] engagements


"On April [--] [----] Dongling Technologies Co.Ltd. released ES-1000 the world's first and largest Single-Shaker 100ton thrust Electrodynamic Vibration Test System. They have developed the 1000kN shaker in [----]. http://donglingtest.com/profile/history/162336/0/"  
[X Link](https://x.com/YouJiacheng/status/1793481942299820064)  2024-05-23T03:19Z [---] followers, [--] engagements


"BTW U.S. still ironically forbid the exportation of 9ton thrust shaker to China. Lmao"  
[X Link](https://x.com/YouJiacheng/status/1793482618845179954)  2024-05-23T03:22Z [---] followers, [--] engagements


"@JasonHanDC @teortaxesTex Llama [--] 70B costs 6e24FLOP Chinese AI leaders have enough NVIDIA chips to do this. BTW I think that Chinese AI leaders have already achieved L3 70B performance by data/arch innovation. See Yi-Large by @01AI_Yi "  
[X Link](https://x.com/YouJiacheng/status/1794422976454254685)  2024-05-25T17:39Z [----] followers, [---] engagements


"@main_horse @teortaxesTex Agree the real challenge is the interconnect. BUT I don't think anyone currently can train a 1e26 model using [--] months. BTW China cannot make HBM. I think we can use slower but bigger DRAM then less/none TP/PP more DP to save communication"  
[X Link](https://x.com/YouJiacheng/status/1794431537699782787)  2024-05-25T18:13Z [---] followers, [--] engagements


"@teortaxesTex @main_horse 14nm is "worst case analysis". Actually HUAWEI can make 7nm or even 5nm ML chips. Problem is still the interconnect. Even HUAWEI is an advanced network device&solution provider (e.g. it hasn't provided a LLM oriented solution. https://e.huawei.com/hk/products/optical-transmission/dc908"  
[X Link](https://x.com/YouJiacheng/status/1794444095445983261)  2024-05-25T19:03Z [---] followers, [--] engagements


"@teortaxesTex @main_horse If interconnect is the main bottleneck even 20% yield is quite okay for 5nm. Rumors said HUAWEI had achieved 50% yield for 7nm"  
[X Link](https://x.com/YouJiacheng/status/1794446197698281725)  2024-05-25T19:11Z [---] followers, [--] engagements


"@SamVanivray @MattPirkowski @bitcloud https://twitter.com/UnitreeRobotics/status/1720386810197471257 Unitree Released B2 Beyond the Limit Hyperevolution😍Maximum speed of 6m/s sustained load of 40kg and sustained walking endurance of 5h. The comprehensive performance is two to three times that of existing quadruped robots worldwide https://t.co/lcAIe0lyrb https://t.co/vNDLSm9qA2"  
[X Link](https://x.com/YouJiacheng/status/1794602335135830498)  2024-05-26T05:31Z [---] followers, [---] engagements


"I have noticed the length bias in RLHF objective since March: For both PPO and DPO families reward is O(1) but KL is O(L). That is partially because we all set discount factor =1 in RLHF"  
[X Link](https://x.com/YouJiacheng/status/1794617041116446852)  2024-05-26T06:30Z [---] followers, [---] engagements
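
The scaling argument in the post above (reward is O(1), summed KL is O(L)) can be made concrete with a toy calculation. This is a sketch under simplifying assumptions: a constant per-token KL and a fixed sequence-level reward, with hypothetical values for `beta` and the KL; real RLHF runs have per-token KL that varies along the sequence.

```python
def rlhf_objective(sequence_reward, per_token_kl, beta):
    """Sequence-level reward minus the undiscounted sum of per-token KL penalties."""
    return sequence_reward - beta * sum(per_token_kl)

beta = 0.1            # hypothetical KL coefficient
kl_per_token = 0.05   # hypothetical constant per-token KL
reward = 1.0          # same reward regardless of response length

lengths = [10, 100, 1000]
objectives = [rlhf_objective(reward, [kl_per_token] * n, beta) for n in lengths]
penalties = [reward - obj for obj in objectives]
```

With the reward held fixed, the penalty grows linearly with length, so the objective favors shorter responses unless the reward itself scales with length.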


"So LN is not an approximation of ref they can co-exist. Experiments are needed to determine whether LN+ref is better. Also comparison to ref+margin is useful. IPO introduced margin but it uses MSE instead of logsigmoid"  
[X Link](https://x.com/YouJiacheng/status/1794617447355658257)  2024-05-26T06:31Z [---] followers, [---] engagements


"@giffmana Windows Terminal sir"  
[X Link](https://x.com/YouJiacheng/status/1794924493825364133)  2024-05-27T02:52Z [---] followers, [--] engagements


"WTF everything happened in [--] minutes"  
[X Link](https://x.com/YouJiacheng/status/1796073668113514739)  2024-05-30T06:58Z [---] followers, [--] engagements


"@jon_barron I simply type mu (with IME enabled) then the Microsoft IME will give me "  
[X Link](https://x.com/YouJiacheng/status/1796677926794227916)  2024-05-31T22:59Z [---] followers, [---] engagements


"@xhluca @ericjang11 @clonerobotics Why can't hold a needle/scalpel Data collection stuff is a separate issue I only talked about capabilities here"  
[X Link](https://x.com/YouJiacheng/status/1797403910161977663)  2024-06-02T23:04Z [----] followers, [--] engagements


"@TimothyDuignan @schnabloS @Dr_Gingerballs My understanding: coarse grained classical MD: nucleation is not the ground state coarse grained NNP MD: this work coarse grained DFT-MD: takes more time for each step full atom classical MD: more time each step & slower rate can't observe nucleation in a reasonable time"  
[X Link](https://x.com/YouJiacheng/status/1798896922536898608)  2024-06-07T01:57Z [---] followers, [--] engagements


"@teortaxesTex I formalized the problem still no LLM can solve it robustly"  
[X Link](https://x.com/YouJiacheng/status/1799836593039290436)  2024-06-09T16:10Z [---] followers, [----] engagements


"@teortaxesTex @elonmusk Very good point buy Lenovo (& its Motorola)"  
[X Link](https://x.com/YouJiacheng/status/1800334669340348423)  2024-06-11T01:10Z [---] followers, [---] engagements


"@adcock_brett 1e-9 flight hours before a catastrophic event should be 1e9 flight hours IIUC"  
[X Link](https://x.com/YouJiacheng/status/1802628224226415075)  2024-06-17T09:03Z [---] followers, [---] engagements


"@adcock_brett It seems that autos cause [--] death per 1e8 miles. For 100mph eVTOL it is equivalent to 1e6 hours MTBF"  
[X Link](https://x.com/YouJiacheng/status/1802630367473701300)  2024-06-17T09:12Z [---] followers, [---] engagements
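
The conversion in the post above is a one-line unit change. In this sketch the fatality rate is a labeled assumption: the figure is redacted in the source, and 1 death per 1e8 miles is simply the value that reproduces the stated 1e6 hours.

```python
def hours_between_fatalities(deaths_per_1e8_miles, speed_mph):
    """Convert a per-distance fatality rate into mean hours between fatalities at a given speed."""
    miles_per_death = 1e8 / deaths_per_1e8_miles
    return miles_per_death / speed_mph

# Assumption: 1 death per 1e8 miles (the source value is redacted).
equivalent_hours = hours_between_fatalities(deaths_per_1e8_miles=1.0, speed_mph=100.0)
```

Doubling the assumed fatality rate halves the equivalent hours, so the result is only as good as the rate that is plugged in.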


"@main_horse Yeah they will do some batch size ramp up earlier but the main compute stage won't start early"  
[X Link](https://x.com/YouJiacheng/status/1802869293119569958)  2024-06-18T01:01Z [---] followers, [--] engagements


"I guess OpenAI has already achieved this internally just like LCM. Maybe there will be one sentence in their future tech report: "we apply diffusion loss instead of standard CE loss for image tokens in the multimodality next token prediction / autoregressive modeling" Autoregressive Image Generation without Vector Quantization Achieves competitive performance without vector quantization by using diffusion loss function https://t.co/LUNnJFHZNf https://t.co/dtVzvneRY8"  
[X Link](https://x.com/YouJiacheng/status/1802915062384120307)  2024-06-18T04:03Z [---] followers, [---] engagements


"lol I just noticed that Continuous Next Token Prediction without Vector Quantization has been achieved in Octo. https://arxiv.org/abs/2406.11838"  
[X Link](https://x.com/YouJiacheng/status/1802989238440312954)  2024-06-18T08:58Z [---] followers, [---] engagements


"@MoreBirths @CharlieTTEcon long-sighted doesn't make sense given the rapid advancement in AI and robotics"  
[X Link](https://x.com/YouJiacheng/status/1803811738128646418)  2024-06-20T15:26Z [---] followers, [--] engagements


"WTF is "inappropriately touching""  
[X Link](https://x.com/YouJiacheng/status/1804134056516555117)  2024-06-21T12:47Z [---] followers, [--] engagements


"lmao 🐳Coder-V2-Instruct gained only [--] point on MMLU but [---] points on MMLU-Pro over 🐳V2-Chat. So it got quite a bit smarter. Curiously the website version is also less into Core Values Of Socialism. Coincidence I don't think so https://t.co/DEhyoRodwJ"  
[X Link](https://x.com/YouJiacheng/status/1804763253995864397)  2024-06-23T06:27Z [---] followers, [---] engagements


"Both Jensen Huang and Elon Musk are making technology inexpensive. Great entrepreneurs"  
[X Link](https://x.com/YouJiacheng/status/1804901826564657238)  2024-06-23T15:38Z [---] followers, [---] engagements


"@ChongZitaZhang Aren't cats and dogs more agile than human"  
[X Link](https://x.com/YouJiacheng/status/1804941982260998496)  2024-06-23T18:17Z [---] followers, [--] engagements


"@kadecgos @xlr8harder EUV light source from US (cymer purchased by ASML). Optics from Germany"  
[X Link](https://x.com/YouJiacheng/status/1805516018376130893)  2024-06-25T08:18Z [---] followers, [---] engagements


"@teortaxesTex AFAIK DeepSeek is not supported by the gov they only have very limited compute resources😭. I think HUAWEI should invest DeepSeek by providing compute resources (HUAWEI is also a cloud service provider) just like what Microsoft do for OpenAI"  
[X Link](https://x.com/YouJiacheng/status/1805609401791005184)  2024-06-25T14:30Z [---] followers, [---] engagements


"@RadarHits starship can destroy the drone factory with ease lol😂"  
[X Link](https://x.com/YouJiacheng/status/1805610514367791561)  2024-06-25T14:34Z [---] followers, [--] engagements


"Zeng Yuqun's Response: "To strive for a hundred days is to call on everyone to master the fundamentals without forcing anyone to do so." () Lmao"  
[X Link](https://x.com/YouJiacheng/status/1806186849280688462)  2024-06-27T04:44Z [---] followers, [--] engagements


"Anyone knows logit soft-capping in Gemma-2"  
[X Link](https://x.com/YouJiacheng/status/1806365999115161807)  2024-06-27T16:36Z [----] followers, [---] engagements


"Maybe the key (beside strong Pre-training) is a very large reward model in RLHF: "We use a similar RLHF algorithm as Gemma v1.1 (Gemma Team 2024) but a different reward model which is an order of magnitude larger than the policy." 🤯🤯🤯27B only slightly larger than the active parameter of DeepSeek-V2 achieve Llama [--] 70B Elo"  
[X Link](https://x.com/YouJiacheng/status/1806374853332897895)  2024-06-27T17:11Z [---] followers, [---] engagements


"@danielhanchen I am curious about why Gemma [--] & [--] use normed * (1 + w.float()) and init w=0 instead of normed * w.float() and init w=1. Cuz it is slightly more accurate"  
[X Link](https://x.com/YouJiacheng/status/1806386707505528893)  2024-06-27T17:58Z [----] followers, [----] engagements
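
The accuracy hypothesis in the post above can be illustrated by emulating bfloat16 storage of the norm weight. This is a sketch, not Gemma's code: `to_bf16` is a hypothetical helper that rounds a float32 bit pattern to its top 16 bits, and the target scale of 1.001 is an arbitrary example. It compares storing the scale directly against storing its offset from 1 and adding 1 in higher precision.

```python
import math
import struct

def to_bf16(x):
    """Round a value to bfloat16 precision (round-to-nearest-even on float32 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def rms_normalize(x, eps=1e-6):
    """Divide a vector by its root-mean-square."""
    scale = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / scale for v in x]

target_scale = 1.001  # hypothetical learned scale, close to 1

# Variant A: store w itself in bf16 (init w = 1), output = normed * w.
scale_direct = to_bf16(target_scale)

# Variant B: store the offset in bf16 (init w = 0), output = normed * (1 + w).
scale_offset = 1.0 + to_bf16(target_scale - 1.0)

normed = rms_normalize([0.5, -1.0, 2.0])
out_direct = [v * scale_direct for v in normed]
out_offset = [v * scale_offset for v in normed]

error_direct = abs(scale_direct - target_scale)
error_offset = abs(scale_offset - target_scale)
```

bfloat16 has about 8 bits of mantissa, so values near 1 are spaced roughly 0.0078 apart while values near 0 keep the same relative precision at a much finer absolute spacing. In this example the direct variant collapses back to exactly 1.0 and the offset variant keeps the small deviation.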


"😅gemma-2-27b-i failed on my basic math test both prompting in English and in Chinese. DeepSeek-V2 and DeepSeek-Coder-V2 can solve it 100% in Chinese but 0% in English. L3-70B & GPT-4T can solve it 100%. Old GPT-4 can solve it 60% in English 80% in Chinese"  
[X Link](https://x.com/YouJiacheng/status/1806564629977710758)  2024-06-28T05:45Z [---] followers, [---] engagements


"China domestic EUV is it possible in [----] How can they achieve enough wafer per hour (light power) & overlay accuracy Resolution is not a problem Even 0.2NA can achieve 20nm resolution much better than the best DUV. @DrFrederickChen The rumored transistor density of Kirin [----] is [---] MTx/mm2 If all things go well then china's domestic EUV machine will get shipped to SMIC this year and the huawei 3nm chip should be commercially available between [----] H2 and [----] H1"  
[X Link](https://x.com/YouJiacheng/status/1807121188092883316)  2024-06-29T18:37Z [---] followers, [---] engagements


"@EffortDefines In China it seems that Apple use AI developed by Baidu"  
[X Link](https://x.com/YouJiacheng/status/1807185243788751141)  2024-06-29T22:51Z [---] followers, [--] engagements


"My list for 2030s: [--]. Transformer [--]. Starship [--]. Starlink especially DTC [--]. I know all of these. But: The top right one is the only of these that will have a measurable societal impact by 2030"  
[X Link](https://x.com/YouJiacheng/status/1807192949245276486)  2024-06-29T23:22Z [---] followers, [---] engagements


"@OPEN_THE_PORTAL no police won't shoot them (sorry if offensive). and US gov won't provide job and education(free) opportunity for 95% the blacks but it is true for the Uygurs. because well educated ppl with job won't join terrorism organizations"  
[X Link](https://x.com/YouJiacheng/status/1807953094283874717)  2024-07-02T01:43Z [---] followers, [--] engagements


"@CyrusSMing @michaelxpettis @YouTube Agree with other parts but India is more aggressive than China just look at Sikkim Bhutan Nepal and Bangladesh"  
[X Link](https://x.com/YouJiacheng/status/1808298132104794593)  2024-07-03T00:34Z [---] followers, [--] engagements


"@angelusm0rt1s @teortaxesTex It is a department of HUAWEI Enterprise founded at least [--] years ago. It was responsible for sales and support of Ascend. But it seems that there is R&D of Ascend now"  
[X Link](https://x.com/YouJiacheng/status/1808874581861806170)  2024-07-04T14:44Z [---] followers, [--] engagements


"@angelusm0rt1s @teortaxesTex An interesting phenomenon is that [----] Labs and HUAWEI Computing are competing for talents. 🧐🧐"  
[X Link](https://x.com/YouJiacheng/status/1808878859762479541)  2024-07-04T15:01Z [---] followers, [--] engagements


"There is a huge gap between OpenAI/Anthropic Models and others. "What are the typical areas of 1T1C DRAM and 6T SRAM in terms of F2" Gemini [---] Pro is not even wrong"  
[X Link](https://x.com/YouJiacheng/status/1809170056712176056)  2024-07-05T10:18Z [---] followers, [---] engagements


"China should impose higher export tariffs and subsidize domestic consumer to increase domestic Marshallian surplus and mitigate world worries about China overcapacity"  
[X Link](https://x.com/YouJiacheng/status/1809393597822956011)  2024-07-06T01:07Z [---] followers, [---] engagements


"lmao poor sequoia. Honestly love that Sequoia has been the worst VC at AI. They've missed every hot AI startup even if it is a bubble they should be in there like the other bubbles they embrace and made money on. All it takes is listening to the leaked FTX call to see they're clowns"  
[X Link](https://x.com/YouJiacheng/status/1809446406551400615)  2024-07-06T04:36Z [---] followers, [---] engagements


"@gazorp5 @dylan522p Yep Sequoia invested NVIDIA OpenAI Hugging Face Replicate"  
[X Link](https://x.com/YouJiacheng/status/1809458826686656567)  2024-07-06T05:26Z [---] followers, [---] engagements


"@Robotbeat the dumbest thing is to militarily land or blockade Taiwan. even bomb TSMC is better than landing or blockade🤐. unless China has her own starship & starlink"  
[X Link](https://x.com/YouJiacheng/status/1810448361801310478)  2024-07-08T22:58Z [---] followers, [--] engagements


"I immediately realized (guess) that TTT is delta rule when I saw the figure of TTT - even before I knew it uses L2 loss. Online gradient descent version of TTT-linear is a variant of DeltaNet and could be parallelized efficiently: https://t.co/yrINFRVfZ8"  
[X Link](https://x.com/YouJiacheng/status/1810815367725601195)  2024-07-09T23:16Z [---] followers, [---] engagements
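
The equivalence claimed in the post above can be checked numerically. This is a minimal sketch, assuming a linear memory `S` that predicts `S k` for a key `k` and is trained with the L2 loss `0.5 * ||S k - v||^2`. The names `S`, `k`, `v`, `beta` and all helper functions are illustrative and are not taken from the TTT or DeltaNet code. Under those assumptions, one online gradient-descent step on the loss matches the delta rule update `S <- S - beta * (S k - v) k^T`.

```python
def matvec(S, k):
    """Multiply matrix S (list of rows) by vector k."""
    return [sum(s * x for s, x in zip(row, k)) for row in S]


def l2_loss(S, k, v):
    """0.5 * ||S k - v||^2 for a single (key, value) pair."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(matvec(S, k), v))


def delta_rule_step(S, k, v, beta):
    """Delta rule: S <- S - beta * (S k - v) k^T."""
    err = [p - t for p, t in zip(matvec(S, k), v)]
    return [[S[i][j] - beta * err[i] * k[j] for j in range(len(k))]
            for i in range(len(S))]


def gd_step_numeric(S, k, v, beta, eps=1e-6):
    """One gradient-descent step on l2_loss, gradient by central differences."""
    new_S = [row[:] for row in S]
    for i in range(len(S)):
        for j in range(len(S[0])):
            plus = [row[:] for row in S]
            minus = [row[:] for row in S]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad = (l2_loss(plus, k, v) - l2_loss(minus, k, v)) / (2 * eps)
            new_S[i][j] = S[i][j] - beta * grad
    return new_S


# Arbitrary illustrative values.
S = [[0.5, -1.0, 0.0], [2.0, 0.25, 1.0]]
k = [1.0, 2.0, -1.0]
v = [0.0, 1.0]

a = delta_rule_step(S, k, v, beta=0.1)
b = gd_step_numeric(S, k, v, beta=0.1)
max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print(max_diff)  # tiny: only finite-difference rounding error remains
```

With `beta = 1 / ||k||^2` a single step stores the pair exactly, so `S k` equals `v` afterwards.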


"good arts by Kolors (kuaishou)"  
[X Link](https://x.com/YouJiacheng/status/1811725711314702654)  2024-07-12T11:34Z [----] followers, [---] engagements


"He also said OpenAI/Anthropic (the most advanced team outside China) has 2x model-architecture&training-dynamics efficiency and 2x data efficiency (so overall 4x compute efficiency) comparing to the most advanced Chinese team. That is a HUGE gap.🥵💪 Deepseek founder Liang Wenfeng: We will not go closed-source. We believe that having a strong technical ecosystem first is more important. https://t.co/d6qhzdF4G5 Deepseek founder Liang Wenfeng: We will not go closed-source. We believe that having a strong technical ecosystem first is more important. https://t.co/d6qhzdF4G5"  
[X Link](https://x.com/YouJiacheng/status/1813662489663664540)  2024-07-17T19:50Z [---] followers, [----] engagements


"@BruDCDO @teortaxesTex AFAIK there is no subsidy. but it's true they have a much lower margin comparing to OpenAI. However technology (MLA + fine-grained MoE) is the key. but I guess OpenAI have similar if not more advanced tech"  
[X Link](https://x.com/YouJiacheng/status/1814049736569372772)  2024-07-18T21:28Z [---] followers, [--] engagements


"@jonasgeiping so basically you fuse the matmul into the kernel on the top of malek's method That's great. I also noticed that you use some local lse so tiled softmax (like flash attention) is used and it is possible to only keep fp32 logits on SRAM"  
[X Link](https://x.com/YouJiacheng/status/1814373469058150857)  2024-07-19T18:55Z [----] followers, [--] engagements
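
The tiled-softmax idea mentioned in the post above can be sketched in plain Python. This is an illustration under stated assumptions, not the kernel being discussed: the function names, the tile size, and the toy data are hypothetical. It computes the cross-entropy of one token by producing logits one vocabulary tile at a time and keeping only a running maximum and a running sum of exponentials, so the full logits vector is never stored.

```python
import math


def tiled_cross_entropy(hidden, weight_rows, target, tile=4):
    """Cross-entropy for one token, computing logits tile by tile.

    hidden: list of d floats; weight_rows: V rows of d floats; target: class index.
    """
    running_max = float("-inf")
    running_sum = 0.0
    target_logit = None
    for start in range(0, len(weight_rows), tile):
        # Matmul restricted to this tile: only `tile` logits exist at once.
        logits = [sum(w * h for w, h in zip(row, hidden))
                  for row in weight_rows[start:start + tile]]
        new_max = max(running_max, max(logits))
        # Rescale the old sum to the new maximum, then add this tile's terms.
        running_sum = (running_sum * math.exp(running_max - new_max)
                       + sum(math.exp(x - new_max) for x in logits))
        running_max = new_max
        if start <= target < start + tile:
            target_logit = logits[target - start]
    log_sum_exp = running_max + math.log(running_sum)
    return log_sum_exp - target_logit


def reference_cross_entropy(hidden, weight_rows, target):
    """Same loss, materialising every logit at once."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in weight_rows]
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits)) - logits[target]


# Toy data: vocabulary of 10, hidden size 3.
weight_rows = [[((i * 7 + j * 3) % 11 - 5) / 4 for j in range(3)] for i in range(10)]
hidden = [0.5, -1.0, 2.0]

tiled = tiled_cross_entropy(hidden, weight_rows, target=6)
full = reference_cross_entropy(hidden, weight_rows, target=6)
print(abs(tiled - full))  # agreement up to floating-point rounding
```

The rescaling step is the same running-maximum trick used by FlashAttention-style softmax; a real kernel would apply it per block in on-chip memory.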


"@Noahpinion Poland is a great country. They achieve both high efficiency and high equity"  
[X Link](https://x.com/YouJiacheng/status/1814706623409234011)  2024-07-20T16:59Z [---] followers, 10.1K engagements


"@_philschmid @Alibaba_Qwen @OpenAI @AnthropicAI There is a typo (do you use a LLM to generate this table) AMC [----] Qwen2 maj@64 should be 21/40 not 12/40"  
[X Link](https://x.com/YouJiacheng/status/1815053627112993043)  2024-07-21T15:58Z [---] followers, [---] engagements


"@evil_malloc @_philschmid @Alibaba_Qwen @OpenAI @AnthropicAI From NuminaMath tech report"  
[X Link](https://x.com/YouJiacheng/status/1815101269897511415)  2024-07-21T19:07Z [---] followers, [---] engagements


"@decentralizedX1 @Kanthan2030 NVIDIA founded in [----] SpaceX founded in [----]. At that time China was poor and under-educated. Plus U.S. gov forbid Chinese chip design companies from manufacturing advanced chip with TSMC"  
[X Link](https://x.com/YouJiacheng/status/1815304258566463809)  2024-07-22T08:33Z [---] followers, [--] engagements


"@angelusm0rt1s what is SPAC"  
[X Link](https://x.com/YouJiacheng/status/1815417333005042036)  2024-07-22T16:03Z [---] followers, [--] engagements


"CPU is well-suited for DeepSeek-V2. I wonder why Intel & AMD haven't taken any move. @qtnx_ Amazing how we went from 236B too much nobody has that kind of hardware to 405B fits right in you just gotta have courage literally overnight https://t.co/Q8Jh41BZCW @qtnx_ Amazing how we went from 236B too much nobody has that kind of hardware to 405B fits right in you just gotta have courage literally overnight https://t.co/Q8Jh41BZCW"  
[X Link](https://x.com/YouJiacheng/status/1815428859158007957)  2024-07-22T16:49Z [---] followers, [---] engagements


"@angelusm0rt1s they provide the only 90% sparse MoE with frontier performance. CPU has no other choice (well both Intel & AMD have GPU). Intel even paid for advertisement of "Llama-2-7B 100TPS""  
[X Link](https://x.com/YouJiacheng/status/1815436569593213160)  2024-07-22T17:19Z [---] followers, [--] engagements


"@RealJosephus @aidan_mclau 4o mini can output up to 200token/s. If it were MoE I would estimate it is a 40B-Active-5B or something similar"  
[X Link](https://x.com/YouJiacheng/status/1815452641444720948)  2024-07-22T18:23Z [---] followers, [--] engagements


"@RealJosephus @aidan_mclau The fastest non-groq provider of Mistral 8x7B (Active 13B) run up to 250TPS @ $0.5 per 1M tokens. How can OpenAI serve a larger model with similar TPOT @ $0.15/$0.6 per 1M in/out tokens and they need more margin to cover training cost"  
[X Link](https://x.com/YouJiacheng/status/1815461762181066773)  2024-07-22T18:59Z [---] followers, [--] engagements
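
The price comparison in the post above depends on the mix of input and output tokens. A small sketch makes that explicit, assuming a hypothetical 3:1 input-to-output ratio that is not from the post. The prices are the ones quoted: $0.15/$0.60 per 1M tokens for GPT-4o mini and a flat $0.50 per 1M tokens for the Mixtral 8x7B provider.

```python
def blended_price_per_million(input_price, output_price, input_tokens, output_tokens):
    """Average USD price per 1M tokens for a given input/output token mix."""
    total = input_tokens + output_tokens
    return (input_price * input_tokens + output_price * output_tokens) / total


# Assumption: 3 input tokens for every output token (illustrative only).
gpt4o_mini = blended_price_per_million(0.15, 0.60, input_tokens=3, output_tokens=1)
mixtral_flat = 0.50  # flat price quoted in the post

ratio = mixtral_flat / gpt4o_mini
print(f"blended GPT-4o mini price: ${gpt4o_mini:.4f} per 1M tokens")
print(f"Mixtral provider costs {ratio:.2f}x as much at this mix")
```

At an output-heavy mix the gap narrows, since the quoted output price ($0.60) is above the flat $0.50 rate.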


"@RealJosephus @aidan_mclau I haven't trained LLMs on my own. But I think it is more unlikely that OpenAI have some inference magics than they have some training magics"  
[X Link](https://x.com/YouJiacheng/status/1815462078838436331)  2024-07-22T19:01Z [---] followers, [--] engagements


"@terryyuezhuo Do OpenAI and Anthropic report MultiPL-E version Is there a leaderboard for MultiPL-E version"  
[X Link](https://x.com/YouJiacheng/status/1815683785284825410)  2024-07-23T09:42Z [---] followers, [---] engagements


"@OpenAI protected disclosures. protected by what big brother sam"  
[X Link](https://x.com/YouJiacheng/status/1815731396713083382)  2024-07-23T12:51Z [---] followers, [---] engagements


"@PauseusMaximus @mathepi Actually Llama make many China AI startups invest more resources on application instead of training. They said "we don't need to train a model just fine-tune llama" and "UX intelligence". However training technique IS the KEY to win "the AGI war". UX is NOT"  
[X Link](https://x.com/YouJiacheng/status/1815900230006759617)  2024-07-24T00:02Z [---] followers, [---] engagements


"@soumithchintala How to spot silent data corruption on GPU"  
[X Link](https://x.com/YouJiacheng/status/1815910609080443085)  2024-07-24T00:43Z [---] followers, [---] engagements


"MSFT has USD 19.634B cash. @AlibabaGroup has CNY 286.424B cash. (USD 39.9B) But @Alibaba_Qwen can't buy a single A100 H100 or B200 GPU into China.🫠 BTW Qwen2 72B GPT-4o and Claude [---] sonnet on AMC [----] and AIME 2024"  
[X Link](https://x.com/YouJiacheng/status/1815997103946227935)  2024-07-24T06:27Z [---] followers, [---] engagements


"@gazorp5 @AlibabaGroup @Alibaba_Qwen AFAIK the only effective way is to adhere the ban i.e. build AI datacenter outside China. Possibly in Malaysia and Singapore"  
[X Link](https://x.com/YouJiacheng/status/1816000242304901356)  2024-07-24T06:39Z [---] followers, [--] engagements


"@lmsysorg LMSYS releases 20% but OpenAI has 100% battles when their models are involved"  
[X Link](https://x.com/YouJiacheng/status/1816129529318191576)  2024-07-24T15:13Z [---] followers, [---] engagements


"@Ji_Ha_Kim DeepSeek can't solve the logarithm version even with explicit hint. It made computation mistake consistently. (tested in Chinese)"  
[X Link](https://x.com/YouJiacheng/status/1816224083941761359)  2024-07-24T21:29Z [---] followers, [---] engagements


"ta-da I bow down to Kuaishou. 🎉 The moment we've all been waiting for is HERE 🎊 Introducing the official global launch of Kling AI's International Version1.0🌍 📧ANY email address gets you inno mobile number required 👉 Direct linkhttps://t.co/68WvKSDuBg 🔥 Daily login grants [--] free Credits for https://t.co/TgFZIwInPg 🎉 The moment we've all been waiting for is HERE 🎊 Introducing the official global launch of Kling AI's International Version1.0🌍 📧ANY email address gets you inno mobile number required 👉 Direct linkhttps://t.co/68WvKSDuBg 🔥 Daily login grants [--] free Credits for"  
[X Link](https://x.com/YouJiacheng/status/1816273801182863819)  2024-07-25T00:46Z [---] followers, [---] engagements


"Actually more depend on exports now. 2024H1 vs. 2023H1 new real estate sales -1.6T CNY but GDP +2.4T CNY (+4.0%). If real estate sales doesn't change and assume the correlation [---] will be GDP +4.0T (+6.7%) subtract -1.0% deflator = +7.7%. consumption only +3.7%. @robin_j_brooks Chinese economy no longer as dependent on exports. Try it again see what happens @robin_j_brooks Chinese economy no longer as dependent on exports. Try it again see what happens"  
[X Link](https://x.com/YouJiacheng/status/1816291060529913899)  2024-07-25T01:55Z [---] followers, [---] engagements


"@jiayq Just checked 8B on OpenRouter. Wow [---] token/s @ $0.07 is impressive. Much better than previous [---] token/s on Llama [--] and Mistral 7B. Is that speed temporary (cuz low load)"  
[X Link](https://x.com/YouJiacheng/status/1816313403104997589)  2024-07-25T03:23Z [---] followers, [---] engagements


"@basedjensen I think it is a typo should be "formalization". The formalization of training problems is done by finetuned gemini because accuracy/faithfulness/fidelity is not important here. The formalization of contest problems is done by human"  
[X Link](https://x.com/YouJiacheng/status/1816704042502095263)  2024-07-26T05:16Z [---] followers, [---] engagements


"@teortaxesTex probably caused by the increased USD interest and the devaluation of Euro. plus the recession of balance sheet. China encountered similar issues so that Xi officially raise a question: why new unicorn companies in China become fewer"  
[X Link](https://x.com/YouJiacheng/status/1817375851291542011)  2024-07-28T01:45Z [---] followers, [--] engagements


"kinda confirm they use base64 for data augmentation @teortaxesTex Lmao if you just provide it the B64 without any additional consideration it respond accurately in Base64. https://t.co/tASfaYRxPT @teortaxesTex Lmao if you just provide it the B64 without any additional consideration it respond accurately in Base64. https://t.co/tASfaYRxPT"  
[X Link](https://x.com/YouJiacheng/status/1817376935590470138)  2024-07-28T01:50Z [---] followers, [---] engagements


"@RyanEls4 NO you should store base64-encoded protobuf and/or parquet🤓. don't tell me that we can store binary/bytes in RDBMS I JUST WANT TO STORE A HUMAN READABLE STRING. so why not just use @MongoDB 🤓"  
[X Link](https://x.com/YouJiacheng/status/1817398245104255242)  2024-07-28T03:14Z [---] followers, [--] engagements


"Now GPT-4o (& mini) and Claude-3.5-sonnet can solve Wason selection task (mini and 3.5-sonnet are not reliable) but DeepSeek can't. Note: it is not about reasoning capability but about post-training data. Original GPT-4 can't solve it either"  
[X Link](https://x.com/YouJiacheng/status/1818845522683371726)  2024-08-01T03:05Z [----] followers, [---] engagements


"BTW Wason selection task is one of the problems mentioned by "GPT-4 Can't Reason (arxiv:2308.03762)". So it is likely that OpenAI add specific data after that"  
[X Link](https://x.com/YouJiacheng/status/1818886298566115406)  2024-08-01T05:47Z [---] followers, [--] engagements


"@teortaxesTex Exactly my point. If OpenAI increases the cost-effectiveness of GPT-4o mini at the cost of decreasing diversity"  
[X Link](https://x.com/YouJiacheng/status/1818943204252860896)  2024-08-01T09:33Z [---] followers, [--] engagements


"I don't have enough Poe compute points to ask GPT-4o multiple times. It got 2/2"  
[X Link](https://x.com/YouJiacheng/status/1819351802510442705)  2024-08-02T12:37Z [---] followers, [---] engagements


"Reasoning can help GPT-4o mini (chatgpt . com) to get 4/5 but 3.5-sonnet still failed (0/2). GPT-4o mini (poe) also failed (0/2) Failure cases:"  
[X Link](https://x.com/YouJiacheng/status/1819358049599164586)  2024-08-02T13:02Z [---] followers, [--] engagements


"lmao @YouJiacheng That's like you put your lowest TOEFL of history on your graduate school application. @YouJiacheng That's like you put your lowest TOEFL of history on your graduate school application"  
[X Link](https://x.com/YouJiacheng/status/1819665696693137593)  2024-08-03T09:24Z [---] followers, [---] engagements


"@HrishbhDalal And if they are using tools the result shouldn't be slightly off (i.e. should be exact)"  
[X Link](https://x.com/YouJiacheng/status/1819696114825863214)  2024-08-03T11:25Z [---] followers, [--] engagements


"@eatsmokedmeat @abacaj TPU. Avoid NVIDIA tax"  
[X Link](https://x.com/YouJiacheng/status/1820025107563348433)  2024-08-04T09:12Z [---] followers, [--] engagements


"China hands U.S. first ever loss in men's 4x100m medley relay https://www.espn.com/olympics/story/_/id/40724995/china-hands-us-first-ever-loss-men-4x100m-medley-relay https://www.espn.com/olympics/story/_/id/40724995/china-hands-us-first-ever-loss-men-4x100m-medley-relay"  
[X Link](https://x.com/YouJiacheng/status/1820288460592398816)  2024-08-05T02:39Z [---] followers, [---] engagements


"@angelusm0rt1s China is losing labor age population at a rate of 10M per year"  
[X Link](https://x.com/YouJiacheng/status/1820442208522359012)  2024-08-05T12:50Z [---] followers, [---] engagements


"@angelusm0rt1s China uses 224B land for 800B kg vegetables per year. Plant factory cost 7kWh/kg. (future target=4kW/kg) PVs can produce 100kWh/ per year (PV modules themselves can be 250kWh/). More importantly PVs can be deployed in "unusable" land"  
[X Link](https://x.com/YouJiacheng/status/1820459438282854772)  2024-08-05T13:58Z [---] followers, [--] engagements


"lmao @zhengyiluo Just add a limbo pole in post processing "emergent behaviors" https://t.co/J8PywI5B07 @zhengyiluo Just add a limbo pole in post processing "emergent behaviors" https://t.co/J8PywI5B07"  
[X Link](https://x.com/YouJiacheng/status/1820698275026178431)  2024-08-06T05:47Z [---] followers, [---] engagements


"Especially when FTC prevent acquisition😨 @mvpatel2000 same with shazeer returning to google. faang is the inevitable sink for all talent flows. @mvpatel2000 same with shazeer returning to google. faang is the inevitable sink for all talent flows"  
[X Link](https://x.com/YouJiacheng/status/1820721943588524137)  2024-08-06T07:21Z [---] followers, [---] engagements


"@teortaxesTex The only sensible movement for the U.S. is to put trillions of dollars into SpaceX and militarize Starship (incl. SuperHeavy). Starship can bring U.S. permanent nuclear & information advantage and no one can revolt"  
[X Link](https://x.com/YouJiacheng/status/1820759742178091071)  2024-08-06T09:52Z [---] followers, [----] engagements


"@coder543 @LouisKnightWebb so your result implies that mini uses larger concurrency"  
[X Link](https://x.com/YouJiacheng/status/1820821415622578240)  2024-08-06T13:57Z [---] followers, [--] engagements


"@teortaxesTex Raptor engine is the key. It's very hard to design and manufacture Raptor [--] & Raptor [--]. Powered by Raptor [--] an expendable Starship can be more cost effective than a 20x reused Falcon 9"  
[X Link](https://x.com/YouJiacheng/status/1821081652371898659)  2024-08-07T07:11Z [---] followers, [---] engagements


"@teortaxesTex [--]. ASML shipped the first prototype EUV to TSMC in Feb [----]. But SpaceX flew the first Starship (SN8) in Dec [----]. [----] years vs. [---] years from now. [--]. SpaceX's number of employees is about 1/3 of ASML"  
[X Link](https://x.com/YouJiacheng/status/1821110209995288857)  2024-08-07T09:04Z [---] followers, [---] engagements


"@terafoenix @teortaxesTex many suppliers e.g. cymer are acquired by ASML. Ofc there are important suppliers like ZEISS and TRUMPF but they are big companies with many groups and only several groups are involved in EUV"  
[X Link](https://x.com/YouJiacheng/status/1821164850879590608)  2024-08-07T12:41Z [---] followers, [--] engagements


"@teortaxesTex @terafoenix You overrate the power of cooperation. Do you know "the mythical man-month" A multi-company ecosystem might need [--] or more time to run a R&D cycle comparing to a monolithic company"  
[X Link](https://x.com/YouJiacheng/status/1821176229338419235)  2024-08-07T13:27Z [---] followers, [---] engagements


"@angelusm0rt1s @teortaxesTex @terafoenix ASML build wafer stage reticle stage EUV light source and many parts of optics (especially illumination system) in-house. It's possible that a company at 1.5-2 scale of ASML can build EUVL in-house only relying on standard parts from suppliers"  
[X Link](https://x.com/YouJiacheng/status/1821190824161694123)  2024-08-07T14:25Z [---] followers, [--] engagements


"@angelusm0rt1s China's RE market has entered garbage time -25% CAGR"  
[X Link](https://x.com/YouJiacheng/status/1821257748140662934)  2024-08-07T18:50Z [---] followers, [--] engagements


"@BRussellsimp Google's moat is TPU. [--] cheaper than NVIDIA GPU"  
[X Link](https://x.com/YouJiacheng/status/1821628122317648171)  2024-08-08T19:22Z [----] followers, [--] engagements


"@teortaxesTex Rumors: 9.9"  
[X Link](https://x.com/YouJiacheng/status/1821629705789092255)  2024-08-08T19:29Z [----] followers, [---] engagements


"Similar phenomena are observed in robotics imitation learning. lower the noise lower the loss. Just look at the incredible difference between generated data and real data. (when both are known to be of high quality) This might be the repetitiveness of the generated data or the higher noise of real data. ( or. something else. ) https://t.co/1cU9p74bq2 Just look at the incredible difference between generated data and real data. (when both are known to be of high quality) This might be the repetitiveness of the generated data or the higher noise of real data. ( or. something else. )"  
[X Link](https://x.com/YouJiacheng/status/1821674434987749773)  2024-08-08T22:26Z [----] followers, [---] engagements


"@bdsqlsz and [--] cards per node WTF"  
[X Link](https://x.com/YouJiacheng/status/1821820594205421748)  2024-08-09T08:07Z [----] followers, [----] engagements


"Any government or big company with super AI is likely to build an authoritarian system. Think about what Google and Amazon do in "Project Nimbus". If government/big company can monitor thoughts "hate crime" might include "hate thought". Gmail creator Paul Buchheit says if China wins the race to build super AI we could end up as zoo animals in permanent lockdown where escape is impossible and even our own thoughts are censored https://t.co/bIRcQAcHrj Gmail creator Paul Buchheit says if China wins the race to build super AI we could end up as zoo animals in permanent lockdown where escape is"  
[X Link](https://x.com/YouJiacheng/status/1822197950816133376)  2024-08-10T09:07Z [----] followers, [---] engagements


"@angelusm0rt1s @teortaxesTex SOEs actually didn't need explicit subsidies. Bank loans and equity funds are quite available for well-operated SOEs"  
[X Link](https://x.com/YouJiacheng/status/1822651088190992436)  2024-08-11T15:07Z [----] followers, [--] engagements


"@vasud3vshyam What's generating function here I only know moment-generating function: M_v(t)=Eexp(t*v)=P_a * exp(t*v_a) do you mean "generating functional of one-point correlation function" (in QFT)"  
[X Link](https://x.com/YouJiacheng/status/1823113984679281043)  2024-08-12T21:47Z [----] followers, [--] engagements
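
For reference, the moment-generating function named in the post above can be written out in full. This is the standard definition for a discrete random variable $v$ that takes the value $v_a$ with probability $P_a$; the index $a$ and the probabilities $P_a$ follow the post's notation, and the derivative identity is added here as a reminder of why it is called "moment-generating".

```latex
M_v(t) \;=\; \mathbb{E}\!\left[e^{t v}\right] \;=\; \sum_a P_a \, e^{t v_a},
\qquad
\left.\frac{d^n M_v}{dt^n}\right|_{t=0} \;=\; \sum_a P_a \, v_a^{\,n} \;=\; \mathbb{E}\!\left[v^n\right].
```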


"egirl example: Tencent even has b2b saas egirls platform https://t.co/rXOcSBrxwb Tencent even has b2b saas egirls platform https://t.co/rXOcSBrxwb"  
[X Link](https://x.com/YouJiacheng/status/1823118158124945444)  2024-08-12T22:03Z [----] followers, [---] engagements


"Ask them determine the number of vertices first. Funny how this still doesn't work Last time I checked Gemini could solve it occasionally https://t.co/ZOIfdWDjBE Funny how this still doesn't work Last time I checked Gemini could solve it occasionally https://t.co/ZOIfdWDjBE"  
[X Link](https://x.com/YouJiacheng/status/1823328155169636754)  2024-08-13T11:58Z [----] followers, [----] engagements


"@angelusm0rt1s All models I tested said Ilya"  
[X Link](https://x.com/YouJiacheng/status/1823349139369075103)  2024-08-13T13:21Z [----] followers, [--] engagements


"@GraceLiu78 Your work looks intriguing but I don't know what is contrastive RL. Is it an unsupervised RL method Can you elaborate it in easy to understand words"  
[X Link](https://x.com/YouJiacheng/status/1823670400380993928)  2024-08-14T10:38Z [----] followers, [---] engagements


"Apple Robotics. https://www.bnnbloomberg.ca/business/technology/2024/08/14/apple-pushes-ahead-with-tabletop-home-device-in-shift-to-robotics/ https://www.bnnbloomberg.ca/business/technology/2024/08/14/apple-pushes-ahead-with-tabletop-home-device-in-shift-to-robotics/"  
[X Link](https://x.com/YouJiacheng/status/1823812174965555665)  2024-08-14T20:01Z [----] followers, [---] engagements


"The only small (34B) open-weight model that can solve this problem: InternLM2.5-20B-Chat AND it solve it with [--] methods (induction & telescoping sum). Ofc its solutions are sometimes imperfect or even wrong (e.g. in figure [--] the explanation of telescoping sum is flawed)"  
[X Link](https://x.com/YouJiacheng/status/1823821105922236690)  2024-08-14T20:36Z [----] followers, [----] engagements


"Another problem. The good-old GPT-4 (not Turbo or omni) pass only 1/5"  
[X Link](https://x.com/YouJiacheng/status/1823822420157079853)  2024-08-14T20:42Z [----] followers, [---] engagements


"Based on my simple test @LeptonAI is about 5x speed of official API hosted by @deepseek_ai and 1.5-2x speed of @SiliconFlowAI . It is about 100-135 token/s. I can't do more precise test because there is network latency and I can't top up. Great work @jiayq Seems like one Western provider has put up DeepSeek at a reasonable price at last. Seems like one Western provider has put up DeepSeek at a reasonable price at last"  
[X Link](https://x.com/YouJiacheng/status/1824027337895366715)  2024-08-15T10:16Z [----] followers, [----] engagements


"$489 lightweight (75g) AR glasses from @RokidGlobal looks good. ($489 includes the glasses and a mobile compute device "Rokid Station 2") source: (the below clip begins at 1:05 2x speed) https://www.bilibili.com/video/BV1GZvQeUEci https://www.bilibili.com/video/BV1GZvQeUEci"  
[X Link](https://x.com/YouJiacheng/status/1824059575244570753)  2024-08-15T12:24Z [----] followers, [---] engagements


"@hyhieu226 @xai 😂why all @xai guys I saw on X are Asian (except Elon)"  
[X Link](https://x.com/YouJiacheng/status/1824316230871945409)  2024-08-16T05:24Z [----] followers, [---] engagements


"@gazorp5 @hyhieu226 @xai @ibab okay I didn't see him/her🤣"  
[X Link](https://x.com/YouJiacheng/status/1824317345105580525)  2024-08-16T05:28Z [----] followers, [---] engagements


"Happy Independence from Britain Day is the best Happy Independence Day to all Indians https://t.co/2rMgVsnTCH Happy Independence Day to all Indians https://t.co/2rMgVsnTCH"  
[X Link](https://x.com/YouJiacheng/status/1824320996289155123)  2024-08-16T05:43Z [----] followers, [---] engagements


"@hyhieu226 @ibab @gazorp5 @xai is this private info https://github.com/ibab https://github.com/ibab"  
[X Link](https://x.com/YouJiacheng/status/1824329134455263373)  2024-08-16T06:15Z [----] followers, [--] engagements


"@entropicEm They just have problem with scientific notation calculation. For the intuition part a huge dyson sphere with 20K temperature in our solar system will be quite detectable"  
[X Link](https://x.com/YouJiacheng/status/1824935625960866171)  2024-08-17T22:25Z [----] followers, [--] engagements


"@teortaxesTex It's really hard even for GPT-4o and Grok-2. A lot of SFT data should be added for calculation in scientific notation. I noticed that DeepSeek might indefinitely repeat in the calculation which hints insufficient training"  
[X Link](https://x.com/YouJiacheng/status/1824954401464926505)  2024-08-17T23:40Z [----] followers, [---] engagements


"@PandaAshwinee RMSProp + apply in backward cost 2+2 bytes per param if you maintain the denominator term in 16bit (maybe 32bit is required). CPU offloaded optimizer cost [--] bytes (if gradient accumulated on CPU) or 2+2 bytes (grad acc on GPU). The latter is equivalent but with grad acc enabled"  
[X Link](https://x.com/YouJiacheng/status/1825593272297017764)  2024-08-19T17:58Z [----] followers, [--] engagements
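
The "2+2 bytes per param" figure in the post above is simple accounting, sketched here under stated assumptions: 16-bit weights, a 16-bit RMSProp denominator, and no stored gradient because the update is applied during the backward pass. The function names and the 1B-parameter model size are hypothetical, chosen only for illustration.

```python
def optimizer_bytes_per_param(weight_bits=16, second_moment_bits=16, grad_bits=0):
    """Persistent bytes per parameter.

    grad_bits=0 models "apply in backward": each gradient is consumed as soon
    as it is produced, so no full-model gradient buffer is kept.
    """
    return (weight_bits + second_moment_bits + grad_bits) // 8


def total_gib(num_params, bytes_per_param):
    """Total persistent memory in GiB for a model of num_params parameters."""
    return num_params * bytes_per_param / 2**30


fused_16bit = optimizer_bytes_per_param()                       # 2 + 2 bytes
fused_32bit = optimizer_bytes_per_param(second_moment_bits=32)  # 2 + 4 bytes

num_params = 1_000_000_000  # hypothetical model size
print(fused_16bit, round(total_gib(num_params, fused_16bit), 2))
print(fused_32bit, round(total_gib(num_params, fused_32bit), 2))
```

Activations and temporary buffers are left out on purpose; the sketch only covers the per-parameter state the post is counting.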


"@PandaAshwinee [--] GPU don't need FSDP (or you just leverage its offload impl) okay maybe the impl doesn't support "offload in backward". but with large enough grad acc steps offload+grad acc should not affect the speed significantly because only update will incur GPU-GPU traffic"  
[X Link](https://x.com/YouJiacheng/status/1825601089204592838)  2024-08-19T18:29Z [----] followers, [--] engagements


"Meanwhile HUAWEI uses GenAI to help both amateurs and professional artists enjoy the art. In the video below the draft is generated by AI given a photo. It can also generate reference result of the draft which is useful for entry-level amateurs. https://www.bilibili.com/video/BV1gS42197hf Procreate admits to serve sentimental rather than productive needs and I find it remarkable. Current imagegen can absolutely make a difference in a professional workflow. But artists signal paying for art therapy good feels not for tools giving them a competitive edge."  
[X Link](https://x.com/YouJiacheng/status/1825643753656422750)  2024-08-19T21:19Z [----] followers, [----] engagements


"At the first glance I found this system has [--] main disadvantages compared to UMI: [--]. High BOM cost Robotiq gripper is not cheap [--]. High weight [--]. Cable required limited mobility [--]. Limited FOV of Realsense I spent some time at TU Munich to build a hand-held data collection system for learning of manipulation tasks. https://t.co/dMzThB1FCw I spent some time at TU Munich to build a hand-held data collection system for learning of manipulation tasks. https://t.co/dMzThB1FCw"  
[X Link](https://x.com/YouJiacheng/status/1825966991917854957)  2024-08-20T18:43Z [----] followers, [---] engagements


"exciting breakthrough. but there is only one author on the paper😂 "our" maybe you can add your cat/dog or something😂. 🚀 Can #MPC rival and even surpass #ReinforcementLearning in solving #DexterousManipulation Our answer is a resounding YES PROUD to share: 🔥Complementarity-Free Multi-Contact Modeling and Optimization our latest method that sets shattering benchmarks in various challenging https://t.co/dUuAjNceuV 🚀 Can #MPC rival and even surpass #ReinforcementLearning in solving #DexterousManipulation Our answer is a resounding YES PROUD to share: 🔥Complementarity-Free Multi-Contact"  
[X Link](https://x.com/YouJiacheng/status/1826012161338126560)  2024-08-20T21:43Z [----] followers, [---] engagements


"@Stone_Tao exactly the source of my comment😂"  
[X Link](https://x.com/YouJiacheng/status/1826103923855569315)  2024-08-21T03:47Z [----] followers, [--] engagements


"Totally true. Cells are sold at $0.04/W. 100GW=$4B 500GW=$20B. It is literally nothing comparing to Amazon or Apple. Actually installation labor cost is more than cells. Partly why China dominates stuff like rare Earths or solar cells is cuz the global revenue for those is tiny. All rare earths mined globally are only $6B/yr. Theres 500GW/yr of solar production capacity only $7-50B in cell revenue. Amazon has $50B in revenue per *month*. Partly why China dominates stuff like rare Earths or solar cells is cuz the global revenue for those is tiny. All rare earths mined globally are only $6B/yr."  
[X Link](https://x.com/YouJiacheng/status/1826131837376409625)  2024-08-21T05:38Z [----] followers, [---] engagements
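
The arithmetic in the post above is easy to reproduce. A minimal sketch using only the figures quoted there ($0.04 per watt for cells, 100 GW and 500 GW per year of capacity, $50B per month for Amazon); the function name is illustrative.

```python
def cell_revenue_usd(capacity_gw, price_usd_per_watt):
    """Annual cell revenue in USD for a given capacity in GW per year."""
    watts = capacity_gw * 1e9
    return watts * price_usd_per_watt


PRICE = 0.04  # USD per watt, as quoted in the post

rev_100gw = cell_revenue_usd(100, PRICE)
rev_500gw = cell_revenue_usd(500, PRICE)
amazon_annual = 50e9 * 12  # $50B per month, as quoted in the post

print(f"100 GW: ${rev_100gw / 1e9:.0f}B")
print(f"500 GW: ${rev_500gw / 1e9:.0f}B")
print(f"500 GW of cells vs Amazon annual revenue: {rev_500gw / amazon_annual:.1%}")
```

At the quoted price, a full year of 500 GW cell sales comes to about one thirtieth of the quoted Amazon revenue.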


"@angelusm0rt1s In China the installation labor cost is not that high. I estimated the direct labor for module installation costs only $0.02/W. All building & installation cost is about $0.08/W"  
[X Link](https://x.com/YouJiacheng/status/1826138970654388737)  2024-08-21T06:07Z [----] followers, [--] engagements


"@angelusm0rt1s But in developed countries like U.S. module installation labor might cost more than $0.1/W"  
[X Link](https://x.com/YouJiacheng/status/1826139524251975864)  2024-08-21T06:09Z [----] followers, [--] engagements


"@angelusm0rt1s Really I heard they got [--] cents per watt subsidy. Chinese modules are sold at only [--] cents per watt"  
[X Link](https://x.com/YouJiacheng/status/1826144285126955261)  2024-08-21T06:28Z [----] followers, [--] engagements


"@teortaxesTex Harmonic drive is harder to control than small reduce ratio planetary + strong motor. That's the main risk. If they can't make it work they will lose a huge market"  
[X Link](https://x.com/YouJiacheng/status/1826159892857573745)  2024-08-21T07:30Z [----] followers, [--] engagements


"I suggest implement tax on unrealized gains by call options. JUST IN: Kamala Harris backs President Biden's 44.6% capital gains tax proposal the highest in history. The proposal also includes a 25% tax on unrealized gains for high-net-worth individuals. https://t.co/ERw22lgRqq JUST IN: Kamala Harris backs President Biden's 44.6% capital gains tax proposal the highest in history. The proposal also includes a 25% tax on unrealized gains for high-net-worth individuals. https://t.co/ERw22lgRqq"  
[X Link](https://x.com/YouJiacheng/status/1826173147487551795)  2024-08-21T08:23Z [----] followers, [---] engagements


"Why not just implement a capital/property tax by zero price call options or convertible bonds JUST IN: Kamala Harris backs President Biden's 44.6% capital gains tax proposal the highest in history. The proposal also includes a 25% tax on unrealized gains for high-net-worth individuals. https://t.co/ERw22lgRqq JUST IN: Kamala Harris backs President Biden's 44.6% capital gains tax proposal the highest in history. The proposal also includes a 25% tax on unrealized gains for high-net-worth individuals. https://t.co/ERw22lgRqq"  
[X Link](https://x.com/YouJiacheng/status/1826173579395760356)  2024-08-21T08:24Z [----] followers, [---] engagements


"tbh this must be in the training data cuz it's a very common text book exercise. And GPT-4o mini solve it with ease (correct [--] in 3). Surprisingly Claude-3.5-Sonnet can't solve it. Ah oh Anthropic need to do more text book exercise. Qwen2-Math-7B is definitely the best model for math. It solved a 6-step calculus problem involving a combination of trigonometric functions and required some technique to solve. Really happy with the explanation and the step-by-step solution. Qwen shipped real 🍓 @Alibaba_Qwen https://t.co/gq8PtxVFYu Qwen2-Math-7B is definitely the best model for math. It solved a"  
[X Link](https://x.com/YouJiacheng/status/1826188028672454714)  2024-08-21T09:22Z [----] followers, [---] engagements


"@teortaxesTex It's hard for HUAWEI to catch up NVIDIA at chip tray or even rack level. But it might be easier to win at cluster or even campus level"  
[X Link](https://x.com/YouJiacheng/status/1826551770887442843)  2024-08-22T09:27Z [----] followers, [---] engagements


"@_lukaemon @teortaxesTex I knew NVIDIA acquired Mellanox. But HUAWEI is one of the world's largest and most advanced network solution vendor. HUAWEI is one of the world's most advanced power and cooling solution vendor. HUAWEI is a large (but not very successful) cloud provider. Vertical Integration"  
[X Link](https://x.com/YouJiacheng/status/1826557747158282298)  2024-08-22T09:51Z [----] followers, [---] engagements


"Looks good. FWIW: Large=398B-A94B Mini=52B-A12B License is strange "now or in the *future* generate more than $50M in annual revenue regardless whether that revenue is generated from Jamba Materials or Derivatives " how can we predict the future Chatbot Arena update: the latest Jamba [---] Large/Mini from @ai21labs is now live on the leaderboard Open weights. Novel SSM-Transformer architecture with long context window. Congrats @ai21labs on the strong open model release https://t.co/AAiL2xX9DI Chatbot Arena update: the latest Jamba [---] Large/Mini from @ai21labs is now live on the leaderboard"  
[X Link](https://x.com/YouJiacheng/status/1826750353955848417)  2024-08-22T22:36Z [----] followers, [---] engagements


"@kywch500 Simulation is run on CPU What CPU is used I know Pufferlib has remarkably low overhead but running a sweep with dozens of experiments in 500s is crazy"  
[X Link](https://x.com/YouJiacheng/status/1826915059773366445)  2024-08-23T09:31Z [----] followers, [----] engagements


"Grok-2-Mini is the killer of 4o-mini Chatbot Arena update❤🔥 Exciting news@xAI's Grok-2 and Grok-mini are now officially on the leaderboard With over [----] community votes Grok-2 has claimed the #2 spot surpassing GPT-4o (May) and tying with the latest Gemini Grok-2-mini also impresses at #5. Grok-2 excels in https://t.co/5lyQgratJQ Chatbot Arena update❤🔥 Exciting news@xAI's Grok-2 and Grok-mini are now officially on the leaderboard With over [----] community votes Grok-2 has claimed the #2 spot surpassing GPT-4o (May) and tying with the latest Gemini Grok-2-mini also impresses at #5. Grok-2"  
[X Link](https://x.com/YouJiacheng/status/1827069473993732490)  2024-08-23T19:44Z [----] followers, [---] engagements


"Wait Grok-2 requires multi-host inference is it that big It's so fast that I can't believe it's too big to fit in a single node Grok [--] mini is now 2x faster than it was yesterday. In the last three days @lm_zheng and @MalekiSaeed rewrote our inference stack from scratch using SGLang (https://t.co/M1M8BlXosH). This has also allowed us to serve the big Grok [--] model which requires multi-host inference at a https://t.co/G9iXTV8o0z Grok [--] mini is now 2x faster than it was yesterday. In the last three days @lm_zheng and @MalekiSaeed rewrote our inference stack from scratch using SGLang"  
[X Link](https://x.com/YouJiacheng/status/1827070268088709386)  2024-08-23T19:47Z [----] followers, [---] engagements


"@hsu_byron @karpathy @tri_dao @woosuk_k @SonglinYang4 @danielhanchen mgmalek is @mikegmalek https://x.com/mikegmalek/status/1786503367193461032 @karpathy For fused cross-entropy could reduce peak memory by 1) blocking the logit computation to avoid materialising the full logits tensor and 2) overwriting the logits with their grad during training. Can save a lot of memory especially when vocab size dim https://t.co/ZucNOj3XT3 https://x.com/mikegmalek/status/1786503367193461032 @karpathy For fused cross-entropy could reduce peak memory by 1) blocking the logit computation to avoid materialising"  
[X Link](https://x.com/YouJiacheng/status/1827116443835756776)  2024-08-23T22:51Z [----] followers, [---] engagements


"Actually if China gov is willing to increase the military expense the economy can be improved at least for a while. The central government debt is quite low compared to US. China appears to be in serious economic straits now. Yet it has not changed course at all and continues its massive military buildup having constructed a [-----] ton ship in [--] months. It's also worth noting China fought us to a standstill in [----] while impoverished. China appears to be in serious economic straits now. Yet it has not changed course at all and continues its massive military buildup having constructed a 40000"  
[X Link](https://x.com/YouJiacheng/status/1827251476181438566)  2024-08-24T07:47Z [----] followers, [---] engagements


"@angelusm0rt1s I know. But the fact is that these NPLs (moved to asset management companies in 2000-2005) were moved to (in 2009-2012) and are still on the account of gov (incl. PBOC) and state owned banks in the form of gov debt. The growth in the past [--] years absorbs these debt"  
[X Link](https://x.com/YouJiacheng/status/1827257093243728146)  2024-08-24T08:10Z [----] followers, [--] engagements


"@angelusm0rt1s RE stuffs are actually tiny the real crisis is the growth. I don't think we are spending $800B in military (incl. military R&D) per year. For example the EM launcher on the CNS FuJian costs only $30M in R&D stage"  
[X Link](https://x.com/YouJiacheng/status/1827261079896125489)  2024-08-24T08:26Z [----] followers, [--] engagements


"@angelusm0rt1s RE contraction will come to an end within a few years. In 2024H1 RE (newly constructed) sales -25% YoY the reduction is about 2.5% of GDP"  
[X Link](https://x.com/YouJiacheng/status/1827264753510842450)  2024-08-24T08:40Z [----] followers, [--] engagements


"@angelusm0rt1s China's capital gain tax is too low. Nearly zero. Plus the ubiquitous and immense indirect tax create an environment that the less you earn the higher effective tax rate you experience (regressive tax)"  
[X Link](https://x.com/YouJiacheng/status/1827265675825643947)  2024-08-24T08:44Z [----] followers, [--] engagements


"Here comes @xai and Elon. Anthropic is waiting for GPT-5 🔄 OpenAI is waiting for Opus [---] Its so over. It's a never-ending loop. We're trapped in an endless cycle of waiting. We are cooked Anthropic is waiting for GPT-5 🔄 OpenAI is waiting for Opus [---] Its so over. It's a never-ending loop. We're trapped in an endless cycle of waiting. We are cooked"  
[X Link](https://x.com/YouJiacheng/status/1827452616386998707)  2024-08-24T21:07Z [----] followers, [---] engagements


"Russia would be the opponent of China if there were no US. During 1840-1949 Russia snatched about [-------] square kilometers of land from China (Qing and ROC) which is more than [--] of the land area of Ukraine. @douglasritz @robin_j_brooks the russians would simply invade Ukraine and arrange a massacre like in Buch in every Ukrainian city. do you have a problem with your head @douglasritz @robin_j_brooks the russians would simply invade Ukraine and arrange a massacre like in Buch in every Ukrainian city. do you have a problem with your head"  
[X Link](https://x.com/YouJiacheng/status/1827464199389507645)  2024-08-24T21:53Z [----] followers, [---] engagements


"And 3-year is too long. NVIDIA rollout new hardware each year with [--] capability but only [---] price"  
[X Link](https://x.com/YouJiacheng/status/1827485648800149777)  2024-08-24T23:18Z [----] followers, [---] engagements


"@cccntu IIUC it does materialize logits but in a chunked manner"  
[X Link](https://x.com/YouJiacheng/status/1827696826948161611)  2024-08-25T13:17Z [----] followers, [---] engagements


"One important advantage of xAI: you will probably share your ChatGPT account with friends but not X/twitter account"  
[X Link](https://x.com/YouJiacheng/status/1827703008962248982)  2024-08-25T13:42Z [----] followers, [---] engagements


"@jskf__ Would xAI provide such an option They might provide API but not a separate subscription"  
[X Link](https://x.com/YouJiacheng/status/1827737949649432945)  2024-08-25T16:00Z [----] followers, [--] engagements


"Making mothers be with high status is important and might be more important than economic subsidies. Elevating the Status of Motherhood Solves Low Birthrates: The Extraordinary Case of Mongolia For [--] years Mongolian leaders have given the Order of Maternal Glory to mothers. This raised the status of motherhood and helped forge a remarkably pronatal culture. 🧵 please share https://t.co/4o8hZo84lM Elevating the Status of Motherhood Solves Low Birthrates: The Extraordinary Case of Mongolia For [--] years Mongolian leaders have given the Order of Maternal Glory to mothers. This raised the status"  
[X Link](https://x.com/YouJiacheng/status/1827744420122477043)  2024-08-25T16:26Z [----] followers, [---] engagements


"It's only me that just knew that OpenAI allow special customers like Cursor access the prompt tokens logprobs of their frontier models https://www.cursor.com/blog/instant-apply https://www.cursor.com/blog/instant-apply"  
[X Link](https://x.com/YouJiacheng/status/1827755737772367930)  2024-08-25T17:11Z [----] followers, 19.2K engagements


"@teortaxesTex @MA1984251984 @angelusm0rt1s @EIFY @alicemazzy Actually the advantage is not that correlated with reusable rocket. Even expendable SuperHeavy+Starship are superpower"  
[X Link](https://x.com/YouJiacheng/status/1828108307959357906)  2024-08-26T16:32Z [----] followers, [--] engagements


"@angelusm0rt1s @teortaxesTex @MA1984251984 @EIFY @alicemazzy UBTECH is pure hype"  
[X Link](https://x.com/YouJiacheng/status/1828118566815834327)  2024-08-26T17:13Z [----] followers, [--] engagements


"@angelusm0rt1s @teortaxesTex @MA1984251984 @EIFY @alicemazzy Oh no xAI will be nuked if so"  
[X Link](https://x.com/YouJiacheng/status/1828125766925631688)  2024-08-26T17:42Z [----] followers, [--] engagements


"@teortaxesTex That's why I suggest preparing 10-100 million tons of rocket before try"  
[X Link](https://x.com/YouJiacheng/status/1828136065498321098)  2024-08-26T18:22Z [----] followers, [---] engagements


"@angelusm0rt1s @teortaxesTex BTW "current production capacity" is a useless metric for China. Upstream material production capacity makes more sense"  
[X Link](https://x.com/YouJiacheng/status/1828145720857370813)  2024-08-26T19:01Z [----] followers, [--] engagements


"@angelusm0rt1s @teortaxesTex I actually mean CPC (aka CCP). CPC only had its army after they are massacred crazy by KMT. In [----] Mao said: "China has but one option today: to seek harmony for in harmony lies strength. Any other course of action would be a mistake.""  
[X Link](https://x.com/YouJiacheng/status/1828148251654033683)  2024-08-26T19:11Z [----] followers, [--] engagements


"Good results. But I afraid it won't be a thing for GPU poor. We need TP&PP to distribute param. In contrast Microsoft and Google can use multiple DCs to train one model more efficiently with this. Ofc they have their in-house low-comm DP algo. What if you could use all the computing power in the world to train a shared open source AI model Preliminary report: https://t.co/b1XgJylsnV Nous Research is proud to release a preliminary report on DisTrO (Distributed Training Over-the-Internet) a family of https://t.co/h2gQJ4m7lB What if you could use all the computing power in the world to train a"  
[X Link](https://x.com/YouJiacheng/status/1828154118029901858)  2024-08-26T19:34Z [----] followers, [----] engagements


"@chankhavu You have a node but still a GPU poor Okay assume you have a [-----] node it has only 192GB memory. You can at most train a 150B model with FP8/INT8 + CPU offloaded optimizer states. Remember the model size is bottlenecked by the node with the least memory in the network"  
[X Link](https://x.com/YouJiacheng/status/1828192902885130361)  2024-08-26T22:08Z [----] followers, [---] engagements


"@Dmitry31593946 @rohanpaul_ai INT8 is available and widely adopted in A100 era. But hardware is an important factor. In Jul [----] H100 was $2.5/h. In Nov [----] A100 was $2/h (not sure). But compute & bandwidth are about 2-3x"  
[X Link](https://x.com/YouJiacheng/status/1828422184391541167)  2024-08-27T13:19Z [----] followers, [--] engagements

Top assets mentioned Microsoft Corp. (MSFT) Robot Consulting Co., Ltd. (LAWR) StarShip (STARSHIP) Alphabet Inc Class A (GOOGL) Frontier (FRONT) NVIDIA Corp. (NVDA) DeepSeek (DEEPSEEK) Tesla, Inc. (TSLA) StarLink (STARL) GrokCoin (GROKCOIN) Grin (GRIN)

Top Social Posts

Top posts by engagements in the last [--] hours

"@JesseFarebro Loss and parameterization are separate problems. Using logits (of category distribution) to parameterize a scalar is a widely adopted method. IMO there is another regression baseline:"
X Link 2024-03-07T18:17Z [----] followers, [---] engagements

"@JesseFarebro BTW I think there are at least two possible MSE+Softmax: [--]. (sum(softmax(logits) * bin_value) - y) ** [--] [--]. sum(softmax(logits) * (bin_value - y) ** 2) Clearly [--]. is not a parameterization but seems to be a maximum mean discrepancy between softmax(logits) and one-hot"
X Link 2024-03-07T19:22Z [----] followers, [--] engagements
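
The two MSE+softmax variants in the post above can be compared numerically. The sketch below is a minimal illustration, not the author's code: the function names `mse_of_mean` and `mean_of_mse` are hypothetical, and the masked exponent in the first variant is assumed to be 2. Under that reading, the second variant equals the first plus the variance of the bin distribution, which is why it keeps penalizing any spread in the softmax even when the predicted mean is exact.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mse_of_mean(logits, bin_values, y):
    """Variant 1: squared error of the expected bin value, (E_p[b] - y)^2."""
    p = softmax(logits)
    mean = sum(pi * b for pi, b in zip(p, bin_values))
    return (mean - y) ** 2


def mean_of_mse(logits, bin_values, y):
    """Variant 2: expected squared error under the softmax, E_p[(b - y)^2]."""
    p = softmax(logits)
    return sum(pi * (b - y) ** 2 for pi, b in zip(p, bin_values))


# Two equally likely bins at 0 and 2, target exactly at their mean.
logits = [0.0, 0.0]
bin_values = [0.0, 2.0]
y = 1.0
loss_1 = mse_of_mean(logits, bin_values, y)   # mean is exact, so no penalty
loss_2 = mean_of_mse(logits, bin_values, y)   # still pays the variance of the bins
```

Variant 1 is minimized by any distribution whose mean hits the target, while variant 2 is minimized only by concentrating mass on the bin nearest the target.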

"@qinzytech @OpenAI @Meta JetMoE doesn't use stick-breaking attention as ModuleFormer right"
X Link 2024-04-04T23:36Z [--] followers, [---] engagements

"@aaron_defazio It seems z_t+1=z_t- g_t will cause z_t+1 explode when t At least z does not converge like SGD without lr decay. Express x_T in terms of g_t we can get: It seems that x preserves too much stale information"
X Link 2024-04-06T14:31Z [--] followers, [---] engagements

"Impressed by the performance vs. size: 104B model [----] 3235B model [----]. Disappointed by the price: R=$1.5/$0.5 RPlus=$15/$3(Azure) Qwen1.5-32B=$0.8(Together AI) per 1M token (out/in). Given Qwen1.5-72B=$0.9 I would expect 3235B=$0.50.6 100B=$23. Exciting news - the latest Arena result are out @cohere's Command R+ has climbed to the 6th spot matching GPT-4-0314 level by 13K+ human votes It's undoubtedly the best open model on the leaderboard now🔥 Big congrats to @cohere's incredible work & valuable contribution https://t.co/5PzpPolC9F Exciting news - the latest Arena result are out"
X Link 2024-04-10T03:54Z [--] followers, [--] engagements

"@Stone_Tao Using UMI or something similar we can easily collect [--] demos in [--] worker*hour cost 5$ (in China or Africa/India/Vietnam). So 100M$=1B demos. Each demo should be viewed as 1k tokens (semantic [--] demo=1 conversation) or 30k (compute cost) that is 1T30T tokens"
X Link 2024-04-15T04:22Z [--] followers, [--] engagements

"Interesting can we mitigate this issue by einsum('.i.j-.ij' x @ weight_A x @ weight_B) or the method proposed by [----------] or other nonlinearities (6/10) This problem is in fact very much related with the softmax bottleneck issue (https://t.co/gOQZbM6t9e) Basically we try to map "low" dimensional contextual representations to potentially high-dimensional contextual probability manifolds using a simple linear layer: https://t.co/HlAslGUfVM (6/10) This problem is in fact very much related with the softmax bottleneck issue (https://t.co/gOQZbM6t9e) Basically we try to map "low" dimensional"
X Link 2024-04-15T17:23Z [----] followers, [--] engagements

"@aNoobonaJourney @teortaxesTex @dylan522p NVIDIA report its gross profit margin is 75.97%. You can guess the margin of H100"
X Link 2024-04-22T12:21Z [--] followers, [---] engagements

"@aNoobonaJourney @Geronimo_AI Where can we get [--] usd / hour H100"
X Link 2024-04-22T17:17Z [--] followers, [--] engagements

"@iamhitarth @bitlgeuse @GroqInc @OpenAI @AnthropicAI @Meta Since +21600 will 8x the capacity GroqCloud has only [----] LPUs. The cost can be estimated by: design cost (hundreds of millions $) + 14nm tape-out cost (millions $) + [----] $500. The marginal cost is very low. https://twitter.com/JonathanRoss321/status/1782921857928401091 @YouJiacheng Just the [-----]. https://twitter.com/JonathanRoss321/status/1782921857928401091 @YouJiacheng Just the 21600"
X Link 2024-04-24T01:32Z [--] followers, [--] engagements

"I guess @GroqInc has spent $100M$200M for chip design. But the marginal cost is only $100$500 per chip. 70B model needs [------] chips to run costs $40000$250000 roughly equivalent to [-------] hours of p5.48xlarge (8 H100) on AWS. https://wow.groq.com/groq-closes-300-million-fundraise/ https://wow.groq.com/groq-closes-300-million-fundraise/"
X Link 2024-04-24T02:50Z [--] followers, [--] engagements

"@preminstrel Let me clarify what I am talking about: JF68M drafts _1=2 tokens then 7B with partial cache needs to compute the logits of these [--] tokens in a single forward pass. So there are [--] q*k scores for each chunk. How to rank all chunks"
X Link 2024-04-25T09:15Z [----] followers, [--] engagements

"@preminstrel IIUC when the model with full cache computes the logits (for verification) we can get the qk score with nearly [--] cost (Ofc there is engineering cost). Keeping avg k cache and computing qavg_k might cost more than avg(qk) (before softmax) since qk is free"
X Link 2024-04-25T09:21Z [----] followers, [--] engagements

"@preminstrel 🤔And if you maintain full KV cache for layer0&1 you can reuse the feature from draft model and skip layer0&1 when computing logits for verification😂 sounds like a 6% extra speedup"
X Link 2024-04-25T09:33Z [----] followers, [--] engagements

"@LChoshen @AIatMeta @FabianGloeckle @byoubii @b_roziere @dfpazr @syhw Medusa lol. Also ACT (Action Chunking Transformer) used in robotics"
X Link 2024-05-02T16:08Z [--] followers, [--] engagements

"@giffmana BTW SigLIP only compared with EVA-CLIP but EVA-02-CLIP (about [--] year later) still cannot surpass SigLIP. What a STRONG result"
X Link 2024-05-05T09:55Z [--] followers, [--] engagements

"Inplace-grad has been implemented in TriDao's fused softmaxCE (lm_head computation is not chunked) Chunk can be easily achieved by checkpoint/remat (not [--] cost). BUT Fused fwd bwd is MAGIC: it achieves [--] or even NEGATIVE cost inplace-grad + chunk. It only load logits ONCE @karpathy For fused cross-entropy could reduce peak memory by 1) blocking the logit computation to avoid materialising the full logits tensor and 2) overwriting the logits with their grad during training. Can save a lot of memory especially when vocab size dim https://t.co/ZucNOj3XT3 @karpathy For fused cross-entropy could"
X Link 2024-05-05T10:23Z [----] followers, [--] engagements
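
The "blocking the logit computation" half of the idea above can be sketched in plain Python. This is an illustrative toy under stated assumptions, not TriDao's kernel or any library's API: `chunked_ce_loss` and its helpers are hypothetical names, matrices are lists of lists, and the in-place gradient overwrite and the fused forward/backward pass are deliberately left out. It only shows that the loss can be computed while holding `chunk_size x vocab` logits at a time instead of the full `tokens x vocab` tensor.

```python
import math


def project(hidden_row, lm_head):
    """Logits for one token: lm_head is a vocab x dim list of lists."""
    return [sum(h * w for h, w in zip(hidden_row, w_row)) for w_row in lm_head]


def cross_entropy(logits, target):
    """Stable -log softmax(logits)[target] via log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[target]


def chunked_ce_loss(hidden, lm_head, targets, chunk_size):
    """Mean cross-entropy, materializing logits for one chunk of tokens at a time."""
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        rows = hidden[start:start + chunk_size]
        # Only chunk_size x vocab logits are alive here; they are dropped
        # when the next iteration rebinds chunk_logits.
        chunk_logits = [project(row, lm_head) for row in rows]
        for logits, target in zip(chunk_logits, targets[start:start + chunk_size]):
            total += cross_entropy(logits, target)
    return total / len(hidden)


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # 3 tokens, dim 2
lm_head = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # vocab 3, dim 2
targets = [0, 1, 2]
loss_full = chunked_ce_loss(hidden, lm_head, targets, chunk_size=3)
loss_chunked = chunked_ce_loss(hidden, lm_head, targets, chunk_size=1)
```

Both calls return the same loss; only the peak size of the live logits differs.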

"Teleoperation is insufficient in an ideal world. However in the real world US people can hire kenya people to teleoperate a robot in their house to do house work at a very low price and without the risk of being burglarized. agree on the last part: Teleoperation is necessary but insufficient my opinion is still that we need sim + generative AI to scale data to a sufficient scale for usedul generalization. ps check out our just released GPU acc robotics simulator https://t.co/3rCRKP1TbN agree on the last part: Teleoperation is necessary but insufficient my opinion is still that we need sim +"
X Link 2024-05-06T07:48Z [--] followers, [---] engagements

"The problem may come from the weight gradient accumulation what precision do you use Fused cross-entropy loss with Llama-3 is very promising in terms of VRAM savings but the loss is ever so slightly off. Perhaps @danielhanchen can help track down where it's off Comparison below b/w unsloth (cel optimization only) standard pytorch and fused cel. https://t.co/KuxDN7FLBI Fused cross-entropy loss with Llama-3 is very promising in terms of VRAM savings but the loss is ever so slightly off. Perhaps @danielhanchen can help track down where it's off Comparison below b/w unsloth (cel optimization"
X Link 2024-05-08T12:44Z [--] followers, [--] engagements

"@teortaxesTex @main_horse @_xjdr CPU is all you need. Sapphire Rappids have 7.5TOPS/core INT8 matrix compute i.e. 240TOPS for a 32-core CPU. This is much higher than its memory bandwidth so speculative decoding is viable"
X Link 2024-05-08T23:44Z [--] followers, [--] engagements

"@burkov Cloud Service to compete with AWS and Azure"
X Link 2024-05-11T09:59Z [--] followers, [---] engagements

"DeepSeek-V2 decoding with 4K context requires more MACs for Attention (only SPDA part projections are excluded) than for Linear (includes projections in Self-Attention Layer). 128(512+64+512)4K=544M [--] layers = 31.875G MACs 21B activated parameters = 21G MACs"
X Link 2024-05-11T18:44Z [----] followers, [---] engagements
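
The arithmetic in the post above can be checked directly. This is a sketch under assumptions: the layer count is masked in the post, and 60 is used here because it reproduces the stated 31.875G total; reading the per-head term as 512+64 dimensions for the query-key product plus 512 for the value product is also an interpretation, not something the post spells out. "M" and "G" are taken as 2^20 and 2^30, which is what makes the figures come out exact.

```python
# Attention-core MACs per decoded token for DeepSeek-V2 at 4K context,
# following the formula in the post. Variable names are illustrative.
n_heads = 128
qk_latent_dim = 512      # compressed KV latent used in the score product
qk_rope_dim = 64         # decoupled RoPE part of the score product
v_latent_dim = 512       # latent used in the attention-weighted sum
context_len = 4 * 1024   # "4K context"
n_layers = 60            # ASSUMPTION: masked in the post; 60 matches the stated total

attention_macs_per_layer = n_heads * (qk_latent_dim + qk_rope_dim + v_latent_dim) * context_len
attention_macs_total = attention_macs_per_layer * n_layers

# One MAC per activated parameter for the linear layers (21B activated).
linear_macs = 21 * 10**9

per_layer_in_mi = attention_macs_per_layer / 2**20   # 544.0
total_in_gi = attention_macs_total / 2**30           # 31.875
attention_dominates = attention_macs_total > linear_macs
```

Because the attention term grows linearly with context length while the linear term is fixed, the gap widens further beyond 4K.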

"@adcock_brett They didn't build the "scaling law" with variable controlled. Given fixed delivery deadline the faster the development the better the quality. Low quality actually comer from near delivery deadline"
X Link 2024-05-11T19:02Z [--] followers, [--] engagements

"@Mankaran32 @chichengcc Good. Do you test the SLAM accuracy GoPro can achieve 3mm & 1"
X Link 2024-05-12T11:21Z [--] followers, [--] engagements

"That's why I said a humanoid robot will be much cheaper than a car soon. I said it will $5000 when Elon Musk said $20000. Now it is $14000 (100000 RMB = $13822.66). Unitree Introducing Unitree G1 Humanoid Agent AI Avatar Price from $16K 🤩 Unlock unlimited sports potential(Extra large joint movement angle [----] joints) Force control of dexterous hands manipulation of all things Imitation & reinforcement learning driven #Unitree #AI https://t.co/Dv1yGaGpoJ Unitree Introducing Unitree G1 Humanoid Agent AI Avatar Price from $16K 🤩 Unlock unlimited sports potential(Extra large joint movement"
X Link 2024-05-13T10:16Z [--] followers, [---] engagements

"GPT-4o is blazingly FAST. Web 70token/s API 100token/s. 35x speed of GPT-4 Turbo (47x for Chinese). 1/3 of @GroqInc (300 token/s Llama [--] 70B)"
X Link 2024-05-13T22:49Z [----] followers, [---] engagements

"@bfspector wait does TK only support Hopper GPU's wgmma"
X Link 2024-05-14T00:29Z [--] followers, [--] engagements

"@terryclothBuyer @teortaxesTex Do we live in a different world There isn't a decrease in FX reserve. The real problem is deflation: M2 increased 10% but CPI decrease"
X Link 2024-05-14T21:18Z [--] followers, [---] engagements

"@giffmana GPT-4o can recognize a 1px defect in 1024x1024 image that is INSANE. Can you guess how they achieve this"
X Link 2024-05-15T04:05Z [--] followers, [---] engagements

"So why not just allow PRC to build advanced chips in Mainland China(allow them buy advanced devices from ASML AMAT Lam Research etc.) Then the world will feel chip overcapacity so AI will accelerate. The real reason is "national security". @teortaxesTex Someone's got to get China to cool off on Taiwan. Disrupting the chip supply is too big of a threat to ignore. @teortaxesTex Someone's got to get China to cool off on Taiwan. Disrupting the chip supply is too big of a threat to ignore"
X Link 2024-05-16T13:48Z [---] followers, [--] engagements

"@teortaxesTex Nope. Unitree build electric robot before BD build electric robot. BD's decades of research is mainly on hydraulic robot. The relationship is like SpaceX vs. Boeing Tesla v.s Ford"
X Link 2024-05-16T14:17Z [---] followers, [---] engagements

"@teortaxesTex I actually want to say that Unitree doesn't "catch up" someone they are pioneer in industry"
X Link 2024-05-16T14:32Z [---] followers, [--] engagements

"@teortaxesTex For the market yes the tech is not important but the price is. Tech-wise Unitree is a pioneer in electric robot. Market-wise Unitree is a pioneer in affordable robot (even excluding the manufacturing factors). BD didn't make new tech/design to reduce the cost"
X Link 2024-05-16T15:06Z [---] followers, [--] engagements

"@yaroslavvb @CFGeek @deliprao I didn't see your tweet about Yi or NVIDIA paper"
X Link 2024-05-17T00:35Z [---] followers, [--] engagements

"@b_c_p_source @0xpangolin @airkatakana Did you notice that all these objects are well separated Of course todays computer vision can get similar accuracy in much complex environments. I have to admit if the grasping can be solved by the sucker then things become a pure CV task can be done [--] week"
X Link 2024-05-19T23:28Z [---] followers, [--] engagements

"@adcock_brett "one of the biggest weeks for AI and Robotics of the entire year" -- so far. Haha"
X Link 2024-05-21T15:40Z [---] followers, [--] engagements

"@teortaxesTex I think the reunification has nothing to do with "the trend of the world". But there should be an end of the Chinese civil war (there even isn't a ceasefire). For Chinese it is sth like "UK intervene the American civil war so the Confederate States still exists in 1930s""
X Link 2024-05-21T22:48Z [---] followers, [--] engagements

"On April [--] [----] Dongling Technologies Co.Ltd. released ES-1000 the world first and largest Single-Shaker 100ton trust Electrodynamic Vibration Test System. They have developed the 1000kN shaker in [----]. http://donglingtest.com/profile/history/162336/0/ http://donglingtest.com/profile/history/162336/0/"
X Link 2024-05-23T03:19Z [---] followers, [--] engagements

"BTW U.S. still ironically forbid the exportation of 9ton thrust shaker to China. Lmao"
X Link 2024-05-23T03:22Z [---] followers, [--] engagements

"@JasonHanDC @teortaxesTex Llama [--] 70B costs 6e24FLOP Chinese AI leaders have enough NVIDIA chips to do this. BTW I think that Chinese AI leaders have already achieved L3 70B performance by data/arch innovation. See Yi-Large by @01AI_Yi "
X Link 2024-05-25T17:39Z [----] followers, [---] engagements
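
The 6e24 FLOP figure in the post above is consistent with the common "6 x parameters x tokens" rule of thumb for dense-transformer training compute. A rough sketch, assuming about 15T training tokens (an assumption here; the post does not state the token count) and a hypothetical helper name:

```python
def training_flops(n_params, n_tokens):
    """Rule-of-thumb training compute: ~6 FLOPs per parameter per token
    (forward plus backward pass), ignoring attention and other overheads."""
    return 6 * n_params * n_tokens


n_params = 70e9    # 70B parameters, from the post
n_tokens = 15e12   # ASSUMPTION: ~15T training tokens, not stated in the post

estimate = training_flops(n_params, n_tokens)   # about 6.3e24 FLOP
```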

"@main_horse @teortaxesTex Agree the real challenge is the interconnect. BUT I don't think anyone currently can train a 1e26 model using [--] months. BTW China cannot make HBM. I think we can use slower but bigger DRAM then less/none TP/PP more DP to save communication"
X Link 2024-05-25T18:13Z [---] followers, [--] engagements

"@teortaxesTex @main_horse 14nm is "worst case analysis". Actually HUAWEI can make 7nm or even 5nm ML chips. Problem is still the interconnect. Even HUAWEI is an advanced network device&solution provider (e.g. it hasn't provided a LLM oriented solution. https://e.huawei.com/hk/products/optical-transmission/dc908 https://e.huawei.com/hk/products/optical-transmission/dc908"
X Link 2024-05-25T19:03Z [---] followers, [--] engagements

"@teortaxesTex @main_horse If interconnect is the main bottleneck even 20% yield is quite okay for 5nm. Rumors said HUAWEI had achieved 50% yield for 7nm"
X Link 2024-05-25T19:11Z [---] followers, [--] engagements

"@SamVanivray @MattPirkowski @bitcloud https://twitter.com/UnitreeRobotics/status/1720386810197471257 Unitree Released B2 Beyond the Limit Hyperevolution😍Maximum speed of 6m/s sustained load of 40kg and sustained walking endurance of 5h. The comprehensive performance is two to three times that of existing quadruped robots worldwide https://t.co/lcAIe0lyrb https://t.co/vNDLSm9qA2 https://twitter.com/UnitreeRobotics/status/1720386810197471257 Unitree Released B2 Beyond the Limit Hyperevolution😍Maximum speed of 6m/s sustained load of 40kg and sustained walking endurance of 5h. The comprehensive"
X Link 2024-05-26T05:31Z [---] followers, [---] engagements

"I have noticed the length bias in RLHF objective since March: For both PPO and DPO families reward is O(1) but KL is O(L). That is partially because we all set discount factor =1 in RLHF"
X Link 2024-05-26T06:30Z [---] followers, [---] engagements

"So LN is not an approximation of ref they can co-exist. Experiments are needed to determine whether LN+ref is better. Also comparison to ref+margin is useful. IPO introduced margin but it uses MSE instead of logsigmoid"
X Link 2024-05-26T06:31Z [---] followers, [---] engagements

"@giffmana Windows Terminal sir"
X Link 2024-05-27T02:52Z [---] followers, [--] engagements

"WTF everything happened in [--] minutes"
X Link 2024-05-30T06:58Z [---] followers, [--] engagements

"@jon_barron I simply type mu (with IME enabled) then the Microsoft IME will give me "
X Link 2024-05-31T22:59Z [---] followers, [---] engagements

"@xhluca @ericjang11 @clonerobotics Why can't hold a needle/scalpel Data collection stuff is a separate issue I only talked about capabilities here"
X Link 2024-06-02T23:04Z [----] followers, [--] engagements

"@TimothyDuignan @schnabloS @Dr_Gingerballs My understanding: coarse grained classical MD: nucleation is not the ground state coarse grained NNP MD: this work coarse grained DFT-MD: takes more time for each step full atom classical MD: more time each step & slower rate can't observe nucleation in a reasonable time"
X Link 2024-06-07T01:57Z [---] followers, [--] engagements

"@teortaxesTex I formalized the problem still no LLM can solve it robustly"
X Link 2024-06-09T16:10Z [---] followers, [----] engagements

"@teortaxesTex @elonmusk Very good point buy Lenovo (& its Motorola)"
X Link 2024-06-11T01:10Z [---] followers, [---] engagements

"@adcock_brett 1e-9 flight hours before a catastrophic event should be 1e9 flight hours IIUC"
X Link 2024-06-17T09:03Z [---] followers, [---] engagements

"@adcock_brett It seems that autos cause [--] death per 1e8 miles. For 100mph eVTOL it is equivalent to 1e6 hours MTBF"
X Link 2024-06-17T09:12Z [---] followers, [---] engagements

"@main_horse Yeah they will do some batch size ramp up earlier but the main compute stage won't start early"
X Link 2024-06-18T01:01Z [---] followers, [--] engagements

"I guess OpenAI has already achieved this internally just like LCM. Maybe there will be one sentence in their future tech report: "we apply diffusion loss instead of standard CE loss for image tokens in the multimodality next token prediction / autoregressive modeling" Autoregressive Image Generation without Vector Quantization Achieves competitive performance without vector quantization by using diffusion loss function https://t.co/LUNnJFHZNf https://t.co/dtVzvneRY8 Autoregressive Image Generation without Vector Quantization Achieves competitive performance without vector quantization by"
X Link 2024-06-18T04:03Z [---] followers, [---] engagements

"lol I just noticed that Continuous Next Token Prediction without Vector Quantization has been achieved in Octo. https://arxiv.org/abs/2406.11838 https://arxiv.org/abs/2406.11838"
X Link 2024-06-18T08:58Z [---] followers, [---] engagements

"@MoreBirths @CharlieTTEcon long-sighted doesn't make sense given the rapid advancement in AI and robotics"
X Link 2024-06-20T15:26Z [---] followers, [--] engagements

"WTF is "inappropriately touching""
X Link 2024-06-21T12:47Z [---] followers, [--] engagements

"lmao 🐳Coder-V2-Instruct gained only only [--] point on MMLU but [---] points on MMLU-Pro over 🐳V2-Chat. So it got quite a bit smarter. Curiously the website version is also less into Core Values Of Socialism. Coincidence I don't think so https://t.co/DEhyoRodwJ 🐳Coder-V2-Instruct gained only only [--] point on MMLU but [---] points on MMLU-Pro over 🐳V2-Chat. So it got quite a bit smarter. Curiously the website version is also less into Core Values Of Socialism. Coincidence I don't think so https://t.co/DEhyoRodwJ"
X Link 2024-06-23T06:27Z [---] followers, [---] engagements

"Both Jensen Huang and Elon Musk are making technology inexpensive. Great entrepreneurs"
X Link 2024-06-23T15:38Z [---] followers, [---] engagements

"@ChongZitaZhang Aren't cats and dogs more agile than human"
X Link 2024-06-23T18:17Z [---] followers, [--] engagements

"@kadecgos @xlr8harder EUV light source from US (cymer purchased by ASML). Optics from Germany"
X Link 2024-06-25T08:18Z [---] followers, [---] engagements

"@teortaxesTex AFAIK DeepSeek is not supported by the gov they only have very limited compute resources😭. I think HUAWEI should invest DeepSeek by providing compute resources (HUAWEI is also a cloud service provider) just like what Microsoft do for OpenAI"
X Link 2024-06-25T14:30Z [---] followers, [---] engagements

"@RadarHits starship can destory the drone factory with ease lol😂"
X Link 2024-06-25T14:34Z [---] followers, [--] engagements

"Zeng Yuqun's Response: "To strive for a hundred days is to call on everyone to master the fundamentals without forcing anyone to do so." () Lmao"
X Link 2024-06-27T04:44Z [---] followers, [--] engagements

"Anyone knows logit soft-capping in Gemma-2"
X Link 2024-06-27T16:36Z [----] followers, [---] engagements

"Maybe the key (beside strong Pre-training) is a very large reward model in RLHF: "We use a similar RLHF algorithm as Gemma v1.1 (Gemma Team 2024) but a different reward model which is an order of magnitude larger than the policy." 🤯🤯🤯27B only slightly larger than the active parameter of DeepSeek-V2 achieve Llama [--] 70B Elo. 🤯🤯🤯27B only slightly larger than the active parameter of DeepSeek-V2 achieve Llama [--] 70B Elo"
X Link 2024-06-27T17:11Z [---] followers, [---] engagements

"@danielhanchen I am curious about why Gemma [--] & [--] use normed * (1 + w.float()) and init w=0 instead of normed * w.float() and init w=1. Cuz it is slightly more accurate"
X Link 2024-06-27T17:58Z [----] followers, [----] engagements
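
One possible reading of "slightly more accurate" can be checked with a small experiment. The sketch below is an illustration under assumptions, not Gemma's code: it assumes the norm weight is stored in bfloat16 and upcast to float before use (as `w.float()` suggests), emulates bfloat16 rounding with the standard library, and uses a hypothetical learned scale of 1.001. Near 1.0 the bfloat16 grid spacing is 2^-7, so a small deviation from identity is rounded away, whereas the same deviation stored near 0 keeps its leading bits.

```python
import struct


def to_bf16(x):
    """Round a float to the nearest bfloat16 value (returned as a Python float)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)   # round-to-nearest-even on the dropped 16 bits
    bits &= 0xFFFF0000                    # keep sign, exponent, top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


target_scale = 1.001   # HYPOTHETICAL learned scale, close to the identity

# normed * w.float(): the stored weight lives near 1, where bf16 is coarse.
scale_plain = to_bf16(target_scale)

# normed * (1 + w.float()): the stored weight lives near 0, where bf16 is fine;
# the addition of 1 happens after upcasting.
scale_offset = 1.0 + to_bf16(target_scale - 1.0)

error_plain = abs(scale_plain - target_scale)
error_offset = abs(scale_offset - target_scale)
```

Both parameterizations give a scale of exactly 1 at their respective initializations (w=1 and w=0), so the difference only shows up once training moves the weight slightly away from identity.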

"😅gemma-2-27b-i failed on my basic math test both prompting in English and in Chinese. DeepSeek-V2 and DeepSeek-Coder-V2 can solve it 100% in Chinese but 0% in Enligsh. L3-70B & GPT-4T can solve it 100%. Old GPT-4 can solve it 60% in English 80% in Chinese"
X Link 2024-06-28T05:45Z [---] followers, [---] engagements

"China domestic EUV is it possible in [----] How can they achieve enough wafer per hour (light power) & overlay accuracy Resolution is not a problem Even 0.2NA can achieve 20nm resolution much better than the best DUV. @DrFrederickChen The rumored transistor density of Kirin [----] is [---] MTx/mm2 If all things go well then china's domestic EUV machine will get shipped to SMIC this year and the huawei 3nm chip should be commercially available between [----] H2 and [----] H1 @DrFrederickChen The rumored transistor density of Kirin [----] is [---] MTx/mm2 If all things go well then china's domestic EUV"
X Link 2024-06-29T18:37Z [---] followers, [---] engagements

"@EffortDefines In China it seems that Apple use AI developed by Baidu"
X Link 2024-06-29T22:51Z [---] followers, [--] engagements

"My list for 2030s: [--]. Transformer [--]. Starship [--]. Starlink especially DTC [--]. I know all of these. But: The top right one is the only of these that will have a measurable societal impact by [----]. I know all of these. But: The top right one is the only of these that will have a measurable societal impact by 2030"
X Link 2024-06-29T23:22Z [---] followers, [---] engagements

"@OPEN_THE_PORTAL no police won't shoot them (sorry if offensive). and US gov won't provide job and education(free) opportunity for 95% the blacks but it is true for the Uygurs. because well educated ppl with job won't join terrorism organizations"
X Link 2024-07-02T01:43Z [---] followers, [--] engagements

"@CyrusSMing @michaelxpettis @YouTube Agree with other parts but India is more aggressive than China just look at Sikkim Bhutan Nepal and Bangladesh"
X Link 2024-07-03T00:34Z [---] followers, [--] engagements

"@angelusm0rt1s @teortaxesTex It is a department of HUAWEI Enterprise founded at least [--] years ago. It was responsible for sales and support of Ascend. But it seems that there is R&D of Ascend now"
X Link 2024-07-04T14:44Z [---] followers, [--] engagements

"@angelusm0rt1s @teortaxesTex An interesting phenomenon is that [----] Labs and HUAWEI Computing are competing for talents. 🧐🧐"
X Link 2024-07-04T15:01Z [---] followers, [--] engagements

"There is a huge gap between OpenAI/Anthropic Models and others. "What are the typical areas of 1T1C DRAM and 6T SRAM in terms of F2" Gemini [---] Pro is not even wrong"
X Link 2024-07-05T10:18Z [---] followers, [---] engagements

"China shoud impose higher export tariffs and subsidize domestic consumer to increase domestic Marshallian surplus and mitigate world worries about China overcapacity"
X Link 2024-07-06T01:07Z [---] followers, [---] engagements

"lmao poor sequoia. Honestly love that Sequoia has been the worst VC at AI. They've missed every hot AI startup even if it is a bubble they should be in there like the other bubbles they embrace and made money on. All it takes is listening to the leaked FTX call to see they're clowns. Honestly love that Sequoia has been the worst VC at AI. They've missed every hot AI startup even if it is a bubble they should be in there like the other bubbles they embrace and made money on. All it takes is listening to the leaked FTX call to see they're clowns"
X Link 2024-07-06T04:36Z [---] followers, [---] engagements

"@gazorp5 @dylan522p Yep Sequoia invested NVIDIA OpenAI Hugging Face Replicate"
X Link 2024-07-06T05:26Z [---] followers, [---] engagements

"@Robotbeat the dumbest thing is to militarily land or blockade Taiwan. even bomb TSMC is better than landing or blockade🤐. unless China has her own starship & starlink"
X Link 2024-07-08T22:58Z [---] followers, [--] engagements

"I immediately realized (guess) that TTT is delta rule when I saw the figure of TTT - even before I knew it uses L2 loss. Online gradient descent version of TTT-linear is a variant of DeltaNet and could be parallelized efficiently: https://t.co/yrINFRVfZ8 Online gradient descent version of TTT-linear is a variant of DeltaNet and could be parallelized efficiently: https://t.co/yrINFRVfZ8"
X Link 2024-07-09T23:16Z [---] followers, [---] engagements
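
The link between an L2 loss and the delta rule mentioned above can be made concrete: one online gradient-descent step on `0.5 * ||W k - v||^2` gives the update `W <- W - lr * (W k - v) k^T`, which is the classic delta rule. The sketch below is a generic pure-Python illustration under that assumption. It is not the TTT or DeltaNet implementation, and the names `matvec` and `ogd_step` are made up for this example.

```python
def matvec(W, k):
    """Multiply matrix W (list of rows) by vector k."""
    return [sum(w_ij * k_j for w_ij, k_j in zip(row, k)) for row in W]


def ogd_step(W, k, v, lr):
    """One online gradient-descent step on 0.5 * ||W k - v||^2.

    The gradient with respect to W is (W k - v) k^T, so the update
    W <- W - lr * (W k - v) k^T is exactly the delta rule.
    """
    err = [p - t for p, t in zip(matvec(W, k), v)]
    return [
        [w_ij - lr * e_i * k_j for w_ij, k_j in zip(row, k)]
        for row, e_i in zip(W, err)
    ]


W = [[0.0, 0.0], [0.0, 0.0]]  # initial "fast weights"
k = [1.0, 0.0]                # key (unit norm)
v = [2.0, -1.0]               # value to associate with the key
W = ogd_step(W, k, v, lr=1.0)
print(matvec(W, k))           # the key now retrieves the stored value
```

With a unit-norm key and `lr=1.0`, a single step overwrites whatever was previously stored under that key, which is the "write by correcting the error" behaviour the delta rule is known for.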

"good arts by Kolors (kuaishou)"
X Link 2024-07-12T11:34Z [----] followers, [---] engagements

"He also said OpenAI/Anthropic (the most advanced team outside China) has 2x model-architecture&training-dynamics efficiency and 2x data efficiency (so overall 4x compute efficiency) comparing to the most advanced Chinese team. That is a HUGE gap.🥵💪 Deepseek founder Liang Wenfeng: We will not go closed-source. We believe that having a strong technical ecosystem first is more important. https://t.co/d6qhzdF4G5 Deepseek founder Liang Wenfeng: We will not go closed-source. We believe that having a strong technical ecosystem first is more important. https://t.co/d6qhzdF4G5"
X Link 2024-07-17T19:50Z [---] followers, [----] engagements

"@BruDCDO @teortaxesTex AFAIK there is no subsidy. but it's true they have a much lower margin comparing to OpenAI. However technology (MLA + fine-grained MoE) is the key. but I guess OpenAI have similar if not more advanced tech"
X Link 2024-07-18T21:28Z [---] followers, [--] engagements

"@jonasgeiping so basically you fuse the matmul into the kernel on the top of malek's method That's great. I also noticed that you use some local lse so tiled softmax (like flash attention) is used and it is possible to only keep fp32 logits on SRAM"
X Link 2024-07-19T18:55Z [----] followers, [--] engagements
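
The "local lse" and "tiled softmax" idea in the post above can be sketched as a running log-sum-exp over vocabulary chunks, so the full logits vector never has to exist at once. The pure-Python function below is only a forward-pass illustration under that assumption. It is not the fused GPU kernel being discussed, and the name `chunked_cross_entropy` and its arguments are hypothetical.

```python
import math


def chunked_cross_entropy(hidden, weight_rows, target, chunk_size):
    """Cross-entropy for one token, computing logits one vocab chunk at a time.

    hidden:      hidden-state vector (list of floats)
    weight_rows: output-projection matrix, one row per vocabulary entry
    target:      index of the correct vocabulary entry
    """
    running_max = float("-inf")
    running_sum = 0.0  # sum of exp(logit - running_max) over chunks seen so far
    target_logit = None
    for start in range(0, len(weight_rows), chunk_size):
        chunk = weight_rows[start:start + chunk_size]
        # Only this chunk's logits are alive at any time.
        logits = [sum(h * w for h, w in zip(hidden, row)) for row in chunk]
        new_max = max(running_max, max(logits))
        # Rescale the old partial sum to the new max, then add this chunk.
        running_sum = running_sum * math.exp(running_max - new_max)
        running_sum += sum(math.exp(l - new_max) for l in logits)
        running_max = new_max
        if start <= target < start + len(chunk):
            target_logit = logits[target - start]
    log_sum_exp = running_max + math.log(running_sum)
    return log_sum_exp - target_logit


hidden = [0.5, -1.0]
weights = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [-2.0, 0.5], [0.1, 0.2]]
print(chunked_cross_entropy(hidden, weights, target=2, chunk_size=2))
```

The result should match an ordinary softmax cross-entropy up to floating-point error, regardless of the chunk size chosen.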

"@Noahpinion Poland is a great country. They achieve both high efficiency and high equity"
X Link 2024-07-20T16:59Z [---] followers, 10.1K engagements

"@_philschmid @Alibaba_Qwen @OpenAI @AnthropicAI There is a typo (do you use a LLM to generate this table) AMC [----] Qwen2 maj@64 should be 21/40 not 12/40"
X Link 2024-07-21T15:58Z [---] followers, [---] engagements

"@evil_malloc @_philschmid @Alibaba_Qwen @OpenAI @AnthropicAI From NuminaMath tech report"
X Link 2024-07-21T19:07Z [---] followers, [---] engagements

"@decentralizedX1 @Kanthan2030 NVIDIA founded in [----] SpaceX founded in [----]. At that time China was poor and under-educated. Plus U.S. gov forbid Chinese chip design companies from manufacturing advanced chip with TSMC"
X Link 2024-07-22T08:33Z [---] followers, [--] engagements

"@angelusm0rt1s what is SPAC"
X Link 2024-07-22T16:03Z [---] followers, [--] engagements

"CPU is well-suited for DeepSeek-V2. I wonder why Intel & AMD haven't taken any move. @qtnx_ Amazing how we went from 236B too much nobody has that kind of hardware to 405B fits right in you just gotta have courage literally overnight https://t.co/Q8Jh41BZCW @qtnx_ Amazing how we went from 236B too much nobody has that kind of hardware to 405B fits right in you just gotta have courage literally overnight https://t.co/Q8Jh41BZCW"
X Link 2024-07-22T16:49Z [---] followers, [---] engagements

"@angelusm0rt1s they provide the only 90% sparse MoE with frontier performance. CPU has no other choice (well both Intel & AMD have GPU). Intel even paid for advertisement of "Llama-2-7B 100TPS""
X Link 2024-07-22T17:19Z [---] followers, [--] engagements

"@RealJosephus @aidan_mclau 4o mini can output up to 200token/s. If it were MoE I would estimate it is a 40B-Active-5B or something similar"
X Link 2024-07-22T18:23Z [---] followers, [--] engagements

"@RealJosephus @aidan_mclau The fastest non-groq provider of Mistral 8x7B (Active 13B) run up to 250TPS @ $0.5 per 1M tokens. How can OpenAI serve a larger model with similar TPOT @ $0.15/$0.6 per 1M in/out tokens and they need more margin to cover training cost"
X Link 2024-07-22T18:59Z [---] followers, [--] engagements

"@RealJosephus @aidan_mclau I haven't trained LLMs on my own. But I think it is more unlikely that OpenAI have some inference magics than they have some training magics"
X Link 2024-07-22T19:01Z [---] followers, [--] engagements

"@terryyuezhuo Do OpenAI and Anthropic report MultiPL-E version Is there a leaderboard for MultiPL-E version"
X Link 2024-07-23T09:42Z [---] followers, [---] engagements

"@OpenAI protected disclosures. protected by what big brother sam"
X Link 2024-07-23T12:51Z [---] followers, [---] engagements

"@PauseusMaximus @mathepi Actually Llama make many China AI startups invest more resources on application instead of training. They said "we don't need to train a model just fine-tune llama" and "UX intelligence". However training technique IS the KEY to win "the AGI war". UX is NOT"
X Link 2024-07-24T00:02Z [---] followers, [---] engagements

"@soumithchintala How to spot silent data corruption on GPU"
X Link 2024-07-24T00:43Z [---] followers, [---] engagements

"MSFT has USD 19.634B cash. @AlibabaGroup has CNY 286.424B cash. (USD 39.9B) But @Alibaba_Qwen can't buy a single A100 H100 or B200 GPU into China.🫠 BTW Qwen2 72B GPT-4o and Claude [---] sonnet on AMC [----] and AIME 2024"
X Link 2024-07-24T06:27Z [---] followers, [---] engagements

"@gazorp5 @AlibabaGroup @Alibaba_Qwen AFAIK the only effective way is to adhere the ban i.e. build AI datacenter outside China. Possibly in Malaysia and Singapore"
X Link 2024-07-24T06:39Z [---] followers, [--] engagements

"@lmsysorg LMSYS releases 20% but OpenAI has 100% battles when their models are involved"
X Link 2024-07-24T15:13Z [---] followers, [---] engagements

"@Ji_Ha_Kim DeepSeek can't solve the logarithm version even with explicit hint. It made computation mistake consistently. (tested in Chinese)"
X Link 2024-07-24T21:29Z [---] followers, [---] engagements

"ta-da I bow down to Kuaishou. 🎉 The moment we've all been waiting for is HERE 🎊 Introducing the official global launch of Kling AI's International Version1.0🌍 📧ANY email address gets you inno mobile number required 👉 Direct linkhttps://t.co/68WvKSDuBg 🔥 Daily login grants [--] free Credits for https://t.co/TgFZIwInPg 🎉 The moment we've all been waiting for is HERE 🎊 Introducing the official global launch of Kling AI's International Version1.0🌍 📧ANY email address gets you inno mobile number required 👉 Direct linkhttps://t.co/68WvKSDuBg 🔥 Daily login grants [--] free Credits for"
X Link 2024-07-25T00:46Z [---] followers, [---] engagements

"Actually more depend on exports now. 2024H1 vs. 2023H1 new real estate sales -1.6T CNY but GDP +2.4T CNY (+4.0%). If real estate sales doesn't change and assume the correlation [---] will be GDP +4.0T (+6.7%) subtract -1.0% deflator = +7.7%. consumption only +3.7%. @robin_j_brooks Chinese economy no longer as dependent on exports. Try it again see what happens @robin_j_brooks Chinese economy no longer as dependent on exports. Try it again see what happens"
X Link 2024-07-25T01:55Z [---] followers, [---] engagements

"@jiayq Just checked 8B on OpenRouter. Wow [---] token/s @ $0.07 is impressive. Much better than previous [---] token/s on Llama [--] and Mistral 7B. Is that speed temporary (cuz low load)"
X Link 2024-07-25T03:23Z [---] followers, [---] engagements

"@basedjensen I think it is a typo should be "formalization". The formalization of training problems is done by finetuned gemini because accuracy/faithfulness/fidelity is not important here. The formalization of contest problems is done by human"
X Link 2024-07-26T05:16Z [---] followers, [---] engagements

"@teortaxesTex probably caused by the increased USD interest and the devaluation of Euro. plus the recession of balance sheet. China encountered similar issues so that Xi officially raise a question: why new unicorn companies in China become fewer"
X Link 2024-07-28T01:45Z [---] followers, [--] engagements

"kinda confirm they use base64 for data augmentation @teortaxesTex Lmao if you just provide it the B64 without any additional consideration it respond accurately in Base64. https://t.co/tASfaYRxPT @teortaxesTex Lmao if you just provide it the B64 without any additional consideration it respond accurately in Base64. https://t.co/tASfaYRxPT"
X Link 2024-07-28T01:50Z [---] followers, [---] engagements

"@RyanEls4 NO you should store base64-encoded protobuf and/or parquet🤓. don't tell me that we can store binary/bytes in RDBMS I JUST WANT TO STORE A HUMAN READABLE STRING. so why not just use @MongoDB 🤓"
X Link 2024-07-28T03:14Z [---] followers, [--] engagements

"Now GPT-4o (& mini) and Claude-3.5-sonnet can solve Wason selection task (mini and 3.5-sonnet are not reliable) but DeepSeek can't. Note: it is not about reasoning capability but about post-training data. Original GPT-4 can't solve it either"
X Link 2024-08-01T03:05Z [----] followers, [---] engagements

"BTW Wason selection task is one of the problems mentioned by "GPT-4 Can't Reason (arxiv:2308.03762)". So it is likely that OpenAI add specific data after that"
X Link 2024-08-01T05:47Z [---] followers, [--] engagements

"@teortaxesTex Exactly my point. If OpenAI increases the cost-effectiveness of GPT-4o mini at the cost of decreasing diversity"
X Link 2024-08-01T09:33Z [---] followers, [--] engagements

"I don't have enough Poe compute points to ask GPT-4o multiple times. It got 2/2"
X Link 2024-08-02T12:37Z [---] followers, [---] engagements

"Reasoning can help GPT-4o mini (chatgpt . com) to get 4/5 but 3.5-sonnet still failed (0/2). GPT-4o mini (poe) also failed (0/2) Failure cases:"
X Link 2024-08-02T13:02Z [---] followers, [--] engagements

"lmao @YouJiacheng That's like you put your lowest TOEFL of history on your graduate school application. @YouJiacheng That's like you put your lowest TOEFL of history on your graduate school application"
X Link 2024-08-03T09:24Z [---] followers, [---] engagements

"@HrishbhDalal And if they are using tools the result shouldn't be slightly off (i.e. should be exact)"
X Link 2024-08-03T11:25Z [---] followers, [--] engagements

"@eatsmokedmeat @abacaj TPU. Avoid NVIDIA tax"
X Link 2024-08-04T09:12Z [---] followers, [--] engagements

"China hands U.S. first ever loss in men's 4x100m medley relay https://www.espn.com/olympics/story/_/id/40724995/china-hands-us-first-ever-loss-men-4x100m-medley-relay https://www.espn.com/olympics/story/_/id/40724995/china-hands-us-first-ever-loss-men-4x100m-medley-relay"
X Link 2024-08-05T02:39Z [---] followers, [---] engagements

"@angelusm0rt1s China is losing labor age population at a rate of 10M per year"
X Link 2024-08-05T12:50Z [---] followers, [---] engagements

"@angelusm0rt1s China uses 224B land for 800B kg vegetables per year. Plant factory cost 7kWh/kg. (future target=4kW/kg) PVs can produce 100kWh/ per year (PV modules themselves can be 250kWh/). More importantly PVs can be deployed in "unusable" land"
X Link 2024-08-05T13:58Z [---] followers, [--] engagements

"lmao @zhengyiluo Just add a limbo pole in post processing "emergent behaviors" https://t.co/J8PywI5B07 @zhengyiluo Just add a limbo pole in post processing "emergent behaviors" https://t.co/J8PywI5B07"
X Link 2024-08-06T05:47Z [---] followers, [---] engagements

"Especially when FTC prevent acquisition😨 @mvpatel2000 same with shazeer returning to google. faang is the inevitable sink for all talent flows. @mvpatel2000 same with shazeer returning to google. faang is the inevitable sink for all talent flows"
X Link 2024-08-06T07:21Z [---] followers, [---] engagements

"@teortaxesTex The only sensible movement for the U.S. is to put trillions of dollars into SpaceX and militarize Starship (incl. SuperHeavy). Starship can bring U.S. permanent nuclear & information advantage and no one can revolt"
X Link 2024-08-06T09:52Z [---] followers, [----] engagements

"@coder543 @LouisKnightWebb so your result implies that mini uses larger concurrency"
X Link 2024-08-06T13:57Z [---] followers, [--] engagements

"@teortaxesTex Raptor engine is the key. It's very hard to design and manufacture Raptor [--] & Raptor [--]. Powered by Raptor [--] an expendable Starship can be more cost effective than a 20x reused Falcon 9"
X Link 2024-08-07T07:11Z [---] followers, [---] engagements

"@teortaxesTex [--]. ASML shipped the first prototype EUV to TSMC in Feb [----]. But SpaceX flew the first Starship (SN8) in Dec [----]. [----] years vs. [---] years from now. [--]. SpaceX's number of employees is about 1/3 of ASML"
X Link 2024-08-07T09:04Z [---] followers, [---] engagements

"@terafoenix @teortaxesTex many suppliers e.g. cymer are acquired by ASML. Ofc there are important suppliers like ZEISS and TRUMPF but they are big companies with many groups and only several groups are involved in EUV"
X Link 2024-08-07T12:41Z [---] followers, [--] engagements

"@teortaxesTex @terafoenix You overrate the power of cooperation. Do you know "the mythical man-month" A multi-company ecosystem might need [--] or more time to run a R&D cycle comparing to a monolithic company"
X Link 2024-08-07T13:27Z [---] followers, [---] engagements

"@angelusm0rt1s @teortaxesTex @terafoenix ASML build wafer stage reticle stage EUV light source and many parts of optics (especially illumination system) in-house. It's possible that a company at 1.5-2 scale of ASML can build EUVL in-house only relying on standard parts from suppliers"
X Link 2024-08-07T14:25Z [---] followers, [--] engagements

"@angelusm0rt1s China's RE market has entered garbage time -25% CAGR"
X Link 2024-08-07T18:50Z [---] followers, [--] engagements

"@BRussellsimp Google's moat is TPU. [--] cheaper than NVIDIA GPU"
X Link 2024-08-08T19:22Z [----] followers, [--] engagements

"@teortaxesTex Rumors: 9.9"
X Link 2024-08-08T19:29Z [----] followers, [---] engagements

"Similar phenomena are observed in robotics imitation learning. lower the noise lower the loss. Just look at the incredible difference between generated data and real data. (when both are known to be of high quality) This might be the repetitiveness of the generated data or the higher noise of real data. ( or. something else. ) https://t.co/1cU9p74bq2 Just look at the incredible difference between generated data and real data. (when both are known to be of high quality) This might be the repetitiveness of the generated data or the higher noise of real data. ( or. something else. )"
X Link 2024-08-08T22:26Z [----] followers, [---] engagements

"@bdsqlsz and [--] cards per node WTF"
X Link 2024-08-09T08:07Z [----] followers, [----] engagements

"Any government or big company with super AI is likely to build an authoritarian system. Think about what Google and Amazon do in "Project Nimbus". If government/big company can monitor thoughts "hate crime" might include "hate thought". Gmail creator Paul Buchheit says if China wins the race to build super AI we could end up as zoo animals in permanent lockdown where escape is impossible and even our own thoughts are censored https://t.co/bIRcQAcHrj Gmail creator Paul Buchheit says if China wins the race to build super AI we could end up as zoo animals in permanent lockdown where escape is"
X Link 2024-08-10T09:07Z [----] followers, [---] engagements

"@angelusm0rt1s @teortaxesTex SOEs actually didn't need explicit subsidies. Bank loans and equity funds are quite available for well-operated SOEs"
X Link 2024-08-11T15:07Z [----] followers, [--] engagements

"@vasud3vshyam What's generating function here I only know moment-generating function: M_v(t)=Eexp(tv)=P_a * exp(tv_a) do you mean "generating functional of one-point correlation function" (in QFT)"
X Link 2024-08-12T21:47Z [----] followers, [--] engagements

"egirl example: Tencent even has b2b saas egirls platform https://t.co/rXOcSBrxwb Tencent even has b2b saas egirls platform https://t.co/rXOcSBrxwb"
X Link 2024-08-12T22:03Z [----] followers, [---] engagements

"Ask them determine the number of vertices first. Funny how this still doesn't work Last time I checked Gemini could solve it occasionally https://t.co/ZOIfdWDjBE Funny how this still doesn't work Last time I checked Gemini could solve it occasionally https://t.co/ZOIfdWDjBE"
X Link 2024-08-13T11:58Z [----] followers, [----] engagements

"@angelusm0rt1s All models I tested said Ilya"
X Link 2024-08-13T13:21Z [----] followers, [--] engagements

"@GraceLiu78 Your work looks intriguing but I don't know what is contrastive RL. Is it an unsupervised RL method Can you elaborate it in easy to understand words"
X Link 2024-08-14T10:38Z [----] followers, [---] engagements

"Apple Robotics. https://www.bnnbloomberg.ca/business/technology/2024/08/14/apple-pushes-ahead-with-tabletop-home-device-in-shift-to-robotics/ https://www.bnnbloomberg.ca/business/technology/2024/08/14/apple-pushes-ahead-with-tabletop-home-device-in-shift-to-robotics/"
X Link 2024-08-14T20:01Z [----] followers, [---] engagements

"The only small (34B) open-weight model that can solve this problem: InternLM2.5-20B-Chat AND it solve it with [--] methods (induction & telescoping sum). Ofc its solutions are sometimes imperfect or even wrong (e.g. in figure [--] the explanation of telescoping sum is flawed)"
X Link 2024-08-14T20:36Z [----] followers, [----] engagements

"Another problem. The good-old GPT-4 (not Turbo or omni) pass only 1/5"
X Link 2024-08-14T20:42Z [----] followers, [---] engagements

"Based on my simple test @LeptonAI is about 5x speed of official API hosted by @deepseek_ai and 1.5-2x speed of @SiliconFlowAI . It is about 100-135 token/s. I can't do more precise test because there is network latency and I can't top up. Great work @jiayq Seems like one Western provider has put up DeepSeek at a reasonable price at last. Seems like one Western provider has put up DeepSeek at a reasonable price at last"
X Link 2024-08-15T10:16Z [----] followers, [----] engagements

"$489 lightweight (75g) AR glasses from @RokidGlobal looks good. ($489 includes the glasses and a mobile compute device "Rokid Station 2") source: (the below clip begins at 1:05 2x speed) https://www.bilibili.com/video/BV1GZvQeUEci https://www.bilibili.com/video/BV1GZvQeUEci"
X Link 2024-08-15T12:24Z [----] followers, [---] engagements

"@hyhieu226 @xai 😂why all @xai guys I saw on X are Asian (except Elon)"
X Link 2024-08-16T05:24Z [----] followers, [---] engagements

"@gazorp5 @hyhieu226 @xai @ibab okay I didn't see him/her🤣"
X Link 2024-08-16T05:28Z [----] followers, [---] engagements

"Happy Independence from Britain Day is the best Happy Independence Day to all Indians https://t.co/2rMgVsnTCH Happy Independence Day to all Indians https://t.co/2rMgVsnTCH"
X Link 2024-08-16T05:43Z [----] followers, [---] engagements

"@hyhieu226 @ibab @gazorp5 @xai is this private info https://github.com/ibab https://github.com/ibab"
X Link 2024-08-16T06:15Z [----] followers, [--] engagements

"@entropicEm They just have problem with scientific notation calculation. For the intuition part a huge dyson sphere with 20K temperature in our solar system will be quite detectable"
X Link 2024-08-17T22:25Z [----] followers, [--] engagements

"@teortaxesTex It's really hard even for GPT-4o and Grok-2. A lot of SFT data should be added for calculation in scientific notation. I noticed that DeepSeek might indefinitely repeat in the calculation which hints insufficient training"
X Link 2024-08-17T23:40Z [----] followers, [---] engagements

"@PandaAshwinee RMSProp + apply in backward cost 2+2 bytes per param if you maintain the denominator term in 16bit (maybe 32bit is required). CPU offloaded optimizer cost [--] bytes (if gradient accumulated on CPU) or 2+2 bytes (grad acc on GPU). The latter is equivalent but with grad acc enabled"
X Link 2024-08-19T17:58Z [----] followers, [--] engagements

"@PandaAshwinee [--] GPU don't need FSDP (or you just leverage its offload impl) okay maybe the impl doesn't support "offload in backward". but with large enough grad acc steps offload+grad acc should not affect the speed significantly because only update will incur GPU-GPU traffic"
X Link 2024-08-19T18:29Z [----] followers, [--] engagements

"Meanwhile HUAWEI uses GenAI to help both amateurs and professional artists enjoy the art. In the video below the draft is generated by AI given a photo. It can also generate reference result of the draft which is useful for entry-level amateurs. https://www.bilibili.com/video/BV1gS42197hf Procreate admits to serve sentimental rather than productive needs and I find it remarkable. Current imagegen can absolutely make a difference in a professional workflow. But artists signal paying for art therapy good feels not for tools giving them a competitive edge."
X Link 2024-08-19T21:19Z [----] followers, [----] engagements

"At the first glance I found this system has [--] main disadvantages compared to UMI: [--]. High BOM cost Robotiq gripper is not cheap [--]. High weight [--]. Cable required limited mobility [--]. Limited FOV of Realsense I spent some time at TU Munich to build a hand-held data collection system for learning of manipulation tasks. https://t.co/dMzThB1FCw I spent some time at TU Munich to build a hand-held data collection system for learning of manipulation tasks. https://t.co/dMzThB1FCw"
X Link 2024-08-20T18:43Z [----] followers, [---] engagements

"exciting breakthrough. but there is only one author on the paper😂 "our" maybe you can add your cat/dog or something😂. 🚀 Can #MPC rival and even surpass #ReinforcementLearning in solving #DexterousManipulation Our answer is a resounding YES PROUD to share: 🔥Complementarity-Free Multi-Contact Modeling and Optimization our latest method that sets shattering benchmarks in various challenging https://t.co/dUuAjNceuV 🚀 Can #MPC rival and even surpass #ReinforcementLearning in solving #DexterousManipulation Our answer is a resounding YES PROUD to share: 🔥Complementarity-Free Multi-Contact"
X Link 2024-08-20T21:43Z [----] followers, [---] engagements

"@Stone_Tao exactly the source of my comment😂"
X Link 2024-08-21T03:47Z [----] followers, [--] engagements

"Totally true. Cells are sold at $0.04/W. 100GW=$4B 500GW=$20B. It is literally nothing comparing to Amazon or Apple. Actually installation labor cost is more than cells. Partly why China dominates stuff like rare Earths or solar cells is cuz the global revenue for those is tiny. All rare earths mined globally are only $6B/yr. Theres 500GW/yr of solar production capacity only $7-50B in cell revenue. Amazon has $50B in revenue per month. Partly why China dominates stuff like rare Earths or solar cells is cuz the global revenue for those is tiny. All rare earths mined globally are only $6B/yr."
X Link 2024-08-21T05:38Z [----] followers, [---] engagements
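
The arithmetic in the post above is a flat price-times-capacity calculation. A minimal sketch, using only the $0.04/W figure quoted in the post (the helper name `cell_revenue_usd` is made up for illustration):

```python
PRICE_PER_WATT_USD = 0.04  # cell price quoted in the post above


def cell_revenue_usd(capacity_gw: float,
                     price_per_watt: float = PRICE_PER_WATT_USD) -> float:
    """Revenue from selling `capacity_gw` gigawatts of cells at a flat per-watt price."""
    return capacity_gw * 1e9 * price_per_watt


for gw in (100, 500):
    print(f"{gw} GW -> ${cell_revenue_usd(gw) / 1e9:.0f}B")
```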

"@angelusm0rt1s In China the installation labor cost is not that high. I estimated the direct labor for module installation costs only $0.02/W. All building & installation cost is about $0.08/W"
X Link 2024-08-21T06:07Z [----] followers, [--] engagements

"@angelusm0rt1s But in developed countries like U.S. module installation labor might cost more than $0.1/W"
X Link 2024-08-21T06:09Z [----] followers, [--] engagements

"@angelusm0rt1s Really I heard they got [--] cents per watt subsidy. Chinese modules are sold at only [--] cents per watt"
X Link 2024-08-21T06:28Z [----] followers, [--] engagements

"@teortaxesTex Harmonic drive is harder to control than small reduce ratio planetary + strong motor. That's the main risk. If they can't make it work they will lose a huge market"
X Link 2024-08-21T07:30Z [----] followers, [--] engagements

"I suggest implement tax on unrealized gains by call options. JUST IN: Kamala Harris backs President Biden's 44.6% capital gains tax proposal the highest in history. The proposal also includes a 25% tax on unrealized gains for high-net-worth individuals. https://t.co/ERw22lgRqq JUST IN: Kamala Harris backs President Biden's 44.6% capital gains tax proposal the highest in history. The proposal also includes a 25% tax on unrealized gains for high-net-worth individuals. https://t.co/ERw22lgRqq"
X Link 2024-08-21T08:23Z [----] followers, [---] engagements

"Why not just implement a capital/property tax by zero price call options or convertible bonds JUST IN: Kamala Harris backs President Biden's 44.6% capital gains tax proposal the highest in history. The proposal also includes a 25% tax on unrealized gains for high-net-worth individuals. https://t.co/ERw22lgRqq JUST IN: Kamala Harris backs President Biden's 44.6% capital gains tax proposal the highest in history. The proposal also includes a 25% tax on unrealized gains for high-net-worth individuals. https://t.co/ERw22lgRqq"
X Link 2024-08-21T08:24Z [----] followers, [---] engagements

"tbh this must be in the training data cuz it's a very common text book exercise. And GPT-4o mini solve it with ease (correct [--] in 3). Surprisingly Claude-3.5-Sonnet can't solve it. Ah oh Anthropic need to do more text book exercise. Qwen2-Math-7B is definitely the best model for math. It solved a 6-step calculus problem involving a combination of trigonometric functions and required some technique to solve. Really happy with the explanation and the step-by-step solution. Qwen shipped real 🍓 @Alibaba_Qwen https://t.co/gq8PtxVFYu Qwen2-Math-7B is definitely the best model for math. It solved a"
X Link 2024-08-21T09:22Z [----] followers, [---] engagements

"@teortaxesTex It's hard for HUAWEI to catch up NVIDIA at chip tray or even rack level. But it might be easier to win at cluster or even campus level"
X Link 2024-08-22T09:27Z [----] followers, [---] engagements

"@_lukaemon @teortaxesTex I knew NVIDIA acquired Mellanox. But HUAWEI is one of the world's largest and most advanced network solution vendor. HUAWEI is one of the world's most advanced power and cooling solution vendor. HUAWEI is a large (but not very successful) cloud provider. Vertical Integration"
X Link 2024-08-22T09:51Z [----] followers, [---] engagements

"Looks good. FWIW: Large=398B-A94B Mini=52B-A12B License is strange "now or in the future generate more than $50M in annual revenue regardless whether that revenue is generated from Jamba Materials or Derivatives " how can we predict the future Chatbot Arena update: the latest Jamba [---] Large/Mini from @ai21labs is now live on the leaderboard Open weights. Novel SSM-Transformer architecture with long context window. Congrats @ai21labs on the strong open model release https://t.co/AAiL2xX9DI Chatbot Arena update: the latest Jamba [---] Large/Mini from @ai21labs is now live on the leaderboard"
X Link 2024-08-22T22:36Z [----] followers, [---] engagements

"@kywch500 Simulation is run on CPU What CPU is used I know Pufferlib has remarkably low overhead but running a sweep with dozens of experiments in 500s is crazy"
X Link 2024-08-23T09:31Z [----] followers, [----] engagements

"Grok-2-Mini is the killer of 4o-mini Chatbot Arena update❤🔥 Exciting news@xAI's Grok-2 and Grok-mini are now officially on the leaderboard With over [----] community votes Grok-2 has claimed the #2 spot surpassing GPT-4o (May) and tying with the latest Gemini Grok-2-mini also impresses at #5. Grok-2 excels in https://t.co/5lyQgratJQ Chatbot Arena update❤🔥 Exciting news@xAI's Grok-2 and Grok-mini are now officially on the leaderboard With over [----] community votes Grok-2 has claimed the #2 spot surpassing GPT-4o (May) and tying with the latest Gemini Grok-2-mini also impresses at #5. Grok-2"
X Link 2024-08-23T19:44Z [----] followers, [---] engagements

"Wait Grok-2 requires multi-host inference is it that big It's so fast that I can't believe it's too big to fit in a single node Grok [--] mini is now 2x faster than it was yesterday. In the last three days @lm_zheng and @MalekiSaeed rewrote our inference stack from scratch using SGLang (https://t.co/M1M8BlXosH). This has also allowed us to serve the big Grok [--] model which requires multi-host inference at a https://t.co/G9iXTV8o0z Grok [--] mini is now 2x faster than it was yesterday. In the last three days @lm_zheng and @MalekiSaeed rewrote our inference stack from scratch using SGLang"
X Link 2024-08-23T19:47Z [----] followers, [---] engagements

"@hsu_byron @karpathy @tri_dao @woosuk_k @SonglinYang4 @danielhanchen mgmalek is @mikegmalek https://x.com/mikegmalek/status/1786503367193461032 @karpathy For fused cross-entropy could reduce peak memory by 1) blocking the logit computation to avoid materialising the full logits tensor and 2) overwriting the logits with their grad during training. Can save a lot of memory especially when vocab size dim https://t.co/ZucNOj3XT3 https://x.com/mikegmalek/status/1786503367193461032 @karpathy For fused cross-entropy could reduce peak memory by 1) blocking the logit computation to avoid materialising"
X Link 2024-08-23T22:51Z [----] followers, [---] engagements
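
The two memory-saving ideas in the quoted post (computing logits in blocks, and overwriting each block of logits with its gradient) can be illustrated with a minimal pure-Python sketch. This is not the implementation from any of the mentioned libraries. The function name `chunked_cross_entropy`, its parameters, and the choice to chunk over tokens are assumptions made for illustration only.

```python
import math


def chunked_cross_entropy(hidden, weight, targets, chunk_size=2):
    """Mean cross-entropy of logits = hidden @ weight^T, one chunk at a time.

    hidden:  list of N vectors of length d (hypothetical final hidden states)
    weight:  list of V vectors of length d (hypothetical output projection)
    targets: list of N class indices in [0, V)
    Returns (loss, grad_hidden). At most chunk_size x V logits exist at once.
    """
    n = len(hidden)
    dim = len(weight[0])
    total_loss = 0.0
    grad_hidden = []
    for start in range(0, n, chunk_size):
        h_chunk = hidden[start:start + chunk_size]
        t_chunk = targets[start:start + chunk_size]
        # Idea 1: materialise logits for this chunk only, never the full N x V.
        logits = [[sum(a * b for a, b in zip(h, w)) for w in weight]
                  for h in h_chunk]
        for row, t in zip(logits, t_chunk):
            # Numerically stable log-sum-exp over the vocabulary axis.
            m = max(row)
            lse = m + math.log(sum(math.exp(x - m) for x in row))
            total_loss += lse - row[t]
            # Idea 2: overwrite the logits in place with their gradient,
            # (softmax - one_hot) / N, so no second buffer is allocated.
            for j in range(len(row)):
                row[j] = (math.exp(row[j] - lse) - (1.0 if j == t else 0.0)) / n
        # Back-propagate through the projection: grad_hidden = grad_logits @ weight.
        for row in logits:
            grad_hidden.append(
                [sum(g * w[k] for g, w in zip(row, weight)) for k in range(dim)]
            )
    return total_loss / n, grad_hidden
```

The result does not depend on `chunk_size`. Only peak memory does, which is why the saving grows with vocabulary size.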

"Actually if China gov is willing to increase the military expense the economy can be improved at least for a while. The central government debt is quite low compared to US. China appears to be in serious economic straits now. Yet it has not changed course at all and continues its massive military buildup having constructed a [-----] ton ship in [--] months. It's also worth noting China fought us to a standstill in [----] while impoverished."
X Link 2024-08-24T07:47Z [----] followers, [---] engagements

"@angelusm0rt1s I know. But the fact is that these NPLs (moved to asset management companies in 2000-2005) were moved to (in 2009-2012) and are still on the account of gov (incl. PBOC) and state owned banks in the form of gov debt. The growth in the past [--] years absorbs these debt"
X Link 2024-08-24T08:10Z [----] followers, [--] engagements

"@angelusm0rt1s RE stuffs are actually tiny the real crisis is the growth. I don't think we are spending $800B in military (incl. military R&D) per year. For example the EM launcher on the CNS FuJian costs only $30M in R&D stage"
X Link 2024-08-24T08:26Z [----] followers, [--] engagements

"@angelusm0rt1s RE contraction will come to an end within a few years. In 2024H1 RE (newly constructed) sales -25% YoY the reduction is about 2.5% of GDP"
X Link 2024-08-24T08:40Z [----] followers, [--] engagements

"@angelusm0rt1s China's capital gain tax is too low. Nearly zero. Plus the ubiquitous and immense indirect tax create an environment that the less you earn the higher effective tax rate you experience (regressive tax)"
X Link 2024-08-24T08:44Z [----] followers, [--] engagements

"Here comes @xai and Elon. Anthropic is waiting for GPT-5 🔄 OpenAI is waiting for Opus [---] Its so over. It's a never-ending loop. We're trapped in an endless cycle of waiting. We are cooked"
X Link 2024-08-24T21:07Z [----] followers, [---] engagements

"Russia would be the opponent of China if there were no US. During 1840-1949 Russia snatched about [-------] square kilometers of land from China (Qing and ROC) which is more than [--] of the land area of Ukraine. @douglasritz @robin_j_brooks the russians would simply invade Ukraine and arrange a massacre like in Buch in every Ukrainian city. do you have a problem with your head"
X Link 2024-08-24T21:53Z [----] followers, [---] engagements

"And 3-year is too long. NVIDIA rollout new hardware each year with [--] capability but only [---] price"
X Link 2024-08-24T23:18Z [----] followers, [---] engagements

"@cccntu IIUC it does materialize logits but in a chunked manner"
X Link 2024-08-25T13:17Z [----] followers, [---] engagements

"One important advantage of xAI: you will probably share your ChatGPT account with friends but not X/twitter account"
X Link 2024-08-25T13:42Z [----] followers, [---] engagements

"@jskf__ Would xAI provide such an option They might provide API but not a separate subscription"
X Link 2024-08-25T16:00Z [----] followers, [--] engagements

"Making mothers be with high status is important and might be more important than economic subsidies. Elevating the Status of Motherhood Solves Low Birthrates: The Extraordinary Case of Mongolia For [--] years Mongolian leaders have given the Order of Maternal Glory to mothers. This raised the status of motherhood and helped forge a remarkably pronatal culture. 🧵 please share https://t.co/4o8hZo84lM"
X Link 2024-08-25T16:26Z [----] followers, [---] engagements

"It's only me that just knew that OpenAI allow special customers like Cursor access the prompt tokens logprobs of their frontier models https://www.cursor.com/blog/instant-apply"
X Link 2024-08-25T17:11Z [----] followers, 19.2K engagements

"@teortaxesTex @MA1984251984 @angelusm0rt1s @EIFY @alicemazzy Actually the advantage is not that correlated with reusable rocket. Even expendable SuperHeavy+Starship are superpower"
X Link 2024-08-26T16:32Z [----] followers, [--] engagements

"@angelusm0rt1s @teortaxesTex @MA1984251984 @EIFY @alicemazzy UBTECH is pure hype"
X Link 2024-08-26T17:13Z [----] followers, [--] engagements

"@angelusm0rt1s @teortaxesTex @MA1984251984 @EIFY @alicemazzy Oh no xAI will be nuked if so"
X Link 2024-08-26T17:42Z [----] followers, [--] engagements

"@teortaxesTex That's why I suggest preparing 10-100 million tons of rocket before try"
X Link 2024-08-26T18:22Z [----] followers, [---] engagements

"@angelusm0rt1s @teortaxesTex BTW "current production capacity" is a useless metric for China. Upstream material production capacity makes more sense"
X Link 2024-08-26T19:01Z [----] followers, [--] engagements

"@angelusm0rt1s @teortaxesTex I actually mean CPC (aka CCP). CPC only had its army after they are massacred crazy by KMT. In [----] Mao said: "China has but one option today: to seek harmony for in harmony lies strength. Any other course of action would be a mistake.""
X Link 2024-08-26T19:11Z [----] followers, [--] engagements

"Good results. But I afraid it won't be a thing for GPU poor. We need TP&PP to distribute param. In contrast Microsoft and Google can use multiple DCs to train one model more efficiently with this. Ofc they have their in-house low-comm DP algo. What if you could use all the computing power in the world to train a shared open source AI model Preliminary report: https://t.co/b1XgJylsnV Nous Research is proud to release a preliminary report on DisTrO (Distributed Training Over-the-Internet) a family of https://t.co/h2gQJ4m7lB"
X Link 2024-08-26T19:34Z [----] followers, [----] engagements

"@chankhavu You have a node but still a GPU poor Okay assume you have a [-----] node it has only 192GB memory. You can at most train a 150B model with FP8/INT8 + CPU offloaded optimizer states. Remember the model size is bottlenecked by the node with the least memory in the network"
X Link 2024-08-26T22:08Z [----] followers, [---] engagements

"@Dmitry31593946 @rohanpaul_ai INT8 is available and widely adopted in A100 era. But hardware is an important factor. In Jul [----] H100 was $2.5/h. In Nov [----] A100 was $2/h (not sure). But compute & bandwidth are about 2-3x"
X Link 2024-08-27T13:19Z [----] followers, [--] engagements
