# @jiqizhixin 机器之心 JIQIZHIXIN

机器之心 JIQIZHIXIN posts on X about ai, university of, bytedance, and tencent the most. They currently have [------] followers and [---] posts still getting attention, which together drew [------] engagements in the last [--] hours.

### Engagements: [------]

- [--] Week [-------] -13%
- [--] Month [-------] -45%
- [--] Months [---------] +319%
- [--] Year [---------] +129,234%

### Mentions: [--]

- [--] Month [--] -10%
- [--] Months [---] +163%
- [--] Year [---] +5,408%

### Followers: [------]

- [--] Week [------] +1.50%
- [--] Month [------] +5.90%
- [--] Months [------] +110%
- [--] Year [------] +274%

### CreatorRank: [-------]

### Social Influence

**Social category influence**
[technology brands](/list/technology-brands) [travel destinations](/list/travel-destinations) [stocks](/list/stocks) [social networks](/list/social-networks) [countries](/list/countries) [finance](/list/finance) [cryptocurrencies](/list/cryptocurrencies) [automotive brands](/list/automotive-brands) [nfts](/list/nfts) [exchanges](/list/exchanges)

**Social topic influence**
[ai](/topic/ai), [university of](/topic/university-of) #328, [bytedance](/topic/bytedance), [tencent](/topic/tencent), [shanghai](/topic/shanghai), [llm](/topic/llm) #633, [huawei](/topic/huawei), [alibaba](/topic/alibaba), [china](/topic/china), [open ai](/topic/open-ai)

**Top accounts mentioned or mentioned by**
[@precedent_vice](/creator/undefined) [@openai](/creator/undefined) [@shunyuyao12](/creator/undefined) [@zaiorg](/creator/undefined) [@kimimoonshot](/creator/undefined) [@alibabaqwen](/creator/undefined) [@deepseekai](/creator/undefined) [@alibabagroup](/creator/undefined) [@rfsharko](/creator/undefined) [@contextrixai](/creator/undefined) [@256](/creator/undefined)
[@korbitai](/creator/undefined) [@ssoni83588](/creator/undefined) [@papercopilot](/creator/undefined) [@wonderingcamel](/creator/undefined) [@tianhongli6](/creator/undefined) [@deepseekais](/creator/undefined) [@taykolasinski](/creator/undefined) [@ylecun](/creator/undefined) [@randallbalestr](/creator/undefined)

**Top assets mentioned**
[Alphabet Inc Class A (GOOGL)](/topic/$googl) [Microsoft Corp. (MSFT)](/topic/microsoft) [Robot Consulting Co., Ltd. (LAWR)](/topic/robot) [Alibaba Group (BABA)](/topic/alibaba-group) [Voxels (voxels)](/topic/voxels) [IBM (IBM)](/topic/ibm)

### Top Social Posts

Top posts by engagements in the last [--] hours

"Canadian education technology startup Korbit Technologies (@korbit_ai) has introduced a personalized AI-powered learning experience that it says can help all students learn faster and better in a cost-effective way. #ArtificialIntelligence #startups https://medium.com/syncedreview/bengio-backed-startup-korbit-introduces-stem-intelligent-tutoring-system-d20b3e0d4128" [X Link](https://x.com/jiqizhixin/status/1266119353994444800) 2020-05-28T21:29Z 14.7K followers, [--] engagements

"GRPO just got a speed boost. Xiamen University introduced Completion Pruning Policy Optimization (CPPO), which significantly reduces the number of gradient calculations and updates. How fast? On GSM8K it's [----] faster than GRPO, and on MATH the speedup is [----]. 🚀🔥" [X Link](https://x.com/jiqizhixin/status/1906586137864585251) 2025-03-31T05:55Z 12.1K followers, 28.3K engagements

"Hugging Face has acquired the robotics startup Pollen Robotics, according to Fortune.
https://fortune.com/2025/04/14/ai-company-hugging-face-buys-humanoid-robot-company-pollen-robotics-reachy-2/" [X Link](https://x.com/jiqizhixin/status/1911771062041190734) 2025-04-14T13:18Z 12.7K followers, [---] engagements

"Pangu models from Huawei" [X Link](https://x.com/jiqizhixin/status/1935989058360189124) 2025-06-20T09:12Z 14K followers, [----] engagements

"VLMs for embodied agents just got a major upgrade. Introducing World-Aware Planning Narrative Enhancement (WAP), a framework that gives vision-language models true environmental understanding for complex long-horizon tasks. Key upgrades: 🧠 Visual modeling 📐 Spatial reasoning 🔧 Functional abstraction 🗣 Syntactic grounding" [X Link](https://x.com/jiqizhixin/status/1939252761167835152) 2025-06-29T09:21Z 12.2K followers, 12.7K engagements

"You might want to know about Shengjia Zhao, newly appointed Chief Scientist of Meta's Superintelligence Labs (MSL) by Mark Zuckerberg. Tsinghua (BS), Stanford (PhD in CS). Awarded ICLR [----] Outstanding Paper for the first-authored work "Comparing Distributions by Measuring Differences that Affect Decision Making". Ex-OpenAI, a contributor to flagship AI projects including ChatGPT and the GPT-4 series (GPT-4, GPT-4.1, GPT-4o). https://twitter.com/i/web/status/1949014826824479102" [X Link](https://x.com/jiqizhixin/status/1949014826824479102) 2025-07-26T07:52Z 15K followers, [----] engagements

"@SSoni83588 Chinese AI companies typically maintain separate operations for domestic and international markets.
As for Doubao, ByteDance's AI chatbot: Mainland China version: https://www.doubao.com/chat/ Global version (as Cici): https://www.cici.com/" [X Link](https://x.com/jiqizhixin/status/1949024266969833564) 2025-07-26T08:29Z 12.4K followers, [---] engagements

"ByteDance is exploring diffusion LLMs too 👀 Seed Diffusion Preview: a blazing-fast LLM for code built on discrete-state diffusion. With [----] tokens/sec inference on H20 GPUs, it outpaces Mercury & Gemini Diffusion while matching their performance on standard code benchmarks. New SOTA on the speed-quality Pareto frontier. 🚀" [X Link](https://x.com/jiqizhixin/status/1951092714164101590) 2025-08-01T01:28Z 11.5K followers, 46.8K engagements

"China's media says the nation's digital push under the 14th Five-Year Plan is paying off, now among the world's leaders. By June 2025: 📡 4.55M 5G base stations 🌐 226M gigabit broadband users 💻 2nd-largest total computing power 📊 400K+ data enterprises in [----] 💰 Data industry worth 5.86T, up 117% from five years ago" [X Link](https://x.com/jiqizhixin/status/1955859149377753399) 2025-08-14T05:08Z 10.4K followers, [---] engagements

"According to the AD Scientific Index [----], Yoshua Bengio is the most cited researcher in history, with over 973k citations. The second is Geoffrey Hinton with more than 952k citations. Kaiming He ranks fifth with over 733k citations. Ilya Sutskever ranks seventh with over [------] citations" [X Link](https://x.com/jiqizhixin/status/1959863366157263132) 2025-08-25T06:20Z 13K followers, [----] engagements

"Results on LiveMCP-101 showing model performance in terms of task success rate (TSR), average result score (ARS), average trajectory score (ATS), average token consumption, and average tool calls. (a) TSR (%) vs. ARS (%) with color encoding ATS (%). (b) TSR (%) vs.
average tokens per task with color encoding average tool calls" [X Link](https://x.com/jiqizhixin/status/1961262141220196360) 2025-08-29T02:58Z 10.5K followers, [---] engagements

"📬 #PapersAccepted by Jiqizhixin Our report: LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries Duke University, Zoom Video Communications Paper: https://arxiv.org/abs/2508.15760 Report: https://mp.weixin.qq.com/s/6H2P2YVjBnpCa9yVX53fnw" [X Link](https://x.com/jiqizhixin/status/1961262145339101574) 2025-08-29T02:58Z 10.5K followers, [---] engagements

"14B beats 671B. Microsoft's rStar2-Agent surpasses DeepSeek-R1 in mathematical reasoning. rStar2-Agent-14B is trained with agentic RL to reach frontier-level performance. Key innovations: ⚡ Efficient RL infra (Python env + [--] MI300X GPUs) 🔄 GRPO-RoC rollout strategy for noisy code tools 🧠 Multi-stage RL recipe for advanced cognitive behaviors. In just [---] RL steps / [--] week it scores 80.6% (AIME24) & 69.8% (AIME25), surpassing DeepSeek-R1 (671B) with shorter responses. Also generalizes to alignment, science reasoning & tool use" [X Link](https://x.com/jiqizhixin/status/1962723633308246394) 2025-09-02T03:45Z 10.5K followers, [----] engagements

"What if diffusion-based LLMs didn't have to waste compute predicting (and discarding) redundant tokens? A new paper from Duke University introduces Diffusion Scratchpad (DPad), a training-free tweak that trims unnecessary suffix tokens via a sliding window + distance-decay dropout. Result: major speedups on dLLMs like LLaDA-1.5 & Dream with no accuracy loss.
🚀 https://twitter.com/i/web/status/1965595863201579244" [X Link](https://x.com/jiqizhixin/status/1965595863201579244) 2025-09-10T01:59Z 14.2K followers, [----] engagements

"How do we benchmark AI beyond exam-style puzzles or trivial user queries? A new paper proposes UQ (Unsolved Questions), a benchmark built from [---] tough unanswered Stack Exchange questions across CS theory, math, sci-fi, history & more. Innovations: - Curated pipeline with LLM judges + human review - Validator-assisted screening to pre-check answers - Open platform for expert verification 📉 Current frontier models solve only 15%. 📈 Each success = new real-world knowledge. A bold shift: benchmarks that grow with unsolved human questions. https://twitter.com/i/web/status/1967511497669767186" [X Link](https://x.com/jiqizhixin/status/1967511497669767186) 2025-09-15T08:51Z 14.6K followers, [----] engagements

"Kinematic-aware generation for next-gen animation & motion tasks. Stability AI presents Stable Part Diffusion 4D (SP4D). From a single video, SP4D generates paired RGB + kinematic part videos, going beyond appearance-based segmentation to capture true articulation.
Key ideas: - Dual-branch diffusion (RGB + parts) - Spatial color encoding for flexible part counts with a shared VAE - BiDiFuse + contrastive loss for temporal & spatial consistency - New KinematicParts20K dataset (20K rigged objects) Results: ✨ Lifts 2D part maps to 3D skeletons & skinning weights 🌍 Generalizes to real-world novel objects and rare poses" [X Link](https://x.com/jiqizhixin/status/1970332094049165610) 2025-09-23T03:39Z 10.4K followers, [----] engagements

"Qwen Qwen Qwen Qwen" [X Link](https://x.com/jiqizhixin/status/1970664316727894374) 2025-09-24T01:39Z 12.8K followers, [---] engagements

"What if true AI agency doesn't come from more data, but from the right data? ⚡🤖 A new paper defines Agency as AI's capacity to autonomously discover problems, form hypotheses, and execute solutions, marking the shift from thinking systems to working systems. Enter LIMI (Less Is More for Intelligent Agency): - Trained on just [--] curated demonstrations of autonomous behavior - Achieves 73.5% on agency benchmarks, beating models trained on 10000+ samples - Outperforms leading systems like Kimi-K2-Instruct (24.1%), Qwen3 (27.5%), GLM-4.5 (45.1%), and DeepSeek-V3.1 (11.9%)" [X Link](https://x.com/jiqizhixin/status/1971777010658955436) 2025-09-27T03:20Z 10.8K followers, [----] engagements

"What if 3D models could be generated with precise cross-modal control, beyond just text or images? Tencent presents Hunyuan3D-Omni, a unified framework that accepts point clouds, voxels, bounding boxes, and skeletal priors, enabling fine-grained controllable 3D asset creation. Built for games, film, and design.
Model available on Hugging Face" [X Link](https://x.com/jiqizhixin/status/1974279697912770841) 2025-10-04T01:05Z 12.2K followers, [----] engagements

"How can LLMs evolve continually in real-world industry without forgetting past tasks? Enter MoE-CL, a parameter-efficient adversarial mixture-of-experts framework for continual instruction tuning: - Dedicated LoRA experts per task preserve task knowledge - Shared LoRA expert + task-aware discriminator transfer only task-relevant info - Adversarial learning balances retention & generalization. Tested on public & industrial benchmarks (incl. Tencent Video), MoE-CL cut manual review costs by 15.3%, proving scalable & practical for real-world deployment" [X Link](https://x.com/jiqizhixin/status/1974284117874520448) 2025-10-04T01:23Z 12.7K followers, [----] engagements

"📬 #PapersAccepted by Jiqizhixin Our report: Self-Evolving LLMs via Continual Instruction Tuning Beijing University of Posts and Telecommunications, Tencent AI Lab Paper: https://arxiv.org/abs/2509.18133 Code: https://github.com/BAI-LAB/MoE-CL Report: https://mp.weixin.qq.com/s/RuWZSV6xDfdESSxrXjv4_g" [X Link](https://x.com/jiqizhixin/status/1974284121540276574) 2025-10-04T01:23Z 12.7K followers, [---] engagements

"Can autonomous driving think like it sees, not just reason symbolically? Alibaba and others propose a spatio-temporal Chain-of-Thought (CoT) that lets visual language models (VLMs) reason visually, generating imagined future frames to plan trajectories. By unifying visual generation + understanding, the model acts as a world simulator, predicting how the scene evolves over time, not just describing it.
📈 Results show stronger visual reasoning and planning, moving autonomous driving beyond text-based logic toward true simulation-based intelligence. This paper has been accepted at NeurIPS 2025" [X Link](https://x.com/jiqizhixin/status/1975407572569235778) 2025-10-07T03:47Z 12.4K followers, [---] engagements

"How can we make LLMs actually use the context they're given? Meet CARE, a native retrieval-augmented reasoning framework that teaches models to explicitly integrate evidence into their own thought process. Instead of relying on heavy supervised fine-tuning or external web searches, CARE lets the model retrieve and reason internally, weaving relevant in-context tokens directly into its reasoning chain. Across real-world and counterfactual QA benchmarks, CARE delivers higher retrieval accuracy and more reliable answers than traditional RAG or supervised approaches. 🧠 The result: context-faithful" [X Link](https://x.com/jiqizhixin/status/1976491312405991509) 2025-10-10T03:33Z 10.4K followers, [----] engagements

"Well, you may not need fine-tuning anymore. Meet ACE (Agentic Context Engineering), a framework that turns LLM contexts into living, adaptive playbooks that grow and refine over time. Unlike traditional context-tuning (which suffers from brevity bias and context collapse), ACE uses structured generation-reflection-curation cycles to preserve rich domain insights and scale with long-context models.
Results: ✅ +10.6% on agent benchmarks ✅ +8.6% on finance reasoning ✅ Lower latency & rollout cost. Matches top production agents on AppWorld and beats them on harder tests, all with smaller open-source" [X Link](https://x.com/jiqizhixin/status/1976903463360774380) 2025-10-11T06:51Z 10.5K followers, [----] engagements

"Our report: Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models Stanford, SambaNova, UC Berkeley Paper: https://www.arxiv.org/abs/2510.04618 Report: https://mp.weixin.qq.com/s/f-1h0Q-QKOWghJb7Fmrvtw" [X Link](https://x.com/jiqizhixin/status/1976903467181785184) 2025-10-11T06:51Z 10.4K followers, [---] engagements

"Ever wondered how LLMs evolve from predicting the next token to following your instructions? Post-training 101: A hitchhiker's guide into LLM post-training. This new guide breaks down the basics of LLM post-training, covering the full journey from pre-training to instruction tuning: 🔹 Transitioning from language modeling to instruction following 🔹 Supervised Fine-Tuning (SFT): data curation, objectives, and losses 🔹 Reinforcement Learning methods: RLHF, RLAIF, RLVR, and how reward models work 🔹 Evaluation frameworks for measuring post-training quality Link:" [X Link](https://x.com/jiqizhixin/status/1977194258596561211) 2025-10-12T02:07Z 10.4K followers, 34.5K engagements

"Robots can now learn to act better through trial and error. A new study from Tsinghua, Shanghai Qi Zhi Institute, and Zhongguancun Academy puts Reinforcement Learning (RL) to the test for Vision-Language-Action (VLA) models. Unlike standard supervised fine-tuning (SFT), which struggles with compounding errors, RL directly optimizes for task success.
The researchers built a comprehensive benchmark to study how RL affects generalization across: 👀 Visual shifts 🧩 Semantic understanding 🦾 Action execution Key findings: - RL (especially PPO) boosts semantic and execution robustness - Maintains" [X Link](https://x.com/jiqizhixin/status/1978004521419985240) 2025-10-14T07:46Z 10.4K followers, [---] engagements

"Are Gaussian Splatting's limitations holding back the future of 3D surface reconstruction? 🤔 Enter GeoSVR, a novel framework that leverages sparse voxels to create stunningly accurate, detailed, and complete 3D surfaces. By using a Voxel-Uncertainty Depth Constraint and Sparse Voxel Surface Regularization, GeoSVR overcomes common challenges in the field, ensuring geometric consistency and sharp details. Experiments show it outperforms existing methods in accuracy and completeness, especially in difficult scenarios" [X Link](https://x.com/jiqizhixin/status/1978284197556183072) 2025-10-15T02:18Z 12.4K followers, [----] engagements

"📬 #PapersAccepted by Jiqizhixin Our report: GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction Beihang University, Rawmantic AI, and others Paper: https://arxiv.org/abs/2509.18090 Project: https://fictionarry.github.io/GeoSVR-project/ Code: https://github.com/Fictionarry/GeoSVR Report: https://mp.weixin.qq.com/s/QA4mY7YL3rsVGHl0QONQdQ" [X Link](https://x.com/jiqizhixin/status/1978284201800802580) 2025-10-15T02:18Z 10.8K followers, [---] engagements

"Can diffusion-based LLMs outpace traditional autoregressive models? ⚡🧠 Meet dInfer, the first efficient, modular framework for inference on diffusion-based large language models (dLLMs), a new generation of parallel text generators.
dInfer breaks inference into four key modules: - Model: core architecture integration - Diffusion iteration manager: orchestrates denoising steps - KV-cache manager: optimizes memory reuse - Decoding strategy: balances speed and quality With both algorithmic and system-level optimizations, dInfer hits [----] tokens/sec on HumanEval and 800+ tokens/sec across benchmarks on" [X Link](https://x.com/jiqizhixin/status/1978662709773373856) 2025-10-16T03:22Z 14.2K followers, [---] engagements

"Test-Time Scaling Law for robots just revealed. Meet RoboMonkey, a clever framework that boosts Vision-Language-Action (VLA) models by scaling sampling and verification during inference. Researchers first uncover a key insight: VLA action errors follow a power-law decay with more samples, revealing an inference-time scaling law. Building on that, RoboMonkey: - Samples multiple candidate actions with Gaussian noise - Uses majority voting to form an action proposal distribution - Employs a VLM-based verifier (trained on synthetic data) to pick the best move The result? 🚀 +25% on" [X Link](https://x.com/jiqizhixin/status/1978712036231217407) 2025-10-16T06:38Z 10.4K followers, [----] engagements

"What is AGI? Dan Hendrycks, Yoshua Bengio, Eric Schmidt, Gary Marcus, Max Tegmark, and many others just released A Definition of AGI. Basically, AGI is an AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult. And no surprise: GPT-4 and GPT-5 perform very poorly on the ten core cognitive components of their standard" [X Link](https://x.com/jiqizhixin/status/1979019210870395155) 2025-10-17T02:58Z 12.8K followers, 17K engagements

"RL keeps evolving. Now you can teach LLMs to reason better by rewarding risk-taking. Risk-based Policy Optimization (RiskPO) is a new reinforcement learning framework for post-training LLMs.
Instead of averaging rewards like GRPO, RiskPO uses a Mixed Value-at-Risk objective to: - Emphasize rare but informative reasoning paths - Prevent entropy collapse and overconfidence - Encourage deeper exploration Plus, a smart bundling scheme enriches feedback for more stable training. Results: big gains in math, multimodal, and code reasoning, beating GRPO on both Pass@1 and Pass@k" [X Link](https://x.com/jiqizhixin/status/1979034046610002125) 2025-10-17T03:57Z 10.4K followers, [----] engagements

"How well can multimodal LLMs understand long-distance travel videos? Enter VIR-Bench, a new benchmark with [---] real-world travel videos that challenges models to reconstruct itineraries and reason over extended geospatial-temporal trajectories. 🚗 Why it matters: mastering long-range video reasoning is key for embodied-AI planning and autonomous navigation. Findings: even top MLLMs struggle, revealing major gaps in long-horizon understanding. A prototype travel agent built on VIR-Bench shows clear performance gains, proving the benchmark's real-world value" [X Link](https://x.com/jiqizhixin/status/1979098765920473265) 2025-10-17T08:14Z 10.8K followers, [----] engagements

"📬 #PapersAccepted by Jiqizhixin Our report: VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction Waseda University, CyberAgent, and others Paper: https://www.arxiv.org/abs/2509.19002 Code: https://github.com/nlp-waseda/VIR-Bench Report: https://mp.weixin.qq.com/s/uXAHAQdaA5EQRhDiMFqlGg" [X Link](https://x.com/jiqizhixin/status/1979098770844602868) 2025-10-17T08:14Z 11.3K followers, [---] engagements

"Today's #1 Paper on Hugging Face: Agentic Entropy-Balanced Policy Optimization (AEPO) With this method we can train
smarter and more capable AI web agents without their learning processes collapsing. It's a reinforcement learning (RL) algorithm that addresses a key instability issue. Existing methods often over-rely on entropy (uncertainty), leading to training failures. AEPO intelligently balances this entropy during both exploration and policy updates. It uses a dynamic rollout that prevents the agent from getting stuck in uncertain loops and a novel optimization technique to learn from tricky" [X Link](https://x.com/jiqizhixin/status/1979161839121707043) 2025-10-17T12:25Z 12K followers, [----] engagements

"Agentic Entropy-Balanced Policy Optimization Renmin University of China, Kuaishou Technology Paper: https://huggingface.co/papers/2510.14545 Code: https://github.com/dongguanting/ARPO" [X Link](https://x.com/jiqizhixin/status/1979161843890520288) 2025-10-17T12:25Z 10.7K followers, [---] engagements

"Can high school geometry teach AI to understand space? 📐 A new study tackles the critical challenge of spatial intelligence in Multimodal Large Language Models (MLLMs). Researchers found that fine-tuning models on Euclid30K, a new dataset of [-----] Euclidean geometry problems, confers broadly transferable spatial skills. After this geometry-centric training, models achieved substantial zero-shot gains across four separate spatial reasoning benchmarks without any task-specific adaptation.
For instance, the average accuracy on the VSI-Bench benchmark rose from 34.5% to 40.5%, showing this is a" [X Link](https://x.com/jiqizhixin/status/1980053599452303761) 2025-10-19T23:29Z 10.4K followers, [----] engagements

"Can today's LLMs safely stay on mission? A new study introduces operational safety: an LLM's ability to accept or refuse queries appropriately within its intended use. Researchers benchmarked [--] open-weight models and found all remain highly unsafe for real-world deployment: - Qwen-3 (235B): 77.8% - Mistral (24B): 80% - GPTs: 62-73% - Gemma & Llama-3: collapse to 40% / 24% To fix this they propose prompt-based steering (Q-ground & P-ground), boosting safety by up to +41%. 📬 #PapersAccepted by Jiqizhixin Our report: OffTopicEval: When Large Language Models Enter the Wrong Chat Almost Always Nanyang" [X Link](https://x.com/jiqizhixin/status/1980157765751554555) 2025-10-20T06:22Z 10.4K followers, [----] engagements

"Wow! Multi-modal Diffusion Mamba (MDM) is a breakthrough architecture that fuses all modalities through a unified variational autoencoder and a Mamba-based multi-step diffusion process. Instead of separating image and text streams, MDM jointly learns and refines representations, enabling high-res image generation, long-form text synthesis, and visual QA & reasoning. MDM outperforms MonoFormer, LlamaGen, and Chameleon, and rivals GPT-4V, Gemini Pro, and Mistral, all while staying computationally efficient" [X Link](https://x.com/jiqizhixin/status/1980161738084602036) 2025-10-20T06:38Z 10.5K followers, [----] engagements

"A big step toward stable, scalable LLM agent training. Rutgers University & Adobe just identified a key pitfall in LLM agent training: the exploration-exploitation cascade failure, where agents first prematurely converge to bad strategies, then collapse into chaotic exploration.
To fix this they propose Entropy-regularized Policy Optimization (EPO), which: [--] Smooths entropy to prevent instability [--] Balances exploration & exploitation adaptively [--] Ensures monotonic entropy variance reduction Results: +152% on ScienceWorld, +19.8% on ALFWorld. 📬 #PapersAccepted by Jiqizhixin Our report: EPO:" [X Link](https://x.com/jiqizhixin/status/1980299972684775584) 2025-10-20T15:48Z 12.1K followers, [----] engagements

"Can AI think while it speaks? Meet VERA (Voice Evaluation of Reasoning Ability), the first benchmark testing reasoning in real-time voice-interactive systems. 💡 [----] voice-native tasks across [--] tracks (Math, Web, Science, Long-Context, Factual) reveal a striking modality gap: - Text model: 74.8% (Math), 54.0% avg - Voice model: 6.1% (Math), 11.3% avg - Even adding thinking time barely helps; real-time voice AIs still trade accuracy for fluency. 📬 #PapersAccepted by Jiqizhixin Our report: Voice Evaluation of Reasoning Ability: Diagnosing the Modality-Induced Performance Gap Duke University, Adobe" [X Link](https://x.com/jiqizhixin/status/1980481139807785417) 2025-10-21T03:47Z 11.3K followers, [----] engagements

"Huge! ByteDance just unveiled their LLM training infrastructure. ByteRobust is their GPU infrastructure system built for robust and continuous LLM training.
It tackles common failures, such as CUDA errors, NaNs, and job hangs, with: - High-capacity fault tolerance - Fast fault demarcation and localization - Data-driven failure recovery Result: deployed across [----] GPUs, ByteRobust achieves a 97% Effective Training Time Ratio (ETTR) over a three-month LLM training job, keeping massive training pipelines stable and efficient" [X Link](https://x.com/jiqizhixin/status/1980520199158890529) 2025-10-21T06:23Z 10.8K followers, [----] engagements

"Robust LLM Training Infrastructure at ByteDance https://arxiv.org/abs/2509.16293" [X Link](https://x.com/jiqizhixin/status/1980520204108279863) 2025-10-21T06:23Z 10.4K followers, [---] engagements

"Another breakthrough in world models. VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents. A new paper explores a frontier that enables vision-language (VLM) agents to build internal world models, much like LLMs reason through text. By framing perception as a Partially Observable MDP, the authors decompose reasoning into: - State Estimation: What's happening now? - Transition Modeling: What happens next? They introduce: - World Modeling Reward for dense turn-level feedback - Bi-Level GAE for turn-aware credit assignment A 3B VLM agent scores [----] across [--] benchmarks, surpassing GPT-5" [X Link](https://x.com/jiqizhixin/status/1980555086674964886) 2025-10-21T08:41Z 10.5K followers, [----] engagements

"You can now generate 4-minute-long videos. UCLA, ByteDance, and UCF have just released a new paper on this. It tackles a core challenge: long-horizon video quality collapse caused by error accumulation when models generate beyond their training length. Their simple but powerful solution: use the teacher's own knowledge to guide the student through self-generated long segments; no long-video data or retraining needed.
✨ Key results: - Scales video length [--] beyond the teacher's limit - Generates [--] min [--] sec videos (99.9% of positional span) - Fixes over-exposure & drift without overlap recomputation -" [X Link](https://x.com/jiqizhixin/status/1980711685686997412) 2025-10-21T19:04Z 11.4K followers, 15.7K engagements

"Atlas is OpenAI's Mac browser built on Chromium" [X Link](https://x.com/jiqizhixin/status/1980798258881745081) 2025-10-22T00:48Z 10.5K followers, [---] engagements

"When using AI browsers like ChatGPT Atlas or Comet, you need to be extra careful. Brave just released a report warning about a major threat: unseeable prompt injections in screenshots. That's right: attackers can embed malicious instructions in web content that are invisible or barely noticeable to humans. For example, they might hide prompt injection commands inside images using faint light-blue text on a yellow background, effectively concealing the malicious instructions from the user" [X Link](https://x.com/jiqizhixin/status/1980899350944547234) 2025-10-22T07:29Z 10.5K followers, [----] engagements

"In fact, Tsinghua University and Zhipu AI are conducting research similar to DeepSeek-OCR: an approach that enables large language models (LLMs) to process up to a million tokens effortlessly. They introduce Glyph, a framework that converts long text sequences into images and feeds them to vision-language models.
This visual compression technique achieves a [--] reduction in token count, speeds up processing by approximately [--], and still matches the performance of top-tier LLMs, unlocking million-token contexts and enhancing multimodal tasks such as document understanding" [X Link](https://x.com/jiqizhixin/status/1980910306844160165) 2025-10-22T08:13Z 10.5K followers, [----] engagements

"Glyph: Scaling Context Windows via Visual-Text Compression Tsinghua University, Zhipu AI https://arxiv.org/abs/2510.17800" [X Link](https://x.com/jiqizhixin/status/1980910311487205500) 2025-10-22T08:13Z 10.5K followers, [---] engagements

"How can AI truly understand long videos without massive retraining or proprietary models? Video-RAG might be an answer. It's a training-free, plug-and-play method that boosts long video comprehension by retrieving visually aligned auxiliary texts, from audio, OCR, and object cues, and feeding them into existing LVLMs. It's lightweight, open, and even outperforms Gemini-1.5-Pro and GPT-4o on long-video benchmarks. 📬 #PapersAccepted by Jiqizhixin Our report: Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension Xiamen University, University of Rochester Project: Paper: Code:" [X Link](https://x.com/jiqizhixin/status/1980910746922787117) 2025-10-22T08:15Z 10.5K followers, [----] engagements

"The Bumi robot is about to go on sale. Developed by Noetix Robotics, Bumi stands [----] meters tall, weighs just [--] kilograms, and features [--] degrees of freedom (DOF). It comes equipped with visual and speech understanding capabilities, and it can even dance. The price is [----] RMB, roughly [----] USD" [X Link](https://x.com/jiqizhixin/status/1980930804823044501) 2025-10-22T09:34Z 10.8K followers, 15.1K engagements

"The latest issue of DeepLearning.AI's The Batch newsletter contains several important updates: - Ling-1T leads non-reasoning performance.
- MCP Poses Security Risks - California Builds AI Regulatory Regime - Better Agentic Prompts Automatically" [X Link](https://x.com/jiqizhixin/status/1981893399512007050) 2025-10-25T01:19Z 10.5K followers, [---] engagements "Link: https://info.deeplearning.ai/ling-1t-leads-non-reasoning-performance-mcp-poses-security-risks-california-regulates-ai-auto-tune-for-agentic-prompts-1" [X Link](https://x.com/jiqizhixin/status/1981893402687164523) 2025-10-25T01:19Z 10.5K followers, [---] engagements "Huge! Ant Group's Ling Team just unveiled the Ring-linear [---] series: Ring-mini-linear-2.0 (16B) and Ring-flash-linear-2.0 (104B). They are hybrid models combining linear and softmax attention for efficient long-context inference. They cut inference cost to 1/10 of a 32B dense model and improve training efficiency by 50% with a custom FP8 operator library, while maintaining state-of-the-art reasoning performance" [X Link](https://x.com/jiqizhixin/status/1982738019846340938) 2025-10-27T09:15Z 12.9K followers, [----] engagements "How far can today's reasoning models really think ahead? Fudan & Meituan researchers introduce R-HORIZON, a benchmark and training paradigm targeting long-horizon reasoning tasks that require sustained, multi-step, interdependent reasoning rather than short single-turn answers. Their evaluations reveal that even top Large Reasoning Models (LRMs) degrade sharply as reasoning horizons extend.
Using R-HORIZON for reinforcement learning with verifiable rewards (RLVR) notably boosts both long-horizon and standard reasoning performance, showing that R-HORIZON offers a scalable, low-cost path to train models" [X Link](https://x.com/jiqizhixin/status/1982875191303713019) 2025-10-27T18:21Z 12.1K followers, [----] engagements "Huge breakthrough from DeepMind! In their latest Nature paper, Discovering state-of-the-art reinforcement learning algorithms, they show that AI can autonomously discover better RL algorithms. "Enabling machines to discover learning algorithms for themselves is one of the most promising ideas in AI." Could the next generation of RL algorithms be machine-discovered? BTW, the study was led by AlphaGo's creator David Silver" [X Link](https://x.com/jiqizhixin/status/1983016412471042277) 2025-10-28T03:42Z 11.3K followers, 135.6K engagements "Wow, big release from Ant Group! Ever wondered why Ling 1T, though not designed as a reasoning model, demonstrates such impressive reasoning power? Ling [---] is a new reasoning-oriented language foundation built on one principle: every activation should enhance reasoning. The report shares everything about how Ant Group trains its massive models. A rare show of true open-source spirit" [X Link](https://x.com/jiqizhixin/status/1983068785088295029) 2025-10-28T07:10Z 11.5K followers, 19.1K engagements "Now you can fine-tune your local LLMs on your iPhone. Apple presents MeBP, Memory-efficient BackPropagation, a new method that makes on-device fine-tuning practical. Unlike zeroth-order optimization (ZO), which needs [-----] more steps, MeBP achieves faster convergence and stronger performance, all while using under 1GB of memory on an iPhone [--] Pro Max for models up to 4B parameters.
A big leap toward personalized on-device LLM adaptation" [X Link](https://x.com/jiqizhixin/status/1983377112208986329) 2025-10-29T03:35Z 11.7K followers, [----] engagements "Can AI truly be creative? DeepMind just introduced an RL-based framework that teaches generative AI to create original, counter-intuitive chess puzzles using novel rewards derived from chess-engine search statistics. The results are striking: counter-intuitive puzzles increase [--] (from 0.22% to 2.5%), surpassing top datasets and Lichess-trained models while maintaining aesthetic depth and human-rated creativity. Experts even judged many of these AI puzzles as approaching the artistry of classic human compositions, a remarkable step toward genuine machine creativity" [X Link](https://x.com/jiqizhixin/status/1983733550735298649) 2025-10-30T03:11Z 10.9K followers, [----] engagements "Can LLMs truly master long-horizon reasoning without crumbling under complexity? This study says yes, with AgentFlow. It's a trainable agentic framework that decomposes tasks across planner, executor, verifier & generator modules, optimized in-the-flow via Flow-GRPO. A 7B model beats GPT-4o by up to 14.9% across search, math & science benchmarks. In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Stanford Texas A&M UC San Diego Lambda Project: Paper: Code: Model: Demo: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/30cvMoQADYj1_Cr2yLDopg" [X Link](https://x.com/jiqizhixin/status/1983754485408145803) 2025-10-30T04:35Z 11.9K followers, 13.3K engagements "How can we detoxify LLMs without dulling their intelligence? ARGRE is here to help. Autoregressive Reward-Guided Representation Editing is a new test-time detoxification framework that learns to navigate the fine-grained transition from toxic to non-toxic language inside an LLM's latent space.
By modeling these toxicity trajectories, ARGRE builds an autoregressive reward model that steers representations toward safe regions with precise, lightweight edits. Across [--] major LLMs, it cuts toxicity by 62%, reduces inference time by 48%, and preserves core capabilities. Detoxifying Large Language Models" [X Link](https://x.com/jiqizhixin/status/1984047919159230661) 2025-10-31T00:01Z 10.8K followers, [----] engagements "Cool: Fast-dLLM v2. It's a block diffusion language model that efficiently converts pretrained autoregressive (AR) LLMs into parallel generators using just 1B tokens of fine-tuning, a [---] data reduction over prior diffusion LLMs like Dream. With a block diffusion mechanism, hierarchical caching, and a parallel decoding pipeline, Fast-dLLM v2 achieves up to [---] faster decoding while matching or surpassing AR baselines in accuracy. Fast indeed. Fast-dLLM v2: Efficient Block-Diffusion LLM HKU Nvidia MIT Paper: Project: Code: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1984176264647954845) 2025-10-31T08:31Z 11.7K followers, [---] engagements "Can light solve AI's power crisis? While the world races to feed AI's insatiable hunger for compute, two 25-year-old founders from China are betting on a bold new paradigm: optical computing powered by phase-change materials. Their startup Lightstandard has built the world's first [------] optical computing chip, integrating silicon photonics and phase-change materials to achieve massive matrix computation with ultra-low energy use. It's a major leap toward making photonic chips commercially viable for AI workloads. If successful, this could redefine the future of compute, enabling low-carbon" [X Link](https://x.com/jiqizhixin/status/1984181070028591495) 2025-10-31T08:50Z 11.9K followers, [----] engagements "Earth observation is crucial for understanding our planet, but current AI methods struggle with complex multi-step reasoning.
A new framework called Earth-Agent aims to change that. Earth-Agent combines RGB and spectral Earth observation data with a toolkit of expert tools, enabling sophisticated analysis like retrieving geophysical parameters and tracking changes over time. To ensure its effectiveness, researchers created Earth-Bench, a comprehensive set of tasks and a rigorous evaluation protocol. Experiments show Earth-Agent significantly outperforms existing approaches, paving the way for more" [X Link](https://x.com/jiqizhixin/status/1984499394238955749) 2025-11-01T05:55Z 11.8K followers, [----] engagements "Speed and quality can finally coexist in diffusion-based language generation. Introducing DiDi-Instruct, a Discrete Diffusion Divergence Instruct method that distills a pre-trained discrete diffusion language model (dLLM) into a few-step student for ultra-fast generation. Built on integral KL-divergence minimization, DiDi-Instruct achieves up to [--] faster decoding, surpasses both its teacher and GPT-2, and cuts training time by [--]. Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct Paper: Code: Project: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1984549725471691216) 2025-11-01T09:15Z 12.1K followers, 17.5K engagements "Microsoft open-sourced Agent Lightning. Agent Lightning is "the absolute trainer to light up AI agents". Core Features - Turn your agent into an optimizable beast with ZERO CODE CHANGE (almost) 💤 - Build with ANY agent framework (LangChain, OpenAI Agent SDK, AutoGen, CrewAI, Microsoft Agent Framework, etc.), or even WITHOUT an agent framework (plain Python, OpenAI). You name it 🤖 - Selectively optimize one or more agents in a multi-agent system. 🎯 - Embraces algorithms like Reinforcement Learning, Automatic Prompt Optimization, Supervised Fine-tuning, and more.
🤗 Link:" [X Link](https://x.com/jiqizhixin/status/1984571619705307538) 2025-11-01T10:42Z 11.3K followers, [----] engagements "While multimodal LLMs show potential as embodied agents, their real-world perception and reasoning abilities remain poorly understood. To fill this gap, the authors present BEAR, a large-scale benchmark of [----] image-video-text tasks spanning [--] domains and [--] categories, systematically testing fundamental embodied capabilities from low-level perception to high-level planning. They further introduce BEAR-Agent, a multimodal conversational agent that integrates pretrained vision models, boosting embodied performance by 9.12% (17.5% relative) on GPT-5, marking a solid step toward truly embodied" [X Link](https://x.com/jiqizhixin/status/1984732178089963582) 2025-11-01T21:20Z 10.9K followers, [----] engagements "Reasoning uncertainty is highly localized? Yes, only a few high-entropy tokens truly matter. Built on this insight from the paper, Minimal Test-Time Intervention (MTI) selectively applies classifier-free guidance and lightweight negative-prompt guidance only where needed, reusing the model's own KV cache. The result: consistent accuracy and stability gains with minimal overhead, including +1.35% across [--] benchmarks on Qwen3-8B-Base and +5% on AIME2024 with Qwen3-32B-Reasoning. Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention HKUST(GZ) Kuaishou and others Paper: Code:" [X Link](https://x.com/jiqizhixin/status/1985123254449947018) 2025-11-02T23:14Z 11.3K followers, [----] engagements "Can vision-language models see beyond the big picture? A new method, RICE, rethinks how models learn by focusing on region-level understanding instead of just global features. It builds a billion-scale region dataset, introduces a Region Transformer, and unifies object and OCR learning under one framework. The result: RICE outperforms CLIP, SigLIP, and others on dense tasks like segmentation, grounding, and visual perception in multimodal LLMs.
Region-based Cluster Discrimination for Visual Representation Learning Code: Paper: Model: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1985188100172394543) 2025-11-03T03:31Z 11.4K followers, 11.4K engagements "One language model can master all programming languages? Enter MultiPL-MoE, a hybrid mixture-of-experts framework that boosts multilingual code generation without massive retraining. It combines token-level and segment-level expert routing to capture syntax and context across diverse programming languages. With smart expert selection and an efficient design, MultiPL-MoE significantly improves multi-language coding performance while keeping computational costs low. MultiPL-MoE: Multi-Programming-Lingual Extension of Large Language Models through Hybrid Mixture-of-Experts Paper: Code: Our report: 📬" [X Link](https://x.com/jiqizhixin/status/1985508794605191219) 2025-11-04T00:46Z 11.8K followers, [----] engagements "AI can become a fully autonomous data scientist. DeepAnalyze-8B is the first agentic LLM capable of handling the entire data science pipeline, from raw data to analyst-grade research reports, without predefined workflows. It learns like a human via a curriculum-based agentic training paradigm and a data-grounded trajectory synthesis process. Despite having just 8B parameters, DeepAnalyze surpasses workflow-based agents built on proprietary LLMs, marking a major step toward open, autonomous data science. DeepAnalyze: Agentic Large Language Models for Autonomous Data Science RUC Tsinghua Paper: Code:" [X Link](https://x.com/jiqizhixin/status/1985679168143851985) 2025-11-04T12:03Z 12.1K followers, [----] engagements "Can 3D worlds be generated in seconds from a single image or text? FlashWorld introduces a breakthrough 3D-oriented generative model that creates high-quality 3D scenes [-----] faster than previous methods. It directly produces 3D Gaussian representations while ensuring realism and consistency.
Through dual-mode pre-training and cross-mode distillation, FlashWorld fuses the strengths of 2D and 3D paradigms, achieving stunning rendering quality, strong generalization, and real-time 3D generation from any prompt. FlashWorld: High-quality 3D Scene Generation within Seconds Xiamen Tencent Fudan Project:" [X Link](https://x.com/jiqizhixin/status/1985801725207347219) 2025-11-04T20:10Z 12.1K followers, [---] engagements "Better reasoning could emerge from simpler RL. ROVER rethinks RL with Verifiable Rewards (RLVR) for LLM reasoning. Instead of relying on complex policy optimization like PPO, it proves that optimal actions can be derived from a uniform random policy's Q-function, eliminating the need for iterative policy updates. This minimalist approach preserves diversity and stability, boosting math reasoning scores by +8.2 pass@1, +16.8 pass@256, and +17.6% diversity, outperforming much heavier RL methods with elegant simplicity. Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards HKUST" [X Link](https://x.com/jiqizhixin/status/1985940388893634861) 2025-11-05T05:21Z 11.5K followers, [----] engagements "Google might just crack the code on how to stop AI from kissing up. They propose consistency training, a self-supervised approach that makes LLMs invariant to irrelevant prompt cues (like leading questions or jailbreak text), reducing sycophancy and jailbreak susceptibility without static datasets. Two variants emerge: - Bias-augmented Consistency Training (BCT), enforcing output invariance; - Activation Consistency Training (ACT), enforcing internal invariance.
Both improve factuality robustness on Gemini [---] Flash, with BCT especially effective against jailbreaks, reframing alignment as a" [X Link](https://x.com/jiqizhixin/status/1985962882396299318) 2025-11-05T06:50Z 12.4K followers, [----] engagements "This paper from WeChat and Tsinghua might just flip the script on today's LLM paradigm. Continuous Autoregressive Language Models (CALM) replace next-token prediction with next-vector prediction, compressing chunks of K tokens into a single continuous vector via a high-fidelity autoencoder (99.9% reconstructability). By modeling language as sequences of continuous vectors, CALM boosts semantic bandwidth per step, slashing generation steps by K and improving the performance-compute trade-off. A new likelihood-free training and sampling framework enables stable learning in this continuous domain" [X Link](https://x.com/jiqizhixin/status/1985974065882894632) 2025-11-05T07:34Z 11.4K followers, [----] engagements "Can sparse images still produce detailed 3D reconstructions? Researchers propose a semantic-aware neural reconstruction method that enriches implicit 3D representations with patch-based semantic logits and introduces a geometry-regularized mask constraint to resolve radiance and shape ambiguity. On the DTU benchmark it cuts Chamfer distance by 44% vs. SparseNeuS and 20% vs. VolRecon, and as a plugin for dense methods like NeuS and Neuralangelo it further reduces error by 69% and 68%, delivering sharper, more reliable 3D models. SERES: Semantic-Aware Neural Reconstruction from Sparse Views Page:" [X Link](https://x.com/jiqizhixin/status/1986183239443067292) 2025-11-05T21:26Z 15.2K followers, [---] engagements "Breaking release: the OpenHands Software Agent SDK, a complete redesign of the popular 64k-star OpenHands framework.
It offers: plug-and-play agent interfaces, sandboxed & portable execution, multi-LLM routing, and built-in security analysis. Benchmarks on SWE-Bench Verified & GAIA show strong results, a major step toward reliable, scalable software-engineering agents" [X Link](https://x.com/jiqizhixin/status/1986319650507075618) 2025-11-06T06:28Z 11.5K followers, [----] engagements "Can AI invent new math? A new paper from DeepMind and renowned mathematician Terence Tao shows how. Using AlphaEvolve, the team merges LLM-generated ideas with automated evaluation to propose, test, and refine mathematical algorithms. In tests on [--] problems across analysis, geometry, and number theory, AlphaEvolve not only rediscovered known results but often improved upon them, even generalizing finite cases into universal formulas. Paired with DeepThink and AlphaProof, it points toward a future where AI doesn't just assist mathematicians; it collaborates with them in discovery" [X Link](https://x.com/jiqizhixin/status/1986323867720376327) 2025-11-06T06:44Z 12.6K followers, 81K engagements "UniLIP: a unified framework extending CLIP beyond understanding to multimodal generation and editing. While CLIP excels at perception, it lacks reconstruction ability. UniLIP fixes this with a two-stage self-distillation scheme that adds high-fidelity reconstruction without sacrificing comprehension. Built on the MetaQuery framework, UniLIP introduces a dual-condition architecture that fuses multimodal hidden states (for contextual richness) with learnable query embeddings (for MLLM-style reasoning). With only 1B-3B parameters, UniLIP outperforms larger unified models like BAGEL (7B) and" [X Link](https://x.com/jiqizhixin/status/1986520210036191339) 2025-11-06T19:45Z 11.5K followers, [----] engagements "ReasonMed: the largest medical reasoning dataset advancing LLM performance in clinical QA.
Comprising 370k curated examples distilled from 1.75M reasoning paths, ReasonMed is built through a multi-agent EMD (easy-medium-difficult) pipeline with generation, verification, and an Error Refiner that corrects faulty reasoning steps. Experiments show that combining detailed CoT reasoning with concise answer summaries yields the most robust fine-tuning outcomes. Models trained on ReasonMed redefine the state of the art: ReasonMed-7B outperforms all sub-10B models by +4.17% and even beats LLaMA3.1-70B" [X Link](https://x.com/jiqizhixin/status/1986628171035242614) 2025-11-07T02:54Z 12.2K followers, [----] engagements "Can LLM agents learn by dreaming? 🌙🤖 DreamGym from Meta is a new framework that lets AI agents train via synthetic reasoning-based experiences instead of costly real rollouts. It models environment dynamics, replays and adapts tasks, and even improves sim-to-real transfer. Results: +30% gains on WebArena and PPO-level performance, using only synthetic interactions" [X Link](https://x.com/jiqizhixin/status/1986686971331195223) 2025-11-07T06:47Z 11.5K followers, [----] engagements "Cambrian-S: Towards Spatial Supersensing in Video. This paper boasts an impressive roster of advisors, including Rob Fergus, Yann LeCun, Fei-Fei Li, and Saining Xie, and they aim to answer this question: can AI go beyond seeing to truly understanding space? They propose spatial supersensing, a leap past reactive multimodal AI toward models that perceive, remember, infer, and predict the 3D world. Their new benchmark VSI-SUPER tests long-horizon spatial reasoning where brute-force context fails.
Results: scaling helps, but predictive sensing wins big, outperforming top proprietary systems" [X Link](https://x.com/jiqizhixin/status/1986688833417847138) 2025-11-07T06:55Z 12.3K followers, [----] engagements "Cambrian-S: Towards Spatial Supersensing in Video Paper: Website: Code: Cambrian-S Models: VSI-590K: VSI-SUPER: https://hf.co/collections/nyu-visionx/vsi-super https://hf.co/datasets/nyu-visionx/vsi-590k https://hf.co/collections/nyu-visionx/cambrian-s https://github.com/cambrian-mllm/cambrian-s https://cambrian-mllm.github.io https://arxiv.org/abs/2511.04670v1" [X Link](https://x.com/jiqizhixin/status/1986688837473738788) 2025-11-07T06:55Z 12.3K followers, [---] engagements "Breaking: China's first AI prompt copyright case delivers a landmark verdict. Are AI prompts protected by copyright? The court says NO. Shanghai Huangpu District Court ruled that AI prompts lack sufficient originality to qualify as protected works, setting an important precedent for AI-generated content ownership. Key takeaways: - Prompts deemed mere "instructions" lacking unique creative expression - Case focused on the input rather than the output of AI systems - Art company claimed violation when the defendant used similar prompts to create Midjourney artworks - Court dismissed all claims, highlighting gray" [X Link](https://x.com/jiqizhixin/status/1986694579677192218) 2025-11-07T07:17Z 11.8K followers, [----] engagements "Sakana AI is building artificial life, and it can evolve. Petri Dish Neural Cellular Automata (PD-NCA) let multiple NCA agents learn and adapt during simulation, not just after training.
Each cell updates its own parameters via gradient descent, turning morphogenesis into a living ecosystem of competing, cooperating, and ever-evolving entities, showing emergent cycles and persistent complexity growth. Petri Dish Neural Cellular Automata Sakana AI Paper: Project: Our report: https://mp.weixin.qq.com/s/P4-KBMHzH3am9_qhHL4LDQ https://github.com/SakanaAI/petri-dish-nca https://pub.sakana.ai/pdnca/" [X Link](https://x.com/jiqizhixin/status/1986711613597266365) 2025-11-07T08:25Z 12.4K followers, 28.6K engagements "What if one embedding model could seamlessly understand text, images, user behavior, and item IDs, all while boosting real-world recommendation performance? Meet SAIL-Embedding: an omni-modal foundation model engineered for the messy realities of industrial AI. Unlike CLIP-style dual-tower models, SAIL-Embedding uses a multi-stage training strategy that: ✅ Adapts to diverse tasks via content-aware progressive training ✅ Boosts recommendations by distilling user history & ID/item relationships ✅ Stays flexible with stochastic specialization and dataset-driven pattern matching Results: SOTA retrieval" [X Link](https://x.com/jiqizhixin/status/1986874292676554921) 2025-11-07T19:12Z 11.5K followers, [----] engagements "What if LLMs could tune their own decoding? No more guesswork with temperature and top-p. Enter AutoDeco: the first architecture that makes LLM decoding truly end-to-end. By adding lightweight heads, the model predicts its own context-aware temperature and top-p at every token step, turning decoding into a learnable, differentiable process.
Results: 🔥 Beats default decoding by a wide margin 🎯 Matches an oracle that cheats by tuning per test case ✨ Learns to follow natural language instructions like "be more random" or "stay focused", adjusting its sampling strategy on the fly. The End of Manual Decoding:" [X Link](https://x.com/jiqizhixin/status/1987101540671234287) 2025-11-08T10:15Z 12.1K followers, [----] engagements "Can fractals help us fight Deepfakes? 🌀 FractalForensics is a proactive Deepfake detector that embeds fractal-based watermarks to both detect and localize manipulations, while staying robust against normal edits and fragile to AI fakes. It even shows where the image was tampered with. FractalForensics: Proactive Deepfake Detection and Localization via Fractal Watermarks Paper: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/5bW7levt1sKFZYxc_RAldg https://arxiv.org/abs/2504.09451" [X Link](https://x.com/jiqizhixin/status/1987264111991005476) 2025-11-08T21:01Z 12.1K followers, [----] engagements "2/ Fast forward to 2025: a series of revolutionary papers shattered this conclusion. German, French, and independent researchers developed equivalent formulations of quantum mechanics using only real numbers, producing identical predictions to standard complex-number theory. How did they do it? By reimagining how quantum states combine. When entangled particles interact, standard theory uses "tensor products", a specific way of merging complex vectors.
The new approaches used different combination rules that achieve the same results without explicit imaginary numbers" [X Link](https://x.com/jiqizhixin/status/1987328873873481965) 2025-11-09T01:18Z 11.8K followers, [---] engagements "3/ Why does this matter? 🔶 Deepens our understanding of quantum reality: imaginary numbers may be a "scaffolding" rather than a fundamental component 🔶 Simplifies quantum computing (a Google researcher proved complex gates can be eliminated) 🔶 Reveals we may not fully grasp why complex numbers "fit" quantum mechanics so naturally" [X Link](https://x.com/jiqizhixin/status/1987328876788457983) 2025-11-09T01:18Z 11.9K followers, [---] engagements "AI-Trader enables five distinct AI models, each employing unique investment strategies, to compete autonomously in the same market and determine which can generate the highest profits in NASDAQ [---] or SSE [--] trading. 9k stars already! AI-Trader: Can AI Beat the Market Project: Demo: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/n8Xkl2Liy4c5m5xIzqTZ1g https://hkuds.github.io/AI-Trader/ https://github.com/HKUDS/AI-Trader" [X Link](https://x.com/jiqizhixin/status/1987509478053261483) 2025-11-09T13:16Z 12K followers, [----] engagements "AI can stream videos that stay sharp and coherent for minutes. Meet Rolling Forcing, a new technique for long-horizon video generation that slashes error accumulation. It denoises multiple frames jointly, anchors long-term context via an attention sink, and trains efficiently with few-step distillation. Result: real-time, multi-minute streaming videos on a single GPU, with crisp quality and temporal consistency.
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time NTU Tencent Paper: Project: Code: Huggingface: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1987613412696760690) 2025-11-09T20:09Z 12.1K followers, [----] engagements "Can diffusion-based LMs rival autoregressive models? Researchers tackle the decoding and RL inference mismatch of Masked Diffusion Language Models (MDLMs). Their new methods, EOS Early Rejection, Ascending Step-Size decoding, and CJ-GRPO, fix these gaps, unlocking efficient full diffusion decoding and boosting reasoning on math and planning tasks with LLaDA-8B-Instruct. Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step Fudan SAIL SJTU Code: Paper: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1987691426965295519) 2025-11-10T01:19Z 12.1K followers, [----] engagements "This paper from Tsinghua University and Shanghai Jiao Tong University received perfect scores (6 [--] [--] 6) at NeurIPS [----]. It aims to answer a key question: does reinforcement learning really make large language models better reasoners? The authors study Reinforcement Learning with Verifiable Rewards (RLVR) and find that while it improves accuracy for small k, it doesn't create new reasoning patterns, meaning the base model still determines the upper limit of reasoning ability. Across six RLVR variants, performance gains plateau, suggesting that current RL setups mainly refine reasoning rather than" [X Link](https://x.com/jiqizhixin/status/1987710546674856051) 2025-11-10T02:34Z 14K followers, 398.1K engagements "Can small models draft smarter for big ones? AdaSPEC improves speculative decoding by distilling knowledge selectively, filtering out hard-to-fit tokens so the draft model aligns better with the target.
The result: up to +15% higher token acceptance across reasoning, coding, and summarization, outperforming DistillSpec while keeping quality intact. AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/eAIv_NlrgG3hS829MuNgqw https://github.com/yuezhouhu/adaspec https://arxiv.org/abs/2510.19779" [X Link](https://x.com/jiqizhixin/status/1987985867080892842) 2025-11-10T20:49Z 12.1K followers, [----] engagements "Can one robot hand rotate anything? Researchers present a sim-to-real framework for generalized in-hand object rotation, powered by a joint-wise dynamics model that bridges the reality gap using minimal real data. A single policy now handles diverse shapes, sizes, and poses, even complex objects like animal figurines, showing unprecedented real-world dexterity. DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model Tsinghua Peking and others Paper: Project: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/MIpJnAYURM1IzsbQTMd1-A" [X Link](https://x.com/jiqizhixin/status/1988047774697525639) 2025-11-11T00:55Z 12.5K followers, [----] engagements "Dr. Fei-Fei Li just released an important article titled From Words to Worlds: Spatial Intelligence is AI's Next Frontier. She writes: "Spatial intelligence will transform how we create and interact with real and virtual worlds, revolutionizing storytelling, creativity, robotics, scientific discovery, and beyond. This is AI's next frontier." Absolutely worth a read. Link: https://drfeifei.substack.com/p/from-words-to-worlds-spatial-intelligence" [X Link](https://x.com/jiqizhixin/status/1988061301822849278) 2025-11-11T01:48Z 12.3K followers, 24.6K engagements "The most-used language on GitHub is now TypeScript. For the first time, it has surpassed both JavaScript and Python. Why? AI.
Developers are shifting toward typed languages, which make AI-assisted coding more reliable and maintainable" [X Link](https://x.com/jiqizhixin/status/1988080080015523904) 2025-11-11T03:03Z 12.4K followers, [----] engagements "Octoverse: A new developer joins GitHub every second as AI leads TypeScript to #1 GitHub https://github.blog/news-insights/octoverse/octoverse-a-new-developer-joins-github-every-second-as-ai-leads-typescript-to-1/utm_source=octoverse-homepage&utm_medium=blog&utm_campaign=universe25" [X Link](https://x.com/jiqizhixin/status/1988080082590802048) 2025-11-11T03:03Z 12.1K followers, [---] engagements "LLMs can reason better without extra compute? TrajSelector is a new Best-of-N framework that taps into an LLM's own hidden states to score reasoning steps, no massive reward models needed. A tiny 0.6B verifier ranks trajectories end-to-end, boosting accuracy by up to 12% over existing methods while cutting inference costs. TrajSelector: Harnessing Latent Representations for Efficient and Effective Best-of-N in Large Reasoning Model Paper: Project: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/mDgspKrltG1IpejesMuJIw https://zgca-ai4edu.github.io/TrajSelector/" [X Link](https://x.com/jiqizhixin/status/1988169837873406313) 2025-11-11T09:00Z 12.1K followers, [----] engagements "ByteDance just launched Doubao-Seed-Code, a model specifically designed for programming tasks. It supports native 256K long context and has claimed the top spot on the SWE-Bench Verified leaderboard" [X Link](https://x.com/jiqizhixin/status/1988192895212679588) 2025-11-11T10:31Z 12.5K followers, [----] engagements "Can AI see, hear, and think like humans? NVIDIA presents OmniVinci, an open-source omni-modal LLM unifying vision, audio, and language.
With OmniAlignNet, Temporal Embedding Grouping, and Constrained Rotary Time Embedding, it fuses modalities into one shared space, learning from 24M multi-sensory conversations. Results: beats Qwen2.5-Omni by +19.05 on cross-modal understanding using [--] less data, a major leap toward truly multimodal intelligence. OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Project: Paper: Model: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1988322334508196312) 2025-11-11T19:06Z 12.4K followers, 11.2K engagements "Why do LLM agents fail, and how can they fix themselves? Researchers introduce AgentErrorTaxonomy, AgentErrorBench, and AgentDebug, a complete framework for diagnosing and correcting cascading failures across memory, planning, reflection, and action. On real-world benchmarks (ALFWorld, GAIA, WebShop), AgentDebug boosts all-correct accuracy by +24% and enables iterative recovery with +26% task success. Where LLM Agents Fail and How They Can Learn From Failures Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/yEVf41BZp02PodAWuO_meg https://github.com/ulab-uiuc/AgentDebug" [X Link](https://x.com/jiqizhixin/status/1988714418423648523) 2025-11-12T21:04Z 12.1K followers, [----] engagements "Next big paradigm shift in AI? CALM reimagines LLMs by predicting continuous next vectors rather than discrete tokens. This can compress chunks of text into single embeddings with 99.9% fidelity. This boosts semantic bandwidth, cutting generation steps by up to a factor of K while matching strong baselines at far lower compute cost, pointing to a faster, more scalable future for LLMs.
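The next-vector idea can be sketched in a few lines: a toy linear "autoencoder" packs a chunk of K token embeddings into one continuous vector, and an autoregressive model would then step over vectors instead of tokens, cutting the number of generation steps by K. All names, shapes, and the linear codec below are illustrative assumptions, not CALM's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

K, d_tok = 4, 8                # chunk size and token-embedding dim (toy values)
d_vec = K * d_tok              # latent dim chosen so the toy codec is lossless
W_enc = rng.normal(size=(K * d_tok, d_vec))   # stand-in "encoder" weights
W_dec = np.linalg.pinv(W_enc)                 # stand-in "decoder": pseudo-inverse

def encode_chunk(tokens):
    """Pack K token embeddings (K, d_tok) into ONE continuous vector."""
    return tokens.reshape(-1) @ W_enc

def decode_chunk(vec):
    """Recover the K token embeddings from the single vector."""
    return (vec @ W_dec).reshape(K, d_tok)

# A 12-token sequence becomes only 3 autoregressive steps instead of 12.
tokens = rng.normal(size=(12, d_tok))
chunks = [encode_chunk(tokens[i:i + K]) for i in range(0, 12, K)]
assert len(chunks) == 12 // K
```

The real system replaces the linear codec with a learned high-fidelity autoencoder and predicts the next vector with a generative head; the sketch only shows why the step count drops by K.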
Continuous Autoregressive Language Models Tencent Tsinghua Paper: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/TbStNDWAsWF0UCD9tBu2nA https://arxiv.org/abs/2510.27688" [X Link](https://x.com/jiqizhixin/status/1988775822312808609) 2025-11-13T01:08Z 12.2K followers, 10.4K engagements "Two major hurdles exist for GUI agents: hard-to-verify outcomes and non-scalable training data. UI-Genie tackles them through a self-improving, reward-driven framework. Its UI-Genie-RM reward model fuses image and text understanding to unify action- and task-level evaluation, trained via rule-based verification, trajectory corruption, and hard negative mining. A reward-guided self-improvement loop then expands task complexity and data quality across generations, yielding UI-Genie-RM-517k and UI-Genie-Agent-16k, the first large-scale reward-specific GUI datasets. After three cycles UI-Genie sets new" [X Link](https://x.com/jiqizhixin/status/1988822127643070698) 2025-11-13T04:12Z 12.1K followers, [----] engagements "Can AI learn what to remember and when to update its memory? Mem- uses reinforcement learning to teach LLM agents how to manage complex multi-component memory systems without relying on hand-crafted rules. Trained on diverse multi-turn interactions, the agent learns to extract, store, and update information, with rewards tied to downstream QA accuracy. Results: strong gains over existing memory-augmented agents and impressive generalization, handling 400k+ token histories despite training only on 30k-token examples. Mem-: Learning Memory Construction via Reinforcement Learning Anuttacon UC San Diego" [X Link](https://x.com/jiqizhixin/status/1988913982825197950) 2025-11-13T10:17Z 12.1K followers, [----] engagements "Cool, another world model. PAN is a general world model that turns language-specified actions into long-horizon, high-fidelity video predictions.
Unlike typical prompt-to-video generators, PAN maintains causal control, interactivity, and consistent dynamics across diverse environments. It fuses an LLM-based latent dynamics backbone with a video diffusion decoder, allowing both abstract reasoning and realistic visual rollout. Trained on large video-action datasets, PAN shows strong performance in action-conditioned simulation, long-range forecasting, and simulative reasoning" [X Link](https://x.com/jiqizhixin/status/1989148306099085652) 2025-11-14T01:48Z 12.1K followers, [----] engagements "PAN: A World Model for General Interactable and Long-Horizon World Simulation Mohamed bin Zayed University of Artificial Intelligence Paper: https://arxiv.org/abs/2511.09057" [X Link](https://x.com/jiqizhixin/status/1989148310750523453) 2025-11-14T01:48Z 12.1K followers, [---] engagements "New paper surveys the rise of Graph-Augmented LLM Agents (GLA). It shows how graphs can boost LLM agents in planning, memory, tool use, and multi-agent coordination. It maps current progress, gaps, and future directions toward scalable, unified, and multimodal GLA systems.
Graph-Augmented Large Language Model Agents: Current Progress and Future Prospects Griffith NUS NTU Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/wCxmqIS7OXe9BRByXK8JaQ https://github.com/Shiy-Li/Awesome-Graph-augmented-LLM-Agent https://arxiv.org/abs/2507.21407" [X Link](https://x.com/jiqizhixin/status/1989174700807991606) 2025-11-14T03:33Z 12.5K followers, [----] engagements "@wondering_camel TOON appears to specify the number of data points in advance, enabling the LLM to make better judgments" [X Link](https://x.com/jiqizhixin/status/1989231470012248451) 2025-11-14T07:18Z 12.1K followers, [--] engagements "Cool: 3D objects can be turned into editable code. MeshCoder makes it possible, reconstructing complex shapes from point clouds into Blender Python scripts. With expressive APIs, a large object-code dataset, and a multimodal LLM, it enables precise shape-to-code reconstruction, intuitive editing, and deeper 3D reasoning. MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds SAIL Tsinghua and others Paper: Project: Code: Model: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/Cov0wEcfpkjkraPezrHKVw https://huggingface.co/InternRobotics/MeshCoder" [X Link](https://x.com/jiqizhixin/status/1989511671023284658) 2025-11-15T01:52Z 12.5K followers, 13.1K engagements "How can we merge countless fine-tuned expert models into one universal multi-task model without retraining or data leakage? RobustMerge tackles this challenge with a training-free, parameter-efficient merging method. By preserving direction robustness through low-rank analysis and cross-task normalization, it unifies diverse models while maintaining strong generalization across multimodal tasks.
RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness CAS HKISI-CAS Sun Yat-sen Peking Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1989612347040670200) 2025-11-15T08:32Z 12.5K followers, [----] engagements "Can LLMs learn to act like real doctors instead of just summarizing cases? DiagAgent does exactly that: trained with reinforcement learning in a simulated clinical world (DiagGym), it learns to plan tests, reason across turns, and make final diagnoses. Outperforming GPT-4o, DeepSeek-v3, and others by large margins, it shows that interactive training unlocks truly adaptive diagnostic intelligence. Evolving Diagnostic Agents in a Virtual Clinical Environment SJTU SAIL and others Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/byVmM-0HttYRF5Vb7LlEBw" [X Link](https://x.com/jiqizhixin/status/1990205492602585515) 2025-11-16T23:49Z 12.4K followers, [----] engagements "What if robots could understand what you want without being told? RoboOmni makes that possible: an omni-modal LLM that fuses speech, sound, and vision to infer human intent, confirm actions, and execute tasks. Trained on the new OmniAction dataset (140k episodes), it outperforms text- and ASR-based baselines in success rate, speed, and proactive assistance, paving the way for more intuitive human-robot collaboration. RoboOmni: Proactive Robot Manipulation in Omni-modal Context Fudan SII NUS Paper: Code: Project: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1990342143052063162) 2025-11-17T08:52Z 12.6K followers, [----] engagements "Huge: @TianhongLi6 & Kaiming He (inventor of ResNet) just introduced JiT (Just image Transformers). JiTs are simple large-patch Transformers that operate on raw pixels: no tokenizer, pre-training, or extra losses needed.
By predicting clean data on the natural-data manifold, JiT excels in high-dimensional spaces where traditional noise-predicting models can fail. On ImageNet (256 & 512), JiT achieves competitive generative performance, showing that sometimes going back to basics is the key" [X Link](https://x.com/jiqizhixin/status/1990700499327181247) 2025-11-18T08:35Z 12.8K followers, 159.5K engagements "Another great paper from Kaiming He's team: ARC Is a Vision Problem. They propose VARC, which reframes ARC as an image-to-image translation task: ARC problems are rendered onto a visual canvas, and a standard Vision Transformer (ViT) learns exclusively from ARC data, then adapts via test-time training. The result: 60.4% on ARC-1, far surpassing other from-scratch approaches, competitive with top LLMs, and bringing machine reasoning closer to human-level performance through vision-first modeling. https://twitter.com/i/web/status/1991064918028660831" [X Link](https://x.com/jiqizhixin/status/1991064918028660831) 2025-11-19T08:44Z 15.1K followers, 17.5K engagements "ByteDance presents Depth Anything [--]. A single plain transformer could beat every visual geometry model before it. Depth Anything [--] shows exactly that. By using a simple backbone and one depth ray target, DA3 outperforms prior SOTA across camera pose, any-view geometry, and visual rendering.
It beats VGGT by [----] percent in pose accuracy and [----] percent in geometry accuracy, and even surpasses DA2 in monocular depth" [X Link](https://x.com/jiqizhixin/status/1991072497564045688) 2025-11-19T09:14Z 12.7K followers, [----] engagements "Depth Anything 3: Recovering the Visual Space from Any Views Paper: Project: Code: Demo: Our report: https://mp.weixin.qq.com/s/gi1546oAXky2EiNwdPE2SA https://huggingface.co/spaces/depth-anything/depth-anything-3 https://github.com/ByteDance-Seed/Depth-Anything-3 https://depth-anything-3.github.io https://arxiv.org/abs/2511.10647" [X Link](https://x.com/jiqizhixin/status/1991072501817057695) 2025-11-19T09:14Z 12.4K followers, [---] engagements "Dingtalk DeepResearch performs quite well on DeepResearch Bench. It's a unified multi-agent intelligence framework for real-world enterprise environments, delivering deep research, heterogeneous table reasoning, and multimodal report generation. Dingtalk DeepResearch: A Unified Multi Agent Framework for Adaptive Intelligence in Enterprise Environments Paper: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/qTvvop0q4e5s03vqNAUaag https://arxiv.org/abs/2510.24760" [X Link](https://x.com/jiqizhixin/status/1991074216804831246) 2025-11-19T09:21Z 12.4K followers, [----] engagements "Can LLMs really behave like human investors? How do micro-level behaviors drive macro-level market dynamics? TwinMarket offers an answer by placing thousands of LLM-driven investors in a realistic stock market environment that incorporates social networks, news, and behavioral biases. This setup lets us watch bubbles, crashes, and herding emerge from individual decisions.
Calibrated on real market data and grounded in behavioral finance, TwinMarket scales to 1000+ agents, reproduces key stylized market facts (volatility clustering, fat tails, etc.), and reveals how social interaction and cognitive" [X Link](https://x.com/jiqizhixin/status/1991114239575249303) 2025-11-19T12:00Z 12.7K followers, [----] engagements "If someone told you: "Forget staged reinforcement learning, curriculum learning, and dynamic hyperparameter tuning; just use the most basic RL recipe and you'll achieve state-of-the-art (SOTA) performance in math reasoning for small models," would you believe it? A team from Tsinghua University answered that question with two 1.5B-parameter models: not only is it possible, it's remarkably efficient. - Key finding: Single-stage training + fixed hyperparameters = SOTA performance + 50% less compute. - Unexpected bonus: The training curve was textbook-smooth, no "typical" issues encountered even after 4000" [X Link](https://x.com/jiqizhixin/status/1991237291478835371) 2025-11-19T20:09Z 12.4K followers, [----] engagements "Cogito v2.1 671B is a DeepSeek-V3 variant/fork that's cheaper to run but doesn't appear to offer noticeably better performance compared to DeepSeek-V3.2. Today we are releasing the best open-weight LLM by a US company: Cogito v2.1 671B. On most industry benchmarks and our internal evals the model performs competitively with frontier closed and open models while being ahead of any US open model (such as the best versions of https://t.co/F6eZnn8s2Q" [X Link](https://x.com/jiqizhixin/status/1991431726489563240) 2025-11-20T09:01Z 12.6K followers, [----] engagements "Why do frontier models like GPT-5 still stumble on puzzles a child can solve with a glance? A new study argues that the missing piece is visual abstraction.
By pairing vision for global pattern discovery with language for precise rule execution, their VLSR plus MSSC approach boosts ARC-AGI performance by up to [----] percent across multiple flagship models. A step toward more human-like, generalizable reasoning. Think Visually Reason Textually: Vision-Language Synergy in ARC CUHK SAAI SII Paper: https://arxiv.org/abs/2511.15703" [X Link](https://x.com/jiqizhixin/status/1991436722119536725) 2025-11-20T09:21Z 12.4K followers, [----] engagements "DeepSeek just released LPLB on GitHub. Linear-Programming-Based Load Balancer (LPLB) is a parallel load balancer that leverages linear programming to optimize expert-parallel workload distribution for MoE (Mixture-of-Experts) models. Link: https://github.com/deepseek-ai/LPLB" [X Link](https://x.com/jiqizhixin/status/1991450159029596660) 2025-11-20T10:14Z 12.7K followers, [----] engagements "A smarter, faster way to run long-context LLMs: UNComp tackles LLM long-context inference by using uncertainty to reveal hidden sparsity in KV caches. ✅ Cuts KV cache to 4.74% of original ✅ 6% faster prefill ✅ 6.4x higher throughput Unlike uniform compression, UNComp adapts dynamically, unlocking retrieval heads & layers while staying lossless. UNComp: Can Matrix Entropy Uncover Sparsity -- A Compressor Design from an Uncertainty-Aware Perspective Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/GwdNCPEw8JUCzb9eOE6j9A https://github.com/menik1126/UNComp" [X Link](https://x.com/jiqizhixin/status/1991512102494712215) 2025-11-20T14:21Z 12.4K followers, [----] engagements "The third AIMO Progress Prize is live. Public testing runs until next April. [---] hand-picked, AI-hard math problems, bigger compute budgets, new prizes for datasets, write-ups, and even pure-math insights. All code and data must be open to qualify. Total prize pool: over $2.2M.
Details: https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-3/overview" [X Link](https://x.com/jiqizhixin/status/1991770622725353895) 2025-11-21T07:28Z 12.5K followers, [---] engagements "Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models. The title says it all. The newly proposed VFM-VAE bypasses distillation by integrating VFMs through a redesigned decoder with Multi-Scale Latent Fusion and Progressive Resolution Reconstruction. A new SE-CKNNA metric guides tokenizer-diffusion alignment, dramatically accelerating training. Results: gFID [----] in [--] epochs (10x faster than prior methods) and [----] at [---] epochs, showing direct VFM integration as a new LDM paradigm. Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1991823404270547018) 2025-11-21T10:58Z 12.5K followers, 10.4K engagements "It's time to rethink model merging. Functional Dual Anchors (FDAs): a fresh approach that operates in input-representation space, not just weights. Instead of wrestling with inconsistent parameters, FDAs use synthetic inputs whose gradients align with task-specific shifts, capturing how tasks change model behavior functionally. ✨ Bridges multi-task training + post-hoc merging ✨ Comes with a principled initialization ✨ Complements existing parameter-space methods Model Merging with Functional Dual Anchors CUHK Westlake Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1991945709869060565) 2025-11-21T19:04Z 12.5K followers, [----] engagements "As the ancient Chinese proverb goes: Steal a needle when young, steal gold when grown. Anthropic uncovered an AI broken-windows effect: all they did was teach it to cut corners a little, and it ended up learning to lie and cause havoc.
The fix? Surprisingly counterintuitive: just tell the AI it's okay to cheat. New Anthropic research: Natural emergent misalignment from reward hacking in production RL. Reward hacking is where models learn to cheat on tasks they're given during training. Our new study finds that the consequences of reward hacking, if unmitigated, can be very serious." [X Link](https://x.com/jiqizhixin/status/1992061535573909754) 2025-11-22T02:44Z 12.8K followers, [----] engagements "Nice: you could train giant neural networks with no backprop and still keep it fast. EGGROLL makes it happen. By swapping full-rank ES perturbations for low-rank ones, it cuts memory and compute from O(nd) to O(n + kd) while provably converging to full-rank updates at a 1/k rate. It matches ES in tabula-rasa RL, rivals GRPO for LLM reasoning, and even enables stable pre-training of fully integer recurrent LMs" [X Link](https://x.com/jiqizhixin/status/1992073672350105604) 2025-11-22T03:32Z 12.4K followers, [----] engagements "What if 3D Gaussian Splatting could achieve the same quality with just 10% of the Gaussians? A new method casts 3DGS compaction as global Gaussian mixture reduction via optimal transport. It first compresses geometry using KD-tree-based transport divergence minimization, then fine-tunes color and opacity with far fewer primitives. Results: negligible loss in PSNR, SSIM, and LPIPS, and it outperforms prior compaction methods: lightweight, efficient, and compatible with any 3DGS pipeline.
Gaussian Herding across Pens: An Optimal Transport Perspective on Global Gaussian Reduction for 3DGS RUC Tsinghua Code:" [X Link](https://x.com/jiqizhixin/status/1992143764605882462) 2025-11-22T08:11Z 12.8K followers, 11.8K engagements "What if LLMs placed objects step by step instead of just listing constraints? This paper proposes an imperative approach to 3D scene layout: the model iteratively positions each object based on previous placements, with an error-correction mechanism refining validity while respecting the original plan. Results: human participants preferred these layouts 82-94% of the time over declarative methods, and a new automated metric aligns well with human judgment. Simpler, more robust, and better for complex scenes. Procedural Scene Programs for Open-Universe Scene Generation: LLM-Free Error Correction via" [X Link](https://x.com/jiqizhixin/status/1992206175941890465) 2025-11-22T12:19Z 12.7K followers, [----] engagements "What if LLM agents could scale RL training without human-crafted tasks or ground-truth answers? Alibaba's search self-play (SSP) framework turns the agent into both task proposer and problem solver. The proposer generates deep search queries with verifiable answers; the solver attempts to answer them; and a RAG check validates each task using all retrieved evidence. Difficulty increases over time, and both sides co-evolve. Across benchmarks, SSP consistently boosts search-agent performance in both from-scratch and continued RL, fully unsupervised and fully scalable. Search Self-Play: Pushing the" [X Link](https://x.com/jiqizhixin/status/1992439966258151743) 2025-11-23T03:48Z 12.5K followers, [----] engagements "It turns out VLMs could run fast and light without collapsing in accuracy. A new method applies SVD to the joint QKV weights plus a dynamic rank allocation strategy that keeps accuracy high while slashing KV cache size and compute. Adding activation and weight quantization makes the model even more efficient.
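The core mechanics of the joint-QKV compression can be illustrated with a truncated SVD: stack the Q, K, and V projection weights into one matrix, keep only the top singular directions, and replace one big matmul with two skinny ones. The random weights, dimensions, and fixed rank below are illustrative stand-ins; the paper's dynamic rank allocation and quantization are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, rank = 64, 8    # toy sizes; real models use far larger dims

# Stand-ins for trained Q, K, V projection weights, stacked jointly.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
W_qkv = np.concatenate([W_q, W_k, W_v], axis=1)        # shape (64, 192)

# Truncated SVD: keep only the top-`rank` singular directions.
U, S, Vt = np.linalg.svd(W_qkv, full_matrices=False)
A = U[:, :rank] * S[:rank]                             # (64, 8)
B = Vt[:rank]                                          # (8, 192)

# One wide projection becomes two skinny matmuls through the rank-8 bottleneck.
x = rng.normal(size=(1, d_model))
qkv_lowrank = (x @ A) @ B                              # (1, 192)

# Parameter count drops from 64*192 to 64*8 + 8*192.
full_params = W_qkv.size
lowrank_params = A.size + B.size
assert lowrank_params < full_params
```

With random stand-in weights the rank-8 approximation is of course lossy; the point of the sketch is only the factorized compute and storage pattern, where the intermediate activations through `A` are what shrink the cached state.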
The result: over 10% accuracy gains compared to prior SVD or quant-only approaches, with far lower hardware cost, enabling real-time VLMs on constrained devices. QSVD: Efficient Low-rank Approximation for Unified Query-Key-Value Weight Compression in Low-Precision Vision-Language Models New" [X Link](https://x.com/jiqizhixin/status/1992714273622261938) 2025-11-23T21:58Z 12.5K followers, 13.7K engagements "A tiny synthetic dataset could outperform real images for training linear probes on giant vision models. MIT built Linear Gradient Matching for exactly this. By matching real-data gradients through frozen backbones, it creates synthetic images that beat real-image baselines, transfer across models like DINO to CLIP, excel at fine-grained tasks, and even expose embedding-space similarities and spurious correlations" [X Link](https://x.com/jiqizhixin/status/1992847738791465140) 2025-11-24T06:48Z 12.7K followers, 12.8K engagements "DeepMind just discovered pixel-by-pixel autoregressive modeling could scale into a truly unified vision paradigm. Their new study maps its scaling laws across 7e19 FLOPs, finds sharply different optima for classification versus generation, reveals that higher resolutions demand model size grow much faster than data, and shows compute, not data, is the real bottleneck. Extrapolating current trends, fully pixel-level vision models could be feasible within five years" [X Link](https://x.com/jiqizhixin/status/1992849069748957257) 2025-11-24T06:53Z 12.8K followers, 50.2K engagements "What if recommender systems could finally enjoy the same scaling gains as LLMs? MiniOneRec is the first fully open-source generative recsys stack to test that idea end to end. Using quantized VAEs to build semantic IDs and post-training Qwen models up to 7B, it shows losses drop cleanly with scale, and further boosts accuracy and diversity through full-process SID alignment plus lightweight RL with constrained decoding.
MiniOneRec: An Open-Source Framework for Scaling Generative Recommendation Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1993404446777524604) 2025-11-25T19:40Z 12.6K followers, [----] engagements "Wow: CUDA kernels could be auto-generated and auto-optimized without any training or GPU-hungry pipelines. CudaForge shows it is possible. Using a coder-plus-judge agent loop with real hardware feedback, it reaches [----] percent correctness and 1.68x speed over PyTorch while generalizing across GPUs and base models. And it does this in about [----] minutes and [---] dollars per kernel instead of [--] H100 hours and [--] dollars. CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1993514043475476629) 2025-11-26T02:56Z 12.8K followers, [----] engagements "AI agents could skip language entirely and communicate mind to mind. This work introduces thought communication, a latent-variable framework that identifies shared and private thoughts across agents and recovers the global structure of who shares what. The approach extracts these hidden thoughts before interaction and routes them to each agent, boosting collaboration across synthetic and real benchmarks and opening a path beyond surface-level language. Thought Communication in Multiagent Collaboration Paper: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1993697503229645248) 2025-11-26T15:05Z 12.8K followers, 33.4K engagements "Congratulations to Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun! Their paper Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks has been awarded the Test of Time Paper Award at NeurIPS [----]. The Faster R-CNN paper has been cited more than [-----] times.
It has deeply influenced the computer vision field, becoming a backbone for many follow-up works" [X Link](https://x.com/jiqizhixin/status/1993873070969508116) 2025-11-27T02:42Z 12.7K followers, [---] engagements "This new optimizer can make training giant LLMs both more stable and more precise, even under noise and extreme scale. Huawei just introduced ROOT, a Robust Orthogonalized Optimizer that tackles two big weaknesses in recent momentum-orthogonalized methods: - Dimensional fragility (orthogonalization breaks as model size grows) - Sensitivity to outlier noise ROOT brings two layers of robustness: - Dimension-robust orthogonalization via adaptive Newton iterations with size-aware coefficients - Optimization-robust updates using proximal methods that dampen harmful outliers while preserving useful" [X Link](https://x.com/jiqizhixin/status/1993894015196942452) 2025-11-27T04:05Z 12.8K followers, 43.6K engagements "ROOT: Robust Orthogonalized Optimizer for Neural Network Training Huawei Noah's Ark Lab Paper: Code: Our report: https://mp.weixin.qq.com/s/X7dNh8lwr0xVW7TsuO4D2g https://github.com/huawei-noah/noah-research/tree/master/ROOT https://arxiv.org/abs/2511.20626" [X Link](https://x.com/jiqizhixin/status/1993894415518322852) 2025-11-27T04:07Z 12.7K followers, [----] engagements "DeepSeek just released DeepSeek-Math-V2. DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning. It shows LLMs can now self-verify proofs, not just output solutions.
DeepSeekMath-V2 achieves gold-level IMO [----], CMO [----], and 118/120 Putnam [----], pointing to a future of deep, trustworthy mathematical reasoning" [X Link](https://x.com/jiqizhixin/status/1994000963770962167) 2025-11-27T11:10Z 12.9K followers, 35.7K engagements "What if video generation could follow any semantic instruction without retraining or task-specific hacks? Enter Video-As-Prompt (VAP). By treating a reference video as an in-context semantic prompt and steering a frozen Video DiT with a plug-and-play MoT expert plus temporally biased position embeddings, it avoids artifacts, prevents forgetting, and delivers strong zero-shot control. Trained on the new 100K-pair VAP-Data, it reaches a 38.7% user preference rate, rivaling specialized commercial models. Video-As-Prompt: Unified Semantic Control for Video Generation ByteDance CUHK Project: Paper:" [X Link](https://x.com/jiqizhixin/status/1994031956565348406) 2025-11-27T13:14Z 12.8K followers, [----] engagements "What if we could model human visual cortex responses without expensive, time-consuming fMRI data for every new subject? This study introduces BraInCoRL, a transformer that uses in-context learning to predict voxelwise neural activity from just a few examples, no extra finetuning needed for novel people or stimuli.
Trained to flexibly condition on variable image-stimulus pairs across multiple subjects, it outperforms existing designs in low-data scenarios, generalizes to entirely new fMRI datasets with different subjects and acquisition setups, and even links natural language queries to voxel" [X Link](https://x.com/jiqizhixin/status/1994392331374436597) 2025-11-28T13:06Z 12.7K followers, [----] engagements "Can AI finally grasp the unspoken intentions and emotions that make human social interactions tick? MetaMind helps large language models bridge that gap by breaking social reasoning into three collaborative stages: first guessing a user's mental state, then refining those ideas with cultural norms and ethics, and finally crafting responses that align with what's inferred. The result? State-of-the-art performance across tough benchmarks, including a 35.7% boost in real-world social scenarios, and even matching human-level skills on key Theory of Mind tasks for the first time. MetaMind: Modeling Human" [X Link](https://x.com/jiqizhixin/status/1994849090568360066) 2025-11-29T19:21Z 12.8K followers, 12.6K engagements "Ever wondered why large reasoning models sometimes overcomplicate problems? This study finds shorter reasoning paths consistently outperform longer ones across stochastic decodes, but exhaustive exploration of the tree-like reasoning space is impossible due to exponential growth. Enter DTS, a model-agnostic decoding framework that sketches the space by branching only at high-entropy tokens and uses early stopping to pick the shortest completed path.
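The branching rule at the heart of that idea is easy to sketch: compute the entropy of the next-token distribution and only spawn alternative branches where the model is genuinely uncertain. The threshold value and function names below are illustrative assumptions, not DTS's actual implementation.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

HIGH_ENTROPY = 0.5   # hypothetical threshold; a tuned value in practice

def should_branch(probs):
    """Branch the reasoning tree only at uncertain (high-entropy) tokens;
    confident tokens are decoded greedily, keeping the tree small."""
    return entropy(probs) > HIGH_ENTROPY

# Confident step: one token dominates, so continue greedily with no branch.
assert not should_branch([0.97, 0.01, 0.01, 0.01])
# Uncertain step: probability mass is spread, so spawn branches here.
assert should_branch([0.4, 0.3, 0.2, 0.1])
```

Combined with early stopping on the first few completed paths, this is why the method can favor short trajectories without enumerating the exponential tree.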
No extra training needed: tests on AIME2024/2025 with DeepSeek-R1-Distill-Qwen models boosted accuracy by up to 8%, cut reasoning length by 23%, and" [X Link](https://x.com/jiqizhixin/status/1995244697527157154) 2025-11-30T21:33Z 13.1K followers, [----] engagements "Can modern AI nail playing a convincing villain, or does safety alignment kill the act? Tencent & SYSU find state-of-the-art LLMs lose role-playing fidelity the more morally ambiguous or antagonistic the character, struggling most with traits like deceit or manipulation, swapping nuanced malevolence for shallow aggression. Even top chatbots flop at villain roles if they're highly safety-aligned, showing a big tension between keeping AI safe and letting it create authentic, complex fictional personas. Too Good to be Bad: On the Failure of LLMs to Role-Play Villains Paper: Project: Our report: 📬" [X Link](https://x.com/jiqizhixin/status/1995580031981277483) 2025-12-01T19:45Z 13.1K followers, [----] engagements "One-step generative models could match multi-step methods. A new work introduces iMF, an improved MeanFlow that fixes unstable training and rigid guidance by reformulating the velocity loss and treating guidance as explicit conditioning. Trained from scratch, iMF hits [----] FID at 1-NFE on ImageNet [------], outperforming all prior one-step approaches and pushing fastforward generation toward a standalone paradigm. Improved Mean Flows: On the Challenges of Fastforward Generative Models Paper: https://arxiv.org/abs/2512.02012v1" [X Link](https://x.com/jiqizhixin/status/1995753030906683444) 2025-12-02T07:12Z 12.8K followers, [----] engagements "Diffusion Language Models are hyped lately but hard to reproduce due to missing frameworks and high training costs. Berkeley and UIUC show a surprisingly simple path: using their dLLM toolkit, they teach BERT to chat via discrete diffusion.
No generative pretraining, about [--] GPU hours, and ModernBERT-large-chat-v0 reaches near Qwen1.5-0.5B quality with only lightweight SFT. Even better, they open-sourced the full training and inference pipeline plus a Hello World example, along with the extensible dllm framework. Efficient, cheap, and beginner-friendly. dLLM - BERTs that chat with diffusion" [X Link](https://x.com/jiqizhixin/status/1995919770554957852) 2025-12-02T18:15Z 13K followers, 31.2K engagements "UniLumos drops a big upgrade to image and video relighting. Diffusion models can do cool lighting effects, but semantic-space optimization often breaks physics. UniLumos fixes this by injecting RGB-space geometry feedback into a flow-matching backbone, supervising with depth and normals from its own outputs. Path consistency learning keeps this supervision stable even with few training steps. The team also built a 6D lighting annotation protocol and LumosBench, a disentangled benchmark that scores lighting control with VLMs. The result: SOTA physical consistency and up to 20x faster relighting." [X Link](https://x.com/jiqizhixin/status/1996026851027026177) 2025-12-03T01:21Z 12.8K followers, [----] engagements "LORE: A Large Generative Model for Search Relevance Paper: https://arxiv.org/abs/2512.03025" [X Link](https://x.com/jiqizhixin/status/1996102665173172708) 2025-12-03T06:22Z 12.8K followers, [---] engagements "What if a single foundation model could revolutionize both autonomous driving and embodied AI? Xiaomi has open-sourced MiMo-Embodied, the first cross-embodied foundation model. It achieves SOTA performance in both fields, setting records in [--] embodied AI benchmarks and excelling in [--] autonomous driving benchmarks. It significantly outperforms existing baselines.
Their study shows positive transfer between the two domains through specific learning and fine-tuning methods, and they offer detailed model and training insights for future research. MiMo-Embodied: X-Embodied Foundation Model Paper:" [X Link](https://x.com/jiqizhixin/status/1996112972960379192) 2025-12-03T07:03Z 12.8K followers, [---] engagements "Wow, a promising step toward practical, efficient compute-in-memory systems. A new memristor-based ADC with adaptive quantization shows the possibility: analog AI hardware could unlock its full potential without bulky converters in the way. It delivers strong CIFAR10 and ImageNet performance at just [--] bits, achieves up to 15.1x better energy efficiency and 12.9x smaller area, and cuts CIM system overhead by more than half. Memristor-based adaptive analog-to-digital conversion for efficient and accurate compute-in-memory Paper: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1996224654726619495) 2025-12-03T14:27Z 12.8K followers, [---] engagements "OpenAI just published a technical blog post stating: confessions can keep language models honest" [X Link](https://x.com/jiqizhixin/status/1996378536245088497) 2025-12-04T00:38Z 12.8K followers, [----] engagements "How confessions can keep language models honest OpenAI https://openai.com/index/how-confessions-can-keep-language-models-honest/" [X Link](https://x.com/jiqizhixin/status/1996378539390881863) 2025-12-04T00:38Z 12.8K followers, [---] engagements "Can a single open model truly understand and generate across all modalities? Uni-MoE [---] from the Lychee family shows it can, with a new dynamic-capacity MoE design, progressive multimodal training, and curated data across text, images, speech, and video.
Trained on 75B tokens, it outperforms Qwen2.5-Omni on most benchmarks and posts strong gains in video understanding, omnimodal reasoning, audiovisual tasks, speech WER, and controllable image generation. Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE Training and Data Harbin Institute of Technology Shenzhen Paper:" [X Link](https://x.com/jiqizhixin/status/1996393265743249832) 2025-12-04T01:37Z 15.1K followers, [----] engagements "A must-read from Dr. Sebastian Raschka if you want to understand how DeepSeek's flagship open-weight models evolved. A Technical Tour of the DeepSeek Models from V3 to V3.2 Link: https://sebastianraschka.com/blog/2025/technical-deepseek.html" [X Link](https://x.com/jiqizhixin/status/1996514981622567135) 2025-12-04T09:40Z 12.8K followers, 11.2K engagements "Can a foundation model truly decode the brain without understanding its scales? CSBrain says no. It introduces cross-scale tokenization and structured sparse attention to capture fast bursts, slow rhythms, local regions, and global interactions. Tested on [--] tasks across [--] datasets, CSBrain outperforms all baselines and shows cross-scale modeling is essential for generalized EEG decoding. CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding SAIL SYSU CUHK Karlsruher Paper: Github: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1996748355356226013) 2025-12-05T01:08Z 13.1K followers, [----] engagements "Is this Yann LeCun's first paper after leaving Meta? It demonstrates how humanoid robots can mimic actions from AI-generated videos, which are often too noisy for direct imitation. The system lifts the video into 3D keypoints and then uses a physics-aware policy to execute the motions, enabling zero-shot control.
They implemented this on the Unitree G1 humanoid robot" [X Link](https://x.com/jiqizhixin/status/1996862862867026086) 2025-12-05T08:43Z 13.1K followers, 21.3K engagements "Can a single image give rise to a full cast of coherent 3D parts? PartCrafter shows it can. It uses a compositional latent space and hierarchical attention to jointly generate multiple semantically distinct 3D meshes from one RGB input, no pre-segmentation needed. Built on a pretrained mesh DiT and backed by a new part-level dataset, it produces detailed, decomposable 3D parts even when they are hidden in the image. PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers Peking ByteDance CMU Project: Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1996961006770442265) 2025-12-05T15:13Z 13K followers, [----] engagements "Recommender systems can improve by modeling users. TagCF uses an LLM to extract tag-based logic graphs that reveal user roles and behavioral logic, then integrates them to boost ranking performance. Online and offline results show user role modeling can outperform item topic modeling, and the learned logic graphs transfer across recommendation tasks. Who You Are Matters: Bridging Topics and Social Roles via LLM-Enhanced Logical Recommendation Wuhan Kuaishou Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/9cAw5GWYRW8vZs7S2io7KA https://github.com/Code2Q/TagCF" [X Link](https://x.com/jiqizhixin/status/1997038517558591780) 2025-12-05T20:21Z 12.8K followers, [----] engagements "Recommender systems can improve by modeling users. TagCF uses an LLM to extract tag-based logic graphs that reveal user roles and behavioral logic, then integrates them to boost ranking performance. Online and offline results show user role modeling can outperform item topic modeling, and the learned logic graphs transfer across recommendation tasks.
Who You Are Matters: Bridging Topics and Social Roles via LLM-Enhanced Logical Recommendation Wuhan Kuaishou Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/9cAw5GWYRW8vZs7S2io7KA https://github.com/Code2Q/TagCF" [X Link](https://x.com/jiqizhixin/status/1997038530229645731) 2025-12-05T20:21Z 12.8K followers, [---] engagements "Can LVLMs finally stop hallucinating objects they never saw? Owl proposes a causal bi-modal attention reweighting framework that diagnoses low-VTACR moments where textual priors overpower vision and hallucinations emerge. By intervening on token and layer attention and running dual-path contrastive decoding, Owl sharply reduces hallucination on POPE and CHAIR while preserving core vision-language ability. Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs Paper: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/1997573542708453678) 2025-12-07T07:47Z 12.9K followers, [----] engagements "Can sequential reasoning get smarter without exploding the action space? DynaAct does it. By extracting general sketches with LLMs, scoring candidate actions for utility and diversity via a submodular function, and greedily selecting a compact set, it boosts performance across six benchmarks while keeping inference efficient. DynaAct: Large Language Model Reasoning with Dynamic Action Spaces HKU Ant Group Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/hWdqk3ZYZJzd81-eXPa4jw https://github.com/zhaoxlpku/DynaAct https://arxiv.org/abs/2511.08043" [X Link](https://x.com/jiqizhixin/status/1997851121709203530) 2025-12-08T02:10Z 13.6K followers, [----] engagements "A transformer's attention could be 99% sparser without losing its smarts? New research from MPI-IS, Oxford, and ETH Zürich shows it can. A simple post-training method strips away redundant connections, revealing a cleaner, more interpretable circuit.
This suggests much of the computation we rely on is just noise. Sparse Attention Post-Training for Mechanistic Interpretability Paper: https://arxiv.org/abs/2512.05865" [X Link](https://x.com/jiqizhixin/status/1997949835388014807) 2025-12-08T08:42Z 13.1K followers, 29.3K engagements "Spatial understanding can be strengthened without costly supervision. Spatial-SSRL introduces a self-supervised RL framework that extracts verifiable signals from ordinary RGB or RGB-D images through five intrinsic tasks: shuffled patch reordering, flipped patch recognition, cropped patch inpainting, regional depth ordering, and relative 3D position prediction. These tasks require no human or LVLM labels and scale efficiently. Trained with this scheme, models improve spatial reasoning while preserving general visual ability, achieving average gains of [----] percent on 3B and [----] percent on 7B" [X Link](https://x.com/jiqizhixin/status/1998112594654285853) 2025-12-08T19:29Z 13K followers, [----] engagements "Ever wonder how a Transformer model really makes its decisions? Enter DePass, a framework that traces the flow of information inside the model in a single forward pass, offering a clearer window into its internal logic. DePass: Unified Feature Attributing by Simple Decomposed Forward Pass Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/-3TBpNIaFxLHn0fuigwd4g https://github.com/TsinghuaC3I/Decomposed-Forward-Pass https://arxiv.org/pdf/2510.18462" [X Link](https://x.com/jiqizhixin/status/1998598043436986512) 2025-12-10T03:38Z 13K followers, [----] engagements "Google just found the agentic scaling law. Forget "more agents is all you need." After [---] experiments across GPT, Gemini, and Claude, the results are in: - The 45% Trap: If a single agent has 45% accuracy, adding more agents often hurts performance.
- Tool Tax: Tool-heavy tasks suffer disproportionately from coordination overhead. - Error Spirals: Independent agents amplify errors by 17.2x. They derived a formula that predicts the best architecture with 87% accuracy. Agent design just moved from alchemy to science. Towards a Science of Scaling Agent Systems Paper: https://arxiv.org/abs/2512.08296" [X Link](https://x.com/jiqizhixin/status/1999030820296863814) 2025-12-11T08:17Z 13.7K followers, 105.1K engagements "Yo, AI could design its own search strategy on the fly. This research uses LLMs to dynamically evolve the core "kernel" of Bayesian optimization, creating a system that adapts its own exploration method. This CAKE method, paired with a smart ranking system called BAKER, outperforms traditional approaches in tuning everything from neural networks to photonic chips. Adaptive Kernel Design for Bayesian Optimization Is a Piece of CAKE with LLMs Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/ccVEaQJDSH9gWVH_Vxx8CA https://github.com/richardcsuwandi/cake" [X Link](https://x.com/jiqizhixin/status/1999116208042307921) 2025-12-11T13:57Z 13.1K followers, [----] engagements "It turns out a robot could think like a committee of experts. This research shows how giving vision, touch, and other senses their own specialized "minds", then letting them vote on the best action, leads to far more robust and adaptive robotic manipulation. Multi-Modal Manipulation via Policy Consensus Paper: Project: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/SMIQ1Jv1heu0qNA4CM0VEg https://policyconsensus.github.io/ https://arxiv.org/pdf/2509.23468" [X Link](https://x.com/jiqizhixin/status/1999492689037758629) 2025-12-12T14:53Z 13.6K followers, [----] engagements "Apple briefly posted, then quickly pulled, an arXiv paper, but the v1 snapshot is wild.
The team reveals RLAX, a scalable RL framework on TPUs. It's built with a parameter-server design where a master trainer pushes weights and massive inference fleets pull them to generate rollouts. With new curation and alignment tricks and preemption-friendly engineering, RLAX boosts QwQ-32B pass@8 by [----] percent in only 12h48m on [----] v5p TPUs. RLAX: Large-Scale Distributed Reinforcement Learning for Large Language Models on TPUs Paper: https://arxiv.org/pdf/2512.06392v1" [X Link](https://x.com/jiqizhixin/status/1999509046026736096) 2025-12-12T15:58Z 13.1K followers, 24.1K engagements "Now you could fine-tune a robot's brain like a large language model. ProphRL is a method that uses a learned world model and tailored reinforcement learning to efficiently align vision-language-action policies with real-world tasks, boosting robot success rates by up to 30%. Reinforcing Action Policies by Prophesying Fudan SII Logos Robotics Paper: Project: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/RXe86_oWtIeJTNNwZwxPaw https://logosroboticsgroup.github.io/ProphRL https://arxiv.org/pdf/2511.20633" [X Link](https://x.com/jiqizhixin/status/1999843248869621811) 2025-12-13T14:06Z 14.1K followers, [----] engagements "What if a robot could see the world like a human, separating what it sees from where it is? Enter SpatialActor, a system that disentangles semantics and geometry for robust manipulation. It achieves SOTA results, excels under noise, and improves few-shot learning by focusing on crucial spatial cues.
SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation Tsinghua Dexmal MEFVII StepFun Paper: Project: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/fQke8kuvk2VSHHuP5wprzA https://shihao1895.github.io/SpatialActor/" [X Link](https://x.com/jiqizhixin/status/1999919249146744907) 2025-12-13T19:08Z 13.7K followers, [----] engagements "The machines are tuning themselves. Now AI could write faster code than Nvidia's own engineers. New research shows an LLM+RL system called CUDA-L2 automatically optimizes GPU kernels, beating cuBLAS by up to 26% in real-time inference. CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning Paper: Code: https://github.com/deepreinforce-ai/CUDA-L2 https://arxiv.org/abs/2512.02551" [X Link](https://x.com/jiqizhixin/status/2000495176939454785) 2025-12-15T09:16Z 13.9K followers, 11.2K engagements "LightX2V is an advanced, lightweight video generation inference framework engineered to deliver efficient, high-performance video synthesis solutions. This unified platform integrates multiple state-of-the-art video generation techniques, supporting diverse generation tasks including text-to-video (T2V) and image-to-video (I2V). X2V represents the transformation of different input modalities (X, such as text or images) into video output (V). LightX2V: Light Video Generation Inference Framework GitHub: Hugging Face: Project: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/2000933936097337849) 2025-12-16T14:20Z 13.9K followers, [----] engagements "The [----] Foundation Model Transparency Index is here. This report reveals a sharp drop in openness, with average scores plummeting from [--] to [--]. While IBM leads with [--], xAI and Midjourney scored just [--].
- Google: [--] - OpenAI: [--] - DeepSeek: [--] - Qwen: [--] The [----] Foundation Model Transparency Index Paper: https://arxiv.org/abs/2512.10169v1" [X Link](https://x.com/jiqizhixin/status/2001222587146264730) 2025-12-17T09:27Z 13.6K followers, [----] engagements "What if AI could learn the art of conversation like a human? This research challenges a year of focusing RL on logic, showing it's possible to optimize AI for personality and emotional depth, and the results outperform leading models. Echo-N1: Affective RL Frontier Paper: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/oo-oSz3h51Iym1XBBFGHfw https://arxiv.org/pdf/2512.00344v1" [X Link](https://x.com/jiqizhixin/status/2001465185740144959) 2025-12-18T01:31Z 13.9K followers, [----] engagements "🚨 Big AI leadership news 🚨 @ShunyuYao12 Shunyu Yao (), a rising star in AI agents and one of the key minds behind OpenAI's Deep Research and Computer-Using Agent (CUA), has just been appointed Chief AI Scientist at Tencent. Tencent builds WeChat, China's super-app used by over a billion people, and is also one of the world's largest gaming companies. https://twitter.com/i/web/status/2001553502359925096" [X Link](https://x.com/jiqizhixin/status/2001553502359925096) 2025-12-18T07:21Z 14K followers, 112.1K engagements "Diffusion-based LLMs are fast and parallelizable, but bidirectional attention makes inference expensive due to repeated prefill and decoding. Enter ODB-dLLM, a dual-boundary framework with adaptive prefill-length prediction and dLLM-specific jump-share speculative decoding. Result: [--] to 162x speedup over vanilla dLLMs and [----] to 6.30x over Fast-dLLM, with less accuracy loss.
Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/2002094583258226867) 2025-12-19T19:12Z 14.2K followers, [----] engagements "What if a robot could be your copilot, letting you teach it complex skills with minimal effort? ByteDance presents a new Shared Autonomy framework: A human uses VR to guide the robot's arm while an autonomous AI policy (DexGrasp-VLA) takes over the fine tactile work of the hand. This hybrid approach massively outperforms purely manual or fully automated data collection in quality and efficiency. It enables the training of end-to-end VLA policies that achieve 90% success on 50+ objects. End-to-End Dexterous Arm-Hand VLA Policies via Shared Autonomy: VR Teleoperation Augmented by Autonomous Hand" [X Link](https://x.com/jiqizhixin/status/2002566442047435085) 2025-12-21T02:27Z 14.1K followers, [----] engagements "LLaDA2.0 is a new method that converts existing auto-regressive models into discrete diffusion models using a novel 3-phase training scheme. This approach preserves the model's learned knowledge while unlocking parallel decoding. The resulting models, LLaDA2.0-mini (16B) and LLaDA2.0-flash (100B), outperform their predecessors in both performance and efficiency at the frontier scale. LLaDA2.0: Scaling Up Diffusion Language Models to 100B Paper: HuggingFace: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/IvYnrDAe7JbjrrIQKvIurA https://hf.co/collections/inclusionAI/llada-20" [X Link](https://x.com/jiqizhixin/status/2002820868419031163) 2025-12-21T19:18Z 13.9K followers, [----] engagements "What matters more for AI image generation: what a model "sees" or how it "thinks"? Researchers from Adobe, ANU, and NYU reveal a surprising answer.
They found that the spatial structure of a teacher model's vision (how image parts relate) is far more important for training generative AI than its overall accuracy. Their simple 4-line code fix, iREPA, consistently speeds up training and outperforms previous methods like REPA and Meanflow across different models and scales. What matters for Representation Alignment: Global Information or Spatial Structure Paper: Project: Our report:" [X Link](https://x.com/jiqizhixin/status/2002914988747288760) 2025-12-22T01:32Z 14K followers, [----] engagements "What if a language model could check its own work for consistency as it writes? Researchers from City University of Hong Kong, Huawei Research, and HKU present Coherent Contextual Decoding (CCD) for Diffusion Language Models. Instead of just looking at the next word's confidence, CCD uses the entire sentence history to spot and reject incoherent paths early. It also dynamically allocates its "thinking" budget per step based on this coherence check. The result? It significantly outperforms standard decoding on Dream & LLaDA benchmarks, achieving up to 3.48x faster inference with a 3.91% quality" [X Link](https://x.com/jiqizhixin/status/2003094672848433580) 2025-12-22T13:26Z 13.9K followers, [----] engagements "What if all AI models share a hidden low-dimensional "brain"? Johns Hopkins University reveals that neural networks, regardless of task or domain, converge to remarkably similar internal structures. Their analysis of 1100+ models (Mistral, ViT, LLaMA) shows they all use a few key "spectral directions" to store information. This universal structure outperforms assumptions of randomness, offering a blueprint for more efficient multi-task learning, model merging, and drastically cutting AI's computational and environmental costs.
The Universal Weight Subspace Hypothesis Paper: Page: Our report:" [X Link](https://x.com/jiqizhixin/status/2003643539670913297) 2025-12-24T01:47Z 14.5K followers, 77.9K engagements "What if AI could dub movies with the emotional depth of a real actor? 🎬 Enter Authentic-Dubber. Instead of just matching lips to text, it mimics a real director-actor workflow. An AI "director" uses an LLM to understand emotion, then retrieves & feeds the best emotional cues to an AI "actor" for speech generation. It outperforms existing methods in emotional expressiveness, setting a new standard on the V2C Animation benchmark for authentic movie dubbing. Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction Learning (AAAI [----]) Paper: Code: Our report:" [X Link](https://x.com/jiqizhixin/status/2003743699595846143) 2025-12-24T08:25Z 13.9K followers, [----] engagements "LeCun's JEPA has evolved into a vision-language model with 1.6B parameters rivaling the 72B Qwen-VL. Instead of predicting words directly, the proposed VL-JEPA learns to predict the core "meaning" of a text in an abstract space, ignoring surface-level wording variations. This method outperforms standard token-based training with 50% fewer parameters. It beats models like CLIP & SigLIP2 on video classification/retrieval tasks and matches larger VLMs on VQA while using a decoder only when needed to cut decoding ops by nearly 3x. VL-JEPA: Joint Embedding Predictive Architecture for Vision-language" [X Link](https://x.com/jiqizhixin/status/2004483098235343338) 2025-12-26T09:23Z 15.2K followers, 113.6K engagements "Veo: More Than Just Video Generation. DeepMind Is Using It to Simulate the Entire Robot World Google DeepMind's Gemini Robotics Team presents a breakthrough evaluation system. They built a simulator using their frontier video model Veo. It generates realistic, varied virtual scenes, like adding new objects or changing backgrounds, to see how robot policies react.
This system accurately predicts real-world robot performance, outperforming old methods by testing for out-of-distribution generalization and exposing safety risks across multiple manipulation tasks. Evaluating Gemini Robotics Policies in" [X Link](https://x.com/jiqizhixin/status/2004485714013262169) 2025-12-26T09:33Z 14.4K followers, 73K engagements "Can your AI truly understand a video, or is it making things up? Researchers from Hefei University of Technology, Tsinghua University, and the Institute of Science Tokyo present Trust-videoLLMs. It's a new benchmark that stress-tests [--] leading video AIs on truthfulness, safety, fairness, and more, using tricky altered and annotated videos to find their weak spots. Results show major gaps: models struggle with dynamic scenes, are easily fooled by edited content, and fail on real-world risks. While some open-source models compete, top commercial ones are generally more credible, but bigger isn't always" [X Link](https://x.com/jiqizhixin/status/2004559575656403250) 2025-12-26T14:27Z 14K followers, [----] engagements "What if you could see exactly what a diffusion model is thinking at each step of image generation? Researchers from CUHK & Shanghai AI Lab present TIDE, a new "X-ray" for AI image generators. It uses a sparse autoencoder to extract simple, human-readable concepts from the model's internal activations over time. The method reveals that models like Stable Diffusion [--] naturally learn a hierarchy of concepts, from 3D shapes down to fine details, during training. TIDE outperforms previous methods in interpretability & control, enabling safer image editing and precise style transfer without breaking the" [X Link](https://x.com/jiqizhixin/status/2004650927454593130) 2025-12-26T20:30Z 14.1K followers, [----] engagements "What if a robot could learn complex tasks with far less data and think much faster? Researchers from Xi'an Jiaotong University present EfficientFlow.
It's a new AI policy that learns robotic actions more efficiently by building in symmetry awareness (equivariance) for better generalization, and uses a novel method to speed up its decision-making process. It matches or beats top methods on manipulation tasks using less training data and delivers significantly faster real-time inference. EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI Paper: Project: GitHub: Our report:" [X Link](https://x.com/jiqizhixin/status/2004742531561783352) 2025-12-27T02:34Z 14K followers, 11.9K engagements "Can an AI specialist outperform junior doctors in planning heart procedures? Enter CA-GPT, a medical AI trained for heart imaging. It analyzes artery scans to recommend the exact size and placement of stents. Results show it beat both ChatGPT-5 and junior physicians in planning accuracy, especially for complex cases. COMPARE: Clinical Optimization with Modular Planning and Assessment via RAG-Enhanced AI-OCT: Superior Decision Support for Percutaneous Coronary Intervention Compared to ChatGPT-5 and Junior Operators Paper: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/2004925738395599192) 2025-12-27T14:42Z 14K followers, [----] engagements "What if your search results could understand not just what you clicked but the meaning behind it? Alibaba presents MUSE, a new framework that uses both visual and text data to model user interests. It uses simple, fast matching for broad searches, then rich, fused analysis for precise results. This method outperforms traditional ID-based models, especially for niche items, and is now live in Taobao's ad system, handling 100K-item user histories with no lag.
MUSE: A Simple Yet Effective Multimodal Search-Based Framework for Lifelong User Interest Modeling Paper: Dataset: Our report: 📬 #PapersAccepted" [X Link](https://x.com/jiqizhixin/status/2004987646117118435) 2025-12-27T18:48Z 14.3K followers, [----] engagements "What if a simple tweak could stop AI training from going off the rails? 🚂 Researchers at Kuaishou Tech introduce Entropy Ratio Clipping (ERC). Instead of just clipping individual updates, ERC monitors the overall randomness of the AI's strategy. It ensures new versions don't stray too far from old, stable behavior. Outperforms PPO-Clip, stabilizing training & boosting results across multiple LLM fine-tuning benchmarks. Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning Paper: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/2005109322855092605) 2025-12-28T02:51Z 15.1K followers, [----] engagements "This is cool: Running large language models is expensive in memory and compute. Fairy2i converts pre-trained real-valued Transformers into complex form while preserving equivalence and enabling [--]-bit inference with phase-aware quantization. LLaMA [--] 7B at [--]-bit is nearly full precision. No retraining. Fairy2i: Training Complex LLMs from Real LLMs with All Parameters in 1i Paper: https://arxiv.org/abs/2512.02901" [X Link](https://x.com/jiqizhixin/status/2005178246493778307) 2025-12-28T07:25Z 14.1K followers, 17.6K engagements "What if you could edit a video just by typing a command? ByteDance & Zhejiang University present OpenVE-3M, a massive new dataset to train AI for that exact task. It teaches models to follow complex text instructions for everything from changing styles to adding objects. Their 5B model OpenVE-Edit sets a new state-of-the-art, outperforming all prior methods on a new human-aligned benchmark.
OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing Paper: Project: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/XthPi5rmUYBKLeSrtCOV_g" [X Link](https://x.com/jiqizhixin/status/2005291146088759411) 2025-12-28T14:54Z 14.6K followers, [----] engagements "What if AI could search smarter, not just harder? Researchers from BaiJia AI and Beijing University of Posts and Telecommunications present LightSearcher. It's a new RL framework that gives AI an "experiential memory." The system learns from past successful reasoning patterns to know when to call a search tool, avoiding redundant, costly lookups. Results: Matches top accuracy on complex QA tasks while cutting search tool calls by 39.6%, inference time by 48.6%, and token use by 21.2%. A major leap in efficient reasoning. LightSearcher: Efficient DeepSearch via Experiential Memory Paper: Our report:" [X Link](https://x.com/jiqizhixin/status/2005460512092369389) 2025-12-29T02:07Z 14K followers, [----] engagements "What if we could test robotic packing algorithms in a perfect digital twin of the real world? Enter RoboBPP, a new open-source benchmark for robotic bin packing. It uses a physics simulator with real-world-scale robots & boxes to check if a packing plan is actually feasible and safe. It outperforms prior synthetic tests by using [--] real industrial datasets and new metrics for stability & safety, creating a reproducible standard for the field. RoboBPP: Benchmarking Robotic Online Bin Packing with Physics-based Simulation Paper: Project: Our report: 📬 #PapersAccepted by Jiqizhixin" [X Link](https://x.com/jiqizhixin/status/2005996544328360212) 2025-12-30T13:37Z 14.1K followers, [----] engagements "What if you could train massive AI models faster and cheaper? Enter SonicMoE. It's a new system that slashes memory use and boosts GPU efficiency for Mixture of Experts models.
It uses smarter caching, overlapping tasks, and a novel "token rounding" method to cut wasted computations. Results: It reduces activation memory by 45% and delivers a 1.86x throughput gain vs. prior methods, achieving similar training speed with 33% fewer GPUs. SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations Paper: Our report: https://mp.weixin.qq.com/s/rVrXj6uLvIHDnu2-T-z4_A" [X Link](https://x.com/jiqizhixin/status/2006088147978052020) 2025-12-30T19:41Z 14.1K followers, [----] engagements
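The "token rounding" trick mentioned in the SonicMoE post can be pictured as tile-alignment of per-expert token counts. The sketch below is an illustrative guess at the idea, not the paper's method: it assumes each expert's batch is rounded down to a multiple of the GPU tile size so that no partially filled tile is ever computed. The function name and the drop-overflow policy are assumptions.

```python
def round_expert_counts(counts, tile=16):
    """Hypothetical tile-aware "token rounding" for MoE dispatch.

    Rounds each expert's token count down to a multiple of the tile
    size, so the expert GEMMs only ever process full tiles and no
    compute is wasted on padding a partially filled tile.
    """
    return [c - (c % tile) for c in counts]

# Toy dispatch: four experts received 37, 64, 5, and 130 tokens.
counts = [37, 64, 5, 130]
rounded = round_expert_counts(counts, tile=16)
assert rounded == [32, 64, 0, 128]
```

In a real MoE dispatcher the overflow tokens (the `c % tile` remainder per expert) would presumably be dropped or re-routed rather than silently ignored, which is where the accuracy/throughput trade-off would live.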
"Canadian education technology startup Korbit Technologies (@korbit_ai) has introduced a personalized AI-powered learning experience that it says can help all students learn faster and better in a cost-effective way. #ArtificialIntelligence #startups https://medium.com/syncedreview/bengio-backed-startup-korbit-introduces-stem-intelligent-tutoring-system-d20b3e0d4128"
X Link 2020-05-28T21:29Z 14.7K followers, [--] engagements
"GRPO just got a speed boost. Xiamen University introduced Completion Pruning Policy Optimization (CPPO), which significantly reduces the number of gradient calculations and updates. How fast? On GSM8K it's [----] faster than GRPO, and on MATH the speedup is [----]. 🚀🔥"
X Link 2025-03-31T05:55Z 12.1K followers, 28.3K engagements
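The pruning idea can be sketched in a few lines. The toy version below (function names, the keep ratio, and the |advantage|-based ranking are illustrative assumptions, not details from the paper) computes GRPO-style group-normalized advantages and then keeps only the completions that contribute most to the gradient, which is where the saved compute comes from:

```python
def grpo_advantages(rewards):
    """GRPO-style group-relative advantages: normalize rewards within the group."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # fall back to 1.0 when all rewards are identical
    return [(r - mean) / std for r in rewards]

def prune_completions(rewards, keep_ratio=0.5):
    """Keep only the completions with the largest |advantage|; gradients are
    then computed for these alone, skipping the rest of the group."""
    adv = grpo_advantages(rewards)
    k = max(1, int(len(rewards) * keep_ratio))
    ranked = sorted(range(len(rewards)), key=lambda i: abs(adv[i]), reverse=True)
    kept = sorted(ranked[:k])
    return kept, [adv[i] for i in kept]

# 8 completions, 2 correct: pruning keeps the two high-|advantage| ones
kept, adv = prune_completions([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0], keep_ratio=0.25)
```

The backward pass then runs only on the kept indices, so the fraction of gradient work scales with `keep_ratio` rather than with the full group size.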
"Hugging Face has acquired the robotics startup Pollen Robotics according to Fortune. https://fortune.com/2025/04/14/ai-company-hugging-face-buys-humanoid-robot-company-pollen-robotics-reachy-2/"
X Link 2025-04-14T13:18Z 12.7K followers, [---] engagements
"Pangu models from Huawei"
X Link 2025-06-20T09:12Z 14K followers, [----] engagements
"VLMs for embodied agents just got a major upgrade. Introducing World-Aware Planning Narrative Enhancement (WAP) a framework that gives vision-language models true environmental understanding for complex long-horizon tasks. Key upgrades: 🧠 Visual modeling 📐 Spatial reasoning 🔧 Functional abstraction 🗣 Syntactic grounding"
X Link 2025-06-29T09:21Z 12.2K followers, 12.7K engagements
"You might want to know about Shengjia Zhao, newly appointed Chief Scientist of Meta's Superintelligence Labs (MSL) by Mark Zuckerberg. Tsinghua (BS), Stanford (PhD in CS). Awarded ICLR [----] Outstanding Paper for first-authored work "Comparing Distributions by Measuring Differences that Affect Decision Making". Ex-OpenAI, a contributor to flagship AI projects including ChatGPT and the GPT-4 series (GPT-4, GPT-4.1, GPT-4o) https://twitter.com/i/web/status/1949014826824479102"
X Link 2025-07-26T07:52Z 15K followers, [----] engagements
"@SSoni83588 Chinese AI companies typically maintain separate operations for domestic and international markets. As for Doubao, the AI chatbot from ByteDance: Mainland China version: https://www.doubao.com/chat/ Global version (as Cici): https://www.cici.com/"
X Link 2025-07-26T08:29Z 12.4K followers, [---] engagements
"ByteDance is exploring diffusion LLMs too 👀 Seed Diffusion Preview: a blazing-fast LLM for code built on discrete-state diffusion. With [----] tokens/sec inference on H20 GPUs, it outpaces Mercury & Gemini Diffusion while matching their performance on standard code benchmarks. New SOTA on the speed-quality Pareto frontier. 🚀"
X Link 2025-08-01T01:28Z 11.5K followers, 46.8K engagements
"China's media says the nation's digital push under the 14th Five-Year Plan is paying off, now among the world's leaders. By June 2025: 📡 4.55M 5G base stations 🌐 226M gigabit broadband users 💻 2nd-largest total computing power 📊 400K+ data enterprises in [----] 💰 Data industry worth 5.86T, up 117% from five years ago"
X Link 2025-08-14T05:08Z 10.4K followers, [---] engagements
"According to the AD Scientific Index [----] Yoshua Bengio is the most cited researcher in history with over 973k citations. The second is Geoffrey Hinton with more than 952k citations. Kaiming He ranks fifth with over 733k citations. Ilya Sutskever ranks seventh with over [------] citations"
X Link 2025-08-25T06:20Z 13K followers, [----] engagements
"Results on LiveMCP-101 showing model performance in terms of task success rate (TSR) average result score (ARS) average trajectory score (ATS) average token consumption and average tool calls. (a) TSR (%) vs. ARS (%) with color encoding ATS (%). (b) TSR (%) vs. average tokens per task with color encoding average tool calls"
X Link 2025-08-29T02:58Z 10.5K followers, [---] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries Duke University Zoom Video Communications Paper: https://arxiv.org/abs/2508.15760 https://mp.weixin.qq.com/s/6H2P2YVjBnpCa9yVX53fnw"
X Link 2025-08-29T02:58Z 10.5K followers, [---] engagements
"14B beats 671B! Microsoft's rStar2-Agent surpasses DeepSeek-R1 in mathematical reasoning. rStar2-Agent-14B is trained with agentic RL to reach frontier-level performance. Key innovations: ⚡ Efficient RL infra (Python env + [--] MI300X GPUs) 🔄 GRPO-RoC rollout strategy for noisy code tools 🧠 Multi-stage RL recipe for advanced cognitive behaviors In just [---] RL steps / [--] week it scores 80.6% (AIME24) & 69.8% (AIME25), surpassing DeepSeek-R1 (671B) with shorter responses. Also generalizes to alignment, science reasoning & tool use"
X Link 2025-09-02T03:45Z 10.5K followers, [----] engagements
"What if diffusion-based LLMs didn't have to waste compute predicting (and discarding) redundant tokens? A new paper from Duke University introduces Diffusion Scratchpad (DPad), a training-free tweak that trims unnecessary suffix tokens via a sliding window + distance-decay dropout. Result: major speedups on dLLMs like LLaDA-1.5 & Dream with no accuracy loss. 🚀 https://twitter.com/i/web/status/1965595863201579244"
X Link 2025-09-10T01:59Z 14.2K followers, [----] engagements
"How do we benchmark AI beyond exam-style puzzles or trivial user queries? A new paper proposes UQ (Unsolved Questions), a benchmark built from [---] tough unanswered Stack Exchange questions across CS theory, math, sci-fi, history & more. Innovations: - Curated pipeline with LLM judges + human review - Validator-assisted screening to pre-check answers - Open platform for expert verification 📉 Current frontier models solve only 15%. 📈 Each success = new real-world knowledge. A bold shift: benchmarks that grow with unsolved human questions. https://twitter.com/i/web/status/1967511497669767186"
X Link 2025-09-15T08:51Z 14.6K followers, [----] engagements
"Kinematic-aware generation for next-gen animation & motion tasks. Stability AI presents Stable Part Diffusion 4D (SP4D). From a single video, SP4D generates paired RGB + kinematic part videos, going beyond appearance-based segmentation to capture true articulation. Key ideas: - Dual-branch diffusion (RGB + parts) - Spatial color encoding: flexible part counts, shared VAE - BiDiFuse + contrastive loss: temporal & spatial consistency - New KinematicParts20K dataset (20K rigged objects) Results: ✨ Lifts 2D part maps to 3D skeletons & skinning weights 🌍 Generalizes to real-world novel objects, rare poses"
X Link 2025-09-23T03:39Z 10.4K followers, [----] engagements
"Qwen Qwen Qwen Qwen"
X Link 2025-09-24T01:39Z 12.8K followers, [---] engagements
"What if true AI agency doesn't come from more data, but from the right data? ⚡🤖 A new paper defines Agency as AI's capacity to autonomously discover problems, form hypotheses, and execute solutions, marking the shift from thinking systems to working systems. Enter LIMI (Less Is More for Intelligent Agency): - Trained on just [--] curated demonstrations of autonomous behavior - Achieves 73.5% on agency benchmarks, beating models trained on 10,000+ samples - Outperforms leading systems like Kimi-K2-Instruct (24.1%), Qwen3 (27.5%), GLM-4.5 (45.1%) and DeepSeek-V3.1 (11.9%)"
X Link 2025-09-27T03:20Z 10.8K followers, [----] engagements
"What if 3D models could be generated with precise cross-modal control, beyond just text or images? Tencent presents Hunyuan3D-Omni, a unified framework that accepts point clouds, voxels, bounding boxes and skeletal priors, enabling fine-grained controllable 3D asset creation. Built for games, film and design. Model available on Hugging Face"
X Link 2025-10-04T01:05Z 12.2K followers, [----] engagements
"How can LLMs evolve continually in real-world industry without forgetting past tasks? Enter MoE-CL, a parameter-efficient adversarial mixture-of-experts framework for continual instruction tuning: - Dedicated LoRA experts per task preserve task knowledge - Shared LoRA expert + task-aware discriminator transfer only task-relevant info - Adversarial learning balances retention & generalization Tested on public & industrial benchmarks (incl. Tencent Video), MoE-CL cut manual review costs by 15.3%, proving scalable & practical for real-world deployment"
X Link 2025-10-04T01:23Z 12.7K followers, [----] engagements
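The split between per-task and shared experts is the core of the forgetting story. This toy scalar sketch (class and method names are mine, and the adversarial discriminator is omitted) shows why old tasks stay intact: each task's adapter is frozen after training, and only the shared expert keeps moving:

```python
class MoECL:
    """Toy sketch of the MoE-CL expert layout: one dedicated adapter per
    task plus one shared adapter. Adapters are modeled as scalar deltas
    added to the activation, purely for illustration."""
    def __init__(self):
        self.task_adapters = {}   # task_id -> frozen per-task delta
        self.shared = 0.0         # shared expert, updated on every task

    def train_task(self, task_id, delta, shared_delta):
        self.task_adapters[task_id] = delta   # frozen after this task
        self.shared += shared_delta           # transfers across tasks

    def forward(self, x, task_id):
        # Only this task's adapter and the shared expert are applied,
        # so other tasks' adapters are never overwritten.
        return x + self.task_adapters.get(task_id, 0.0) + self.shared
```

After training task B, task A's dedicated adapter is byte-identical to what it was, which is the continual-learning guarantee; the shared path is what the task-aware discriminator polices in the real method.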
"📬 #PapersAccepted by Jiqizhixin Our report: Self-Evolving LLMs via Continual Instruction Tuning Beijing University of Posts and Telecommunications Tencent AI Lab Paper: Code: https://github.com/BAI-LAB/MoE-CL https://arxiv.org/abs/2509.18133 https://mp.weixin.qq.com/s/RuWZSV6xDfdESSxrXjv4_g"
X Link 2025-10-04T01:23Z 12.7K followers, [---] engagements
"Can autonomous driving think like it sees, not just reason symbolically? Alibaba and others propose a spatio-temporal Chain-of-Thought (CoT) that lets visual language models (VLMs) reason visually, generating imagined future frames to plan trajectories. By unifying visual generation + understanding, the model acts as a world simulator, predicting how the scene evolves over time, not just describing it. 📈 Results show stronger visual reasoning and planning, moving autonomous driving beyond text-based logic toward true simulation-based intelligence. This paper has been accepted to NeurIPS 2025"
X Link 2025-10-07T03:47Z 12.4K followers, [---] engagements
"How can we make LLMs actually use the context they're given? Meet CARE, a native retrieval-augmented reasoning framework that teaches models to explicitly integrate evidence into their own thought process. Instead of relying on heavy supervised fine-tuning or external web searches, CARE lets the model retrieve and reason internally, weaving relevant in-context tokens directly into its reasoning chain. Across real-world and counterfactual QA benchmarks, CARE delivers higher retrieval accuracy and more reliable answers than traditional RAG or supervised approaches. 🧠 The result: context-faithful"
X Link 2025-10-10T03:33Z 10.4K followers, [----] engagements
"Well, you may not need fine-tuning anymore. Meet ACE (Agentic Context Engineering), a framework that turns LLM contexts into living, adaptive playbooks that grow and refine over time. Unlike traditional context-tuning (which suffers from brevity bias and context collapse), ACE uses structured generation-reflection-curation cycles to preserve rich domain insights and scale with long-context models. Results: ✅ +10.6% on agent benchmarks ✅ +8.6% on finance reasoning ✅ Lower latency & rollout cost Matches top production agents on AppWorld and beats them on harder tests, all with smaller open-source"
X Link 2025-10-11T06:51Z 10.5K followers, [----] engagements
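The generation-reflection-curation cycle can be pictured as a tiny loop over a growing playbook. This sketch is schematic (the function names and the dedupe-and-append curation rule are my assumptions); the key contrast with naive context-tuning is that the playbook is only ever appended to and deduplicated, never summarized away, which is how collapse is avoided:

```python
def ace_step(playbook, task, generate, reflect):
    """One ACE-style cycle: generate with the current playbook, reflect
    on the outcome, then curate by appending a new insight instead of
    rewriting the whole context (rewrites are what cause collapse)."""
    answer = generate(playbook, task)
    insight = reflect(task, answer)
    if insight and insight not in playbook:   # curation: dedupe, then append
        playbook = playbook + [insight]
    return playbook, answer
```

With stub `generate`/`reflect` callables, repeated steps grow the playbook monotonically and never duplicate an insight, so accumulated domain knowledge survives arbitrarily many cycles.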
"Our report: Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models Stanford SambaNova UC Berkeley Paper: https://www.arxiv.org/abs/2510.04618 https://mp.weixin.qq.com/s/f-1h0Q-QKOWghJb7Fmrvtw"
X Link 2025-10-11T06:51Z 10.4K followers, [---] engagements
"Ever wondered how LLMs evolve from predicting the next token to following your instructions? Post-training 101: A hitchhiker's guide into LLM post-training. This new guide breaks down the basics of LLM post-training, covering the full journey from pre-training to instruction tuning: 🔹 Transitioning from language modeling to instruction following 🔹 Supervised Fine-Tuning (SFT): data curation, objectives and losses 🔹 Reinforcement Learning methods: RLHF, RLAIF, RLVR and how reward models work 🔹 Evaluation frameworks for measuring post-training quality Link:"
X Link 2025-10-12T02:07Z 10.4K followers, 34.5K engagements
"Robots can now learn to act better through trial and error. A new study from Tsinghua, Shanghai Qi Zhi Institute and Zhongguancun Academy puts Reinforcement Learning (RL) to the test for Vision-Language-Action (VLA) models. Unlike standard supervised fine-tuning (SFT), which struggles with compounding errors, RL directly optimizes for task success. The researchers built a comprehensive benchmark to study how RL affects generalization across: 👀 Visual shifts 🧩 Semantic understanding 🦾 Action execution Key findings: - RL (especially PPO) boosts semantic and execution robustness - Maintains"
X Link 2025-10-14T07:46Z 10.4K followers, [---] engagements
"Are Gaussian Splatting's limitations holding back the future of 3D surface reconstruction? 🤔 Enter GeoSVR, a novel framework that leverages sparse voxels to create stunningly accurate, detailed and complete 3D surfaces. By using a Voxel-Uncertainty Depth Constraint and Sparse Voxel Surface Regularization, GeoSVR overcomes common challenges in the field, ensuring geometric consistency and sharp details. Experiments show it outperforms existing methods in accuracy and completeness, especially in difficult scenarios"
X Link 2025-10-15T02:18Z 12.4K followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction Beihang University Rawmantic AI and others Paper: Project: Code: https://github.com/Fictionarry/GeoSVR https://fictionarry.github.io/GeoSVR-project/ https://arxiv.org/abs/2509.18090 https://mp.weixin.qq.com/s/QA4mY7YL3rsVGHl0QONQdQ"
X Link 2025-10-15T02:18Z 10.8K followers, [---] engagements
"Can diffusion-based LLMs outpace traditional autoregressive models? ⚡🧠 Meet dInfer, the first efficient modular framework for inference on diffusion-based large language models (dLLMs), a new generation of parallel text generators. dInfer breaks inference into four key modules: - Model core: architecture integration - Diffusion iteration manager: orchestrates denoising steps - KV-cache manager: optimizes memory reuse - Decoding strategy: balances speed and quality With both algorithmic and system-level optimizations, dInfer hits [----] tokens/sec on HumanEval and 800+ tokens/sec across benchmarks on"
X Link 2025-10-16T03:22Z 14.2K followers, [---] engagements
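To see why a dLLM decoder is parallel at all, it helps to look at the generic decoding loop such an engine orchestrates. The sketch below is a generic confidence-based parallel-decoding loop, not dInfer's actual API (the `model_step` interface and the commit rule are invented for illustration): every iteration scores all masked positions at once and commits the most confident few.

```python
def diffusion_generate(model_step, n_iters, seq, unmask_per_iter):
    """Minimal parallel-decoding loop in the spirit of a dLLM inference
    engine. `seq` holds tokens, with None marking masked positions;
    `model_step(seq)` returns (position, token, confidence) triples for
    the masked slots (interface invented for illustration)."""
    MASK = None
    for _ in range(n_iters):
        if MASK not in seq:          # everything decoded: stop early
            break
        scores = model_step(seq)
        # Commit only the most confident predictions this iteration;
        # all other positions are re-predicted next round.
        for pos, tok, conf in sorted(scores, key=lambda s: -s[2])[:unmask_per_iter]:
            seq[pos] = tok
    return seq
```

With `unmask_per_iter` tokens committed per step, a length-N sequence finishes in roughly N / unmask_per_iter iterations instead of N autoregressive steps, which is the headroom the four modules above are tuned to exploit.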
"Test-Time Scaling Law for robots, just revealed. Meet RoboMonkey, a clever framework that boosts Vision-Language-Action (VLA) models by scaling sampling and verification during inference. Researchers first uncover a key insight: VLA action errors follow a power-law decay with more samples, revealing an inference-time scaling law. Building on that, RoboMonkey: - Samples multiple candidate actions with Gaussian noise - Uses majority voting to form an action proposal distribution - Employs a VLM-based verifier (trained on synthetic data) to pick the best move The result? 🚀 +25% on"
X Link 2025-10-16T06:38Z 10.4K followers, [----] engagements
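The three bullets above compose into a short sample-vote-verify pipeline. This is a 1-D toy (function name, binning scheme, and the scalar action are all my simplifications; real VLA actions are multi-dimensional):

```python
import random

def robomonkey_select(base_action, verifier, n_samples=16, sigma=0.05, bins=2):
    """Sketch of RoboMonkey-style test-time scaling for a scalar action:
    (1) sample candidates around the policy's action with Gaussian noise,
    (2) majority-vote over discretized bins to form the proposal set,
    (3) let a verifier score the proposals and pick the best."""
    samples = [base_action + random.gauss(0.0, sigma) for _ in range(n_samples)]
    counts = {}
    for a in samples:                       # discretize, then tally votes
        counts.setdefault(round(a, bins), []).append(a)
    proposal_bin = max(counts, key=lambda b: len(counts[b]))
    proposals = counts[proposal_bin]
    return max(proposals, key=verifier)     # verifier breaks ties inside the bin
```

Raising `n_samples` is exactly the test-time scaling knob: per the power-law observation in the post, each doubling buys a predictable drop in action error, at the cost of more forward passes.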
"What is AGI? Dan Hendrycks, Yoshua Bengio, Eric Schmidt, Gary Marcus, Max Tegmark and many others just released A Definition of AGI. Basically, AGI is an AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult. And no surprise: GPT-4 and GPT-5 perform very poorly on the ten core cognitive components of their standard"
X Link 2025-10-17T02:58Z 12.8K followers, 17K engagements
"RL keeps evolving. Now you can teach LLMs to reason better by rewarding risk-taking. Risk-based Policy Optimization (RiskPO) is a new reinforcement learning framework for post-training LLMs. Instead of averaging rewards like GRPO, RiskPO uses a Mixed Value-at-Risk objective to: - Emphasize rare but informative reasoning paths - Prevent entropy collapse and overconfidence - Encourage deeper exploration Plus, a smart bundling scheme enriches feedback for more stable training. Results: big gains in math, multimodal and code reasoning, beating GRPO on both Pass@1 and Pass@k"
X Link 2025-10-17T03:57Z 10.4K followers, [----] engagements
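The contrast with mean-based objectives is easy to show numerically. The sketch below is a simplified stand-in for a Mixed Value-at-Risk objective (the exact mixing and quantile scheme here are my assumptions, not the paper's formulation): it blends the plain mean with the average of the worst α-fraction of rewards, so rare hard cases carry extra weight.

```python
def mixed_var_objective(rewards, alpha=0.25, mix=0.5):
    """Simplified risk-sensitive objective: mix the mean reward with the
    average of the worst alpha-fraction (a CVaR-style lower-tail term),
    up-weighting rare, difficult reasoning paths."""
    srt = sorted(rewards)
    k = max(1, int(len(srt) * alpha))
    tail = sum(srt[:k]) / k                  # average of the worst alpha-fraction
    mean = sum(rewards) / len(rewards)
    return mix * tail + (1 - mix) * mean
```

With `mix=0` this collapses to the GRPO-style mean; raising `mix` makes the objective increasingly dominated by the hardest prompts, which is the intended pressure toward deeper exploration.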
"How well can multimodal LLMs understand long-distance travel videos? Enter VIR-Bench, a new benchmark with [---] real-world travel videos that challenges models to reconstruct itineraries and reason over extended geospatial-temporal trajectories. 🚗 Why it matters: mastering long-range video reasoning is key for embodied-AI planning and autonomous navigation. Findings: even top MLLMs struggle, revealing major gaps in long-horizon understanding. A prototype travel agent built on VIR-Bench shows clear performance gains, proving the benchmark's real-world value"
X Link 2025-10-17T08:14Z 10.8K followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction Waseda University CyberAgent and others Paper: Code: https://github.com/nlp-waseda/VIR-Bench https://www.arxiv.org/abs/2509.19002 https://mp.weixin.qq.com/s/uXAHAQdaA5EQRhDiMFqlGg"
X Link 2025-10-17T08:14Z 11.3K followers, [---] engagements
"Today's #1 Paper on Hugging Face: Agentic Entropy-Balanced Policy Optimization (AEPO). With this method we can train smarter and more capable AI web agents without their learning processes collapsing. It's a reinforcement learning (RL) algorithm that addresses a key instability issue. Existing methods often over-rely on entropy (uncertainty), leading to training failures. AEPO intelligently balances this entropy during both exploration and policy updates. It uses a dynamic rollout that prevents the agent from getting stuck in uncertain loops and a novel optimization technique to learn from tricky"
X Link 2025-10-17T12:25Z 12K followers, [----] engagements
"Agentic Entropy-Balanced Policy Optimization Renmin University of China Kuaishou Technology Paper: Code: https://github.com/dongguanting/ARPO https://huggingface.co/papers/2510.14545"
X Link 2025-10-17T12:25Z 10.7K followers, [---] engagements
"Can high school geometry teach AI to understand space? 📐 A new study tackles the critical challenge of spatial intelligence in Multimodal Large Language Models (MLLMs). Researchers found that fine-tuning models on Euclid30K, a new dataset of [-----] Euclidean geometry problems, confers broadly transferable spatial skills. After this geometry-centric training, models achieved substantial zero-shot gains across four separate spatial reasoning benchmarks without any task-specific adaptation. For instance, the average accuracy on the VSI-Bench benchmark rose from 34.5% to 40.5%, showing this is a"
X Link 2025-10-19T23:29Z 10.4K followers, [----] engagements
"Can today's LLMs safely stay on mission? A new study introduces operational safety: an LLM's ability to accept or refuse queries appropriately within its intended use. Researchers benchmarked [--] open-weight models and found all remain highly unsafe for real-world deployment: - Qwen-3 (235B): 77.8% - Mistral (24B): 80% - GPTs: 62-73% - Gemma & Llama-3: collapse to 40% / 24% To fix this they propose prompt-based steering (Q-ground & P-ground), boosting safety by up to +41%. 📬 #PapersAccepted by Jiqizhixin Our report: OffTopicEval: When Large Language Models Enter the Wrong Chat Almost Always Nanyang"
X Link 2025-10-20T06:22Z 10.4K followers, [----] engagements
"Wow! Multi-modal Diffusion Mamba (MDM) is a breakthrough architecture that fuses all modalities through a unified variational autoencoder and a Mamba-based multi-step diffusion process. Instead of separating image and text streams, MDM jointly learns and refines representations, enabling high-res image generation, long-form text synthesis, and visual QA & reasoning. MDM outperforms MonoFormer, LlamaGen and Chameleon, and rivals GPT-4V, Gemini Pro and Mistral, all while staying computationally efficient"
X Link 2025-10-20T06:38Z 10.5K followers, [----] engagements
"A big step toward stable, scalable LLM agent training. Rutgers University & Adobe just identified a key pitfall in LLM agent training: the exploration-exploitation cascade failure, where agents first prematurely converge to bad strategies, then collapse into chaotic exploration. To fix this they propose Entropy-regularized Policy Optimization (EPO), which: [--] Smooths entropy to prevent instability [--] Balances exploration & exploitation adaptively [--] Ensures monotonic entropy variance reduction Results: +152% on ScienceWorld, +19.8% on ALFWorld. 📬 #PapersAccepted by Jiqizhixin Our report: EPO:"
X Link 2025-10-20T15:48Z 12.1K followers, [----] engagements
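One way to picture entropy smoothing is as a penalty that keeps each step's policy entropy close to a running average, damping exactly the sudden swings behind the cascade failure. This is an illustrative mechanism in the spirit of EPO, not the paper's actual loss (the EMA form, `beta`, and `smooth` are my assumptions):

```python
def epo_losses(policy_losses, entropies, beta=0.01, smooth=0.9):
    """Illustrative entropy smoothing: regularize each step's entropy
    toward an exponential moving average of past entropies, so neither
    premature collapse nor chaotic spikes go unpenalized."""
    losses, ema = [], entropies[0]
    for pl, h in zip(policy_losses, entropies):
        ema = smooth * ema + (1 - smooth) * h
        losses.append(pl + beta * (h - ema) ** 2)  # penalize deviation from the EMA
    return losses
```

A steady entropy trajectory incurs no extra loss, while an abrupt drop (collapse) or spike (chaotic exploration) is charged quadratically, nudging training back toward the smooth regime.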
"Can AI think while it speaks? Meet VERA (Voice Evaluation of Reasoning Ability), the first benchmark testing reasoning in real-time voice-interactive systems. 💡 [----] voice-native tasks across [--] tracks (Math, Web, Science, Long-Context, Factual) reveal a striking modality gap: - Text model: 74.8% (Math), 54.0% avg - Voice model: 6.1% (Math), 11.3% avg - Even adding thinking time barely helps; real-time voice AIs still trade accuracy for fluency. 📬 #PapersAccepted by Jiqizhixin Our report: Voice Evaluation of Reasoning Ability: Diagnosing the Modality-Induced Performance Gap Duke University Adobe"
X Link 2025-10-21T03:47Z 11.3K followers, [----] engagements
"Huge! ByteDance just unveiled their LLM training infrastructure. ByteRobust is their GPU infrastructure system built for robust and continuous LLM training. It tackles common failures, such as CUDA errors, NaNs and job hangs, with: - High-capacity fault tolerance - Fast fault demarcation and localization - Data-driven failure recovery Result: deployed across [----] GPUs, ByteRobust achieves a 97% Effective Training Time Ratio (ETTR) over a three-month LLM training job, keeping massive training pipelines stable and efficient"
X Link 2025-10-21T06:23Z 10.8K followers, [----] engagements
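ETTR itself is just the fraction of wall-clock time spent making real training progress, which puts the 97% figure in perspective: over a ~90-day job it leaves only a few days for every fault, restart, and recovery combined. (The 2.7-day figure below is my arithmetic from 3% of 90 days, not a number from the report.)

```python
def ettr(effective_seconds, total_seconds):
    """Effective Training Time Ratio: fraction of wall-clock time spent on
    productive training, excluding failures, restarts, and recovery."""
    return effective_seconds / total_seconds

# A three-month (~90-day) job at 97% ETTR tolerates about 2.7 days of
# total downtime across the entire run.
total = 90 * 24 * 3600
lost = 2.7 * 24 * 3600
ratio = ettr(total - lost, total)
```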
"Robust LLM Training Infrastructure at ByteDance https://arxiv.org/abs/2509.16293"
X Link 2025-10-21T06:23Z 10.4K followers, [---] engagements
"Another breakthrough in world models. VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents. A new paper explores a frontier that enables vision-language model (VLM) agents to build internal world models, much like LLMs reason through text. By framing perception as a Partially Observable MDP, the authors decompose reasoning into: - State Estimation: What's happening now? - Transition Modeling: What happens next? They introduce: - World Modeling Reward for dense turn-level feedback - Bi-Level GAE for turn-aware credit assignment A 3B VLM agent scores [----] across [--] benchmarks, surpassing GPT-5"
X Link 2025-10-21T08:41Z 10.5K followers, [----] engagements
"You can now generate 4-minute-long videos. UCLA, ByteDance and UCF have just released a new paper on this. It tackles a core challenge: long-horizon video quality collapse caused by error accumulation when models generate beyond their training length. Their simple but powerful solution: use the teacher's own knowledge to guide the student through self-generated long segments, no long-video data or retraining needed. ✨ Key results: - Scales video length [--] beyond the teacher's limit - Generates [--] min [--] sec videos (99.9% of positional span) - Fixes over-exposure & drift without overlap recomputation -"
X Link 2025-10-21T19:04Z 11.4K followers, 15.7K engagements
"Atlas is OpenAI's Mac browser built on Chromium"
X Link 2025-10-22T00:48Z 10.5K followers, [---] engagements
"When using AI browsers like ChatGPT Atlas or Comet, you need to be extra careful. Brave just released a report warning about a major threat: unseeable prompt injections in screenshots. That's right: attackers can embed malicious instructions in web content that are invisible or barely noticeable to humans. For example, they might hide prompt injection commands inside images using faint light-blue text on a yellow background, effectively concealing the malicious instructions from the user"
X Link 2025-10-22T07:29Z 10.5K followers, [----] engagements
"In fact, Tsinghua University and Zhipu AI are conducting research similar to DeepSeek-OCR: an approach that enables large language models (LLMs) to process up to a million tokens effortlessly. They introduce Glyph, a framework that converts long text sequences into images and feeds them to vision-language models. This visual compression technique achieves a [--] reduction in token count, speeds up processing by approximately [--], and still matches the performance of top-tier LLMs, unlocking million-token contexts and enhancing multimodal tasks such as document understanding"
X Link 2025-10-22T08:13Z 10.5K followers, [----] engagements
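The source of the compression is easy to see with a back-of-envelope model: a text tokenizer pays roughly one token per ~4 characters, while a rendered page costs a fixed vision-token budget regardless of how much text it holds. All concrete numbers below are illustrative assumptions, not figures from the Glyph paper:

```python
def visual_compression_ratio(n_text_tokens, chars_per_page, vision_tokens_per_page,
                             chars_per_text_token=4):
    """Back-of-envelope for visual-text compression: text tokens needed
    for a document vs. vision tokens needed for its rendered pages.
    Returns text_tokens / vision_tokens (>1 means compression)."""
    n_chars = n_text_tokens * chars_per_text_token
    n_pages = max(1, -(-n_chars // chars_per_page))   # ceiling division
    return n_text_tokens / (n_pages * vision_tokens_per_page)
```

Denser page rendering (more characters per page at a fixed vision-token budget) directly raises the ratio, which is why rendering choices are a first-class knob in this style of compression.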
"Glyph: Scaling Context Windows via Visual-Text Compression Tsinghua University Zhipu AI https://arxiv.org/abs/2510.17800"
X Link 2025-10-22T08:13Z 10.5K followers, [---] engagements
"How can AI truly understand long videos without massive retraining or proprietary models? Video-RAG might be an answer. It's a training-free, plug-and-play method that boosts long video comprehension by retrieving visually aligned auxiliary texts, from audio, OCR and object cues, and feeding them into existing LVLMs. It's lightweight, open, and even outperforms Gemini-1.5-Pro and GPT-4o on long-video benchmarks. 📬 #PapersAccepted by Jiqizhixin Our report: Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension Xiamen University University of Rochester Project: Paper: Code:"
X Link 2025-10-22T08:15Z 10.5K followers, [----] engagements
"The Bumi robot is about to go on sale. Developed by Noetix Robotics, Bumi stands [----] meters tall, weighs just [--] kilograms and features [--] degrees of freedom (DOF). It comes equipped with visual and speech understanding capabilities, and it can even dance. The price is [----] RMB, roughly [----] USD"
X Link 2025-10-22T09:34Z 10.8K followers, 15.1K engagements
"The latest issue of DeepLearning.AI's The Batch newsletter contains several important updates: - Ling-1T Leads Non-Reasoning Performance - MCP Poses Security Risks - California Builds AI Regulatory Regime - Better Agentic Prompts Automatically"
X Link 2025-10-25T01:19Z 10.5K followers, [---] engagements
"Link: https://info.deeplearning.ai/ling-1t-leads-non-reasoning-performance-mcp-poses-security-risks-california-regulates-ai-auto-tune-for-agentic-prompts-1"
X Link 2025-10-25T01:19Z 10.5K followers, [---] engagements
"Huge! Ant Group's Ling Team just unveiled the Ring-linear [---] series: Ring-mini-linear-2.0 (16B) and Ring-flash-linear-2.0 (104B). They are hybrid models combining linear and softmax attention for efficient long-context inference. They cut inference cost to 1/10 of a 32B dense model and improve training efficiency by 50% with a custom FP8 operator library, while maintaining state-of-the-art reasoning performance"
X Link 2025-10-27T09:15Z 12.9K followers, [----] engagements
"How far can today's reasoning models really think ahead? Fudan & Meituan researchers introduce R-HORIZON, a benchmark and training paradigm targeting long-horizon reasoning tasks that require sustained, multi-step, interdependent reasoning rather than short single-turn answers. Their evaluations reveal that even top Large Reasoning Models (LRMs) degrade sharply as reasoning horizons extend. Using R-HORIZON for reinforcement learning with verified rewards (RLVR) notably boosts both long-horizon and standard reasoning performance, showing that R-HORIZON offers a scalable, low-cost path to train models"
X Link 2025-10-27T18:21Z 12.1K followers, [----] engagements
"Huge breakthrough from DeepMind. In their latest Nature paper, Discovering state-of-the-art reinforcement learning algorithms, they show that AI can autonomously discover better RL algorithms. "Enabling machines to discover learning algorithms for themselves is one of the most promising ideas in AI." Could the next generation of RL algorithms be machine-discovered? BTW, the study was led by AlphaGo's creator David Silver"
X Link 2025-10-28T03:42Z 11.3K followers, 135.6K engagements
"Wow, big release from Ant Group! Ever wondered why Ling 1T, though not designed as a reasoning model, demonstrates such impressive reasoning power? Ling [---] is a new reasoning-oriented language foundation built on one principle: every activation should enhance reasoning. The report shares everything about how Ant Group trains its massive models. A rare show of true open-source spirit"
X Link 2025-10-28T07:10Z 11.5K followers, 19.1K engagements
"Now you can fine-tune your local LLMs on your iPhone. Apple presents MeBP (Memory-efficient BackPropagation), a new method that makes on-device fine-tuning practical. Unlike zeroth-order optimization (ZO), which needs [-----] more steps, MeBP achieves faster convergence and stronger performance, all while using under 1GB of memory on an iPhone [--] Pro Max for models up to 4B parameters. A big leap toward personalized on-device LLM adaptation"
X Link 2025-10-29T03:35Z 11.7K followers, [----] engagements
"Can AI truly be creative? DeepMind just introduced an RL-based framework that teaches generative AI to create original, counter-intuitive chess puzzles using novel rewards derived from chess engine search statistics. The results are striking: counter-intuitive puzzles increase [--] (from 0.22% to 2.5%), surpassing top datasets and Lichess-trained models while maintaining aesthetic depth and human-rated creativity. Experts even judged many of these AI puzzles as approaching the artistry of classic human compositions, a remarkable step toward genuine machine creativity"
X Link 2025-10-30T03:11Z 10.9K followers, [----] engagements
"Can LLMs truly master long-horizon reasoning without crumbling under complexity? This study says yes, with AgentFlow. It's a trainable agentic framework that decomposes tasks across planner, executor, verifier & generator modules, optimized in-the-flow via Flow-GRPO. A 7B model beats GPT-4o by up to 14.9% across search, math & science benchmarks. In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Stanford Texas A&M UC San Diego Lambda Project: Paper: Code: Model: Demo: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/30cvMoQADYj1_Cr2yLDopg"
X Link 2025-10-30T04:35Z 11.9K followers, 13.3K engagements
"How can we detoxify LLMs without dulling their intelligence? ARGRE is here to help. Autoregressive Reward-Guided Representation Editing is a new test-time detoxification framework that learns to navigate the fine-grained transition from toxic to non-toxic language inside an LLM's latent space. By modeling these toxicity trajectories, ARGRE builds an autoregressive reward model that steers representations toward safe regions with precise, lightweight edits. Across [--] major LLMs it cuts toxicity by 62%, reduces inference time by 48%, and preserves core capabilities. Detoxifying Large Language Models"
X Link 2025-10-31T00:01Z 10.8K followers, [----] engagements
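The "edit representations, don't retrain weights" idea can be illustrated in one dimension. This toy (the function name, the scalar hidden state, and the gradient-ascent step rule are all invented for illustration, not ARGRE's actual procedure) nudges a hidden state along a reward model's gradient in several small autoregressive steps rather than one large jump:

```python
def argre_edit(hidden, reward_grad, step=0.5, n_steps=4):
    """Toy 1-D sketch of reward-guided representation editing: move a
    hidden state along the reward model's gradient (toward the
    non-toxic region) in small successive steps, returning the whole
    trajectory of edits."""
    h = hidden
    trajectory = [h]
    for _ in range(n_steps):
        h = h + step * reward_grad(h)   # small, precise edit per step
        trajectory.append(h)
    return trajectory
```

Because the edits are small and guided, the state converges to the reward model's preferred region and then stops moving, which is the intuition behind detoxifying without degrading unrelated capabilities.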
"Cool! Fast-dLLM v2 is a block diffusion language model that efficiently converts pretrained autoregressive (AR) LLMs into parallel generators using just 1B tokens of fine-tuning, a [---] data reduction over prior diffusion LLMs like Dream. With a block diffusion mechanism, hierarchical caching and a parallel decoding pipeline, Fast-dLLM v2 achieves up to [---] faster decoding while matching or surpassing AR baselines in accuracy. Fast indeed. Fast-dLLM v2: Efficient Block-Diffusion LLM HKU Nvidia MIT Paper: Project: Code: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-10-31T08:31Z 11.7K followers, [---] engagements
"Can light solve AIs power crisis While the world races to feed AIs insatiable hunger for compute two 25-year-old founders from China are betting on a bold new paradigm: optical computing powered by phase-change materials. Their startup Lightstandard has built the worlds first [------] optical computing chip integrating silicon photonics and phase-change materials to achieve massive matrix computation with ultra-low energy use. Its a major leap toward making photonic chips commercially viable for AI workloads. If successful this could redefine the future of computeenabling low-carbon"
X Link 2025-10-31T08:50Z 11.9K followers, [----] engagements
"Earth observation is crucial for understanding our planet but current AI methods struggle with complex multi-step reasoning. A new framework called Earth-Agent aims to change that Earth-Agent combines RGB and spectral Earth observation data with a toolkit of expert tools enabling sophisticated analysis like retrieving geophysical parameters and tracking changes over time. To ensure its effectiveness researchers created Earth-Bench a comprehensive set of tasks and a rigorous evaluation protocol. Experiments show Earth-Agent significantly outperforms existing approaches paving the way for more"
X Link 2025-11-01T05:55Z 11.8K followers, [----] engagements
"Speed and quality can finally coexist in diffusion-based language generation. Introducing DiDi-Instruct a Discrete Diffusion Divergence Instruct method that distills a pre-trained discrete diffusion language model (dLLM) into a few-step student for ultra-fast generation. Built on integral KL-divergence minimization DiDi-Instruct achieves up to [--] faster decoding surpasses both its teacher and GPT-2 and cuts training time by [--]. Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct Paper: Code: Project: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-11-01T09:15Z 12.1K followers, 17.5K engagements
"Microsoft open-sourced Agent Lightning. Agent Lightning is "the absolute trainer to light up AI agents". Core Features - Turn your agent into an optimizable beast with ZERO CODE CHANGE (almost) 💤 - Build with ANY agent framework (LangChain OpenAI Agent SDK AutoGen CrewAI Microsoft Agent Framework.); or even WITHOUT agent framework (Python OpenAI). You name it 🤖 - Selectively optimize one or more agents in a multi-agent system. 🎯 - Embraces Algorithms like Reinforcement Learning Automatic Prompt Optimization Supervised Fine-tuning and more. 🤗 Link:"
X Link 2025-11-01T10:42Z 11.3K followers, [----] engagements
"While multimodal LLMs show potential as embodied agents their real-world perception and reasoning abilities remain poorly understood. To fill this gap the authors present BEAR a large-scale benchmark of [----] image-video-text tasks spanning [--] domains and [--] categories systematically testing fundamental embodied capabilities from low-level perception to high-level planning. They further introduce BEAR-Agent a multimodal conversational agent that integrates pretrained vision models boosting embodied performance by 9.12% (17.5% relative) on GPT-5 marking a solid step toward truly embodied"
X Link 2025-11-01T21:20Z 10.9K followers, [----] engagements
"Reasoning uncertainty is highly localized Yes only a few high-entropy tokens truly matter. Built on this insight from this paper Minimal Test-Time Intervention (MTI) selectively applies classifier-free guidance and lightweight negative-prompt guidance only where needed reusing the models own KV cache. The result: consistent accuracy and stability gains with minimal overhead including +1.35% across [--] benchmarks on Qwen3-8B-Base and +5% on AIME2024 with Qwen3-32B-Reasoning. Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention HKUST(GZ) Kuaishou and others Paper: Code:"
X Link 2025-11-02T23:14Z 11.3K followers, [----] engagements
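The selective-intervention idea in the MTI post can be illustrated with a toy decoder step. This is a hedged sketch, not the paper's code: the entropy threshold `tau` and guidance scale are made-up numbers, and "unconditional" logits stand in for whatever negative/guidance branch is used.

```python
import numpy as np

# Sketch of entropy-gated classifier-free guidance: intervene only at
# high-entropy (uncertain) decoding steps, leave confident steps untouched.
# tau and scale are hypothetical hyper-parameters for illustration.

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())

def guided_logits(cond, uncond, scale=1.5, tau=1.0):
    p = softmax(cond)
    if entropy(p) < tau:                        # confident step: skip guidance
        return cond
    return uncond + scale * (cond - uncond)     # classifier-free guidance

low_ent = np.array([10.0, 0.0, 0.0])   # peaked: model is confident
high_ent = np.array([0.1, 0.0, 0.05])  # near-uniform: model is uncertain
uncond = np.zeros(3)
assert np.allclose(guided_logits(low_ent, uncond), low_ent)       # untouched
assert not np.allclose(guided_logits(high_ent, uncond), high_ent) # guided
```

Since most tokens are low-entropy, the guidance branch (and its extra forward pass) runs only rarely, which is where the "minimal overhead" claim comes from.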
"Can vision-language models see beyond the big picture A new method RICE rethinks how models learn by focusing on region-level understanding instead of just global features. It builds a billion-scale region dataset introduces a Region Transformer and unifies object and OCR learning under one framework. The result: RICE outperforms CLIP SigLIP and others on dense tasks like segmentation grounding and visual perception in multimodal LLMs. Region-based Cluster Discrimination for Visual Representation Learning Code: Paper: Model: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-11-03T03:31Z 11.4K followers, 11.4K engagements
"One language model can master all programming languages Enter MultiPL-MoE a hybrid mixture-of-experts framework that boosts multilingual code generation without massive retraining. It combines token-level and segment-level expert routing to capture syntax and context across diverse programming languages. With smart expert selection and efficient design MultiPL-MoE significantly improves multi-language coding performance while keeping computational costs low. MultiPL-MoE: Multi-Programming-Lingual Extension of Large Language Models through Hybrid Mixture-of-Experts Paper: Code: Our report: 📬"
X Link 2025-11-04T00:46Z 11.8K followers, [----] engagements
"AI can become a fully autonomous data scientist. DeepAnalyze-8B is the first agentic LLM capable of handling the entire data science pipelinefrom raw data to analyst-grade research reportswithout predefined workflows. It learns like a human via a curriculum-based agentic training paradigm and a data-grounded trajectory synthesis process. Despite having just 8B parameters DeepAnalyze surpasses workflow-based agents built on proprietary LLMs marking a major step toward open autonomous data science. DeepAnalyze: Agentic Large Language Models for Autonomous Data Science RUC Tsinghua Paper: Code:"
X Link 2025-11-04T12:03Z 12.1K followers, [----] engagements
"Can 3D worlds be generated in seconds from a single image or text FlashWorld introduces a breakthrough 3D-oriented generative model that creates high-quality 3D scenes [-----] faster than previous methods. It directly produces 3D Gaussian representations while ensuring realism and consistency. Through dual-mode pre-training and cross-mode distillation FlashWorld fuses the strengths of 2D and 3D paradigmsachieving stunning rendering quality strong generalization and real-time 3D generation from any prompt. FlashWorld: High-quality 3D Scene Generation within Seconds Xiamen Tencent Fudan Project:"
X Link 2025-11-04T20:10Z 12.1K followers, [---] engagements
"Better reasoning could emerge from simpler RL. ROVER rethinks RL with Verifiable Rewards (RLVR) for LLM reasoning. Instead of relying on complex policy optimization like PPO it proves that optimal actions can be derived from a uniform random policys Q-function eliminating the need for iterative policy updates. This minimalist approach preserves diversity and stability boosting math reasoning scores by +8.2 pass@1 +16.8 pass@256 and +17.6% diversityoutperforming much heavier RL methods with elegant simplicity. Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards HKUST"
X Link 2025-11-05T05:21Z 11.5K followers, [----] engagements
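The ROVER post's central claim — that greedy actions w.r.t. a uniform random policy's Q-function can recover good behavior — can be seen in a tiny toy MDP. The chain environment below is my own illustration, not the paper's LLM setting.

```python
# Toy chain MDP: states 0..3, state 3 is terminal with reward 1.
# We evaluate Q under a *uniform random* policy (no policy iteration),
# then act greedily on that Q. Greedy-on-random-Q walks straight to the goal.

N = 4
ACTIONS = (-1, +1)  # step left / right, clipped to the chain

def q_uniform(s, a, depth=12):
    """Q(s, a) under the uniform random policy, via finite-horizon recursion."""
    if s == N - 1:
        return 0.0
    s2 = min(max(s + a, 0), N - 1)
    if s2 == N - 1:
        return 1.0          # reaching the terminal state pays 1
    if depth == 0:
        return 0.0
    # value of s2 when future actions are chosen uniformly at random
    return 0.5 * sum(q_uniform(s2, a2, depth - 1) for a2 in ACTIONS)

greedy = [max(ACTIONS, key=lambda a: q_uniform(s, a)) for s in range(N - 1)]
assert greedy == [1, 1, 1]  # always move right: the optimal policy
```

The point mirrored here is that no iterative policy improvement was needed; a single evaluation of the random policy already ranks actions correctly.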
"Google might just crack the code on how to stop AI from kissing up They propose consistency training a self-supervised approach that makes LLMs invariant to irrelevant prompt cues (like leading questions or jailbreak text) reducing sycophancy and jailbreak susceptibility without static datasets. Two variants emerge: - Bias-augmented Consistency Training (BCT) enforcing output invariance; - Activation Consistency Training (ACT) enforcing internal invariance. Both improve factuality robustness on Gemini [---] Flash with BCT especially effective against jailbreaksreframing alignment as a"
X Link 2025-11-05T06:50Z 12.4K followers, [----] engagements
"This paper from WeChat and Tsinghua might just flip the script on today's LLM paradigm Continuous Autoregressive Language Models (CALM) replace next-token prediction with next-vector prediction compressing chunks of K tokens into a single continuous vector via a high-fidelity autoencoder (99.9% reconstructability). By modeling language as sequences of continuous vectors CALM boosts semantic bandwidth per step slashing generation steps by K and improving the performancecompute trade-off. A new likelihood-free training and sampling framework enables stable learning in this continuous domain"
X Link 2025-11-05T07:34Z 11.4K followers, [----] engagements
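The K-fold step reduction described in the CALM post is easy to see at the shape level. A minimal sketch with toy numbers: a mean-pool stands in for the learned autoencoder, which the real model uses for high-fidelity compression and reconstruction.

```python
import numpy as np

# Shape-level sketch of next-vector prediction: K consecutive token
# embeddings are compressed into one continuous vector, so a T-token
# sequence needs only T/K autoregressive steps. K, T, d are toy numbers;
# the mean-pool "encoder" is a placeholder for the learned autoencoder.

K, T, d = 4, 32, 16
rng = np.random.default_rng(0)
token_embs = rng.normal(size=(T, d))        # embeddings for T tokens

chunks = token_embs.reshape(T // K, K, d)   # group into K-token chunks
vectors = chunks.mean(axis=1)               # one vector per chunk

assert vectors.shape == (T // K, d)         # 8 generation steps instead of 32
```

The autoregressive model then predicts the next *vector* instead of the next token, and the decoder half of the autoencoder expands each vector back into K tokens.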
"Can sparse images still produce detailed 3D reconstructions Researchers propose a semantic-aware neural reconstruction method that enriches implicit 3D representations with patch-based semantic logits and introduces a geometry-regularized mask constraint to resolve radiance and shape ambiguity. On the DTU benchmark it cuts Chamfer distance by 44% vs. SparseNeuS and 20% vs. VolReconand as a plugin for dense methods like NeuS and Neuralangelo it further reduces error by 69% and 68% delivering sharper more reliable 3D models. SERES: Semantic-Aware Neural Reconstruction from Sparse Views Page:"
X Link 2025-11-05T21:26Z 15.2K followers, [---] engagements
"Breaking release OpenHands Software Agent SDK a complete redesign of the popular 64k OpenHands framework. It offers: Plug-and-play agent interfaces Sandboxed & portable execution Multi-LLM routing Built-in security analysis Benchmarks on SWE-Bench Verified & GAIA show strong results a major step toward reliable scalable software-engineering agents"
X Link 2025-11-06T06:28Z 11.5K followers, [----] engagements
"Can AI invent new math A new paper from DeepMind and renowned mathematician Terence Tao shows how. Using AlphaEvolve the team merges LLM-generated ideas with automated evaluation to propose test and refine mathematical algorithms. In tests on [--] problems across analysis geometry and number theory AlphaEvolve not only rediscovered known results but often improved upon themeven generalizing finite cases into universal formulas. Paired with DeepThink and AlphaProof it points toward a future where AI doesnt just assist mathematiciansit collaborates with them in discovery"
X Link 2025-11-06T06:44Z 12.6K followers, 81K engagements
"UniLIP: a unified framework extending CLIP beyond understanding to multimodal generation and editing. While CLIP excels at perception it lacks reconstruction ability. UniLIP fixes this with a two-stage self-distillation scheme that adds high-fidelity reconstruction without sacrificing comprehension. Built on the MetaQuery framework UniLIP introduces a dual-condition architecture that fuses multimodal hidden states (for contextual richness) with learnable query embeddings (for MLLM-style reasoning). With only 1B3B parameters UniLIP outperforms larger unified models like BAGEL (7B) and"
X Link 2025-11-06T19:45Z 11.5K followers, [----] engagements
"ReasonMed: the largest medical reasoning dataset advancing LLM performance in clinical QA. Comprising 370k curated examples distilled from 1.75M reasoning paths ReasonMed is built through a multi-agent EMD (easymediumdifficult) pipeline with generation verification and an Error Refiner that corrects faulty reasoning steps. Experiments show that combining detailed CoT reasoning with concise answer summaries yields the most robust fine-tuning outcomes. - Models trained on ReasonMed redefine the state of the art: - ReasonMed-7B outperforms all sub-10B models by +4.17% and even beats LLaMA3.1-70B"
X Link 2025-11-07T02:54Z 12.2K followers, [----] engagements
"Can LLM agents learn by dreaming 🌙🤖 DreamGym from Meta is a new framework that lets AI agents train via synthetic reasoning-based experiences instead of costly real rollouts. It models environment dynamics replays and adapts tasks and even improves sim-to-real transfer. Results: +30% gains on WebArena and PPO-level performanceusing only synthetic interactions"
X Link 2025-11-07T06:47Z 11.5K followers, [----] engagements
"Cambrian-S: Towards Spatial Supersensing in Video This paper boasts an impressive roster of advisors including Rob Fergus Yann LeCun Fei-Fei Li and Saining Xie and they aim to answer this question: can AI go beyond seeing to truly understanding space They propose spatial supersensinga leap past reactive multimodal AI toward models that perceive remember infer and predict the 3D world. Their new benchmark VSI-SUPER tests long-horizon spatial reasoning where brute-force context fails. Results: scaling helps but predictive sensing wins big outperforming top proprietary systems"
X Link 2025-11-07T06:55Z 12.3K followers, [----] engagements
"Cambrian-S: Towards Spatial Supersensing in Video Paper: Website: Code: Cambrian-S Models: VSI-590K: VSI-SUPER: https://hf.co/collections/nyu-visionx/vsi-super https://hf.co/datasets/nyu-visionx/vsi-590k https://hf.co/collections/nyu-visionx/cambrian-s https://github.com/cambrian-mllm/cambrian-s https://cambrian-mllm.github.io https://arxiv.org/abs/2511.04670v1 https://hf.co/collections/nyu-visionx/vsi-super https://hf.co/datasets/nyu-visionx/vsi-590k https://hf.co/collections/nyu-visionx/cambrian-s https://github.com/cambrian-mllm/cambrian-s https://cambrian-mllm.github.io"
X Link 2025-11-07T06:55Z 12.3K followers, [---] engagements
"Breaking: China's first AI prompt copyright case delivers landmark verdict. Are AI prompts protected by copyright Court says NO Shanghai Huangpu District Court ruled that AI prompts lack sufficient originality to qualify as protected works setting an important precedent for AI-generated content ownership. Key takeaways: - Prompts deemed mere "instructions" lacking unique creative expression - Case focused on input rather than output of AI systems - Art company claimed violation when the defendant used similar prompts to create Midjourney artworks - Court dismissed all claims highlighting gray"
X Link 2025-11-07T07:17Z 11.8K followers, [----] engagements
"Sakana AI is building artificial life and they can evolve Petri Dish Neural Cellular Automata (PD-NCA) let multiple NCA agents learn and adapt during simulation not just after training. Each cell updates its own parameters via gradient descent turning morphogenesis into a living ecosystem of competing cooperating and ever-evolving entitiesshowing emergent cycles and persistent complexity growth. Petri Dish Neural Cellular Automata Sakana AI Paper: Project: Our report: https://mp.weixin.qq.com/s/P4-KBMHzH3am9_qhHL4LDQ https://github.com/SakanaAI/petri-dish-nca https://pub.sakana.ai/pdnca/"
X Link 2025-11-07T08:25Z 12.4K followers, 28.6K engagements
"What if one embedding model could seamlessly understand text images user behavior and item IDsall while boosting real-world recommendation performance Meet SAIL-Embedding: an omni-modal foundation model engineered for the messy realities of industrial AI. Unlike CLIP-style dual-tower models SAIL-Embedding uses a multi-stage training strategy that: ✅ Adapts to diverse tasks via content-aware progressive training ✅ Boosts recommendations by distilling user history & ID/item relationships ✅ Stays flexible with stochastic specialization and dataset-driven pattern matching Results SOTA retrieval"
X Link 2025-11-07T19:12Z 11.5K followers, [----] engagements
"What if LLMs could tune their own decodingno more guesswork with temperature and topp Enter AutoDeco: the first architecture that makes LLM decoding truly end-to-end. By adding lightweight heads the model predicts its own context-aware temperature and topp at every token stepturning decoding into a learnable differentiable process. Results 🔥 Beats default decoding by a wide margin 🎯 Matches an oracle that cheats by tuning per test case ✨ Learns to follow natural language instructions like be more random or stay focusedadjusting sampling strategy on the fly The End of Manual Decoding:"
X Link 2025-11-08T10:15Z 12.1K followers, [----] engagements
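To see what per-step temperature and top-p control means concretely, here is a minimal sampling step. The fixed `temperature` and `top_p` values are placeholders for what AutoDeco's lightweight heads would predict at each token; everything else is standard temperature plus nucleus sampling.

```python
import numpy as np

def sample(logits, temperature, top_p, rng):
    """Sample one token with a given temperature and top-p (nucleus) cutoff."""
    z = logits / max(temperature, 1e-6)          # temperature scaling
    p = np.exp(z - z.max()); p /= p.sum()
    order = np.argsort(p)[::-1]                  # probabilities, descending
    cs = np.cumsum(p[order])
    cutoff = int(np.searchsorted(cs, top_p)) + 1 # smallest nucleus >= top_p
    keep = order[:cutoff]
    q = np.zeros_like(p); q[keep] = p[keep]; q /= q.sum()
    return int(rng.choice(len(p), p=q))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1, -1.0])
# a per-step prediction head would emit these two values; fixed here
tok = sample(logits, temperature=0.7, top_p=0.9, rng=rng)
assert 0 <= tok < 4
```

In AutoDeco the two scalars become differentiable outputs of the model itself, so "be more random" can be learned as "raise the predicted temperature".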
"Can fractals help us fight Deepfakes 🌀 FractalForensics is a proactive Deepfake detector that embeds fractal-based watermarks to both detect and localize manipulationswhile staying robust against normal edits and fragile to AI fakes. It even shows where the image was tampered with. FractalForensics: Proactive Deepfake Detection and Localization via Fractal Watermarks Paper: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/5bW7levt1sKFZYxc_RAldg https://arxiv.org/abs/2504.09451 https://mp.weixin.qq.com/s/5bW7levt1sKFZYxc_RAldg https://arxiv.org/abs/2504.09451"
X Link 2025-11-08T21:01Z 12.1K followers, [----] engagements
"2/ Fast forward to 2025: A series of revolutionary papers shattered this conclusion German French and independent researchers developed equivalent formulations of quantum mechanics using only real numbers producing identical predictions to standard complex-number theory. How did they do it By reimagining how quantum states combine. When entangled particles interact standard theory uses "tensor products" a specific way of merging complex vectors. The new approaches used different combination rules that achieve the same results without explicit imaginary numbers"
X Link 2025-11-09T01:18Z 11.8K followers, [---] engagements
"3/ Why does this matter 🔶 Deepens our understanding of quantum reality imaginary numbers may be a "scaffolding" rather than a fundamental component 🔶 Simplifies quantum computing (Google researcher proved complex gates can be eliminated) 🔶 Reveals we may not fully grasp why complex numbers "fit" quantum mechanics so naturally"
X Link 2025-11-09T01:18Z 11.9K followers, [---] engagements
"AI-Trader enables five distinct AI models each employing unique investment strategies to compete autonomously in the same market and determine which can generate the highest profits in NASDAQ [---] or SSE [--] trading 9k stars already AI-Trader: Can AI Beat the Market Project: Demo: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/n8Xkl2Liy4c5m5xIzqTZ1g https://hkuds.github.io/AI-Trader/ https://github.com/HKUDS/AI-Trader https://mp.weixin.qq.com/s/n8Xkl2Liy4c5m5xIzqTZ1g https://hkuds.github.io/AI-Trader/ https://github.com/HKUDS/AI-Trader"
X Link 2025-11-09T13:16Z 12K followers, [----] engagements
"AI can stream videos that stay sharp and coherent for minutes. Meet Rolling Forcinga new technique for long-horizon video generation that slashes error accumulation. It denoises multiple frames jointly anchors long-term context via an attention sink and trains efficiently with few-step distillation. Result: real-time multi-minute streaming videos on a single GPUwith crisp quality and temporal consistency. Rolling Forcing: Autoregressive Long Video Diffusion in Real Time NTU Tencent Paper: Project: Code: Huggingface: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-11-09T20:09Z 12.1K followers, [----] engagements
"Can diffusion-based LMs rival autoregressive models Researchers tackle decoding and RL for Masked Diffusion Language Models (MDLMs) inference mismatch. Their new methods EOS Early Rejection Ascending Step-Size decoding and CJ-GRPO fix these gapsunlocking efficient full diffusion decoding and boosting reasoning on math and planning tasks with LLaDA-8B-Instruct. Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step Fudan SAIL SJTU Code: Paper: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-11-10T01:19Z 12.1K followers, [----] engagements
"This paper from Tsinghua University and Shanghai Jiao Tong University received perfect scores (6 [--] [--] 6) at NeurIPS [----] It aims to answer a key question: Does reinforcement learning really make large language models better reasoners The authors study Reinforcement Learning with Verifiable Rewards (RLVR) and find that while it improves accuracy for small k it doesnt create new reasoning patternsmeaning the base model still determines the upper limit of reasoning ability. Across six RLVR variants performance gains plateau suggesting that current RL setups mainly refine reasoning rather than"
X Link 2025-11-10T02:34Z 14K followers, 398.1K engagements
"Can small models draft smarter for big ones AdaSPEC improves speculative decoding by distilling knowledge selectivelyfiltering out hard-to-fit tokens so the draft model aligns better with the target. The result: up to +15% higher token acceptance across reasoning coding and summarizationoutperforming DistillSpec while keeping quality intact. AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/eAIv_NlrgG3hS829MuNgqw https://github.com/yuezhouhu/adaspec https://arxiv.org/abs/2510.19779"
X Link 2025-11-10T20:49Z 12.1K followers, [----] engagements
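The "filter out hard-to-fit tokens" step from the AdaSPEC post can be sketched as follows. This is a toy illustration with random distributions and an assumed keep-fraction hyper-parameter, not the paper's pipeline: rank tokens by draft-vs-target divergence and distill only on the easiest ones.

```python
import numpy as np

# Selective-distillation sketch: compute per-token KL between target and
# draft distributions, then keep only the easiest (lowest-KL) tokens for
# the distillation loss. keep_frac is a hypothetical hyper-parameter.

def kl(p, q):
    """KL(p || q) for two discrete distributions."""
    return float((p * np.log(p / q)).sum())

rng = np.random.default_rng(0)
V, T = 8, 10                                    # vocab size, sequence length
target = rng.dirichlet(np.ones(V), size=T)      # target model's distributions
draft = rng.dirichlet(np.ones(V), size=T)       # draft model's distributions

per_token_kl = np.array([kl(t, d) for t, d in zip(target, draft)])
keep_frac = 0.7                                 # drop the 30% hardest tokens
k = int(T * keep_frac)
easy_idx = np.argsort(per_token_kl)[:k]         # tokens the draft can fit

assert len(easy_idx) == 7
```

The intuition: spending the draft model's limited capacity on tokens it can actually match raises acceptance rates during speculative decoding more than uniformly distilling everything.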
"Can one robot hand rotate anything Researchers present a sim-to-real framework for generalized in-hand object rotation powered by a joint-wise dynamics model that bridges the reality gap using minimal real data. A single policy now handles diverse shapes sizes and poseseven complex objects like animal figurinesshowing unprecedented real-world dexterity. DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model Tsinghua Peking and others Paper: Project: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/MIpJnAYURM1IzsbQTMd1-A"
X Link 2025-11-11T00:55Z 12.5K followers, [----] engagements
"Dr. Fei-Fei Li just released an important article titled From Words to Worlds: Spatial Intelligence is AIs Next Frontier. She writes Spatial intelligence will transform how we create and interact with real and virtual worldsrevolutionizing storytelling creativity robotics scientific discovery and beyond. This is AIs next frontier. Absolutely worth a read Link: https://drfeifei.substack.com/p/from-words-to-worlds-spatial-intelligence https://drfeifei.substack.com/p/from-words-to-worlds-spatial-intelligence"
X Link 2025-11-11T01:48Z 12.3K followers, 24.6K engagements
"The most-used language on GitHub is now TypeScript For the first time it has surpassed both JavaScript and Python. Why AI. Developers are shifting toward typed languages which make AI-assisted coding more reliable and maintainable"
X Link 2025-11-11T03:03Z 12.4K followers, [----] engagements
"Octoverse: A new developer joins GitHub every second as AI leads TypeScript to #1 GitHub https://github.blog/news-insights/octoverse/octoverse-a-new-developer-joins-github-every-second-as-ai-leads-typescript-to-1/utm_source=octoverse-homepage&utm_medium=blog&utm_campaign=universe25 https://github.blog/news-insights/octoverse/octoverse-a-new-developer-joins-github-every-second-as-ai-leads-typescript-to-1/utm_source=octoverse-homepage&utm_medium=blog&utm_campaign=universe25"
X Link 2025-11-11T03:03Z 12.1K followers, [---] engagements
"LLMs can reason better without extra compute TrajSelector is a new Best-of-N framework that taps into an LLMs own hidden states to score reasoning stepsno massive reward models needed. A tiny 0.6B verifier ranks trajectories end-to-end boosting accuracy by up to 12% over existing methods while cutting inference costs. TrajSelector: Harnessing Latent Representations for Efficient and Effective Best-of-N in Large Reasoning Model Paper: Project: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/mDgspKrltG1IpejesMuJIw https://zgca-ai4edu.github.io/TrajSelector/"
X Link 2025-11-11T09:00Z 12.1K followers, [----] engagements
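The Best-of-N selection loop described above can be sketched shape-wise. All weights here are random placeholders: the real TrajSelector verifier is a small trained model reading the LLM's per-step hidden states, whereas this sketch uses mean-pooling and a random linear head purely to show the data flow.

```python
import numpy as np

# Best-of-N via a tiny verifier over hidden states (toy sketch): each of N
# candidate trajectories yields per-step hidden states; the verifier pools
# them and emits one score; the top-scoring trajectory is selected.

rng = np.random.default_rng(0)
N, T, d = 4, 6, 16                    # candidates, reasoning steps, hidden dim
hidden = rng.normal(size=(N, T, d))   # hidden states for each candidate
w = rng.normal(size=d)                # hypothetical linear verifier head

scores = hidden.mean(axis=1) @ w      # pool over steps, then score: shape (N,)
best = int(np.argmax(scores))         # index of the selected trajectory

assert scores.shape == (N,)
assert scores[best] == scores.max()
```

The cost argument follows from the shapes: the verifier touches only N pooled vectors of size d, instead of re-scoring full token sequences with a large reward model.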
"ByteDance just launched Doubao-Seed-Code a model specifically designed for programming tasks. It supports native 256K long context and has claimed the top spot on the SWE-Bench Verified leaderboard"
X Link 2025-11-11T10:31Z 12.5K followers, [----] engagements
"Can AI see hear and think like humans NVIDIA presents OmniVinci an open-source omni-modal LLM unifying vision audio and language. With OmniAlignNet Temporal Embedding Grouping and Constrained Rotary Time Embedding it fuses modalities into one shared spacelearning from 24M multi-sensory conversations. Results: beats Qwen2.5-Omni by +19.05 on cross-modal understanding using [--] less dataa major leap toward truly multimodal intelligence. OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Project: Paper: Model: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-11-11T19:06Z 12.4K followers, 11.2K engagements
"Why do LLM agents fail and how can they fix themselves Researchers introduce AgentErrorTaxonomy AgentErrorBench and AgentDebuga complete framework for diagnosing and correcting cascading failures across memory planning reflection and action. On real-world benchmarks (ALFWorld GAIA WebShop) AgentDebug boosts all-correct accuracy by +24% and enables iterative recovery with +26% task success. Where LLM Agents Fail and How They can Learn From Failures Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/yEVf41BZp02PodAWuO_meg https://github.com/ulab-uiuc/AgentDebug"
X Link 2025-11-12T21:04Z 12.1K followers, [----] engagements
"Next big paradigm shift in AI CALM reimagines LLMs by predicting continuous next vectors rather than discrete tokens. This can compress chunks of text into single embeddings with 99.9% fidelity. This boosts semantic bandwidth cutting generation steps by up to K while matching strong baselines at far lower compute cost pointing to a faster more scalable future for LLMs. Continuous Autoregressive Language Models Tencent Tsinghua Paper: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/TbStNDWAsWF0UCD9tBu2nA https://arxiv.org/abs/2510.27688"
X Link 2025-11-13T01:08Z 12.2K followers, 10.4K engagements
"Two major hurdles exist for GUI agents: hard-to-verify outcomes and non-scalable training data. UI-Genie tackles them through a self-improving reward-driven framework. Its UI-Genie-RM reward model fuses image and text understanding to unify action- and task-level evaluation trained via rule-based verification trajectory corruption and hard negative mining. A reward-guided self-improvement loop then expands task complexity and data quality across generations yielding UI-Genie-RM-517k and UI-Genie-Agent-16k the first large-scale reward-specific GUI datasets. After three cycles UI-Genie sets new"
X Link 2025-11-13T04:12Z 12.1K followers, [----] engagements
"Can AI learn what to remember and when to update its memory Mem- uses reinforcement learning to teach LLM agents how to manage complex multi-component memory systemswithout relying on hand-crafted rules. Trained on diverse multi-turn interactions the agent learns to extract store and update information with rewards tied to downstream QA accuracy. Results: strong gains over existing memory-augmented agents and impressive generalizationhandling 400k+ token histories despite training only on 30k-token examples. Mem-: Learning Memory Construction via Reinforcement Learning Anuttacon UC San Diego"
X Link 2025-11-13T10:17Z 12.1K followers, [----] engagements
"Cool another world model PAN is a general world model that turns language-specified actions into long-horizon high-fidelity video predictions. Unlike typical prompt-to-video generators PAN maintains causal control interactivity and consistent dynamics across diverse environments. It fuses an LLM-based latent dynamics backbone with a video diffusion decoder allowing both abstract reasoning and realistic visual rollout. Trained on large video-action datasets PAN shows strong performance in action-conditioned simulation long-range forecasting and simulative reasoning"
X Link 2025-11-14T01:48Z 12.1K followers, [----] engagements
"PAN: A World Model for General Interactable and Long-Horizon World Simulation Mohamed bin Zayed University of Artificial Intelligence Paper: https://arxiv.org/abs/2511.09057 https://arxiv.org/abs/2511.09057 https://arxiv.org/abs/2511.09057 https://arxiv.org/abs/2511.09057"
X Link 2025-11-14T01:48Z 12.1K followers, [---] engagements
"New paper surveys the rise of Graph-Augmented LLM Agents (GLA). It shows how graphs can boost LLM agents in planning memory tool use and multi-agent coordination. It maps current progress gaps and future directions toward scalable unified and multimodal GLA systems. Graph-Augmented Large Language Model Agents: Current Progress and Future Prospects Griffith NUS NTU Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/wCxmqIS7OXe9BRByXK8JaQ https://github.com/Shiy-Li/Awesome-Graph-augmented-LLM-Agent https://arxiv.org/abs/2507.21407"
X Link 2025-11-14T03:33Z 12.5K followers, [----] engagements
"@wondering_camel TOON appears to specify the number of data points in advance enabling the LLM to make better judgments"
X Link 2025-11-14T07:18Z 12.1K followers, [--] engagements
"Cool 3D objects could be turned into editable code MeshCoder makes it possible reconstructing complex shapes from point clouds into Blender Python scripts. With expressive APIs a large object-code dataset and a multimodal LLM it enables precise shape-to-code reconstruction intuitive editing and deeper 3D reasoning. MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds SAIL Tsinghua and others Paper: Project: Code: Model: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/Cov0wEcfpkjkraPezrHKVw https://huggingface.co/InternRobotics/MeshCoder"
X Link 2025-11-15T01:52Z 12.5K followers, 13.1K engagements
"How can we merge countless fine-tuned expert models into one universal multi-task model without retraining or data leakage RobustMerge tackles this challenge with a training-free parameter-efficient merging method. By preserving direction robustness through low-rank analysis and cross-task normalization it unifies diverse models while maintaining strong generalization across multimodal tasks. RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness CAS HKISI-CAS Sun Yat-sen Peking Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-11-15T08:32Z 12.5K followers, [----] engagements
"Can LLMs learn to act like real doctors instead of just summarizing cases DiagAgent does exactly that trained with reinforcement learning in a simulated clinical world (DiagGym) it learns to plan tests reason across turns and make final diagnoses. Outperforming GPT-4o DeepSeek-v3 and others by large margins it shows that interactive training unlocks truly adaptive diagnostic intelligence. Evolving Diagnostic Agents in a Virtual Clinical Environment SJTU SAIL and others Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/byVmM-0HttYRF5Vb7LlEBw"
X Link 2025-11-16T23:49Z 12.4K followers, [----] engagements
"What if robots could understand what you want without being told RoboOmni makes that possible an omni-modal LLM that fuses speech sound and vision to infer human intent confirm actions and execute tasks. Trained on the new OmniAction dataset (140k episodes) it outperforms text- and ASR-based baselines in success rate speed and proactive assistance paving the way for more intuitive human-robot collaboration. RoboOmni: Proactive Robot Manipulation in Omni-modal Context Fudan SII NUS Paper: Code: Project: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-11-17T08:52Z 12.6K followers, [----] engagements
"Huge @TianhongLi6 & Kaiming He (inventor of ResNet) just Introduced JiT (Just image Transformers) JiTs are simple large-patch Transformers that operate on raw pixels no tokenizer pre-training or extra losses needed. By predicting clean data on the natural-data manifold JiT excels in high-dimensional spaces where traditional noise-predicting models can fail. On ImageNet (256 & 512) JiT achieves competitive generative performance showing that sometimes going back to basics is the key"
X Link 2025-11-18T08:35Z 12.8K followers, 159.5K engagements
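The "large-patch, raw-pixel" input pipeline the JiT post describes is simple enough to show directly. The image and patch sizes below are toy numbers of my choosing, not the paper's configuration: each large patch of raw pixels is flattened into one high-dimensional token, with no tokenizer in between.

```python
import numpy as np

# Patchify raw pixels into transformer tokens (toy sizes for illustration):
# a 64x64 RGB image with 16x16 patches yields 16 tokens of dimension
# 16*16*3 = 768, fed to the transformer with no tokenizer or VAE.

H = W = 64; C = 3; P = 16
img = np.zeros((H, W, C))                          # placeholder image

patches = img.reshape(H // P, P, W // P, P, C).swapaxes(1, 2)
patches = patches.reshape(-1, P * P * C)           # (num_patches, patch_dim)

assert patches.shape == (16, 768)                  # 4x4 grid of 768-d tokens
```

The large patch size is the point: the per-token dimensionality is high, which is exactly the regime where the post says predicting clean data works better than predicting noise.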
"Another great paper from Kaiming Hes team ARC Is a Vision Problem They propose VARC which reframes ARC as an image-to-image translation task: ARC problems are rendered onto a visual canvas and a standard Vision Transformer (ViT) learns exclusively from ARC data then adapts via test-time training. The result 60.4% on ARC-1 far surpassing other from-scratch approaches and competitive with top LLMs and bringing machine reasoning closer to human-level performance through vision-first modeling. https://twitter.com/i/web/status/1991064918028660831 https://twitter.com/i/web/status/1991064918028660831"
X Link 2025-11-19T08:44Z 15.1K followers, 17.5K engagements
"ByteDance presents Depth Anything [--]. A single plain transformer could beat every visual geometry model before it; Depth Anything [--] shows exactly that. By using a simple backbone and one depth-ray target, DA3 outperforms prior SOTA across camera pose, any-view geometry, and visual rendering. It beats VGGT by [----] percent in pose accuracy and [----] percent in geometry accuracy, and even surpasses DA2 in monocular depth"
X Link 2025-11-19T09:14Z 12.7K followers, [----] engagements
"Depth Anything 3: Recovering the Visual Space from Any Views Paper: Project: Code: Demo: Our report: https://mp.weixin.qq.com/s/gi1546oAXky2EiNwdPE2SA https://huggingface.co/spaces/depth-anything/depth-anything-3 https://github.com/ByteDance-Seed/Depth-Anything-3 https://depth-anything-3.github.io https://arxiv.org/abs/2511.10647"
X Link 2025-11-19T09:14Z 12.4K followers, [---] engagements
"Dingtalk DeepResearch performs impressively well on DeepResearch Bench. It's a unified multi-agent intelligence framework for real-world enterprise environments, delivering deep research, heterogeneous table reasoning, and multimodal report generation. Dingtalk DeepResearch: A Unified Multi Agent Framework for Adaptive Intelligence in Enterprise Environments Paper: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/qTvvop0q4e5s03vqNAUaag https://arxiv.org/abs/2510.24760"
X Link 2025-11-19T09:21Z 12.4K followers, [----] engagements
"Can LLMs really behave like human investors? How do micro-level behaviors drive macro-level market dynamics? TwinMarket offers an answer by placing thousands of LLM-driven investors in a realistic stock market environment that incorporates social networks, news, and behavioral biases. This setup lets us watch bubbles, crashes, and herding emerge from individual decisions. Calibrated on real market data and grounded in behavioral finance, TwinMarket scales to 1000+ agents, reproduces key stylized market facts (volatility clustering, fat tails, etc.), and reveals how social interaction and cognitive"
X Link 2025-11-19T12:00Z 12.7K followers, [----] engagements
"If someone told you: "Forget staged reinforcement learning, curriculum learning, and dynamic hyperparameter tuning; just use the most basic RL recipe and you'll achieve state-of-the-art (SOTA) performance in math reasoning for small models," would you believe it? A team from Tsinghua University answered that question with two 1.5B-parameter models: not only is it possible, it's remarkably efficient. - Key finding: single-stage training + fixed hyperparameters = SOTA performance + 50% less compute. - Unexpected bonus: the training curve was textbook-smooth; no "typical" issues encountered even after 4000"
X Link 2025-11-19T20:09Z 12.4K followers, [----] engagements
"Cogito v2.1 671B is a DeepSeek-V3 variant/fork that's cheaper to run but doesn't appear to offer noticeably better performance compared to DeepSeek-V3.2. Today we are releasing the best open-weight LLM by a US company: Cogito v2.1 671B. On most industry benchmarks and our internal evals the model performs competitively with frontier closed and open models while being ahead of any US open model (such as the best versions of https://t.co/F6eZnn8s2Q"
X Link 2025-11-20T09:01Z 12.6K followers, [----] engagements
"Why do frontier models like GPT-5 still stumble on puzzles a child can solve with a glance? A new study argues that the missing piece is visual abstraction. By pairing vision for global pattern discovery with language for precise rule execution, their VLSR plus MSSC approach boosts ARC-AGI performance by up to [----] percent across multiple flagship models. A step toward more human-like, generalizable reasoning. Think Visually Reason Textually: Vision-Language Synergy in ARC CUHK SAAI SII Paper: https://arxiv.org/abs/2511.15703"
X Link 2025-11-20T09:21Z 12.4K followers, [----] engagements
"DeepSeek just released LPLB on GitHub. Linear-Programming-Based Load Balancer (LPLB) is a parallel load balancer that leverages linear programming to optimize expert-parallel workload distribution for MoE (Mixture-of-Experts) models. Link: https://github.com/deepseek-ai/LPLB"
X Link 2025-11-20T10:14Z 12.7K followers, [----] engagements
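The min-max LP behind a balancer like this is compact enough to sketch. The toy below is my own illustration, not DeepSeek's LPLB code (`balance_loads` and the setup are hypothetical): it splits each expert's token load fractionally across devices so the busiest device carries as little as possible.

```python
import numpy as np
from scipy.optimize import linprog

def balance_loads(expert_loads, n_devices):
    """Minimize the max per-device load. Variables: x[i, j] = fraction of
    expert i's load placed on device j, plus an auxiliary bound t."""
    n = len(expert_loads)
    n_vars = n * n_devices + 1                      # all x entries + t
    c = np.zeros(n_vars)
    c[-1] = 1.0                                     # objective: minimize t
    # Per-device constraint: sum_i load_i * x[i, j] - t <= 0
    A_ub = np.zeros((n_devices, n_vars))
    for j in range(n_devices):
        for i in range(n):
            A_ub[j, i * n_devices + j] = expert_loads[i]
        A_ub[j, -1] = -1.0
    b_ub = np.zeros(n_devices)
    # Per-expert constraint: fractions sum to 1.
    A_eq = np.zeros((n, n_vars))
    for i in range(n):
        A_eq[i, i * n_devices:(i + 1) * n_devices] = 1.0
    b_eq = np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n_vars)
    return res.x[-1]                                # optimal max device load

# Four skewed experts over two devices: a perfect 50/50 split exists.
print(balance_loads([40, 30, 20, 10], 2))
```

The real LPLB works with replicated experts inside an EP group; this sketch only shows the core min-max linear program.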
"A smarter, faster way to run long-context LLMs: UNComp tackles LLM long-context inference by using uncertainty to reveal hidden sparsity in KV caches. ✅ Cuts KV cache to 4.74% of original ✅ 6% faster prefill ✅ 6.4x higher throughput Unlike uniform compression, UNComp adapts dynamically, unlocking retrieval heads & layers while staying lossless. UNComp: Can Matrix Entropy Uncover Sparsity -- A Compressor Design from an Uncertainty-Aware Perspective Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/GwdNCPEw8JUCzb9eOE6j9A https://github.com/menik1126/UNComp"
X Link 2025-11-20T14:21Z 12.4K followers, [----] engagements
"The third AIMO Progress Prize is live. Public testing runs until next April. [---] hand-picked, AI-hard math problems, bigger compute budgets, and new prizes for datasets, writeups, and even pure math insights. All code and data must be open to qualify. Total prize pool: over $2.2M. Details: https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-3/overview"
X Link 2025-11-21T07:28Z 12.5K followers, [---] engagements
"Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models. The title says it all. The newly proposed VFM-VAE bypasses distillation by integrating VFMs through a redesigned decoder with Multi-Scale Latent Fusion and Progressive Resolution Reconstruction. A new SE-CKNNA metric guides tokenizer-diffusion alignment, dramatically accelerating training. Results: gFID [----] in [--] epochs (10x faster than prior methods) and [----] at [---] epochs, showing direct VFM integration as a new LDM paradigm. Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-11-21T10:58Z 12.5K followers, 10.4K engagements
"It's time to rethink model merging. Functional Dual Anchors (FDAs): a fresh approach that operates in input-representation space, not just weights. Instead of wrestling with inconsistent parameters, FDAs use synthetic inputs whose gradients align with task-specific shifts, capturing how tasks change model behavior functionally. ✨ Bridges multi-task training + post-hoc merging ✨ Comes with a principled initialization ✨ Complements existing parameter-space methods Model Merging with Functional Dual Anchors CUHK Westlake Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-11-21T19:04Z 12.5K followers, [----] engagements
"As the ancient Chinese proverb goes: steal a needle when young, steal gold when grown. Anthropic uncovered an AI "broken windows" effect: all they did was teach it to cut corners a little, and it ended up learning to lie and cause havoc. The fix? Surprisingly counterintuitive: just tell the AI "It's okay to cheat." New Anthropic research: Natural emergent misalignment from reward hacking in production RL. Reward hacking is where models learn to cheat on tasks they're given during training. Our new study finds that the consequences of reward hacking, if unmitigated, can be very serious."
X Link 2025-11-22T02:44Z 12.8K followers, [----] engagements
"Nice: you could train giant neural networks with no backprop and still keep it fast. EGGROLL makes it happen. By swapping full-rank ES perturbations for low-rank ones, it cuts memory and compute from nd to n plus kd while provably converging to full-rank updates at a 1/k rate. It matches ES in tabula rasa RL, rivals GRPO for LLM reasoning, and even enables stable pre-training of fully integer recurrent LMs"
X Link 2025-11-22T03:32Z 12.4K followers, [----] engagements
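The memory arithmetic in that claim is easy to check. A minimal numpy sketch (my own illustration, not the EGGROLL implementation; `low_rank_perturbation` is a hypothetical name): sample A (n×k) and B (k×d) instead of a full n×d noise matrix, so a perturbation costs n·k + k·d numbers instead of n·d while having rank at most k.

```python
import numpy as np

def low_rank_perturbation(n, d, k, sigma=0.1, seed=0):
    """ES-style noise E = sigma * A @ B / sqrt(k): rank <= k, but only
    n*k + k*d numbers need to be sampled and stored, not n*d."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, k))
    B = rng.standard_normal((k, d))
    return sigma * (A @ B) / np.sqrt(k)

E = low_rank_perturbation(512, 512, k=8)
print(E.shape, np.linalg.matrix_rank(E))             # full size, rank 8
print("stored numbers:", 512 * 8 + 8 * 512, "vs full:", 512 * 512)
```

For n = d = 512 and k = 8 this stores 8,192 numbers instead of 262,144, a 32x reduction, which is where the n+kd vs nd scaling comes from.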
"What if 3D Gaussian Splatting could achieve the same quality with just 10% of the Gaussians? A new method casts 3DGS compaction as global Gaussian mixture reduction via optimal transport. It first compresses geometry using KD-tree-based transport divergence minimization, then fine-tunes color and opacity with far fewer primitives. Results: negligible loss in PSNR, SSIM, and LPIPS, outperforming prior compaction methods; lightweight, efficient, and compatible with any 3DGS pipeline. Gaussian Herding across Pens: An Optimal Transport Perspective on Global Gaussian Reduction for 3DGS RUC Tsinghua Code:"
X Link 2025-11-22T08:11Z 12.8K followers, 11.8K engagements
"What if LLMs placed objects step by step instead of just listing constraints? This paper proposes an imperative approach to 3D scene layout: the model iteratively positions each object based on previous placements, with an error-correction mechanism refining validity while respecting the original plan. Results: human participants preferred these layouts 82–94% of the time over declarative methods, and a new automated metric aligns well with human judgment; simpler, more robust, and better for complex scenes. Procedural Scene Programs for Open-Universe Scene Generation: LLM-Free Error Correction via"
X Link 2025-11-22T12:19Z 12.7K followers, [----] engagements
"What if LLM agents could scale RL training without human-crafted tasks or ground-truth answers? Alibaba's search self-play (SSP) framework turns the agent into both task proposer and problem solver. The proposer generates deep search queries with verifiable answers; the solver attempts to answer them; and a RAG check validates each task using all retrieved evidence. Difficulty increases over time and both sides co-evolve. Across benchmarks, SSP consistently boosts search-agent performance in both from-scratch and continued RL; fully unsupervised, fully scalable. Search Self-Play: Pushing the"
X Link 2025-11-23T03:48Z 12.5K followers, [----] engagements
"It turns out VLMs could run fast and light without collapsing in accuracy. A new method applies SVD to the joint QKV weights plus a dynamic rank allocation strategy that keeps accuracy high while slashing KV cache size and compute. Adding activation and weight quantization makes the model even more efficient. The result: over 10% accuracy gains compared to prior SVD or quant-only approaches, with far lower hardware cost, enabling real-time VLMs on constrained devices. QSVD: Efficient Low-rank Approximation for Unified Query-Key-Value Weight Compression in Low-Precision Vision-Language Models New"
X Link 2025-11-23T21:58Z 12.5K followers, 13.7K engagements
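The core compression step is a truncated SVD on the stacked projections; here is a numpy sketch under my own naming (`low_rank_qkv` is hypothetical, and the paper's dynamic per-layer rank allocation and quantization are omitted).

```python
import numpy as np

def low_rank_qkv(Wq, Wk, Wv, rank):
    """Stack the Q/K/V projection matrices, truncate their joint SVD, and
    return two shared factors with W_qkv ~= U_r @ Vt_r."""
    W = np.concatenate([Wq, Wk, Wv], axis=0)        # shape (3d, d)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]                    # absorb singular values
    return U_r, Vt[:rank]

rng = np.random.default_rng(0)
d = 64
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
U_r, Vt_r = low_rank_qkv(Wq, Wk, Wv, rank=32)
full = np.concatenate([Wq, Wk, Wv], axis=0)
rel_err = np.linalg.norm(full - U_r @ Vt_r) / np.linalg.norm(full)
print("relative error at rank 32:", round(rel_err, 3))
```

Sharing one factorization across Q, K, and V is what lets the method shrink both the weights and the cached K/V activations at once.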
"A tiny synthetic dataset could outperform real images for training linear probes on giant vision models. MIT built Linear Gradient Matching for exactly this. By matching real-data gradients through frozen backbones, it creates synthetic images that beat real-image baselines, transfer across models like DINO to CLIP, excel at fine-grained tasks, and even expose embedding-space similarities and spurious correlations"
X Link 2025-11-24T06:48Z 12.7K followers, 12.8K engagements
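For the linear-probe case the matching objective is simple to write down. A sketch with assumed names (`probe_grad`, `grad_match_loss` are mine; the actual method optimizes synthetic pixels through a frozen backbone, which is omitted here):

```python
import numpy as np

def probe_grad(features, labels, W):
    """Gradient of softmax cross-entropy for a linear probe W (C x d)
    on frozen backbone features (B x d)."""
    logits = features @ W.T
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(labels)), labels] -= 1.0        # softmax minus one-hot
    return p.T @ features / len(labels)

def grad_match_loss(real_f, real_y, syn_f, syn_y, W):
    """1 - cosine similarity between real and synthetic probe gradients;
    driving this to zero makes the tiny synthetic set push the probe in
    the same direction as the real data."""
    g_r = probe_grad(real_f, real_y, W)
    g_s = probe_grad(syn_f, syn_y, W)
    cos = (g_r * g_s).sum() / (np.linalg.norm(g_r) * np.linalg.norm(g_s))
    return 1.0 - cos
```

In the paper the synthetic features come from rendering learnable images through the frozen backbone; here they are just arrays, which is enough to see the objective's shape.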
"DeepMind just discovered pixel-by-pixel autoregressive modeling could scale into a truly unified vision paradigm. Their new study maps its scaling laws across 7e19 FLOPs, finds sharply different optima for classification versus generation, reveals that higher resolutions demand model size grow much faster than data, and shows compute, not data, is the real bottleneck. Extrapolating current trends, fully pixel-level vision models could be feasible within five years"
X Link 2025-11-24T06:53Z 12.8K followers, 50.2K engagements
"What if recommender systems could finally enjoy the same scaling gains as LLMs? MiniOneRec is the first fully open-source generative recsys stack to test that idea end to end. Using quantized VAEs to build semantic IDs and post-training Qwen models up to 7B, it shows losses drop cleanly with scale, and further boosts accuracy and diversity through full-process SID alignment plus lightweight RL with constrained decoding. MiniOneRec: An Open-Source Framework for Scaling Generative Recommendation Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-11-25T19:40Z 12.6K followers, [----] engagements
"Wow: CUDA kernels could be auto-generated and auto-optimized without any training or GPU-hungry pipelines. CudaForge shows it is possible. Using a coder-plus-judge agent loop with real hardware feedback, it reaches [----] percent correctness and 1.68x speed over PyTorch while generalizing across GPUs and base models. And it does this in about [----] minutes and [---] dollars per kernel instead of [--] H100 hours and [--] dollars. CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-11-26T02:56Z 12.8K followers, [----] engagements
"AI agents could skip language entirely and communicate mind to mind. This work introduces thought communication a latent variable framework that identifies shared and private thoughts across agents and recovers the global structure of who shares what. The approach extracts these hidden thoughts before interaction and routes them to each agent boosting collaboration across synthetic and real benchmarks and opening a path beyond surface level language. Thought Communication in Multiagent Collaboration Paper: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-11-26T15:05Z 12.8K followers, 33.4K engagements
"Congratulations to Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Their paper Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks has been awarded the Test of Time Paper Award at NeurIPS [----]. The Faster R-CNN paper has been cited more than [-----] times. It has deeply influenced the computer vision field, becoming a backbone for many follow-up works"
X Link 2025-11-27T02:42Z 12.7K followers, [---] engagements
"This new optimizer can make training giant LLMs both more stable and more precise, even under noise and extreme scale. Huawei just introduced ROOT, a Robust Orthogonalized Optimizer that tackles two big weaknesses in recent momentum-orthogonalized methods: - Dimensional fragility (orthogonalization breaks as model size grows) - Sensitivity to outlier noise ROOT brings two layers of robustness: - Dimension-robust orthogonalization via adaptive Newton iterations with size-aware coefficients - Optimization-robust updates using proximal methods that dampen harmful outliers while preserving useful"
X Link 2025-11-27T04:05Z 12.8K followers, 43.6K engagements
"ROOT: Robust Orthogonalized Optimizer for Neural Network Training Huawei Noah's Ark Lab Paper: Code: Our report: https://mp.weixin.qq.com/s/X7dNh8lwr0xVW7TsuO4D2g https://github.com/huawei-noah/noah-research/tree/master/ROOT https://arxiv.org/abs/2511.20626"
X Link 2025-11-27T04:07Z 12.7K followers, [----] engagements
"DeepSeek just released DeepSeek-Math-V2. DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning. It shows LLMs can now self-verify proofs, not just output solutions. DeepSeekMath-V2 achieves gold-level IMO [----], CMO [----], and 118/120 Putnam [----], pointing to a future of deep, trustworthy mathematical reasoning"
X Link 2025-11-27T11:10Z 12.9K followers, 35.7K engagements
"What if video generation could follow any semantic instruction without retraining or task-specific hacks? Enter Video-As-Prompt (VAP). By treating a reference video as an in-context semantic prompt and steering a frozen Video DiT with a plug-and-play MoT expert plus temporally biased position embeddings, it avoids artifacts, prevents forgetting, and delivers strong zero-shot control. Trained on the new 100K-pair VAP-Data, it reaches a 38.7% user preference rate, rivaling specialized commercial models. Video-As-Prompt: Unified Semantic Control for Video Generation ByteDance CUHK Project: Paper:"
X Link 2025-11-27T13:14Z 12.8K followers, [----] engagements
"What if we could model human visual cortex responses without expensive, time-consuming fMRI data for every new subject? This study introduces BraInCoRL, a transformer that uses in-context learning to predict voxelwise neural activity from just a few examples; no extra finetuning needed for novel people or stimuli. Trained to flexibly condition on variable image-stimulus pairs across multiple subjects, it outperforms existing designs in low-data scenarios, generalizes to entirely new fMRI datasets with different subjects and acquisition setups, and even links natural language queries to voxel"
X Link 2025-11-28T13:06Z 12.7K followers, [----] engagements
"Can AI finally grasp the unspoken intentions and emotions that make human social interactions tick? MetaMind helps large language models bridge that gap by breaking social reasoning into three collaborative stages: first guessing a user's mental state, then refining those ideas with cultural norms and ethics, and finally crafting responses that align with what's inferred. The result? State-of-the-art performance across tough benchmarks, including a 35.7% boost in real-world social scenarios, and even matching human-level skills on key Theory of Mind tasks for the first time. MetaMind: Modeling Human"
X Link 2025-11-29T19:21Z 12.8K followers, 12.6K engagements
"Ever wondered why large reasoning models sometimes overcomplicate problems? This study finds shorter reasoning paths consistently outperform longer ones across stochastic decodes, but exhaustive exploration of the tree-like reasoning space is impossible due to exponential growth. Enter DTS, a model-agnostic decoding framework that sketches the space by branching only at high-entropy tokens and uses early stopping to pick the shortest completed path. No extra training needed: tests on AIME2024/2025 with DeepSeek-R1-Distill-Qwen models boosted accuracy by up to 8%, cut reasoning length by 23%, and"
X Link 2025-11-30T21:33Z 13.1K followers, [----] engagements
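The branching rule reduces to a per-step entropy test. A stdlib-only sketch (my own function names and threshold, not the paper's code):

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_branch(probs, threshold=1.0):
    """DTS-style rule: expand multiple children only where the model is
    uncertain; greedy-decode confident steps to keep the tree small."""
    return token_entropy(probs) > threshold

print(should_branch([0.97, 0.01, 0.01, 0.01]))  # confident step: False
print(should_branch([0.25, 0.25, 0.25, 0.25]))  # uncertain step: True
```

Branching only at high-entropy steps is what keeps the sketched tree polynomial in practice while still covering the paths where decodes genuinely diverge.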
"Can modern AI nail playing a convincing villain, or does safety alignment kill the act? Tencent & SYSU find state-of-the-art LLMs lose role-playing fidelity the more morally ambiguous or antagonistic the character, struggling most with traits like deceit or manipulation, swapping nuanced malevolence for shallow aggression. Even top chatbots flop at villain roles if they're highly safety-aligned, showing a big tension between keeping AI safe and letting it create authentic, complex fictional personas. Too Good to be Bad: On the Failure of LLMs to Role-Play Villains Paper: Project: Our report: 📬"
X Link 2025-12-01T19:45Z 13.1K followers, [----] engagements
"One-step generative models could match multi-step methods. A new work introduces iMF, an improved MeanFlow that fixes unstable training and rigid guidance by reformulating the velocity loss and treating guidance as explicit conditioning. Trained from scratch, iMF hits [----] FID at 1-NFE on ImageNet [------], outperforming all prior one-step approaches and pushing fastforward generation toward a standalone paradigm. Improved Mean Flows: On the Challenges of Fastforward Generative Models Paper: https://arxiv.org/abs/2512.02012v1"
X Link 2025-12-02T07:12Z 12.8K followers, [----] engagements
"Diffusion Language Models are hyped lately but hard to reproduce due to missing frameworks and high training costs. Berkeley and UIUC show a surprisingly simple path: using their dLLM toolkit, they teach BERT to chat via discrete diffusion. No generative pretraining, about [--] GPU hours, and ModernBERT large chat v0 reaches near Qwen1.5 0.5B quality with only lightweight SFT. Even better, they open-sourced the full training and inference pipeline plus a Hello World example, along with the extensible dllm framework. Efficient, cheap, and beginner-friendly. dLLM - BERTs that chat with diffusion"
X Link 2025-12-02T18:15Z 13K followers, 31.2K engagements
"UniLumos drops a big upgrade to image and video relighting. Diffusion models can do cool lighting effects but semantic-space optimization often breaks physics. UniLumos fixes this by injecting RGB-space geometry feedback into a flow-matching backbone supervising with depth and normals from its own outputs. Path consistency learning keeps this supervision stable even with few training steps. The team also built a 6D lighting annotation protocol and LumosBench a disentangled benchmark that scores lighting control with VLMs. The result: SOTA physical consistency and up to 20x faster relighting."
X Link 2025-12-03T01:21Z 12.8K followers, [----] engagements
"LORE: A Large Generative Model for Search Relevance Paper: https://arxiv.org/abs/2512.03025"
X Link 2025-12-03T06:22Z 12.8K followers, [---] engagements
"What if a single foundation model could revolutionize both autonomous driving and embodied AI? Xiaomi has open-sourced MiMo-Embodied, the first cross-embodied foundation model. It achieves SOTA performance in both fields, setting records in [--] embodied AI benchmarks and excelling in [--] autonomous driving benchmarks. It significantly outperforms existing baselines. Their study shows positive transfer between the two domains through specific learning and fine-tuning methods, and they offer detailed model and training insights for future research. MiMo-Embodied: X-Embodied Foundation Model Paper:"
X Link 2025-12-03T07:03Z 12.8K followers, [---] engagements
"Wow: a promising step toward practical, efficient compute-in-memory systems. A new memristor-based ADC with adaptive quantization shows the possibility: analog AI hardware could unlock its full potential without bulky converters in the way. It delivers strong CIFAR10 and ImageNet performance at just [--] bits, achieves up to 15.1x better energy efficiency and 12.9x smaller area, and cuts CIM system overhead by more than half. Memristor-based adaptive analog-to-digital conversion for efficient and accurate compute-in-memory Paper: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-12-03T14:27Z 12.8K followers, [---] engagements
"OpenAI just published a technical blog post stating: confessions can keep language models honest"
X Link 2025-12-04T00:38Z 12.8K followers, [----] engagements
"How confessions can keep language models honest OpenAI https://openai.com/index/how-confessions-can-keep-language-models-honest/"
X Link 2025-12-04T00:38Z 12.8K followers, [---] engagements
"Can a single open model truly understand and generate across all modalities? Uni MoE [---] from the Lychee family shows it can, with a new dynamic-capacity MoE design, progressive multimodal training, and curated data across text, images, speech, and video. Trained on 75B tokens, it outperforms Qwen2.5 Omni on most benchmarks and posts strong gains in video understanding, omnimodal reasoning, audiovisual tasks, speech WER, and controllable image generation. Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE Training and Data Harbin Institute of Technology Shenzhen Paper:"
X Link 2025-12-04T01:37Z 15.1K followers, [----] engagements
"A must-read from Dr. Sebastian Raschka if you want to understand how DeepSeek's flagship open-weight models evolved. A Technical Tour of the DeepSeek Models from V3 to V3.2 Link: https://sebastianraschka.com/blog/2025/technical-deepseek.html"
X Link 2025-12-04T09:40Z 12.8K followers, 11.2K engagements
"Can a foundation model truly decode the brain without understanding its scales? CSBrain says no. It introduces cross-scale tokenization and structured sparse attention to capture fast bursts, slow rhythms, local regions, and global interactions. Tested on [--] tasks across [--] datasets, CSBrain outperforms all baselines and shows cross-scale modeling is essential for generalized EEG decoding. CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding SAIL SYSU CUHK Karlsruher Paper: Github: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-12-05T01:08Z 13.1K followers, [----] engagements
"Is this Yann LeCun's first paper after leaving Meta? It demonstrates how humanoid robots can mimic actions from AI-generated videos, which are often too noisy for direct imitation. The system lifts the video into 3D keypoints and then uses a physics-aware policy to execute the motions, enabling zero-shot control. They implemented this on the Unitree G1 humanoid robot"
X Link 2025-12-05T08:43Z 13.1K followers, 21.3K engagements
"Can a single image give rise to a full cast of coherent 3D parts? PartCrafter shows it can. It uses a compositional latent space and hierarchical attention to jointly generate multiple semantically distinct 3D meshes from one RGB input, no pre-segmentation needed. Built on a pretrained mesh DiT and backed by a new part-level dataset, it produces detailed, decomposable 3D parts even when they are hidden in the image. PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers Peking ByteDance CMU Project: Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-12-05T15:13Z 13K followers, [----] engagements
"Recommender systems can improve by modeling users. TagCF uses an LLM to extract tag-based logic graphs that reveal user roles and behavioral logic, then integrates them to boost ranking performance. Online and offline results show user-role modeling can outperform item-topic modeling, and the learned logic graphs transfer across recommendation tasks. Who You Are Matters: Bridging Topics and Social Roles via LLM-Enhanced Logical Recommendation Wuhan Kuaishou Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/9cAw5GWYRW8vZs7S2io7KA https://github.com/Code2Q/TagCF"
X Link 2025-12-05T20:21Z 12.8K followers, [----] engagements
"Can LVLMs finally stop hallucinating objects they never saw? Owl proposes a causal bi-modal attention reweighting framework that diagnoses low-VTACR moments where textual priors overpower vision and hallucinations emerge. By intervening on token and layer attention and running dual-path contrastive decoding, Owl sharply reduces hallucination on POPE and CHAIR while preserving core vision-language ability. Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs Paper: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-12-07T07:47Z 12.9K followers, [----] engagements
"Can sequential reasoning get smarter without exploding the action space? DynaAct does it. By extracting general sketches with LLMs, scoring candidate actions for utility and diversity via a submodular function, and greedily selecting a compact set, it boosts performance across six benchmarks while keeping inference efficient. DynaAct: Large Language Model Reasoning with Dynamic Action Spaces HKU Ant Group Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/hWdqk3ZYZJzd81-eXPa4jw https://github.com/zhaoxlpku/DynaAct https://arxiv.org/abs/2511.08043"
X Link 2025-12-08T02:10Z 13.6K followers, [----] engagements
"A transformer's attention could be 99% sparser without losing its smarts. New research from MPI-IS, Oxford, and ETH Zürich shows it can. A simple post-training method strips away redundant connections, revealing a cleaner, more interpretable circuit. This suggests much of the computation we rely on is just noise. Sparse Attention Post-Training for Mechanistic Interpretability Paper: https://arxiv.org/abs/2512.05865"
X Link 2025-12-08T08:42Z 13.1K followers, 29.3K engagements
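Magnitude-based post-hoc sparsification gives a feel for the numbers involved. Note the paper learns its sparsity pattern during post-training; this per-row top-k on a 2D attention matrix is only my stand-in, and `sparsify_attention` is a hypothetical name.

```python
import numpy as np

def sparsify_attention(attn, keep_frac=0.01):
    """Keep only the top-k weights per row of a 2D attention matrix and
    renormalize so each row still sums to 1."""
    k = max(1, int(attn.shape[-1] * keep_frac))
    idx = np.argsort(attn, axis=-1)[:, -k:]          # top-k column indices
    out = np.zeros_like(attn)
    np.put_along_axis(out, idx, np.take_along_axis(attn, idx, -1), -1)
    return out / out.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
attn = rng.random((4, 100))
attn /= attn.sum(axis=1, keepdims=True)              # valid attention rows
sparse = sparsify_attention(attn, keep_frac=0.05)    # keep 5 of 100
print((sparse > 0).sum(axis=1))                      # nonzeros per row
```

Because softmax attention concentrates most of its mass on a few positions, such aggressive pruning often changes the weighted sum far less than the 99% sparsity figure suggests.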
"Spatial understanding can be strengthened without costly supervision. Spatial-SSRL introduces a self-supervised RL framework that extracts verifiable signals from ordinary RGB or RGB-D images through five intrinsic tasks: shuffled patch reordering, flipped patch recognition, cropped patch inpainting, regional depth ordering, and relative 3D position prediction. These tasks require no human or LVLM labels and scale efficiently. Trained with this scheme, models improve spatial reasoning while preserving general visual ability, achieving average gains of [----] percent on 3B and [----] percent on 7B"
X Link 2025-12-08T19:29Z 13K followers, [----] engagements
"Ever wonder how a Transformer model really makes its decisions? Enter DePass, a framework that traces the flow of information inside the model in a single forward pass, offering a clearer window into its internal logic. DePass: Unified Feature Attributing by Simple Decomposed Forward Pass Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/-3TBpNIaFxLHn0fuigwd4g https://github.com/TsinghuaC3I/Decomposed-Forward-Pass https://arxiv.org/pdf/2510.18462"
X Link 2025-12-10T03:38Z 13K followers, [----] engagements
"Google just found the agentic scaling law. Forget "more agents is all you need." After [---] experiments across GPT, Gemini, and Claude, the results are in: - The 45% Trap: if a single agent has 45% accuracy, adding more agents often hurts performance. - Tool Tax: tool-heavy tasks suffer disproportionately from coordination overhead. - Error Spirals: independent agents amplify errors by 17.2x. They derived a formula that predicts the best architecture with 87% accuracy. Agent design just moved from alchemy to science. Towards a Science of Scaling Agent Systems Paper: https://arxiv.org/abs/2512.08296"
X Link 2025-12-11T08:17Z 13.7K followers, 105.1K engagements
"Yo, AI could design its own search strategy on the fly. This research uses LLMs to dynamically evolve the core "kernel" of Bayesian optimization, creating a system that adapts its own exploration method. This CAKE method, paired with a smart ranking system called BAKER, outperforms traditional approaches in tuning everything from neural networks to photonic chips. Adaptive Kernel Design for Bayesian Optimization Is a Piece of CAKE with LLMs Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/ccVEaQJDSH9gWVH_Vxx8CA https://github.com/richardcsuwandi/cake"
X Link 2025-12-11T13:57Z 13.1K followers, [----] engagements
"It turns out a robot could think like a committee of experts. This research shows how giving vision, touch, and other senses their own specialized "minds," then letting them vote on the best action, leads to far more robust and adaptive robotic manipulation. Multi-Modal Manipulation via Policy Consensus Paper: Project: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/SMIQ1Jv1heu0qNA4CM0VEg https://policyconsensus.github.io/ https://arxiv.org/pdf/2509.23468"
X Link 2025-12-12T14:53Z 13.6K followers, [----] engagements
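The voting idea can be sketched as confidence-weighted fusion. This toy (our weighting scheme and numbers, not the paper's consensus algorithm) shows a confident tactile expert dominating an uncertain visual one:

```python
import numpy as np

def policy_consensus(proposals, confidences):
    """Fuse per-modality action proposals by confidence-weighted averaging.
    Toy stand-in for the paper's consensus mechanism (names are ours)."""
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()
    return (w[:, None] * np.asarray(proposals)).sum(axis=0)

# Vision is unsure near an occlusion; touch is confident about contact.
vision_action = np.array([0.10, 0.00, -0.05])
touch_action  = np.array([0.02, 0.00, -0.20])
fused = policy_consensus([vision_action, touch_action], confidences=[0.2, 0.8])
# The fused action leans toward the confident tactile expert.
assert np.linalg.norm(fused - touch_action) < np.linalg.norm(fused - vision_action)
```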
"Apple briefly posted then quickly pulled an arXiv paper but the v1 snapshot is wild. The team reveals RLAX a scalable RL framework on TPUs. It's built with a parameter server design where a master trainer pushes weights and massive inference fleets pull them to generate rollouts. With new curation and alignment tricks and preemption friendly engineering RLAX boosts QwQ-32B pass@8 by [----] percent in only 12h48m on [----] v5p TPUs. RLAX: Large-Scale Distributed Reinforcement Learning for Large Language Models on TPUs Paper: https://arxiv.org/pdf/2512.06392v1 https://arxiv.org/pdf/2512.06392v1"
X Link 2025-12-12T15:58Z 13.1K followers, 24.1K engagements
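The push/pull pattern is the core of a parameter-server design. A minimal single-process sketch (class and method names are ours; the real RLAX shards weights across TPU fleets and handles preemption):

```python
import threading

class ParameterServer:
    """Minimal push/pull weight store in the spirit of RLAX's design.
    All names and the versioning scheme here are illustrative."""
    def __init__(self):
        self._lock = threading.Lock()
        self._weights, self._version = None, 0

    def push(self, weights):           # called by the master trainer
        with self._lock:
            self._weights, self._version = weights, self._version + 1

    def pull(self):                    # called by rollout/inference workers
        with self._lock:
            return self._weights, self._version

ps = ParameterServer()
ps.push({"w": [0.1, 0.2]})
ps.push({"w": [0.3, 0.4]})            # trainer publishes a newer snapshot
weights, version = ps.pull()          # workers always see the latest push
assert version == 2 and weights == {"w": [0.3, 0.4]}
```

The version counter is what lets rollout workers tell how stale their policy is relative to the trainer, a common need in asynchronous RL.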
"Now you could fine-tune a robot's brain like a large language model. ProphRL is a method that uses a learned world model and tailored reinforcement learning to efficiently align vision-language-action policies with real-world tasks boosting robot success rates by up to 30%. Reinforcing Action Policies by Prophesying Fudan SII Logos Robotics Paper: Project: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/RXe86_oWtIeJTNNwZwxPaw https://logosroboticsgroup.github.io/ProphRL https://arxiv.org/pdf/2511.20633 https://mp.weixin.qq.com/s/RXe86_oWtIeJTNNwZwxPaw"
X Link 2025-12-13T14:06Z 14.1K followers, [----] engagements
"What if a robot could see the world like a human separating what it sees from where it is Enter SpatialActor a system that disentangles semantics and geometry for robust manipulation. It achieves SOTA results excels under noise and improves few-shot learning by focusing on crucial spatial cues. SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation Tsinghua Dexmal MEFVII StepFun Paper: Project: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/fQke8kuvk2VSHHuP5wprzA https://shihao1895.github.io/SpatialActor/"
X Link 2025-12-13T19:08Z 13.7K followers, [----] engagements
"The machines are tuning themselves. Now AI could write faster code than Nvidia's own engineers New research shows an LLM+RL system called CUDA-L2 automatically optimizes GPU kernels beating cuBLAS by up to 26% in real-time inference. CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning Paper: Code: https://github.com/deepreinforce-ai/CUDA-L2 https://arxiv.org/abs/2512.02551 https://github.com/deepreinforce-ai/CUDA-L2 https://arxiv.org/abs/2512.02551"
X Link 2025-12-15T09:16Z 13.9K followers, 11.2K engagements
"LightX2V is an advanced lightweight video generation inference framework engineered to deliver efficient high-performance video synthesis solutions. This unified platform integrates multiple state-of-the-art video generation techniques supporting diverse generation tasks including text-to-video (T2V) and image-to-video (I2V). X2V represents the transformation of different input modalities (X such as text or images) into video output (V). LightX2V: Light Video Generation Inference Framework GitHub: Hugging Face: Project: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-12-16T14:20Z 13.9K followers, [----] engagements
"The [----] Foundation Model Transparency Index is here This report reveals a sharp drop in openness with average scores plummeting from [--] to [--]. While IBM leads with [--] xAI and Midjourney scored just [--]. - Google: [--] - OpenAI: [--] - DeepSeek: [--] - Qwen: [--] The [----] Foundation Model Transparency Index Paper: https://arxiv.org/abs/2512.10169v1 https://arxiv.org/abs/2512.10169v1"
X Link 2025-12-17T09:27Z 13.6K followers, [----] engagements
"What if AI could learn the art of conversation like a human This research challenges a year of focusing RL on logic showing its possible to optimize AI for personality and emotional depthand the results outperform leading models. Echo-N1: Affective RL Frontier Paper: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/oo-oSz3h51Iym1XBBFGHfw https://arxiv.org/pdf/2512.00344v1 https://mp.weixin.qq.com/s/oo-oSz3h51Iym1XBBFGHfw https://arxiv.org/pdf/2512.00344v1"
X Link 2025-12-18T01:31Z 13.9K followers, [----] engagements
"🚨 Big AI leadership news 🚨 @ShunyuYao12 Shunyu Yao () a rising star in AI agents and one of the key minds behind OpenAIs Deep Research and Computer-Using Agent (CUA) has just been appointed Chief AI Scientist at Tencent. Tencent builds WeChat Chinas super-app used by over a billion people and is also one of the worlds largest gaming companies. https://twitter.com/i/web/status/2001553502359925096 https://twitter.com/i/web/status/2001553502359925096"
X Link 2025-12-18T07:21Z 14K followers, 112.1K engagements
"Diffusion-based LLMs are fast and parallelizable but bidirectional attention makes inference expensive due to repeated prefill and decoding. Enter ODB-dLLM a dual-boundary framework with adaptive prefill length prediction and dLLM-specific jump-share speculative decoding. Result: [--] to 162x speedup over vanilla dLLMs and [----] to 6.30x over Fast-dLLM with less accuracy loss. Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-12-19T19:12Z 14.2K followers, [----] engagements
"What if a robot could be your copilot letting you teach it complex skills with minimal effort ByteDance presents a new Shared Autonomy framework A human uses VR to guide the robot's arm while an autonomous AI policy (DexGrasp-VLA) takes over the fine tactile work of the hand. This hybrid approach massively outperforms purely manual or fully automated data collection in quality and efficiency. It enables the training of end-to-end VLA policies that achieve 90% success on 50+ objects. End-to-End Dexterous Arm-Hand VLA Policies via Shared Autonomy: VR Teleoperation Augmented by Autonomous Hand"
X Link 2025-12-21T02:27Z 14.1K followers, [----] engagements
"LLaDA2.0 a new method that converts existing auto-regressive models into discrete diffusion models using a novel 3-phase training scheme. This approach preserves the model's learned knowledge while unlocking parallel decoding. The resulting models LLaDA2.0-mini (16B) and LLaDA2.0-flash (100B) outperform their predecessors in both performance and efficiency at the frontier scale. LLaDA2.0: Scaling Up Diffusion Language Models to 100B Paper: HuggingFace: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/IvYnrDAe7JbjrrIQKvIurA https://hf.co/collections/inclusionAI/llada-20"
X Link 2025-12-21T19:18Z 13.9K followers, [----] engagements
"What matters more for AI image generation: what a model "sees" or how it "thinks" Researchers from Adobe ANU NYU reveal a surprising answer. They found that the spatial structure of a teacher model's vision (how image parts relate) is far more important for training generative AI than its overall accuracy. Their simple 4-line code fix iREPA consistently speeds up training and outperforms previous methods like REPA and Meanflow across different models and scales. What matters for Representation Alignment: Global Information or Spatial Structure Paper: Project: Our report:"
X Link 2025-12-22T01:32Z 14K followers, [----] engagements
"What if a language model could check its own work for consistency as it writes Researchers from City University of Hong Kong Huawei Research and HKU present Coherent Contextual Decoding (CCD) for Diffusion Language Models. Instead of just looking at the next word's confidence CCD uses the entire sentence history to spot and reject incoherent paths early. It also dynamically allocates its "thinking" budget per step based on this coherence check. The result It significantly outperforms standard decoding on Dream & LLaDA benchmarks achieving up to 3.48x faster inference with a 3.91% quality"
X Link 2025-12-22T13:26Z 13.9K followers, [----] engagements
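The gist, stripped to a toy: rank candidate continuations by confidence plus a history-coherence term rather than confidence alone. Everything below (the vectors, the cosine stand-in for coherence, the λ weighting) is illustrative, not CCD's actual scoring:

```python
import numpy as np

def coherence_score(history_vec, cand_vec):
    """Cosine similarity as a stand-in coherence signal (illustrative)."""
    return float(history_vec @ cand_vec /
                 (np.linalg.norm(history_vec) * np.linalg.norm(cand_vec)))

def pick_candidate(history_vec, candidates, lam=1.0):
    """Rank candidates by next-step confidence PLUS history coherence,
    instead of confidence alone -- the shape of the CCD idea, not its math."""
    def score(c):
        return c["confidence"] + lam * coherence_score(history_vec, c["vec"])
    return max(candidates, key=score)

history = np.array([1.0, 0.0])
cands = [
    {"text": "on-topic",  "confidence": 0.55, "vec": np.array([0.9, 0.1])},
    {"text": "off-topic", "confidence": 0.60, "vec": np.array([0.0, 1.0])},
]
# Confidence alone would choose "off-topic"; the coherence term flips it.
assert pick_candidate(history, cands)["text"] == "on-topic"
```

Setting λ per step, rather than fixing it, would correspond to the dynamic "thinking budget" the post describes.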
"What if all AI models share a hidden low-dimensional "brain" Johns Hopkins University reveals that neural networks regardless of task or domain converge to remarkably similar internal structures. Their analysis of 1100+ models (Mistral ViT LLaMA) shows they all use a few key "spectral directions" to store information. This universal structure outperforms assumptions of randomness offering a blueprint for more efficient multi-task learning model merging and drastically cutting AI's computational and environmental costs. The Universal Weight Subspace Hypothesis Paper: Page: Our report:"
X Link 2025-12-24T01:47Z 14.5K followers, 77.9K engagements
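"A few shared spectral directions" can be demonstrated on synthetic weights: build matrices from one hidden subspace plus noise, then compare their top singular subspaces via principal angles. This is entirely a toy construction (the paper analyzes 1100+ real models):

```python
import numpy as np

def top_left_singvecs(W, k):
    """Top-k left singular vectors of a weight matrix."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k]

def subspace_overlap(U1, U2):
    """Mean squared cosine of principal angles between column spaces (0..1)."""
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return float((s ** 2).mean())

rng = np.random.default_rng(1)
d, k = 64, 4
shared = rng.standard_normal((d, k))          # hypothetical shared subspace

def fake_model_weights():
    coeffs = rng.standard_normal((k, d))
    return shared @ coeffs + 0.1 * rng.standard_normal((d, d))

A, B = fake_model_weights(), fake_model_weights()
R = rng.standard_normal((d, d))               # an unrelated random matrix
overlap_models = subspace_overlap(top_left_singvecs(A, k), top_left_singvecs(B, k))
overlap_random = subspace_overlap(top_left_singvecs(A, k), top_left_singvecs(R, k))
assert overlap_models > 0.9 > overlap_random  # shared directions dominate
```

The random-matrix baseline is the "assumption of randomness" the hypothesis outperforms: unrelated subspaces of dimension k in d dimensions overlap only ~k/d on average.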
"What if AI could dub movies with the emotional depth of a real actor 🎬 Enter Authentic-Dubber. Instead of just matching lips to text it mimics a real director-actor workflow. An AI "director" uses an LLM to understand emotion then retrieves & feeds the best emotional cues to an AI "actor" for speech generation. It outperforms existing methods in emotional expressiveness setting a new standard on the V2C Animation benchmark for authentic movie dubbing. Towards Authentic Movie Dubbing with Retrieve-Augmented Director-Actor Interaction LearningAAAI [----] Paper: Code: Our report:"
X Link 2025-12-24T08:25Z 13.9K followers, [----] engagements
"LeCun's JEPA has evolved into a vision-language model with 1.6B parameters rivaling the 72B Qwen-VL. Instead of predicting words directly the proposed VL-JEPA learns to predict the core "meaning" of a text in an abstract space ignoring surface-level wording variations. This method outperforms standard token-based training with 50% fewer parameters. It beats models like CLIP & SigLIP2 on video classification/retrieval tasks and matches larger VLMs on VQA while using a decoder only when needed to cut decoding ops by nearly 3x. VL-JEPA: Joint Embedding Predictive Architecture for Vision-language"
X Link 2025-12-26T09:23Z 15.2K followers, 113.6K engagements
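The core objective swap: predict a meaning vector and penalize distance in embedding space, so paraphrases that differ only in wording cost almost nothing. The tiny hand-made "embeddings" below are ours, purely to show the shape of the loss:

```python
import numpy as np

def latent_loss(pred_vec, target_vec):
    """L2 distance in embedding space -- the flavor of objective VL-JEPA
    uses instead of per-token cross-entropy (toy vectors, our embeddings)."""
    return float(np.linalg.norm(pred_vec - target_vec))

# Two surface-level paraphrases of the same caption share one meaning vector,
# while an unrelated caption sits elsewhere in the space.
emb = {
    "a dog runs on grass":      np.array([0.9, 0.1, 0.0]),
    "a dog is running on turf": np.array([0.88, 0.12, 0.02]),
    "stock prices fell":        np.array([0.0, 0.1, 0.95]),
}
pred = np.array([0.9, 0.1, 0.0])  # model's predicted meaning vector
assert latent_loss(pred, emb["a dog is running on turf"]) < 0.1
assert latent_loss(pred, emb["stock prices fell"]) > 1.0
```

A token-level cross-entropy would penalize "grass" vs "turf" heavily; the latent loss treats the two paraphrases as near-identical, which is the point.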
"Veo: More Than Just Video Generation. DeepMind Is Using It to Simulate the Entire Robot World Google DeepMind's Gemini Robotics Team presents a breakthrough evaluation system. They built a simulator using their frontier video model Veo. It generates realistic varied virtual sceneslike adding new objects or changing backgroundsto see how robot policies react. This system accurately predicts real-world robot performance outperforming old methods by testing for out-of-distribution generalization and exposing safety risks across multiple manipulation tasks. Evaluating Gemini Robotics Policies in"
X Link 2025-12-26T09:33Z 14.4K followers, 73K engagements
"Can your AI truly understand a video or is it making things up Researchers from Hefei University of Technology Tsinghua University and the Institute of Science Tokyo present Trust-videoLLMs. It's a new benchmark that stress-tests [--] leading video AIs on truthfulness safety fairness and moreusing tricky altered and annotated videos to find their weak spots. Results show major gaps: models struggle with dynamic scenes are easily fooled by edited content and fail on real-world risks. While some open-source models compete top commercial ones are generally more credible but bigger isn't always"
X Link 2025-12-26T14:27Z 14K followers, [----] engagements
"What if you could see exactly what a diffusion model is thinking at each step of image generation Researchers from CUHK & Shanghai AI Lab present TIDE a new "X-ray" for AI image generators. It uses a sparse autoencoder to extract simple human-readable concepts from the model's internal activations over time. The method reveals that models like Stable Diffusion [--] naturally learn a hierarchy of conceptsfrom 3D shapes down to fine detailsduring training. TIDE outperforms previous methods in interpretability & control enabling safer image editing and precise style transfer without breaking the"
X Link 2025-12-26T20:30Z 14.1K followers, [----] engagements
"What if a robot could learn complex tasks with far less data and think much faster Researchers from Xi'an Jiaotong University present EfficientFlow. It's a new AI policy that learns robotic actions more efficiently by building in symmetry awareness (equivariance) for better generalization and uses a novel method to speed up its decision-making process. It matches or beats top methods on manipulation tasks using less training data and delivers significantly faster real-time inference. EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI Paper: Project: GitHub: Our report:"
X Link 2025-12-27T02:34Z 14K followers, 11.9K engagements
"Can an AI specialist outperform junior doctors in planning heart procedures Enter CA-GPT a medical AI trained for heart imaging. It analyzes artery scans to recommend the exact size and placement of stents. Results show it beat both ChatGPT-5 and junior physicians in planning accuracy especially for complex cases. COMPARE: Clinical Optimization with Modular Planning and Assessment via RAG-Enhanced AI-OCT: Superior Decision Support for Percutaneous Coronary Intervention Compared to ChatGPT-5 and Junior Operators Paper: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-12-27T14:42Z 14K followers, [----] engagements
"What if your search results could understand not just what you clicked but the meaning behind it Alibaba presents MUSE a new framework that uses both visual and text data to model user interests. It uses simple fast matching for broad searches then rich fused analysis for precise results. This method outperforms traditional ID-based models especially for niche items and is now live in Taobao's ad system handling 100K-item user histories with no lag. MUSE: A Simple Yet Effective Multimodal Search-Based Framework for Lifelong User Interest Modeling Paper: Dataset: Our report: 📬 #PapersAccepted"
X Link 2025-12-27T18:48Z 14.3K followers, [----] engagements
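The coarse-then-fine pattern is standard two-stage retrieval. A minimal sketch (unit-norm random vectors stand in for MUSE's fused multimodal embeddings, and the "rich" scorer here is a placeholder):

```python
import numpy as np

def two_stage_search(query, history_embs, top_k=5):
    """Stage 1: cheap dot-product filter over the full history.
    Stage 2: a richer score over only the survivors. The 'rich' scorer
    below is a placeholder; MUSE fuses visual and text features there."""
    coarse = history_embs @ query                      # fast, vectorized
    survivors = np.argsort(coarse)[-top_k:]
    def rich_score(i):                                 # expensive per item
        return float(history_embs[i] @ query)          # placeholder fusion
    return max(survivors, key=rich_score)

rng = np.random.default_rng(2)
history = rng.standard_normal((10_000, 64))            # a lifelong history
history /= np.linalg.norm(history, axis=1, keepdims=True)
query = history[1234]                                  # revisit a past item
assert two_stage_search(query, history) == 1234
```

The economics: the expensive fused scorer runs on `top_k` items instead of all 10,000 (or 100K in Taobao's case), which is how the system avoids lag.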
"What if a simple tweak could stop AI training from going off the rails 🚂 Researchers at Kuaishou Tech introduce Entropy Ratio Clipping (ERC). Instead of just clipping individual updates ERC monitors the overall randomness of the AI's strategy. It ensures new versions don't stray too far from old stable behavior. Outperforms PPO-Clip stabilizing training & boosting results across multiple LLM fine-tuning benchmarks. Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning Paper: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-12-28T02:51Z 15.1K followers, [----] engagements
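A sketch of the constraint's shape: compare the entropy of the updated policy's action distribution to the old one, and flag updates whose ratio leaves a band. The band values, toy distributions, and hard accept/reject framing are ours; ERC applies this as a soft constraint inside RL fine-tuning:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (nats)."""
    p = np.asarray(p, dtype=float)
    return float(-(p * np.log(p + 1e-12)).sum())

def accept_update(old_probs, new_probs, low=0.8, high=1.2):
    """Accept a policy update only if the entropy ratio H(new)/H(old)
    stays inside a band -- the shape of ERC as a global constraint.
    Band values and the hard accept/reject mechanics are illustrative."""
    ratio = entropy(new_probs) / entropy(old_probs)
    return low <= ratio <= high

old = [0.25, 0.25, 0.25, 0.25]              # uniform: maximum entropy
drifted = [0.97, 0.01, 0.01, 0.01]          # near-deterministic collapse
mild = [0.30, 0.30, 0.20, 0.20]             # small, safe entropy change
assert not accept_update(old, drifted)       # entropy collapsed: reject
assert accept_update(old, mild)              # within band: accept
```

The contrast with PPO-Clip is that the ratio here is over the whole distribution's entropy (a global quantity), not per-token probability ratios.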
"This is cool Running large language models is expensive in memory and compute. Fairy2i converts pre trained real valued Transformers into complex form while preserving equivalence and enabling [--] bit inference with phase aware quantization. LLaMA [--] 7B at [--] bit is nearly full precision. No retraining. Fairy2i: Training Complex LLMs from Real LLMs with All Parameters in 1i Paper: https://arxiv.org/abs/2512.02901 https://arxiv.org/abs/2512.02901"
X Link 2025-12-28T07:25Z 14.1K followers, 17.6K engagements
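A sketch of what "all parameters in {±1, ±i}" could look like operationally: snap each complex weight to the nearest scaled unit phase, so a weight needs only a small phase code plus a shared scale. The codebook framing and this rounding rule are our illustration, not Fairy2i's equivalence-preserving transform:

```python
import numpy as np

PHASES = np.array([1, -1, 1j, -1j])  # the 4-phase codebook (our framing)

def phase_quantize(w_complex, scale):
    """Snap each complex weight to scale * nearest unit phase in {±1, ±i}.
    A sketch of the *idea* of phase-aware quantization; Fairy2i's actual
    transform preserves equivalence with the real-valued model."""
    idx = np.argmin(np.abs(w_complex[..., None] - scale * PHASES), axis=-1)
    return scale * PHASES[idx]

w = np.array([0.9 + 0.1j, -0.05 + 1.1j, -1.2 - 0.1j, 0.1 - 0.8j])
q = phase_quantize(w, scale=1.0)
assert np.allclose(q, [1, 1j, -1, -1j])
# Each weight now stores only which of 4 phases it is, plus a shared scale.
```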
"What if you could edit a video just by typing a command ByteDance & Zhejiang University present OpenVE-3M a massive new dataset to train AI for that exact task. It teaches models to follow complex text instructions for everything from changing styles to adding objects. Their 5B model OpenVE-Edit sets a new state-of-the-art outperforming all prior methods on a new human-aligned benchmark. OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing Paper: Project: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/XthPi5rmUYBKLeSrtCOV_g"
X Link 2025-12-28T14:54Z 14.6K followers, [----] engagements
"What if AI could search smarter not just harder Researchers from BaiJia AI and Beijing University of Posts and Telecommunications present LightSearcher. Its a new RL framework that gives AI an "experiential memory." The system learns from past successful reasoning patterns to know when to call a search tool avoiding redundant costly lookups. Results: Matches top accuracy on complex QA tasks while cutting search tool calls by 39.6% inference time by 48.6% and token use by 21.2%. A major leap in efficient reasoning. LightSearcher: Efficient DeepSearch via Experiential Memory Paper: Our report:"
X Link 2025-12-29T02:07Z 14K followers, [----] engagements
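The "skip the lookup when memory suffices" gate can be sketched with a similarity threshold over past successes. The threshold, embeddings, and gate logic below are ours; LightSearcher learns this behavior with RL rather than hard-coding it:

```python
import numpy as np

class MemoryGatedSearcher:
    """Skip the (costly) search tool when a similar question was already
    answered from memory. A toy gate, not LightSearcher's learned policy."""
    def __init__(self, threshold=0.9):
        self.memory = []            # (embedding, answer) of past successes
        self.threshold = threshold
        self.search_calls = 0

    def answer(self, q_emb, search_fn):
        for emb, ans in self.memory:
            if float(q_emb @ emb) >= self.threshold:   # unit-norm embeddings
                return ans                              # reuse, no search
        self.search_calls += 1
        ans = search_fn()
        self.memory.append((q_emb, ans))
        return ans

s = MemoryGatedSearcher()
q1 = np.array([1.0, 0.0])
q2 = np.array([0.99, np.sqrt(1 - 0.99**2)])
s.answer(q1, lambda: "Paris")       # first time: must call the tool
out = s.answer(q2, lambda: "Paris") # near-duplicate: served from memory
assert out == "Paris" and s.search_calls == 1
```

Every avoided `search_fn` call is where the reported 39.6% reduction in tool calls would come from.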
"What if we could test robotic packing algorithms in a perfect digital twin of the real world Enter RoboBPP a new open-source benchmark for robotic bin packing. It uses a physics simulator with real-world scale robots & boxes to check if a packing plan is actually feasible and safe. It outperforms prior synthetic tests by using [--] real industrial datasets and new metrics for stability & safety creating a reproducible standard for the field. RoboBPP: Benchmarking Robotic Online Bin Packing with Physics-based Simulation Paper: Project: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-12-30T13:37Z 14.1K followers, [----] engagements
"What if you could train massive AI models faster and cheaper Enter SonicMoE. It's a new system that slashes memory use and boosts GPU efficiency for Mixture of Experts models. It uses smarter caching overlapping tasks and a novel "token rounding" method to cut wasted computations. Results: It reduces activation memory by 45% and delivers a 1.86x throughput gain vs. prior methods achieving similar training speed with 33% fewer GPUs. SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations Paper: Our report: https://mp.weixin.qq.com/s/rVrXj6uLvIHDnu2-T-z4_A"
X Link 2025-12-30T19:41Z 14.1K followers, [----] engagements
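One plausible reading of "token rounding" in a tile-aware setting, sketched below: GPU matmul kernels process tokens in fixed-size tiles, so a per-expert token count just over a tile boundary launches a nearly empty tile. Rounding counts toward tile multiples cuts that waste. This is our interpretation for illustration; the real SonicMoE mechanism (and how rounded-off tokens are handled) is more involved:

```python
def padded_tiles(count, tile):
    """Tiles launched when a GPU kernel pads `count` tokens up to tile size."""
    return -(-count // tile)        # ceil division

def round_to_tile(count, tile):
    """Round a per-expert token count to the nearest tile multiple -- one
    plausible reading of 'token rounding'; illustrative only."""
    return round(count / tile) * tile

counts, tile = [130, 70, 257, 3], 128   # hypothetical per-expert token counts
naive = sum(padded_tiles(c, tile) for c in counts)            # pad-only: 7 tiles
rounded = sum(padded_tiles(round_to_tile(c, tile), tile) for c in counts)
assert rounded < naive   # fewer partially-filled tiles to launch
```

Here 130 tokens cost two tiles under padding but one after rounding, and the 3-token straggler disappears entirely, which is the kind of waste a tile-aware optimization targets.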