@jiqizhixin (机器之心 JIQIZHIXIN) posts on X most about ai, university of, token, and solve. They currently have [------] followers and [---] posts still getting attention, totaling [-----] engagements in the last [--] hours.
Social category influence: technology brands 29.78%, travel destinations 12.92%, stocks 9.55%, countries 4.49%, social networks 3.93%, finance 1.69%, gaming 0.56%
Social topic influence: ai 24.72%, university of #357, token 6.74%, solve #1608, shanghai 5.62%, alibaba 4.49%, math 3.93%, hong kong #1311, microsoft #713, realtime #544
Top accounts mentioned or mentioned by: @shunyuyao12 @xmaxaiofficial @tsinghuauni @32showing
Top assets mentioned: Microsoft Corp. (MSFT), Alphabet Inc Class A (GOOGL), Robot Consulting Co., Ltd. (LAWR), Alibaba Group (BABA)
Top posts by engagements in the last [--] hours
"Ever wonder why Vision-Language Models get distracted by empty space and the bottom of your images? Researchers from Shanghai University and Nankai University introduce a new way to fix biased AI attention. They developed two simple debiasing tools that clean up attention noise. By removing the model's tendency to favor tokens just because of their position, and by ignoring useless padding, they make token pruning much more accurate and efficient. The method outperformed [--] state-of-the-art pruning techniques across [--] benchmarks, delivering major performance gains for both image and video tasks."
X Link 2026-02-11T03:46Z 15.9K followers, [----] engagements
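The pruning recipe in the post above (strip position-driven attention, mask out padding, then keep the top-k tokens) can be sketched in a few lines of numpy. The scores, the positional-bias estimate, and the padding layout below are invented for illustration and are not the paper's actual method:

```python
import numpy as np

rng = np.random.default_rng(2)

n_tokens = 10
# Raw attention mass received by each visual token (toy numbers).
attn = rng.random(n_tokens)
# Hypothetical positional bias, e.g. average attention each position
# attracts regardless of content, estimated offline on calibration data.
pos_bias = np.linspace(0.0, 0.5, n_tokens)
attn_biased = attn + pos_bias

padding_mask = np.zeros(n_tokens, dtype=bool)
padding_mask[-2:] = True   # pretend the last two tokens are padding

# Debias: subtract the position-driven score, then exclude padding
# before ranking, so pruning keeps content tokens only.
debiased = attn_biased - pos_bias
debiased[padding_mask] = -np.inf

k = 4
kept = np.argsort(debiased)[-k:]   # indices of the tokens to keep
```

Masking with `-inf` before the sort guarantees padding can never survive pruning, no matter how much spurious attention it attracted.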
"New paradigm from Kaiming He's team: Drifting Models. With this approach you can generate a perfect image in a single step. The team trains a "drifting field" that smoothly moves samples toward equilibrium with the real data distribution. The result? A one-step generator that sets a new SOTA on ImageNet 256x256, beating complex multi-step models. https://twitter.com/i/web/status/2019308224223354936"
X Link 2026-02-05T07:12Z 15.9K followers, 312.6K engagements
"What if standard Reinforcement Learning isn't actually training your models to find the most likely correct answers? Researchers from CMU, Tsinghua, Zhejiang, and UC Berkeley introduce MaxRL to fix this fundamental limitation. MaxRL is a new framework that bridges the gap between standard RL and exact maximum likelihood. By using a sampling-based approach that scales with available compute, it more directly optimizes for the correct outcome rather than settling for a rough approximation. The results are massive: MaxRL Pareto-dominates existing methods, delivering up to 20x better test-time scaling"
X Link 2026-02-12T17:21Z 15.9K followers, 11.3K engagements
"Ever wondered why fish swim in short bursts instead of one steady motion? Enter ZBot. It's a bio-inspired robot that mimics larval zebrafish, using brain-like neural networks to test burst-and-glide swimming versus moving at a constant speed. The results prove that swimming in bursts is significantly more energy-efficient than continuous movement. It turns out that moving in cycles does more than just reduce water drag; it allows the robot's motors to operate at their peak power efficiency across almost all speeds. Energy efficiency and neural control of continuous versus intermittent swimming"
X Link 2026-02-11T17:50Z 15.9K followers, [---] engagements
"To Think or Not To Think That is The Question for Large Reasoning Models in Theory of Mind Tasks Paper: https://arxiv.org/abs/2602.10625"
X Link 2026-02-12T07:47Z 15.9K followers, [---] engagements
"What if your AI could generate high-quality code at nearly [---] tokens per second? Ant Group, in collaboration with researchers from Zhejiang University and Westlake University, presents LLaDA2.1. Instead of just filling in blanks, this model uses a smart token-editing system that refines text as it goes. This allows you to switch between a Speedy Mode for raw velocity and a Quality Mode for high-precision tasks, all powered by a new reinforcement learning framework designed specifically for diffusion models. The results are massive: the 100B model hits a record-breaking [---] tokens per second on"
X Link 2026-02-12T08:46Z 15.9K followers, [----] engagements
"LLaDA2.1: Speeding Up Text Diffusion via Token Editing Paper: Hugging Face: ModelScope: GitHub: Tech Report: Our report: https://mp.weixin.qq.com/s/XEG5MQMHaOXO-IRY6O09Vg https://huggingface.co/papers/2602.08676 https://modelscope.cn/collections/inclusionAI/LLaDA21 https://huggingface.co/collections/inclusionAI/llada21 https://github.com/inclusionAI/LLaDA2.X/blob/main/llada2_1_tech_report.pdf"
X Link 2026-02-12T08:46Z 15.9K followers, [---] engagements
"How can we train humanoid robots to handle new environments safely and instantly? Researchers from BIGAI and Xidian University present a breakthrough in humanoid control. By combining large-scale pretraining with a physics-smart world model, the robot can safely practice new tasks in its mind before trying them for real. This approach enables successful zero-shot deployment on real hardware and far exceeds traditional methods in adapting to unpredictable out-of-distribution tasks. Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control Paper:"
X Link 2026-02-14T04:03Z 15.9K followers, [----] engagements
"Can video generation AI actually understand the physical world, or is it just a digital illusion? Researchers from HKUST (GZ), Tongji University, and Kuaishou Technology present a new mechanistic framework to bridge the gap between video generation and true world models. They break down video AI into two pillars: state construction, which builds an internal memory of the scene, and dynamics modeling, which handles how objects move and interact over time. This approach moves evaluation beyond just looking good to mastering physical persistence and causal reasoning, outperforming traditional visual-only"
X Link 2026-02-14T18:53Z 15.9K followers, [----] engagements
"AlphaGo Moment for Model Architecture Discovery Paper: https://arxiv.org/abs/2507.18074"
X Link 2025-07-25T06:41Z 15.9K followers, 178.2K engagements
"Wow, language models can talk without words. A new framework, Cache-to-Cache (C2C), lets multiple LLMs communicate directly through their KV-caches instead of text, transferring deep semantics without token-by-token generation. It fuses cache representations via a neural projector and gating mechanism for efficient inter-model exchange. The payoff: up to 10% higher accuracy, 35% gains over text-based communication, and [--] faster responses. Cache-to-Cache: Direct Semantic Communication Between Large Language Models Code: Project: Paper: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2025-11-03T05:35Z 15.9K followers, 867.8K engagements
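The projector-plus-gate fusion described in the C2C post can be caricatured with numpy. Everything here is an illustrative assumption, not the paper's architecture: the shapes are tiny, the "neural projector" is a single random linear map, and the gate is a one-layer sigmoid:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8          # per-head hidden size (toy)
seq_len = 4    # shared sequence length

# Toy KV-cache slices from two models at aligned positions.
cache_a = rng.standard_normal((seq_len, d))
cache_b = rng.standard_normal((seq_len, d))

# Stand-in for the learned projector that maps model B's cache
# space into model A's cache space.
W_proj = rng.standard_normal((d, d)) / np.sqrt(d)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Per-position gate in [0, 1] decides how much foreign cache to mix in.
w_gate = rng.standard_normal(d)
gate = sigmoid(cache_a @ w_gate)          # shape (seq_len,)

projected = cache_b @ W_proj               # align B's semantics to A
fused = cache_a + gate[:, None] * projected  # gated residual fusion
```

The gated residual form means that when the gate saturates at 0 the receiving model's cache passes through untouched, which is one plausible way such a fusion stays safe to train.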
"What if you could scan a photorealistic moving avatar of yourself in [--] minutes with just your phone? Researchers from Alibaba Group & Shanghai Jiao Tong University present HRM2Avatar. It uses a phone scan to build a 3D mesh of your clothes, then attaches smart "light-aware" points to it. This captures how fabric moves and light changes with your pose. The result? Avatars with superior realism that run at [---] FPS on an iPhone and [--] FPS on Apple Vision Pro, outperforming other mobile methods in speed and visual quality. HRM2Avatar: High-Fidelity Real-Time Mobile Avatars from Monocular Phone Scans"
X Link 2025-12-30T04:35Z 15.9K followers, [----] engagements
"Yao Shunyu (Tencent): The "Spy" Paradigm. Making his debut after leaving OpenAI, Yao argued that the next paradigm shift won't look like a nuclear explosion. It will look like a spy infiltrating a system. It happens when AI starts writing the code that calls its own neural networks. It is a quiet structural takeover. He was blunt about the market: ToC improvements are flattening, but the ToB revolution is violent. Companies will pay huge premiums for "strong models" that offer certainty over probability. https://twitter.com/i/web/status/2010606180960239699"
X Link 2026-01-12T06:54Z 15.9K followers, [----] engagements
"Rich Innovation vs. Poor Innovation. In a candid roundtable, the leaders debated the US-China gap. Lin Junyang described Silicon Valley as "Rich Innovation" (an abundance of GPUs allowing wasteful exploration) versus China's "Poor Innovation" (resource constraints forcing extreme optimization). Yao Shunyu critiqued China's "Leaderboard Culture," urging researchers to stop chasing benchmarks and start taking risks on defining the "correct things" to build. https://twitter.com/i/web/status/2010606184139551223"
X Link 2026-01-12T06:54Z 15.9K followers, [----] engagements
"What if one AI model could handle all the steps of search query suggestions? Kuaishou researchers present OneSug, a new end-to-end framework. It enriches your initial search, generates suggestions in one unified step, and ranks them based on fine-grained user behavior. It outperforms traditional multi-stage systems, boosting click-through rates, orders, and revenue on their live e-commerce platform. OneSug: The Unified End-to-End Generative Framework for E-commerce Query Suggestion Paper: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/XsIjyOejB5iesj5WERuRiA"
X Link 2026-01-23T00:23Z 15.9K followers, [----] engagements
"What if we could model vision like a wave moving through space? Researchers from Peking & Tsinghua Universities present WaveFormer. They treat image features as signals governed by a wave equation, explicitly controlling how low-to-high frequency details evolve across network layers. This new Wave Propagation Operator outperforms standard Vision Transformers in image classification, detection, and segmentation, achieving up to 1.6x higher throughput with 30% fewer computations. WaveFormer: Frequency-Time Decoupled Vision Modeling with Wave Equation Paper: Code: Our report: 📬 #PapersAccepted by"
X Link 2026-01-25T14:49Z 15.9K followers, 15.6K engagements
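The wave-equation framing above can be grounded with a plain leapfrog discretization of u_tt = c^2 u_xx, where each time step plays the role of a network layer propagating features. This toy 1D grid is only an illustration of the governing equation, not the paper's Wave Propagation Operator:

```python
import numpy as np

n = 32
c, dt, dx = 1.0, 0.1, 1.0

u_prev = np.zeros(n)
u = np.zeros(n)
u[n // 2] = 1.0          # initial impulse ("feature") in the middle

for _ in range(10):       # a few propagation steps, one per "layer"
    # Discrete Laplacian with periodic boundaries.
    lap = np.roll(u, 1) - 2.0 * u + np.roll(u, -1)
    # Leapfrog update: u_next = 2u - u_prev + (c*dt/dx)^2 * u_xx
    u_next = 2.0 * u - u_prev + (c * dt / dx) ** 2 * lap
    u_prev, u = u, u_next
```

After a few steps the impulse has spread into neighboring positions, which is the "controlled evolution of detail across layers" intuition in miniature.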
"What's the single biggest roadblock to creating truly intelligent robots? A new survey tackles this. They argue that moving beyond simple point clouds to dense AI-powered 3D "world models" is key. These neural scene representations, like NeRF and 3D Gaussian Splatting, outperform traditional methods by enabling robots to understand semantics and language, revolutionizing navigation and manipulation. What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models Paper: Code: Our report: 📬 #PapersAccepted by Jiqizhixin https://mp.weixin.qq.com/s/vBNr9hGz9mEH9cprGLm_GQ"
X Link 2026-01-27T19:54Z 15.9K followers, 10.7K engagements
"What if you could fix AI image generators without breaking them? Researchers from HKUST, Kuaishou, CUHK & Edinburgh present GARDO. It's a new training method that selectively corrects an AI's reward system, preventing it from "cheating" its own quality scores and losing creativity. Results: it outperforms standard RL methods by avoiding reward hacking, boosting image diversity, and maintaining quality across multiple unseen metrics. GARDO: Reinforcing Diffusion Models without Reward Hacking Project: Paper: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2026-01-29T19:12Z 15.9K followers, [----] engagements
"The best AI reasoning might work like a team debate, not a solo monologue. Google & University of Chicago researchers show that top reasoning models like DeepSeek-R1 don't just think longer; they simulate a "society of thought." The model internally hosts diverse expert personas that argue and reconcile views. This internal multi-agent debate outperforms standard models on complex reasoning tasks by exploring more solution paths. Reasoning Models Generate Societies of Thought Paper: https://arxiv.org/pdf/2601.10825 Our report: https://mp.weixin.qq.com/s/rPhrpvubjoz6IA5hS2DFvA"
X Link 2026-02-01T02:48Z 15.9K followers, [----] engagements
"What if a single AI model could control any robot arm? Researchers from Ant Group present LingBot-VLA, a new Vision-Language-Action foundation model. It learns from seeing and reading instructions to directly control robotic movements. Trained on 20k hours of real-world data from [--] robot types, it clearly outperforms competitors. It shows superior performance and generalization across [--] platforms and [---] different tasks, while also being [---] to 2.8x faster to train. A Pragmatic VLA Foundation Model Project: Paper: Model: Our report: https://mp.weixin.qq.com/s/o0WKZi-JFYd8ZDHV6_5Xfg"
X Link 2026-02-03T05:46Z 15.9K followers, [----] engagements
"What if a single image could instantly generate a complete, labeled 3D world? Researchers from Nanyang Technological University & Shanghai Jiao Tong present SplatSSC. They use predicted depth to smartly place 3D building blocks instead of scattering them randomly, then cleanly separate shape and label prediction to reduce errors. It sets a new state-of-the-art for 3D scene reconstruction from photos, beating prior methods by over 6% in accuracy while being faster and using less memory. SplatSSC: Decoupled Depth-Guided Gaussian Splatting for Semantic Scene Completion Paper: Code: Our report: 📬"
X Link 2026-02-03T19:49Z 15.9K followers, [----] engagements
"What if giving an AI the answer key still isn't enough for it to solve the problem? New research from Tencent's Hunyuan team & Fudan University introduces CL-bench, a new benchmark for "Context Learning." They found that simply providing all necessary info in the context doesn't guarantee a model can complete a task. The core challenge is a model's ability to learn from that context on the fly. Results show a major gap in current models' context learning ability. They often fail to use provided examples/logic, highlighting a key bottleneck for real-world AI applications beyond just having long"
X Link 2026-02-03T22:34Z 15.9K followers, [----] engagements
"What if an AI could train itself to solve your specific problem on the spot? Researchers from Stanford, NVIDIA, UC San Diego & Together AI present TTT-Discover. They let a large language model do reinforcement learning during a test, focusing all its learning power on finding one brilliant solution for that exact challenge. This method set new state-of-the-art records in math proofs, GPU kernel speed (2x faster), algorithm competitions, and a biology denoising task, all using an open-source model for just a few hundred dollars per problem. Learning to Discover at Test Time Paper: Project: Our report:"
X Link 2026-02-04T18:58Z 15.9K followers, [----] engagements
"Can LLMs self-improve without ground-truth rewards? Researchers from Microsoft Research & UC San Diego propose Test-time Recursive Thinking (TRT). It's a self-improvement loop where the model generates multiple solution strategies, uses its own accumulated knowledge to verify them, and then iteratively refines its answers. The results? Open-source models hit 100% on AIME math problems, and top closed-source models improved by 10-15 percentage points on the hardest LiveCodeBench tasks. https://twitter.com/i/web/status/2019331879317696777"
X Link 2026-02-05T08:46Z 15.9K followers, [----] engagements
"What if a single framework could unify every AI agent, from software chatbots to physical robots? Dr. Hang Li from ByteDance proposes exactly that in a new JCST paper. It's a universal blueprint where agents use LLMs as their "brain" for reasoning, are built via reinforcement learning, and operate using tools & long-term memory to complete tasks. This general framework outperforms fragmented approaches, providing a unified theory for agent development across both software and hardware domains. General framework of AI agents Paper: Our report: 📬 #PapersAccepted by Jiqizhixin"
X Link 2026-02-05T19:16Z 15.9K followers, [----] engagements
"vLLM-Omni is here. Now you can serve a single AI model that handles text, images, video, and audio without massive performance drops. vLLM-Omni introduces a fully disaggregated serving system that breaks down complex AI architectures into independent, interconnected stages. Instead of forcing one engine to handle everything, it treats each component, like text generation or image creation, as its own optimized stage with its own GPU resources and data routing. The results are a game changer, reducing job completion times by up to [----] percent compared to traditional serving methods."
X Link 2026-02-06T01:38Z 15.9K followers, [----] engagements
"What if giving an AI more freedom to think actually makes it worse at reasoning? Researchers from Tsinghua University and Alibaba reveal a "Flexibility Trap" in new Diffusion LLMs. These models can generate words in any order, not just left-to-right. But they use that freedom to skip over hard, crucial steps in a problem. By intentionally removing this arbitrary order and using a simpler training method (GRPO), their "JustGRPO" approach outperforms complex alternatives. It achieves state-of-the-art results, like 89.1% on the GSM8K math benchmark. The Flexibility Trap: Why Arbitrary Order Limits"
X Link 2026-02-06T02:19Z 15.9K followers, [----] engagements
"Can the world's best AI models solve research-level math problems that have never been seen on the internet? A collaborative team from Stanford, Harvard, Yale, UC Berkeley, and Columbia just introduced First Proof. They curated ten original math questions drawn from actual research workflows to create a zero-contamination test that separates true reasoning from simple pattern matching. Below is an introduction from Terence Tao"
X Link 2026-02-07T02:18Z 15.9K followers, 12.3K engagements
"Can we truly trust that a robot is autonomous if its movements are shaky or secretly controlled by a human? Here is a new framework to fix the credibility gap in robotic manipulation. They developed Eval-Actions and AutoEval, a system that replaces simple pass/fail scores with a deep analysis of movement quality and authenticity, by looking at how smooth and safe a robot's actions really are. The model achieves an incredible [----] percent accuracy in spotting human teleoperation and significantly outperforms standard vision models in grading precision across complex tasks. Trustworthy Evaluation of"
X Link 2026-02-07T16:38Z 15.9K followers, [----] engagements
"What if AI could solve complex problems in its head without typing out every single step? Researchers from Fudan University and Shanghai AI Laboratory present SIM-CoT. They fixed the instability of hidden reasoning by using a temporary guide during training that aligns internal thoughts with real-world logic. This prevents the model's internal states from becoming cluttered, allowing for fast, invisible, and accurate thinking without the usual overhead of long text outputs. The method boosts LLaMA-3.1 8B performance by 3% and beats traditional step-by-step reasoning on GPT-2 with 2.3x better token"
X Link 2026-02-08T14:48Z 15.9K followers, [----] engagements
"@XmaxAIOfficial A major milestone in Generative AI. The leap from "watching" to "interacting" is finally here with Xmax X1. This real-time, infinite-duration capability is exactly what the industry has been waiting for. Excited to see how it reshapes the future of digital content."
X Link 2026-02-08T16:30Z 15.9K followers, [----] engagements
"What if your AI robot could actually ask you for help when your instructions are too vague? Researchers from the University of Science and Technology of China, Shanghai AI Laboratory, Zhejiang University, and HKU present VL-LN Bench to bridge this gap. They introduced a new framework called Interactive Instance Goal Navigation, which moves beyond simple command-following. Instead of guessing, the agent uses active dialog to ask questions in natural language, clarifying user intent and resolving uncertainty while navigating through complex environments. Tested on a massive dataset of 41,000"
X Link 2026-02-09T02:57Z 15.9K followers, [----] engagements
"Want to cram a 1TB model into a single H200 server without sacrificing training accuracy? The SGLang RL, InfiXAI, Ant Group, Slime, and RadixArk Miles teams have successfully implemented an INT4 Quantization-Aware Training (QAT) workflow. Inspired by Kimi K2, they used a pseudo-quantization method during training, combined with W4A16 inference. This keeps model weights at a tiny [--] bits while using standard high-precision cores for calculation, ensuring the model remains as stable as full-precision versions. The result is a game-changer: 1TB-class models now fit on a single machine, eliminating"
X Link 2026-02-09T09:02Z 15.9K followers, [----] engagements
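The pseudo-quantization ("fake quant") trick behind W4A16 QAT can be sketched as follows: weights are snapped onto a 4-bit grid in the forward pass, but the arithmetic stays in high precision. The symmetric per-group scheme and group size here are illustrative choices, not the teams' exact recipe:

```python
import numpy as np

def fake_quant_int4(w, group_size=4):
    """Pseudo-quantize weights to INT4 (symmetric, per-group scale),
    then dequantize back to float: the values now carry only 4 bits
    of information each, but downstream math stays high precision."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # INT4 range [-7, 7]
    scale[scale == 0] = 1.0                              # avoid div-by-zero
    q = np.clip(np.round(w / scale), -7, 7)              # snap to integer grid
    return (q * scale).reshape(-1)                       # dequantized floats

rng = np.random.default_rng(1)
w = rng.standard_normal(16).astype(np.float32)
w_q = fake_quant_int4(w)
```

Because the rounding happens inside training, the model learns weights that tolerate the 4-bit grid, which is why inference with packed INT4 weights can match the full-precision run.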
"Can AI master the art of persuasion to win over even the toughest academic reviewers? Researchers at the Hong Kong University of Science and Technology present RebuttalAgent to solve this exact problem. Instead of just mimicking academic language, this system uses Theory of Mind to get inside a reviewer's head and anticipate their concerns. It follows a strategic pipeline that models the reviewer's mental state and plans a specific persuasion strategy before ever writing a word of the response. The results are impressive: RebuttalAgent outperforms its base models by [----] percent and even beats"
X Link 2026-02-09T19:27Z 15.9K followers, [----] engagements
"Can AI solve the world's most famous "unsolved" math problems? Google DeepMind, UC Berkeley, and an international team of researchers present Aletheia, a math research agent built on Gemini. The system uses AI to systematically scan hundreds of complex conjectures, filtering through potential proofs with natural language verification before sending the best candidates to human experts for final review. The team resolved [--] "open" problems from the Erdős database, generating [--] brand-new solutions and identifying [--] others that were actually solved in obscure corners of existing literature."
X Link 2026-02-10T02:31Z 15.9K followers, [----] engagements
"Have you ever wondered why LLMs funnel so much attention toward the very first token in a prompt? Researchers from the University of Oxford, AITHYRA, and NYU, including Yann LeCun, just revealed that this phenomenon is actually a feature, not a bug. The team discovered that attention sinks and data compression are two sides of the same coin, both triggered by massive spikes in the model's internal signals. They propose a new Mix-Compress-Refine theory: LLMs mix data early on, squash it into a dense core in the middle layers, and then refine it for the final output. This framework finally explains why"
X Link 2026-02-11T06:13Z 15.9K followers, [----] engagements
"🚀 Meet RLinf-USER: A Unified and Extensible System for Real-World Online Robot Training. Tired of the "Sim-to-Real" gap holding back embodied AI? RLinf-USER is here to change the game. This unified system treats robots as first-class compute resources (just like GPUs), enabling seamless cloud-edge-terminal collaboration. By implementing a Fully Asynchronous Pipeline, RLinf-USER eliminates idle time, boosting training throughput by 5.7x. It allows robots to learn continuously in the physical world without waiting for computation bottlenecks. 📈 Results: RLinf-USER has successfully fine-tuned a"
X Link 2026-02-11T11:29Z 15.9K followers, [----] engagements
"Our report: https://mp.weixin.qq.com/s/4iPmPYghEzbWZeyO9jlD5w"
X Link 2026-02-11T11:30Z 15.9K followers, [---] engagements
"Can an AI learn to choose the best way to think based on the specific image it sees? Researchers from Fudan University and Alibaba introduced MoVT to do exactly that. Mixture-of-Visual-Thoughts (MoVT) unifies different reasoning styles into one model and uses a new reinforcement learning framework called AdaVaR to teach the AI how to pick the right logic for any given context. The method achieves consistent performance gains across diverse scenarios, outperforming traditional models that rely on a single reasoning mode. Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode"
X Link 2026-02-12T03:17Z 15.9K followers, [----] engagements
"Does being a math genius make an AI better at understanding human intentions? Researchers from Arizona State University and Microsoft Research Asia investigated whether the step-by-step logic used for coding helps AI master Theory of Mind, the ability to sense what others are thinking and feeling. The results show that more thinking time can actually cause social reasoning to collapse, with advanced reasoning models often being outperformed by simpler ones. Unlike in math or code, these models frequently rely on answer-matching shortcuts rather than true deduction, proving that social intelligence"
X Link 2026-02-12T07:47Z 15.9K followers, [---] engagements
"Can diffusion models finally outperform traditional AI at writing code? Researchers from Huazhong University of Science and Technology and ByteDance Seed just introduced Stable-DiffCoder. Instead of writing code one token at a time like standard models, this method uses a block diffusion approach to generate and refine code chunks simultaneously, resulting in more stable and structured programming. The results show it outperforms its autoregressive counterparts and various 8B-parameter models on major benchmarks, specifically excelling in code editing, logical reasoning, and low-resource"
X Link 2026-02-13T03:23Z 15.9K followers, [----] engagements
"Can we trust AI agents to interact with the world safely without a clear way to diagnose their mistakes? Shanghai Artificial Intelligence Laboratory presents AgentDoG. It is a new diagnostic guardrail framework that monitors AI agents in real time. Instead of just blocking risky moves with a simple yes or no, it uses a specialized three-part system to explain the root cause of a danger and catch subtle "hidden" errors that other models miss. AgentDoG achieves state-of-the-art performance in safety moderation across complex interactive scenarios, outperforming current guardrail models in both"
X Link 2026-02-13T16:27Z 15.9K followers, [---] engagements
"Test-time Recursive Thinking: Self-Improvement without External Feedback Paper: https://arxiv.org/pdf/2602.03094 Code: https://github.com/EvanZhuang/test_time_recursive_thinking"
X Link 2026-02-05T08:46Z 15.9K followers, [----] engagements
"Huge: Google DeepMind is mining activation functions using AlphaEvolve. Instead of manually designing these components, the team used Large Language Models to search through a massive space of Python functions to find the best logic for neural networks. The system is specifically designed to hunt for inductive biases, meaning it looks for math that helps AI stay accurate even when it encounters data it has never seen before. The results show that this evolutionary approach can discover meaningful and robust activation functions using only small-scale synthetic datasets. These new functions"
X Link 2026-02-06T08:22Z 15.9K followers, [----] engagements
"Is your LLM training sabotaging itself on the toughest reasoning tasks? Researchers from Beihang University, UC Berkeley, Peking University, and Meituan have identified a fundamental bias in popular training methods like GRPO. They discovered that these models often miscalculate rewards, systematically underestimating success on difficult problems while overvaluing easy ones. To fix this, they developed HA-DW, an adaptive system that recalibrates rewards by tracking how difficult a prompt is throughout the training process. The result? HA-DW consistently outperforms standard GRPO across five"
X Link 2026-02-07T02:32Z 15.9K followers, [----] engagements
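The mechanics behind this kind of fix are easy to sketch: GRPO computes group-normalized advantages per prompt, and a difficulty-aware scheme then reweights them. The weighting rule `1 + alpha * (1 - pass_rate)` below is a hypothetical stand-in for HA-DW's recalibration, which is not reproduced here:

```python
import numpy as np

def grpo_advantages(rewards):
    """Standard GRPO-style advantage: reward normalized within
    the group of rollouts sampled for one prompt."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def difficulty_weight(pass_rate, alpha=1.0):
    """Hypothetical difficulty-aware weight: upweight prompts the
    policy rarely solves, downweight ones it nearly always solves."""
    return 1.0 + alpha * (1.0 - pass_rate)

# One hard prompt: 1 success out of 4 rollouts.
rewards = [1.0, 0.0, 0.0, 0.0]
adv = grpo_advantages(rewards)
weighted_adv = difficulty_weight(pass_rate=0.25) * adv
```

Because the group mean is subtracted, advantages sum to zero within a prompt; the difficulty weight then scales how loudly hard prompts speak in the gradient, which is the bias the post says vanilla GRPO gets wrong.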
"Why do even the smartest AI models fail at simple logical puzzles? Researchers from Caltech, Stanford, and Carleton College just released the first comprehensive map of LLM reasoning failures. They developed a new framework that breaks down AI blind spots into three clear categories: fundamental flaws in model design, failures in specific tasks, and performance that breaks when you change the wording. By unifying years of scattered research into one roadmap, this study provides the essential blueprint for building more reliable, consistent, and logically sound AI systems."
X Link 2026-02-09T09:14Z 15.9K followers, [----] engagements
"Can one AI model truly master both understanding and generating images without sacrificing performance? Researchers from Meituan Inc. just introduced STAR to solve this exact problem. The method works by stacking new learning layers on top of a frozen base model, like building blocks. By separating tasks into stages, like understanding and editing, the model gains new skills without forgetting old ones or getting confused between different functions. The results are impressive, setting new state-of-the-art records on the GenEval and DPG-Bench benchmarks. STAR proves that this modular approach"
X Link 2026-02-12T03:06Z 15.9K followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin is a tweet series where we share cutting-edge research contributed to our official Chinese platform. Reach millions of Chinese readers with your work for free. Submissions are welcome"
X Link 2025-08-06T00:08Z 15.9K followers, 21.4K engagements
"OpenAI just claimed to have found the culprit behind AI hallucinations"
X Link 2025-09-06T04:51Z 15.9K followers, 716.5K engagements
"Xmax X1, the first real-time interactive video model, is here. Powered by autoregressive streaming generation, X1 achieves millisecond ultra-low latency and infinite-length generation, enabling truly natural spatial interaction. The camera is redefined. It's no longer just a lens but a magic wand that breaks the barrier between dimensions. Summon virtual beings into your reality and interact with them in real time. Live implementation video is below, truly amazing work. Definitely one to watch. The boundary between reality and the virtual world is about to disappear. Imagine characters from Pokemon or"
X Link 2026-02-08T16:33Z 15.9K followers, 91.4K engagements
"What if we could generate high-fidelity images in a single step without needing a complex latent space? Researchers from MIT and CMU have introduced pixel MeanFlow (pMF) to do exactly that. The method separates the AI training process from its final output. While the model learns by calculating how pixels move across a path, it is designed to predict the final clean image directly on a low-dimensional manifold. This allows the network to jump straight from noise to a finished image in one go, rather than following a noisy path step by step. The results are a breakthrough for one-step latent-free"
X Link 2026-02-10T18:34Z 15.9K followers, [----] engagements
"Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin Paper: https://arxiv.org/abs/2510.06477"
X Link 2026-02-11T06:13Z 15.9K followers, [---] engagements
"OpenClaw is cool, but it's too large. Researchers at the University of Hong Kong (HKUDS) just released nanobot to solve this exact problem. They transformed the massive OpenClaw system into a clean 4,000-line Python framework that focuses on a simple loop: receive input, let the AI think, and execute tools like file management or web searches. It strips away complex abstractions to focus on clear, modular function calls that any developer can understand. By slashing code complexity by [--] percent, they achieved full functional parity with a 2-minute deployment time, making it significantly easier to"
X Link 2026-02-12T17:13Z 15.9K followers, [----] engagements
"Can we make AI reasoning both smarter and [--] percent cheaper at the same time? Researchers from UIUC and Amazon Web Services just introduced SAR to solve the problem of long-winded and expensive AI models. Instead of just checking if an answer is right or wrong, Self-Aligned Reward (SAR) uses a new scoring system that rewards concise, specific logic. It measures how much value an answer provides relative to the question, teaching the model to eliminate filler text and focus on the actual solution. The results are a win-win: SAR boosts accuracy by [--] percent across [--] major benchmarks while slashing"
X Link 2026-02-15T04:13Z 15.9K followers, [----] engagements
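The post above describes rewarding answers by their value relative to the question rather than correctness alone. A minimal sketch of that idea, with the caveat that the function name `self_aligned_reward` and the brevity formula are illustrative assumptions, not the paper's actual definition:

```python
def self_aligned_reward(answer: str, is_correct: bool, question: str,
                        brevity_weight: float = 0.5) -> float:
    """Toy reward in the spirit of Self-Aligned Reward (hypothetical formula):
    reward correctness, then discount verbosity relative to the question,
    so filler text costs reward even when the answer is right."""
    base = 1.0 if is_correct else -1.0
    # Penalize answers much longer than the question itself.
    ratio = len(answer.split()) / max(len(question.split()), 1)
    verbosity_penalty = brevity_weight * max(0.0, ratio - 1.0)
    return base - verbosity_penalty
```

Under this toy scoring, a concise correct answer outranks a padded correct one, which is the training signal the post claims cuts cost without hurting accuracy.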
"Could LLMs finally end the nightmare of manual data cleaning? Researchers from SJTU, Tsinghua, Microsoft Research, MIT, and Alibaba present a comprehensive survey on the future of application-ready data preparation. They detail a massive paradigm shift from rigid rule-based code to smart agentic workflows that use LLMs to understand and organize messy datasets through natural language and context. This new approach outperforms traditional pipelines in flexibility and semantic reasoning across critical domains like data cleaning, entity matching, and automated dataset enrichment. Can LLMs Clean Up"
X Link 2026-02-15T19:16Z 15.9K followers, [----] engagements
"Ever wonder how AI agents actually behave when they go on a deep-dive search for you? Researchers from Carnegie Mellon University, the University of Lisbon, and NOVA University Lisbon analyzed [--] million real-world search requests to find out. They mapped the secret life of AI search agents by studying how they refine their queries and use evidence across millions of sessions. By tracking how new search terms are born from old results, they identified the specific patterns that separate simple fact-finding from complex reasoning. The data shows agents are fast, usually wrapping up in under ten steps"
X Link 2026-02-16T04:19Z 15.9K followers, [----] engagements
"Link: https://www.politico.eu/article/albania-apppoints-worlds-first-virtual-minister-edi-rama-diella/"
X Link 2025-09-12T10:53Z [----] followers, [---] engagements
"Huawei proposed Tree-OPO. They explore how MCTS trajectories can fuel Group Relative Policy Optimization (GRPO), enabling preference-based RL without value networks. By staging training with partially revealed rollouts, they create tree-structured reward signals that better capture reasoning quality while addressing pitfalls such as advantage saturation and reward collapse. 🚀 Early results: more stable updates and deeper compositional reasoning, but open challenges remain"
X Link 2025-09-13T01:01Z [----] followers, 11.6K engagements
"Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning Technical University of Munich, Huawei https://arxiv.org/abs/2509.09284"
X Link 2025-09-13T01:01Z [----] followers, [---] engagements
"How do you fix RAG's biggest flaws: noisy retrievals & broken multi-hop reasoning? 📚🤯 A new framework, EviNote-RAG, adds a structured retrieve-note-answer pipeline. Instead of raw passages, the model writes concise Supportive-Evidence Notes (SENs) and gets feedback from an Evidence Quality Reward to ensure evidence truly supports the answer. ⚡ On HotpotQA, Bamboogle, and 2Wiki it boosts F1 by up to +91% while staying robust & efficient"
X Link 2025-09-13T01:16Z [----] followers, [----] engagements
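The retrieve-note-answer pipeline above can be sketched as a plain function skeleton. The callable names (`retrieve`, `summarize_evidence`, etc.) are placeholders I introduce for illustration, not the paper's API; at inference the evidence-quality check here stands in for the reward used during training:

```python
from typing import Callable, List

def evinote_rag(question: str,
                retrieve: Callable[[str], List[str]],
                summarize_evidence: Callable[[str, List[str]], str],
                answer_from_note: Callable[[str, str], str],
                supports: Callable[[str, str], bool]) -> str:
    """Sketch of a retrieve -> note -> answer loop. The Supportive-Evidence
    Note is trusted only if it actually supports the final answer; otherwise
    fall back to answering from the raw passages."""
    passages = retrieve(question)
    note = summarize_evidence(question, passages)   # Supportive-Evidence Note
    answer = answer_from_note(question, note)
    if not supports(note, answer):                  # evidence-quality gate
        answer = answer_from_note(question, "\n".join(passages))
    return answer
```

With toy stubs (a one-passage retriever and a substring-based `supports`), the pipeline returns the answer grounded in the note rather than in the raw retrieval.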
"📬 #PapersAccepted by Jiqizhixin Our report: EviNote-RAG: Enhancing RAG Models via Answer-Supportive Evidence Notes Tsinghua University, Zhejiang University, Ant Group and others Paper: https://arxiv.org/abs/2509.00877v1 Code: https://github.com/Dalyuqin/EviNoteRAG Report: https://mp.weixin.qq.com/s/FOo38trc3OSEx2EXqVwyFA"
X Link 2025-09-13T01:16Z [----] followers, [---] engagements
"Another blog gem from Dr. Wolfe: Online versus Offline RL for LLMs. Link: https://cameronrwolfe.substack.com/p/online-rl"
X Link 2025-09-13T23:05Z [----] followers, 14.4K engagements
"What if fragmented knowledge could be turned into structured reasoning power for LLMs? A new paper introduces Youtu-GraphRAG, a vertically unified agentic paradigm that: - Builds expandable seed-graph schemas - Detects communities via structure + semantics - Uses agentic retrievers for reflective query decomposition - Tackles knowledge leakage with anonymity reversion 📊 Results: up to 90.7% token cost savings & +16.6% accuracy vs SOTA"
X Link 2025-09-15T06:29Z [----] followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: Youtu-GraphRAG: Vertically Unified Agents for Graph Retrieval-Augmented Complex Reasoning Tencent Youtu Lab, Monash University, The Hong Kong Polytechnic University Paper: https://arxiv.org/pdf/2508.19855 Code: https://github.com/TencentCloudADP/Youtu-GraphRAG Data: https://huggingface.co/datasets/Youtu-Graph/AnonyRAG Report: https://mp.weixin.qq.com/s/vm6yuZi4PIs2wizycnnfRg"
X Link 2025-09-15T06:29Z [----] followers, [---] engagements
"How close are we to real-time interactive digital humans? A Kling paper introduces an autoregressive + diffusion framework for video generation that: - Accepts multimodal controls (audio, pose, text) - Runs in streaming mode with low latency - Uses a [--] compression autoencoder to ease long-horizon inference - Trains on [-----] hours of dialogue data Results: duplex conversations, multilingual human synthesis & interactive world models, all with efficient fine-grained control. A leap toward practical digital human systems"
X Link 2025-09-15T06:44Z [----] followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: MIDAS: Multimodal Interactive Digital-humAn Synthesis via Real-time Autoregressive Video Generation Kuaishou Technology, Zhejiang University, Tsinghua University Paper: https://arxiv.org/pdf/2508.19320 Page: https://chenmingthu.github.io/milm/ Report: https://mp.weixin.qq.com/s/2pfS1zGF8OBeVtjtmosnYw"
X Link 2025-09-15T06:44Z [----] followers, [---] engagements
"Just in: China's State Administration for Market Regulation (SAMR) has launched a further investigation into NVIDIA. Reason: alleged violations of China's Anti-Monopoly Law and of conditions attached to its Mellanox acquisition"
X Link 2025-09-15T08:43Z [----] followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: UQ: Assessing Language Models on Unsolved Questions Stanford University, University of Washington, UNC, Rutgers University, Contextual AI Paper: https://arxiv.org/pdf/2508.17580v1 Project: https://uq.stanford.edu/ Report: https://mp.weixin.qq.com/s/7qqhCcSInFIb0SERkY9GqQ"
X Link 2025-09-15T08:51Z [----] followers, [---] engagements
"Why does robotics still lag behind AI despite huge hardware leaps? 🦾🤖 The real bottleneck is software: fragmented stacks, C++ hurdles, and steep learning curves. Enter ARK, an open-source Python-first robotics framework with Gym-style APIs, imitation-learning support (ACT, Diffusion Policy), seamless sim-to-real, and native ROS interoperability. With reusable modules & full docs, ARK makes robotics as convenient as modern ML, lowering barriers and speeding up real-world autonomy"
X Link 2025-09-16T06:53Z [----] followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: Ark: An Open-source Python-based Framework for Robot Learning Technical University of Darmstadt, Huawei Noah's Ark Lab, Imperial College London and others Paper: https://arxiv.org/pdf/2506.21628 Code: https://github.com/Robotics-Ark Tutorial: https://arkrobotics.notion.site/Ark-Home-22be053d9c6f8096bcdbefd6276aba61 Report: https://mp.weixin.qq.com/s/8oKsS9NZCT7O2scODSU4eA"
X Link 2025-09-16T06:53Z [----] followers, [---] engagements
"OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning. With this you can make multimodal models simpler and still stay competitive 👀📉 OpenVision [--] is a streamlined visual encoder that drops the text encoder and contrastive loss, keeping only a generative captioning loss. Using ViT-L/14, it cuts training time by [---] (83h → 57h) and memory by [---] (24.5GB → 13.8GB) while still matching benchmarks. These efficiency gains unlock scaling to 1B+ parameters, making generative-only training a promising path for future vision encoders"
X Link 2025-09-16T08:19Z [----] followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning University of California Santa Cruz, Apple, University of California Berkeley Paper: https://arxiv.org/abs/2509.01644 Project: https://ucsc-vlaa.github.io/OpenVision2/ Report: https://mp.weixin.qq.com/s/hJEwRF4kW7Y4r8kpk5iXKg"
X Link 2025-09-16T08:19Z [----] followers, [---] engagements
"🚀 Google just launched Agent Payments Protocol (AP2), an open standard for the agent economy. Built on A2A, AP2 enables secure, reliable, and interoperable agent commerce for developers, merchants & the payments industry"
X Link 2025-09-17T01:51Z [----] followers, [----] engagements
"DeepSeek's paper "DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning" was featured on the cover of Nature, with the company's CEO Wenfeng Liang as the corresponding author"
X Link 2025-09-18T00:40Z [----] followers, [----] engagements
"Are LLMs hitting a reasoning ceiling because of how we scale them? 🤔💡 A @Tsinghua_Uni study identifies Tunnel Vision: sequential thought forces early mistakes to lock models into suboptimal reasoning. Enter ParaThinker: it trains LLMs to generate multiple reasoning paths in parallel and synthesize the best answer. Results: +12.3% accuracy for 1.5B and +7.5% for 7B models (8 parallel paths), with only +7.1% latency. Smaller models outperform larger ones by thinking in parallel"
X Link 2025-09-19T07:14Z [----] followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute Tsinghua University Paper: https://arxiv.org/pdf/2509.04475 Code: https://github.com/MobileLLM/ParaThinker Report: https://mp.weixin.qq.com/s/jkDVYxiuplNrFfZYKLjkxg"
X Link 2025-09-19T07:14Z [----] followers, [---] engagements
"Parallel thinking can make LLMs better reasoners, not just imitators. Introducing Parallel-R1, the first RL framework that trains LLMs to think in parallel for complex reasoning tasks. 🔑 How it works: - Start with SFT on easier tasks to instill parallel thinking - Switch to RL to explore & generalize on harder problems - A curriculum avoids the cold-start trap 🚀 Results: - +8.4% accuracy over sequential RL models on math (MATH, AMC23, AIME) - On AIME25, yields +42.9% over baseline - Models shift from using parallel paths for exploration to multi-perspective verification"
X Link 2025-09-19T07:24Z [----] followers, [----] engagements
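At inference time, the parallel-thinking idea from the last two posts reduces to sampling several reasoning paths and aggregating them. A minimal sketch, assuming a stochastic solver stub and simple majority voting (ParaThinker/Parallel-R1 instead train the model to synthesize paths natively; `parallel_reason` is my illustrative name):

```python
from collections import Counter

def parallel_reason(solve_once, question, k=8):
    """Sample k independent reasoning paths and aggregate by majority vote.
    `solve_once(question, rng_or_state)` stands in for one sampled
    chain-of-thought ending in a final answer string."""
    answers = [solve_once(question, i) for i in range(k)]
    # Majority vote over final answers; a trained model would instead
    # synthesize the paths into one answer.
    return Counter(answers).most_common(1)[0][0]
```

Even this naive aggregation shows why early mistakes hurt less: one bad path is outvoted rather than locked in.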
"📬 #PapersAccepted by Jiqizhixin Our report: Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Tencent AI Lab Seattle, University of Maryland College Park and others Paper: https://arxiv.org/abs/2509.07980 Code (coming soon): https://github.com/zhengkid/Parallel-R1 Project: https://zhengkid.github.io/Parallel_R1.github.io/ Report: https://mp.weixin.qq.com/s/s1IS-8lHQi6VvXqExi0V9A"
X Link 2025-09-19T07:24Z [----] followers, [---] engagements
"How do you make TTS both natural and precisely timed for dubbing or sync? 🎙 Meet IndexTTS2, an autoregressive model with novel duration control: - Mode 1: specify token count → exact speech length - Mode 2: free AR generation → natural prosody preserved ✨ Extra features: - Disentangles emotion vs. timbre for independent control - Zero-shot: nails target timbre & emotion from prompts - Uses GPT latents + 3-stage training for clarity under strong emotion - Soft instruction (via Qwen3) makes emotional control as easy as text 🚀 Outperforms SOTA zero-shot TTS in WER, speaker similarity, and emotional fidelity"
X Link 2025-09-19T08:17Z [----] followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech Bilibili Paper: https://arxiv.org/abs/2506.21619 GitHub: https://github.com/index-tts/index-tts Demos: https://index-tts.github.io/index-tts2.github.io/ Report: https://mp.weixin.qq.com/s/T9ol2agv2oB_pac15iQHzQ"
X Link 2025-09-19T08:17Z [----] followers, [---] engagements
"MIT just built PDDL-Instruct, designed to enhance LLMs' symbolic planning capabilities. They teach LLMs logical chain-of-thought reasoning for PDDL covering: ✅ Action applicability 🔄 State transitions 🔍 Plan validity Results: up to 94% accuracy on benchmarks, a +66% absolute gain over baselines. Bridging LLM reasoning and precise AI planning. Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning for Symbolic Planning MIT CSAIL, Microsoft AI https://arxiv.org/abs/2509.13351"
X Link 2025-09-21T09:03Z [----] followers, 11.6K engagements
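The three skills the post lists (action applicability, state transitions, plan validity) are exactly what a STRIPS-style plan checker exercises. A toy validator, as a sketch of the reasoning being taught (minimal propositional STRIPS; real PDDL adds typed objects, parameters, etc.):

```python
def validate_plan(init, goal, actions, plan):
    """Check a plan step-by-step: each action's preconditions must hold
    (applicability), the state is updated by its add/delete lists
    (transition), and the goal must hold at the end (validity).
    `actions` maps a name to (preconditions, add_list, delete_list)."""
    state = set(init)
    for step in plan:
        pre, add, delete = actions[step]
        if not set(pre) <= state:                 # applicability check
            return False
        state = (state - set(delete)) | set(add)  # state transition
    return set(goal) <= state                     # plan validity
```

A one-action "pickup" domain suffices to see both a valid plan accepted and a repeated (inapplicable) action rejected.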
"LLM safety needs continual unlearning, but repeated unlearning often wrecks utility. Enter ALKN (Adaptive Localization of Knowledge Negation), which tackles this with: - Dynamic masking of gradients - Adaptive unlearning intensity - Task-aware parameter updates 📊 On [--] benchmarks, ALKN beats baselines in both forgetting harmful knowledge and retaining utility"
X Link 2025-09-22T01:42Z [----] followers, [----] engagements
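The "dynamic masking of gradients" idea can be illustrated with a single masked update step. This is a toy sketch under my own assumptions (saliency-thresholded masking, plain gradient descent), not the ALKN algorithm itself:

```python
def masked_unlearning_step(params, grad_forget, saliency, lr=0.1, keep_frac=0.9):
    """Apply the forgetting gradient only to the most salient parameters
    (those most associated with the targeted knowledge), leaving the
    bottom `keep_frac` of parameters untouched to preserve utility.
    All arguments are flat lists of floats."""
    k = int(len(params) * keep_frac)
    threshold = sorted(saliency)[k - 1] if k > 0 else float("-inf")
    return [p - lr * g if s > threshold else p       # masked update
            for p, g, s in zip(params, grad_forget, saliency)]
```

Localizing the update this way is what lets repeated unlearning rounds avoid eroding unrelated capabilities.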
"📬 #PapersAccepted by Jiqizhixin Our report: Adaptive Localization of Knowledge Negation for Continual LLM Unlearning Tsinghua University, BNRist and others Paper: https://openreview.net/pdf?id=tcK4PV3VN4 Code: https://github.com/zaocan666/ALKN Report: https://mp.weixin.qq.com/s/izPa_zqG4eVWpMoG8i3KIA"
X Link 2025-09-22T01:42Z [----] followers, [---] engagements
"Reasoning in speech models is still early; existing Think-then-Speak approaches add latency. 🗣 New work: Mini-Omni-Reasoner introduces Thinking-in-Speaking: - Interleaves silent reasoning + spoken tokens - Hierarchical Thinker-Talker architecture - New Spoken-Math-Problems-3M dataset Results: +19.1% arithmetic reasoning, +6.4% contextual understanding, with zero latency"
X Link 2025-09-22T03:36Z [----] followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: Mini-Omni-Reasoner: Token-Level Thinking-in-Speaking in Large Speech Models Nanyang Technological University, National University of Singapore, Tencent Paper: https://arxiv.org/pdf/2508.15827 Project: https://github.com/xzf-thu/Mini-Omni-Reasoner Report: https://mp.weixin.qq.com/s/HnmKxecS4ZzrVj-rQC4Qrw"
X Link 2025-09-22T03:36Z [----] followers, [---] engagements
"NVIDIA built NCCL for high-performance, topology-aware collective ops. Now a coalition in China has introduced VCCL (Venus Collective Communication Library), already deployed in production clusters. Best part? It's fully open-source 🚀 - Smarter scheduling offloads comms to CPUs for better GPU utilization (up to +26% efficiency on dense models with Megatron-LM). - Built-in fault tolerance cuts cluster failure rates by 50%+, keeping large-scale training resilient. - Fine-grained Flow Telemetry gives micro-level observability to spot slow nodes/links in real time"
X Link 2025-09-22T03:54Z [----] followers, [----] engagements
"Huge potential: Apple and Stanford have just released Synthetic Bootstrapped Pretraining (SBP). Standard LM pretraining = token correlations within one doc. SBP learns inter-document relations and synthesizes a huge new corpus for joint training. ✨ Pretrained a 3B model on 1T tokens 📈 Outperforms the repetition baseline ⚡ Approaches oracle gains from [--] more unique data 🧠 Synthesized docs = concept abstraction + fresh narration SBP can leverage document relationships for richer, more data-efficient LMs"
X Link 2025-09-22T06:18Z [----] followers, [----] engagements
"Synthetic bootstrapped pretraining Apple, Stanford University Paper: https://arxiv.org/abs/2509.15248"
X Link 2025-09-22T06:19Z [----] followers, [---] engagements
"Faster, smarter, and more realistic virtual world creation is now possible. Enter LatticeWorld, a lightweight yet powerful 3D world generation framework that fuses LLMs (LLaMA-2-7B) + Unreal Engine [--]. 📝 Text + visual instructions → 🌐 large-scale interactive 3D worlds 🤖 Dynamic multi-agent interaction ⚡ High-fidelity physics & real-time rendering 📊 Boosts production efficiency while keeping creative quality"
X Link 2025-09-22T08:12Z [----] followers, [----] engagements
"Technical framework of LatticeWorld"
X Link 2025-09-22T08:12Z [----] followers, [---] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation NetEase, Beihang University and others Paper: https://arxiv.org/pdf/2509.05263 Report: https://mp.weixin.qq.com/s/KUZj2FN93PEBmagehv9lXg"
X Link 2025-09-22T08:12Z [----] followers, [---] engagements
"Current GRPO-based alignment for image/video gen is slow & inefficient. BranchGRPO fixes this with: [--] Branching rollouts → shared prefixes + diverse exploration [--] Reward fusion → dense step-level credit assignment [--] Smart pruning → cut gradient cost, keep exploration 📊 Results: - +16% alignment vs DanceGRPO - 55% faster training (4.7× with BranchGRPO-Mix) - Sharper, more consistent video frames"
X Link 2025-09-23T03:15Z [----] followers, [----] engagements
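The "reward fusion → dense step-level credit assignment" bullet can be made concrete on a rollout tree: every shared prefix node gets a value from the leaves below it. A schematic sketch (my own simple mean-over-leaves fusion, not BranchGRPO's actual estimator):

```python
def fuse_rewards(tree, leaf_reward):
    """Assign each tree node the mean reward of the leaves beneath it, so
    shared rollout prefixes receive a dense learning signal instead of only
    the final leaf reward. `tree` maps node -> list of children (leaves map
    to []); `leaf_reward` maps leaf -> scalar reward."""
    def value(node):
        children = tree[node]
        if not children:
            return leaf_reward[node]
        return sum(value(c) for c in children) / len(children)
    return {node: value(node) for node in tree}
```

On a tiny tree where one branch splits into a good and a bad leaf, the shared prefix gets the averaged credit rather than a single sparse reward.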
"📬 #PapersAccepted by Jiqizhixin Our report: BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models Peking University, Beijing Normal University, ByteDance Paper: https://arxiv.org/pdf/2509.06040 Project: https://fredreic1849.github.io/BranchGRPO-Webpage/ Code: https://github.com/Fredreic1849/BranchGRPO Report: https://mp.weixin.qq.com/s/AvV0WVQcopmvy-BeAzOD4A"
X Link 2025-09-23T03:15Z [----] followers, [---] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation Stability AI, University of Illinois Urbana-Champaign Paper: https://arxiv.org/pdf/2509.10687 Project: https://stablepartdiffusion4d.github.io/ Report: https://mp.weixin.qq.com/s/mON11i3O2j4-RlnG-t3n_Q"
X Link 2025-09-23T03:39Z [----] followers, [---] engagements
"Apple has just unified the tokenizer for vision. AToken is the first unified visual tokenizer capable of simultaneously achieving high-fidelity reconstruction and semantic understanding across images, videos, and 3D assets. Unlike existing tokenizers, which are limited to a single modality and focus exclusively on either reconstruction or understanding, AToken encodes diverse visual inputs into a shared 4D latent space, unifying tasks and modalities within a single framework. Truly impressive research"
X Link 2025-09-23T03:55Z [----] followers, [----] engagements
"Our report: AToken: A Unified Tokenizer for Vision Apple Paper: https://arxiv.org/pdf/2509.14476 Report: https://mp.weixin.qq.com/s/EsfNO75XmOJjRqj_hIF2XA"
X Link 2025-09-23T03:55Z [----] followers, [---] engagements
"Video diffusion models have strong priors but struggle with controllability & geometric consistency, and retraining is costly. Enter WorldForge: training-free & inference-time only. Key modules: - Intra-Step Recursive Refinement → precise trajectory injection - Flow-Gated Latent Fusion → decouples motion vs appearance w/ optical flow - Dual-Path Self-Corrective Guidance → fixes trajectory drift adaptively 📊 Results: sharper, more realistic videos with accurate motion control, no retraining needed"
X Link 2025-09-25T02:09Z [----] followers, [---] engagements
"VLA models are powerful for robotics but slowed by heavy attention over tons of visual tokens. LightVLA = adaptive, differentiable token pruning: 🔍 Query-based importance scoring 🎲 Gumbel softmax for end-to-end pruning ⚡ No heuristic thresholds, no extra params 📊 On LIBERO: 🚀 −59.1% FLOPs, −38.2% latency 📈 +2.6% task success rate"
X Link 2025-09-25T06:10Z [----] followers, [---] engagements
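The Gumbel-based pruning step can be sketched as "perturb importance scores with Gumbel noise, keep the top-k tokens". This forward-pass sketch only shows the selection; in LightVLA-style training a softmax relaxation makes the choice differentiable, and the function below is illustrative rather than the paper's code:

```python
import math, random

def gumbel_topk_keep(scores, keep, tau=1.0, seed=0):
    """Add Gumbel(0,1) noise (scaled by tau) to per-token importance
    scores, then return the sorted indices of the `keep` highest-scoring
    tokens. With tau=0 this reduces to plain deterministic top-k."""
    rng = random.Random(seed)
    def gumbel():
        return -math.log(-math.log(rng.random()))
    noisy = [(s + tau * gumbel(), i) for i, s in enumerate(scores)]
    # Keep the top-`keep` tokens, reported in original token order.
    return sorted(i for _, i in sorted(noisy, reverse=True)[:keep])
```

The noise is what lets training explore different prune sets instead of committing to one heuristic threshold.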
"📬 #PapersAccepted by Jiqizhixin Our report: The Better You Learn The Smarter You Prune: Towards Efficient Vision-language-action Models via Differentiable Token Pruning LiAuto, Tsinghua University, Chinese Academy of Sciences Paper: https://arxiv.org/abs/2509.12594 Project: https://liauto-research.github.io/LightVLA/ Report: https://mp.weixin.qq.com/s/qzozyaHoEqFzCKLkZX-qNg"
X Link 2025-09-25T06:10Z [----] followers, [---] engagements
"Federated learning suffers when clients have feature drift (different feature spaces), weak feature extraction & poor classification. FedPall fixes this with: - Prototype-based adversarial learning → unify feature spaces - Collaborative learning → strengthen class info - Mixed features (global prototypes + local features) → boost the global classifier 📊 On [--] feature-drifted datasets, FedPall consistently outperforms baselines for classification under FL. A step toward robust FL under real-world heterogeneity"
X Link 2025-09-25T06:45Z [----] followers, [---] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: FedPall: Prototype-based Adversarial and Collaborative Learning for Federated Learning with Feature Drift Shenzhen MSU-BIT University, Chinese Academy of Sciences and Beijing Institute of Technology Paper: https://arxiv.org/abs/2507.04781 Project: https://github.com/DistriAI/FedPall Dataset: https://drive.google.com/drive/folders/1xLxaz3zJRqZbTVDzkoAoWZiX50gwZI_4 Report: https://mp.weixin.qq.com/s/j6YzLsemSn_Zoh07R0zDvA"
X Link 2025-09-25T06:45Z [----] followers, [---] engagements
"What if your favorite text-to-image model could be secretly manipulated by hidden backdoors? Enter NaviT2I: an input-level defense that detects malicious triggers by observing how tokens cause unusual neuron activation in the very first steps of diffusion. Tested on multiple datasets, backdoor types, and model architectures (UNet, DiT), NaviT2I consistently outperforms existing defenses, even under adaptive attacks"
X Link 2025-09-25T07:11Z [----] followers, [---] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: Efficient Input-level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation Peking University, National University of Singapore and others Paper: https://arxiv.org/abs/2503.06453 Code: https://github.com/zhaisf/NaviT2I Report: https://mp.weixin.qq.com/s/yaiDxLU0DYd1-F5A-f8kCg"
X Link 2025-09-25T07:11Z [----] followers, [---] engagements
"How do you make virtual worlds look real, with consistent lighting and textures, even in long dynamic videos? 🎥✨ TC-Light is here. It's a generative renderer for illumination & texture editing across domains. It first aligns global lighting via appearance embeddings, then refines details using a Unique Video Tensor (UVT) for canonical video representation. On a new benchmark of long dynamic videos, TC-Light delivers physically plausible re-rendering with strong temporal coherence and low computation cost, outperforming existing video relighting and world-generation methods"
X Link 2025-09-26T01:55Z [----] followers, [---] engagements
"Why do multimodal LLMs still stumble on geometry problems despite excelling elsewhere? 📐🤔 A paper tackles this by introducing RLVR (Reinforcement Learning with Verifiable Rewards) into the data pipeline. Instead of rigid templates, RLVR refines captions for synthetic geometric images (from [--] base relations) using math-based reward signals, capturing the essence of geometric reasoning. The result: stronger task generalization, non-trivial gains on geometry benchmarks, and even accuracy boosts on out-of-distribution tasks, from MathVista & MathVerse to broader MMMU domains like art design and"
X Link 2025-09-26T03:11Z [----] followers, [---] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: Generalizable Geometric Image Caption Synthesis UIUC, Shanghai Jiao Tong University, Rutgers University, NVIDIA Paper: https://arxiv.org/abs/2509.15217 Code: https://github.com/MachinePhoenix/GeoReasoning Hugging Face: https://huggingface.co/datasets/ScaleMath/GeoReasoning Report: https://mp.weixin.qq.com/s/f2e-k_taiioFW0WcUfYPPw"
X Link 2025-09-26T03:11Z [----] followers, [---] engagements
"How can LLMs adapt to dynamic, scenario-specific rules, balancing safety and usefulness at the same time? Enter Align3, a lightweight method using Test-Time Deliberation with hierarchical reflection and revision to reason over evolving behavioral and safety specifications. To measure progress, the authors present SpecBench: [--] scenarios, [---] specifications, [----] prompts. Findings from [--] reasoning and [--] instruct models: [--] Test-time deliberation improves specification alignment [--] Align3 pushes the safety-helpfulness frontier with minimal cost [--] SpecBench exposes hidden alignment gaps Together this"
X Link 2025-09-28T08:24Z [----] followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation Shanghai Jiao Tong University, The Chinese University of Hong Kong and others Paper: https://arxiv.org/abs/2509.14760 Code & Data: https://github.com/zzzhr97/SpecBench Report: https://mp.weixin.qq.com/s/tw15KvvsQ5CEk9eSK9NcmA"
X Link 2025-09-28T08:24Z [----] followers, [---] engagements
"What if you could use AI to build bug-finding tools instead of just looking for bugs? KNighter does exactly that for the Linux kernel, automatically creating high-precision analyzers that have already uncovered [--] critical, long-hidden bugs. Key idea: use LLMs to generate specialized checkers, validate them against real patches, and refine iteratively to cut false positives. A paradigm shift in static analysis"
X Link 2025-09-28T08:36Z [----] followers, [---] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: KNighter: Transforming Static Analysis with LLM-Synthesized Checkers University of Illinois Urbana-Champaign, Zhejiang University and others Paper: https://arxiv.org/pdf/2503.09002 Code: https://github.com/ise-uiuc/KNighter Report: https://mp.weixin.qq.com/s/6ayMsGjfhft7KdpPfG4kSA"
X Link 2025-09-28T08:36Z [----] followers, [---] engagements
"Huazhong University of Science and Technology & Xiaomi EV present Genesis, a unified framework for jointly generating multi-view driving videos and LiDAR sequences with spatio-temporal and cross-modal consistency. Highlights: - Two-stage design: DiT-based video diffusion + BEV-aware LiDAR generator with NeRF rendering - Shared latent space: keeps video and LiDAR tightly aligned - DataCrafter: a vision-language captioning module for scene- & instance-level semantic guidance On the nuScenes benchmark, Genesis sets a new SOTA (FVD [-----], FID [----], Chamfer 0.611) and improves downstream tasks like"
X Link 2025-09-28T08:47Z [----] followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency Huazhong University of Science and Technology & Xiaomi EV Paper: https://arxiv.org/abs/2506.07497 Code: https://github.com/xiaomi-research/genesis Project: https://xiaomi-research.github.io/genesis/ Report: https://mp.weixin.qq.com/s/np5US9uEb72KUbi2-KIoTg"
X Link 2025-09-28T08:47Z [----] followers, [---] engagements
"Meta has taken a significant step forward in agentic recommender systems with RecoWorld, a simulated playground for agentic recommenders. 🤝 Dual-view setup: a user simulator + recommender agent in multi-turn dialogue. 🧠 Users give feedback & reflective instructions; agents adapt via reasoning traces. 🎯 Enables safe RL training, diverse content modes, and even multi-agent user sims"
X Link 2025-09-30T04:08Z [----] followers, [---] engagements
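The dual-view loop described above can be sketched in a few lines. Everything here is a hypothetical stand-in, not RecoWorld's actual API: a toy user simulator that accepts or rejects items with textual feedback, and a toy agent that adapts by avoiding previously rejected items.

```python
def user_simulator(item, interests):
    """Toy simulated user: accepts an item iff it matches an interest,
    otherwise returns a reflective instruction (reward, feedback)."""
    if item in interests:
        return 1, f"liked:{item}"
    return 0, f"try something other than {item}"

def recommender_agent(catalog, feedback_log):
    """Toy agent: reads past feedback and avoids items already rejected."""
    rejected = {f.split()[-1] for r, f in feedback_log if r == 0}
    candidates = [i for i in catalog if i not in rejected] or catalog
    return candidates[0]

def run_session(catalog, interests, turns=5):
    """Multi-turn dialogue between the two views, ending on a success."""
    log = []
    for _ in range(turns):
        item = recommender_agent(catalog, log)
        reward, feedback = user_simulator(item, interests)
        log.append((reward, feedback))
        if reward:
            break
    return log
```

In an RL setting, `log` would become a trajectory whose rewards train the agent; here it just records the multi-turn interaction.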
"📬 #PapersAccepted by Jiqizhixin Our report: RecoWorld: Building Simulated Environments for Agentic Recommender Systems Meta Modern Recommendation System (MRS) Paper: https://arxiv.org/abs/2509.10397 https://mp.weixin.qq.com/s/J2qoNk3UCRlT1wHLsvOHTA https://arxiv.org/abs/2509.10397 https://mp.weixin.qq.com/s/J2qoNk3UCRlT1wHLsvOHTA https://arxiv.org/abs/2509.10397 https://mp.weixin.qq.com/s/J2qoNk3UCRlT1wHLsvOHTA https://arxiv.org/abs/2509.10397 https://mp.weixin.qq.com/s/J2qoNk3UCRlT1wHLsvOHTA"
X Link 2025-09-30T04:08Z [----] followers, [---] engagements
"Is your Bayesian Optimization slow to converge 🥱 You might be using the wrong recipe This NeurIPS [----] paper introduces CAKE 🍰 Context-Aware Kernel Evolution a freshly baked method that uses Large Language Models (LLMs) as genetic operators to adaptively evolve the perfect Gaussian Process kernel for your problem. CAKE consistently serves up state-of-the-art results across diverse tasks like hyperparameter tuning robotics and even photonic chip design converging faster with less data. Bon apptit"
X Link 2025-09-30T04:31Z [----] followers, [---] engagements
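The "LLM as genetic operator" idea can be sketched as an evolutionary loop over kernel expressions. Here `llm_mutate` is a random stand-in for the LLM proposal step and `fitness` a placeholder for the GP marginal likelihood; both are assumptions for illustration, not CAKE's actual components.

```python
import random

random.seed(0)

def llm_mutate(kernel_expr):
    """Stand-in for the LLM genetic operator: in CAKE an LLM sees the
    optimization context and proposes a kernel variant; here we just
    combine base kernels at random."""
    parts = ["RBF", "Matern", "Periodic", "Linear"]
    op = random.choice(["+", "*"])
    return f"({kernel_expr} {op} {random.choice(parts)})"

def fitness(kernel_expr, target_complexity):
    """Placeholder for GP marginal likelihood on observed data; here it
    just prefers expressions near a target length."""
    return -abs(len(kernel_expr) - target_complexity)

def evolve_kernel(generations=10, target_complexity=25):
    """Greedy (1+1)-style evolution: keep a child only if it scores better."""
    best = "RBF"
    for _ in range(generations):
        child = llm_mutate(best)
        if fitness(child, target_complexity) > fitness(best, target_complexity):
            best = child
    return best

best = evolve_kernel()
```

The real method would evaluate each candidate kernel by fitting a GP and scoring it on the optimization trace, then feed that context back into the LLM's next proposal.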
"Adaptive Kernel Design for Bayesian Optimization Is a Piece of CAKE with LLMs The Chinese University of Hong Kong Shenzhen and others Paper: Code: https://github.com/richardcsuwandi/cake https://www.alphaxiv.org/abs/2509.17998v2 https://github.com/richardcsuwandi/cake https://www.alphaxiv.org/abs/2509.17998v2 https://github.com/richardcsuwandi/cake https://www.alphaxiv.org/abs/2509.17998v2 https://github.com/richardcsuwandi/cake https://www.alphaxiv.org/abs/2509.17998v2"
X Link 2025-09-30T04:31Z [----] followers, [--] engagements
"Seoul National University & Adobe Present: Program Synthesis via Test-Time Transduction A formulation that explicitly leverages test inputs during synthesis unlike prior approaches that only generalize from training examples. Key idea: treat synthesis as active learning over a finite hypothesis space. - Use an LLM to predict outputs for selected test inputs - Eliminate inconsistent hypotheses - Pick inputs with a greedy maximin algorithm to minimize queries Tested on Playgol MBPP+ 1D-ARC and MiniGrid the method boosts both accuracy and efficiencyshowing that robustness in program synthesis"
X Link 2025-09-30T05:09Z [----] followers, [---] engagements
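The three bullet steps above map directly onto a small active-learning loop: pick the test input whose worst-case answer still eliminates the most candidate programs, query an oracle (the LLM in the paper), and filter. The hypothesis space and oracle below are toy assumptions for illustration.

```python
def maximin_input(hypotheses, test_inputs):
    """Greedy maximin selection: choose the input whose worst-case answer
    leaves the fewest surviving hypotheses."""
    def worst_case_survivors(x):
        buckets = {}
        for h in hypotheses:
            buckets[h(x)] = buckets.get(h(x), 0) + 1
        return max(buckets.values())
    return min(test_inputs, key=worst_case_survivors)

def transductive_filter(hypotheses, test_inputs, oracle, budget=3):
    """Query the oracle on informative test inputs and drop every
    hypothesis inconsistent with its answers."""
    for _ in range(budget):
        if len(hypotheses) <= 1 or not test_inputs:
            break
        x = maximin_input(hypotheses, test_inputs)
        y = oracle(x)
        hypotheses = [h for h in hypotheses if h(x) == y]
        test_inputs = [t for t in test_inputs if t != x]
    return hypotheses

# Toy hypothesis space: three candidate programs over integers.
hyps = [lambda v: v + 1, lambda v: v * 2, lambda v: v * v]
survivors = transductive_filter(hyps, [2, 3, 5], oracle=lambda v: v * 2)
```

With input 2 two hypotheses collide (2+1=3 but 2*2 equals 2*2), so the maximin rule prefers inputs like 3 or 5 that split all three candidates apart in one query.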
"This is huge A UCLA team managed to build an optical generative model that runs on light instead of GPUs. In their demo a shallow encoder maps noise into phase patterns which a free-space optical decoder then transforms into imagesdigits fashion butterflies faces even Van Goghstyle artwithout any computation during synthesis. ⚡ The results rival digital diffusion models pointing to ultra-fast energy-efficient AI powered by photonics. Optical generative models Nature Paper: https://www.nature.com/articles/s41586-025-09446-5#MOESM1 https://www.nature.com/articles/s41586-025-09446-5#MOESM1"
X Link 2025-10-02T05:09Z 10.4K followers, 174.6K engagements
"If AI were to possess consciousness. Yoshua Bengio and his PhD student Eric Elmoznino recently published an article in Science titled Illusions of AI Consciousness addressing two interrelated questions: As AI continues to advance how will scientific and public perceptions regarding AI consciousness evolve If we begin treating future AI systems as conscious beings what risks might arise For instance if an AI refuses to obey human commands would shutting it down face legal restrictionssince it would then be regarded as a living entity Absolutely worth reading. Illusions of AI consciousness"
X Link 2025-10-03T05:27Z [----] followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets Tencent Hunyuan3D Project: Code: Model: Paper: https://arxiv.org/pdf/2509.21245 https://huggingface.co/tencent/Hunyuan3D-Omni https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni https://3d.hunyuan.tencent.com https://mp.weixin.qq.com/s/WyCO_ZdrBhRYWHhKCqxSbQ https://huggingface.co/tencent/Hunyuan3D-Omni https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni https://3d.hunyuan.tencent.com https://mp.weixin.qq.com/s/WyCO_ZdrBhRYWHhKCqxSbQ https://arxiv.org/pdf/2509.21245"
X Link 2025-10-04T01:05Z [----] followers, [---] engagements
"Can reasoning LLMs think better if their Chain-of-Thought is continuous instead of discrete 🧠✨ This Meta paper introduces the first scalable way to train continuous CoTs with reinforcement learningno need to distill from discrete references. By using "soft" tokens (mixtures of tokens + noise) for RL exploration the method learns continuous reasoning traces with hundreds of steps at minimal overhead. On math benchmarks with Llama & Qwen (up to 8B) continuous CoTs match discrete ones at pass@1 but surpass them at pass@32showing richer reasoning diversity. The best strategy: train with"
X Link 2025-10-04T05:37Z 10.3K followers, [----] engagements
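A "soft" token in this sense is just the expected embedding under the model's next-token distribution, perturbed with a little noise for RL exploration. The sketch below is a minimal illustration of that mixture; the embedding table, dimensions, and noise scale are all made-up assumptions, not the paper's settings.

```python
import random

random.seed(0)

def soft_token(probs, embeddings, noise=0.01):
    """Build a soft token: probability-weighted mixture of token embeddings,
    plus small Gaussian noise for exploration during RL."""
    dim = len(next(iter(embeddings.values())))
    mix = [0.0] * dim
    for tok, p in probs.items():
        for d in range(dim):
            mix[d] += p * embeddings[tok][d]
    return [x + random.gauss(0, noise) for x in mix]

# Toy 2-D embedding table and a next-token distribution over two tokens.
emb = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
v = soft_token({"a": 0.75, "b": 0.25}, emb)
```

Feeding `v` back into the model instead of a single sampled token embedding is what makes the chain-of-thought continuous: the trace carries the whole distribution forward rather than committing to one discrete token per step.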
"An intriguing paper from Apple. MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE Paper: https://arxiv.org/abs/2509.17238 https://arxiv.org/abs/2509.17238 https://arxiv.org/abs/2509.17238 https://arxiv.org/abs/2509.17238"
X Link 2025-10-05T06:58Z 10.3K followers, 31.8K engagements
"📬 #PapersAccepted by Jiqizhixin Our report: FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving Alibaba Xian Jiaotong University Project: Paper: Code: https://github.com/MIV-XJTU/FSDrive https://arxiv.org/abs/2505.17685 https://miv-xjtu.github.io/FSDrive.github.io/ https://mp.weixin.qq.com/s/uLazfZGChZUbHo_jW_VN6g https://arxiv.org/abs/2505.17685 https://miv-xjtu.github.io/FSDrive.github.io/ https://mp.weixin.qq.com/s/uLazfZGChZUbHo_jW_VN6g https://github.com/MIV-XJTU/FSDrive https://arxiv.org/abs/2505.17685 https://miv-xjtu.github.io/FSDrive.github.io/"
X Link 2025-10-07T03:47Z [----] followers, [---] engagements
"Nice survey on Reinforcement Learning. This comprehensive survey covers [---] papers and maps how RL empowers LLMs across their full lifecycle from pre-training and alignment fine-tuning to reinforced reasoning where models learn to think better through verifiable feedback. It highlights RL with Verifiable Rewards (RLVR) as a key step toward more reliable interpretable and self-improving AI systems while cataloging datasets benchmarks and open-source frameworks that drive the field. 📚 A must-read for those exploring the frontier of RL-enhanced reasoning and alignment in next-gen LLMs"
X Link 2025-10-07T07:32Z 10.3K followers, [---] engagements
"Technical architecture of the RLVR methods. It depicts the overall workflow of the RLVR and expands on the design methods for the reward model off-policy assistance reward filtering sampling and reasoning strategies Agent RL and reward update hierarchy"
X Link 2025-10-07T07:32Z 10.1K followers, [---] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: Reinforcement Learning Meets Large Language Models: A Survey of Advancements and Applications Across the LLM Lifecycle Fudan University ByteDance and others Paper: https://arxiv.org/pdf/2509.16679 https://mp.weixin.qq.com/s/tTr7J6U9U3ypv4Q8DeEMVw https://arxiv.org/pdf/2509.16679 https://mp.weixin.qq.com/s/tTr7J6U9U3ypv4Q8DeEMVw https://arxiv.org/pdf/2509.16679 https://mp.weixin.qq.com/s/tTr7J6U9U3ypv4Q8DeEMVw https://arxiv.org/pdf/2509.16679 https://mp.weixin.qq.com/s/tTr7J6U9U3ypv4Q8DeEMVw"
X Link 2025-10-07T07:32Z 10.1K followers, [---] engagements
"Can synthetic data grow as rich and diverse as human-collected datasets Meet TreeSynth a tree-guided subspace-based data synthesis method that uses decision treelike partitioning to systematically explore the entire task space before generating data. By recursively dividing the data space into mutually exclusive atomic subspaces TreeSynth ensures both diversity and coverage avoiding the repetition and bias common in LLM-based synthesis. 🚀 Results show +10% average performance gains and stronger scalability vs. human or AI-generated baselines proving that smart structure not just scale drives"
X Link 2025-10-08T06:15Z 10.1K followers, [----] engagements
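The recursive partitioning idea can be shown with a tiny sketch: split the task space along one attribute at a time until every leaf is an atomic subspace, then (in the real method) have an LLM generate samples per leaf. The attribute names and values below are hypothetical, not from the paper.

```python
def partition(space, attributes):
    """Recursively split the task space along each attribute in turn,
    yielding mutually exclusive, collectively exhaustive atomic subspaces."""
    if not attributes:
        return [space]
    attr, values = attributes[0]
    leaves = []
    for v in values:
        leaves.extend(partition(space + [(attr, v)], attributes[1:]))
    return leaves

# Hypothetical attribute tree for a math word-problem synthesis task.
attrs = [("topic", ["arithmetic", "geometry"]),
         ("difficulty", ["easy", "hard"])]
subspaces = partition([], attrs)
# Each leaf, e.g. [("topic", "geometry"), ("difficulty", "easy")], describes
# one atomic subspace an LLM would then populate with synthetic examples.
```

Because the leaves are mutually exclusive and cover every attribute combination, generating per leaf guarantees coverage by construction rather than hoping a single prompt samples the space evenly.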
"📬 #PapersAccepted by Jiqizhixin Our report: TreeSynth: Synthesizing Diverse Data from Scratch via Tree-Guided Subspace Partitioning The University of Hong Kong The Chinese University of Hong Kong Paper: Code: https://github.com/cpa2001/TreeSynth https://arxiv.org/abs/2503.17195 https://mp.weixin.qq.com/s/fp29jkzxBL7Sku4tqlvGPw https://arxiv.org/abs/2503.17195 https://mp.weixin.qq.com/s/fp29jkzxBL7Sku4tqlvGPw https://github.com/cpa2001/TreeSynth https://arxiv.org/abs/2503.17195 https://mp.weixin.qq.com/s/fp29jkzxBL7Sku4tqlvGPw https://arxiv.org/abs/2503.17195"
X Link 2025-10-08T06:15Z 10.1K followers, [---] engagements
"How do we teach AI to see the world in [---] A new survey dives deep into panoramic vision exploring how models adapt from standard perspective images to omnidirectional images (ODIs) used in VR robotics and autonomous driving. The paper reviews 300+ works and identifies key challenges in perspective-to-panorama adaptation: ⚙ Geometric distortions near poles 📊 Non-uniform sampling in equirectangular projections 🔁 Boundary continuity across the [---] field It maps out progress across 20+ tasks from image enhancement to multimodal understanding and generation and highlights open frontiers in data"
X Link 2025-10-08T06:52Z 10.1K followers, [---] engagements
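The non-uniform sampling problem is easy to quantify: in an equirectangular projection every pixel row has the same width, but the sphere area a row covers shrinks with the cosine of its latitude, vanishing at the poles. A minimal sketch of those per-row weights (a standard ERP property, not code from the survey):

```python
import math

def erp_row_weights(height):
    """Relative solid-angle weight of each pixel row in an equirectangular
    image: proportional to cos(latitude) at the row center, so rows near
    the poles contribute almost nothing despite having full pixel width."""
    weights = []
    for r in range(height):
        # Latitude of row center, from +pi/2 (top) down to -pi/2 (bottom).
        lat = (0.5 - (r + 0.5) / height) * math.pi
        weights.append(math.cos(lat))
    return weights

w = erp_row_weights(180)
```

This is why naive per-pixel losses or uniform convolutions over-weight polar regions; many of the surveyed methods reweight by exactly this factor or resample onto the sphere instead.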
"📬 #PapersAccepted by Jiqizhixin Our report: One Flight Over the Gap: A Survey from Perspective to Panoramic Vision Insta360 Research University of California San Diego and others Project: Paper: Repo: https://github.com/Insta360-Research-Team/panoramic-vision-survey https://arxiv.org/pdf/2509.04444 https://insta360-research-team.github.io/Survey-of-Panorama/ https://mp.weixin.qq.com/s/O-4L9pACS-kGxTX6fCLtOA https://arxiv.org/pdf/2509.04444 https://insta360-research-team.github.io/Survey-of-Panorama/ https://mp.weixin.qq.com/s/O-4L9pACS-kGxTX6fCLtOA"
X Link 2025-10-08T06:52Z 10.1K followers, [---] engagements
"Salesforce just proposed UserRL. It's a unified framework for building user-centric agentic models through standardized gym environments and simulated users. Using Qwen3 models under the GRPO algorithm the study uncovers: [--] SFT cold start is essential for unlocking early interaction skills. [--] Trajectory-level rewards boost multi-turn efficiency and quality. [--] Simulated users (even open-source ones like Qwen3-32B) enable scalable cost-effective training"
X Link 2025-10-08T07:31Z 10.3K followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: UserRL: Training Interactive User-Centric Agent via Reinforcement Learning Salesforce AI Research University of Illinois Urbana-Champaign Paper: Code: https://github.com/SalesforceAIResearch/UserRL https://arxiv.org/pdf/2509.19736 https://mp.weixin.qq.com/s/HpFIFebKAabgvPLC8jeMLA https://github.com/SalesforceAIResearch/UserRL https://arxiv.org/pdf/2509.19736 https://mp.weixin.qq.com/s/HpFIFebKAabgvPLC8jeMLA https://github.com/SalesforceAIResearch/UserRL https://arxiv.org/pdf/2509.19736 https://mp.weixin.qq.com/s/HpFIFebKAabgvPLC8jeMLA"
X Link 2025-10-08T07:31Z 10.3K followers, [---] engagements
"Huge LLMs can now think longer without burning quadratic compute Mila Microsoft and others just introduced Markovian Thinking a paradigm that decouples reasoning length from context size turning LLM reasoning into a linear-compute process. Their system Delethink trains models in fixed-size reasoning chunks: at each boundary the model writes a compact textual state resets the context and seamlessly continues reasoning. Results are striking: an R1-Distill 1.5B model thinks up to 24K tokens with only 8K context outperforming LongCoT-RL trained on full 24K sequences at [--] lower compute cost (7 vs."
X Link 2025-10-10T01:56Z 10.3K followers, 45K engagements
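The chunk-and-reset loop is simple to sketch: each model call sees only the question plus a short carried-over state, never the full trace, which is what makes total compute linear in thinking length. The prompt format, state size, and toy model below are illustrative assumptions, not Delethink's actual implementation.

```python
def markovian_reason(model, question, chunk_tokens=8000, max_chunks=3):
    """Delethink-style chunked reasoning sketch: bounded context per call,
    with only a compact textual state carried across chunk boundaries."""
    state = ""
    for _ in range(max_chunks):
        prompt = f"{question}\n[state]{state}"
        chunk = model(prompt, budget=chunk_tokens)  # context never grows
        if "[answer]" in chunk:
            return chunk.split("[answer]")[-1].strip()
        # Boundary: discard the trace, keep only a short state summary.
        state = chunk[-200:]
    return None

# Toy model: "thinks" for two chunks, then answers on the third call.
calls = {"n": 0}
def toy_model(prompt, budget):
    calls["n"] += 1
    return "partial work...carry this" if calls["n"] < 3 else "[answer] 42"

result = markovian_reason(toy_model, "What is 6*7?")
```

With full-context CoT, chunk k would attend over all k*8000 prior tokens (quadratic overall); here every call costs the same, so thinking 24K tokens with an 8K window is just three constant-cost calls.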
"📬 #PapersAccepted by Jiqizhixin Our report: Improving Context Fidelity via Native Retrieval-Augmented Reasoning Project: Paper: Code: Models: https://huggingface.co/collections/sheryc/care-checkpoints-emnlp-2025-68be35dbd732816c9d98f258 https://github.com/FoundationAgents/CARE https://arxiv.org/abs/2509.13683 https://foundationagents.github.io/CARE https://mp.weixin.qq.com/s/-nQcoGJu6zbYygK44EMFyg https://huggingface.co/collections/sheryc/care-checkpoints-emnlp-2025-68be35dbd732816c9d98f258 https://github.com/FoundationAgents/CARE https://arxiv.org/abs/2509.13683"
X Link 2025-10-10T03:33Z 10.2K followers, [---] engagements
"Yes it turns out diffusion models can learn from feedback as effectively as language models do with RL Tsinghua NVIDIA and Stanford introduced Diffusion Negative-aware FineTuning (DiffusionNFT) a new online reinforcement learning paradigm that finally makes RL practical for diffusion models. Instead of struggling with intractable likelihoods or reverse-sampling hacks DiffusionNFT works directly on the forward process via flow matching contrasting positive vs. negative generations to guide improvement. ✨ Key perks: - Works with any black-box solver no likelihood estimation needed. - CFG-free"
X Link 2025-10-10T03:58Z 10.3K followers, 27K engagements
"How can we teach language models to read images as naturally as text 🧠📸 A new study from BeingBeyond introduces a unified framework that applies byte-pair encoding to visual tokens replacing separate vision encoders with a text-like tokenization scheme. With priority-guided encoding and curriculum-based training the model learns to reason seamlessly across modalitiesachieving stronger performance on diverse vision-language tasks. 👉 Bridging the gap between vision and language one token at a time"
X Link 2025-10-10T06:44Z 10.2K followers, [----] engagements
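Applying BPE to visual tokens works mechanically like text BPE: take a sequence of quantized patch-token ids, find the most frequent adjacent pair, and merge it into a new id. The token ids below are made up for illustration; the paper's priority-guided variant additionally reorders which pairs get merged first.

```python
from collections import Counter

def most_frequent_pair(seq):
    """Most common adjacent id pair in the sequence."""
    return Counter(zip(seq, seq[1:])).most_common(1)[0][0]

def bpe_merge(seq, pair, new_id):
    """One BPE step: replace every occurrence of `pair` with `new_id`,
    exactly as in text BPE, but over visual token ids."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

# Toy patch-token sequence from an image tokenizer (ids are hypothetical).
tokens = [7, 3, 7, 3, 9, 7, 3]
pair = most_frequent_pair(tokens)           # (7, 3) occurs three times
merged = bpe_merge(tokens, pair, new_id=100)
```

Repeating this builds a vocabulary of composite visual tokens, so frequently co-occurring patch patterns get single ids the language model can treat just like subwords.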
"📬 #PapersAccepted by Jiqizhixin Our report: Unified Multimodal Understanding via Byte-Pair Visual Encoding Peking University UC San Diego Renmin University of China BeingBeyond Paper: Project: GitHub: https://github.com/beingbeyond/Being-VL-0.5 https://beingbeyond.github.io/Being-VL-0.5 https://arxiv.org/abs/2506.23639 https://mp.weixin.qq.com/s/c53EDKSD8yGPcqDOAdU6Cg https://beingbeyond.github.io/Being-VL-0.5 https://arxiv.org/abs/2506.23639 https://mp.weixin.qq.com/s/c53EDKSD8yGPcqDOAdU6Cg https://github.com/beingbeyond/Being-VL-0.5 https://beingbeyond.github.io/Being-VL-0.5"
X Link 2025-10-10T06:44Z 10.2K followers, [---] engagements
"Wow robots can learn new manipulation tasks just by watching generated videos Meet NovaFlow a zero-shot manipulation framework that turns task descriptions into executable robot actions no demos needed. It synthesizes a video extracts 3D object flow and plans motions for both rigid and deformable objects. ✅ Works across embodiments from a Franka arm to Spot without fine-tuning. A big step toward truly general-purpose robotic intelligence"
X Link 2025-10-10T07:41Z 10.2K followers, [---] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos Robotics and AI Institute Brown University Paper: Project: https://novaflow.lhy.xyz/ https://arxiv.org/abs/2510.08568 https://mp.weixin.qq.com/s/qolvGDUY22luJYzmq07tgw https://novaflow.lhy.xyz/ https://arxiv.org/abs/2510.08568 https://mp.weixin.qq.com/s/qolvGDUY22luJYzmq07tgw https://novaflow.lhy.xyz/ https://arxiv.org/abs/2510.08568 https://mp.weixin.qq.com/s/qolvGDUY22luJYzmq07tgw https://novaflow.lhy.xyz/ https://arxiv.org/abs/2510.08568"
X Link 2025-10-10T07:41Z 10.2K followers, [---] engagements
"Interesting Sakana AI is using LLMs to creating new constructed language like Esperanto Klingon Vulcan Navi Quenya. and Lojban. Theyve built an interactive agentic system for Constructed Languages (ConLangs) where an LLM designs phonology builds grammar generates a lexicon creates orthography and even writes a mini grammar book. Beyond the fun of language creation it probes a deeper question: how much do LLMs truly understand about the structure of language itself IASC: Interactive Agentic System for ConLangs Notre Dame University Sakana AI Paper: Code: https://github.com/SakanaAI/IASC"
X Link 2025-10-11T01:02Z 10.2K followers, [----] engagements
"How can robots teach themselves to master dexterous manipulation DexFlyWheel is here(NeurIPS [----] Spotlight) It's a self-improving data generation framework that builds ever-more diverse and capable manipulation datasets through iterative learning cycles. Each cycle blends: 🔹 Imitation Learning for human-like skills 🔹 Residual Reinforcement Learning for generalization 🔹 Rollout & Augmentation for scaling diversity across environments This closed-loop flywheel continuously enriches the data enabling scalable high-quality training without constant human input. 📊 Results: - 2000+ diverse"
X Link 2025-10-11T04:00Z 10.2K followers, [---] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation Harbin Institute of Technology PsiBot and others Paper: Project: https://DexFlyWheel.github.io https://arxiv.org/abs/2509.23829 https://mp.weixin.qq.com/s/LDokC_fXorPEbX37he-Esg https://arxiv.org/abs/2509.23829 https://mp.weixin.qq.com/s/LDokC_fXorPEbX37he-Esg https://DexFlyWheel.github.io https://arxiv.org/abs/2509.23829 https://mp.weixin.qq.com/s/LDokC_fXorPEbX37he-Esg https://arxiv.org/abs/2509.23829 https://mp.weixin.qq.com/s/LDokC_fXorPEbX37he-Esg"
X Link 2025-10-11T04:00Z 10.2K followers, [---] engagements
"Can a vision-language model teach itself to reasonwithout any human labels 👀 Meet Vision-Zero a new framework that lets VLMs improve through competitive visual games instead of costly datasets. Heres how it works: - Strategic Self-Play: models play Whos the Spy-style games generating their own training data. - Any Images Any Domain: from synthetic scenes to real-world photos Vision-Zero builds reasoning through play. - Iterative-SPO: a new loop that alternates self-play with RL sustaining long-term gains. The result Label-free state-of-the-art reasoning outperforming even annotation-heavy"
X Link 2025-10-13T06:34Z 10.3K followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play Duke University National University of Singapore University of Maryl and Adobe Models: Project: Paper: https://arxiv.org/abs/2509.25541 https://huggingface.co/papers/2509.25541 https://github.com/wangqinsi1/Vision-Zero https://mp.weixin.qq.com/s/TRkETaG2y1gzcdbE-eHNxA https://github.com/wangqinsi1/Vision-Zero https://mp.weixin.qq.com/s/TRkETaG2y1gzcdbE-eHNxA https://arxiv.org/abs/2509.25541 https://huggingface.co/papers/2509.25541 https://github.com/wangqinsi1/Vision-Zero"
X Link 2025-10-13T06:34Z 10.3K followers, [---] engagements
"Can AI-generated 3D models understand physics Meet PhysX-3D a new paradigm bringing physical grounding to 3D generation. While most models focus on geometry and texture PhysX-3D teaches AI to model how objects behave in the real world. It introduces two key components: - PhysXNet the first physics-annotated 3D dataset covering [--] dimensions: scale material affordance kinematics and function. - PhysXGen a physics-aware image-to-3D generator that links structure and physical properties through a dual-branch architecture. The result: 3D assets that look real and act real paving the way for"
X Link 2025-10-14T06:41Z 10.3K followers, [----] engagements
"Struggling to deploy massive Mixture-of-Experts (MoE) models without system instability EaaS is a novel serving system that makes MoE deployment efficient scalable and robust. It works by disaggregating MoE modules into independent stateless microservices. This clever design enables fine-grained resource scaling and provides inherent fault tolerance. The system is powered by a high-performance CPU-free communication library to ensure minimal overhead. The outcome is a system that saves up to 37.5% of computing resources by adapting to traffic and suffers less than a 2% throughput reduction"
X Link 2025-10-15T02:28Z 10.3K followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: Expert-as-a-Service: Towards Efficient Scalable and Robust Large-scale MoE Serving National University of Singapore Shanghai Qiji Zhifeng Co. Ltd. and others Paper: https://arxiv.org/abs/2509.17863v1 https://mp.weixin.qq.com/s/7uQSGe6htpQnv881Ayb2YQ https://arxiv.org/abs/2509.17863v1 https://mp.weixin.qq.com/s/7uQSGe6htpQnv881Ayb2YQ https://arxiv.org/abs/2509.17863v1 https://mp.weixin.qq.com/s/7uQSGe6htpQnv881Ayb2YQ https://arxiv.org/abs/2509.17863v1 https://mp.weixin.qq.com/s/7uQSGe6htpQnv881Ayb2YQ"
X Link 2025-10-15T02:28Z 10.3K followers, [---] engagements
"Is building a state-of-the-art Large Multimodal Model (LMM) from scratch prohibitively expensive LLaVA-OneVision-1.5 says no. It's a family of open efficient and reproducible LMMs that deliver top-tier performance on a budget. The team developed a complete end-to-end framework including massive curated datasets (85M for pre-training 22M for instruction tuning) enabling training for under $16000. The results are stunning: - The 8B model outperforms Qwen2.5-VL-7B on [--] of [--] benchmarks. - The 4B model surpasses Qwen2.5-VL-3B on all [--] benchmarks. This work democratizes access to building"
X Link 2025-10-15T08:30Z 10.3K followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training LLaVA-OneVision Community Contributors Code: Paper: Model & data: Demo: https://huggingface.co/spaces/lmms-lab/LLaVA-OneVision-1.5 https://huggingface.co/collections/lmms-lab/llava-onevision-15-68d385fe73b50bd22de23713 https://arxiv.org/abs/2509.23661 https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5 https://mp.weixin.qq.com/s/t0oflHZOVU_73zzq2PCLSw https://arxiv.org/abs/2509.23661 https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5"
X Link 2025-10-15T08:30Z 10.3K followers, [---] engagements
"Say goodbye to GRPOGVPO is here GVPO (Group Variance Policy Optimization) proposed by a NeurIPS [----] paper from HKUST(GZ) and Zuoyebang is a new algorithm that tackles the instability plaguing advanced post-training methods like GRPO. GVPO introduces an analytical solu tion to the KL-constrained reward maximization problem and bakes it directly into its gradient weights aligning every update with the true optimal policy. Why it matters: - Stable by design guarantees a unique optimal solution - Flexible sampling no on-policy or importance sampling constraints - Physically intuitive the"
X Link 2025-10-16T03:53Z 10.4K followers, 12.6K engagements
"How can we make generalist robot hands both dexterous and affordable RAPID Hand is a co-designed hardware & software platform with: - 20-DoF compact robotic hand - Wrist vision + fingertip tactile + proprioception (sub-7 ms latency) - High-DoF teleoperation with stable retargeting Trained diffusion policies show state-of-the-art performance proving RAPID Hand enables high-quality low-cost data collection for multi-fingered manipulation"
X Link 2025-10-17T03:12Z 10.4K followers, [---] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: RAPID Hand: A Robust Affordable Perception-Integrated Dexterous Manipulation Platform for Generalist Robot Autonomy Sun Yat-sen University University of California Merced CASIA Paper: Project: Code: https://github.com/SYSU-RoboticsLab/RAPID-Hand https://rapid-hand.github.io/ https://www.arxiv.org/abs/2506.07490 https://mp.weixin.qq.com/s/x-pov_ppBKXwQv_3NmMquw https://mp.weixin.qq.com/s/x-pov_ppBKXwQv_3NmMquw https://github.com/SYSU-RoboticsLab/RAPID-Hand https://rapid-hand.github.io/ https://www.arxiv.org/abs/2506.07490"
X Link 2025-10-17T03:12Z 10.4K followers, [---] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: RiskPO: Risk-based Policy Optimization via Verifiable Reward for LLM Post-Training Peking University Paper: Code: https://github.com/RTkenny/RiskPO https://arxiv.org/abs/2510.00911v1 https://mp.weixin.qq.com/s/9TbUIT6ed_wOviVU0GuLqg https://github.com/RTkenny/RiskPO https://arxiv.org/abs/2510.00911v1 https://mp.weixin.qq.com/s/9TbUIT6ed_wOviVU0GuLqg https://github.com/RTkenny/RiskPO https://arxiv.org/abs/2510.00911v1 https://mp.weixin.qq.com/s/9TbUIT6ed_wOviVU0GuLqg https://github.com/RTkenny/RiskPO https://arxiv.org/abs/2510.00911v1"
X Link 2025-10-17T03:57Z 10.4K followers, [---] engagements
"How can we make text-to-speech systems speak the worlds dialects Tsinghua and Giant Network build DiaMoE-TTS a unified IPA-based framework that brings scalable and expressive dialect TTS to life. 🎯 Key innovations: - Standardizes phonetic representations to resolve orthography & pronunciation ambiguity - Uses a dialect-aware Mixture-of-Experts to model phonological variation - Adapts fast to new dialects via LoRA and Conditioning Adapters Results: natural expressive speech even zero-shot synthesis on unseen dialects and niche domains like Peking Opera with just a few hours of data"
X Link 2025-10-17T08:28Z 10.4K followers, [----] engagements
"📬 #PapersAccepted by Jiqizhixin Our report: DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation Tsinghua and Giant Network Paper: Code: Checkpoint: Dataset: https://huggingface.co/datasets/RICHARD12369/DiaMoE-TTS_IPA_Trainingset https://huggingface.co/RICHARD12369/DiaMoE_TTS https://github.com/GiantAILab/DiaMoE-TTS https://www.arxiv.org/abs/2509.22727 https://mp.weixin.qq.com/s/RIECHcBUYZxkYR08CtuCAw https://www.arxiv.org/abs/2509.22727 https://mp.weixin.qq.com/s/RIECHcBUYZxkYR08CtuCAw"
X Link 2025-10-17T08:28Z 10.4K followers, [---] engagements
"From Amazon to Alibaba: The Quest of Computer Vision Scientist XiaofengRen http://syncedreview.com/2017/09/04/from-amazon-to-alibaba-the-quest-of-computer-vision-scientist-xiaofeng-ren/ http://syncedreview.com/2017/09/04/from-amazon-to-alibaba-the-quest-of-computer-vision-scientist-xiaofeng-ren/"
X Link 2017-09-04T22:37Z [----] followers, [--] engagements
"📬 #PapersAccepted by Jiqizhixin is a tweet series where we share cutting-edge research contributed to our official Chinese platform. Reach millions of Chinese readers with your work for free. Submissions are welcome"
X Link 2025-08-06T00:08Z 15.9K followers, 21.4K engagements
"Happy Year of the Horse 🐎 In Chinese there is an idiom meaning success comes the moment the horse arrives. Wishing every researcher their own well-earned breakthroughs. We will also keep sharing more exciting research and AI stories from China"
X Link 2026-02-17T02:00Z 15.9K followers, [---] engagements
"Ever wonder how AI agents actually behave when they go on a deep-dive search for you Researchers from Carnegie Mellon University University of Lisbon and NOVA University Lisbon analyzed [--] million real-world search requests to find out. They mapped the secret life of AI search agents by studying how they refine their queries and use evidence across millions of sessions. By tracking how new search terms are born from old results they identified the specific patterns that separate simple fact-finding from complex reasoning. The data shows agents are fast usually wrapping up in under ten steps"
X Link 2026-02-16T04:19Z 15.9K followers, [----] engagements
"Could LLMs finally end the nightmare of manual data cleaning Researchers from SJTU Tsinghua Microsoft Research MIT and Alibaba present a comprehensive survey on the future of application-ready data preparation. They detail a massive paradigm shift from rigid rule-based code to smart agentic workflows that use LLMs to understand and organize messy datasets through natural language and context. This new approach outperforms traditional pipelines in flexibility and semantic reasoning across critical domains like data cleaning entity matching and automated dataset enrichment. Can LLMs Clean Up"
X Link 2026-02-15T19:16Z 15.9K followers, [----] engagements
"Can we make AI reasoning both smarter and [--] percent cheaper at the same time Researchers from UIUC and Amazon Web Services just introduced SAR to solve the problem of long-winded and expensive AI models. Instead of just checking if an answer is right or wrong Self-Aligned Reward (SAR) uses a new scoring system that rewards concise specific logic. It measures how much value an answer provides relative to the question teaching the model to eliminate filler text and focus on the actual solution. The results are a win-win: SAR boosts accuracy by [--] percent across [--] major benchmarks while slashing"
X Link 2026-02-15T04:13Z 15.9K followers, [----] engagements
"Can video generation AI actually understand the physical world or is it just a digital illusion Researchers from HKUST (GZ) Tongji University and Kuaishou Technology present a new mechanistic framework to bridge the gap between video generation and true world models They break down video AI into two pillars: state construction which builds an internal memory of the scene and dynamics modeling which handles how objects move and interact over time. This approach moves evaluation beyond just looking good to mastering physical persistence and causal reasoning outperforming traditional visual-only"
X Link 2026-02-14T18:53Z 15.9K followers, [----] engagements
"How can we train humanoid robots to handle new environments safely and instantly Researchers from BIGAI and Xidian University present a breakthrough in humanoid control By combining large-scale pretraining with a physics-smart world model the robot can safely practice new tasks in its mind before trying them for real. This approach enables successful zero-shot deployment on real hardware and far exceeds traditional methods in adapting to unpredictable out-of-distribution tasks. Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control Paper: Paper:"
X Link 2026-02-14T04:03Z 15.9K followers, [----] engagements
"Can we trust AI agents to interact with the world safely without a clear way to diagnose their mistakes Shanghai Artificial Intelligence Laboratory presents AgentDoG It is a new diagnostic guardrail framework that monitors AI agents in real-time. Instead of just blocking risky moves with a simple yes or no it uses a specialized three-part system to explain the root cause of a danger and catch subtle "hidden" errors that other models miss. AgentDoG achieves state-of-the-art performance in safety moderation across complex interactive scenarios outperforming current guardrail models in both"
X Link 2026-02-13T16:27Z 15.9K followers, [---] engagements
"Can diffusion models finally outperform traditional AI at writing code Researchers from Huazhong University of Science and Technology and ByteDance Seed just introduced Stable-DiffCoder. Instead of writing code one token at a time like standard models this method uses a block diffusion approach to generate and refine code chunks simultaneously resulting in more stable and structured programming. The results show it outperforms its autoregressive counterparts and various 8B-parameter models on major benchmarks specifically excelling in code editing logical reasoning and low-resource"
X Link 2026-02-13T03:23Z 15.9K followers, [----] engagements
"What if standard Reinforcement Learning isn't actually training your models to find the most likely correct answers Researchers from CMU Tsinghua Zhejiang and UC Berkeley introduce MaxRL to fix this fundamental limitation. MaxRL is a new framework that bridges the gap between standard RL and exact maximum likelihood. By using a sampling-based approach that scales with available compute it more directly optimizes for the correct outcome rather than settling for a rough approximation. The results are massive: MaxRL Pareto-dominates existing methods delivering up to 20x better test-time scaling"
X Link 2026-02-12T17:21Z 15.9K followers, 11.3K engagements
"OpenClaw is cool but too large. Researchers at the University of Hong Kong (HKUDS) just released nanobot to solve this exact problem. They transformed the massive OpenClaw system into a clean 4,000-line Python framework that focuses on a simple loop: receive input, let the AI think, and execute tools like file management or web searches. It strips away complex abstractions to focus on clear, modular function calls that any developer can understand. By slashing code complexity by [--] percent, they achieved full functional parity with a 2-minute deployment time, making it significantly easier to"
X Link 2026-02-12T17:13Z 15.9K followers, [----] engagements
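The loop the post describes (receive input, let the AI think, execute a tool) can be sketched in a few lines. This is a generic toy, not nanobot's actual code; the tool registry and the stub `think` function are hypothetical stand-ins for the LLM call.

```python
from typing import Callable

# Tool registry: clear, modular function calls the agent can dispatch to.
TOOLS: dict[str, Callable[[str], str]] = {
    "echo": lambda arg: arg,
    "upper": lambda arg: arg.upper(),
}

def think(user_input: str) -> tuple[str, str]:
    # Stand-in for the LLM: decide which tool to call and with what argument.
    if user_input.startswith("shout "):
        return "upper", user_input[len("shout "):]
    return "echo", user_input

def agent_step(user_input: str) -> str:
    tool, arg = think(user_input)  # 1. receive input, let the model think
    return TOOLS[tool](arg)        # 2. act via a modular function call

print(agent_step("shout hello"))  # → HELLO
```

A real loop would wrap `agent_step` in a while-loop and feed tool results back into the model's context; the point is that the whole control flow fits in one readable function.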
"What if your AI could generate high-quality code at nearly [---] tokens per second? Ant Group, in collaboration with researchers from Zhejiang University and Westlake University, presents LLaDA2.1. Instead of just filling in blanks, this model uses a smart token-editing system that refines text as it goes. This allows you to switch between a Speedy Mode for raw velocity and a Quality Mode for high-precision tasks, all powered by a new reinforcement learning framework designed specifically for diffusion models. The results are massive: the 100B model hits a record-breaking [---] tokens per second on"
X Link 2026-02-12T08:46Z 15.9K followers, [----] engagements
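The speed/quality trade-off can be sketched as a two-phase decoder: a fast draft pass emits the whole sequence at once, and an optional edit pass revisits low-confidence tokens. This is a toy illustration of the general draft-then-edit idea, not LLaDA2.1's mechanism; the target string and the error pattern are hypothetical.

```python
TARGET = list("print('hi')")  # stand-in for what a perfect model would emit

def draft() -> list[str]:
    # Hypothetical fast draft: every third token comes out wrong ('?').
    return [("?" if i % 3 == 2 else t) for i, t in enumerate(TARGET)]

def edit_pass(seq: list[str]) -> list[str]:
    # Re-predict only the tokens the draft was unsure about.
    return [TARGET[i] if t == "?" else t for i, t in enumerate(seq)]

def decode(mode: str) -> str:
    seq = draft()
    if mode == "quality":   # Quality Mode pays for extra edit passes
        seq = edit_pass(seq)
    return "".join(seq)     # Speedy Mode returns the raw draft

print(decode("speedy"))   # fast, still contains '?' placeholders
print(decode("quality"))  # → print('hi')
```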
"LLaDA2.1: Speeding Up Text Diffusion via Token Editing. Paper: https://huggingface.co/papers/2602.08676 Hugging Face: https://huggingface.co/collections/inclusionAI/llada21 ModelScope: https://modelscope.cn/collections/inclusionAI/LLaDA21 GitHub: Tech Report: https://github.com/inclusionAI/LLaDA2.X/blob/main/llada2_1_tech_report.pdf Our report: https://mp.weixin.qq.com/s/XEG5MQMHaOXO-IRY6O09Vg"
X Link 2026-02-12T08:46Z 15.9K followers, [---] engagements
"Does being a math genius make an AI better at understanding human intentions? Researchers from Arizona State University and Microsoft Research Asia investigated whether the step-by-step logic used for coding helps AI master Theory of Mind, the ability to sense what others are thinking and feeling. The results show that more thinking time can actually cause social reasoning to collapse, with advanced reasoning models often being outperformed by simpler ones. Unlike in math or code, these models frequently rely on answer-matching shortcuts rather than true deduction, proving that social intelligence"
X Link 2026-02-12T07:47Z 15.9K followers, [---] engagements
"To Think or Not To Think, That is The Question for Large Reasoning Models in Theory of Mind Tasks. Paper: https://arxiv.org/abs/2602.10625"
X Link 2026-02-12T07:47Z 15.9K followers, [---] engagements
"Can an AI learn to choose the best way to think based on the specific image it sees? Researchers from Fudan University and Alibaba introduced MoVT to do exactly that. Mixture-of-Visual-Thoughts (MoVT) unifies different reasoning styles into one model and uses a new reinforcement learning framework called AdaVaR to teach the AI how to pick the right logic for any given context. The method achieves consistent performance gains across diverse scenarios, outperforming traditional models that rely on a single reasoning mode. Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode"
X Link 2026-02-12T03:17Z 15.9K followers, [----] engagements
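The idea of learning which reasoning mode pays off in which context can be sketched as a simple bandit-style learner: keep a per-context score for each mode and gradually prefer the mode that earns the most reward there. This is only a conceptual toy, not AdaVaR; the mode names, contexts, and reward function are all hypothetical.

```python
import random

MODES = ["direct", "step_by_step", "visual_grounding"]

def learn_mode_picker(reward_fn, contexts, episodes=2000, eps=0.1):
    # Running-average score per (context, mode) pair, epsilon-greedy choice.
    scores = {(c, m): 0.0 for c in contexts for m in MODES}
    counts = {(c, m): 0 for c in contexts for m in MODES}
    for _ in range(episodes):
        c = random.choice(contexts)
        m = (random.choice(MODES) if random.random() < eps
             else max(MODES, key=lambda mm: scores[(c, mm)]))
        r = reward_fn(c, m)
        counts[(c, m)] += 1
        scores[(c, m)] += (r - scores[(c, m)]) / counts[(c, m)]  # incremental mean
    return {c: max(MODES, key=lambda mm: scores[(c, mm)]) for c in contexts}

# Hypothetical environment: charts reward visual grounding, math rewards steps.
def reward(context, mode):
    best = {"chart": "visual_grounding", "math": "step_by_step"}[context]
    return 1.0 if mode == best else 0.2

random.seed(0)
print(learn_mode_picker(reward, ["chart", "math"]))
```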
"Can one AI model truly master both understanding and generating images without sacrificing performance? Researchers from Meituan Inc. just introduced STAR to solve this exact problem. The method works by stacking new learning layers on top of a frozen base model, like building blocks. By separating tasks into stages, such as understanding and editing, the model gains new skills without forgetting old ones or getting confused between different functions. The results are impressive, setting new state-of-the-art records on the GenEval and DPG-Bench benchmarks. STAR proves that this modular approach"
X Link 2026-02-12T03:06Z 15.9K followers, [----] engagements
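The structural point, a frozen base that is never updated plus independently stacked per-stage heads, can be sketched with scalar "layers". This is a shape-only toy, not STAR's architecture; the stage names and scalar weights are hypothetical.

```python
def frozen_base(x: float) -> float:
    # Pretrained base; its weights are fixed and never touched by new stages.
    return 2.0 * x

class Stage:
    def __init__(self, name: str, scale: float):
        self.name, self.scale = name, scale  # trainable per-stage parameters
    def __call__(self, h: float) -> float:
        return h * self.scale

class StackedModel:
    def __init__(self):
        self.stages: dict[str, Stage] = {}
    def add_stage(self, stage: Stage):
        # New skills are added as separate blocks, never merged into
        # existing weights, so earlier stages cannot be overwritten.
        self.stages[stage.name] = stage
    def run(self, task: str, x: float) -> float:
        h = frozen_base(x)
        return self.stages[task](h)

m = StackedModel()
m.add_stage(Stage("understand", 0.5))
m.add_stage(Stage("edit", 3.0))
print(m.run("understand", 4.0))  # → 4.0
print(m.run("edit", 4.0))        # → 24.0
```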
"Ever wondered why fish swim in short bursts instead of one steady motion? Enter ZBot, a bio-inspired robot that mimics larval zebrafish, using brain-like neural networks to test burst-and-glide swimming against moving at a constant speed. The results show that swimming in bursts is significantly more energy-efficient than continuous movement. It turns out that moving in cycles does more than just reduce water drag; it allows the robot's motors to operate at their peak power efficiency across almost all speeds. Energy efficiency and neural control of continuous versus intermittent swimming"
X Link 2026-02-11T17:50Z 15.9K followers, [---] engagements
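The motor-efficiency argument can be illustrated with a back-of-the-envelope model: if the motor's efficiency peaks at one operating power, short bursts at that peak (followed by gliding) can cost less battery than running continuously at a low, off-peak power, even for the same mechanical work. The efficiency curve and numbers below are hypothetical, not ZBot's measured data.

```python
def motor_efficiency(power: float, peak_power: float = 10.0) -> float:
    # Hypothetical efficiency curve: maximal (0.8) at peak_power,
    # falling off on either side, floored at 0.05.
    ratio = power / peak_power
    return max(0.05, 0.8 * min(ratio, 1.0 / ratio))

def battery_energy(mech_work: float, power: float) -> float:
    # Battery energy = mechanical work / efficiency at the operating power.
    return mech_work / motor_efficiency(power)

work = 100.0  # mechanical work to cover the course (same for both gaits)
continuous = battery_energy(work, power=2.0)   # steady low power, off-peak
burst      = battery_energy(work, power=10.0)  # bursts at peak efficiency
print(f"continuous: {continuous:.0f} J, burst-and-glide: {burst:.0f} J")
```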
"🚀 Meet RLinf-USER: A Unified and Extensible System for Real-World Online Robot Training. Tired of the "Sim-to-Real" gap holding back embodied AI? RLinf-USER is here to change the game. This unified system treats robots as first-class compute resources (just like GPUs), enabling seamless cloud-edge-terminal collaboration. By implementing a fully asynchronous pipeline, RLinf-USER eliminates idle time, boosting training throughput by 5.7x. It allows robots to learn continuously in the physical world without waiting on computation bottlenecks. 📈 Results: RLinf-USER has successfully fine-tuned a"
X Link 2026-02-11T11:29Z 15.9K followers, [----] engagements
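The throughput claim has a simple intuition: a synchronous loop pays environment time plus training time per step, while a fully asynchronous pipeline overlaps them and pays only the larger of the two. The timings below are made-up illustrations, not RLinf-USER's measurements.

```python
def sync_time(steps: int, t_env: float, t_train: float) -> float:
    # Synchronous: collect an episode, then train, strictly alternating.
    return steps * (t_env + t_train)

def async_time(steps: int, t_env: float, t_train: float) -> float:
    # Asynchronous: collection and training overlap; steady-state cost is
    # max(t_env, t_train) per step, plus a one-time pipeline-fill term.
    return steps * max(t_env, t_train) + min(t_env, t_train)

t_env, t_train = 1.0, 0.9  # hypothetical per-step costs
speedup = sync_time(100, t_env, t_train) / async_time(100, t_env, t_train)
print(f"pipeline speedup ~ {speedup:.2f}x")
```

When the two phases are balanced, the ideal overlap speedup approaches 2x; larger reported gains (like 5.7x) come from overlapping more than two phases and removing other idle time.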
"Our report: https://mp.weixin.qq.com/s/4iPmPYghEzbWZeyO9jlD5w"
X Link 2026-02-11T11:30Z 15.9K followers, [---] engagements
"Have you ever wondered why LLMs funnel so much attention toward the very first token in a prompt? Researchers from the University of Oxford, AITHYRA, and NYU, including Yann LeCun, just revealed that this phenomenon is actually a feature, not a bug. The team discovered that attention sinks and data compression are two sides of the same coin, both triggered by massive spikes in the model's internal signals. They propose a new Mix-Compress-Refine theory: LLMs mix data early on, squash it into a dense core in the middle layers, and then refine it for the final output. This framework finally explains why"
X Link 2026-02-11T06:13Z 15.9K followers, [----] engagements
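The sink mechanism itself is easy to demonstrate: because attention weights come from a softmax, one position whose logit spikes absorbs almost all of the attention mass. The logit values below are illustrative, not taken from any real model.

```python
import math

def softmax(xs: list[float]) -> list[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

uniform_logits = [1.0, 1.0, 1.0, 1.0]
spiked_logits  = [6.0, 1.0, 1.0, 1.0]  # first-token logit spikes

print(f"no spike: first token gets {softmax(uniform_logits)[0]:.2f}")  # 0.25
print(f"spike:    first token gets {softmax(spiked_logits)[0]:.2f}")   # ~0.98
```

A logit gap of just 5 is enough to move the first token from a 25% share to ~98%, which is why a spike in the model's internal signals and an attention sink appear together.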
"Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin. Paper: https://arxiv.org/abs/2510.06477"
X Link 2026-02-11T06:13Z 15.9K followers, [---] engagements