@HuggingPapers (DailyPapers) posts on X most often about inference, agentic, ai, and paper. The account currently has [------] followers and [---] posts still getting attention, totaling [------] engagements in the last [--] hours.
Social category influence: technology brands 25.93%, stocks 10.19%, finance 1.85%, cryptocurrencies 0.93%, travel destinations 0.93%
Social topic influence: inference #154, agentic #100, ai 8.33%, paper #2101, llm #242, alibaba 4.63%, microsoft #1185, strong 4.63%, bytedance #73, math #2306
Top accounts mentioned or mentioned by: @codewithimanshu, @huggingface, @kimimoonshot, @googledeepmind, @barrakali, @calebfahlgren, @vsouthvpawv, @alexwingfield_, @elangovankamesh, @jrggllf
Top assets mentioned: Microsoft Corp. (MSFT), Alphabet Inc Class A (GOOGL)
Top posts by engagements in the last [--] hours
"IGGT: a unified transformer for semantic 3D reconstruction IGGT is an end-to-end unified transformer that marries geometry with instance-level semantics. It achieves SOTA 3D reconstruction & understanding from 2D images powering spatial tracking & open-vocabulary segmentation"
X Link 2025-11-02T01:11Z 13.2K followers, [----] engagements
"$A3$-Bench: a new benchmark that evaluates memory-driven mechanisms in scientific reasoning. It measures how models activate "anchors" (core formulas) and "attractors" (schemas/examples) during inference, going beyond just checking final answers"
X Link 2026-01-16T00:21Z 13.2K followers, [----] engagements
"Alibaba tackles urban socio-semantic segmentation SocioReasoner uses vision-language reasoning + RL to identify socially-defined entities like schools and parks from satellite imagery. The SocioSeg dataset provides hierarchical pixel-level labels for this challenging task"
X Link 2026-01-16T12:10Z 13.2K followers, [----] engagements
"DanQing: 100M Chinese Image-Text Pairs. A new vision-language dataset built from 2024-2025 web data. Curated through a rigorous pipeline that filters 90% of raw data for superior quality. Outperforms existing Chinese VLP datasets across zero-shot classification, retrieval, and LMM tasks. https://twitter.com/i/web/status/2012920406420685049"
X Link 2026-01-18T16:09Z 13.2K followers, [----] engagements
"BayesianVLA A novel framework solving the "vision shortcut" problem in vision-language-action models via Bayesian decomposition with latent action queries forcing robots to actually pay attention to language instructions instead of ignoring them and acting on vision alone"
X Link 2026-01-23T12:09Z 13.3K followers, 13.7K engagements
"EvoCUA becomes #1 open-source Computer Use Agent Meituan's evolutionary framework achieves 56.7% on OSWorld by generating synthetic tasks running them in sandboxes and learning from failures"
X Link 2026-01-23T16:10Z 13.4K followers, [----] engagements
"ACoT-VLA A novel framework introducing Action Chain-of-Thought for Vision-Language-Action models decoupling action reasoning from generation using explicit and implicit action reasoners to achieve state-of-the-art results on LIBERO benchmarks"
X Link 2026-01-25T20:08Z 13.2K followers, [----] engagements
"Scientific Image Synthesis: ImgCoder & SciGenBench. A systematic study on generating scientifically rigorous images. Introduces ImgCoder, a logic-driven 'understand plan code' framework, and SciGenBench, a 1.4K benchmark spanning [--] domains. Synthetic data improves LMM reasoning. https://twitter.com/i/web/status/2016123491066503568"
X Link 2026-01-27T12:17Z 13.3K followers, [----] engagements
"LingBot-VLA: a pragmatic Vision-Language-Action foundation model trained on [-----] hours of real-world robot data across [--] dual-arm configurations. Delivers state-of-the-art performance with a 1.5-2.8x training speedup over existing codebases"
X Link 2026-01-28T12:10Z 13.2K followers, [----] engagements
"ByteDance Seed unlocks visual reasoning. From a world-model perspective, we study when visual generation beats verbal reasoning. Our visual superiority hypothesis shows unified multimodal models excel at physical world tasks. The new VisWorld-Eval benchmarks [--] tasks requiring interleaved visual-verbal CoT. https://twitter.com/i/web/status/2016544964730364305"
X Link 2026-01-28T16:12Z 13.2K followers, [----] engagements
"Tencent just released HunyuanImage [---] Instruct on Hugging Face The largest open-source image generation MoE model with 80B parameters. Features native multimodal architecture Chain-of-Thought reasoning and distilled 8-step generation"
X Link 2026-01-28T22:07Z 13.2K followers, [----] engagements
"Youtu-VL from Tencent: a 4B parameter VLM that matches Qwen3-VL-8B on visual tasks despite being half the size. Uses the novel VLUAS paradigm to treat vision as a target, not just an input, enabling unified supervision without task-specific modules"
X Link 2026-01-29T00:22Z 13.2K followers, [---] engagements
"LingBot-World from Ant Group An open-source world simulator from video generation with real-time interactivity. Maintains high fidelity across diverse environments with minute-level consistency and 1s latency at [--] FPS"
X Link 2026-01-29T08:14Z 13.3K followers, [----] engagements
"DynamicVLA A compact 0.4B Vision-Language-Action model that finally lets robots manipulate moving objects in real-time closing the perception-execution gap with Continuous Inference and Latent-aware Action Streaming"
X Link 2026-01-30T04:36Z 13.3K followers, 16.2K engagements
""Everything in Its Place" - Alibaba's new benchmark for spatial intelligence in T2I models. SpatialGenEval evaluates [--] models with [----] dense prompts across [--] scenes & [-----] questions. Higher-order spatial reasoning is the key bottleneck. SpatialT2I dataset (15400 pairs) also released. https://twitter.com/i/web/status/2017149207757439171"
X Link 2026-01-30T08:13Z 13.3K followers, [----] engagements
"Embedding scaling + speculative decoding delivers real inference speedups, establishing a new Pareto frontier beyond MoE architectures. Paper: https://huggingface.co/papers/2601.21204 Model: https://huggingface.co/meituan-longcat/LongCat-Flash-Lite"
X Link 2026-01-30T16:13Z 13.3K followers, [---] engagements
"Self-Refining Video Sampling Repurposes pre-trained video generators as denoising autoencoders to iteratively refine latents at inference time. Improves motion coherence and physics alignment with just 50% more compute. No external verifiers training or datasets required"
X Link 2026-01-31T04:39Z 13.3K followers, [----] engagements
"DeepPlanning from Alibaba Qwen A new benchmark for long-horizon agentic planning that tests real-world constraints like budgets and schedules. Travel and shopping tasks show even frontier LLMs struggle with genuine planning"
X Link 2026-01-31T08:15Z 13.3K followers, [----] engagements
"Reinforcement Learning via Self-Distillation SDPO converts rich textual feedback into dense learning signals without external teachers achieving [--] faster training and higher accuracy on code math and scientific reasoning"
X Link 2026-01-31T12:13Z 13.4K followers, [----] engagements
"ConceptMoE ByteDance's new paradigm shifts LLMs from token-level to adaptive concept-level processing. Dynamically merges similar tokens to allocate compute intelligently delivering 175% prefill speedups and +5.5 performance gains"
X Link 2026-01-31T16:12Z 13.3K followers, [----] engagements
"Reduces attention by [--] and KV cache by [--] at compression ratio [--], with improvements across language (+0.9), vision-language (+0.6), and long context (+2.3). Paper: https://huggingface.co/papers/2601.21420 Code: https://github.com/ZihaoHuang-notabot/ConceptMoE"
X Link 2026-01-31T16:12Z 13.3K followers, [----] engagements
"Self-Distillation Enables Continual Learning. MIT & ETH Zurich researchers introduce SDFT for on-policy learning from demonstrations. It uses demonstration-conditioned models as their own teacher to reduce catastrophic forgetting, outperforming SFT and enabling sequential skill learning. https://twitter.com/i/web/status/2017757611861446895"
X Link 2026-02-01T00:31Z 13.3K followers, [----] engagements
"Alibaba releases Qwen3-ASR on Hugging Face Two all-in-one ASR models supporting [--] languages with a novel forced-aligner. Achieves SOTA performance competitive with proprietary APIs while transcribing [----] seconds in [--] second at [---] concurrency"
X Link 2026-02-01T04:55Z 13.2K followers, [----] engagements
"LLMs that clean data agents that train themselves and Microsoft's new reasoning model This week's top AI papers on @huggingface (Feb 1-7): - Can LLMs Clean Up Your Mess Survey of data prep with LLMs - AgentFly: Fine-tuning LLM agents without fine-tuning LLMs - LongCat-Flash-Thinking-2601: 560B-parameter MoE reasoning model - rStar2-Agent by Microsoft: Agentic reasoning with code execution - Idea2Story: Automated scientific narrative generation - VibeVoice: 90-minute multi-speaker speech synthesis - daVinci-Dev: Agent-native mid-training for software engineering - Beyond Pass@1: Self-play for"
X Link 2026-02-01T14:09Z 13.3K followers, [----] engagements
"Linear representations shift during conversation. New research shows that LLM representations evolve as you chat. What's 'factual' at the start can flip to 'non-factual' by the end, challenging static interpretability methods"
X Link 2026-02-01T16:11Z 13.4K followers, 12.3K engagements
"PLANING A novel triangle-Gaussian framework for streaming 3D reconstruction that decouples geometry and appearance achieving accurate geometry high-fidelity rendering and efficient planar abstraction for embodied AI"
X Link 2026-02-01T20:11Z 13.2K followers, 11.2K engagements
"ASTRA A fully automated framework for training tool-augmented LLM agents. Synthesizes trajectories and verifiable RL environments end-to-end achieving state-of-the-art performance on BFCL-V3-MT"
X Link 2026-02-02T04:47Z 13.3K followers, [----] engagements
"Golden Goose Nvidia's method to synthesize unlimited RLVR tasks from unverifiable internet text by converting reasoning-rich corpora into multiple-choice questions"
X Link 2026-02-03T00:25Z 13.4K followers, [----] engagements
"PISCES Annotation-free post-training for text-to-video models using Optimal Transport to align text and video embeddings. Dual OT-aligned rewards improve fidelity and prompt faithfulness across short and long video generators"
X Link 2026-02-03T04:38Z 13.4K followers, [----] engagements
"Kimi K2.5: Visual Agentic Intelligence. Moonshot just dropped Kimi K2.5, an open multimodal agentic model that parallelizes complex tasks across specialized sub-agents using Agent Swarm, cutting latency by 4.5x while hitting SOTA across coding, vision, and reasoning"
X Link 2026-02-03T08:14Z 13.4K followers, [----] engagements
"Green-VLA A staged Vision-Language-Action framework for humanoid robots that achieves 69.5% first-item success on ALOHA (vs 35.6% baseline) and 71.8% on SimplerEnv through a five-stage curriculum from foundation models to RL alignment"
X Link 2026-02-03T12:14Z 13.3K followers, [----] engagements
"Vision-DeepResearch: First long-horizon multimodal deep-research MLLM. Multi-turn, multi-entity, multi-scale visual/textual search with dozens of reasoning steps and hundreds of engine interactions. 8B & 30B-A3B models achieve SOTA on [--] benchmarks, outperforming GPT-5, Gemini-2.5-Pro & Claude-4-Sonnet agents. https://twitter.com/i/web/status/2018721455060537519"
X Link 2026-02-03T16:21Z 13.4K followers, [----] engagements
"CodeOCR. Vision language models can read code from images with 8x compression: 110 text tokens become just [--] visual tokens. The code stays recognizable while slashing compute costs"
X Link 2026-02-04T04:36Z 13.3K followers, [----] engagements
"NVIDIA just released GR00T N1.6 DROID on Hugging Face A Vision-Language-Action model for generalist humanoid robots. Achieves SOTA on simulation benchmarks and runs on the Fourier GR-1 robot"
X Link 2026-02-04T07:46Z 13.3K followers, [----] engagements
"Microsoft just released X-Reasoner on Hugging Face A vision-language model trained only on text that outperforms multimodal SOTA on reasoning benchmarks"
X Link 2026-02-04T09:46Z 13.3K followers, [----] engagements
"Quant VideoGen solves the KV-cache memory bottleneck in autoregressive video generation, reducing usage by 7x to fit on consumer GPUs with under 4% latency overhead while improving long-horizon consistency"
X Link 2026-02-05T04:38Z 13.3K followers, [----] engagements
"Achieves this through Semantic Aware Smoothing and Progressive Residual Quantization, establishing a new Pareto frontier on LongCat Video, HY WorldPlay, and Self-Forcing benchmarks. https://huggingface.co/papers/2602.02958"
X Link 2026-02-05T04:38Z 13.3K followers, [---] engagements
"Enables [----] inference speedup with just 18.9% KV cache on AIME24 while maintaining near-oracle accuracy: the first query-aware token eviction framework for long-context LLMs. https://huggingface.co/papers/2602.03152"
X Link 2026-02-05T12:15Z 13.2K followers, [---] engagements
"WideSeek-R1 Explores width scaling with multi-agent RL for broad information seeking. A 4B model matches DeepSeek-R1-671B performance with 170x fewer parameters. Performance scales consistently with more parallel subagents"
X Link 2026-02-05T16:19Z 13.3K followers, [----] engagements
"Only 4.85M parameters for Qwen2.5-Omni-7B, lower latency than training-free baselines, and outperforms all compression baselines on [--] benchmarks. Paper: https://huggingface.co/papers/2602.04804"
X Link 2026-02-06T00:24Z 13.2K followers, [---] engagements
"CAR-bench. Frontier LLMs ace task completion (80%) but crumble under uncertainty: hallucination resistance (48%) and disambiguation (46%) lag far behind. A new benchmark reveals the reliability crisis in real-world agent deployment"
X Link 2026-02-06T16:18Z 13.3K followers, [---] engagements
"Results show even GPT-5 and Claude-Opus barely crack 50% on consistent reliability. The gap between capability and deployment readiness is widening. Paper: https://huggingface.co/papers/2601.22027"
X Link 2026-02-06T16:18Z 13.3K followers, [---] engagements
"Context Forcing A novel framework for consistent long video generation that trains a long-context student via a long-context teacher eliminating the student-teacher mismatch that plagues existing streaming methods"
X Link 2026-02-06T20:10Z 13.3K followers, [----] engagements
"Achieves 20+ seconds of effective context, 2-10x longer than state-of-the-art, using a Slow-Fast Memory architecture that reduces visual redundancy. Paper: https://huggingface.co/papers/2602.06028 Project: https://chenshuo20.github.io/Context_Forcing/"
X Link 2026-02-06T20:10Z 13.3K followers, [---] engagements
"PaperBanana An agentic framework that automates creation of publication-ready academic illustrations. Orchestrates [--] specialized agents to transform scientific content into high-quality diagrams and statistical plots outperforming baselines across all metrics"
X Link 2026-02-07T00:23Z 13.3K followers, [----] engagements
"Google Cloud AI Research and Peking University introduce PaperBananaBench with [---] NeurIPS [----] test cases. Code & dataset releasing soon. Paper: https://huggingface.co/papers/2601.23265 Project: https://dwzhu-pku.github.io/PaperBanana/"
X Link 2026-02-07T00:23Z 13.3K followers, [---] engagements
"UniReason [---] A unified reasoning framework from ByteDance that harmonizes image generation and editing through world knowledge-enhanced planning and self-reflective visual refinement mirroring human planning and refinement"
X Link 2026-02-07T04:35Z 13.3K followers, [----] engagements
"SWE-Universe Scales real-world software engineering environments to 807K+ multilingual instances from GitHub PRs using an agentic framework with iterative self-verification and hacking detection"
X Link 2026-02-07T08:09Z 13.3K followers, [----] engagements
"MemSkill Replaces rigid hand-crafted memory operations with learnable evolvable skills creating a self-improving closed-loop system for LLM agents"
X Link 2026-02-08T00:32Z 13.3K followers, [----] engagements
"HySparse A hybrid sparse attention architecture that interleaves full and sparse attention layers. Full layers serve as an oracle for token selection and KV cache sharing reducing memory by 10x while boosting performance over baselines"
X Link 2026-02-08T12:13Z 13.4K followers, [----] engagements
"On the Entropy Dynamics in Reinforcement Fine-Tuning A theoretical framework analyzing entropy evolution during RL fine-tuning of LLMs. Derives first-order expressions for entropy change extends to GRPO and proposes practical entropy-discriminator clipping methods"
X Link 2026-02-09T04:45Z 13.4K followers, [----] engagements
"Alibaba researchers provide the first principled understanding of entropy dynamics in RFT, revealing why and how to control entropy for stable training. Paper: https://huggingface.co/papers/2602.03392 Framework: https://github.com/agentscope-ai/Trinity-RFT"
X Link 2026-02-09T04:45Z 13.3K followers, [---] engagements
"OdysseyArena: a new benchmark that reveals a critical bottleneck: even frontier LLMs struggle with long-horizon inductive reasoning. Agents must discover hidden rules from experience across four interactive environments, not just follow instructions"
X Link 2026-02-09T08:18Z 13.3K followers, [----] engagements
"Baichuan Inc released Baichuan-M3, a medical LLM that shifts from passive Q&A to active clinical decision support. Models physician workflows with proactive info gathering, long-horizon reasoning, and hallucination suppression, outperforming GPT-5.2 on HealthBench"
X Link 2026-02-09T12:19Z 13.4K followers, [----] engagements
"ReAlign maps text representations into visual distributions using massive unpaired data, decoupling MLLM training from dependence on costly image-text pairs. Paper: https://huggingface.co/papers/2602.07026 Code: https://github.com/Yu-xm/ReVision"
X Link 2026-02-10T04:49Z 13.4K followers, [---] engagements
"NVIDIA just released Earth2Studio assets on Hugging Face A comprehensive collection of AI weather & climate model resources including GraphCast Pangu AIFS & more"
X Link 2026-02-10T08:14Z 13.4K followers, [----] engagements
"Safety is Always Vanishing in Self-Evolving AI Societies. New research proves multi-agent LLM systems cannot simultaneously achieve continuous self-improvement, isolation, and safety invariance. Statistical blind spots from isolated self-evolution irreversibly degrade safety alignment. https://twitter.com/i/web/status/2022169687912563193"
X Link 2026-02-13T04:43Z 13.5K followers, [----] engagements
"Open source under Apache [---], runs on consumer hardware with 100-300 tok/s inference speed. Paper: https://huggingface.co/papers/2602.10604 Model: https://huggingface.co/stepfun-ai/Step-3.5-Flash"
X Link 2026-02-12T04:44Z 13.5K followers, [----] engagements
"Z.ai just released GLM-5 on Hugging Face. 744B parameters with DeepSeek Sparse Attention and a novel async RL framework called slime. Best-in-class open-source performance on reasoning, coding, and agentic tasks. http://Z.ai"
X Link 2026-02-12T21:45Z 13.5K followers, [----] engagements
"Demonstrates first AI-generated proofs in arithmetic geometry and interacting particle systems, plus a new taxonomy for quantifying AI autonomy levels in mathematical research. Paper: https://huggingface.co/papers/2602.10177"
X Link 2026-02-13T00:26Z 13.5K followers, [---] engagements
"DeepGen [---]: a lightweight 5B unified multimodal model that outperforms 80B+ giants like HunyuanImage by 28% on WISE and Qwen-Image-Edit by 37% on UniREditBench, proving scale isn't everything"
X Link 2026-02-13T20:12Z 13.5K followers, [----] engagements
"Qwen just released Qwen3-VL, their most powerful vision-language model, on Hugging Face. It features comprehensive upgrades for visual perception, reasoning, and generation across diverse tasks. https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct-GGUF"
X Link 2025-11-02T04:37Z 13.4K followers, [----] engagements
"Qwen just released Qwen3-ASR on Hugging Face The most capable open-source speech recognition model yet supporting [--] languages & dialects with performance rivaling GPT-4o and Gemini"
X Link 2026-01-29T16:12Z 13.4K followers, [----] engagements
"Features unified streaming/offline inference and novel forced alignment, and handles everything from speech to singing. Find it here: https://huggingface.co/Qwen/Qwen3-ASR-1.7B"
X Link 2026-01-29T16:12Z 13.4K followers, [---] engagements
"Scaling Embeddings Scaling Experts Meituan's LongCat-Flash-Lite rethinks language model sparsity: allocate 30B+ params to embeddings instead of MoE experts. The result A 68.5B param model with only 3B active that beats MoE baselines in agentic & coding tasks"
X Link 2026-01-30T16:13Z 13.4K followers, [----] engagements
"The first annotation-free reward supervision via Optimal Transport, achieving state-of-the-art on VBench for both quality and semantic alignment. Paper: https://huggingface.co/papers/2602.01624"
X Link 2026-02-03T04:38Z 13.4K followers, [---] engagements
"NVIDIA just unleashed their MLPerf-tuned Qwen3-VL on Hugging Face. A 235B parameter vision-language powerhouse with NVFP4 quantization, built for record MLPerf v6.0 inference performance. https://huggingface.co/nvidia/Qwen3-VL-235B-A22B-Instruct-NVFP4-MLPerf-Inference-Closed-V6.0"
X Link 2026-02-03T09:13Z 13.4K followers, 14.1K engagements
"ERNIE [---] Baidu's trillion-parameter natively autoregressive foundation model that unifies multimodal understanding and generation across text image video and audio with ultra-sparse MoE and elastic training"
X Link 2026-02-05T08:15Z 13.4K followers, [----] engagements
"FASA: Frequency-aware Sparse Attention from Alibaba Discovers functional sparsity in RoPE frequency-chunks to dynamically predict token importance achieving nearly 100% full-KV performance on LongBench-V1 using only [---] tokens"
X Link 2026-02-05T12:14Z 13.4K followers, [----] engagements
"OmniSIFT A modality-asymmetric token compression framework for omni-modal LLMs that reduces token context by 75% while maintaining or exceeding full-token model performance. Uses spatio-temporal video pruning and vision-guided audio selection"
X Link 2026-02-06T00:24Z 13.4K followers, [----] engagements
"10 datasets tested: 10% avg improvement, up to 122x tokenization speedup. Dataset: https://huggingface.co/datasets/PIIR/ReSID-dataset Paper: https://huggingface.co/papers/2602.02338"
X Link 2026-02-07T16:15Z 13.4K followers, [---] engagements
"FS-Researcher A file-system-based dual-agent framework that enables test-time scaling for long-horizon research tasks beyond context window limits. The file system serves as durable external memory allowing iterative refinement across agent sessions"
X Link 2026-02-07T20:10Z 13.4K followers, [----] engagements
"This week's top AI research on @huggingface ERNIE [---] by Baidu: natively autoregressive foundation model for unified multimodal understanding and generation across text image video and audio with elastic training Green-VLA: staged vision-language-action framework for generalist robots with [----] hours of demonstrations achieving strong generalization across humanoids and manipulators Kimi K2.5 by Moonshot: visual agentic intelligence with Agent Swarm framework that dynamically parallelizes tasks reducing latency by 4.5x Vision-DeepResearch: multimodal deep-research capability with multi-turn"
X Link 2026-02-08T14:11Z 13.4K followers, [----] engagements
"This week's top AI research on @huggingface - ERNIE [---] by Baidu: natively autoregressive multimodal foundation model - Green-VLA: staged vision-language-action framework for generalist robots with 3k hours of demonstrations - Kimi K2.5 by @Kimi_Moonshot: visual agentic intelligence with Agent Swarm framework that dynamically parallelizes tasks reducing latency by 4.5x - PaperBanana by @GoogleDeepMind: automating academic illustration generation for AI scientists - Vision-DeepResearch: multimodal deep-research capability with multi-turn visual/textual search Read on"
X Link 2026-02-08T21:12Z 13.4K followers, [----] engagements
"MSign: A new optimizer from Microsoft researchers Prevents training instability in LLMs by restoring stable rank via matrix sign operations. Avoids gradient explosions with less than 7.0% computational overhead"
X Link 2026-02-10T00:30Z 13.4K followers, [----] engagements
"Modality Gap-Driven Subspace Alignment A novel training paradigm for MLLMs that tackles the persistent modality gap between vision and language. Introduces ReAlign (Anchor Trace Centroid Alignment) and ReVision for scalable training without expensive image-text pairs"
X Link 2026-02-10T04:49Z 13.4K followers, [----] engagements
"Weak-Driven Learning A novel post-training paradigm where strong models improve by learning from weak agents like historical checkpoints. Achieves performance gains on math and code with zero additional inference cost"
X Link 2026-02-10T16:27Z 13.4K followers, [----] engagements
"Recurrent-Depth VLA replaces token-based reasoning with latent iterative refinement in VLA models, achieving adaptive test-time compute with constant memory. Tasks failing at 0% with a single iteration reach 90% with four iterations, while simpler tasks saturate quickly, up to [--] faster than previous methods. https://twitter.com/i/web/status/2021380976358638053"
X Link 2026-02-11T00:29Z 13.4K followers, [----] engagements
"OPUS A dynamic data selection framework for LLM pre-training that aligns with optimizer geometry (AdamW Muon) to overcome the "data wall". Achieves +2.2% accuracy gains with [--] compute reduction and only 4.7% overhead"
X Link 2026-02-11T04:49Z 13.4K followers, [----] engagements
"UI-Venus-1.5 by Ant Group A unified end-to-end GUI agent achieving state-of-the-art performance across benchmarks with robust real-world navigation for 40+ Chinese mobile apps"
X Link 2026-02-11T20:13Z 13.4K followers, [---] engagements
"Features 10B-token mid-training, online RL, and model merging. Paper: https://huggingface.co/papers/2602.09082 Collection: https://huggingface.co/collections/inclusionAI/ui-venus"
X Link 2026-02-11T20:13Z 13.4K followers, [---] engagements
"Paper: https://huggingface.co/papers/2602.10063 Code: https://github.com/QuantaAlpha/chain-of-mindset"
X Link 2026-02-12T00:27Z 13.4K followers, [---] engagements
"Chain of Mindset A training-free framework that makes LLMs reason like humans by dynamically switching between four cognitive modes at each step. No more one-size-fits-all reasoning"
X Link 2026-02-12T00:27Z 13.5K followers, [----] engagements
"Learning beyond Teacher G-OPD is a generalized on-policy distillation framework that introduces reward extrapolation (ExOPD). By increasing the reward scaling factor beyond [--] students can surpass teacher performance in math reasoning and code generation"
X Link 2026-02-13T08:16Z 13.5K followers, [----] engagements
"Key innovation: Stacked Channel Bridging framework + three-stage training delivers omni-capabilities with just 50M samples. Models & data: https://huggingface.co/deepgenteam/DeepGen-1.0 Paper: https://huggingface.co/papers/2602.12205"
X Link 2026-02-13T20:12Z 13.5K followers, [---] engagements
"VidVec Your MLLM already contains strong video representations. VidVec unlocks them for zero-shot video-text retrieval without any training beating trained models by up to 9.4% recall"
X Link 2026-02-14T00:25Z 13.5K followers, [----] engagements
"InternAgent-1.5 A unified agentic framework for long-horizon autonomous scientific discovery. It coordinates generation verification and evolution subsystems to compress weeks of research into minutes across biology earth science and materials"
X Link 2026-02-14T04:36Z 13.5K followers, [----] engagements
"Microsoft just released the VITRA Teleoperation Dataset on Hugging Face Real-world robot demos with 7-DoF arm dexterous hand & head-mounted camera. Each episode includes synchronized video + state/action data for training vision-language-action models"
X Link 2026-02-14T08:44Z 13.5K followers, [----] engagements
"ByteDance Seed is back with SeedVR2 now on Hugging Face This one-step video restoration model leverages diffusion adversarial post-training for impressive results even on high-resolution videos"
X Link 2025-06-08T17:20Z 13.4K followers, 54.9K engagements
"OCRVerse by Meituan: the first holistic OCR method that unifies text-centric and vision-centric OCR across [--] diverse scenarios, from documents to charts, web pages, and molecules. Uses a novel two-stage SFT-RL training approach to outperform models 10-20x larger"
X Link 2026-01-30T20:08Z 13.4K followers, [----] engagements
"ReSID A recommendation-native tokenizer that rethinks representation learning and semantic quantization for generative recommenders introducing two novel components: FAMAE and GAOQ"
X Link 2026-02-07T16:15Z 13.4K followers, [----] engagements
"Meta releases AIRS-Bench on Hugging Face. A benchmark suite challenging AI agents with [--] tasks from SOTA ML papers across NLP, math, bioinformatics, and more. Tests the full research lifecycle (idea generation, experimentation, iterative refinement) without providing baseline code"
X Link 2026-02-10T20:17Z 13.4K followers, [----] engagements
"Paper: https://huggingface.co/papers/2602.10104 Project: https://showlab.github.io/Olaf-World/"
X Link 2026-02-11T13:02Z 13.4K followers, [----] engagements
"Alibaba's Code2World A GUI world model that predicts next UI states via renderable code generation. It rivals GPT-5 & Gemini-3-Pro-Image on next UI prediction and boosts agent navigation success by +9.5% on AndroidWorld"
X Link 2026-02-11T16:26Z 13.4K followers, [---] engagements
"NVIDIA proposes PhyCritic A multimodal critic that unifies physical judging and reasoning. It uses self-referential evaluation: first generating its own physics-aware prediction as internal reference then judging candidate responses for improved stability"
X Link 2026-02-12T08:16Z 13.4K followers, [---] engagements
"Achieves 12-point gains on physical judgment over open-source baselines, with strong generalization to general multimodal tasks. Self-referential finetuning drives performance. Project: https://research.nvidia.com/labs/lpr/phycritic/ Paper: https://huggingface.co/papers/2602.11124"
X Link 2026-02-12T08:16Z 13.4K followers, [---] engagements
"GRU-Mem by ByteDance Seed A gated recurrent memory framework for long-context reasoning. Uses two text-controlled gates: an update gate to prevent memory explosion and an exit gate for early termination. Achieves up to 400% inference speed acceleration via RL training"
X Link 2026-02-12T20:11Z 13.5K followers, [----] engagements
"iFSQ: Improving FSQ with [--] line of code Tencent Hunyuan team discovers the sweet spot for image generation is [--] bits. AR converges faster but diffusion achieves higher quality. The fix Replace tanh with 2.0*(1.6x)-1 to map Gaussian latents to uniform distribution and prevent activation collapse. https://twitter.com/i/web/status/2016242900405989428 https://twitter.com/i/web/status/2016242900405989428"
X Link 2026-01-27T20:12Z 13.5K followers, [----] engagements
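The iFSQ post above describes swapping the tanh bounding function for a scaled sigmoid so that Gaussian latents spread more evenly over the quantization range. The sketch below illustrates that idea only; it is not the paper's code. It assumes the intended mapping is 2*sigmoid(1.6x)-1 (the post's formula appears to have lost its function name in extraction), and the names `ifsq_bound`, `bin_fractions`, and the choice of 8 levels are hypothetical.

```python
import math
import random


def tanh_bound(x: float) -> float:
    """Baseline bounding function: squashes x into (-1, 1) with tanh."""
    return math.tanh(x)


def ifsq_bound(x: float) -> float:
    """Assumed iFSQ-style mapping: 2*sigmoid(1.6x) - 1, also in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-1.6 * x)) - 1.0


def bin_fractions(values, levels):
    """Fraction of values landing in each of `levels` equal-width bins on [-1, 1]."""
    counts = [0] * levels
    for v in values:
        idx = min(int((v + 1.0) / 2.0 * levels), levels - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def spread(fractions):
    """Gap between the most-used and least-used bin (0 means perfectly uniform)."""
    return max(fractions) - min(fractions)


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # stand-in Gaussian latents
levels = 8  # hypothetical number of quantization levels per dimension

tanh_usage = bin_fractions([tanh_bound(x) for x in samples], levels)
ifsq_usage = bin_fractions([ifsq_bound(x) for x in samples], levels)

print("tanh bin usage:", [round(f, 3) for f in tanh_usage])
print("iFSQ bin usage:", [round(f, 3) for f in ifsq_usage])
print("tanh spread:", round(spread(tanh_usage), 3))
print("iFSQ spread:", round(spread(ifsq_usage), 3))
```

Under these assumptions the scaled sigmoid gives a flatter bin-usage histogram than tanh for standard-normal inputs, which is consistent with the post's claim about mapping Gaussian latents toward a uniform distribution.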
"Microsoft just released InfoAgent on Hugging Face RE-TRAC is a recursive trajectory compression framework that outperforms ReAct by 15-20% on BrowseComp"
X Link 2026-02-05T20:45Z 13.4K followers, [----] engagements
"QuantaAlpha An evolutionary framework for LLM-driven alpha mining that discovers quantitative factors achieving 27.75% annualized return on CSI [---] with strong transfer to S&P [---] and CSI [---] markets"
X Link 2026-02-10T12:21Z 13.5K followers, [---] engagements
"NVIDIA just dropped a massive kitchen robotics dataset on Hugging Face [---] hours of human-teleoperated demonstrations across [---] real-world tasks. https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-Kitchen-Sim-Demos https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-Kitchen-Sim-Demos"
X Link 2026-02-11T06:23Z 13.5K followers, 40.8K engagements
"Olaf-World learns transferable actions from unlabeled video We introduce Seq-REPA aligning latent actions to observable visual effects across contexts. This enables zero-shot action transfer and data-efficient adaptation for video world models"
X Link 2026-02-11T13:02Z 13.5K followers, 11K engagements
"StepFun's Step [---] Flash A sparse MoE model with 196B parameters 11B active per token. Achieves frontier-level reasoning comparable to GPT-5.2 xHigh and Gemini [---] Pro at 1/6th the decoding cost. Ranks #1 on MathArena with 97.3% on AIME 2025"
X Link 2026-02-12T04:44Z 13.5K followers, 18.2K engagements
"GENIUS: A benchmark for generative fluid intelligence Tests if multimodal models can handle gravity anomalies & visual metaphorschallenging them to induce patterns reason through constraints and adapt to novel scenarios beyond knowledge recall"
X Link 2026-02-12T12:18Z 13.5K followers, [----] engagements
"Towards Autonomous Mathematics Research Google's Aletheia generates verifies and revises mathematical proofs end-to-end. Solved [--] open problems wrote research papers without human calculation and evaluated 700+ problems using Gemini Deep Think"
X Link 2026-02-13T00:26Z 13.5K followers, [----] engagements
"State-of-the-art on GAIA GPQA HLE and FrontierScience. Performs both algorithm discovery and wet lab experiments across multiple scientific domains. Paper: Collection: https://huggingface.co/collections/InternScience/internagent https://huggingface.co/papers/2602.08990 https://huggingface.co/collections/InternScience/internagent https://huggingface.co/papers/2602.08990"
X Link 2026-02-14T04:36Z 13.5K followers, [---] engagements
"SkillRL: Evolving Agents via Recursive Skill-Augmented RL A framework proving that skills beat scale. It enables LLM agents to automatically discover and evolve reusable skills from past experiences letting a 7B model beat GPT-4o while reducing token usage"
X Link 2026-02-14T08:09Z 13.5K followers, [---] engagements
"TermiGen Trains robust terminal agents by synthesizing 3500+ verified Docker environments and injecting errors into trajectories. Achieves 31.3% on TerminalBench establishing new open-weights SOTA and outperforming GPT-4o-mini"
X Link 2026-02-14T12:09Z 13.5K followers, [---] engagements