@_reachsumit
"Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents Introduces Stratified GRPO to address cross-stratum bias in training LLM search agents"
X Link @_reachsumit 2025-10-08T04:54Z 3363 followers, XXX engagements
"A2Search: Ambiguity-Aware Question Answering with Reinforcement Learning Introduces an annotation-free framework that automatically detects ambiguous questions and gathers alternative answers"
X Link @_reachsumit 2025-10-10T02:53Z 3362 followers, XXX engagements
"Beyond the limitation of a single query: Train your LLM for query expansion with Reinforcement Learning NVIDIA introduces trains LLM-based search agents to generate multiple query variants simultaneously. 📝 👨🏽💻"
X Link @_reachsumit 2025-10-14T06:00Z 3365 followers, 1199 engagements
"A Longitudinal Study on Different Annotator Feedback Loops in Complex RAG Tasks @seirasto et al. compare internal and external annotator groups over X yr finding that closer feedback loops create higher quality data with decreased quantity & diversity"
X Link @_reachsumit 2025-10-15T05:48Z 3366 followers, XXX engagements
"UniDex: Rethinking Search Inverted Indexing with Unified Semantic Modeling Kuaishou presents a model-based framework that replaces traditional term-matching with semantic modeling for inverted indexing improving retrieval effectiveness in search"
X Link @_reachsumit 2025-09-30T05:10Z 3365 followers, XXX engagements
"QAgent: A modular Search Agent with Interactive Query Understanding Alibaba introduces a modular search agent that optimizes retrieval through interactive query understanding and multi-round reasoning"
X Link @_reachsumit 2025-10-10T02:51Z 3364 followers, XXX engagements
"OneRec-Think: In-Text Reasoning for Generative Recommendation Kuaishou integrates dialogue reasoning and personalized recommendation"
X Link @_reachsumit 2025-10-14T05:38Z 3365 followers, XXX engagements
"HoMer: Addressing Heterogeneities by Modeling Sequential and Set-wise Contexts for CTR Prediction Meituan presents a unified transformer that jointly models panoramic sequences and set-wise item interactions for click-through rate prediction. 📝"
X Link @_reachsumit 2025-10-14T05:51Z 3366 followers, XXX engagements
"Differentiable Fast Top-K Selection for Large-Scale Recommendation Kuaishou introduces a differentiable Top-K operator with linear O(n) time complexity for cascade ranking systems. 📝 👨🏽💻"
X Link @_reachsumit 2025-10-14T06:01Z 3366 followers, XXX engagements
"SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model ByteDance presents a foundation model that supports multifaceted multimodal retrieval and classification by accommodating arbitrary modality inputs including text vision and audio"
X Link @_reachsumit 2025-10-15T05:33Z 3364 followers, XXX engagements
"SMILE: SeMantic Ids Enhanced CoLd Item Representation for Click-through Rate Prediction in E-commerce SEarch Kuaishou uses RQ-OPQ encoding to enhance cold-start item representations by aligning collaborative signals with semantic information. 📝"
X Link @_reachsumit 2025-10-15T05:36Z 3366 followers, XXX engagements
"Supervised Fine-Tuning or Contrastive Learning Towards Better Multimodal LLM Reranking Shows that supervised fine-tuning consistently outperforms contrastive learning for LLM-based reranking with the weight component being the dominant factor. 📝"
X Link @_reachsumit 2025-10-17T05:04Z 3366 followers, XXX engagements
"An Efficient Rubric-based Generative Verifier for Search-Augmented LLMs Proposes a 4B-parameter verifier using nugget-as-rubric paradigm to provide verifiable rewards for search-augmented LLMs. 📝 👨🏽💻"
X Link @_reachsumit 2025-10-17T05:07Z 3366 followers, XXX engagements
"GemiRec: Interest Quantization and Generation for Multi-Interest Recommendation Xiaohongshu introduces interest quantization via vector-quantized dictionaries and generative modeling to address interest collapse and evolution. 📝"
X Link @_reachsumit 2025-10-17T05:08Z 3366 followers, XXX engagements
"Agentic Entropy-Balanced Policy Optimization Presents AEPO algorithm with dynamic entropy-balanced rollout and entropy-aware policy optimization to address high-entropy challenges in multi-turn web agent RL training. 📝 👨🏽💻"
X Link @_reachsumit 2025-10-17T05:08Z 3366 followers, XXX engagements
"Retrofitting Small Multilingual Models for Retrieval: Matching 7B Performance with 300M Parameters Develops a compact 300M multilingual embedding model that achieves retrieval performance comparable to current 7B models. 📝"
X Link @_reachsumit 2025-10-17T05:11Z 3366 followers, XXX engagements
"Big Reasoning with Small Models: Instruction Retrieval at Inference Time Enhances small language models' reasoning by retrieving structured instructions at inference time achieving 5-10% gains on medical legal and math tasks. 📝"
X Link @_reachsumit 2025-10-17T05:16Z 3366 followers, XXX engagements
"Demystifying Deep Search: A Holistic Evaluation with Hint-Free Multi-Hop Questions and Factorised Metrics Presents a benchmark with hint-free multi-hop questions and controlled Wikipedia sandbox"
X Link @_reachsumit 2025-10-08T04:54Z 3366 followers, XXX engagements
"Stop-RAG: Value-Based Retrieval Control for Iterative RAG Introduces a value-based controller that adaptively decides when to stop retrieving in iterative RAG systems reducing unnecessary retrieval loops 📝 👨🏽💻"
X Link @_reachsumit 2025-10-17T05:10Z 3366 followers, XXX engagements
"Towards Agentic Self-Learning LLMs in Search Environment Proposes a fully closed-loop framework that unifies task generation policy execution and evaluation enabling LLM agents to self-improve. 📝 👨🏽💻"
X Link @_reachsumit 2025-10-17T05:17Z 3366 followers, XXX engagements