[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

@Marktechpost "Salesforce AI Research Releases CoDA-1.7B: a Discrete-Diffusion Code Model with Bidirectional Parallel Token Generation Salesforce AI Research released CoDA-1.7B a discrete-diffusion code LLM that denoises masked sequences with bidirectional context and updates multiple tokens per step (non-autoregressive). The team provides Base and Instruct checkpoints a reproducible pipeline (TPU pre-training post-training/SFT evaluation) and a FastAPI server exposing OpenAI-compatible endpoints with a CLI; decoding is controlled via parameters such as STEPS ALG="entropy" BLOCK_LENGTH etc. Reported pass@1"
X Link @Marktechpost 2025-10-06T00:21Z 9829 followers, 21K engagements
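
Since the post says the server exposes OpenAI-compatible endpoints and a CLI, a request might look like the sketch below; the host, endpoint path, model name, and the idea of passing the diffusion decoding knobs (STEPS, ALG, BLOCK_LENGTH) in the JSON payload are assumptions, not confirmed details of the CoDA server.

import requests

# Hypothetical call to the CoDA FastAPI server's OpenAI-compatible completions endpoint.
# Host, model name, and pass-through of the diffusion decoding knobs are assumptions.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "CoDA-1.7B-Instruct",   # assumed checkpoint name
        "prompt": "def fibonacci(n):",
        "max_tokens": 128,
        # Decoding controls mentioned in the post (names as given, routing assumed):
        "STEPS": 64,                      # number of denoising steps
        "ALG": "entropy",                 # token-selection strategy per step
        "BLOCK_LENGTH": 32,               # block size for parallel token updates
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])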

"ServiceNow AI Research Releases DRBench a Realistic Enterprise Deep-Research Benchmark DRBench is a reproducible enterprise-grade benchmark and environment for evaluating deep research agents on open-ended tasks that require synthesizing evidence from both public web sources and private organizational data (documents emails chats cloud files). The initial release includes XX tasks across XX domains distributes relevant and distractor insights across multiple applications and scores outputs on Insight Recall Distractor Avoidance Factuality and Report Quality. A baseline DRBench Agent (DRBA)"
X Link @Marktechpost 2025-10-14T07:43Z 9829 followers, XXX engagements

"RA3: Mid-Training with Temporal Action Abstractions for Faster Reinforcement Learning (RL) Post-Training in Code LLMs TL;DR: A new research from Apple formalizes what mid-training should do before reinforcement learning RL post-training and introduces RA3 (Reasoning as Action Abstractions)an EM-style procedure that learns temporally consistent latent actions from expert traces then fine-tunes on those bootstrapped traces. It shows mid-training should (1) prune to a compact near-optimal action subspace and (2) shorten the effective planning horizon improving RL convergence. Empirically RA3"
X Link @Marktechpost 2025-10-09T06:22Z 9825 followers, XXX engagements
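
The EM-style alternation described above can be sketched roughly as below; infer_latent_actions, relabel_trace, and finetune are hypothetical placeholders, since this excerpt does not give the paper's actual objective or parameterization.

# Hypothetical sketch of an EM-style mid-training loop over temporal action abstractions.
def ra3_mid_training(model, expert_traces, num_rounds=3):
    for _ in range(num_rounds):
        # E-step: infer temporally consistent latent actions for each expert trace
        # under the current model (a posterior over abstraction boundaries/labels).
        latent_plans = [infer_latent_actions(model, trace) for trace in expert_traces]

        # Bootstrap: rewrite traces in terms of the inferred abstractions, which both
        # prunes to a compact near-optimal action subspace and shortens the horizon.
        bootstrapped = [relabel_trace(trace, plan)
                        for trace, plan in zip(expert_traces, latent_plans)]

        # M-step: fine-tune the model on the bootstrapped traces.
        model = finetune(model, bootstrapped)
    return model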

"Agentic Context Engineering (ACE): Self-Improving LLMs via Evolving Contexts Not Fine-Tuning TL;DR: A team of researchers from Stanford University SambaNova Systems and UC Berkeley introduce ACE framework that improves LLM performance by editing and growing the input context instead of updating model weights. Context is treated as a living playbook maintained by three rolesGenerator Reflector Curatorwith small delta items merged incrementally to avoid brevity bias and context collapse. Reported gains: +10.6% on AppWorld agent tasks +8.6% on finance reasoning and XXXX% average latency"
X Link @Marktechpost 2025-10-10T11:43Z 9829 followers, 6342 engagements
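
A minimal sketch of the Generator / Reflector / Curator loop described above, assuming a generic llm(prompt) callable; the role prompts and merge policy here are illustrative assumptions, not the paper's implementation.

# Illustrative-only sketch of the ACE roles operating on an evolving playbook.
def ace_step(llm, playbook, task):
    # Generator: answer the task with the current playbook prepended as context.
    answer = llm("Playbook:\n" + "\n".join(playbook) + "\n\nTask: " + task)

    # Reflector: extract small, reusable lessons ("delta items") from the attempt.
    deltas = llm("Task: " + task + "\nAnswer: " + answer +
                 "\nList short, reusable lessons:").splitlines()

    # Curator: merge deltas incrementally instead of rewriting the whole context,
    # which is what avoids brevity bias and context collapse.
    for d in deltas:
        d = d.strip()
        if d and d not in playbook:
            playbook.append(d)
    return answer, playbook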

"Microsoft Research Releases Skala: a Deep-Learning ExchangeCorrelation Functional Targeting Hybrid-Level Accuracy at Semi-Local Cost Skala is a deep-learning exchangecorrelation functional for KohnSham Density Functional Theory (DFT) that targets hybrid-level accuracy at semi-local cost reporting MAE XXXX kcal/mol on W4-17 (0.85 on the single-reference subset) and WTMAD-2 XXXX kcal/mol on GMTKN55; evaluations use a fixed D3(BJ) dispersion correction. It is positioned for main-group molecular chemistry today with transition metals and periodic systems slated as future extensions. Azure AI"
X Link @Marktechpost 2025-10-10T04:54Z 9829 followers, 3850 engagements

"QeRL: NVFP4-Quantized Reinforcement Learning (RL) Brings 32B LLM Training to a Single H100While Improving Exploration TL;DR: QeRL open-sources a quantization-enhanced RL pipeline that runs 4-bit NVFP4 weights with LoRA updates to accelerate the rollout bottleneck. QeRL reports XXX rollout speedups parity or gains over 16-bit LoRA/QLoRA on math reasoning and the first RL training of a 32B policy on a single H100-80GB. Adaptive Quantization Noise schedules channel-wise perturbations to raise policy entropy and improve exploration during training. NVFP4 provides a hardware-optimized 4-bit"
X Link @Marktechpost 2025-10-16T04:38Z 9828 followers, 4175 engagements
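
The "Adaptive Quantization Noise" idea above (scheduled, channel-wise perturbations that raise policy entropy) can be illustrated as below; the linear schedule, noise scale, and injection point are assumptions for the sketch, not QeRL's exact mechanism.

import torch

# Toy illustration: add annealed, per-channel Gaussian noise on top of quantized weights.
def add_channelwise_noise(w_quantized: torch.Tensor, step: int, total_steps: int,
                          sigma_start: float = 1e-2, sigma_end: float = 1e-3) -> torch.Tensor:
    # Linearly anneal the noise scale over training.
    frac = min(step / max(total_steps, 1), 1.0)
    sigma = sigma_start + frac * (sigma_end - sigma_start)
    # One noise value per output channel (dim 0), broadcast across remaining dims.
    noise = torch.randn(w_quantized.shape[0], *([1] * (w_quantized.dim() - 1)),
                        device=w_quantized.device) * sigma
    return w_quantized + noise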

"Samsung introduced a tiny X Million parameter model that just beat DeepSeek-R1 Gemini XXX pro and o3-mini at reasoning on both ARG-AGI X and ARC-AGI X Samsungs Tiny Recursive Model (TRM) is a 7M-parameter two-layer solver that replaces token-by-token decoding with an iterative draft latent-think revise loop: X scratchpad updates per outer step unrolled up to XX steps with full backprop through the recursion. On public protocols it reports XX% on ARC-AGI-1 and X% (two-try) on ARC-AGI-2 and also XXXX% on Sudoku-Extreme and XXXX% on Maze-Hard. Code is available on GitHub. full analysis: paper:"
X Link @Marktechpost 2025-10-09T21:52Z 9824 followers, XXX engagements
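
The draft / latent-think / revise recursion described above might look schematically like the module below; layer choices, sizes, and update equations are assumptions for illustration, not TRM's actual architecture.

import torch
import torch.nn as nn

# Schematic of an iterative draft -> latent-think -> revise loop with full backprop.
class TinyRecursiveSolver(nn.Module):
    def __init__(self, d_model: int, n_inner: int = 6, n_outer: int = 16):
        super().__init__()
        self.think = nn.GRUCell(d_model, d_model)        # updates the latent scratchpad
        self.revise = nn.Linear(2 * d_model, d_model)    # revises the draft answer
        self.n_inner, self.n_outer = n_inner, n_outer

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, d_model)
        draft = x.clone()                  # initial draft answer embedding
        scratch = torch.zeros_like(x)      # latent scratchpad
        for _ in range(self.n_outer):                  # unrolled outer steps
            for _ in range(self.n_inner):              # scratchpad updates per outer step
                scratch = self.think(draft, scratch)
            draft = self.revise(torch.cat([draft, scratch], dim=-1))
        return draft                       # gradients flow through the full recursion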

"Liquid AI Releases LFM2-8B-A1B: An On-Device Mixture-of-Experts with 8.3B Params and a 1.5B Active Params per Token How much capability can a sparse8.3B-parameterMoE with a1.5B active pathdeliver on your phone without blowing latency or memoryLiquid AI has releasedLFM2-8B-A1Ba small-scale Mixture-of-Experts (MoE) model built for on-device execution under tight memory latency and energy budgets. Unlike most MoE work optimized for cloud batch serving LFM2-8B-A1B targets phones laptops and embedded systems. It showcases8.3B total parametersbut activates only1.5B parameters per token using sparse"
X Link @Marktechpost 2025-10-11T05:02Z 9820 followers, 2017 engagements
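
The "8.3B total, 1.5B active per token" pattern comes from sparse expert routing; a generic top-k MoE layer is sketched below, with expert count, k, and shapes chosen for illustration rather than taken from LFM2-8B-A1B's configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Generic top-k MoE routing sketch: only the selected experts run for each token.
class TopKMoE(nn.Module):
    def __init__(self, d_model=1024, d_ff=4096, n_experts=32, k=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                                # x: (tokens, d_model)
        logits = self.router(x)
        weights, idx = logits.topk(self.k, dim=-1)       # pick k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique().tolist():     # run each chosen expert once
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out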

"Stanford Researchers Released AgentFlow: In-the-Flow Reinforcement Learning RL for Modular Tool-Using AI Agents AgentFlow is a trainable modular agent frameworkPlanner Executor Verifier Generator with explicit memorythat optimizes only the Planner in-loop using Flow-GRPO an on-policy objective that broadcasts a single trajectory-level outcome to every turn with token-level PPO-style updates and KL control. On ten benchmarks spanning search agentic tasks (GAIA textual) math and science a 7B backbone reports average gains of +14.9% +14.0% +14.5% and +4.1% over strong baselines. full analysis:"
X Link @Marktechpost 2025-10-09T02:26Z 9822 followers, 1143 engagements
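
The core idea, broadcasting one trajectory-level outcome to every turn and applying a PPO-style clipped update with KL control, can be sketched as below; the tensor shapes, coefficients, and exact loss form are assumptions, not the paper's Flow-GRPO implementation.

import torch

# Sketch: every token in every turn receives the same trajectory-level advantage.
def flow_grpo_loss(logp_new, logp_old, logp_ref, trajectory_reward,
                   clip_eps=0.2, kl_coef=0.05):
    advantage = torch.full_like(logp_new, trajectory_reward)  # broadcast outcome

    ratio = torch.exp(logp_new - logp_old)                    # token-level ratios
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    policy_loss = -torch.min(ratio * advantage, clipped * advantage).mean()

    kl = (logp_new - logp_ref).mean()                         # KL control vs. reference
    return policy_loss + kl_coef * kl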

"Alibabas Qwen AI Releases Compact Dense Qwen3-VL 4B/8B (Instruct & Thinking) With FP8 Checkpoints Qwen introduced compact dense Qwen3-VL models at 4B and 8B each in Instruct and Thinking variants plus first-party FP8 checkpoints that use fine-grained FP8 (block size 128) and report near-BF16 quality for materially lower VRAM. The release retains the full capability surfacelong-document and video understanding 32-language OCR spatial groundingand supports a 256K context window extensible to 1M positioning these SKUs for single-GPU and edge deployments without sacrificing multimodal breadth."
X Link @Marktechpost 2025-10-15T03:00Z 9828 followers, 1195 engagements
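
Fine-grained, block-wise quantization with block size 128 means one scale per 128-value block; the sketch below illustrates the idea with int8-style rounding, since the actual FP8 kernel and checkpoint format are not described in this excerpt.

import torch

# Illustration of block-wise quantization (block size 128): one scale per block.
def blockwise_quantize(w: torch.Tensor, block: int = 128):
    flat = w.reshape(-1, block)                       # assumes numel divisible by block
    scales = (flat.abs().amax(dim=1, keepdim=True) / 127.0).clamp_min(1e-8)
    q = torch.round(flat / scales).clamp(-127, 127).to(torch.int8)
    return q, scales                                  # dequantize via q.float() * scales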

"@Alibaba_Qwen"
X Link @Marktechpost 2025-10-15T03:04Z 9826 followers, XXX engagements

"NVIDIA Researchers Propose Reinforcement Learning Pretraining (RLP): Reinforcement as a Pretraining Objective for Building Reasoning During Pretraining RLP makes think-before-predict a pretraining objective: it samples a short chain-of-thought as an action and rewards it by information gainthe log-likelihood improvement of the next token versus a no-think EMA teacheryielding a verifier-free dense position-wise signal that works on ordinary text streams at scale; empirically RLP lifts Qwen3-1.7B math+science averages by +19% vs Base and +17% vs compute-matched CPT with gains persisting after"
X Link @Marktechpost 2025-10-14T10:02Z 9828 followers, 1013 engagements
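
The information-gain reward described above reduces to a difference of next-token log-likelihoods; a sketch follows, where the next_token_logprob interfaces are placeholders rather than a real model API.

import torch

# Sketch of the RLP reward: log-likelihood gain from conditioning on a sampled
# chain-of-thought, measured against a no-think EMA teacher.
def rlp_reward(policy, ema_teacher, context_ids, cot_ids, next_token_id):
    # log p(next token | context, sampled chain-of-thought) under the current policy
    logp_think = policy.next_token_logprob(torch.cat([context_ids, cot_ids]), next_token_id)
    # log p(next token | context alone) under the slowly-updated EMA teacher
    logp_no_think = ema_teacher.next_token_logprob(context_ids, next_token_id)
    # dense, position-wise, verifier-free reward
    return logp_think - logp_no_think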

"Meet OpenTSLM: A Family of Time-Series Language Models (TSLMs) Revolutionizing Medical Time-Series Analysis A significant development is set to transform AI in healthcare. Researchers at Stanford University in collaboration with ETH Zurich and tech leaders including Google Research and Amazon have introduced OpenTSLM a novel family of Time-Series Language Models (TSLMs). This breakthrough addresses a critical limitation in current LLMs by enabling them to interpret and reason over complex continuous medical time-series data such as ECGs EEGs and wearable sensor streams a feat where even"
X Link @Marktechpost 2025-10-11T22:56Z 9826 followers, XXX engagements

"AWS Open-Sources an MCP Server for Bedrock AgentCore to Streamline AI Agent Development AWS has open-sourced an MCP server for Amazon Bedrock AgentCore enabling IDE-native agent workflows across MCP clients via a simple mcp.json plus uvx install; supported client docs and repo examples cover Kiro and Amazon Q Developer CLI setup and the server runs directly on AgentCore Runtime with Gateway/Memory integration for end-to-end deploytest inside the editor; the code and install guidance are live in the awslabs/mcp repository (including the amazon-bedrock-agentcore-mcp-server directory) and AWS"
X Link @Marktechpost 2025-10-03T23:50Z 9827 followers, XXX engagements
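
A typical MCP client mcp.json entry launched via uvx looks roughly like the sketch below; the server key and the package/entry-point name are assumptions here, so the authoritative install snippet is the one in the awslabs/mcp repo's amazon-bedrock-agentcore-mcp-server directory.

{
  "mcpServers": {
    "bedrock-agentcore": {
      "command": "uvx",
      "args": ["awslabs.amazon-bedrock-agentcore-mcp-server@latest"]
    }
  }
}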

"ServiceNow AI Releases Apriel-1.5-15B-Thinker: An Open-Weights Multimodal Reasoning Model that Hits Frontier-Level Performance on a Single-GPU Budget ServiceNow AI Researchs Apriel-1.5-15B-Thinker is a 15-billion-parameter open-weights multimodal reasoning model trained via mid-training (continual pretraining) plus supervised fine-tuningwith no reinforcement learningthat achieves an Artificial Analysis Intelligence Index (AAI) score of XX and discloses task results of AIME 2025 XX GPQA Diamond XX LiveCodeBench XX Instruction-Following Benchmark XX and Tau-squared Bench (Telecom) 68; it is"
X Link @Marktechpost 2025-10-02T06:13Z 9829 followers, XXX engagements