
[@Marktechpost](/creator/twitter/Marktechpost)
"Agentic Context Engineering (ACE): Self-Improving LLMs via Evolving Contexts Not Fine-Tuning TL;DR: A team of researchers from Stanford University SambaNova Systems and UC Berkeley introduce ACE framework that improves LLM performance by editing and growing the input context instead of updating model weights. Context is treated as a living playbook maintained by three rolesGenerator Reflector Curatorwith small delta items merged incrementally to avoid brevity bias and context collapse. Reported gains: +10.6% on AppWorld agent tasks +8.6% on finance reasoning and XXXX% average latency"  
[X Link](https://x.com/Marktechpost/status/1976614553002930678) [@Marktechpost](/creator/x/Marktechpost) 2025-10-10T11:43Z 9857 followers, 6416 engagements
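
A minimal sketch of the Generator / Reflector / Curator loop described above, assuming a hypothetical `llm()` completion stub; the role prompts and the delta-merge rule are illustrative, not the paper's exact implementation.

```python
# Hypothetical sketch of ACE-style context evolution: instead of fine-tuning,
# a "playbook" (the context) is edited incrementally with small delta items.
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    # Stub: swap in any real chat-completion call.
    return "- check API schemas before calling\n- cache tool results"

@dataclass
class Playbook:
    items: list[str] = field(default_factory=list)  # small, durable context entries

    def merge(self, deltas: list[str]) -> None:
        # Incremental merge: append only novel items rather than rewriting the
        # whole playbook, to avoid brevity bias and context collapse.
        for d in deltas:
            if d not in self.items:
                self.items.append(d)

    def render(self) -> str:
        return "\n".join(f"- {it}" for it in self.items)

def ace_step(playbook: Playbook, task: str) -> str:
    answer = llm(f"Playbook:\n{playbook.render()}\n\nTask: {task}")            # Generator
    critique = llm(f"Task: {task}\nAnswer: {answer}\nLessons, one per line.")  # Reflector
    deltas = [ln.strip("- ").strip() for ln in critique.splitlines() if ln.strip()]
    playbook.merge(deltas)                                                     # Curator
    return answer
```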


"ServiceNow AI Research Releases DRBench a Realistic Enterprise Deep-Research Benchmark DRBench is a reproducible enterprise-grade benchmark and environment for evaluating deep research agents on open-ended tasks that require synthesizing evidence from both public web sources and private organizational data (documents emails chats cloud files). The initial release includes XX tasks across XX domains distributes relevant and distractor insights across multiple applications and scores outputs on Insight Recall Distractor Avoidance Factuality and Report Quality. A baseline DRBench Agent (DRBA)"  
[X Link](https://x.com/Marktechpost/status/1978003687059722627) [@Marktechpost](/creator/x/Marktechpost) 2025-10-14T07:43Z 9857 followers, 1480 engagements
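
The excerpt names the scoring dimensions but not their formulas; a toy, set-based reading of Insight Recall and Distractor Avoidance (purely hypothetical definitions) might look like:

```python
# Toy illustration of two DRBench-style scores (hypothetical formulas; the
# excerpt names the metrics but does not define them).
def insight_recall(report_insights: set[str], gold_insights: set[str]) -> float:
    # Fraction of planted gold insights the agent's report recovered.
    return len(report_insights & gold_insights) / len(gold_insights) if gold_insights else 0.0

def distractor_avoidance(report_insights: set[str], distractors: set[str]) -> float:
    # 1.0 when the report contains none of the planted distractor insights.
    return 1.0 - len(report_insights & distractors) / len(distractors) if distractors else 1.0

report = {"q3 churn rose in EMEA", "legacy plan pricing confuses users"}
gold = {"q3 churn rose in EMEA", "support backlog doubled"}
noise = {"office plants improve morale"}
print(insight_recall(report, gold), distractor_avoidance(report, noise))  # 0.5 1.0
```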


"Weak-for-Strong (W4S): A Novel Reinforcement Learning Algorithm that Trains a weak Meta Agent to Design Agentic Workflows with Stronger LLMs TL;DR (1) W4S trains a 7B weak meta agent with RLAO to write Python workflows that harness stronger executors modeled as a multi turn MDP. (2) On HumanEval with GPT 4o mini as executor W4S reaches Pass@1 of XXXX with about XX minutes optimization and about XXX dollars total cost beating automated baselines under the same executor. (3) Across XX benchmarks W4S improves over the strongest baseline by XXX% to XXXX% while avoiding fine tuning of the strong"  
[X Link](https://x.com/Marktechpost/status/1979803547173794180) [@Marktechpost](/creator/x/Marktechpost) 2025-10-19T06:55Z 9858 followers, 1598 engagements
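
A hedged sketch of the loop in point (1): a weak meta-agent emits workflow code, the workflow is scored with the strong executor, and the (workflow, reward) history drives the next turn. Both functions below are stand-ins, not the paper's RLAO training code.

```python
# Hypothetical sketch of the W4S outer loop as a multi-turn MDP: the weak
# meta-agent's action is workflow source code; the reward is the measured score.
def weak_meta_agent(history: list[tuple[str, float]]) -> str:
    # Stand-in for the 7B policy trained with RLAO: emit workflow source
    # conditioned on previous (workflow, reward) pairs.
    return "def workflow(task, call_strong_llm):\n    return call_strong_llm(task)"

def evaluate(workflow_src: str) -> float:
    # Stand-in for running the workflow on a validation set (e.g. HumanEval)
    # with the strong executor and measuring Pass@1.
    return 0.0  # replace with real execution + scoring

history: list[tuple[str, float]] = []
for turn in range(5):                      # a few optimization turns
    src = weak_meta_agent(history)
    reward = evaluate(src)
    history.append((src, reward))          # state transition of the MDP
best_workflow = max(history, key=lambda h: h[1])[0]
```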


"Here is a very interesting upcoming AI webinar from deepset Topic: Scaling AI with Haystack Enterprise: A Developers Guide When: October XX 2025 10am ET 3pm BST 4pm CEST In this webinarJulian RischandBilge Ycelwill show howHaystack Enterprisehelps developers bridge that gap bringing the speed and flexibility of open source together with the support enterprises need. Youll learn how to: (1) Extend your expertisewith direct access to the Haystack engineering team through private support and consultation hours. (2) Deploy with confidenceusing Helm charts and best-practice guides for secure"  
[X Link](https://x.com/Marktechpost/status/1976389104842748262) [@Marktechpost](/creator/x/Marktechpost) 2025-10-09T20:47Z 9856 followers, XXX engagements


"Samsung introduced a tiny X Million parameter model that just beat DeepSeek-R1 Gemini XXX pro and o3-mini at reasoning on both ARG-AGI X and ARC-AGI X Samsungs Tiny Recursive Model (TRM) is a 7M-parameter two-layer solver that replaces token-by-token decoding with an iterative draft latent-think revise loop: X scratchpad updates per outer step unrolled up to XX steps with full backprop through the recursion. On public protocols it reports XX% on ARC-AGI-1 and X% (two-try) on ARC-AGI-2 and also XXXX% on Sudoku-Extreme and XXXX% on Maze-Hard. Code is available on GitHub. full analysis: paper:"  
[X Link](https://x.com/Marktechpost/status/1976405546157961356) [@Marktechpost](/creator/x/Marktechpost) 2025-10-09T21:52Z 9851 followers, 1182 engagements
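
A toy PyTorch rendering of the draft / latent-think / revise recursion with full backprop through the unrolled loop; dimensions, update rules, and step counts are assumptions, not TRM's actual architecture.

```python
# Illustrative TRM-style recursion: a tiny network refines a latent scratchpad
# z and a draft answer y over unrolled steps, and gradients flow through all of it.
import torch
import torch.nn as nn

class TinyRecursiveSolver(nn.Module):
    def __init__(self, dim: int = 64, inner_updates: int = 6, outer_steps: int = 3):
        super().__init__()
        self.think = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.revise = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.inner_updates, self.outer_steps = inner_updates, outer_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.zeros_like(x)  # draft answer
        z = torch.zeros_like(x)  # latent scratchpad
        for _ in range(self.outer_steps):            # unrolled outer loop
            for _ in range(self.inner_updates):      # scratchpad updates per outer step
                z = self.think(torch.cat([x, y, z], dim=-1))
            y = self.revise(torch.cat([y, z], dim=-1))  # revise the draft
        return y

model = TinyRecursiveSolver()
x = torch.randn(8, 64)
loss = model(x).pow(2).mean()
loss.backward()  # full backprop through the recursion
```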


"The Local AI Revolution: Expanding Generative AI with GPT-OSS-20B and the NVIDIA RTX AI PC The landscape of AI is expanding.Today many of the most powerfulLLMs (large language models)reside primarily in the cloud offering incredible capabilities but also concerns about privacy and limitations around how many files you can upload or how long they stay loaded.Now a powerful new paradigm is emerging. This is the dawn oflocal private AI. This switch to local PCs is catalyzed by the release of powerful open models like OpenAIs newgpt-oss and supercharged by accelerations provided byNVIDIA RTX AI"  
[X Link](https://x.com/Marktechpost/status/1980309356496429338) [@Marktechpost](/creator/x/Marktechpost) 2025-10-20T16:25Z 9858 followers, 9044 engagements


"Meet OpenTSLM: A Family of Time-Series Language Models (TSLMs) Revolutionizing Medical Time-Series Analysis A significant development is set to transform AI in healthcare. Researchers at Stanford University in collaboration with ETH Zurich and tech leaders including Google Research and Amazon have introduced OpenTSLM a novel family of Time-Series Language Models (TSLMs). This breakthrough addresses a critical limitation in current LLMs by enabling them to interpret and reason over complex continuous medical time-series data such as ECGs EEGs and wearable sensor streams a feat where even"  
[X Link](https://x.com/Marktechpost/status/1977146374572626063) [@Marktechpost](/creator/x/Marktechpost) 2025-10-11T22:56Z 9858 followers, XXX engagements


"Andrej Karpathy Releases nanochat: A Minimal End-to-End ChatGPT-Style Pipeline You Can Train in X Hours for $XXX Andrej Karpathys nanochat is a 8K-LOC dependency-light full-stack ChatGPT-style pipeline that you can run end-to-end on a single 8H100 node via producing a usable chat model and Web UI in X hours for roughly $XXX. The stack includes a Rust BPE tokenizer base pretraining on FineWeb-EDU mid-training (SmolTalk/MMLU aux/GSM8K with tool-use tags) SFT optional simplified GRPO on GSM8K a thin inference Engine (KV cache prefill/decode Python-interpreter tool) and an auto-generated with"  
[X Link](https://x.com/Marktechpost/status/1978155416162083035) [@Marktechpost](/creator/x/Marktechpost) 2025-10-14T17:46Z 9857 followers, 3785 engagements
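
The stage ordering can be summarized in a small sketch; every function below is a stub placeholder showing how the stages chain together, not nanochat's actual scripts or API.

```python
# Hypothetical stage chain for the pipeline described above (stubs only).
def train_bpe_tokenizer(corpus):  # Rust BPE in nanochat; stubbed here
    return {"tokenizer": corpus}

def pretrain(tok, corpus):        # base pretraining on FineWeb-EDU
    return {**tok, "stage": "base", "pretrain_data": corpus}

def midtrain(model, data):        # SmolTalk / MMLU aux / GSM8K with tool-use tags
    return {**model, "stage": "mid", "mid_data": data}

def sft(model):                   # supervised fine-tuning -> chat model
    return {**model, "stage": "sft"}

def grpo(model, data):            # optional simplified RL (GRPO) on GSM8K
    return {**model, "stage": "grpo", "rl_data": data}

tok = train_bpe_tokenizer("FineWeb-EDU sample")
model = grpo(sft(midtrain(pretrain(tok, "FineWeb-EDU"),
                          ["SmolTalk", "MMLU aux", "GSM8K + tool tags"])), "GSM8K")
print(model["stage"])  # "grpo": the final chat model served by the thin engine
```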


"QeRL: NVFP4-Quantized Reinforcement Learning (RL) Brings 32B LLM Training to a Single H100While Improving Exploration TL;DR: QeRL open-sources a quantization-enhanced RL pipeline that runs 4-bit NVFP4 weights with LoRA updates to accelerate the rollout bottleneck. QeRL reports XXX rollout speedups parity or gains over 16-bit LoRA/QLoRA on math reasoning and the first RL training of a 32B policy on a single H100-80GB. Adaptive Quantization Noise schedules channel-wise perturbations to raise policy entropy and improve exploration during training. NVFP4 provides a hardware-optimized 4-bit"  
[X Link](https://x.com/Marktechpost/status/1978681811795636718) [@Marktechpost](/creator/x/Marktechpost) 2025-10-16T04:38Z 9857 followers, 10.3K engagements
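
A toy sketch of the Adaptive Quantization Noise idea as stated above (annealed channel-wise perturbations to raise policy entropy early in training); the schedule and injection point are assumptions, not QeRL's code.

```python
# Illustrative channel-wise noise schedule: one scale per output channel,
# annealed over training so exploration is high early and fades later.
import torch

def channel_noise(weight: torch.Tensor, step: int, total_steps: int,
                  sigma0: float = 1e-2) -> torch.Tensor:
    sigma = sigma0 * (1.0 - step / total_steps)          # linear anneal (assumed)
    per_channel = torch.randn(weight.shape[0], 1) * sigma
    # Scale the perturbation by each channel's typical magnitude.
    return weight + per_channel * weight.abs().mean(dim=1, keepdim=True)

w = torch.randn(16, 64)                                  # e.g. a dequantized weight tile
w_noisy = channel_noise(w, step=100, total_steps=1000)
```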


"Google + Yale release C2S-Scale 27B (Gemma based model): converts scRNA-seq into cell sentences for LLM-native single-cell analysis. Dual-context virtual screen across 4000 compounds targets interferon-conditional antigen presentation. Model flags CK2 inhibition (silmitasertib) + low-dose IFN MHC-I boost; prediction validated in living cells. Open weights on Hugging Face enable replication and benchmarking. Full analysis: Paper: Model on HF: GitHub Repo: @googleaidevs @GoogleResearch @GoogleAI"  
[X Link](https://x.com/Marktechpost/status/1979091268484633031) [@Marktechpost](/creator/x/Marktechpost) 2025-10-17T07:45Z 9857 followers, 2406 engagements
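
The "cell sentence" conversion generally means ranking a cell's genes by expression and emitting the top gene symbols as text an LLM can consume; a sketch following that general recipe (C2S-Scale's exact truncation and formatting may differ):

```python
# Sketch of scRNA-seq -> "cell sentence": highest-expressed genes first.
def cell_to_sentence(expression: dict[str, float], k: int = 8) -> str:
    ranked = sorted(expression, key=expression.get, reverse=True)
    return " ".join(ranked[:k])

cell = {"CD74": 9.1, "B2M": 8.7, "HLA-A": 7.9, "ACTB": 7.2, "GAPDH": 5.0}
print(cell_to_sentence(cell, k=4))  # "CD74 B2M HLA-A ACTB"
```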


"Are your LLM code benchmarks actually rejecting wrong-complexity solutions and interactive-protocol violations or are they passing under-specified unit tests A team of researchers from UCSD NYU University of Washington Princeton University Canyon Crest Academy OpenAI UC Berkeley MIT University of Waterloo and Sentient Labs introduce AutoCode a new AI framework that lets LLMs create and verify competitive programming problems mirroring the workflow of human problem setters. AutoCode reframes evaluation for code-reasoning models by treating problem setting (not only problem solving) as the"  
[X Link](https://x.com/Marktechpost/status/1979473678065905847) [@Marktechpost](/creator/x/Marktechpost) 2025-10-18T09:04Z 9858 followers, XXX engagements
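
A small illustration of the failure mode in the opening question: a wrong-complexity solution passes a small unit test, but a stress test with a large input, the kind of check a careful problem setter adds, rejects it. The task, inputs, and time limit are invented for illustration.

```python
# Under-specified unit tests vs. a stress test that separates complexities.
import time

def quadratic_max_pair_sum(xs):           # O(n^2): "wrong complexity"
    return max(a + b for i, a in enumerate(xs) for b in xs[i + 1:])

def fast_max_pair_sum(xs):                # intended solution: two largest values
    a, b = sorted(xs)[-2:]
    return a + b

small = [3, 1, 4, 1, 5]
assert quadratic_max_pair_sum(small) == fast_max_pair_sum(small) == 9  # unit test passes

big = list(range(4_000))                  # adversarially sized stress input
t0 = time.perf_counter()
quadratic_max_pair_sum(big)
elapsed = time.perf_counter() - t0
print("verdict:", "OK" if elapsed < 0.05 else "TLE: wrong-complexity solution rejected")
```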


"Meet LangChains DeepAgents Library and a Practical Example to See How DeepAgents Actually Work in Action While a basic Large Language Model (LLM) agentone that repeatedly calls external toolsis easy to create these agents often struggle with long and complex tasks because they lack the ability to plan ahead and manage their work over time. They can be considered shallow in their execution. The deepagents library is designed to overcome this limitation by implementing a general architecture inspired by advanced applications like Deep Research and Claude Code. Full Analysis and Implementation:"  
[X Link](https://x.com/Marktechpost/status/1980257744029687887) [@Marktechpost](/creator/x/Marktechpost) 2025-10-20T13:00Z 9857 followers, XXX engagements
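
A generic sketch of the "deep" pattern the excerpt contrasts with shallow tool loops: an explicit plan plus a persistent workspace rather than turn-by-turn reaction. This is illustrative only; the `llm()` stub and the loop below are not the deepagents library's API.

```python
# Plan-then-execute agent with a persistent workspace (generic illustration).
def llm(prompt: str) -> str:
    return "1. research topic\n2. draft report"  # stub; swap in a real model call

def deep_agent(task: str) -> dict:
    workspace: dict[str, str] = {}                           # file-system-like memory
    plan = [s for s in llm(f"Plan steps for: {task}").splitlines() if s.strip()]
    for step in plan:                                        # execute the plan in order
        workspace[step] = llm(f"Do: {step}\nNotes so far: {list(workspace)}")
    return workspace

print(list(deep_agent("write a short market report")))
```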


"DeepSeek Just Released a 3B OCR Model: A 3B VLM Designed for High-Performance OCR and Structured Document Conversion Deepseek AI releases Deepseek OCR a 3B vision language model for document understanding. It encodes pages into compact vision tokens then decodes with a MoE decoder to recover text. This design cuts sequence length and memory growth on long documents. Reported results show about XX% decoding precision near 10x compression on Fox. The research team also report strong efficiency on OmniDocBench surpassing GOT OCR XXX using about XXX vision tokens and outperforming MinerU 2.0"  
[X Link](https://x.com/Marktechpost/status/1980434402875437178) [@Marktechpost](/creator/x/Marktechpost) 2025-10-21T00:42Z 9857 followers, XXX engagements
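
A back-of-envelope reading of the compression claim: if a page's text needs T text tokens but the encoder represents the page with V vision tokens, the decoder's input sequence shrinks by roughly T / V. The numbers below are invented for illustration, since the excerpt's exact figures are redacted.

```python
# Token-budget arithmetic behind "near-10x compression" (illustrative numbers).
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    return text_tokens / vision_tokens

pages = [(1800, 180), (2400, 256)]  # (text tokens, vision tokens) per page
for t, v in pages:
    print(f"{t} text tokens -> {v} vision tokens: {compression_ratio(t, v):.1f}x shorter")
```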
