[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]
@nexa_ai
"The best vision-language models just went fully on-device: Day-0 on NPU, GPU, and CPU. Qwen3-VL-4B and 8B from @Alibaba_Qwen now run locally across @Apple, @Qualcomm, @NVIDIA, @Intel, @MediaTek, and @AMD devices with NexaSDK. Every line of model inference code in NexaML, GGML, and MLX was built from scratch by Nexa for SOTA performance on each hardware stack, powered by Nexa's unified inference engine. One line to run the latest VLM Day-0 on every backend. Qualcomm NPU (NexaML): nexa infer NexaAI/Qwen3-VL-4B-Instruct-NPU, nexa infer NexaAI/Qwen3-VL-4B-Thinking-NPU. CPU/GPU for everyone (GGML): nexa infer"
X Link @nexa_ai 2025-10-14T17:29Z 2285 followers, 387.1K engagements
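The "one line per backend" pattern in the post above can be sketched as a tiny wrapper. This is a hypothetical helper, not part of NexaSDK; it only encodes the repo-naming convention visible in the post, where NPU builds are published under an explicit -NPU suffix.

```python
# Hypothetical helper mirroring the `nexa infer` invocations in the post.
# Not part of NexaSDK; the repo-naming convention is inferred from the
# two NPU commands shown above.
def nexa_infer_command(model: str, backend: str) -> list[str]:
    """Build the `nexa infer` CLI argument list for a given backend."""
    # NPU builds appear under an explicit -NPU suffix; the CPU/GPU (GGML)
    # path uses the plain repo name.
    repo = f"NexaAI/{model}-NPU" if backend == "npu" else f"NexaAI/{model}"
    return ["nexa", "infer", repo]

print(nexa_infer_command("Qwen3-VL-4B-Instruct", "npu"))
# ['nexa', 'infer', 'NexaAI/Qwen3-VL-4B-Instruct-NPU']
```

On a machine with NexaSDK installed, the returned list could be handed to subprocess.run to launch the model.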
"Sam Altman recently said: 'GPT-OSS has strong real-world performance comparable to o4-mini, and you can run it locally on your phone.' Many believed running a 20B-parameter model on mobile devices was still years away. At Nexa AI, we've built our foundation on deep on-device AI technology, turning that vision into reality. Today GPT-OSS is running fully local on mobile devices through our app, Nexa Studio. Real performance on @Snapdragon Gen 5: - XX tokens/sec decoding speed - X seconds Time-to-First-Token. Developers can now use NexaSDK to build their own local AI apps powered by GPT-OSS. What this unlocks: -"
X Link @nexa_ai 2025-10-06T16:10Z 2286 followers, 196.2K engagements
"@Qualcomm Honored to partner with @Qualcomm to push what's possible for on-device AI"
X Link @nexa_ai 2025-10-16T18:47Z 2287 followers, XX engagements
"NVIDIA sent us a 5090 so we can demo Qwen3-VL 4B & 8B GGUF. You can now run it in our desktop UI Hyperlink, powered by NexaML Engine, the first and only framework that supports Qwen3-VL GGUF right now. We tried the same demo examples from the Qwen2.5-32B blog; the new Qwen3-VL 4B & 8B are insane. Benchmarks on RTX 5090 (Q4): Qwen3VL-8B XXX tok/s, 8GB VRAM; Qwen3VL-4B XXX tok/s, 6GB VRAM. Thanks @Alibaba_Qwen and @NVIDIA, local multimodal just went beast mode 🧠 More optimizations are coming. Run it yourself in Hyperlink: one-click install, fully local, beautiful UI. What interesting Qwen3VL use cases"
X Link @nexa_ai 2025-10-17T17:21Z 2285 followers, 53.6K engagements
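The VRAM figures in the benchmark post are consistent with a back-of-envelope estimate for 4-bit weights. The sketch below assumes roughly 4.5 bits per weight (4-bit values plus per-block scale metadata, as in common Q4 GGUF schemes); it estimates weight storage only, so KV cache, activations, and the vision encoder add more on top.

```python
# Rough weight-memory estimate for a Q4-quantized LLM.
# Assumption: ~4.5 bits/weight (4-bit values + per-block scale metadata).
# Weights only; KV cache and activations are not included.
def q4_weight_gib(params_billion: float, bits_per_weight: float = 4.5) -> float:
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

print(f"8B weights: ~{q4_weight_gib(8):.1f} GiB")   # ~4.2 GiB
print(f"4B weights: ~{q4_weight_gib(4):.1f} GiB")   # ~2.1 GiB
```

Adding runtime overhead on top of these weight footprints is broadly consistent with the 8GB / 6GB VRAM figures quoted above.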
"Try gpt-oss on mobile (needs ≥16GB RAM) on Nexa Studio:"
X Link @nexa_ai 2025-10-06T16:16Z 2298 followers, 5844 engagements
"Nexa AI at IBM #TechXchange 2025: Join Alan Zhu @alanzhuly from Nexa AI at #IBMTechXchange 2025 to learn how on-device AI is becoming the next leap in intelligence, and how Nexa AI is leading this movement toward XXX% private, local, and offline AI. 📅 Wednesday, Oct X, Granite Theater, Orlando, FL - Speaker Session: 3:30 PM-X:30 PM ET - Demo Session: 11:30 AM-12:00 PM ET"
X Link @nexa_ai 2025-10-06T23:37Z 2297 followers, XXX engagements
"Thrilled to speak and demo at @IBM #TechXchange in Orlando this week. @alanzhuly shared how we're advancing the frontier of on-device AI, showcasing: ⚡ IBM Granite XXX running lightning-fast on the @Qualcomm NPU, the first Day-0 model support in NPU history. 💻 Hyperlink, the world's first local AI app that runs the latest models on NPU/GPU/CPU, turning your computer into an agentic assistant that can search and reason across all your files, privately and offline. ⚙ NexaML and NexaSDK, our CUDA-like software layer and SDK for NPUs, built from scratch to run any model on any backend with multimodal support"
X Link @nexa_ai 2025-10-10T16:21Z 2297 followers, XXX engagements
"We ran @OpenAI GPT-OSS 20B fully local on a phone. Here's how:"
X Link @nexa_ai 2025-10-11T17:19Z 2299 followers, 1771 engagements
"Recently one of our teammates had a 6-hour flight and a report due that night. No Wi-Fi. Hundreds of notes and files to sort through. Hyperlink, running @OpenAI GPT-OSS locally, pulled insights from all of them instantly: searching, summarizing, connecting dots, and helping them write. It felt like having ChatGPT built into their computer, but fully local & offline. Also a sneak peek of the new Hyperlink UI we've been testing 👀"
X Link @nexa_ai 2025-10-12T19:20Z 2297 followers, 9506 engagements
"Try Hyperlink for free on your flights:"
X Link @nexa_ai 2025-10-12T19:20Z 2297 followers, XXX engagements
"Check out NexaSDK GitHub to get started:"
X Link @nexa_ai 2025-10-14T17:29Z 2301 followers, XXX engagements
"Learn more in Blog:"
X Link @nexa_ai 2025-10-14T17:29Z 2302 followers, XXX engagements
"@Alibaba_Qwen You can already run Qwen3-VL-4B & 8B locally, Day-0, on NPU/GPU/CPU using MLX, GGUF, and NexaML. Check out NexaSDK"
X Link @nexa_ai 2025-10-14T17:34Z 2297 followers, 2172 engagements
"We just built the world's first fully NPU-supported local RAG pipeline: retrieval, rerank, and generation all run entirely on the @Qualcomm NPU with SOTA models. XX less power. Always-on. XXX% private. Cooler hands. Models: - Embedding: @GoogleDeepMind EmbeddingGemma-300M - Rerank: @JinaAI_ Reranker v2 - Generate: @IBMResearch Granite 4.0-Micro. Together: full-stack SOTA retrieval + generation on NPU. Demo & example project for @Snapdragon AI PC below 👇"
X Link @nexa_ai 2025-10-13T16:04Z 2298 followers, 16.5K engagements
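The three-stage pipeline described above (embed, retrieve, rerank, generate) can be sketched in plain Python. The functions below are toy stand-ins, not Nexa's NPU implementation: a real pipeline would call the EmbeddingGemma, Jina Reranker, and Granite models at each stage.

```python
# Toy sketch of a retrieve -> rerank -> generate RAG pipeline.
# Every stage here is a placeholder for an NPU-hosted model.
from math import sqrt

def embed(text: str) -> list[float]:
    # Toy bag-of-letters embedding; a real pipeline uses an embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def rag_answer(query: str, docs: list[str], top_k: int = 2) -> str:
    # 1) Retrieve: rank documents by embedding similarity to the query.
    q = embed(query)
    scored = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    # 2) Rerank: keep the best candidates (a real reranker is a
    #    cross-encoder scoring each (query, doc) pair).
    context = scored[:top_k]
    # 3) Generate: the LLM would condition on the retrieved context.
    return f"[answer grounded in {len(context)} retrieved docs]"

docs = ["quarterly revenue report", "vacation photos", "revenue forecast"]
print(rag_answer("revenue", docs))
# [answer grounded in 2 retrieved docs]
```

Keeping all three stages as separate, swappable functions mirrors the post's design: each stage can be bound to whichever model the target NPU supports.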
"Qwen3-VL-4B: Unlock Multimodal RAG at the Edge 🚀 Enterprise multimodal RAG has been bottlenecked by cloud latency and privacy constraints. Qwen3-VL-4B delivers breakthrough performance that makes visual document processing practical on consumer hardware. Traditional multimodal RAG systems hit fundamental limits: 200-500 ms cloud API latency and the inability to process sensitive visual documents locally. Qwen3-VL-4B's XXXX MMMU score in a compact 4B architecture enables real-time visual reasoning that was previously impossible at the edge. RAG System: - Search massive text and image files locally:"
X Link @nexa_ai 2025-10-16T16:58Z 2301 followers, XXX engagements
"Download Hyperlink:"
X Link @nexa_ai 2025-10-17T17:21Z 2298 followers, XXX engagements