[GUEST ACCESS MODE: Data is scrambled or limited to provide examples. Make requests using your API key to unlock full data. Check https://lunarcrush.ai/auth for authentication information.]

# ![@ArtificialAnlys Avatar](https://lunarcrush.com/gi/w:26/cr:twitter::1743487864934162432.png) @ArtificialAnlys Artificial Analysis

Artificial Analysis posts on X about agentic, ai, the first, $googl the most. They currently have XXXXXX followers and XX posts still getting attention that total XXXXXX engagements in the last XX hours.

### Engagements: XXXXXX [#](/creator/twitter::1743487864934162432/interactions)
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1743487864934162432/c:line/m:interactions.svg)

- X Week XXXXXXX -XX%
- X Month XXXXXXXXX +22%
- X Months XXXXXXXXXX +330%
- X Year XXXXXXXXXX +659%

### Mentions: XX [#](/creator/twitter::1743487864934162432/posts_active)
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1743487864934162432/c:line/m:posts_active.svg)

- X Week XX -XX%
- X Month XX +14%
- X Months XXX +146%
- X Year XXX +212%

### Followers: XXXXXX [#](/creator/twitter::1743487864934162432/followers)
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1743487864934162432/c:line/m:followers.svg)

- X Week XXXXXX +0.63%
- X Month XXXXXX +7.80%
- X Months XXXXXX +73%
- X Year XXXXXX +285%

### CreatorRank: XXXXXXX [#](/creator/twitter::1743487864934162432/influencer_rank)
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:twitter::1743487864934162432/c:line/m:influencer_rank.svg)

### Social Influence

**Social category influence**
[technology brands](/list/technology-brands)  XXXX% [stocks](/list/stocks)  XXXX% [finance](/list/finance)  XXXX%

**Social topic influence**
[agentic](/topic/agentic) #43, [ai](/topic/ai) 11.76%, [the first](/topic/the-first) 5.88%, [$googl](/topic/$googl) 5.88%, [token](/topic/token) #554, [strong](/topic/strong) 2.94%, [reduce](/topic/reduce) 2.94%, [claude opus](/topic/claude-opus) #1, [pro](/topic/pro) 2.94%, [native](/topic/native) XXXX%

**Top accounts mentioned or mentioned by**
[@elaina43114880](/creator/undefined) [@fiesta_mop](/creator/undefined) [@huggingface](/creator/undefined) [@togethercompute](/creator/undefined) [@lightningai](/creator/undefined) [@elonmusk](/creator/undefined) [@minihuizhu](/creator/undefined) [@minyangtian1](/creator/undefined) [@haopenguiuc](/creator/undefined) [@anthropicais](/creator/undefined) [@viduaiofficial](/creator/undefined) [@mistralai](/creator/undefined) [@awscloud](/creator/undefined) [@azure](/creator/undefined) [@ibmwatsonx](/creator/undefined) [@fireworksaihq](/creator/undefined) [@modal](/creator/undefined) [@runwayml](/creator/undefined) [@kimimoonshots](/creator/undefined) [@allenai](/creator/undefined)

**Top assets mentioned**
[Alphabet Inc Class A (GOOGL)](/topic/$googl) [Flex Ltd. Ordinary Shares (FLEX)](/topic/$flex)

### Top Social Posts
Top posts by engagements in the last XX hours

"ServiceNow's upgraded Apriel-v1.6-15B-Thinker grows its lead amongst small open weights models (40B parameters) and uses XX% fewer tokens to complete our Intelligence Index 🧠 Increased intelligence; agentic performance remains strong: Like v1.5 Apriel-v1.6-15B-Thinker is a dense 15B parameter open weights reasoning model. It scores XX on the Artificial Analysis Intelligence Index gaining X points on v1.5. The previous release scored highly in agentic tasks like multi-turn conversations and tool use. v1.6 gains in two other capabilities useful for agents - long context reasoning (20% to 50%"  
[X Link](https://x.com/ArtificialAnlys/status/1998488372734832935)  2025-12-09T20:22Z 69.7K followers, 9407 engagements


"GPT-5.2 cost $XXX to run the XXX agentic tasks in GDPval-AA"  
[X Link](https://x.com/ArtificialAnlys/status/1999404581986664831)  2025-12-12T09:02Z 69.7K followers, 6919 engagements


"DeepSeek has launched V3.2 Exp with their new DeepSeek Sparse Attention (DSA) architecture that claims to reduce the impact of the quadratic scaling of compute with context length We've independently benchmarked V3.2 Exp as achieving similar intelligence to DeepSeek V3.1 Terminus; DeepSeek have switched to using V3.2 for their main API endpoint and have reduced API pricing by 50%. With DeepSeek's updated first party API pricing cost to run Artificial Analysis Intelligence Index falls from $XXX to $XX. DeepSeek claims to have deliberately aligned the training configurations of V3.1 Terminus and"  
[X Link](https://x.com/ArtificialAnlys/status/1973230103854456993)  2025-10-01T03:34Z 69.5K followers, 32.7K engagements


"DeepSeek V3.2 Exp is cheaper than DeepSeek V3.1 Terminus via DeepSeek first party API due to a reduction in per token pricing"  
[X Link](https://x.com/ArtificialAnlys/status/1973230105809010891)  2025-10-01T03:34Z 69.5K followers, 6962 engagements


"Anthropic's new Claude Opus XXX is the #2 most intelligent model in the Artificial Analysis Intelligence Index narrowly behind Google's Gemini X Pro and tying OpenAI's GPT-5.1 (high) Claude Opus XXX delivers a substantial intelligence uplift over Claude Sonnet XXX (+7 points on the Artificial Analysis Intelligence Index) and Claude Opus XXX (+11 points) establishing it as @AnthropicAI's new leading model. Anthropic has dramatically cut per-token pricing for Claude Opus XXX to $5/$25 per million input/output tokens. However compared to the prior Claude Opus XXX model it used XX% more tokens to"  
[X Link](https://x.com/ArtificialAnlys/status/1993287030252749231)  2025-11-25T11:53Z 69.7K followers, 66K engagements


"Amazon is back with Nova XXX a substantial upgrade over prior Amazon Nova models and demonstrating particular strength in agentic capabilities Amazon has released Nova XXX Pro (Preview) its new flagship model; Nova XXX Lite focused on speed and lower cost; and Nova XXX Omni a multimodal model handling text image video and speech inputs with text and image outputs. Key benchmarking takeaways: Amazon back amongst top AI players: This is Amazon's latest release since Nova Premier and Amazon's first release of reasoning models. Nova XXX Pro jumps XX points in the Artificial Analysis Intelligence"  
[X Link](https://x.com/ArtificialAnlys/status/1995921468010758267)  2025-12-02T18:22Z 69.6K followers, 74.3K engagements


"Mistral just launched their new large open weights model Mistral Large X (675B total 41B active) alongside a set of three Ministral models (3B 8B 14B) Mistral has released Instruct (non-reasoning) variants of all four models as well as reasoning variants of the three Ministral models. All models support multimodal inputs and are available with an Apache XXX license today on @huggingface. We evaluated Mistral Large X and the Instruct variants of the three Ministral models prior to launch. Mistral's highest scoring model in Artificial Analysis Intelligence Index remains the proprietary Magistral"  
[X Link](https://x.com/ArtificialAnlys/status/1995946145236001168)  2025-12-02T20:00Z 69.7K followers, 16.7K engagements


"Amazon has launched a new speech-to-speech model Nova Sonic XXX which ranks #2 on our Artificial Analysis Big Bench Audio Speech Reasoning benchmark The new model achieves a reasoning accuracy score of XXXX% on Big Bench Audio placing second overall behind Google's Gemini XXX Flash Native Audio Thinking and above other offerings including GPT Realtime Performance: ➤ Reasoning: Achieves XXXX% on Big Bench Audio ranking second on the Artificial Analysis Speech to Speech reasoning leaderboard between Google's Gemini XXX Flash Native Audio Thinking and OpenAI's GPT Realtime Aug XX ➤ Latency: At an"  
[X Link](https://x.com/ArtificialAnlys/status/1995950101068763393)  2025-12-02T20:16Z 69.7K followers, 11.2K engagements


"Apriel-v1.6-15B-Thinker is the most intelligent open weights Small Model (40B parameters) further pushing the Pareto Frontier of Intelligence vs Total Parameters"  
[X Link](https://x.com/ArtificialAnlys/status/1998488376882950393)  2025-12-09T20:22Z 69.7K followers, XXX engagements


"The model scores negative XX in our knowledge and hallucination benchmark AA-Omniscience. Apriel-v1.6-15B-Thinker's Omniscience Accuracy score of XX% is within expectations given its size. The driver of its lower score is its XX% Omniscience Hallucination rate - an increase from v1.5's score rate of 84%"  
[X Link](https://x.com/ArtificialAnlys/status/1998488379231752313)  2025-12-09T20:22Z 69.7K followers, XXX engagements


"Motif-2-12.7B-Reasoning demonstrates particular strength in Competition Math and Instruction Following scoring XX% on AIME 2025 and XX% on Instruction Following. This places it amongst the best performing models in these categories when accounting for its 12.7B size"  
[X Link](https://x.com/ArtificialAnlys/status/1998570295230410778)  2025-12-10T01:47Z 69.7K followers, 1083 engagements


"The Whisper-Thunder Reveal: Runway Gen-4.5 is now the leading Text to Video model in the Artificial Analysis Video Leaderboards surpassing Veo X Kling XXX Turbo and Sora X Pro Runway Gen-4.5 is the latest release from @runwayml succeeding Runway Gen-4 released in March. While Gen-4 only supported Image to Video Runway Gen-4.5 introduces Text to Video generation. We have not yet evaluated Runway Gen-4.5's Image to Video generation capabilities. Runway Gen-4.5 is gradually rolling out to users of the Runway application with wider availability expected in the coming days. See below for"  
[X Link](https://x.com/ArtificialAnlys/status/1996052123470209164)  2025-12-03T03:01Z 69.7K followers, 9691 engagements


"DeepSeek's R1 leaps over xAI Meta and Anthropic to be tied as the world's #2 AI Lab and the undisputed open-weights leader DeepSeek R1 0528 has jumped from XX to XX in the Artificial Analysis Intelligence Index our index of X leading evaluations that we run independently across all leading models. That's the same magnitude of increase as the difference between OpenAI's o1 and o3 (62 to 70). This positions DeepSeek R1 as higher intelligence than xAI's Grok X mini (high) NVIDIA's Llama Nemotron Ultra Meta's Llama X Maverick Alibaba's Qwen X XXX and equal to Google's Gemini XXX Pro. Breakdown of the"  
[X Link](https://x.com/ArtificialAnlys/status/1928071179115581671)  2025-05-29T12:49Z 69.7K followers, 598.3K engagements


"MoonshotAI has released Kimi K2 Thinking a new reasoning variant of Kimi K2 that achieves #1 in the Tau2 Bench Telecom agentic benchmark and is potentially the new leading open weights model Kimi K2 Thinking is one of the largest open weights models ever at 1T total parameters with 32B active. K2 Thinking is the first reasoning model release within @Kimi_Moonshot's Kimi K2 model family following non-reasoning Kimi K2 Instruct models released previously in July and September 2025. Key takeaways: ➤ Strong performance on agentic tasks: Kimi K2 Thinking achieves XX% in τ²-Bench Telecom an agentic"  
[X Link](https://x.com/ArtificialAnlys/status/1986541785511043536)  2025-11-06T21:10Z 69.7K followers, 1.4M engagements


"Google TPU v6e vs AMD MI300X vs NVIDIA H100/B200: Artificial Analysis Hardware Benchmarking shows NVIDIA achieving a 5x tokens-per-dollar advantage over TPU v6e (Trillium) and a 2x advantage over MI300X in our key inference cost metric In our metric for inference cost called Cost Per Million Input and Output Tokens at Reference Speed we see NVIDIA H100 and B200 systems achieving lower overall cost than TPU v6e and MI300X. For Llama XXX 70B running with vLLM at a Per-Query Reference Speed of XX output tokens/s NVIDIA H100 achieves a Cost Per Million Input and Output Tokens of $XXXX compared to"  
[X Link](https://x.com/ArtificialAnlys/status/1993878037226557519)  2025-11-27T03:02Z 69.7K followers, 713.1K engagements


"Introducing the Artificial Analysis Openness Index: a standardized and independently assessed measure of AI model openness across availability and transparency Openness is not just the ability to download model weights. It is also licensing data and methodology - we developed a framework underpinning the Artificial Analysis Openness Index to incorporate these elements. It allows developers users and labs to compare across all these aspects of openness on a standardized basis and brings visibility to labs advancing the open AI ecosystem. A model with a score of XXX in Openness Index would be"  
[X Link](https://x.com/ArtificialAnlys/status/1995523178521846191)  2025-12-01T15:59Z 69.7K followers, 103.9K engagements


"FLUX.2 pro ranks #2 in the Artificial Analysis Text to Image Leaderboard trailing only Nano Banana Pro (Gemini XXX Pro Image) while costing less than a quarter of the price FLUX.2 is a family of image models from Black Forest Labs @bfl_ml coming in pro flex and dev variants. All variants support both text to image and image editing. FLUX.2 pro comes in at #2 in the Text to Image Leaderboard and is positioned by BFL as the best balance of generation speed and quality. We observe generation times of 10s from Black Forest Labs' API comparable to FLUX.1 Kontext max (10s) and Seedream XXX (12s)."  
[X Link](https://x.com/ArtificialAnlys/status/1995924695775150409)  2025-12-02T18:35Z 69.7K followers, 13.2K engagements


"DeepSeek V3.2 is the #2 most intelligent open weights model and also ranks ahead of Grok X and Claude Sonnet XXX (Thinking) - it takes DeepSeek Sparse Attention out of experimental status and couples it with a material boost to intelligence @deepseek_ai V3.2 scores XX on the Artificial Analysis Intelligence Index; a substantial intelligence uplift over DeepSeek V3.2-Exp (+9 points) released in September 2025. DeepSeek has switched its main API endpoint to V3.2 with no pricing change from the V3.2-Exp pricing - this puts pricing at just $0.28/$0.42 per 1M input/output tokens with XX% off for"  
[X Link](https://x.com/ArtificialAnlys/status/1996110256628539409)  2025-12-03T06:52Z 69.7K followers, 82.8K engagements


"Motif Technologies a Korean AI lab has just launched Motif-2-12.7B-Reasoning a 12.7B open weights reasoning model that scores XX on the Artificial Analysis Intelligence Index and is now the leading model from Korea Key benchmarking takeaways: ➤ Open weights: Motif-2-12.7B-Reasoning is open weights and is a relatively small model at 12.7B parameters. This marks a shift for the Korean model ecosystem which has historically been more closed relative to Chinese open weights releases ➤ Strengths in Instruction Following and Competition Math: Motif-2-12.7B-Reasoning scores XX% on IFBench and XX% on"  
[X Link](https://x.com/ArtificialAnlys/status/1998570291086373081)  2025-12-10T01:47Z 69.7K followers, 52.1K engagements


"Motif-2-12.7B-Reasoning generated 190M reasoning tokens while running the Artificial Analysis evaluation suite making it the most token-intensive model tested - this has latency and cost implications"  
[X Link](https://x.com/ArtificialAnlys/status/1998570297340174362)  2025-12-10T01:47Z 69.7K followers, 1080 engagements


"Stirrup agents can be easily set up in just a few lines of code"  
[X Link](https://x.com/ArtificialAnlys/status/1998785302358569095)  2025-12-10T16:02Z 69.7K followers, XXX engagements


"Stirrup includes built in logging to help you observe and debug agents"  
[X Link](https://x.com/ArtificialAnlys/status/1998785303881068644)  2025-12-10T16:02Z 69.7K followers, 3736 engagements


"Announcing GDPval-AA our leaderboard and evaluation harness for comparing models on OpenAI's GDPval dataset of real-world knowledge work tasks Earlier today we announced our agentic harness called Stirrup which we built to run GDPval tasks on any language model. We're combining this with an AI-based grading pipeline to run GDPval tasks at scale - and we think this makes it today's best way to compare general agentic performance of language models. Key findings: 🥇 Claude Opus XXX is the leader in GDPval-AA followed by GPT-5 (not GPT-5.1) Claude Sonnet XXX and a tie between DeepSeek V3.2 and"  
[X Link](https://x.com/ArtificialAnlys/status/1998841566627246173)  2025-12-10T19:45Z 69.7K followers, 49.5K engagements


"In addition to the models we compared the leading consumer chatbot apps where users frequently interact with these models (and ask for their support in work tasks) today: Claude was the clear winner - it consistently followed detailed instructions and showed extensive range including producing video and audio files that competitors did not attempt. This was driven by both Claude Opus 4.5's strength and the Claude application now allowing the most flexible access to tools and code execution of the major chatbot applications along with Anthropic's Agent Skills that help the model with document"  
[X Link](https://x.com/ArtificialAnlys/status/1998841569039028634)  2025-12-10T19:45Z 69.7K followers, 2098 engagements


"GPT-5.2 just overtook Claude Opus XXX to achieve the highest score in GDPval-AA a benchmark that focuses on performance in real-world economically valuable tasks However GPT-5.2 is also the most expensive model to run GDPval-AA: GPT-5.2 cost $XXX compared to Claude Opus 4.5's $XXX and GPT-5.1's $XX. This was driven by @OpenAI's GPT-5.2 using 6x more tokens than GPT-5.1 (250M compared to 40M) and OpenAI raising prices by XX% ($1.75/$14 per million input/output tokens compared to $1.25/$10). GDPval-AA uses our agentic harness Stirrup to run models on OpenAI's GDPval dataset and measures their"  
[X Link](https://x.com/ArtificialAnlys/status/1999404579599823091)  2025-12-12T09:02Z 69.7K followers, 93.6K engagements
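
The post above attributes the cost difference to two factors: token volume and per-token price. A minimal sketch of that arithmetic follows, assuming that $1.75 is the input price and $14 the output price per million tokens for GPT-5.2. The helper name `run_cost` and the one-million-token example are hypothetical and are not taken from the post; the actual input/output token split behind the 250M figure is not stated, so no real run cost is reproduced here.

```python
def run_cost(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost of a run given token counts and per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
        + (output_tokens / 1_000_000) * output_price_per_m


# Token volume quoted in the post: 250M (GPT-5.2) vs 40M (GPT-5.1).
token_ratio = 250 / 40  # about 6x, as the post says

# Hypothetical workload: one million input and one million output tokens,
# priced at the assumed GPT-5.2 rates ($1.75 input, $14 output per million).
example_cost = run_cost(1_000_000, 1_000_000, 1.75, 14.0)

print(f"token ratio: {token_ratio:.2f}x")
print(f"cost of hypothetical workload: ${example_cost:.2f}")
```

Because output tokens are priced several times higher than input tokens, the share of output tokens in a run affects total cost more than the headline token count alone suggests.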


"Thanks for the support @AndrewYNg Completely agree faster token generation will become increasingly important as a greater proportion of output tokens are consumed by models such as in multi-step agentic workflows rather than being read by people"  
[X Link](https://x.com/ArtificialAnlys/status/1809670091778207901)  2024-07-06T19:25Z 69.7K followers, 271K engagements


"Gemini X Pro is the new leader in AI. Google has the leading language model for the first time with Gemini X Pro debuting +3 points above GPT-5.1 in our Artificial Analysis Intelligence Index @GoogleDeepMind gave us pre-release access to Gemini X Pro Preview. The model outperforms all other models in Artificial Analysis Intelligence Index. It demonstrates strength across the board coming in first in X of the XX evaluations that make up Intelligence Index. Despite these intelligence gains Gemini X Pro Preview shows improved token efficiency from Gemini XXX Pro using significantly fewer tokens"  
[X Link](https://x.com/ArtificialAnlys/status/1990813106478715098)  2025-11-18T16:03Z 69.7K followers, 250.2K engagements


"FLUX.2 dev is the new leading open weights text to image model surpassing HunyuanImage XXX Qwen-Image and HiDream-I1-Dev in the Artificial Analysis Image Arena @bfl_ml's latest release claims the top spot for open weights text to image generation while also ranking #2 in open weights Image Editing trailing only Alibaba's Qwen Image Edit 2509. FLUX.2 dev is released under the FLUX dev Non-Commercial License with weights available on @huggingface. Commercial applications require a separate license from Black Forest Labs. The model is available via API on @fal @replicate @runware Verda"  
[X Link](https://x.com/ArtificialAnlys/status/1996801917196841345)  2025-12-05T04:40Z 69.7K followers, 13.8K engagements


"Claude Opus XXX was the best-performing model but it was also 2x more expensive than any other model we tested at $608 driven by both high per-token costs and relatively high token usage (higher than GPT-5's $XXX cost). DeepSeek V3.2 stands out as the only model that makes it into the top left quadrant below - it has landed in a 5th place tie with Gemini X Pro Preview and cost just $XX to run all GDPval tasks - over 20x cheaper than Claude Opus XXX and 3x cheaper than Gemini X Pro's $XX cost. GPT-5.1 used XX% fewer tokens than GPT-5 but also scored a 67-point lower Elo. This suggests the model"  
[X Link](https://x.com/ArtificialAnlys/status/1998841571412946957)  2025-12-10T19:45Z 69.7K followers, 3701 engagements


"GPT-5.2 used 250M tokens to run the XXX agentic tasks in GDPval-AA"  
[X Link](https://x.com/ArtificialAnlys/status/1999404583991607682)  2025-12-12T09:02Z 69.7K followers, 5829 engagements

X Link 2025-12-12T09:02Z 69.7K followers, 93.6K engagements
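The token and price comparison in the post above can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the $1.75 and $1.25 figures are input prices and the $14 and $10 figures are output prices (per million tokens), and the dictionary keys and function name are hypothetical, not part of any API. It does not attempt to reproduce the scrambled total costs, which depend on an input/output token split the post does not give.

```python
# Figures quoted in the post; key and function names are illustrative only.
# Assumption: 1.25 / 1.75 are input prices, 10 / 14 are output prices (USD per 1M tokens).
GPT_5_1 = {"input_usd_per_m": 1.25, "output_usd_per_m": 10.00, "tokens_m": 40}
GPT_5_2 = {"input_usd_per_m": 1.75, "output_usd_per_m": 14.00, "tokens_m": 250}


def relative_increase(new: float, old: float) -> float:
    """Return the fractional increase of `new` over `old` (0.5 means +50%)."""
    return new / old - 1.0


# How many times more tokens GPT-5.2 used than GPT-5.1 on the benchmark.
token_multiple = GPT_5_2["tokens_m"] / GPT_5_1["tokens_m"]

# Per-token price increases for input and output tokens.
input_increase = relative_increase(
    GPT_5_2["input_usd_per_m"], GPT_5_1["input_usd_per_m"]
)
output_increase = relative_increase(
    GPT_5_2["output_usd_per_m"], GPT_5_1["output_usd_per_m"]
)

print(f"Token multiple: {token_multiple:.2f}x")
print(f"Input price increase: {input_increase:.0%}")
print(f"Output price increase: {output_increase:.0%}")
```

Under these assumptions the two effects multiply: a higher token count and a higher per-token price both raise the cost of a benchmark run.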

"Thanks for the support @AndrewYNg Completely agree faster token generation will become increasingly important as a greater proportion of output tokens are consumed by models such as in multi-step agentic workflows rather than being read by people"
X Link 2024-07-06T19:25Z 69.7K followers, 271K engagements

"Gemini X Pro is the new leader in AI. Google has the leading language model for the first time with Gemini X Pro debuting +3 points above GPT-5.1 in our Artificial Analysis Intelligence Index @GoogleDeepMind gave us pre-release access to Gemini X Pro Preview. The model outperforms all other models in Artificial Analysis Intelligence Index. It demonstrates strength across the board coming in first in X of the XX evaluations that make up Intelligence Index. Despite these intelligence gains Gemini X Pro Preview shows improved token efficiency from Gemini XXX Pro using significantly fewer tokens"
X Link 2025-11-18T16:03Z 69.7K followers, 250.2K engagements

"FLUX.2 dev is the new leading open weights text to image model surpassing HunyuanImage XXX Qwen-Image and HiDream-I1-Dev in the Artificial Analysis Image Arena @bfl_ml's latest release claims the top spot for open weights text to image generation while also ranking #2 in open weights Image Editing trailing only Alibaba's Qwen Image Edit 2509. FLUX.2 dev is released under the FLUX dev Non-Commercial License with weights available on @huggingface. Commercial applications require a separate license from Black Forest Labs. The model is available via API on @fal @replicate @runware Verda"
X Link 2025-12-05T04:40Z 69.7K followers, 13.8K engagements

"Claude Opus XXX was the best-performing model but it was also 2x more expensive than any other model we tested at$608 driven by both high per-token costs and relatively high token usage (higher than GPT-5s $XXX cost). DeepSeek V3.2 stands out as the only model that makes it into the top left quadrant below - it has landed in a 5th place tie with Gemini X Pro Preview and cost just $XX to run all GDPval tasks - over 20x cheaper than Claude Opus XXX and 3x cheaper that Gemini X Pros $XX cost. GPT-5.1 used XX% fewer tokens than GPT-5 but also scored a 67-point lower Elo. This suggests the model"
X Link 2025-12-10T19:45Z 69.7K followers, 3701 engagements

"GPT-5.2 used 250M tokens to run the XXX agentic tasks in GDPval-AA"
X Link 2025-12-12T09:02Z 69.7K followers, 5829 engagements