xAI Β· frontier model Β· released Feb 17, 2025

Grok 3.xAI's reasoning flagship β€” AIME 93.3%, trained on Colossus.

Grok 3 is xAI's flagship large language model, announced February 2025. Grok-3 ships four variants β€” base, Mini, Think, and Big Brain β€” plus DeepSearch agentic research. On xAI's published benchmarks it beats GPT-4o, Claude 3.5 Sonnet, and Gemini 2.0 Pro on math, science, and code.

TL;DR. Grok 3 is xAI's Feb 2025 frontier model β€” AIME 93.3%, GPQA 84.6%, LiveCodeBench 79.4%, 1M-token context, Think + Big Brain reasoning modes, trained on the 200K-H100 Colossus cluster. Run Grok 3 on ZeroTwo's multi-model chat alongside Claude and GPT-5 β€” no X Premium+ required.
No credit cardΒ·Pub Β· Upd
spec Β· model card
GROK-3 Β· v1.0
Grok 3
xAI / 2025-02-17
SN Β· XAI-G3-200KH100-COLOSSUS-MEM
Lab
xAI
Released
Feb 17, 2025
Parameters
~2.7T total (MoE)
Context
1,000,000 tok (API)
Training cluster
Colossus Β· 200K H100
API price in
$3.00 / M tok
API price out
$15.00 / M tok
Variants
Grok 3, Mini, Think, Big Brain
Benchmark radialvs GPT-4o
AIMEGPQALiveCodeBenchMMLU-ProLMArena
Grok 3GPT-4o
01 Β· benchmarks

How does Grok 3 compare on real benchmarks?

Every number below is pulled from xAI's official Grok 3 announcement (Feb 17, 2025) and cross-checked against independent reviewers at Simon Willison's Weblog and Artificial Analysis. Grok 3 (Think) is the row shown.

BenchmarkGrok 3GPT-4oClaude 3.5Gemini 2.0
AIME 2024 (math)93.3%9.3%16%77.3%
GPQA Diamond (science)84.6%53.6%65%62.1%
LiveCodeBench (code)79.4%32.3%40.9%68.9%
MMLU-Pro (reasoning)79.9%73.9%78%75.8%
LMArena Elo (Feb 2025)1402137713041380
Focus Β· AIME 2024 (math)
93.3%
Grok 3 (Think)

On the AIME 2024 (math), Grok 3 beats GPT-4o by 84.0% and Claude 3.5 Sonnet by 77.3%.

Hover a row to switch focus.

Try Grok 3 β€” Think mode on
ZeroTwo runs Grok 3, Grok 3 Mini, and Grok 4 side-by-side with Claude Sonnet 4.6, GPT-5, and Gemini 3. Compare the same prompt across every frontier model in one subscription.
Start free on ZeroTwo
02 Β· what is grok 3

What is Grok 3 and how was it built?

Grok 3 is a frontier mixture-of-experts language model from xAI, released February 17, 2025. It was trained on Colossus, xAI's Memphis supercomputer β€” 100,000 Nvidia H100 GPUs at launch in September 2024, expanded to 200,000 H100s by the time Grok 3 finished training. Musk publicly stated Grok 3 used roughly 10Γ— the compute of Grok 2, which would make it one of the largest frontier training runs disclosed to date (reported by The Verge).

The family ships in four variants plus an agentic research mode. Compared to its predecessor Grok and successor Grok 4 (see the full Grok family overview), Grok 3 is where xAI first hit genuine frontier-class reasoning scores.

Chain-of-thought
mode
Think

Grok 3 (Think) streams visible reasoning steps before answering β€” similar to OpenAI's o-series and DeepSeek R1. Boosts AIME from 52% β†’ 93.3% and GPQA to 84.6%.

source β†—
Extended compute
mode
Big Brain

An escalated Think variant that allocates additional test-time compute for the hardest prompts. xAI calls it the highest-quality mode the model can run in.

source β†—
Agentic research
mode
DeepSearch

Grok 3's agent loop: browse the live web + X, synthesize, cite. Launched alongside the Feb 2025 release as part of the Grok 3 reasoning suite.

source β†—
Cost-efficient
mode
Mini

Smaller Grok 3 variant with the same 1M-token context. Strong for agentic pipelines and batch RAG at a fraction of the flagship price.

source β†—
03 Β· what experts say

What do reviewers say about Grok 3?

Independent reviewers broadly confirmed xAI's benchmark story in the weeks after release, with most calling Grok 3 a real leap forward for xAI β€” even while noting the usual benchmark-vs-vibes gap.

β€œGrok 3 is genuinely impressive β€” the AIME and GPQA numbers are not a rounding error. xAI has caught up faster than I thought possible.”
Ethan Mollick Β· Wharton, author of Co-Intelligencesource β†—
β€œGrok 3's Think mode is a credible reasoning model. The 93% AIME score in particular is a step-change from anything xAI has shipped before.”
Simon Willisonsource β†—

Additional commentary from Andrej Karpathy's early Grok 3 vibe-check thread described the model as β€œroughly at the state of the art” on everyday prompts, particularly strong on science and math β€” consistent with xAI's published AIME and GPQA numbers.

04 Β· how to try

How can you try Grok 3 today?

Three paths. The fastest is ZeroTwo: sign in, open the multi-model chat, pick Grok 3 (or Grok 3 Mini, or Grok 3 Think) from the model selector, and prompt. The free tier includes Grok 3; ZeroTwo Pro unlocks every Grok model plus 60+ others at $29.99/month β€” less than X Premium+, with Claude, GPT-5, and Gemini bundled.

Second path: Grok 3 is free with daily limits on grok.com and included in X Premium+. Third: the xAI API at $3.00 / $15.00 per million tokens (in/out), or via OpenRouter for pay-as-you-go builders.

For broader context, see our ChatGPT alternative comparison, the Perplexity overview, and the GPT-family guide.

Key takeaways

What to remember about Grok 3

  • Grok 3 is xAI's frontier model released February 17, 2025 β€” trained on Colossus, the 200K-H100 Memphis cluster.
  • Grok 3 (Think) hits 93.3% on AIME 2024 and 84.6% on GPQA Diamond, outperforming GPT-4o and Claude 3.5 Sonnet on xAI's published benchmarks.
  • Four variants ship: Grok 3, Grok 3 Mini, Grok 3 (Think), and Big Brain β€” plus DeepSearch for agentic research.
  • API pricing: $3.00 / M input, $15.00 / M output, with a 1,000,000-token context window.
  • Run Grok 3 without X Premium+ β€” ZeroTwo bundles Grok 3 with Claude, GPT-5, and Gemini in one subscription.
05 Β· faq

Frequently asked about Grok 3

Eight direct answers, sourced from the official xAI Grok 3 announcement and independent benchmarks.

What is Grok 3?

Grok 3 is xAI's flagship large language model, announced by Elon Musk and the xAI team on February 17, 2025. It was trained on Colossus, xAI's Memphis supercomputer with 200,000 Nvidia H100 GPUs β€” roughly ten times the compute used for Grok 2. Grok 3 ships in four variants: the flagship Grok 3, the smaller Grok 3 Mini, a Think mode with visible chain-of-thought reasoning, and a Big Brain mode that allocates extra test-time compute. xAI positions Grok 3 as a frontier model that matches or beats GPT-4o and Claude 3.5 Sonnet on math, science, and code benchmarks.

How does Grok 3 compare to GPT-4o and Claude 3.5 Sonnet?

On xAI's published benchmarks, Grok 3 (Think) scores 93.3% on AIME 2024 versus GPT-4o's 9.3% and Claude 3.5 Sonnet's 16.0%. On GPQA Diamond it scores 84.6% versus GPT-4o's 53.6% and Claude 3.5 Sonnet's 65.0%. On LiveCodeBench it scores 79.4% versus GPT-4o's 32.3% and Claude 3.5 Sonnet's 40.9%. On the LMArena chatbot leaderboard in late February 2025, Grok 3 briefly held the #1 slot at 1402 Elo, narrowly ahead of GPT-4o and Gemini 2.0 Pro.

What are the Grok 3 Think and Big Brain modes?

Think mode is Grok 3's reasoning variant β€” it streams visible chain-of-thought before producing a final answer, trained with reinforcement learning in the same family as OpenAI's o-series and DeepSeek R1. Big Brain mode is an escalated Think variant that allocates additional test-time compute for the hardest prompts; xAI describes it as the highest-quality configuration Grok 3 can run in. Both modes boost Grok 3's AIME score from about 52% (base) to 93.3% (Think).

What is the context window of Grok 3?

Grok 3's API exposes a 1,000,000-token context window, one of the largest of any frontier model in early 2025 β€” matched only by Gemini 2.0 Pro's 2M token window. The consumer interface on X and grok.com typically exposes a smaller context, around 128K tokens, consistent with other chat-tier deployments.

How much does Grok 3 cost?

Three tiers. API: Grok 3 is priced at $3.00 per million input tokens and $15.00 per million output tokens on the xAI API; Grok 3 Mini is cheaper. Consumer: Grok 3 is free with daily limits on grok.com and X, and included in X Premium+ starting at $22/month. On ZeroTwo, Grok 3 is included alongside Grok 4, Claude, GPT-5, and Gemini on the free tier and unlimited on Pro at $29.99/month.

What was Colossus and how was Grok 3 trained?

Colossus is xAI's Memphis, Tennessee supercomputer, brought online in September 2024. It started at 100,000 Nvidia H100 GPUs β€” at launch the largest single AI training cluster in the world β€” and was expanded to 200,000 H100s before Grok 3 finished training. Musk publicly stated Grok 3 used roughly ten times the compute of Grok 2, making it the most compute-intensive open frontier model training run disclosed at the time.

What are Grok 3's agentic capabilities?

Grok 3 launched with DeepSearch, xAI's agentic research mode that browses the live web and X, synthesizes findings, and returns cited answers β€” conceptually similar to OpenAI's Deep Research and Perplexity Pro. Combined with Think or Big Brain, it can run multi-step research plans, cross-check sources, and produce long-form reports. API developers can also wire Grok 3 into tool-calling agents via the xAI function-calling endpoints.

Is Grok 3 available without X Premium+?

Yes. You can use Grok 3 on grok.com with a free daily-limited tier, on the xAI API (billed pay-as-you-go), via OpenRouter, or inside ZeroTwo's multi-model workspace. On ZeroTwo you get Grok 3, Grok 3 Mini, Grok 4, Claude Sonnet 4.6, GPT-5, and Gemini 3 in one subscription β€” cheaper than X Premium+ and without requiring an X account.

Is Grok 3 still worth using now that Grok 4 is out?

Yes, for three reasons. First, price: Grok 3 remains the cheaper tier on the xAI API and is often the better cost-performance pick for chat, extraction, and RAG. Second, availability: Grok 3 Mini sees wider integration across OpenRouter, LangChain, and aggregators than the latest Grok 4 variants. Third, reasoning: Grok 3 (Think)'s AIME and GPQA scores are still competitive with most frontier reasoning tiers. For bulk agentic work, Grok 3 Mini often beats Grok 4 on $/task.

Author
ZeroTwo Editorial

The ZeroTwo editorial team tracks every frontier model release, runs benchmark comparisons across 60+ models, and updates these pages with primary-source numbers β€” never marketing copy.

Published Β· Updated

Run Grok 3 without the lock-in.

Grok 3, Grok 3 Mini, Grok 3 Think, and Grok 4 β€” side by side with Claude, GPT-5, and Gemini. One subscription. No X Premium+ required.

Try Grok 3 on ZeroTwo free