What happened?
OpenAI released GPT-5.3-Codex just minutes after Anthropic dropped Opus 4.6 — neither company willing to cede the spotlight. Both releases represent significant technical leaps, but in different directions.
Key comparisons
- Context window — Anthropic jumped from 200K to 1M tokens. More importantly, Opus 4.6 scored 76% on the MRCR benchmark (8-needle retrieval accuracy) at 1M context, more than doubling the previous best of 32.6%. For reference, Gemini 3 drops to ~25% accuracy at 1M tokens. OpenAI kept GPT-5.3-Codex at 400K tokens.
- Terminal Bench v2 — GPT-5.3-Codex scored 75–77% on this suite of tasks run in isolated Docker environments (building repos, setting up servers, training LLMs), up from GPT-5.2's 64%. Opus 4.6 scored notably lower on this benchmark.
- Speed — OpenAI claims a 25% inference speed increase for Codex, addressing a common complaint that Codex felt slow compared to Opus.
- Pricing — GPT-5.3-Codex: $1.75/M input, $14/M output. Opus 4.6: $5/M input, $25/M output. Opus is also reportedly token-hungry, so the $20 Anthropic plan may not stretch as far, since usage credits refresh only every 5 hours. A worked cost comparison follows this list.
- Iteration speed — Anthropic ships Opus updates every 2–3 months. OpenAI has narrowed from 4 months down to 1–2 months between releases. Monthly or biweekly model drops could be the norm soon.
- Self-improvement — OpenAI stated GPT-5.3-Codex was used to assist in building itself.
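To make the pricing gap concrete, here is a minimal back-of-the-envelope sketch in Python. The per-million-token rates are taken from the list above; the session token counts are hypothetical illustrative numbers, not measured usage.

```python
# Rough per-session cost comparison at the listed API rates.
# USD per million tokens: (input, output)
PRICES = {
    "GPT-5.3-Codex": (1.75, 14.00),
    "Opus 4.6": (5.00, 25.00),
}

def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one session at the listed per-million-token rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical coding session: 300K tokens in (repo context), 40K tokens out.
for model in PRICES:
    print(f"{model}: ${session_cost(model, 300_000, 40_000):.2f}")
# Prints roughly $1.08 vs $2.50: about 2.3x per session, before accounting
# for Opus reportedly consuming more tokens on the same task.
```

At these assumed volumes the per-session gap is large but not decisive; it widens further if Opus really does burn through more tokens per task than Codex.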
Why this is interesting
The competition is no longer just about raw benchmarks — it's about where each lab is placing its bets. Anthropic is going deep on context fidelity (solving context rot at scale), while OpenAI is optimizing for developer ergonomics (speed, price, terminal navigation). The fact that releases are now timed to within minutes of each other signals how aware these labs are of the zero-sum attention game.