OpenAI has introduced its first production AI model operating on hardware not supplied by Nvidia. The GPT-5.3-Codex-Spark coding model runs on chips from Cerebras and delivers output at more than 1,000 tokens per second, roughly 15 times faster than its predecessor.
For comparison, Anthropic’s Claude Opus 4.6 model in its premium fast mode achieves about 2.5 times its standard rate of 68.2 tokens per second, or roughly 170 tokens per second. Claude Opus is a larger and more capable model overall than Spark, which highlights the tradeoff between raw speed and breadth of capability.
Sachin Katti, head of compute at OpenAI, stated, “Cerebras has been a great engineering partner, and we’re excited about adding fast inference as a new platform capability.” The model is available as a research preview to ChatGPT Pro subscribers, who pay $200 per month. Access is provided through the Codex app, a command-line interface, and a VS Code extension. OpenAI is also offering API access to a select group of design partners.
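For the design partners with API access, calling the model would presumably look like any other request through OpenAI's Python SDK. The sketch below is an illustration only: the model identifier `gpt-5.3-codex-spark` is assumed from the product name, and actual naming and availability may differ.

```python
# Hypothetical sketch: calling GPT-5.3-Codex-Spark through the OpenAI Python SDK.
# The model identifier below is assumed from the product name; per the article,
# API access is currently limited to a select group of design partners.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-5.3-codex-spark",  # assumed identifier, not confirmed by OpenAI
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a linked list."}
    ],
    stream=True,  # streaming makes the high token throughput visible as tokens arrive
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```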
GPT-5.3-Codex-Spark ships with a 128,000-token context window and handles text-only tasks at launch. It builds on the full GPT-5.3-Codex model that OpenAI released earlier this month. Where the full model targets heavyweight agentic coding work, Spark is tuned for speed over depth of knowledge: a text-only model optimized for coding rather than the general-purpose tasks handled by the larger GPT-5.3.
On benchmarks such as SWE-Bench Pro and Terminal-Bench 2.0, which evaluate software engineering ability, Spark reportedly outperforms the older GPT-5.1-Codex-mini while completing tasks in a fraction of the time. OpenAI has not shared independent validation of these performance numbers.
Historically, Codex’s speed has been a point of criticism. In a test from December where four AI coding agents built Minesweeper clones, Codex took approximately twice as long as Anthropic’s Claude Code to produce a working game.
The 1,000 tokens per second rate for GPT-5.3-Codex-Spark marks a significant leap over what OpenAI has previously delivered through its own infrastructure. Independent benchmarks from Artificial Analysis show that OpenAI’s fastest models on Nvidia hardware fall well below this mark: GPT-4o delivers roughly 147 tokens per second, o3-mini hits about 167, and GPT-4o mini clocks around 52.
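To put those throughput figures in perspective, the back-of-the-envelope calculation below estimates how long a moderately sized code response would take at each rate. The 6,000-token response size is an arbitrary assumption for illustration; the per-model rates are the ones cited above.

```python
# Rough illustration of what the cited throughput figures mean in wall-clock time.
# The 6,000-token response size is an assumed example, not a benchmark figure.
RESPONSE_TOKENS = 6_000

throughput_tps = {
    "GPT-5.3-Codex-Spark (Cerebras)": 1_000,  # tokens per second, as reported
    "o3-mini": 167,
    "GPT-4o": 147,
    "GPT-4o mini": 52,
}

for model, tps in throughput_tps.items():
    seconds = RESPONSE_TOKENS / tps
    print(f"{model:32s} ~{seconds:5.1f} s for {RESPONSE_TOKENS:,} tokens")
```

At the reported rate, a response of that size would arrive in about six seconds, versus roughly forty seconds at GPT-4o's measured throughput.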


