The Register

Systems

OpenAI dishes out its first model on a plate of Cerebras silicon

GPT-5.3-Codex-Spark may be a mouthful, but it's certainly fast at 1,000 Tok/s running on Nvidia rival's CS3 accelerators

Tobias Mann
Thu 12 Feb 2026 //22:32 UTC

Nvidia and AMD can take a seat. On Thursday, OpenAI unveiled GPT-5.3-Codex-Spark, its first model that will run on Cerebras Systems' dinner-plate-sized AI accelerators, which feature some of the world's fastest on-chip memory.

The lightweight model is designed to provide a more interactive experience to users of OpenAI's Codex code assistant by leveraging Cerebras' SRAM-packed CS3 accelerators to generate responses at more than 1,000 tokens per second.

Last month, OpenAI signed a $10 billion contract with Cerebras to deploy up to 750 megawatts of its custom AI silicon to serve up Altman and crew's latest generation of GPT models.

Cerebras' wafer-scale architecture is notable for using a kind of ultra-fast, on-chip memory called SRAM, which is roughly 1,000x faster than the HBM4 found on Nvidia's upcoming Rubin GPUs announced at CES earlier this year.

This, along with optimizations to the inference and application pipelines, allows OpenAI's latest model to churn out answers in the blink of an eye.

As Spark is a proprietary model, we don't have all the details on things like parameter count, as we would if OpenAI had released it on Hugging Face like it did with gpt-oss back in August. What we do know is that, like that model, it's a text-only model with a 128,000-token context window.

If you're not familiar, a model's context window refers to how many tokens (words, punctuation, numbers, etc.) it can keep track of at any one time. Because of this, it's often referred to as the model's short-term memory.

While 128K tokens might sound like a lot, because the model has to keep track of both existing and newly generated code, code assistants like Codex can blow through that pretty quickly. Even starting from a blank slate, at 1,000 tokens a second it would take roughly two minutes to overflow the context limit.
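That two-minute figure checks out with a quick bit of arithmetic, using only the numbers quoted in the article:

```python
# Sanity check: time for Spark to fill its own context window
# at its quoted generation rate (both figures from the article).
context_window_tokens = 128_000  # Spark's context window
tokens_per_second = 1_000        # quoted generation speed

seconds_to_fill = context_window_tokens / tokens_per_second
print(f"{seconds_to_fill:.0f} seconds ≈ {seconds_to_fill / 60:.1f} minutes")
```

In practice the window fills faster than this, since the existing code fed into the model counts against the same 128K budget as the freshly generated tokens.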

This might be why OpenAI says Spark defaults to a "lightweight" style that only makes minimal targeted edits and won't run debug tests unless specifically asked.

A fast model isn't much good if it can't write working code. If OpenAI is to be believed, the Spark model delivers greater accuracy than GPT-5.1-Codex-Mini on Terminal-Bench 2.0 while also being much, much faster than its smarter GPT-5.3-Codex model.

OpenAI may be looking beyond GPUs, but it's certainly not abandoning them anytime soon.

"GPUs remain foundational across our training and inference pipelines and deliver the most cost effective tokens for broad usage. Cerebras complements that foundation by excelling at workflows that demand extremely low latency," OpenAI wrote.

This isn't just lip service. As fast as Cerebras' CS3 accelerators are, they can't match modern GPUs on memory capacity. SRAM may be fast, but it's not space efficient. The entire dinner-plate-sized chip contains just 44 GB of memory. By comparison, Nvidia's Rubin will ship with 288 GB of HBM4 while AMD's MI455X will pack on 432 GB.
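To put that capacity gap in perspective, here's a back-of-envelope sketch (our arithmetic, not the vendors'): it assumes the memory holds only the model weights, ignoring KV cache, activations, and runtime overhead, and uses the capacities quoted above.

```python
# Rough sizing: largest dense model (in billions of parameters) each
# accelerator could hold, assuming memory stores weights only.
memory_gb = {"Cerebras CS3": 44, "Nvidia Rubin": 288, "AMD MI455X": 432}
bytes_per_param = {"FP16/BF16": 2, "FP8": 1}  # common weight precisions

for chip, gb in memory_gb.items():
    for fmt, nbytes in bytes_per_param.items():
        max_params_billions = gb / nbytes  # GB / bytes-per-param ≈ B params
        print(f"{chip}: ~{max_params_billions:.0f}B params at {fmt}")
```

Even at aggressive FP8 precision, a single CS3 tops out around a 44-billion-parameter dense model on these assumptions, which is why OpenAI is leading with a lightweight model on the platform.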

This makes GPUs more economical for running very large models, especially if speed isn't a priority. Having said that, OpenAI suggests that as Cerebras brings more compute online, it'll be bringing its larger models to the compute platform, presumably for those willing to pay a premium for high-speed inference.

GPT-5.3-Codex-Spark is currently available in preview to Codex Pro users and via API to select OpenAI partners. ®

