Comparison · 9 min · 2 citations
Anthropic vs OpenAI API Pricing 2026: Per-Token Cost Compared
Anthropic vs OpenAI API pricing 2026: Claude Sonnet 4.6 is $3/$15 per million tokens, GPT-5.4 is $2.50/$15. Flagship, mini, and cached rates compared.
At the mid tier, OpenAI's GPT-5.4 ($2.50 input / $15 output per million tokens) is marginally cheaper on input than Claude Sonnet 4.6 ($3 / $15), with identical output[1][2]. At the flagship tier, Claude Opus 4.7 ($5 / $25) undercuts GPT-5.5 ($5 / $30) on output. At the cheap tier, GPT-5.4-mini ($0.75 / $4.50) is below Claude Haiku 4.5 ($1 / $5).
The blended winner depends on your input-to-output ratio: output-heavy workloads favor Claude at the flagship tier, input-heavy workloads favor OpenAI's mid tier. Both cut cached input by roughly 90% and offer a 50% batch discount. The per-token spread at the comparable tier is small enough that capability fit usually decides the choice, not price.
Update June 2026: Anthropic's Opus flagship is now Claude Opus 4.8 (released 2026-05-28), which replaces Opus 4.7 at the same $5 / $25 standard rate and 1M context. Every figure and comparison below still holds.
Anthropic and OpenAI price within cents of each other at matched tiers, so the real lever is your input-to-output ratio and how much you can cache. Claude Sonnet 4.6 runs $3 input / $15 output per million tokens; OpenAI's GPT-5.4 runs $2.50 / $15. This article maps the tiers across both vendors, prices the prompt-caching discount, and runs a SaaS margin scenario through the calculator so the cost difference shows up as gross margin rather than abstract per-token rates.
1. The per-token rate table
Standard (non-batch) rates per million tokens, verified against both official pricing pages as of May 25, 2026.
| Tier | Anthropic model | Input / Output | OpenAI model | Input / Output |
|---|---|---|---|---|
| Flagship | Claude Opus 4.7[1] | $5 / $25 | GPT-5.5[2] | $5 / $30 |
| Mid / workhorse | Claude Sonnet 4.6[1] | $3 / $15 | GPT-5.4[2] | $2.50 / $15 |
| Cheap / fast | Claude Haiku 4.5[1] | $1 / $5 | GPT-5.4-mini[2] | $0.75 / $4.50 |
| Cheapest | Claude Haiku 4.5[1] | $1 / $5 | GPT-5.4-nano[2] | $0.20 / $1.25 |
| Cached input (read) | Sonnet 4.6 cache hit[1] | $0.30 | GPT-5.4 cached[2] | $0.25 |
| Batch (50% off) | Sonnet 4.6 batch[1] | $1.50 / $7.50 | GPT-5.4 batch[2] | $1.25 / $7.50 |
The pattern: OpenAI is slightly lower on input at the mid and cheap tiers, while Anthropic is lower on output at the flagship tier ($25 vs $30). At the rock-bottom end, OpenAI's GPT-5.4-nano ($0.20 / $1.25) is far below anything Anthropic publishes, so high-volume, low-complexity classification favors OpenAI's nano tier.
2. Matching tiers across vendors
Comparing list prices is only fair if the models are comparable in capability. The rough mapping for cost planning:
- Flagship reasoning: Claude Opus (4.8 since 2026-05-28, $5/$25, same rate as the 4.7 it replaced) against GPT-5.5. Both top each vendor's lineup. The Opus 4.7 generation shipped a new tokenizer that the Anthropic docs noted may consume up to 35% more tokens for the same text, which can offset its lower per-token output rate; re-check token counts on 4.8 for your prompts rather than assuming the headline rate transfers one to one.
- Production workhorse: Claude Sonnet 4.6 against GPT-5.4. This is the tier most SaaS products run on. The $0.50 input gap favors OpenAI, the output is identical, so an input-heavy retrieval workload tilts to GPT-5.4.
- Cheap and fast: Claude Haiku 4.5 against GPT-5.4-mini. Close on output; GPT-5.4-mini is cheaper on input.
The tokenizer note matters for Opus 4.7 specifically. A lower per-token rate that consumes more tokens per request can net out higher than a competitor with a higher rate and a leaner tokenizer. The only honest comparison is to run your real prompts through both and compare total dollars, not per-token rates.
3. Prompt caching changes the effective rate
Both vendors discount cached input by roughly 90%. On Anthropic, a cache read costs 0.1x the base input rate, so Sonnet 4.6 cached input is $0.30 per million tokens instead of $3[1]. Anthropic charges a cache-write premium (1.25x base for a 5-minute cache, 2x for a 1-hour cache), so caching pays off after the first read on the short cache. OpenAI's cached input is similarly discounted, with GPT-5.4 cached input at $0.25 against $2.50 standard[2].
For an agent that resends a large system prompt, a long document, or conversation history on every call, the cached portion dominates input volume. If 80% of your input tokens are cacheable, the effective input rate on Sonnet 4.6 drops from $3 toward roughly $0.84 per million (0.8 × $0.30 + 0.2 × $3). Caching is the single biggest lever on input cost at both vendors, and it narrows the already-small per-token gap between them.
4. Margin impact on a $29 SaaS
Abstract per-token rates do not communicate the decision. The margin calculator below prices a $29/month SaaS where each user triggers 20 API calls a day at 500 input and 1,000 output tokens, first on Claude Sonnet 4.6 ($3 / $15), then on GPT-5.4 ($2.50 / $15). The engine renders the gross margin from the verified rates:
Show the recompute-verified inputs and outputs
| subscription_price | 29 |
|---|---|
| avg_api_calls_per_day | 20 |
| avg_input_tokens | 500 |
| avg_output_tokens | 1000 |
| input_cost_per_million | 3 |
| output_cost_per_million | 15 |
| hosting_cost_per_user | 0.5 |
| other_per_user_costs | 0.25 |
| api cost per user | 9.9 |
|---|---|
| total cost per user | 10.65 |
| gross margin per user | 18.35 |
| gross margin percent | 63.3 |
| api share percent | 93 |
| dominant cost driver | AI API |
| scale tiers › row 1 › users | 100 |
| scale tiers › row 1 › total revenue | 2900 |
| scale tiers › row 1 › total cost | 1065 |
| scale tiers › row 1 › total profit | 1835 |
| scale tiers › row 1 › margin percent | 63.3 |
| scale tiers › row 2 › users | 1000 |
| scale tiers › row 2 › total revenue | 29000 |
| scale tiers › row 2 › total cost | 10650 |
| scale tiers › row 2 › total profit | 18350 |
| scale tiers › row 2 › margin percent | 63.3 |
| scale tiers › row 3 › users | 10000 |
| scale tiers › row 3 › total revenue | 290000 |
| scale tiers › row 3 › total cost | 106500 |
| scale tiers › row 3 › total profit | 183500 |
| scale tiers › row 3 › margin percent | 63.3 |
| insight | AI API is 93% of your per-user cost ($9.9/mo). At 10K users, that is $99000/month on API alone. Caching responses or using a tiered model approach could significantly improve margins. |
Computed live at build time.
| subscription_price | 29 |
|---|---|
| avg_api_calls_per_day | 20 |
| avg_input_tokens | 500 |
| avg_output_tokens | 1000 |
| input_cost_per_million | 2.5 |
| output_cost_per_million | 15 |
| hosting_cost_per_user | 0.5 |
| other_per_user_costs | 0.25 |
| api cost per user | 9.75 |
|---|---|
| total cost per user | 10.5 |
| gross margin per user | 18.5 |
| gross margin percent | 63.8 |
| api share percent | 92.9 |
| dominant cost driver | AI API |
| scale tiers › row 1 › users | 100 |
| scale tiers › row 1 › total revenue | 2900 |
| scale tiers › row 1 › total cost | 1050 |
| scale tiers › row 1 › total profit | 1850 |
| scale tiers › row 1 › margin percent | 63.8 |
| scale tiers › row 2 › users | 1000 |
| scale tiers › row 2 › total revenue | 29000 |
| scale tiers › row 2 › total cost | 10500 |
| scale tiers › row 2 › total profit | 18500 |
| scale tiers › row 2 › margin percent | 63.8 |
| scale tiers › row 3 › users | 10000 |
| scale tiers › row 3 › total revenue | 290000 |
| scale tiers › row 3 › total cost | 105000 |
| scale tiers › row 3 › total profit | 185000 |
| scale tiers › row 3 › margin percent | 63.8 |
| insight | AI API is 92.9% of your per-user cost ($9.75/mo). At 10K users, that is $97500/month on API alone. Caching responses or using a tiered model approach could significantly improve margins. |
Computed live at build time.
The two engine runs land within a single point of gross margin of each other. The reason is that output tokens dominate the bill (1,000 output vs 500 input per call, and output is five times the input rate on both), and the output rate is identical at $15. The $0.50 input difference barely moves the per-user economics. The practical read: at the mid tier, the Claude-vs-GPT choice is not a margin decision. It is a capability decision. Pick the model that produces better output on your task, because the cost difference is noise.
The lever that does move margin is in the engine's own insight line: AI API cost is the dominant share of per-user cost in this scenario. Cutting output tokens (tighter prompts, shorter completions) or moving routine calls to the cheap tier (Haiku 4.5 or GPT-5.4-mini) changes margin far more than switching vendors at the same tier. The cheapest LLM API ranking covers the full provider field.
5. Decision guidance
- Output-heavy generation at the flagship tier: Claude Opus 4.7 at $25 output undercuts GPT-5.5 at $30, but verify against the Opus tokenizer overhead on your prompts.
- Input-heavy retrieval at the mid tier: GPT-5.4 at $2.50 input edges Claude Sonnet 4.6 at $3, with identical output.
- High-volume cheap classification: GPT-5.4-nano at $0.20 / $1.25 is far below anything in Anthropic's published lineup.
- Heavy cacheable context: both cut cached input ~90%; the decision stays a capability call, not a price call.
Re-verify both pricing pages before committing to a model in production. Per-token rates at this layer move with each model release. For the broader field including open-weight and low-cost providers, see the cheapest LLM API ranking and DeepSeek vs Gemini pricing.
All per-token figures verified against official pricing pages as of 2026-06-12. Anthropic's Opus flagship is now Claude Opus 4.8 (since 2026-05-28) at the same $5/$25 standard rate.
Frequently asked questions
Is Claude or GPT cheaper per token in 2026?
At the mid tier, OpenAI's GPT-5.4 is marginally cheaper on input ($2.50 per million input tokens) than Claude Sonnet 4.6 ($3 per million), with identical $15 output, verified on both official pricing pages as of May 2026. At the flagship tier, Claude Opus 4.7 ($5 input / $25 output) is cheaper on output than GPT-5.5 ($5 input / $30 output). At the cheap tier, GPT-5.4-mini ($0.75 / $4.50) is close to Claude Haiku 4.5 ($1 / $5). The blended winner depends on your input-to-output ratio.
How much does prompt caching reduce cost?
Both vendors discount cached input heavily. Anthropic charges 0.1x the base input rate on a cache read, so a Sonnet 4.6 cache hit is $0.30 per million tokens instead of $3. OpenAI's cached input is also roughly a 90% discount, for example GPT-5.4 cached input at $0.25 against $2.50 standard. For agents that resend a large system prompt every call, caching can cut effective input cost by 80 to 90 percent.
Which has the better batch discount?
Both offer a 50 percent batch discount for asynchronous workloads. Claude Sonnet 4.6 batch is $1.50 input / $7.50 output per million tokens; GPT-5.4 batch is $1.25 input / $7.50 output. The batch path is the cheapest route on both for non-time-sensitive jobs like bulk classification or document processing.
References
Sources
Primary sources only. No vendor-marketing blogs or aggregated secondary claims.
Tools referenced in this article
Related articles
9 min
Mistral vs OpenAI API Pricing 2026: Per-Token Cost Compared
Mistral vs OpenAI API pricing 2026: Mistral Large 3 is $0.50/$1.50 per million tokens, GPT-5.4 is $2.50/$15. EU data residency and the cost gap compared.
11 min
Cheapest LLM API 2026: 8 Providers Ranked by Blended Cost
Cheapest LLM API 2026: DeepSeek V4-flash and Gemini 2.5 Flash-Lite lead at ~$0.18 blended per million tokens. Eight providers ranked at a 3:1 token mix.