Skip to main content
aibizhub

Pillar Guide · 11 min · 6 citations

Cheapest LLM API 2026: 8 Providers Ranked by Blended Cost

Cheapest LLM API 2026: DeepSeek V4-flash and Gemini 2.5 Flash-Lite lead at ~$0.18 blended per million tokens. Eight providers ranked at a 3:1 token mix.

By AI Biz Hub · Published May 25, 2026

Education · General business information, not legal, tax, or financial advice. Editorial standards Sponsor disclosure Corrections

TL;DR

On a blended 3:1 input:output mix, DeepSeek V4-flash and Gemini 2.5 Flash-Lite lead at roughly $0.18 per million blended tokens[1][2]. The open-model hosts (Groq, Together, Fireworks) on GPT-OSS 120B follow at about $0.26[6]. Mistral Large 3 lands at $0.75; OpenAI GPT-5.4-mini at $1.69; Anthropic Haiku 4.5 at $2.00.

Blended cost ranks price, not capability. The cheapest provider is only the right one if it clears your task's quality bar. Caching and batch can reorder the list: both Anthropic and OpenAI cut cached input ~90% and offer 50% batch, and DeepSeek's cache hit is near-free. Find the cheapest provider that passes your eval, not the cheapest overall.

The cheapest LLM API in 2026 is DeepSeek V4-flash or Gemini 2.5 Flash-Lite, both near $0.18 per million blended tokens at a 3:1 input:output mix. This pillar ranks eight providers (Anthropic, OpenAI, Gemini, DeepSeek, Mistral, Groq, Together, Fireworks) on one consistent cost metric, flags where capability breaks the ranking, shows how caching and batch reorder it, and runs the cheapest tier through the margin calculator so the saving shows up as gross margin.

1. How the blended cost is computed

Comparing input and output rates separately is noisy, so this ranking uses one blended number: cost per million tokens at a 3:1 input:output ratio, computed as (0.75 × input rate) + (0.25 × output rate). A 3:1 ratio is a reasonable default for retrieval-augmented and chat workloads, where the prompt and context outweigh the completion. For each provider, a representative cheap-to-workhorse model is priced; flagship tiers are noted separately because few cost-sensitive products run them by default.

All per-token rates are verified against each provider's official pricing page as of May 25, 2026. The blended figure is arithmetic on those verified rates, not a benchmark of model quality. A provider can rank first on blended cost and still be the wrong choice if its model fails your task.

2. The 8-provider blended-cost ranking

Representative model per provider, ranked cheapest first by blended 3:1 cost per million tokens.

RankProvider / modelInput / OutputBlended (3:1)
1Gemini 2.5 Flash-Lite[2]$0.10 / $0.40$0.175
1DeepSeek V4-flash[1]$0.14 / $0.28$0.175
3Groq / Together / Fireworks, GPT-OSS 120B[6]$0.15 / $0.60$0.263
4Groq, Llama 3.3 70B[6]$0.59 / $0.79$0.640
5Mistral Large 3[5]$0.50 / $1.50$0.750
6OpenAI GPT-5.4-mini[3]$0.75 / $4.50$1.688
7Anthropic Haiku 4.5[4]$1.00 / $5.00$2.000
8Gemini 3.5 Flash[2]$1.50 / $9.00$3.375

The arithmetic on the leaders: Gemini 2.5 Flash-Lite is 0.75 × $0.10 + 0.25 × $0.40 = $0.075 + $0.10 = $0.175. DeepSeek V4-flash is 0.75 × $0.14 + 0.25 × $0.28 = $0.105 + $0.07 = $0.175. They tie. The open-model hosts cluster at $0.263 because GPT-OSS 120B is priced identically across Groq, Together, and Fireworks. For reference, the closed-frontier mid and flagship tiers (not in the cheap ranking) are far higher: OpenAI GPT-5.4 blends to $5.625, Anthropic Sonnet 4.6 to $6.00, on the same 3:1 mix.

3. The capability caveat

This ranking is a price ranking. It does not claim DeepSeek V4-flash or Gemini 2.5 Flash-Lite produces output equal to a flagship model. Three honest qualifiers:

  • Cheap-tier models are smaller. Flash-Lite and V4-flash are built for cost and speed, not maximum reasoning. For complex agentic or coding tasks, a pricier model often does the job in fewer tokens and fewer retries, which can make it cheaper in total despite a higher per-token rate.
  • Representative models differ in capability. The eight rows above are not capability-matched. GPT-OSS 120B is a large open model; Gemini 2.5 Flash-Lite is a small fast model. Same blended-cost table, different jobs.
  • Total cost includes retries and length. A model that needs two attempts or writes twice as much costs more than its per-token rate implies. Measure dollars per completed task on your workload, not dollars per token.

The correct decision rule: start at the top of the price ranking, run your eval, and stop at the first provider that clears your quality bar. That is usually not the single cheapest row, but it is rarely the most expensive one either.

4. Caching and batch change the order

The headline ranking assumes no caching and no batch. Both reorder the list for the right workload:

  • Cache hits. DeepSeek V4-flash cache-hit input is $0.0028 per million tokens[1], near-free. Anthropic cache reads are 0.1x base (Sonnet 4.6 cached input $0.30 against $3)[4]. A workload that resends a large stable context collapses input cost toward the cache rate, which can pull a nominally pricier provider below a cheaper one.
  • Batch. Anthropic and OpenAI both offer a 50% batch discount[3][4]. For non-time-sensitive bulk jobs (classification, document processing), the batch rate is the number to rank on, not the standard rate.

The practical instruction: compute the blended cost with your real cache-hit ratio and batch share. A product that caches 80% of input on Sonnet 4.6 has a far lower effective input cost than the $3 standard rate suggests, which narrows the gap to the cheap-tier leaders considerably.

5. Margin on a $19 SaaS at the cheap tier

The margin calculator prices a $19/month SaaS where each user triggers 40 API calls a day at 800 input and 600 output tokens, on the cheap-tier leader DeepSeek V4-flash ($0.14 / $0.28). The engine renders gross margin from the verified rate:

Show the recompute-verified inputs and outputs
$19 SaaS on the cheap-tier leader, DeepSeek V4-flash ($0.14 / $0.28 per MTok)
Inputs
subscription_price 19
avg_api_calls_per_day 40
avg_input_tokens 800
avg_output_tokens 600
input_cost_per_million 0.14
output_cost_per_million 0.28
hosting_cost_per_user 0.5
other_per_user_costs 0.25
Result
api cost per user 0.34
total cost per user 1.09
gross margin per user 17.91
gross margin percent 94.3
api share percent 31.2
dominant cost driver Hosting
scale tiers › row 1 › users 100
scale tiers › row 1 › total revenue 1900
scale tiers › row 1 › total cost 109
scale tiers › row 1 › total profit 1791
scale tiers › row 1 › margin percent 94.3
scale tiers › row 2 › users 1000
scale tiers › row 2 › total revenue 19000
scale tiers › row 2 › total cost 1090
scale tiers › row 2 › total profit 17910
scale tiers › row 2 › margin percent 94.3
scale tiers › row 3 › users 10000
scale tiers › row 3 › total revenue 190000
scale tiers › row 3 › total cost 10900
scale tiers › row 3 › total profit 179100
scale tiers › row 3 › margin percent 94.3
insight 94.3% gross margin is healthy. Hosting is your largest cost at $0.5/user/month. At 10K users you keep $179100/month after per-user costs.

Computed live at build time.

At the cheapest tier, API cost almost disappears from the per-user economics: the engine returns a high gross margin because a few cents of model cost per user leaves the bulk of the $19 as contribution. The strategic read for a cost-sensitive product is that the cheap tier turns the model from the dominant cost line into a rounding error, which frees the budget for acquisition or for a stronger model on the calls that need it.

The mirror scenario is on the pricier closed-frontier tiers, where the same call volume can consume a large share of a low subscription price. The Anthropic vs OpenAI comparison, the DeepSeek vs Gemini comparison, and the open-model host comparison drill into each pairing. The AI Product Margin Calculator lets you run your own rates.

6. Decision guidance

  • Cheapest raw cost, text-only: DeepSeek V4-flash or Gemini 2.5 Flash-Lite, tied at $0.175 blended. Gemini if you want Google Cloud and multimodal headroom; DeepSeek for the lowest output rate and near-free cache hits.
  • Open weights, want speed or self-host option: GPT-OSS 120B on Groq (fastest), Together, or Fireworks at $0.263 blended.
  • EU data residency at low cost: Mistral Large 3 at $0.75 blended, the cheapest EU-based option here.
  • Need flagship capability: step up to GPT-5.5, Claude Opus 4.8, or Gemini 3.1 Pro (the current Pro flagship; Gemini 3.5 Pro is its announced successor, rolling out mid-2026) and accept the higher blended cost. Rank those on the matched-tier comparisons, not this cheap-tier list.

Re-verify every provider's pricing page before committing; per-token rates at this layer move with each model release, and DeepSeek specifically has a V4-pro promo ending 2026-05-31. Rank on your real cache-hit and batch ratios, and stop at the cheapest provider that passes your eval.

All per-token figures verified against official pricing pages as of 2026-05-25.

Frequently asked questions

What is the cheapest LLM API in 2026?

On a blended 3-to-1 input-to-output mix, DeepSeek V4-flash ($0.14 input / $0.28 output) and Gemini 2.5 Flash-Lite ($0.10 input / $0.40 output) tie at roughly $0.18 per million blended tokens, the cheapest of the eight providers checked as of May 2026. The three open-model hosts (Groq, Together, Fireworks) running GPT-OSS 120B at $0.15 / $0.60 follow at about $0.26 blended. All rates verified on each provider's official pricing page.

Is the cheapest LLM API also the best choice?

Not always. Blended cost ranks raw price, not capability. DeepSeek and Gemini Flash-Lite win on dollars per token, but a harder task may need a stronger model where the cost is justified by output quality and retention. The right method is to find the cheapest provider that clears your quality bar on your actual prompts, not the cheapest provider overall.

How much do caching and batch change the ranking?

A lot for the right workload. Anthropic and OpenAI both cut cached input by roughly 90 percent and offer a 50 percent batch discount, and DeepSeek's V4-flash cache-hit input is $0.0028 per million tokens. A heavily cached or batched workload can reorder the ranking, pulling a nominally pricier provider below a cheaper one. Compute the blended cost with your real cache-hit and batch ratios, not the headline rates.

What is the cheapest LLM API for classification and extraction on a solo SaaS under $1 per user per month?

For classification, extraction, and routing on a solo SaaS, the cheapest options are Gemini 2.5 Flash-Lite ($0.10 input / $0.40 output) and DeepSeek V4-flash ($0.14 / $0.28), tied at $0.175 per million blended tokens on a 3-to-1 input-to-output mix. On the $19 SaaS scenario modeled here at 40 API calls per user per day, the AI Product Margin Calculator shows API cost almost disappearing from the per-user economics on DeepSeek V4-flash, well under $1 per user per month. These small fast models are built for exactly this kind of high-volume, low-reasoning task; reserve a pricier tier for the minority of calls that genuinely need stronger reasoning.

Which cheap LLM API should I pick if I need EU data residency under a tight budget?

Mistral Large 3 at $0.50 input / $1.50 output per million tokens, which blends to $0.75 per million on a 3-to-1 mix, is the cheapest EU-based option in this eight-provider ranking. It costs more than the global cheap-tier leaders (DeepSeek V4-flash and Gemini 2.5 Flash-Lite at $0.175 blended) but is the lowest-cost choice when EU data residency is a hard requirement. Verify Mistral's current rates and residency terms on their docs before committing, since per-token pricing at this layer moves with each model release.

References

Sources

Primary sources only. No vendor-marketing blogs or aggregated secondary claims.

  1. 1 DeepSeek — API pricing (V4-flash $0.14/$0.28) — accessed 2026-05-25
  2. 2 Google — Gemini API pricing (2.5 Flash-Lite $0.10/$0.40, 3.5 Flash $1.50/$9.00) — accessed 2026-05-25
  3. 3 OpenAI — API pricing (GPT-5.4-mini $0.75/$4.50, GPT-5.4 $2.50/$15) — accessed 2026-05-25
  4. 4 Anthropic — Claude API pricing (Haiku 4.5 $1/$5, Sonnet 4.6 $3/$15) — accessed 2026-05-25
  5. 5 Mistral — Mistral Large 3 docs ($0.50/$1.50) — accessed 2026-05-25
  6. 6 Groq / Together / Fireworks — pricing (GPT-OSS 120B $0.15/$0.60 on all three) — accessed 2026-05-25

Tools referenced in this article

Related articles

Business planning estimates — not legal, tax, or accounting advice.