Comparison · 9 min · 3 citations
Groq vs Together vs Fireworks API Pricing 2026: Cost Compared
Groq vs Together vs Fireworks API pricing 2026: all three host GPT-OSS 120B at $0.15/$0.60 per million tokens; Llama 3.3 70B rates and speed split them.
On the same model they tie: Groq, Together, and Fireworks all host GPT-OSS 120B at $0.15 input / $0.60 output per million tokens[1][2][3]. On Llama 3.3 70B they split: Groq $0.59 / $0.79, Together $0.88 / $0.88, Fireworks $0.90 (16B+ band). Groq is cheapest on that model.
Because these providers host the same open weights, per-token price converges on popular models and the real differentiators are inference speed and model catalog. Groq's selling point is throughput (GPT-OSS 120B at about 500 tokens/sec). Together and Fireworks compete on breadth and price. For a SaaS running GPT-OSS 120B, all three cost the same per token; pick on speed and the rest of your model mix.
Groq, Together, and Fireworks all sell hosted inference on the same open-weight models, so on a popular model like GPT-OSS 120B the per-token price is identical at $0.15 / $0.60. The spread appears on other models (Llama 3.3 70B ranges from $0.59 to $0.90) and on inference speed, where Groq leads. This article anchors on a common model to show the tie, then works the divergence on Llama 3.3 70B, the speed trade, and the margin on a SaaS workload.
1. Same model, three providers
Per-million-token rates verified against each provider's pricing page as of May 25, 2026. GPT-OSS 120B is the common anchor because all three list it.
| Provider | GPT-OSS 120B (in / out) | GPT-OSS 20B (in / out) | Headline speed |
|---|---|---|---|
| Groq | $0.15 / $0.60[1] | $0.075 / $0.30[1] | ~500 TPS (120B)[1] |
| Together AI | $0.15 / $0.60[2] | $0.05 / $0.20[2] | Not headlined[2] |
| Fireworks AI | $0.15 / $0.60[3] | Parameter band[3] | Not headlined[3] |
GPT-OSS 120B is a dead heat at $0.15 / $0.60 across all three. On the smaller GPT-OSS 20B, Together undercuts Groq at $0.05 / $0.20 against $0.075 / $0.30. That is a third less on both input and output, though only a few cents per million tokens. Fireworks prices this model by parameter band rather than a listed rate. The lesson is structural: when three providers host identical open weights, competition drives the popular-model price to the same number, and you choose on everything else.
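To see the tie and the 20B gap in dollars, here is a minimal Python sketch that prices a workload from the per-million-token rates in the table. The `RATES` dictionary, the `workload_cost` helper, and the 10M-input / 5M-output workload are hypothetical illustrations. Fireworks' GPT-OSS 20B rate is left out because the table lists it only as a parameter band.

```python
# Per-million-token rates in USD (input, output), copied from the table above
# as of 2026-05-25. Fireworks' GPT-OSS 20B entry is omitted: the table lists it
# only as "Parameter band", not as a specific rate.
RATES = {
    ("Groq", "gpt-oss-120b"): (0.15, 0.60),
    ("Together", "gpt-oss-120b"): (0.15, 0.60),
    ("Fireworks", "gpt-oss-120b"): (0.15, 0.60),
    ("Groq", "gpt-oss-20b"): (0.075, 0.30),
    ("Together", "gpt-oss-20b"): (0.05, 0.20),
}


def workload_cost(provider: str, model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload: tokens times the per-million rate, divided by 1M, input plus output."""
    in_rate, out_rate = RATES[(provider, model)]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical monthly workload: 10M input tokens and 5M output tokens.
for provider, model in RATES:
    cost = workload_cost(provider, model, 10_000_000, 5_000_000)
    print(f"{provider:<9} {model:<13} ${cost:.2f}")
```

Under this hypothetical mix, all three providers come to the same $4.50 on GPT-OSS 120B. On GPT-OSS 20B, Together comes to $1.50 against Groq's $2.25.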
2. Where the rates diverge: Llama 3.3 70B
The convergence breaks on models priced by each provider's own structure. Llama 3.3 70B is the clearest example:
| Provider | Llama 3.3 70B (in / out) | Pricing basis |
|---|---|---|
| Groq | $0.59 / $0.79[1] | Per-model rate (Versatile) |
| Together AI | $0.88 / $0.88[2] | Per-model flat rate |
| Fireworks AI | $0.90 flat[3] | 16B+ parameter band |
Groq is the cheapest on Llama 3.3 70B at $0.59 / $0.79. That is roughly a third below Together and Fireworks on input and 10 to 12 percent below them on output. Fireworks prices by parameter band ($0.10 under 4B, $0.20 for 4 to 16B, $0.90 above 16B), so Llama 3.3 70B lands in the $0.90 band. Together and Groq price per model, which can be cheaper for specific popular models. That difference is why the Llama 3.3 70B spread is wider than the GPT-OSS 120B tie.
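The band structure is simple enough to express as a lookup. The following is a minimal Python sketch that assumes the bands quoted above and then computes Groq's discount on Llama 3.3 70B against the other two. `fireworks_band_rate` is a hypothetical helper. The article does not say which band an exactly-4B or exactly-16B model falls into, so the boundaries below (under 4B, 4B to 16B inclusive, above 16B) are an assumption.

```python
def fireworks_band_rate(params_billion: float) -> float:
    """Flat USD-per-million-token rate for a model of the given size (billions of parameters).

    Assumed boundaries: under 4B, 4B to 16B inclusive, above 16B.
    """
    if params_billion < 4:
        return 0.10
    if params_billion <= 16:
        return 0.20
    return 0.90


# Llama 3.3 70B rates (input, output) in USD per million tokens, from the table above.
LLAMA_33_70B = {
    "Groq": (0.59, 0.79),
    "Together": (0.88, 0.88),
    "Fireworks": (fireworks_band_rate(70), fireworks_band_rate(70)),  # flat band: same rate both ways
}

groq_in, groq_out = LLAMA_33_70B["Groq"]
for name in ("Together", "Fireworks"):
    other_in, other_out = LLAMA_33_70B[name]
    print(
        f"Groq vs {name}: {1 - groq_in / other_in:.0%} cheaper on input, "
        f"{1 - groq_out / other_out:.0%} cheaper on output"
    )
```

This prints a 33% input and 10% output discount against Together, and 34% / 12% against Fireworks. Those figures are the basis for "roughly a third below on input."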
3. Speed is the Groq differentiator
Groq's pricing page leads with tokens-per-second, which the other two do not headline[1]. Published figures include GPT-OSS 120B at about 500 tokens/sec, Llama 3.1 8B Instant at about 840 tokens/sec, and Llama 3.3 70B Versatile at about 394 tokens/sec. In a chat or agent product, response latency drives the user experience. That generation speed is the reason to pick Groq even at a tied per-token price. Tokens-per-second measures how fast output streams once generation starts, not time-to-first-token.
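Tokens-per-second translates directly into how long a response takes to stream. The minimal sketch below assumes the published figures are steady-state output rates. It ignores time-to-first-token and network overhead. The 400-token reply matches the section 4 workload, and `generation_seconds` is a hypothetical helper.

```python
# Groq's published throughput (output tokens per second) from its pricing page.
GROQ_TPS = {
    "GPT-OSS 120B": 500,
    "Llama 3.3 70B Versatile": 394,
    "Llama 3.1 8B Instant": 840,
}


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Streaming time at a constant rate; excludes time-to-first-token and network latency."""
    return output_tokens / tokens_per_second


REPLY_TOKENS = 400  # output tokens per call in the section 4 workload
for model, tps in GROQ_TPS.items():
    print(f"{model}: {generation_seconds(REPLY_TOKENS, tps):.2f} s to stream {REPLY_TOKENS} tokens")
```

At these rates, a 400-token reply streams in about 0.8 s on GPT-OSS 120B, about 1.0 s on Llama 3.3 70B Versatile, and about 0.5 s on Llama 3.1 8B Instant. Together and Fireworks do not headline comparable figures, so there is no like-for-like number to compute for them here.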
The trade-off, then: on GPT-OSS 120B the three providers cost the same per token, so Groq's speed comes at no extra cost if Groq hosts the models you need. On a model where Groq is also cheaper (Llama 3.3 70B), it wins on both axes. Together and Fireworks earn their place on catalog breadth. Together lists a wide range of Qwen, DeepSeek, and Gemma variants. Fireworks covers a large model library with predictable parameter-band pricing for less common models.
4. Margin on a $25 SaaS with GPT-OSS 120B
Because GPT-OSS 120B is priced identically on all three, one engine run prices the workload for any of them. The tables below price a $25/month SaaS in which each user triggers 30 API calls a day on GPT-OSS 120B ($0.15 / $0.60), at 600 input and 400 output tokens per call. Each user also carries $0.40 hosting and $0.20 other costs per month:
Inputs
| Input | Value |
|---|---|
| Subscription price | $25 per user per month |
| API calls per user per day | 30 |
| Input tokens per call | 600 |
| Output tokens per call | 400 |
| Input cost | $0.15 per million tokens |
| Output cost | $0.60 per million tokens |
| Hosting cost | $0.40 per user per month |
| Other per-user costs | $0.20 per user per month |
Outputs (per user, per month)
| Output | Value |
|---|---|
| API cost | $0.30 |
| Total cost | $0.90 |
| Gross margin | $24.10 |
| Gross margin percent | 96.4% |
| API share of total cost | 33.3% |
| Dominant cost driver | Hosting |
Scale tiers (per month)
| Users | Revenue | Cost | Profit | Margin |
|---|---|---|---|---|
| 100 | $2,500 | $90 | $2,410 | 96.4% |
| 1,000 | $25,000 | $900 | $24,100 | 96.4% |
| 10,000 | $250,000 | $9,000 | $241,000 | 96.4% |
Engine insight: 96.4% gross margin is healthy. Hosting is your largest cost at $0.40/user/month. At 10K users you keep $241,000/month after per-user costs.
Computed live at build time.
The engine returns a high gross margin because GPT-OSS 120B at $0.15 / $0.60 keeps per-user API cost to about $0.30 a month at this call volume. At these rates, model cost is not the constraint on the business. Hosting ($0.40) is the largest single cost, and hosting plus other per-user costs make up two-thirds of the $0.90 total. The choice among the three providers therefore does not move margin on this model. It moves latency (Groq) and which other models you can reach without adding a vendor (Together, Fireworks).
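The engine's arithmetic can be reproduced in a few lines. The minimal Python sketch below makes two assumptions. First, it assumes a 30-day month, so 30 calls a day becomes 900 calls, which yields the table's $0.30 API cost. Second, it rounds API cost to the cent before summing, which reproduces the 33.3% API share. Unrounded, the API cost is $0.297 and the share is about 33.1%. `Workload` and `monthly_unit_economics` are hypothetical names, not the engine's actual code.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30  # assumption: a 30-day month


@dataclass
class Workload:
    subscription_price: float = 25.0      # USD per user per month
    avg_api_calls_per_day: int = 30
    avg_input_tokens: int = 600           # per call
    avg_output_tokens: int = 400          # per call
    input_cost_per_million: float = 0.15  # GPT-OSS 120B, same on all three providers
    output_cost_per_million: float = 0.60
    hosting_cost_per_user: float = 0.40   # USD per user per month
    other_per_user_costs: float = 0.20    # USD per user per month


def monthly_unit_economics(w: Workload) -> dict:
    calls = w.avg_api_calls_per_day * DAYS_PER_MONTH
    raw_api_cost = calls * (
        w.avg_input_tokens * w.input_cost_per_million
        + w.avg_output_tokens * w.output_cost_per_million
    ) / 1_000_000
    api_cost = round(raw_api_cost, 2)  # assumption: rounded to the cent, as the table shows
    total_cost = api_cost + w.hosting_cost_per_user + w.other_per_user_costs
    margin = w.subscription_price - total_cost
    return {
        "api_cost_per_user": api_cost,
        "total_cost_per_user": total_cost,
        "gross_margin_per_user": margin,
        "gross_margin_percent": 100 * margin / w.subscription_price,
        "api_share_percent": 100 * api_cost / total_cost,
    }


w = Workload()
result = monthly_unit_economics(w)
for key, value in result.items():
    print(f"{key}: {value:.2f}")

# Scale tiers: per-user economics multiplied by user count.
for users in (100, 1_000, 10_000):
    revenue = users * w.subscription_price
    cost = users * result["total_cost_per_user"]
    print(f"{users:>6} users: revenue ${revenue:,.0f}, cost ${cost:,.0f}, profit ${revenue - cost:,.0f}")
```

Run as-is, it prints the table's $0.30 API cost, $0.90 total, $24.10 margin, and 96.4% gross margin, along with the same three scale tiers. Swapping in another model's `input_cost_per_million` and `output_cost_per_million` re-prices the same workload.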
For products that also run a 70B model, the picture shifts. Groq's $0.59 / $0.79 on Llama 3.3 70B is about a third cheaper than Together's $0.88 / $0.88 on input and about 10% cheaper on output, so a mixed workload tilts toward Groq on both cost and speed. The cheapest LLM API ranking places these open-model hosts against the closed-frontier vendors.
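To see how much the 70B spread matters per user, the minimal sketch below reprices the same workload on Llama 3.3 70B at each provider's listed rate. The workload is 30 calls a day, 600 input and 400 output tokens per call, over an assumed 30-day month. Applying Fireworks' flat $0.90 band to both input and output follows the table's "flat" label. `monthly_api_cost` is a hypothetical helper.

```python
CALLS_PER_MONTH = 30 * 30               # 30 calls a day over an assumed 30-day month
INPUT_TOKENS, OUTPUT_TOKENS = 600, 400  # per call, same workload as the margin table

# Llama 3.3 70B rates (input, output) in USD per million tokens.
LLAMA_33_70B = {
    "Groq": (0.59, 0.79),
    "Together": (0.88, 0.88),
    "Fireworks": (0.90, 0.90),  # 16B+ parameter band, flat for input and output
}


def monthly_api_cost(in_rate: float, out_rate: float) -> float:
    """Per-user monthly API cost in USD for the fixed workload above."""
    return CALLS_PER_MONTH * (INPUT_TOKENS * in_rate + OUTPUT_TOKENS * out_rate) / 1_000_000


for provider, (in_rate, out_rate) in LLAMA_33_70B.items():
    print(f"{provider}: ${monthly_api_cost(in_rate, out_rate):.3f} per user per month")
```

Groq comes to about $0.60 per user per month, against about $0.79 for Together and $0.81 for Fireworks, roughly a quarter less. Assuming the same $0.40 hosting and $0.20 other costs, the gap moves gross margin on a $25 subscription by under one percentage point.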
5. Decision guidance
- Running GPT-OSS 120B only: all three tie on price. Pick Groq for speed unless you need a model only Together or Fireworks hosts.
- Running Llama 3.3 70B: Groq is cheapest at $0.59 / $0.79 and publishes about 394 tokens/sec for it. Clear pick.
- Wide or unusual model catalog: Together (broad list) or Fireworks (predictable parameter-band pricing for less common models).
- Latency-critical chat or agents: Groq, for its published tokens-per-second lead.
Re-verify each provider's pricing page before committing; open-model hosting rates move with hardware and competition. For the closed-frontier comparison, see Anthropic vs OpenAI pricing and the full cheapest LLM API ranking.
All per-token figures verified against official pricing pages as of 2026-05-25.
Frequently asked questions
Which is cheapest for GPT-OSS 120B in 2026?
They tie. Groq, Together AI, and Fireworks all host OpenAI's GPT-OSS 120B at $0.15 per million input tokens and $0.60 per million output tokens, verified on each provider's pricing page as of May 2026. Because all three serve the same open weights, per-token cost is a dead heat on this model. The decision moves to inference speed and the price of the other models you run.
Is Groq faster than Together and Fireworks?
Groq publishes tokens-per-second figures that are its main selling point: GPT-OSS 120B at about 500 tokens per second and Llama 3.1 8B Instant at about 840 tokens per second on its pricing page. Together and Fireworks compete on price and model breadth more than headline throughput. For latency-sensitive applications, Groq's speed is the differentiator at a comparable per-token price.
Do the three providers price Llama 3.3 70B the same?
No. Groq prices Llama 3.3 70B Versatile at $0.59 input / $0.79 output per million tokens. Together prices Llama 3.3 70B at $0.88 for both input and output. Fireworks places models above 16B parameters in a $0.90 per million token band. Groq is the cheapest of the three on this specific model, and the spread is wider than on the identically-priced GPT-OSS 120B.
References
Primary sources only. No vendor-marketing blogs or aggregated secondary claims.
- [1] Groq, Pricing (per-MTok rates and tokens-per-second for hosted open models). Accessed 2026-05-25.
- [2] Together AI, Pricing (serverless per-MTok rates for open models). Accessed 2026-05-25.
- [3] Fireworks AI, Serverless pricing (per-MTok rates and parameter bands). Accessed 2026-05-25.