Comparison · 9 min · 3 citations

Replicate vs fal vs Modal Pricing 2026: GPU Cost Compared

Replicate vs fal vs Modal pricing 2026: H100 runs $0.001525/s on Replicate, $0.0005/s on fal, $0.001097/s on Modal. Per-second GPU billing compared.

By AI Biz Hub · Published May 25, 2026

Education · General business information, not legal, tax, or financial advice.

TL;DR

On the published per-second H100 rate, fal is cheapest at $0.0005/s, against Modal at $0.001097/s and Replicate at $0.001525/s^[1]^[2]^[3]. All three bill GPU time per second, the right granularity for bursty inference.

The rate is half the story. How each platform bills idle and setup time matters as much: Modal charges only for active compute and adds $30/mo in free credits^[3]; Replicate can bill setup and idle time on private models but offers the simplest public-model deployment^[1]. Pick on rate plus idle behavior plus how you deploy, not on the rate alone.

Replicate, fal, and Modal are the three GPU compute platforms a developer compares when hosting an AI model or running inference at scale in 2026. All three bill per second, which makes the rate comparison clean, but the published rates differ several-fold and the real bill also depends on how each platform charges for idle and setup time. This article puts the verified per-second GPU rates side by side, then explains the idle-time behavior that the sticker rate hides.

1. All three bill GPU time per second

All three price GPU compute per second of use, the correct granularity for bursty inference where you do not want to pay for a full hour to run a ten-second job. Billing terms were verified against each vendor's pricing page as of May 25, 2026.

Per-second billing means the cost of a job is the per-second GPU rate times the seconds the job runs (plus, on some platforms, setup and idle time). Because the rates are published per second, you can compute the marginal cost of any job directly: a 10-second H100 inference costs the H100 per-second rate times 10. The platforms differ on the rate and on what additional time they count as billable, which is where the comparison gets interesting.
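The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not vendor code: the function name `job_cost` and the `extra_billable_seconds` parameter are assumptions introduced here, and the rates are the published H100 figures quoted in this article.

```python
# Published per-second H100 rates (USD) as quoted in this article.
H100_RATE_PER_SECOND = {
    "Replicate": 0.001525,
    "fal": 0.0005,
    "Modal": 0.001097,
}


def job_cost(rate_per_second: float, active_seconds: float,
             extra_billable_seconds: float = 0.0) -> float:
    """Cost of one job: the per-second rate times every billable second.

    extra_billable_seconds is a hypothetical knob for setup or idle time
    that some platforms count as billable; it defaults to zero.
    """
    return rate_per_second * (active_seconds + extra_billable_seconds)


if __name__ == "__main__":
    for platform, rate in H100_RATE_PER_SECOND.items():
        print(f"{platform}: ${job_cost(rate, 10):.5f} for a 10-second H100 job")
```

Passing a nonzero `extra_billable_seconds` shows how quickly setup or idle time can outweigh the active-compute portion of a short job.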

2. H100 and A100 rates compared

Published per-second GPU rates as of May 25, 2026. Note that hardware naming and memory configurations vary slightly between vendors, so match the closest equivalent.

GPU	Replicate	fal	Modal
H100	$0.001525/s^[1]	$0.0005/s^[2]	$0.001097/s^[3]
A100 80GB	$0.001400/s^[1]	not listed (A100 40GB $0.0003/s)^[2]	$0.000694/s^[3]
L40S	$0.000975/s^[1]	not listed^[2]	$0.000542/s^[3]
T4	$0.000225/s^[1]	not listed^[2]	$0.000164/s^[3]

On the H100, fal's $0.0005/s is the lowest by a wide margin, roughly a third of Replicate's $0.001525/s and under half of Modal's $0.001097/s. fal lists fewer GPU types on its public pricing (H100, H200, A100 40GB), so for L40S and T4 the comparison is between Replicate and Modal, where Modal is consistently lower per second. The practical reading: fal is the per-second price leader on the flagship H100, while Modal undercuts Replicate across the broader hardware range it lists.
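To check the ratios quoted above, or to convert the per-second figures into the per-hour numbers many cloud price lists use, a short script is enough. The sketch below copies the rates from the table; the helper names `ratio` and `per_hour` are illustrative, and `None` stands for "not listed".

```python
# Per-second rates (USD) copied from the table above; None means "not listed".
RATES = {
    "H100":      {"Replicate": 0.001525, "fal": 0.0005, "Modal": 0.001097},
    "A100 80GB": {"Replicate": 0.001400, "fal": None,   "Modal": 0.000694},
    "L40S":      {"Replicate": 0.000975, "fal": None,   "Modal": 0.000542},
    "T4":        {"Replicate": 0.000225, "fal": None,   "Modal": 0.000164},
}


def ratio(gpu: str, platform_a: str, platform_b: str):
    """Return rate(platform_a) / rate(platform_b), or None if either is unlisted."""
    a, b = RATES[gpu][platform_a], RATES[gpu][platform_b]
    if a is None or b is None:
        return None
    return a / b


def per_hour(rate_per_second: float) -> float:
    """Convert a per-second rate to the equivalent per-hour rate."""
    return rate_per_second * 3600


if __name__ == "__main__":
    print(f"fal / Replicate on H100: {ratio('H100', 'fal', 'Replicate'):.2f}")
    print(f"fal / Modal on H100:     {ratio('H100', 'fal', 'Modal'):.2f}")
    for gpu in RATES:
        print(f"Modal / Replicate on {gpu}: {ratio(gpu, 'Modal', 'Replicate'):.2f}")
    for platform, rate in RATES["H100"].items():
        print(f"{platform} H100: ${per_hour(rate):.2f}/hour")
```

A ratio below 1 means the first platform is cheaper per second for that GPU.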

3. The cost trap is idle time, not the rate

The per-second rate is only the price of active compute. The bill that surprises people is idle and setup time, and the three handle it differently:

Modal states you pay only for actual compute time and never for idle resources, billing by the CPU cycle^[3]. That makes its effective cost on bursty workloads close to the headline rate.
Replicate bills public models by execution time, but private models can be billed for setup, idle, and active time, so a model that boots slowly and stays warm can cost more than the active-only rate suggests^[1].
fal uses per-second billing with output-based pricing on its hosted models; verify whether your specific deployment is billed only for active inference or also for warm-pool time^[2].

This is why the cheapest per-second rate does not automatically mean the cheapest bill. A workload with long idle gaps and slow cold starts can cost far more on a platform that bills idle time than on one that does not, even at a higher headline rate. For sporadic, bursty inference, Modal's active-only billing is a structural advantage; for steady high-utilization workloads where the GPU is busy most of the time, the per-second rate dominates and fal's low H100 rate leads. Fold whichever platform you choose into your full monthly stack budget with the AI stack cost calculator.
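The idle-time effect can be made concrete with a rough model. The sketch below is built on assumptions, not vendor billing logic: it treats an "idle-billed" deployment as one that is charged for the whole time window while it stays warm, ignores setup time, and uses a hypothetical workload of 60 requests per hour at 5 active seconds each. Whether a given deployment is actually billed this way has to be confirmed on the vendor's pricing page.

```python
def window_bill(rate_per_second: float, active_seconds: float,
                bills_idle: bool, window_seconds: float = 3600) -> float:
    """Bill for one time window under a simplified model.

    If bills_idle is True, the instance is assumed to stay warm and the
    whole window is billable; otherwise only active seconds are billed.
    """
    billable = window_seconds if bills_idle else active_seconds
    return rate_per_second * billable


def break_even_utilization(always_billed_rate: float,
                           active_only_rate: float) -> float:
    """Utilization above which paying for the whole window at
    always_billed_rate beats paying active_only_rate for busy seconds only.

    Derived from: always_billed_rate * W < active_only_rate * u * W.
    A result of 1.0 or more means the always-billed option never wins.
    """
    return always_billed_rate / active_only_rate


if __name__ == "__main__":
    active = 60 * 5  # hypothetical: 60 requests/hour, 5 active seconds each

    # Active-only billing at Modal's published H100 rate.
    print(f"active-only: ${window_bill(0.001097, active, bills_idle=False):.4f}/hour")
    # A warm private model billed for the full hour at Replicate's H100 rate.
    print(f"idle-billed: ${window_bill(0.001525, active, bills_idle=True):.4f}/hour")

    # Hypothetical: if a deployment at fal's H100 rate were billed for warm
    # time, it would beat active-only billing at Modal's rate above this
    # utilization.
    print(f"break-even utilization: {break_even_utilization(0.0005, 0.001097):.1%}")
```

Under these assumptions the workload is busy for 300 of 3,600 seconds, so the idle-billed hour costs many times the active-only hour despite a per-second rate that is less than 1.5 times higher.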

4. Free credits and plan structure

Plan structure and free credits differ, which matters for experimentation before you commit:

Platform	Free credit	Plan note
Modal	$30/mo on Starter; $100/mo on Team^[3]	Team $250/mo + compute^[3]
Replicate	None advertised on pricing page^[1]	Pay for what you use; simplest public-model deploy^[1]
fal	None listed on pricing page^[2]	Per-second + output-based pricing^[2]

Modal's recurring $30/month free credit on the free Starter plan is a real differentiator for developers who want to prototype without an upfront bill, and the Team plan adds $100/month in credits on a $250/month base. Replicate and fal do not advertise recurring free credits on their pricing pages, so budget for usage from the first job. Replicate's offsetting strength is deployment simplicity: running a published model is often a single API call, which lowers the engineering cost even if the per-second compute rate is the highest of the three.
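To get a feel for what the $30 monthly credit covers, divide it by a per-second GPU rate. The sketch below uses Modal's GPU rates from this article and a hypothetical helper, `gpu_hours`. It treats the credit as spent on GPU time alone, so the result is an upper bound: Modal also bills CPU and memory separately, and those charges are not modeled here.

```python
# Modal per-second GPU rates (USD) as quoted in this article.
MODAL_RATES = {
    "H100": 0.001097,
    "A100 80GB": 0.000694,
    "L40S": 0.000542,
    "T4": 0.000164,
}


def gpu_hours(credit_usd: float, rate_per_second: float) -> float:
    """Hours of GPU time a credit buys, counting GPU charges only."""
    return credit_usd / rate_per_second / 3600


if __name__ == "__main__":
    for gpu, rate in MODAL_RATES.items():
        print(f"$30 covers up to {gpu_hours(30, rate):.1f} hours of {gpu} time")
```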

5. Decision guidance

Lowest H100 per-second rate: fal at $0.0005/s, well below Modal and Replicate.
Bursty workloads with idle gaps: Modal, which bills only active compute and never idle resources.
Want recurring free credits to prototype: Modal ($30/month on Starter).
Simplest public-model deployment: Replicate, accepting the highest per-second rate for the single-API-call convenience.
Always check idle billing: the cheapest rate can lose to a platform that does not bill idle and setup time on a bursty workload.

Re-verify each pricing page before committing; GPU rates and hardware availability move with supply and demand. For the broader AI-vendor cost picture, see the cheapest LLM API ranking and the 2026 AI solopreneur stack.

All rate figures verified against official pricing pages as of 2026-05-25.

Frequently asked questions

Which is cheapest for GPU inference in 2026: Replicate, fal, or Modal?

On the published per-second H100 rate, fal is the cheapest at $0.0005 per second, against Modal at $0.001097 and Replicate at $0.001525, verified on each vendor's pricing as of May 2026. But the headline rate is only part of the cost: how each platform bills idle and setup time matters as much. Modal sits in the middle on rate, bills only active compute, and gives free monthly credits; Replicate has the highest per-second rate but offers the simplest public-model deployment.

Do Replicate, fal, and Modal bill per second?

Yes, all three bill GPU compute per second of use, which is the right granularity for bursty inference workloads. Replicate publishes per-second rates by hardware type, fal publishes per-second rates per GPU, and Modal bills per second for GPU, CPU, and memory separately. The difference is what counts as billable time: Replicate's private models can bill for setup and idle as well as active processing, while Modal bills only actual compute time with no charge for idle resources.

Does Modal give free credits?

Yes. Modal's Starter plan is free and includes $30 per month in free credits, and its Team plan ($250 per month plus compute) includes $100 per month in credits, verified on Modal's pricing as of May 2026. Replicate's pricing page does not advertise a recurring free credit, and fal's pricing page does not list free credits. So for a developer who wants to experiment without an upfront bill, Modal's recurring monthly free credits are a distinguishing feature.

Sources

Primary sources only. No vendor-marketing blogs or aggregated secondary claims.

1 Replicate — Pricing (per-second GPU: H100 $0.001525/s, A100 80GB $0.001400/s, L40S $0.000975/s, T4 $0.000225/s) — accessed 2026-05-25
2 fal — Pricing (per-second GPU: H100 $0.0005/s, H200 $0.0006/s, A100 40GB $0.0003/s) — accessed 2026-05-25
3 Modal — Pricing (per-second GPU: H100 $0.001097/s, A100 80GB $0.000694/s, L40S $0.000542/s, T4 $0.000164/s; $30/mo free credits on Starter) — accessed 2026-05-25
