1. Scope
Runs a two-proportion z-test on binary conversion data and reports p-value, confidence interval, and observed lift. It is not a Bayesian engine and does not correct for peeking, multiple comparisons, or sequential analysis.
2. Inputs and outputs
Inputs
- controlVisitors number
- controlConversions number
- variantVisitors number
- variantConversions number
- alpha percent default: 5
Significance threshold.
Outputs
- zScore
Two-proportion z statistic.
- pValue
Two-sided p-value.
- liftPercent
(variantRate − controlRate) / controlRate × 100: the relative change in conversion rate, in percent.
- isSignificant
True iff pValue < alpha, with alpha converted from a percent to a proportion (5 → 0.05).
Engine source: src/lib/ab-test-significance-calculator/engine.ts
3. Formula / scoring logic
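Let A be the control arm and B the variant arm, with n visitors, x conversions, and observed rate p = x / n in each arm.
p_pooled = (xA + xB) / (nA + nB)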
SE = sqrt(p_pooled * (1 - p_pooled) * (1/nA + 1/nB))
z = (pB - pA) / SE
p_value = 2 * (1 - Φ(|z|))
4. Assumptions
- Samples are independent and randomly assigned.
- Visitor counts are large enough for the normal approximation (rule of thumb: np ≥ 10 and n(1−p) ≥ 10 in both arms).
- Two-sided test at a fixed alpha chosen up front, with no sequential-testing correction.
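The sample-size rule of thumb above can be checked before running the test. The sketch below is illustrative only: the helper names armMeetsRuleOfThumb and normalApproximationOk are hypothetical and are not taken from engine.ts, and it assumes the rule is applied to each arm's observed rate, so n·p is simply the conversion count and n·(1−p) the non-conversion count.

```typescript
// Hypothetical helpers, not part of the documented engine.
// With the observed rate p = conversions / visitors:
//   n * p       = conversions
//   n * (1 - p) = visitors - conversions
function armMeetsRuleOfThumb(
  visitors: number,
  conversions: number,
  minCount: number = 10
): boolean {
  const successes = conversions;
  const failures = visitors - conversions;
  return successes >= minCount && failures >= minCount;
}

// The approximation is only considered acceptable when BOTH arms pass.
function normalApproximationOk(
  controlVisitors: number,
  controlConversions: number,
  variantVisitors: number,
  variantConversions: number
): boolean {
  return (
    armMeetsRuleOfThumb(controlVisitors, controlConversions) &&
    armMeetsRuleOfThumb(variantVisitors, variantConversions)
  );
}
```

A caller might surface a warning, or fall back to an exact test, whenever normalApproximationOk returns false.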
5. Data sources
6. Known limitations
- Peeking at interim results and stopping early inflates the false-positive rate. Fix the sample size up front, or use a method designed for continuous monitoring (mSPRT, Bayesian bandit) instead.
- The folk claim that "90% of A/B tests are inconclusive" has no peer-reviewed source; we do not cite it. Power your experiments to detect a lift you would actually act on.
- Two-proportion z-test is unreliable for very small counts. For small n, use Fisher's exact test.
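For the small-count case, a two-sided Fisher's exact test can be sketched as follows. This is a minimal illustration, not something the calculator is documented to do: the function name fisherExactTwoSided is hypothetical, and the two-sided p-value is assumed to follow the common convention of summing the probabilities of all tables that are no more likely than the observed one, with the margins held fixed.

```typescript
// Hypothetical sketch of a two-sided Fisher's exact test on a 2x2 table.
// Arms: A (nA visitors, xA conversions) and B (nB visitors, xB conversions).
function fisherExactTwoSided(nA: number, xA: number, nB: number, xB: number): number {
  const total = nA + nB;
  const conversions = xA + xB;

  // logFact[i] = ln(i!), built cumulatively to avoid overflow.
  const logFact: number[] = [0];
  for (let i = 1; i <= total; i++) {
    logFact.push(logFact[i - 1] + Math.log(i));
  }
  const logChoose = (n: number, k: number): number =>
    logFact[n] - logFact[k] - logFact[n - k];

  // Hypergeometric log-probability that arm A holds exactly k of the conversions.
  const logProb = (k: number): number =>
    logChoose(conversions, k) +
    logChoose(total - conversions, nA - k) -
    logChoose(total, nA);

  // Range of k that is feasible given the fixed margins.
  const kMin = Math.max(0, nA - (total - conversions));
  const kMax = Math.min(conversions, nA);

  const observed = Math.exp(logProb(xA));
  let pValue = 0;
  for (let k = kMin; k <= kMax; k++) {
    const prob = Math.exp(logProb(k));
    // Small relative tolerance so ties with the observed table are counted.
    if (prob <= observed * (1 + 1e-7)) {
      pValue += prob;
    }
  }
  return Math.min(1, pValue);
}
```

The loop is linear in the smaller margin, so this is only practical for the small samples where the z-test is unreliable in the first place.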
7. Reproducibility
Input
A: 5,000 visitors / 250 conversions; B: 5,000 visitors / 300 conversions; alpha = 5%.
Expected output
rateA = 5%, rateB = 6%, z ≈ 2.19, p ≈ 0.028, lift = 20%, significant at α = 5%.
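The worked example can be reproduced independently with a short sketch of the formulas in section 3. It is not the code in engine.ts: the names erf, normalCdf, and twoProportionZTest are hypothetical, and Φ is approximated with the Abramowitz-Stegun formula 7.1.26 for erf (absolute error around 1.5e-7), which is an assumption about precision rather than a statement of how the engine computes it.

```typescript
// Abramowitz-Stegun 7.1.26 approximation of the error function.
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal CDF, Φ(z).
function normalCdf(z: number): number {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

interface ZTestResult {
  rateA: number;
  rateB: number;
  zScore: number;
  pValue: number;
  liftPercent: number;
  isSignificant: boolean;
}

// Pooled two-proportion z-test; alphaPercent is a percent (5 means 0.05).
// Note: if the pooled rate is 0 or 1 the standard error is 0 and z is not finite.
function twoProportionZTest(
  nA: number,
  xA: number,
  nB: number,
  xB: number,
  alphaPercent: number = 5
): ZTestResult {
  const rateA = xA / nA;
  const rateB = xB / nB;
  const pooled = (xA + xB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  const zScore = (rateB - rateA) / se;
  const pValue = 2 * (1 - normalCdf(Math.abs(zScore)));
  const liftPercent = ((rateB - rateA) / rateA) * 100;
  return {
    rateA,
    rateB,
    zScore,
    pValue,
    liftPercent,
    isSignificant: pValue < alphaPercent / 100,
  };
}

// Inputs from the reproducibility example: A = 250 / 5,000, B = 300 / 5,000.
const result = twoProportionZTest(5000, 250, 5000, 300, 5);
console.log(result);
```

The printed fields can be compared against the expected output above, allowing for rounding.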
8. Change log
- 2026-04-24 methodology page first published.