1. Scope
Runs a two-proportion z-test on binary conversion data and reports p-value, confidence interval, and observed lift. It is not a Bayesian engine and does not correct for peeking, multiple comparisons, or sequential analysis.
2. Inputs and outputs
Inputs
- controlVisitors number
- controlConversions number
- variantVisitors number
- variantConversions number
- alpha percent default: 5
Significance threshold.
Outputs
- zScore
Two-proportion z statistic.
- pValue
Two-sided p-value.
- liftPercent
(variantRate − controlRate) / controlRate, expressed as a percent.
- isSignificant
True iff pValue < alpha / 100 (alpha is entered as a percent).
Engine source: src/lib/ab-test-significance-calculator/engine.ts
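As an illustration of the input contract above, the following is a minimal TypeScript sketch of an input type and some sanity checks. The `AbTestInput` interface and the `validateInput` helper are hypothetical names, and the specific rules (whole non-negative counts, conversions not exceeding visitors, alpha strictly between 0 and 100) are assumptions for this sketch, not a description of what engine.ts actually enforces.

```typescript
// Hypothetical input shape mirroring the documented field names.
interface AbTestInput {
  controlVisitors: number;
  controlConversions: number;
  variantVisitors: number;
  variantConversions: number;
  alpha?: number; // percent, assumed to default to 5
}

// Returns a list of human-readable problems; an empty list means the input looks usable.
// These rules are assumptions for illustration, not the engine's documented behaviour.
function validateInput(input: AbTestInput): string[] {
  const errors: string[] = [];
  const counts: Array<[string, number]> = [
    ["controlVisitors", input.controlVisitors],
    ["controlConversions", input.controlConversions],
    ["variantVisitors", input.variantVisitors],
    ["variantConversions", input.variantConversions],
  ];
  counts.forEach(([name, value]) => {
    if (!Number.isInteger(value) || value < 0) {
      errors.push(`${name} must be a non-negative integer`);
    }
  });
  if (input.controlVisitors === 0 || input.variantVisitors === 0) {
    errors.push("both arms need at least one visitor");
  }
  if (input.controlConversions > input.controlVisitors) {
    errors.push("controlConversions cannot exceed controlVisitors");
  }
  if (input.variantConversions > input.variantVisitors) {
    errors.push("variantConversions cannot exceed variantVisitors");
  }
  const alpha = input.alpha ?? 5;
  if (!(alpha > 0 && alpha < 100)) {
    errors.push("alpha must be a percent strictly between 0 and 100");
  }
  return errors;
}
```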
3. Formula / scoring logic
p_pooled = (xA + xB) / (nA + nB)
SE = sqrt(p_pooled * (1 - p_pooled) * (1/nA + 1/nB))
z = (pB - pA) / SE
p_value = 2 * (1 - Φ(|z|))
4. Assumptions
- Samples are independent and randomly assigned.
- Visitor counts are large enough for the normal approximation (rule of thumb: np ≥ 10 and n(1−p) ≥ 10 in both arms).
- Two-sided test at a fixed alpha entered up front — no sequential-testing correction.
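The normal-approximation rule of thumb can be checked before running the test. A minimal sketch, assuming p is taken as each arm's observed rate, so that np is simply the conversion count and n(1−p) the non-conversion count. The function names are hypothetical and the threshold of 10 comes from the rule of thumb above.

```typescript
// True when one arm has at least `minCount` conversions and `minCount` non-conversions.
// With p estimated as conversions / visitors, n*p equals conversions and
// n*(1-p) equals visitors - conversions.
function armMeetsRuleOfThumb(conversions: number, visitors: number, minCount = 10): boolean {
  return conversions >= minCount && visitors - conversions >= minCount;
}

// The rule must hold in both arms for the z-test to be considered reasonable.
function normalApproximationOk(
  xA: number, nA: number,
  xB: number, nB: number,
  minCount = 10,
): boolean {
  return armMeetsRuleOfThumb(xA, nA, minCount) && armMeetsRuleOfThumb(xB, nB, minCount);
}
```

For example, `normalApproximationOk(250, 5000, 300, 5000)` passes, while an arm with 3 conversions out of 40 visitors does not.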
5. Data sources
- Agresti & Coull (1998), Approximate is Better than Exact for Interval Estimation.
6. Known limitations
- Peeking inflates false-positive rate. Fix the sample size up front or use a sequential-testing method (mSPRT, Bayesian bandit) instead.
- The folk claim that "90% of A/B tests are inconclusive" has no peer-reviewed source; we do not cite it. Power your experiments to detect a lift you would actually act on.
- Two-proportion z-test is unreliable for very small counts. For small n, use Fisher's exact test.
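For the small-count case, a two-sided Fisher's exact test can be sketched in a few lines. This is an illustrative implementation, not part of the shipped engine. The function names are hypothetical, and the two-sided p-value uses the common convention of summing every table probability that is no larger than the observed one (other conventions exist).

```typescript
// Cumulative log-factorials: lf[i] = ln(i!). Working in logs avoids overflow.
function logFactorials(n: number): number[] {
  const lf = new Array<number>(n + 1).fill(0);
  for (let i = 1; i <= n; i++) lf[i] = lf[i - 1] + Math.log(i);
  return lf;
}

// Two-sided Fisher's exact p-value for conversions xA of nA versus xB of nB.
function fisherExactTwoSided(xA: number, nA: number, xB: number, nB: number): number {
  const total = nA + nB;
  const conversions = xA + xB;
  const lf = logFactorials(total);
  const logChoose = (n: number, k: number): number => lf[n] - lf[k] - lf[n - k];
  // Hypergeometric log-probability that arm A holds k of the conversions,
  // with both margins (arm sizes and total conversions) held fixed.
  const logProb = (k: number): number =>
    logChoose(nA, k) + logChoose(nB, conversions - k) - logChoose(total, conversions);

  const kMin = Math.max(0, conversions - nB);
  const kMax = Math.min(nA, conversions);
  const observed = logProb(xA);
  let pValue = 0;
  for (let k = kMin; k <= kMax; k++) {
    const lp = logProb(k);
    // Small tolerance so tables tied with the observed one are counted.
    if (lp <= observed + 1e-9) pValue += Math.exp(lp);
  }
  return Math.min(1, pValue);
}
```

As a hand-checkable case, 3 of 4 conversions against 1 of 4 gives table probabilities 1/70, 16/70, 36/70, 16/70, 1/70, so the two-sided p-value is 34/70.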
7. Reproducibility
Input
A: 5,000 visitors / 250 conversions; B: 5,000 visitors / 300 conversions; alpha = 5%.
Expected output
rateA = 5%, rateB = 6%, z ≈ 2.19, p ≈ 0.028, lift = 20%, significant at α = 5%.
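To reproduce this case independently of the site, the formulas in section 3 can be sketched in TypeScript as follows. This is not the code in engine.ts. The function names and the result shape are hypothetical, and Φ is approximated with the Abramowitz and Stegun erf formula 7.1.26 (absolute error around 1.5e-7), which may differ from whatever the engine uses.

```typescript
// Hypothetical result shape for illustration.
interface ZTestResult {
  rateA: number;
  rateB: number;
  zScore: number;
  pValue: number;
  liftPercent: number;
  isSignificant: boolean;
}

// Standard normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation.
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 +
      t * (-0.284496736 +
        t * (1.421413741 +
          t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Pooled two-proportion z-test, two-sided. alphaPercent is a percent (5 means 0.05).
// Note: if the pooled rate is 0 or 1 the standard error is 0 and z is not finite.
function twoProportionZTest(
  xA: number, nA: number,
  xB: number, nB: number,
  alphaPercent = 5,
): ZTestResult {
  const rateA = xA / nA;
  const rateB = xB / nB;
  const pPooled = (xA + xB) / (nA + nB);
  const se = Math.sqrt(pPooled * (1 - pPooled) * (1 / nA + 1 / nB));
  const zScore = (rateB - rateA) / se;
  const pValue = 2 * (1 - normalCdf(Math.abs(zScore)));
  const liftPercent = ((rateB - rateA) / rateA) * 100;
  return {
    rateA,
    rateB,
    zScore,
    pValue,
    liftPercent,
    isSignificant: pValue < alphaPercent / 100,
  };
}

const result = twoProportionZTest(250, 5000, 300, 5000, 5);
console.log(result);
```

Running it with the input above should land close to the expected output, up to rounding and the accuracy of the Φ approximation.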
8. Change log
- 2026-04-24 methodology page first published.
Worked example
Run live against the same engine this site ships (/engines/ab-test-significance-calculator.js).
The inputs and outputs below are recomputed on every build and independently re-verified in CI; they are never hand-authored.
Input
- tool: ab_test_significance
- visitors_a: 5000
- conversions_a: 250
- visitors_b: 5000
- conversions_b: 285
- confidence_level: 95
Output
- rateA: 0.05
- rateB: 0.057
- relativeLift: 14
- zScore: 1.5554
- pValue: 0.1199
- conclusion: Not Significant
- confidenceLevel: 95
- requiredSampleSize: 16224
- powerMessage: Not significant yet. Continue the test or increase traffic to reach a reliable conclusion.
Frequently asked questions
- What does the A/B Test Significance Calculator calculate?
- Runs a two-proportion z-test on binary conversion data and reports p-value, confidence interval, and observed lift. It is not a Bayesian engine and does not correct for peeking, multiple comparisons, or sequential analysis.
- What inputs does the A/B Test Significance Calculator need?
- It takes 5 inputs: controlVisitors, controlConversions, variantVisitors, variantConversions, alpha (default 5). Outputs returned: zScore, pValue, liftPercent, isSignificant.
- What formula does the A/B Test Significance Calculator use?
- The exact computation is: p_pooled = (xA + xB) / (nA + nB); SE = sqrt(p_pooled * (1 - p_pooled) * (1/nA + 1/nB)); z = (pB - pA) / SE; p_value = 2 * (1 - Φ(|z|))
- Can I verify the A/B Test Significance Calculator with a worked example?
- Yes. With A: 5,000 visitors / 250 conversions; B: 5,000 visitors / 300 conversions; alpha = 5%, the tool returns rateA = 5%, rateB = 6%, z ≈ 2.19, p ≈ 0.028, lift = 20%, significant at α = 5%.
- Where does the A/B Test Significance Calculator get its benchmark data?
- Reference data is sourced from: Agresti & Coull (1998), Approximate is Better than Exact for Interval Estimation (as of 1998).
- What can the A/B Test Significance Calculator not tell me?
- Known limitations: Peeking inflates false-positive rate. Fix the sample size up front or use a sequential-testing method (mSPRT, Bayesian bandit) instead. The folk claim that "90% of A/B tests are inconclusive" has no peer-reviewed source; we do not cite it. Power your experiments to detect a lift you would actually act on. Two-proportion z-test is unreliable for very small counts. For small n, use Fisher's exact test.