aibizhub
Structured methodology As of 2026-04-24

How A/B Test Significance Calculator works

What the tool assumes, what data it pulls from, and what it cannot tell you.

Education · General business information, not legal, tax, or financial advice.

1. Scope

Runs a two-proportion z-test on binary conversion data and reports p-value, confidence interval, and observed lift. It is not a Bayesian engine and does not correct for peeking, multiple comparisons, or sequential analysis.

2. Inputs and outputs

Inputs

  • controlVisitors number
  • controlConversions number
  • variantVisitors number
  • variantConversions number
  • alpha percent default: 5

    Significance threshold.

Outputs

  • zScore

    Two-proportion z statistic.

  • pValue

    Two-sided p-value.

  • liftPercent

    (variantRate − controlRate) / controlRate, expressed as a percentage.

  • isSignificant

    True iff pValue < alpha / 100 (alpha is entered as a percent, pValue is on the 0 to 1 scale).

Engine source: src/lib/ab-test-significance-calculator/engine.ts

3. Formula / scoring logic

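pA       = xA / nA
pB       = xB / nB
p_pooled = (xA + xB) / (nA + nB)
SE       = sqrt(p_pooled * (1 - p_pooled) * (1/nA + 1/nB))
z        = (pB - pA) / SE
p_value  = 2 * (1 - Φ(|z|))

Here A is the control arm and B is the variant; n is the visitor count, x is the conversion count, and Φ is the standard normal CDF.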
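
The formulas above can be sketched in TypeScript as follows. This is a minimal illustration, not the contents of engine.ts: the function and type names are hypothetical, and the normal CDF uses the Abramowitz and Stegun 7.1.26 approximation to erf, which is an assumption about one reasonable way to evaluate Φ rather than a statement of what the shipped engine does.

```typescript
// Hypothetical names throughout; the site's actual engine.ts is not shown on this page.

// Error function via the Abramowitz-Stegun 7.1.26 polynomial approximation
// (absolute error on the order of 1e-7).
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal CDF, Φ(z).
function normalCdf(z: number): number {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

interface ZTestResult {
  zScore: number;
  pValue: number;
  liftPercent: number;
  isSignificant: boolean;
}

// Pooled two-proportion z-test. alphaPercent is a percent (5 means 0.05).
// No guard for zero visitors or a pooled rate of 0 or 1 (SE would be 0).
function twoProportionZTest(
  nA: number,
  xA: number,
  nB: number,
  xB: number,
  alphaPercent = 5
): ZTestResult {
  const pA = xA / nA;
  const pB = xB / nB;
  const pPooled = (xA + xB) / (nA + nB);
  const se = Math.sqrt(pPooled * (1 - pPooled) * (1 / nA + 1 / nB));
  const zScore = (pB - pA) / se;
  const pValue = 2 * (1 - normalCdf(Math.abs(zScore)));
  const liftPercent = ((pB - pA) / pA) * 100;
  return { zScore, pValue, liftPercent, isSignificant: pValue < alphaPercent / 100 };
}

// Inputs taken from the "Worked example" section of this page,
// which reports zScore 1.5554 and pValue 0.1199 for them.
const result = twoProportionZTest(5000, 250, 5000, 285);
console.log(result.zScore.toFixed(4), result.pValue.toFixed(4), result.isSignificant);
```

The sketch uses the pooled standard error because that is the form the formula section specifies; an unpooled standard error would give a slightly different z.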

4. Assumptions

  • Samples are independent and randomly assigned.
  • Visitor counts are large enough for the normal approximation (rule of thumb: np ≥ 10 and n(1−p) ≥ 10 in both arms).
  • Two-sided test at a fixed alpha entered up front — no sequential-testing correction.
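
The sample-size rule of thumb in the list above can be checked before running the test. The helper below is a hypothetical sketch (its name and the configurable threshold are not part of the tool), and it assumes each arm's observed rate is used for p, so that np is simply the conversion count and n(1−p) the non-conversion count.

```typescript
// Hypothetical helper: checks np >= 10 and n(1 - p) >= 10 for one arm,
// using the arm's observed rate p = conversions / visitors.
function normalApproxOk(visitors: number, conversions: number, minCount = 10): boolean {
  const nonConversions = visitors - conversions; // equals n * (1 - p)
  return conversions >= minCount && nonConversions >= minCount;
}

// The rule has to hold in both arms.
function bothArmsOk(nA: number, xA: number, nB: number, xB: number): boolean {
  return normalApproxOk(nA, xA) && normalApproxOk(nB, xB);
}

console.log(bothArmsOk(5000, 250, 5000, 300)); // true: every count is well above 10
console.log(bothArmsOk(40, 3, 40, 6));         // false: fewer than 10 conversions per arm
```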

5. Data sources

  • Agresti & Coull (1998), Approximate is Better than Exact for Interval Estimation (as of 1998).

6. Known limitations

  • Peeking at interim results inflates the false-positive rate. Fix the sample size up front or use a sequential-testing method (mSPRT, Bayesian bandit) instead.
  • The folk claim that "90% of A/B tests are inconclusive" has no peer-reviewed source; we do not cite it. Power your experiments to detect a lift you would actually act on.
  • The two-proportion z-test is unreliable for very small counts. For small n, use Fisher's exact test.
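
For the small-count case, a two-sided Fisher's exact test can be sketched as below. This is an illustration only and is not part of the tool: the function names are hypothetical, and the two-sided p-value uses the common convention of summing every table whose hypergeometric probability is no larger than that of the observed table.

```typescript
// Table of ln(i!) for i = 0..n, built by cumulative sum to avoid overflow.
function logFactorials(n: number): number[] {
  const table = new Array<number>(n + 1).fill(0);
  for (let i = 1; i <= n; i++) table[i] = table[i - 1] + Math.log(i);
  return table;
}

// Two-sided Fisher's exact p-value for a 2x2 table
// (arm A: xA conversions out of nA; arm B: xB conversions out of nB).
function fisherExactTwoSided(nA: number, xA: number, nB: number, xB: number): number {
  const total = nA + nB;
  const totalConversions = xA + xB;
  const lf = logFactorials(total);
  const logChoose = (n: number, k: number): number => lf[n] - lf[k] - lf[n - k];

  // Hypergeometric probability that arm A holds k of the conversions,
  // with all row and column totals held fixed.
  const prob = (k: number): number =>
    Math.exp(
      logChoose(totalConversions, k) +
        logChoose(total - totalConversions, nA - k) -
        logChoose(total, nA)
    );

  const kMin = Math.max(0, nA - (total - totalConversions));
  const kMax = Math.min(totalConversions, nA);
  const observed = prob(xA);

  let pValue = 0;
  for (let k = kMin; k <= kMax; k++) {
    const p = prob(k);
    // Small relative tolerance so tables tied with the observed one are counted.
    if (p <= observed * (1 + 1e-7)) pValue += p;
  }
  return Math.min(1, pValue);
}

// 3 of 4 versus 1 of 4: the exact two-sided p-value is 34/70.
console.log(fisherExactTwoSided(4, 3, 4, 1).toFixed(4)); // 0.4857
```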

7. Reproducibility

Input
A: 5,000 visitors / 250 conversions; B: 5,000 visitors / 300 conversions; alpha = 5%.

Expected output
rateA = 5%, rateB = 6%, z ≈ 2.19, p ≈ 0.028, lift = 20%, significant at α = 5%.
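
The arithmetic for this input can be traced by hand from the formulas in section 3. The intermediate values below are rounded, and Φ is the standard normal CDF.

```latex
\begin{aligned}
p_A &= 250 / 5000 = 0.05, \qquad p_B = 300 / 5000 = 0.06 \\
p_{\text{pooled}} &= (250 + 300) / (5000 + 5000) = 0.055 \\
SE &= \sqrt{0.055 \times 0.945 \times (1/5000 + 1/5000)} \approx 0.00456 \\
z &= (0.06 - 0.05) / 0.00456 \approx 2.19 \\
p &= 2\,(1 - \Phi(2.19)) \approx 0.028 \\
\text{lift} &= (0.06 - 0.05) / 0.05 = 20\%
\end{aligned}
```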

8. Change log

  • 2026-04-24 methodology page first published.

Worked example

Run live against the same engine this site ships (/engines/ab-test-significance-calculator.js). The inputs and outputs below are recomputed on every build and independently re-verified in CI — they are never hand-authored.

Input

  • tool: ab_test_significance
  • visitors_a: 5000
  • conversions_a: 250
  • visitors_b: 5000
  • conversions_b: 285
  • confidence_level: 95

Output

  • rateA: 0.05
  • rateB: 0.057
  • relativeLift: 14
  • zScore: 1.5554
  • pValue: 0.1199
  • conclusion: Not Significant
  • confidenceLevel: 95
  • requiredSampleSize: 16224
  • powerMessage: Not significant yet. Continue the test or increase traffic to reach a reliable conclusion.

Frequently asked questions

What does the A/B Test Significance Calculator calculate?
Runs a two-proportion z-test on binary conversion data and reports p-value, confidence interval, and observed lift. It is not a Bayesian engine and does not correct for peeking, multiple comparisons, or sequential analysis.
What inputs does the A/B Test Significance Calculator need?
It takes 5 inputs: controlVisitors, controlConversions, variantVisitors, variantConversions, alpha (default 5). Outputs returned: zScore, pValue, liftPercent, isSignificant.
What formula does the A/B Test Significance Calculator use?
The exact computation is: p_pooled = (xA + xB) / (nA + nB); SE = sqrt(p_pooled * (1 - p_pooled) * (1/nA + 1/nB)); z = (pB - pA) / SE; p_value = 2 * (1 - Φ(|z|))
Can I verify the A/B Test Significance Calculator with a worked example?
Yes. With A: 5,000 visitors / 250 conversions; B: 5,000 visitors / 300 conversions; and alpha = 5%, the tool returns rateA = 5%, rateB = 6%, z ≈ 2.19, p ≈ 0.028, lift = 20%, significant at α = 5%.
Where does the A/B Test Significance Calculator get its benchmark data?
Reference data is sourced from: Agresti & Coull (1998), Approximate is Better than Exact for Interval Estimation (as of 1998).
What can the A/B Test Significance Calculator not tell me?
Known limitations: Peeking at interim results inflates the false-positive rate. Fix the sample size up front or use a sequential-testing method (mSPRT, Bayesian bandit) instead. The folk claim that "90% of A/B tests are inconclusive" has no peer-reviewed source; we do not cite it. Power your experiments to detect a lift you would actually act on. The two-proportion z-test is unreliable for very small counts. For small n, use Fisher's exact test.
Business planning estimates — not legal, tax, or accounting advice.