
A/B Test Significance Calculator

Check whether your A/B test results are statistically significant and estimate the sample size needed for reliable conclusions. Two-tailed z-test with a configurable confidence level.

Test Data

Control — Variant A

Test — Variant B

Significance Analysis

P-Value
0.1199
Not Significant
Rate A
5.00%
Rate B
5.70%
Relative Lift
+14.00%
Z-Score
1.5554
Confidence Level
95%
Sample Size / Variant
16,224

Conversion Rate Comparison

Control (A) vs test (B) conversion rates.

Variant A
5.00%
Variant B
5.70%

Recommendation

Not significant yet. Continue the test or increase traffic to reach a reliable conclusion.

Disclaimer: This calculator uses a two-tailed z-test for proportions. It assumes independent samples and fixed sample sizes. For sequential testing, multi-armed bandits, or tests with multiple comparisons, use a dedicated experimentation platform.
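The figures in the Significance Analysis panel follow from the standard two-proportion z-test with a pooled standard error. A minimal Python sketch (not the tool's own implementation) reproduces the headline numbers for the 5,000-visitor, 250-vs-285-conversion scenario shown above:

```python
import math

def ab_significance(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-tailed z-test for two proportions using a pooled standard error."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    # Two-tailed p-value, 2 * (1 - Phi(|z|)), via the complementary error function
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value

rate_a, rate_b, z, p = ab_significance(5000, 250, 5000, 285)
print(f"rate A {rate_a:.2%}, rate B {rate_b:.2%}, z {z:.4f}, p {p:.4f}")
```

For this input it yields z ≈ 1.5554 and p ≈ 0.1199, matching the panel above.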

How to use it

  1. Enter visitors and conversions for control A and variant B, then choose a confidence level. Use 95% for most product and marketing decisions and 99% when the change affects revenue, compliance, or a large user population.
  2. Read both conversion rates, relative lift, z-score, p-value, conclusion, required sample size, and the power message. A Borderline result means the data is close enough that peeking early could easily push you into a false decision.
  3. Interpret significance and effect size together. A result can be statistically significant but too small to matter commercially, while a large-looking lift with a Not Significant label usually means you need more traffic before shipping anything.
  4. Use the required sample size to decide whether to continue, stop, or redesign the experiment. Predefine the minimum lift worth shipping so a tiny 0.1-0.2 point improvement does not consume engineering effort with no meaningful business return.
  5. Re-run only after full business cycles or materially more traffic arrives. Track win rate and realized post-launch lift by experiment type so your testing program learns which kinds of hypotheses actually produce durable gains.
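The required-sample-size figure in step 4 comes from the standard two-proportion power calculation. This sketch assumes two-tailed 95% confidence and 80% power (the z quantiles are hardcoded constants) and lands within a few dozen visitors of the displayed 16,224 for the 5.00% vs 5.70% scenario; the small gap comes from quantile rounding:

```python
import math

def sample_size_per_variant(base_rate, relative_lift,
                            z_alpha=1.9600, z_beta=0.8416):
    """Visitors needed per variant to detect a relative lift.
    z_alpha: two-tailed 95% confidence quantile; z_beta: 80% power quantile."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    numerator = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil(numerator / (p2 - p1) ** 2)

n = sample_size_per_variant(0.05, 0.14)  # the 5.00% -> 5.70% scenario above
print(n)
```

Halving the detectable lift roughly quadruples the required sample, which is why predefining the minimum lift worth shipping matters so much.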

AI Integrations

Contract, discovery endpoints, and developer notes for agent use.

Always available for agents

Tool contract JSON

https://aibizhub.io/contracts/ab-test-significance-calculator.json

Stable input and output contract for this exact tool.

Human review

People can use the browser page to sense-check outputs and charts, but agents should still execute against the contract and discovery endpoints.

{
  "tool": "ab_test_significance",
  "visitors_a": 5000,
  "conversions_a": 250,
  "visitors_b": 5000,
  "conversions_b": 285,
  "confidence_level": 95
}
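Before calling the tool, an agent can run lightweight pre-flight checks on a payload like the one above. The field names come from the example; the allowed confidence levels and integer constraints below are assumptions to confirm against the published contract JSON:

```python
def validate_payload(payload):
    """Pre-flight checks for an ab_test_significance payload.
    Field names match the documented example; the value ranges are
    assumptions -- confirm them against the published contract JSON."""
    required = ["tool", "visitors_a", "conversions_a",
                "visitors_b", "conversions_b", "confidence_level"]
    errors = [f"missing field: {key}" for key in required if key not in payload]
    if not errors:
        if payload["tool"] != "ab_test_significance":
            errors.append("unexpected tool name")
        for arm in ("a", "b"):
            v, c = payload[f"visitors_{arm}"], payload[f"conversions_{arm}"]
            if not (isinstance(v, int) and v > 0):
                errors.append(f"visitors_{arm} must be a positive integer")
            elif not (isinstance(c, int) and 0 <= c <= v):
                errors.append(f"conversions_{arm} must be between 0 and visitors_{arm}")
        if payload["confidence_level"] not in (90, 95, 99):
            errors.append("confidence_level assumed to be 90, 95, or 99")
    return errors

print(validate_payload({
    "tool": "ab_test_significance",
    "visitors_a": 5000, "conversions_a": 250,
    "visitors_b": 5000, "conversions_b": 285,
    "confidence_level": 95,
}))
```

An empty list means the payload passes these local checks; the contract schema remains the source of truth.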

Agent playbook

  1. Resolve A/B Test Significance Calculator from /agent-tools.json and open its contract before execution.
  2. Validate inputs against the contract schema instead of scraping labels from the page UI.
  3. Open the browser page only when a person wants to review charts, assumptions, or related tools.

Agent FAQ

Should ChatGPT, Claude, or another agent click through the UI?

No. Start with /agent-tools.json, then follow the tool's contract URL. The page UI is for human review, not parameter discovery.

When do tools show Quick and Advanced?

Every tool opens in Quick Start first. Advanced Controls keeps the same scenario while revealing additional assumptions and diagnostics, and every tool keeps its AI integrations inline below the instructions.

When should an agent still open the browser page?

Open it when a human wants to sense-check the output, review the chart, or keep exploring related tools after the calculation finishes.

Questions people usually ask
What sample size do I need for a valid A/B test?

It depends on your baseline conversion rate and minimum detectable effect. Using the same two-tailed 95% confidence, 80% power formula this calculator applies, detecting a 20% relative improvement on a 5% baseline (from 5% to 6%) requires approximately 8,200 visitors per variant. Smaller effects require dramatically larger samples: detecting a 10% relative improvement (5% to 5.5%) requires roughly 31,000 per variant.

What is the difference between statistical significance and practical significance?

A test can be statistically significant (very unlikely to be due to chance) but practically insignificant (effect too small to matter). A 0.1% conversion rate improvement may be p<0.01 with 500,000 visitors but generate only $200/month in additional revenue. Always evaluate effect size alongside p-value — significance without magnitude is misleading.

How long should I run an A/B test?

Run for at least 1-2 full business cycles (usually 2-4 weeks minimum) regardless of when significance is reached. Stopping the moment significance first appears sharply inflates the false positive rate: this peeking-at-significance problem can produce 30-50% of results that fail to replicate. Pre-specify the sample size before launching.
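The cost of peeking is easy to demonstrate with an A/A simulation, where both arms share the same true rate, so every "significant" result is a false positive. This sketch (illustrative parameters, not tied to the tool) compares stopping at the first significant interim peek against a single fixed-horizon check:

```python
import math
import random

def simulate_peeking(n_sims=200, peeks=10, batch=1000, rate=0.05, z_crit=1.96):
    """A/A simulation: both arms share the same true conversion rate, so any
    'significant' result is a false positive. Returns the false positive rate
    when stopping at the first significant peek vs. one final check."""
    rng = random.Random(7)  # fixed seed so the run is reproducible
    early_hits = fixed_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = visits = 0
        tripped = sig = False
        for _ in range(peeks):
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            visits += batch  # visitors per arm so far
            pooled = (conv_a + conv_b) / (2 * visits)
            se = math.sqrt(pooled * (1 - pooled) * 2 / visits) or 1.0
            sig = abs(conv_a - conv_b) / visits / se > z_crit
            if sig:
                tripped = True  # a peeking experimenter would stop here
        early_hits += tripped  # significant at ANY interim peek
        fixed_hits += sig      # significant only at the final horizon
    return early_hits / n_sims, fixed_hits / n_sims

early, fixed = simulate_peeking()
print(f"false positives with peeking: {early:.0%}, fixed horizon: {fixed:.0%}")
```

With ten interim looks at a nominal 5% threshold, the any-peek false positive rate typically lands in the 15-25% range, versus roughly 5% for the fixed-horizon check, which is exactly the inflation the answer above warns about.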

Related Resources

Learn the decision before you act

Every link here is tied directly to A/B Test Significance Calculator. Use the explanation, formula, examples, and benchmarks to pressure-test the calculator output from first principles.

Browse all 20 resources

Continue With Related Tools

More in Marketing ROI Engine

Know whether your marketing spend is building value or burning cash.

Read the full Marketing ROI Engine guide →

Decision Workflows

Step-by-step guides that use this tool.

Browse by Use Case

Business planning estimates — not legal, tax, or accounting advice.