Statistically rigorous A/B/n testing for video ad creatives. Stop guessing which ad wins. Run proper experiments, get confidence intervals, and let the data decide.
Built by Sediman AI, because most marketers A/B test wrong.
Most "A/B tests" in ad creative are just:
"We ran two ads for a week and the one with more clicks won."
That's not testing. That's vibing. This tool gives you:
- Statistical significance: z-tests with p-values
- Confidence intervals: know the range, not just the point estimate
- Sample size calculator: know how much data you actually need
- Multi-variant support: A/B/n tests, not just A/B
- Multiple metrics: CTR, CVR, CPA, ROAS
```bash
pip install creative-test-framework
```

Or just grab the single file:

```bash
curl -O https://raw.githubusercontent.com/sediman-ai/creative-test-framework/main/abtest.py
python abtest.py --help
```

Quick start:

```bash
creative-test generate --output my_experiment.csv
creative-test analyze my_experiment.csv --metric ctr --confidence 0.95
```

Output:
```text
===================================================================
🧪 EXPERIMENT: my_experiment
Metric: CTR | Confidence: 95%
Total Impressions: 420,000
===================================================================
Variant        Impr  Clicks  Conv     CTR%     CVR%      CPA   ROAS
-------------------------------------------------------------------
control     140,000   2,800    84   2.000%   3.000%   $38.57   2.14
variant_a   140,000   3,500   122   2.500%   3.486%   $27.05   3.42
variant_b   140,000   2,520    71   1.800%   2.817%   $45.83   1.87

⚡ STATISTICAL COMPARISONS (vs Control):

✅ variant_a vs control
   Z=+4.2145  p=0.000025  lift=+25.0%
   variant_a WINS with +25.0% lift!

⏳ variant_b vs control
   Not yet significant; keep collecting data
```
```bash
# Sample size planning before launch
creative-test plan --baseline-ctr 2.0 --mde 20 --confidence 95 --power 80

# Export analysis results as JSON
creative-test analyze data.csv --json results.json
```

Your CSV should have these columns:
| Column | Required | Description |
|---|---|---|
| `variant` | Yes | Name of the creative variant |
| `impressions` | Yes | Number of impressions served |
| `clicks` | Yes | Number of clicks |
| `conversions` | Yes | Number of conversions |
| `spend` | Yes | Total spend ($) |
| `revenue` | Yes | Total revenue ($) |
| `date` | Optional | Date of observation (for time series) |
Data is automatically aggregated by variant.
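The aggregation step can be pictured with a small stdlib-only sketch. It is not the code in `abtest.py`: the names `aggregate_by_variant`, `ratio`, and `metrics` are hypothetical, it assumes rows are simply summed per variant, and it assumes the common metric definitions CTR = clicks / impressions, CVR = conversions / clicks (consistent with the CVR column in the sample output), CPA = spend / conversions, and ROAS = revenue / spend.

```python
import csv
import io
from collections import defaultdict

# Numeric columns summed per variant (column names from the CSV schema above).
NUMERIC_COLUMNS = ("impressions", "clicks", "conversions", "spend", "revenue")


def aggregate_by_variant(rows):
    """Sum numeric columns per variant; other columns (such as `date`) are ignored."""
    totals = defaultdict(lambda: dict.fromkeys(NUMERIC_COLUMNS, 0.0))
    for row in rows:
        bucket = totals[row["variant"]]
        for column in NUMERIC_COLUMNS:
            bucket[column] += float(row[column])
    return dict(totals)


def ratio(numerator, denominator):
    """Division that returns None instead of raising when the denominator is 0."""
    return numerator / denominator if denominator else None


def metrics(t):
    """Derive CTR, CVR, CPA, and ROAS from one variant's aggregated totals."""
    return {
        "ctr": ratio(t["clicks"], t["impressions"]),     # clicks per impression
        "cvr": ratio(t["conversions"], t["clicks"]),     # conversions per click
        "cpa": ratio(t["spend"], t["conversions"]),      # cost per acquisition
        "roas": ratio(t["revenue"], t["spend"]),         # return on ad spend
    }


# Hypothetical daily rows: two days for control, one for variant_a.
SAMPLE_CSV = """variant,impressions,clicks,conversions,spend,revenue,date
control,1000,20,2,50,100,2024-01-01
control,1000,30,3,50,150,2024-01-02
variant_a,2000,60,6,100,400,2024-01-01
"""

totals = aggregate_by_variant(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for variant, t in totals.items():
    print(variant, metrics(t))
```

Summing raw counts first and dividing afterwards matters: averaging daily CTRs would give every day equal weight regardless of how many impressions it served.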
| Feature | Description |
|---|---|
| Z-test | Two-proportion z-test for statistical significance |
| Wilson CI | Wilson score confidence intervals |
| Sample size | Pre-experiment sample size calculator |
| Multi-variant | A/B/n support: test 3, 5, or 10 variants at once |
| JSON export | Machine-readable results for dashboards |
| Zero deps | Pure Python stdlib; no numpy, scipy, or pandas needed |
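The sample size calculator can be illustrated with the standard normal-approximation formula for comparing two proportions. This is a sketch, not the `creative-test plan` implementation: whether the tool uses this exact formula is an assumption, as is reading `--mde 20` as a relative lift of 20% over the baseline. `sample_size_per_variant` is a hypothetical name.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_mde, confidence=0.95, power=0.80):
    """Impressions needed per variant for a two-sided two-proportion z-test.

    baseline_rate: control rate as a fraction (0.02 for a 2% CTR)
    relative_mde:  smallest relative lift worth detecting (0.20 for +20%)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)                      # quantile for desired power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Roughly mirrors `creative-test plan --baseline-ctr 2.0 --mde 20 --confidence 95 --power 80`,
# assuming --mde is a relative lift.
print(sample_size_per_variant(0.02, 0.20))
```

With a 2% baseline and a +20% relative MDE at 95% confidence and 80% power, this comes out to roughly 21,000 impressions per variant. Because n scales with 1/(p₂ − p₁)², halving the MDE roughly quadruples the required sample.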
We use a two-proportion z-test:
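$$
z = \frac{p_1 - p_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
$$

where $p_1$ and $p_2$ are the observed rates of the two variants being compared, $n_1$ and $n_2$ are their sample sizes (impressions, for CTR), and $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$ is the pooled proportion, with $x_1$ and $x_2$ the success counts (clicks, for CTR). Confidence intervals use the Wilson score method, which is more accurate than the simple normal-approximation (Wald) interval at small sample sizes and extreme rates.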
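A stdlib-only sketch of both calculations is shown here. It illustrates the formulas rather than reproducing `abtest.py`, and the function names are hypothetical. It orders the difference as variant minus control, so a positive z means the variant is ahead (matching the sign in the sample output), and it converts z to a two-sided p-value with `math.erfc`.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x_control, n_control, x_variant, n_variant):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value, relative lift)."""
    p_control = x_control / n_control
    p_variant = x_variant / n_variant
    pooled = (x_control + x_variant) / (n_control + n_variant)  # p-hat in the formula
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_variant))
    if se == 0:
        raise ValueError("z is undefined when the pooled rate is 0 or 1")
    z = (p_variant - p_control) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # same as 2 * (1 - Phi(|z|))
    lift = (p_variant - p_control) / p_control  # relative lift vs control
    return z, p_value, lift


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a single proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width


# Hypothetical counts: 200 vs 260 clicks on 10,000 impressions each.
z, p_value, lift = two_proportion_z_test(200, 10_000, 260, 10_000)
low, high = wilson_interval(200, 10_000)
print(f"z={z:+.3f}  p={p_value:.4f}  lift={lift:+.1%}")
print(f"control CTR 95% CI: {low:.3%} to {high:.3%}")
```

For these made-up counts (2.0% vs 2.6% CTR) the sketch gives z of about 2.83 and a p-value of roughly 0.005. Unlike the Wald interval, the Wilson interval never extends below 0 or above 1, even for zero successes.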
- Declare a winner only when p < 0.05 AND you've hit your pre-calculated sample size
- Don't stop early just because p < 0.05 (the peeking problem)
- Don't run a test for less than 7 days (day-of-week effects)
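The peeking problem is easy to see in a simulation. This is a hypothetical illustration, not part of `abtest.py`: it runs A/A tests, where both arms share the same true CTR, checks significance after every batch of impressions, and compares the false-positive rate of stopping at the first |z| > 1.96 with that of checking only once at the planned end. To stay fast and dependency-free it approximates each batch's click count with a normal draw instead of simulating individual impressions; the batch size, number of looks, and seed are arbitrary assumptions.

```python
import math
import random

CRITICAL_Z = 1.96  # two-sided 5% threshold


def z_stat(x1, n1, x2, n2):
    """Pooled two-proportion z statistic; 0.0 if the standard error is zero."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se if se > 0 else 0.0


def simulate_aa_tests(n_tests=2000, looks=20, batch=5000, rate=0.02, seed=7):
    """Return (fixed-horizon false-positive rate, peek-every-batch false-positive rate)."""
    rng = random.Random(seed)
    mean = batch * rate
    sd = math.sqrt(batch * rate * (1 - rate))  # normal approximation to Binomial(batch, rate)
    fixed_hits = peeking_hits = 0
    for _ in range(n_tests):
        xa = xb = n = 0
        ever_significant = False
        for _ in range(looks):
            xa += max(0, round(rng.gauss(mean, sd)))
            xb += max(0, round(rng.gauss(mean, sd)))
            n += batch
            if abs(z_stat(xa, n, xb, n)) > CRITICAL_Z:
                ever_significant = True  # a peeker would have stopped and declared a winner
        if ever_significant:
            peeking_hits += 1
        if abs(z_stat(xa, n, xb, n)) > CRITICAL_Z:  # single check at the planned end
            fixed_hits += 1
    return fixed_hits / n_tests, peeking_hits / n_tests


fixed_rate, peeking_rate = simulate_aa_tests()
print(f"check once at the end:   {fixed_rate:.1%} false positives")
print(f"check after every batch: {peeking_rate:.1%} false positives")
```

The fixed-horizon rate should land near the nominal 5%, while checking after every batch produces a false-positive rate several times higher, even though no variant is actually better.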
The hard part of A/B testing isn't the math: it's creating enough variants to test. That's where Sediman comes in. Generate 50 video ad variants from text prompts in the time it takes to brew coffee.
MIT © Sediman AI