aibizhub
Hand-written methodology · As of 2026-04-24

How Ship-or-Kill Decision Score works

What the tool assumes, what data it pulls from, and what it cannot tell you.

Education · General business information, not legal, tax, or financial advice.

1. Scope

Ship-or-Kill ranks a bootstrapped micro-SaaS or side project across five dimensions — traction, economics, momentum, market, efficiency — and outputs an ordinal verdict: SHIP, ITERATE, or KILL. It is a pattern-matched editorial heuristic, not a Bayesian model. It does not predict revenue, model founder psychology, weigh macro conditions, or account for the dynamics of an equity-funded company. It is calibrated — loosely — against public postmortems of bootstrapped micro-SaaS projects, not against a labelled dataset of successful vs failed ventures.

2. Inputs and outputs

Inputs (11 fields): hoursSpent, hoursPerWeek, payingCustomers, mrr, freeUsers, wowGrowth, monthlyCost, monthlyRevenue, competitors, distribution (one of organic / paid / community / none / not_thought), and differentiation (unique / better_ux / cheaper / not_different / clone).

Outputs: a composite 0–100 score, a verdict band (SHIP ≥ 70, ITERATE 40–69, KILL < 40), per-dimension sub-scores with prose insights, a one-paragraph summary, and up to three prioritised next steps derived from the two weakest dimensions.

Engine source: src/lib/ship-or-kill-score/engine.ts. All scoring gates, weights, and prose templates live in that one file.
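As a rough illustration, the fields listed above could be typed as follows. The field names and enum values come from this section and from the worked example output on this page; the type and interface names are hypothetical and may not match what `engine.ts` actually declares.

```typescript
// Hypothetical type names; enum values are copied from the Inputs list.
type Distribution = "organic" | "paid" | "community" | "none" | "not_thought";
type Differentiation = "unique" | "better_ux" | "cheaper" | "not_different" | "clone";
type Verdict = "SHIP" | "ITERATE" | "KILL";

interface ShipOrKillInput {
  hoursSpent: number;
  hoursPerWeek: number;
  payingCustomers: number;
  mrr: number;
  freeUsers: number;
  wowGrowth: number; // percent: 8 means 8% week over week
  monthlyCost: number;
  monthlyRevenue: number;
  competitors: number;
  distribution: Distribution;
  differentiation: Differentiation;
}

interface DimensionResult {
  name: string;
  score: number;  // 0 to 100
  weight: number; // e.g. 0.35 for Traction
  insight: string;
}

interface ShipOrKillResult {
  composite: number; // 0 to 100
  verdict: Verdict;
  dimensions: DimensionResult[];
  summary: string;
  nextSteps: string[]; // up to three entries
}

// The 11 input field names, in the order the Inputs list gives them.
const INPUT_FIELDS: (keyof ShipOrKillInput)[] = [
  "hoursSpent", "hoursPerWeek", "payingCustomers", "mrr", "freeUsers",
  "wowGrowth", "monthlyCost", "monthlyRevenue", "competitors",
  "distribution", "differentiation",
];

// Example input using the values from the Reproducibility section.
const exampleInput: ShipOrKillInput = {
  hoursSpent: 200, hoursPerWeek: 10, payingCustomers: 15, mrr: 450,
  freeUsers: 500, wowGrowth: 8, monthlyCost: 100, monthlyRevenue: 450,
  competitors: 4, distribution: "organic", differentiation: "better_ux",
};
```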

3. Formula / scoring logic

composite = 0.35 * Traction
          + 0.20 * Economics
          + 0.15 * Momentum
          + 0.15 * Market
          + 0.15 * Efficiency

each dimension is scored 0–100 via threshold gates
(e.g. Traction gets +30 for payingCustomers > 0, +20 for > 10, +10 for > 50)

verdict bands:
  composite >= 70  -> SHIP
  40 <= composite < 70 -> ITERATE
  composite < 40  -> KILL
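The threshold-gate idea can be sketched in a few lines. This models only the three paying-customer gates quoted above; the real Traction dimension may add further gates on other inputs, so treat the helper names and the gate table as illustrative assumptions rather than the engine's code.

```typescript
// A gate awards its points when the value is strictly above the threshold.
interface Gate {
  above: number;
  points: number;
}

// Only the three example gates quoted in the formula block (not the full dimension).
const PAYING_CUSTOMER_GATES: Gate[] = [
  { above: 0, points: 30 },
  { above: 10, points: 20 },
  { above: 50, points: 10 },
];

// Sum the points of every gate the value clears, clamped to the 0..100 range.
function scoreGates(value: number, gates: Gate[]): number {
  let total = 0;
  for (const gate of gates) {
    if (value > gate.above) {
      total += gate.points;
    }
  }
  return Math.min(100, Math.max(0, total));
}

const fiveCustomers = scoreGates(5, PAYING_CUSTOMER_GATES);     // 30
const elevenCustomers = scoreGates(11, PAYING_CUSTOMER_GATES);  // 50
const fiftyOneCustomers = scoreGates(51, PAYING_CUSTOMER_GATES); // 60
```

Because the gates are cumulative steps, every value between two thresholds earns the same points, which is why small input changes often leave a sub-score untouched.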

Weights reflect an editorial prior drawn from public postmortems of bootstrapped projects (Indie Hackers milestone threads, public Acquire.com listings with verified financials, CB Insights' Top Reasons Startups Fail study). The reasoning for each weight:

  • Traction 35% — "do people pay" is the single most discriminating signal in bootstrapped postmortems. A product with paying customers and weak momentum is usually recoverable; a product with zero paying customers after significant time almost never is.
  • Economics 20% — the difference between break-even and burning is the difference between runway and a deadline. Weighted below traction because early-stage zero-cost projects are still rational to continue.
  • Momentum 15% — week-over-week growth is noisy over short horizons; it's a useful secondary signal but easy to over-weight on a lucky week.
  • Market 15% — distribution strategy and differentiation matter, but a well-distributed undifferentiated product can still beat a better one that nobody finds. Reflects that distribution moves more of the outcome than positioning.
  • Efficiency 15% — "how many hours into this" encodes sunk-cost risk. At high hours with no revenue, efficiency becomes a kill signal — reflected in a penalty term inside the dimension, not a bigger weight.
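A minimal sketch of how the weights and verdict bands above combine, assuming the five sub-scores are already computed. Function and constant names are hypothetical. Rounding to the nearest integer is also an assumption, inferred from the worked example on this page, where the raw weighted sum is 39.5 and the reported composite is 40.

```typescript
type Verdict = "SHIP" | "ITERATE" | "KILL";
type Dimension = "Traction" | "Economics" | "Momentum" | "Market" | "Efficiency";

// Weights held as integer percentages so the weighted sum stays exact
// until the single division at the end.
const WEIGHT_PERCENT: Record<Dimension, number> = {
  Traction: 35,
  Economics: 20,
  Momentum: 15,
  Market: 15,
  Efficiency: 15,
};

// Weighted sum of the sub-scores, rounded to the nearest integer (assumed).
function compositeScore(scores: Record<Dimension, number>): number {
  let weighted = 0;
  for (const name of Object.keys(WEIGHT_PERCENT) as Dimension[]) {
    weighted += WEIGHT_PERCENT[name] * scores[name];
  }
  return Math.round(weighted / 100);
}

// Verdict bands: SHIP at 70 and above, ITERATE from 40 to 69, KILL below 40.
function verdictFor(composite: number): Verdict {
  if (composite >= 70) return "SHIP";
  if (composite >= 40) return "ITERATE";
  return "KILL";
}

// Sub-scores from the worked example on this page.
const workedComposite = compositeScore({
  Traction: 30, Economics: 70, Momentum: 20, Market: 60, Efficiency: 20,
});
const workedVerdict = verdictFor(workedComposite); // 40 -> "ITERATE"
```

Since Traction carries 35 of the 100 weight points, a project scoring zero on Traction can reach at most 65, which falls in the ITERATE band no matter how strong the other four dimensions are.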

4. Assumptions

  • Population: solo / bootstrapped micro-SaaS. The gate thresholds (e.g. "50 paying customers" or "10% WoW growth") are meaningful in that population. They are wrong for VC-backed pre-product companies, consumer mobile apps, and enterprise-SaaS where sales cycles dominate.
  • Inputs are honest. Traction scoring rewards paying customers heavily; a user gaming the input to get a SHIP verdict learns nothing.
  • One-off revenue is not MRR. Consulting contracts attached to a product do not count toward MRR for Economics/Momentum scoring.
  • Hours are cumulative. The tool treats "500 hours and zero revenue" as a kill signal regardless of whether those hours were evenly distributed.

5. Data sources

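Weights are editorial, drawn from pattern-matching across public postmortems of bootstrapped projects. The reference material includes the sources named in section 3: Indie Hackers milestone threads, public Acquire.com listings with verified financials, and CB Insights' Top Reasons Startups Fail study.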

6. Known limitations

  • We have not backtested these weights against a labelled dataset of successful vs failed projects. Scores are ordinal (higher is better), not cardinal. A score of 70 does not mean "70% likely to succeed". Calling this a probability would be false precision.
  • What the score cannot tell you: founder psychology (do you still want to do this?), timing (is the market arriving, here, or gone?), macro conditions (is capital cheap, expensive, or off?), and equity/VC-funded dynamics (dilution, board pressure, milestone-based tranches).
  • Edge inputs break the verdict band. Negative runway with paying customers, very high free-user counts with zero paying customers, or extreme outlier metrics push the tool outside its calibrated range. The verdict text is still emitted, but the score loses meaning.
  • Weight drift is possible. The 35/20/15/15/15 split is our current prior; we may revise it as public postmortem evidence accumulates. Every revision will be logged in the change log, not silently shipped.
  • Gate thresholds are coarse. A product at 49 paying customers scores the same as one at 11; one at 51 scores higher. We prefer coarse, defensible gates over curve-fit thresholds that imply non-existent precision.

7. Reproducibility

Input
hoursSpent = 200, hoursPerWeek = 10, payingCustomers = 15, mrr = 450, freeUsers = 500, wowGrowth = 8, monthlyCost = 100, monthlyRevenue = 450, competitors = 4, distribution = organic, differentiation = better_ux.

Expected output
Traction 75 · Economics 90 · Momentum 40 · Market 60 · Efficiency 60. Weighted composite = 0.35 × 75 + 0.20 × 90 + 0.15 × 40 + 0.15 × 60 + 0.15 × 60 = 68.25 ≈ 68 → verdict ITERATE (just below the SHIP threshold). The summary emphasises the weakest dimension (Momentum) and suggests driving WoW growth and interviewing existing paying customers as next steps.

8. Change log

  • 2026-04-24: methodology page first published. Weights documented as 35/20/15/15/15 with rationale for each. Explicit statement that scores are ordinal, not cardinal.

Worked example

Run live against the same engine this site ships (/engines/ship-or-kill-score.js). The inputs and outputs below are recomputed on every build and independently re-verified in CI — they are never hand-authored.

Input

tool: ship_or_kill_score
hours_spent: 200
hours_per_week: 10
paying_customers: 5
mrr: 100
free_users: 50
wow_growth: 5
monthly_cost: 50
monthly_revenue: 100
competitors: 5
distribution: organic
differentiation: better_ux
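The keys in this listing are snake_case, while the Inputs list in section 2 uses camelCase. A small adapter along these lines could bridge the two, assuming the names map one-to-one (an assumption; the page does not say how the conversion is done). Only keys are converted, so enum values such as `better_ux` pass through unchanged. The `tool` entry is left out because it names the tool rather than an input field.

```typescript
// Convert one snake_case key to camelCase, e.g. "hours_spent" -> "hoursSpent".
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z])/g, (_match: string, letter: string) => letter.toUpperCase());
}

// Return a copy of the record with every key converted; values are untouched.
function toCamelKeys(
  record: Record<string, string | number>
): Record<string, string | number> {
  const out: Record<string, string | number> = {};
  for (const key of Object.keys(record)) {
    out[snakeToCamel(key)] = record[key];
  }
  return out;
}

// The worked example input, keyed as in the listing above.
const workedExampleInput = toCamelKeys({
  hours_spent: 200,
  hours_per_week: 10,
  paying_customers: 5,
  mrr: 100,
  free_users: 50,
  wow_growth: 5,
  monthly_cost: 50,
  monthly_revenue: 100,
  competitors: 5,
  distribution: "organic",
  differentiation: "better_ux",
});
```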

Output

composite: 40
verdict: ITERATE
dimensions[0].name: Traction
dimensions[0].score: 30
dimensions[0].weight: 0.35
dimensions[0].insight: 5 paying customers is real signal. Now figure out why they pay and find 50 more like them.
dimensions[1].name: Economics
dimensions[1].score: 70
dimensions[1].weight: 0.2
dimensions[1].insight: You're profitable but not by much. Raise prices or cut costs before scaling.
dimensions[2].name: Momentum
dimensions[2].score: 20
dimensions[2].weight: 0.15
dimensions[2].insight: 5% WoW growth is decent. Sustain it for 8 weeks and you'll have real compounding.
dimensions[3].name: Efficiency
dimensions[3].score: 20
dimensions[3].weight: 0.15
dimensions[3].insight: 10h/week with 200 hours invested. Make sure each hour is moving a metric, not just shipping code.
dimensions[4].name: Market
dimensions[4].score: 60
dimensions[4].weight: 0.15
dimensions[4].insight: 5 competitors in a market where you're not clearly unique. You need a wedge — a specific use case or audience nobody else owns.
summary: 5 paying customers is real signal, but $100/mo MRR won't sustain this. You need to 10x your price, your customer count, or both. Focus on understanding why those 5 customers pay.
nextSteps[0]: Double down on whatever is driving growth. Document and systematize it before it fades.
nextSteps[1]: Track hours by activity type (building vs. selling vs. marketing). Shift ratio toward revenue-generating work.
nextSteps[2]: Set a 30-day checkpoint with specific metrics. If the numbers don't improve, make a hard call.
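Section 2 says the next steps are derived from the two weakest dimensions. In the output above those are Momentum and Efficiency (both 20), which lines up with the first two next steps. The selection could be sketched as below. The function name is hypothetical, and breaking ties by original order is an assumption about the engine, not something this page states.

```typescript
interface DimensionScore {
  name: string;
  score: number;
}

// Return the n lowest-scoring dimensions without mutating the input.
// Array.prototype.sort is stable in ES2019+, so tied scores keep their
// original relative order (assumed tie-breaking rule).
function weakestDimensions(dimensions: DimensionScore[], n: number = 2): DimensionScore[] {
  return dimensions
    .slice()
    .sort((a, b) => a.score - b.score)
    .slice(0, n);
}

// Sub-scores in the order the worked example lists them.
const workedExampleDimensions: DimensionScore[] = [
  { name: "Traction", score: 30 },
  { name: "Economics", score: 70 },
  { name: "Momentum", score: 20 },
  { name: "Efficiency", score: 20 },
  { name: "Market", score: 60 },
];

const weakestNames = weakestDimensions(workedExampleDimensions).map((d) => d.name);
// weakestNames is ["Momentum", "Efficiency"]
```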

Frequently asked questions

What does the Ship-or-Kill Score output?
It ranks a bootstrapped micro-SaaS or side project across five dimensions — traction, economics, momentum, market, efficiency — and outputs an ordinal verdict: SHIP, ITERATE, or KILL. It is a pattern-matched editorial heuristic, not a Bayesian model.
What is the scoring formula?
composite = 0.35 × Traction + 0.20 × Economics + 0.15 × Momentum + 0.15 × Market + 0.15 × Efficiency. Verdict bands: composite ≥ 70 → SHIP, 40–69 → ITERATE, < 40 → KILL.
What does the score not do?
It does not predict revenue, model founder psychology, weigh macro conditions, or account for the dynamics of an equity-funded company. It is calibrated loosely against public postmortems of bootstrapped micro-SaaS projects, not a labelled dataset.
Business planning estimates — not legal, tax, or accounting advice.