aibizhub
Hand-written methodology · As of 2026-04-24

How Ship-or-Kill Decision Score works

What the tool assumes, what data it pulls from, and what it cannot tell you.

1. Scope

Ship-or-Kill ranks a bootstrapped micro-SaaS or side project across five dimensions — traction, economics, momentum, market, efficiency — and outputs an ordinal verdict: SHIP, ITERATE, or KILL. It is a pattern-matched editorial heuristic, not a Bayesian model. It does not predict revenue, model founder psychology, weigh macro conditions, or account for the dynamics of an equity-funded company. It is calibrated — loosely — against public postmortems of bootstrapped micro-SaaS projects, not against a labelled dataset of successful vs failed ventures.

2. Inputs and outputs

Inputs (11 fields): hoursSpent, hoursPerWeek, payingCustomers, mrr, freeUsers, wowGrowth, monthlyCost, monthlyRevenue, competitors, distribution (one of organic / paid / community / none / not_thought), and differentiation (unique / better_ux / cheaper / not_different / clone).

Outputs: a composite 0–100 score, a verdict band (SHIP ≥ 70, ITERATE 40–69, KILL < 40), per-dimension sub-scores with prose insights, a one-paragraph summary, and up to three prioritised next steps derived from the two weakest dimensions.
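As a TypeScript sketch of these shapes (the engine is TypeScript, but the type names ScoreInput, ScoreResult, and Verdict here are illustrative; only the field names and verdict bands come from this page):

```typescript
// Illustrative shapes for the 11 inputs and the scored output.
// Type names are hypothetical; field names are from this page.
type Distribution = "organic" | "paid" | "community" | "none" | "not_thought";
type Differentiation = "unique" | "better_ux" | "cheaper" | "not_different" | "clone";

interface ScoreInput {
  hoursSpent: number;
  hoursPerWeek: number;
  payingCustomers: number;
  mrr: number;          // monthly recurring revenue
  freeUsers: number;
  wowGrowth: number;    // week-over-week growth, percent
  monthlyCost: number;
  monthlyRevenue: number;
  competitors: number;
  distribution: Distribution;
  differentiation: Differentiation;
}

type Verdict = "SHIP" | "ITERATE" | "KILL"; // SHIP >= 70, ITERATE 40-69, KILL < 40

interface ScoreResult {
  composite: number;    // 0-100
  verdict: Verdict;
  subScores: Record<"traction" | "economics" | "momentum" | "market" | "efficiency", number>;
  summary: string;
  nextSteps: string[];  // up to three, from the two weakest dimensions
}

// The section-7 worked example, typed against ScoreInput:
const example: ScoreInput = {
  hoursSpent: 200, hoursPerWeek: 10, payingCustomers: 15, mrr: 450,
  freeUsers: 500, wowGrowth: 8, monthlyCost: 100, monthlyRevenue: 450,
  competitors: 4, distribution: "organic", differentiation: "better_ux",
};
```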

Engine source: src/lib/ship-or-kill-score/engine.ts. All scoring gates, weights, and prose templates live in that one file.

3. Formula / scoring logic

composite = 0.35 * Traction
          + 0.20 * Economics
          + 0.15 * Momentum
          + 0.15 * Market
          + 0.15 * Efficiency

each dimension is scored 0–100 via threshold gates
(e.g. Traction gets +30 for payingCustomers > 0, +20 for > 10, +10 for > 50)

verdict bands:
  composite >= 70  -> SHIP
  40 <= composite < 70 -> ITERATE
  composite < 40  -> KILL
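A minimal TypeScript sketch of the logic above. The weights and bands are the ones on this page; the function names and the gate helper are illustrative, not the engine's actual exports:

```typescript
// Weighted composite over the five 0-100 dimension scores.
const WEIGHTS = {
  traction: 0.35,
  economics: 0.2,
  momentum: 0.15,
  market: 0.15,
  efficiency: 0.15,
} as const;

type Dimensions = { [K in keyof typeof WEIGHTS]: number };

function composite(d: Dimensions): number {
  return (
    WEIGHTS.traction * d.traction +
    WEIGHTS.economics * d.economics +
    WEIGHTS.momentum * d.momentum +
    WEIGHTS.market * d.market +
    WEIGHTS.efficiency * d.efficiency
  );
}

function verdict(score: number): "SHIP" | "ITERATE" | "KILL" {
  if (score >= 70) return "SHIP";
  if (score >= 40) return "ITERATE";
  return "KILL";
}

// The additive threshold-gate pattern, using the traction example above:
function tractionCustomerGate(payingCustomers: number): number {
  let pts = 0;
  if (payingCustomers > 0) pts += 30;
  if (payingCustomers > 10) pts += 20;
  if (payingCustomers > 50) pts += 10;
  return pts;
}
```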

Weights reflect an editorial prior drawn from public postmortems of bootstrapped projects (Indie Hackers milestone threads, public Acquire.com listings with verified financials, CB Insights' Top Reasons Startups Fail study). The reasoning for each weight:

  • Traction 35% — "do people pay" is the single most discriminating signal in bootstrapped postmortems. A product with paying customers and weak momentum is usually recoverable; a product with zero paying customers after significant time almost never is.
  • Economics 20% — the difference between break-even and burning is the difference between runway and a deadline. Weighted below traction because early-stage zero-cost projects are still rational to continue.
  • Momentum 15% — week-over-week growth is noisy over short horizons; it's a useful secondary signal but easy to over-weight on a lucky week.
  • Market 15% — distribution strategy and differentiation both matter, but a well-distributed undifferentiated product can still beat a better product that nobody finds. The weight reflects that distribution moves more of the outcome than positioning does.
  • Efficiency 15% — "how many hours into this" encodes sunk-cost risk. At high hours with no revenue, efficiency becomes a kill signal — reflected in a penalty term inside the dimension, not a bigger weight.
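The Efficiency penalty described above might be sketched as follows. The structure (a sunk-cost penalty folded into the dimension score rather than a larger weight) follows the stated rationale; the specific hour thresholds, base score, and point values are invented for illustration and are not the engine's:

```typescript
// Illustrative only: a sunk-cost penalty inside the Efficiency
// dimension. Base score and thresholds are made up for this sketch.
function efficiencyScore(hoursSpent: number, mrr: number): number {
  let score = 70; // hypothetical base score
  if (mrr === 0) {
    if (hoursSpent > 500) score -= 60;      // heavy hours, no revenue: kill signal
    else if (hoursSpent > 200) score -= 30; // warning zone
  }
  return Math.max(0, score);
}
```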

4. Assumptions

  • Population: solo / bootstrapped micro-SaaS. The gate thresholds (e.g. "50 paying customers" or "10% WoW growth") are meaningful in that population. They are wrong for VC-backed pre-product companies, consumer mobile apps, and enterprise-SaaS where sales cycles dominate.
  • Inputs are honest. Traction scoring rewards paying customers heavily; a user gaming the input to get a SHIP verdict learns nothing.
  • One-off revenue is not MRR. Consulting contracts attached to a product do not count toward MRR for Economics/Momentum scoring.
  • Hours are cumulative. The tool treats "500 hours and zero revenue" as a kill signal regardless of whether those hours were evenly distributed.

5. Data sources

Weights are editorial, drawn from pattern-matching across public bootstrapped postmortems. The reference material is the set named in section 3: Indie Hackers milestone threads, public Acquire.com listings with verified financials, and CB Insights' Top Reasons Startups Fail study.

6. Known limitations

  • We have not backtested these weights against a labelled dataset of successful vs failed projects. Scores are ordinal (higher is better), not cardinal. A score of 70 does not mean "70% likely to succeed". Calling this a probability would be false precision.
  • What the score cannot tell you: founder psychology (do you still want to do this?), timing (is the market arriving, here, or gone?), macro conditions (is capital cheap, expensive, or off?), and equity/VC-funded dynamics (dilution, board pressure, milestone-based tranches).
  • Edge inputs break the verdict band. Negative runway with paying customers, very high free-user counts with zero paying customers, or extreme outlier metrics push the tool outside its calibrated range. The verdict text will still emit, but the score loses meaning.
  • Weight drift is possible. The 35/20/15/15/15 split is our current prior; we may revise it as public postmortem evidence accumulates. Every revision will be logged in the change log, not silently shipped.
  • Gate thresholds are coarse. Under the example traction gates in section 3, a product at 49 paying customers scores the same as one at 11; one at 51 scores higher. We prefer coarse, defensible gates over curve-fit thresholds that imply non-existent precision.

7. Reproducibility

Input
hoursSpent = 200, hoursPerWeek = 10, payingCustomers = 15, mrr = 450, freeUsers = 500, wowGrowth = 8, monthlyCost = 100, monthlyRevenue = 450, competitors = 4, distribution = organic, differentiation = better_ux.

Expected output
Traction 75 · Economics 90 · Momentum 40 · Market 60 · Efficiency 60. Weighted composite = 68.25 → verdict ITERATE (just below the SHIP threshold of 70). The summary emphasises the weakest dimension (Momentum), and the suggested next steps are to drive WoW growth and interview existing paying customers.
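The weighted arithmetic for this fixture, spelled out as runnable TypeScript using the sub-scores and section-3 weights stated above:

```typescript
// Recomputing the composite for the section-7 fixture.
const subScores = {
  traction: 75,
  economics: 90,
  momentum: 40,
  market: 60,
  efficiency: 60,
};

const total =
  0.35 * subScores.traction +  // 26.25
  0.2 * subScores.economics +  // 18
  0.15 * subScores.momentum +  // 6
  0.15 * subScores.market +    // 9
  0.15 * subScores.efficiency; // 9

const band = total >= 70 ? "SHIP" : total >= 40 ? "ITERATE" : "KILL";
// total = 68.25, band = "ITERATE": under the SHIP cutoff by 1.75 points
```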

8. Change log

  • 2026-04-24: methodology page first published. Weights documented as 35/20/15/15/15 with rationale for each. Explicit statement that scores are ordinal, not cardinal.
Business planning estimates — not legal, tax, or accounting advice.