How falsification logic transforms your phase gate from a cheerleading session into a decision engine
In ancient Rome, Aesculapius was the embodiment of a single discipline: diagnosis before treatment. His physicians were trained to falsify their working hypothesis before acting on it. The question was never 'what supports my diagnosis?' It was 'what would prove me wrong, and have I looked for it?' A healer who gathered only the symptoms that confirmed what they already believed was not diagnosing. They were performing theater. The patient sometimes recovered. Often, they did not.
As product managers (PMs) in life science tools, we run stage gates as part of the new product development (NPD) process. And most of us, if we are honest, have sat in rooms where the gate was a diagnosis performed by a physician who had already decided the answer.
The Stage Gate That Approves What It Should Kill
The more common failure is not a PM who built a careless deck. It is a PM who built a very careful one. They know what the committee expects. They know which market sizing methodology will hold up to scrutiny and which framing of the competitive landscape will land without pushback. They construct a business case that is technically defensible on every slide while systematically avoiding the questions that would put the whole thing at risk. The gate opens. The program proceeds. And the market, eventually, delivers the verdict the business case never invited.
You have done the VOC interviews. You have built the business case. Your R&D partner has hit the technical milestones. Your VP is asking for the go-forward call. And the honest answer, the one you have not said out loud yet, is that you have accumulated evidence in favor of proceeding, but you have not seriously tested what would make this fail.
This is not a character flaw. It is a structural one. A gate review that asks PMs to build a business case for the product will, by design, produce PMs who are good at building cases. A case selects evidence that supports a conclusion. That is not how Aesculapius trained his physicians. And it is not how a gate should work.
In life science tools, the consequences compound. A capital instrument program that should have been killed at Feasibility Review consumes $2M to $4M in development spend before the evidence becomes impossible to ignore. A reagent line extension that passes Development Review on a hockey-stick model misses launch targets by 60% and ties up two FTEs in remediation for eighteen months. The gate did not fail because it lacked data. It failed because it asked the wrong question.
The Evidence Standard: Falsification Before Accumulation
The fix is a different question at the center of every gate review. Not 'what supports this?' but 'what would kill this, and have we tested it?' This is Karl Popper's falsification principle applied to commercial decision-making. In Popper's view, a hypothesis has no scientific value unless it could, in principle, be proven wrong. Every major claim in your gate package should be accompanied by the conditions under which it would fail, and by evidence that those conditions were tested.
Consider how your R&D colleagues already operate. When a scientist reports that a cell viability assay hits 95% reproducibility, they do not just show the runs that worked. They show the edge cases: the low-cell-density outliers, the passage-number boundary, the conditions under which the assay starts to break. The commercial gate should hold itself to the same standard.
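One way to make this concrete is to record each claim in the gate package alongside its failure condition and what was found when that condition was tested. The sketch below is a minimal illustration in Python. The `Claim` fields, the `untested_claims` helper, and the example entries are all hypothetical, not part of any standard gate template.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """One claim in a gate package (hypothetical structure for illustration)."""
    statement: str
    failure_condition: Optional[str] = None  # what would prove the claim wrong
    test_result: Optional[str] = None        # what was found when we looked


def untested_claims(claims: list[Claim]) -> list[Claim]:
    """Return claims that lack a stated failure condition or a documented test."""
    return [c for c in claims if not c.failure_condition or not c.test_result]


# Illustrative entries only; the wording is invented for this example.
package = [
    Claim(
        statement="Assay is reproducible across expected cell densities",
        failure_condition="Reproducibility degrades at low cell density",
        test_result="Ran low-density boundary runs; outliers documented",
    ),
    Claim(statement="Core labs will switch from their current workflow"),
]

for claim in untested_claims(package):
    print(f"No falsification test on record: {claim.statement}")
```

A claim that shows up in this list is not necessarily wrong. It simply has not yet been exposed to the conditions that would show it to be wrong.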
The Evidence Tier Framework
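Not all claims in a gate package are equally falsifiable. The framework has two tracks: falsification tests for claims that can be checked against reality before the gate, and methodology audits for claims that cannot.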
| Claim Type | Falsification Test | Evidence Standard |
| --- | --- | --- |
| Technical performance | Define the failure condition. Run the test. | Data with edge cases and stated failure modes. |
| Customer problem hypothesis | Find customers for whom the problem is already solved. What did they use? | VOC with dissenting voices, not just enthusiast quotes. |
| Competitive differentiation | State the condition under which a competitor's product is the better choice. | Head-to-head on the dimension that matters most to the buyer, not the one you win. |
| Market size | Not strictly falsifiable. Apply a methodology audit. | Transparent assumption chain, scenario range (not a point estimate). |
| Revenue forecast | Not strictly falsifiable. Apply a methodology audit. | Beachhead segment modeled with real customer data. No hockey sticks without a named mechanism. |
The distinction in the bottom two rows matters. Falsification logic applies cleanly to technical claims and customer problem hypotheses, because those can be tested against reality before the gate. It applies less cleanly to financial projections, because the conditions are not controlled and the outcome is probabilistic. Demanding falsifiable proof of a revenue forecast is not rigor. It is an excuse to kill everything.
The honest standard for financial claims is methodological transparency: is the assumption chain visible, is the beachhead segment sized with real data, and does the model state what has to be true for it to work?
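As a rough illustration of a transparent assumption chain, the sketch below multiplies a few named assumptions into low, base, and high scenarios rather than a single point estimate. Every figure and assumption name is a placeholder invented for the example, and the simple multiplicative chain is itself an assumption. A real model would substitute the team's own sourced inputs.

```python
from math import prod

# Hypothetical assumption chain: each step is named so a reviewer can challenge it.
# All numbers are placeholders, not market data.
ASSUMPTIONS = {
    "labs_in_beachhead_segment": {"low": 800, "base": 1200, "high": 1500},
    "share_with_the_problem":    {"low": 0.20, "base": 0.35, "high": 0.50},
    "share_willing_to_switch":   {"low": 0.10, "base": 0.20, "high": 0.30},
    "annual_spend_per_lab_usd":  {"low": 15_000, "base": 25_000, "high": 40_000},
}


def scenario_value(scenario: str) -> float:
    """Multiply every assumption in the chain for one scenario."""
    return prod(step[scenario] for step in ASSUMPTIONS.values())


def scenario_range() -> dict[str, float]:
    """Return the modeled annual revenue opportunity for each scenario."""
    return {name: scenario_value(name) for name in ("low", "base", "high")}


for name, value in scenario_range().items():
    print(f"{name:>4}: ${value:,.0f}")
```

Because each step is listed by name, a reviewer can ask which single assumption has to hold for the model to work, which is the question the methodology audit is meant to answer.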
Gate-Specific Evidence Standards
The burden of proof should escalate as the investment grows. A synthetic customer is a useful first-pass pressure test across most claim types. It is a queryable AI representation of your buyer, built from real VOC data, that surfaces objections and challenges assumptions before you commit to deeper investigation. It is a first step, not a substitute for primary evidence. The gate-specific standards below define what is sufficient at each phase and, equally, what crosses into analysis paralysis.
| Gate | Key Falsification Question | Minimum Evidence | Analysis Paralysis Signal |
| --- | --- | --- | --- |
| Feasibility | Have we tested conditions where the technical approach fails, or only where it works? | Experiments designed to find failure modes. Synthetic customer panel to pressure-test the problem hypothesis. | Waiting for n=30 VOC interviews before a technical decision. Run the boundary experiment first. |
| Development | Is the customer problem real enough to drive purchase behavior, or only interesting to researchers who would never buy? | Stated willingness to pay from named accounts. Synthetic panel stress-test of the value proposition before KOL investment. | Redesigning the product based on every objection surfaced. Classify objections: disqualifying vs. in-scope. |
| Launch | Does the go-to-market assumption hold against real sales motion, or only in the deck? | Alpha/beta conversion data, pricing test results, channel feedback with objection documentation. | Running a second beta cohort because the first one raised questions. Questions are the point. |
| Lifecycle | Is continued investment justified by evidence of strategic fit, or by sunk cost? | Market share trend, NPS trajectory, competitive displacement data. | Commissioning a new market study to justify a decision already visible in the trend data. |
What Aesculapius Would Ask at Your Next Gate
Aesculapius did not train his physicians to be slow or pessimistic. He trained them to resist premature conclusions. A good diagnosis is not a cautious one. It is a complete one, the kind that rules out the dangerous possibilities before committing to a path. The stage gate that tests failure conditions before writing the check has done exactly that.
The practical shifts are not large. Add a failure condition column to your evidence table. Require VOC documentation to include dissenting voices. Hold financial models to a methodology standard, not a falsifiability standard. Ask, at every Feasibility gate, 'what did we design specifically to break this?'
The gate that passes the most products is not doing its job. The gate that asks the hardest questions and still passes the product is.
Q: My organization's gate process is already set by a template. How do I introduce falsification logic without redesigning the whole system?
You do not need to redesign anything. The change is in what you populate the evidence fields with, not in the fields themselves. In the 'market opportunity' section, add one sentence stating the condition under which the opportunity would not be worth pursuing, and one sentence stating what you found when you looked for that condition. In the 'technical readiness' section, document the failure modes the R&D team tested alongside the results that passed. This adds two to three sentences per section. It requires a different mindset about what the template is for, not a new template.
Q: What if my committee sees the failure cases and uses them to kill a product that should proceed?
This reflects a governance problem more than an evidence problem. A committee that uses documented failure cases to block a product that has tested and contextualized those cases is applying risk aversion dressed as rigor. Present failure cases with explicit scope statements: 'This failure mode applies to X use case and does not affect our target segment.' A well-structured gate package gives you more protection, not less, because it shows you looked for the problems before spending the money.
Q: We are a small team. Running experiments specifically designed to fail sounds expensive. Where do we start?
Start with the claim that carries the most investment weight. For a reagent program at Feasibility, that is usually the core technical performance claim. For a qPCR detection kit, that might mean testing at the lowest expected template concentration in the noisiest expected sample matrix. You are not running a full validation study. You are answering one question: does this hold where it is most likely to fail? That single experiment, done before the Feasibility gate, is worth more than forty slides of market data, because it tells you whether you have anything worth commercializing at all.