
The Ethics of “Good Enough” Shipping

My first ‘proper’ job was tech support on Compaq Presario PCs. A few models shipped with firmware quirks that sometimes rendered the machines unusable. We knew about the defects at the time of shipping; the business case said the warranty budget could mop up the fallout.

Customers thought otherwise: returns spiked, reputation dipped, and future launches carried a trust tax. We never did get sight of the impact on repeat purchases.

Lesson learned: deliberately budgeting for damage isn’t risk management—it’s gambling with other people’s chips.

This all happened at a time when customers had to call in to find out about the issue, navigate a complex support system, reach the right agent who would agree to send them a disk, then wait up to 30 days for delivery.

A Risk Lens That Actually Scales

Fusing Jeff Bezos’ door test with ISO 31000

“Most decisions are two‑way doors—easy to reverse. Treat the true one‑way doors with extra care.” — Jeff Bezos

Think of decisions as doors. A tiny CSS tweak shipped behind a feature flag is a revolving door—step in, step out, and if metrics dip you close it without anybody noticing. Migrating every customer record to a new encryption scheme is more like a concrete fire door—once it slams shut, reopening means smoke, cost, and panic. The table below shows how ISO 31000 lets us calibrate the ceremony each door deserves, before the real‑world examples that follow bring the point home:

| ISO 31000 Step | Door Overlay | Real‑world Example | Lightweight Artefact |
| --- | --- | --- | --- |
| Context | Identify door type & stakeholders | CSS tweak affects UX team; key rotation affects SRE, Legal, and every customer | 3‑sentence risk brief |
| Risk Identification | List hazards | Visual glitch vs. data corruption | Bullet list in story card |
| Risk Analysis | Likelihood × Impact | Low × Low vs. Low × Catastrophic | T‑shirt size (S/M/L) |
| Risk Evaluation | Compare to appetite via DORA change‑fail % | 1% acceptable vs. 0.1% | CI dashboard gate |
| Risk Treatment | Accept, reduce, share, avoid | Flag & rollback vs. phased migration, extra backups | Release template |
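To make the matrix concrete, here is a minimal sketch in Python of how a team might encode door type and Likelihood × Impact to pick the proportional ceremony. The scoring bands and ceremony labels are illustrative assumptions, not something ISO 31000 prescribes:

```python
from dataclasses import dataclass
from enum import Enum

class Door(Enum):
    TWO_WAY = "2-way"  # cheap to reverse: flag it, roll it back
    ONE_WAY = "1-way"  # expensive or impossible to reverse

@dataclass
class Change:
    name: str
    door: Door
    likelihood: int  # 1 (rare) to 3 (likely)
    impact: int      # 1 (cosmetic) to 3 (catastrophic)

def ceremony(change: Change) -> str:
    """Map door type and Likelihood x Impact onto proportional rigour."""
    score = change.likelihood * change.impact
    if change.door is Door.ONE_WAY or score >= 6:
        return "slow thinking: premortem, FMEA-lite, phased release"
    if score >= 3:
        return "3-sentence risk brief plus CI dashboard gate"
    return "feature flag plus rollback plan"

print(ceremony(Change("CSS tweak", Door.TWO_WAY, 1, 1)))     # revolving door
print(ceremony(Change("key rotation", Door.ONE_WAY, 1, 3)))  # fire door
```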

Rolling out a new pricing algorithm across every market? That’s a one‑way door—revenue curves, customer trust, and even regulatory compliance hang in the balance. Time to activate the slow‑thinking protocol: premortem, FMEA‑lite, rehearsal of rollback, and a phased release schedule.

From MVP to MLP: Minimum Lovable Product

When a product sparks that first “wow, that was smooth!”—the doc auto‑saves before the battery dies, or the shared cursor appears the instant a teammate joins—users feel the platform is working for them, not against them. Those micro‑delights forge emotional equity: early adopters start tweeting praise, filing constructive bugs, and defending you when release 1.1 inevitably wobbles. Put plainly, delight isn’t surface polish; it’s the compound interest that funds future risks and fuels organic growth.

Dropbox’s very first public demo was a video: clearly MVP territory—enough to prove syncing worked. Six months later the private beta included the little blue progress bar and desktop notifications. Those touches made early adopters rave, turning an MVP into an MLP.

| Dimension | MVP | MLP |
| --- | --- | --- |
| Core function | Works on happy path | Handles edge cases & recovery |
| Emotional journey | “Okay, I guess.” | “That felt effortless!” |
| Support load | High (forums buzzing) | Lower (self‑service) |

Why some MVPs stall out. Take Google Wave (2009): the team shipped a bare‑bones collaboration demo hoping users would teach them the killer use‑cases. What early adopters encountered was a maze—no obvious “aha,” clunky onboarding, and unclear value. Churn spiked, media buzz fizzled, and the product was shuttered within a year. The lesson: when the core flow doesn’t delight, users won’t stick around long enough to fuel iteration.

The flip‑side—MLP in action. Basecamp takes the opposite tack: every major release ships a single, highly polished workflow (the “killer flow”), wrapped in crisp copy and tiny touches of joy—threaded inboxes that stay readable, progress bars that finish before attention wanes. Users come to rely on that flow daily and love it, so they forgive v1 gaps and happily lobby for what comes next. Love is an accelerant—and it only takes one delightful flow to ignite it.

Governance Through the DORA Four Metrics

Gene Kim’s novel The Phoenix Project offers a sharp cautionary tale: at Parts Unlimited, roughly one deployment in ten caused a Sev‑1 outage, lead‑time stretched into weeks, and staff morale tanked. By reframing work in terms of value streams, introducing automated tests, and treating the DORA Four Metrics as the team’s scoreboard, they flipped the story within a single quarter—deploy frequency jumped from a handful per month to dozens per day, change‑fail % fell below 5 %, and MTTR shrank from hours to minutes. The takeaway is clear: shining a light on the numbers doesn’t slow shipping—it reveals exactly where to turn the dials for safer speed.

| Metric | Story it tells |
| --- | --- |
| Deploy Frequency | Are we iterating or hoarding? |
| Lead‑time for changes | Is analysis paralysis winning? |
| Change‑fail % | Is risk really under control? |
| MTTR | How fast can we turn a 1‑way door into a 2‑way recovery? |
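As a sketch of how cheaply the scoreboard can start, the four numbers fall out of a plain deploy log. The record shape, the 7‑day window, and the 5 % appetite threshold below are illustrative assumptions, not DORA standards:

```python
from datetime import datetime
from statistics import median

# Hypothetical deploy log: (when, failed?, minutes to restore, lead-time in hours)
deploys = [
    (datetime(2025, 6, 1, 9),  False, 0,  6),
    (datetime(2025, 6, 1, 14), True,  18, 30),
    (datetime(2025, 6, 2, 10), False, 0,  4),
]

WINDOW_DAYS = 7
deploy_frequency = len(deploys) / WINDOW_DAYS             # deploys per day
lead_time_hours  = median(d[3] for d in deploys)          # commit to production
change_fail_pct  = 100 * sum(d[1] for d in deploys) / len(deploys)
mttr_minutes     = median(d[2] for d in deploys if d[1])  # failed deploys only

# The "CI dashboard gate" from the ISO 31000 table: stop shipping when
# change-fail % drifts past the agreed risk appetite.
RISK_APPETITE_PCT = 5.0
if change_fail_pct > RISK_APPETITE_PCT:
    print(f"GATE: change-fail {change_fail_pct:.1f}% exceeds appetite")
```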

Real‑world proof. Etsy’s transformation is legendary. In 2009 the craft marketplace shipped changes only twice a week, and every deploy was a nail‑biter with serious rollback risk. By 2014—after embracing continuous delivery, blameless post‑mortems, and tracking these exact DORA numbers—they were deploying 50+ times per day, change‑fail % fell below 1 %, and MTTR was measured in minutes. The same scoreboard that once exposed the pain became the trophy cabinet teams competed to fill.

Ethical & Societal Guard‑rails

Technology ships into society—not a vacuum. The question isn’t can we deploy, but should we, and on whose terms?

Why “Good Enough” ≠ “Fair Enough”

  • Snap Map (2017) let teens broadcast their real‑time location. Playful for meet‑ups; frightening for stalking victims and runaways.

  • Peloton Tread+ (2021) delivered immersive workouts—then injured several children before a full recall.

  • Ring Doorbells (2019‑20) helped homeowners feel safer while exposing bystanders to constant surveillance.

In every case the risk lived outside the app—once the 1‑way door slammed, rollback was messy or impossible. ISO 31000 tells us that when a hazard crosses the screen boundary (safety, privacy, mental health), we must re‑evaluate context and risk appetite before release, not after.

Four Guard‑rails Before You Hit “Deploy”

| Guard‑rail | Trigger Question | Lightweight Tool | Example in Action |
| --- | --- | --- | --- |
| Stakeholder mapping | Who is affected indirectly? | 15‑min impact wheel | Apple paused CSAM scanning after child‑safety and privacy NGOs raised red flags. |
| Proportional consent | Do users opt‑in by default? | Dark‑pattern checklist | GDPR fines on deceptive cookie banners prove consent theatre won’t fly. |
| Safe defaults | Is the failure mode silent or noisy? | “Off‑until‑confident” flag | Gmail Smart Compose launched to 1 % of English users with easy opt‑out. |
| Transparency loop | How will we surface harm signals fast? | Public changelog & hotline | Stripe posts major‑incident retros within 48 hrs, inviting customer feedback. |
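The “Off‑until‑confident” flag in the Safe defaults row is straightforward to sketch: hash each user into a stable bucket so the feature reaches a deterministic slice of the population, and let an explicit opt‑out always win. This is a generic percentage‑rollout pattern, not Gmail’s actual mechanism; the function name and numbers are hypothetical:

```python
import hashlib

def feature_enabled(user_id: str, rollout_pct: float = 1.0,
                    opted_out: bool = False) -> bool:
    """Off-until-confident: expose a deterministic slice of users."""
    if opted_out:
        return False  # an explicit opt-out always wins
    # Hash the user id into one of 10,000 stable buckets, so the same
    # user always gets the same answer and harm signals stay attributable.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return bucket < rollout_pct * 100  # rollout_pct=1.0 => buckets 0-99 (1 %)

print(feature_enabled("user-42"))         # in or out of the 1 % cohort
print(feature_enabled("user-42", 100.0))  # fully ramped
```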

A 3‑Step Ethical Check‑point (add to your Definition‑of‑Done)

  1. Identify harm vector – physical, emotional, reputational, or data‑driven?

  2. Minimise & mitigate – Can we trial with synthetic data, a dark launch, or granular opt‑in?

  3. Communicate – Plain‑language release notes plus a clear contact route for affected parties.

If any answer remains ambiguous, escalate to an Ethics & Risk Review—a 30‑minute, multi‑discipline call spun up like a Sev‑1 incident. Potential harm is a Sev‑0.
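If your Definition‑of‑Done lives in tooling, the three steps can be checked mechanically. Here is a minimal sketch; the field names and escalation messages are hypothetical, and a real gate would run in CI rather than as a script:

```python
from dataclasses import dataclass, field

HARM_VECTORS = {"physical", "emotional", "reputational", "data-driven"}

@dataclass
class EthicsCheckpoint:
    harm_vectors: set[str] = field(default_factory=set)   # step 1: identify
    mitigations: list[str] = field(default_factory=list)  # step 2: minimise
    release_notes_url: str = ""                           # step 3: communicate
    contact_route: str = ""

def ethics_gate(cp: EthicsCheckpoint) -> str:
    """Return 'ship' only when all three answers are unambiguous;
    anything else escalates like a Sev-1."""
    if not cp.harm_vectors <= HARM_VECTORS:
        return "escalate: unrecognised harm vector"
    if cp.harm_vectors and not cp.mitigations:
        return "escalate: harms identified but unmitigated"
    if not (cp.release_notes_url and cp.contact_route):
        return "escalate: no contact route for affected parties"
    return "ship"

print(ethics_gate(EthicsCheckpoint(
    {"data-driven"}, ["dark launch with synthetic data"],
    "https://example.com/release-notes", "trust@example.com")))
```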

Ethics is the ultimate observability layer—if you can’t see the blast radius, you don’t understand the system.

Frameworks that help: IEEE Ethically Aligned Design (8 principles), the ACM Code of Ethics (“avoid harm”), and the EU AI Act, now phasing in. Map them to your Door Matrix—high‑risk systems equal 1‑way doors by default.

Cognitive & Cultural Bias Checks

Healthy risk‑based shipping lives and dies on team psychology. Blind spots in judgment let small hazards sneak past gatekeepers and turn reversible doors into concrete walls.

| Bias | How it shows up in shipping decisions | Early Signal | Counter‑move |
| --- | --- | --- | --- |
| Expert‑babble bias (“trust me, I know this infra”) | Senior specialist downplays rollback complexity (“Kubernetes handles that”) | Risk register stays blank while juniors stay silent | Run a 5‑minute failure‑story round‑robin: each expert recalls a recent outage in their domain—humility resets optimism. |
| Fixed‑agenda bias | Launch date announced before test scope is agreed; objections framed as “blocking the business” | Sprint goals morph to fit the date rather than the risk | Ask “door or law?” at roadmap reviews—if the item is a 1‑way door, marketing plans around engineering, not vice‑versa. |
| Middle‑way blind spot | Organisation whiplashes between cowboy shipping and gold‑plating | Release cadence swings wildly quarter‑to‑quarter | Rotate a bias referee role—one dev per sprint tracks extreme swings and brings data to retro. |

Narrative: two launches that taught billion‑dollar lessons

  • Microsoft Tay (2016) – Released to Twitter without adversarial testing, absorbed toxic input, became racist in <24 h. Root cause: expert‑babble (“our AI stack is state‑of‑the‑art”) plus fixed‑agenda (marketing wanted a SXSW splash).

  • Knight Capital (2012) – A dormant flag triggered $440 m in rogue trades within 45 minutes. Engineers had warned about the legacy code path but were overruled by “we always hit summer earnings”—classic fixed‑agenda.

Both illustrate that biased cognition, not just raw code, can convert a 2‑way door into a catastrophic 1‑way slam.

Embed bias checks: tack a single question to every pull‑request template—“Which cognitive bias could bite us here, and how are we checking it?” Forced reflection is cheap; disaster is not.

Observability: Define “Good” & “Bad”

Fancy dashboards are useless if the team can’t answer two plain‑English questions before the feature ships:

  1. What does “good” feel like for our users?
    Example: “A dashboard filter runs in < 1 s so the screen feels instant.”

  2. What’s the floor where it turns “bad”?
    Example: “> 5 s for 1 % of sessions = trigger rollback.”

These are context numbers. A trading app’s “bad” might be 100 ms; a photo archive can tolerate 2 s. The point is that everyone—from designer to SRE—knows the smile point and the scream point.
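Written down as executable numbers, the smile point and scream point for the dashboard‑filter example might look like this. The thresholds mirror the examples above; the function and sample data are illustrative:

```python
GOOD_P50_S      = 1.0  # smile point: median filter under 1 s feels instant
BAD_THRESHOLD_S = 5.0  # scream point: a session slower than 5 s...
BAD_SESSION_PCT = 1.0  # ...for more than 1 % of sessions triggers rollback

def verdict(session_latencies_s: list[float]) -> str:
    slow = sum(1 for t in session_latencies_s if t > BAD_THRESHOLD_S)
    if 100 * slow / len(session_latencies_s) > BAD_SESSION_PCT:
        return "rollback"  # the floor was crossed: no blame, just act
    p50 = sorted(session_latencies_s)[len(session_latencies_s) // 2]
    return "good" if p50 < GOOD_P50_S else "watch"

print(verdict([0.4, 0.6, 0.8, 6.2]))  # 25 % of sessions past the floor
```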

Lightweight Playbook

  • Write it down in the design doc: Good = X, Bad = Y.

  • Instrument only those signals (don’t drown in metrics).

  • Alert the team when the line is crossed—no blame, just fix or roll back.

If you can’t describe “good” in one tweet, you don’t understand the experience. If you can’t name “bad,” you won’t spot it in prod.

Time‑Box the Slow Thinking

SpaceX famously embraces a rapid‑iteration, rapid‑failure approach, yet before every static fire they run a tightly scoped risk review: 90 minutes, all disciplines. The decision defaults to go unless a blocker surfaces. That constraint keeps innovation moving without skipping due diligence.

Suggested box for software releases:

  1. Premortem – 1 hour: “Imagine it’s the post‑mortem—what killed us?”

  2. FMEA‑lite – 2 hours: Rank failures, assign mitigations (see the sketch after this list).

  3. Go/No‑go – 30 minutes: Stakeholders vote. Silence ≠ consent.

Over‑run? The sponsor makes the call—analysis can’t be infinite.
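For step 2, a whiteboard is enough, but the ranking is simple to script. Classic FMEA multiplies severity, occurrence, and detectability into a Risk Priority Number (RPN); an FMEA‑lite pass can get away with rough 1–5 scores. The failure modes and scores below are invented for illustration:

```python
# Each failure mode gets rough 1-5 scores for severity, occurrence,
# and detectability (5 = hardest to detect before users feel it).
failure_modes = [
    ("pricing algo mischarges a market", 5, 2, 4),
    ("rollback script untested",         4, 3, 2),
    ("dashboard filter regression",      2, 3, 1),
]

# Risk Priority Number = severity x occurrence x detectability.
ranked = sorted(failure_modes, key=lambda f: f[1] * f[2] * f[3], reverse=True)
for name, sev, occ, det in ranked:
    print(f"RPN {sev * occ * det:3d}  {name}")
# The top items get a named owner and a mitigation before the go/no-go.
```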

Putting It Together

  1. Classify the change: Door type × Blast radius.

  2. Choose the bar: MVP, MFP, or MLP.

  3. Apply guard‑rails: Ethical gate + Bias check.

  4. Govern with DORA dashboards.

  5. Observe: Pre‑define “good” & “floor bad.”

  6. Iterate: Ship, measure, learn, repeat.

“Good enough” sounds simple, but getting it right is complex. The secret is proportional rigor—heavy where the blast can level the village, feather‑light where a rollback takes seconds.

Ready to pilot this canvas on a live feature or turn it into a pull‑request template? Ping me; we’ll hard‑wire it into your workflow.

