Minas Tirith Charcoal

A practitioner’s reflection on what really makes recovery work

Over the years, I’ve been involved in more than one IT initiative that drifted into trouble. Different organisations, different technologies, different pressures, but the patterns were surprisingly consistent. What ultimately determined success or failure was rarely a single technical decision. It was the combination of technical discipline and organisational maturity that made recovery possible.

This post distils the most important lessons I’ve learned from recovering enterprise-scale IT projects. It is deliberately technology-agnostic: the specifics change, but the lessons don’t.

Most technical failures start as organisational ones

When a project starts failing, the first instinct is often to blame the technology. The platform is unstable, the architecture is wrong, the tooling isn’t mature. In my experience, those are symptoms, not root causes.

The real problems usually appear earlier:

By the time technical issues explode, the organisational foundations have often already eroded. Recovery starts by acknowledging that reality, without blame.

If you try to fix the technology without fixing how decisions are made, you’re just buying time.

Stabilisation beats optimisation every time

One of the hardest lessons to apply under pressure is this: don’t improve, stabilise.

When a project is in distress, people naturally want to:

That almost always makes things worse.

Successful recoveries I’ve seen all started with the same move:

Only once the environment stopped shaking could meaningful improvements happen.

A stable but imperfect system is infinitely more valuable than a perfect design that never lands.

Transparency is uncomfortable but non-negotiable

Project recovery exposes uncomfortable truths:

The turning point in every successful recovery I’ve been part of was radical transparency:

This often feels risky, especially in hierarchical organisations, but without it trust never resets.

You can’t recover stakeholder confidence with optimism, only with credibility.

Governance must get closer, not heavier

A failing project doesn’t need more bureaucracy; it needs better governance.

What worked best was:

In recovery mode, governance shifts from oversight to active steering. Leaders who stayed engaged, not just informed, made a measurable difference.

Good governance accelerates recovery when it removes ambiguity, not when it adds process.

Culture determines how fast you can turn the ship

Two projects can have identical problems and completely different outcomes. The difference is almost always cultural.

Recovery is dramatically slower when:

Conversely, recovery accelerates when:

No recovery plan survives contact with a broken culture. In recovery, psychological safety is not a nice-to-have; it is a delivery dependency.

Security and risk don’t disappear during recovery

Under pressure, there’s a temptation to “fix it now and secure it later”. That shortcut always comes back with interest.

What worked better:

Recovered projects that ignored this inevitably re-entered crisis mode later, just under a different name. Recovery that creates new risk is just delayed failure.

Small wins rebuild momentum faster than grand plans

One of the most effective recovery techniques I’ve seen is engineering early, visible wins:

These moments matter. They restore belief, both inside the team and among stakeholders, that recovery is real. Momentum is rebuilt incrementally, not announced in slide decks.

Recovery isn’t finished when the project is back on track

Some of the worst relapses I’ve seen happened after recovery was declared successful.

Why?

The strongest organisations treated recovery as a learning event, not just a rescue:

If the organisation doesn’t change, the next project will fail the same way, just faster.


Recovering an IT project is never just a technical exercise. It’s a stress test of leadership, culture, and organisational honesty.

The most important lesson I’ve learned is this:

Projects don’t recover because plans get better. They recover because behaviours change.

Technology matters, but people, structure, and trust matter more.