Cyber Recovery: Hope Is Not a Strategy

TL;DR

94% of ransomware attacks target backups. If you haven't tested your recovery plan, you don't have one.

Adapted from a masterclass at FutureScot Digital Scotland — but the lessons apply to anyone who’d rather not explain to the board why the backups didn’t work.


On 19 November 2024, I ran a masterclass at FutureScot Digital Scotland on cyber recovery strategies. The audience was largely public sector, but the questions afterwards made it clear: everyone is worried about this. And most organisations are worried for good reason — because their disaster recovery plans are, to put it charitably, optimistic fiction.

So here’s the expanded version of that session. If you’re responsible for keeping systems running when everything goes wrong, this is for you.

The Uncomfortable Reality

Let’s not kid ourselves: most organisations are sitting ducks for cyberattacks. Ransomware, supplier outages, and plain old misconfiguration mean that outages aren’t a question of “if” but “when”.

The stats are grim:

  • 97% of unplanned outages last an average of seven hours
  • 94% of ransomware attacks now target backups specifically
  • Only 31% of organisations are confident in their disaster recovery plans

That last number should terrify you. Nearly 70% of organisations know their DR plans probably won’t work. They’re just hoping they won’t have to find out.

If you think your organisation is the exception, you’re probably deluding yourself. I say this with love.

RTO, RPO, and the Fantasy of “Zero Downtime”

Two metrics matter when everything catches fire:

Recovery Time Objective (RTO) — How long can you afford to be offline?

Recovery Point Objective (RPO) — How much data can you afford to lose?

The lower your targets, the more you’ll pay. That’s not a vendor upsell — it’s physics. Continuous replication costs more than daily backups. Instant failover costs more than manual recovery. Set your targets based on a proper business impact analysis, not wishful thinking or vendor PowerPoint slides.

And no, you can’t just write “zero downtime” on a requirements document and expect it to happen. That’s not how any of this works.
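
To make that concrete, here's a minimal sketch (Python, with invented names and numbers) of the kind of sanity check worth running: does the backup schedule you actually have support the RPO you've written down?

```python
from dataclasses import dataclass

@dataclass
class RecoveryTargets:
    name: str
    rto_hours: float  # how long the business can tolerate being offline
    rpo_hours: float  # how much data the business can tolerate losing

def effective_rpo_hours(backup_interval_hours: float) -> float:
    """Worst-case data loss is roughly the gap between successful backups."""
    return backup_interval_hours

# Illustrative: a nightly backup cannot honour a one-hour RPO.
payments = RecoveryTargets(name="payment-processing", rto_hours=4, rpo_hours=1)
backup_interval_hours = 24  # nightly backups

if effective_rpo_hours(backup_interval_hours) > payments.rpo_hours:
    print(
        f"{payments.name}: stated RPO is {payments.rpo_hours}h, but backups every "
        f"{backup_interval_hours}h mean a worst-case loss of ~{backup_interval_hours}h"
    )
```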

Tiering Your Workloads

Here’s where pragmatism beats perfectionism:

Tier | Description                               | Typical RTO | Typical RPO
T1   | Crown jewels — business-critical systems  | Minutes     | Near-zero
T2   | Important but not critical                | Hours       | Hours
T3   | Stuff nobody will miss for a day          | Days        | Daily

Over-engineering everything to T1 standards is a fast track to budget hell. Be honest about what actually matters. That internal wiki? Probably T3. The payment processing system? T1, obviously.
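
If it helps, the tiering can live as something as boring as a lookup table rather than tribal knowledge. The sketch below is illustrative: the tier defaults mirror the table above, and the workload assignments are invented.

```python
# Default recovery targets per tier (mirrors the table above).
TIER_DEFAULTS = {
    "T1": {"rto": "minutes", "rpo": "near-zero"},
    "T2": {"rto": "hours",   "rpo": "hours"},
    "T3": {"rto": "days",    "rpo": "daily"},
}

# Illustrative workload-to-tier assignments; yours come from the business impact analysis.
WORKLOAD_TIERS = {
    "payment-processing": "T1",
    "customer-portal":    "T2",
    "internal-wiki":      "T3",
}

for workload, tier in WORKLOAD_TIERS.items():
    targets = TIER_DEFAULTS[tier]
    print(f"{workload}: {tier} (RTO {targets['rto']}, RPO {targets['rpo']})")
```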

The Shared Responsibility Model: Stop Blaming the Cloud

Here’s a reality check that catches people out: your cloud provider is responsible for the infrastructure. Your data resilience is on you.

If you botch your backups or misconfigure your recovery, don’t expect AWS or Azure to swoop in and save you. That’s not how the shared responsibility model works. Multi-AZ sounds fancy, but it won’t help if you haven’t:

  • Locked down your data with proper access controls
  • Implemented immutable backups (so ransomware can’t encrypt them)
  • Actually tested your recovery runbooks

“Computer says no” isn’t an acceptable answer when the board asks why you couldn’t recover.

Accountability isn’t optional. If you can’t prove your plan works, you don’t have a plan. You have a document.
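
Immutable backups are the item on that list people most often assume they already have. If you're on AWS, S3 Object Lock is one way to get them; the sketch below is illustrative (bucket name, region, and retention period are placeholders), and Azure has an equivalent in immutable blob storage.

```python
# A minimal sketch of immutable backups on AWS S3 using Object Lock (boto3).
# Assumes AWS is your platform; names and values are placeholders.
import boto3

s3 = boto3.client("s3")
bucket = "example-backup-vault"  # hypothetical bucket name

# Object Lock must be enabled when the bucket is created (this also enables versioning).
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": "eu-west-2"},
    ObjectLockEnabledForBucket=True,
)

# Default retention in COMPLIANCE mode: nobody, including an admin with stolen
# credentials, can delete or overwrite objects before the retention period expires.
s3.put_object_lock_configuration(
    Bucket=bucket,
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 30}},
    },
)
```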

Recovery Strategies: Horses for Courses

There’s no one-size-fits-all. Different strategies trade off cost against recovery speed:

Backup & Restore — Cheapest option. Restore from backups when needed. Simple, but slow. Hours to days for recovery depending on data volumes.

Pilot Light — Minimal environment kept running with core components. Database replication active, but compute scaled down to near-zero. Faster recovery, moderate cost.

Warm Standby — Scaled-down but functional live environment. Can handle traffic at reduced capacity immediately, then scale up. Faster still, pricier still.

Active/Active — Continuous replication, multiple live sites, automatic failover. Zero downtime, zero data loss, and a bill to match.

Here’s the thing — don’t waste money on Active/Active for systems nobody cares about. Mix and match strategies by workload and by component.

For example: your database might need Warm Standby because data loss is unacceptable, but the front-end application can sit on Backup & Restore because it’s stateless and can be redeployed in minutes. The web tier isn’t where your data lives.

It’s about business impact, not technical vanity.
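
As a sketch of what mix-and-match looks like on paper, here's an illustrative per-component plan. The component names, strategies, and targets are examples, not a recommendation:

```python
# Illustrative per-component recovery strategies for one workload.
# The point: the strategy (and the bill) is chosen per component, not per system.
RECOVERY_PLAN = {
    "orders-database": {
        "strategy": "warm-standby",        # replicated, ready to promote
        "target_rto": "15 minutes",
        "target_rpo": "near-zero",
    },
    "web-frontend": {
        "strategy": "backup-and-restore",  # stateless, redeploy from the pipeline
        "target_rto": "30 minutes",
        "target_rpo": "n/a (no state)",
    },
    "reporting-warehouse": {
        "strategy": "backup-and-restore",  # nobody needs reports mid-incident
        "target_rto": "24 hours",
        "target_rpo": "24 hours",
    },
}
```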

Testing: The Bit Everyone Ignores

An untested backup is a fantasy, not a recovery plan.

I cannot stress this enough. I’ve seen organisations with beautiful DR documentation, automated backup jobs running every night, and zero evidence that any of it actually works. Then something goes wrong, and they discover the backups were silently failing for six months.

Here’s a testing cadence that actually works:

Activity            | Frequency | Purpose
Backup verification | Daily     | Confirm jobs completed successfully
Component restore   | Monthly   | Test individual system recovery
Tabletop exercise   | Quarterly | Walk through scenarios with the team
Full DR test        | Annually  | End-to-end recovery in isolated environment

Document your actual RTO/RPO versus your targets. If your target RTO is four hours but your last test took twelve, you don’t have a four-hour RTO. You have a twelve-hour RTO and a document that says four.
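
That comparison is worth automating so it can't be quietly ignored. A minimal sketch, with invented numbers:

```python
def assess_rto(system: str, target_hours: float, measured_hours: float) -> None:
    """Compare the RTO you claim with the RTO your last test actually delivered."""
    if measured_hours <= target_hours:
        print(f"{system}: target {target_hours}h met (measured {measured_hours}h)")
    else:
        print(
            f"{system}: the document says {target_hours}h, the test says {measured_hours}h. "
            "Fix the plan or fix the target."
        )

assess_rto("payment-processing", target_hours=4, measured_hours=12)
```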

Treat every test as a chance to find what’s broken before reality does it for you.

The Greatest Hits of Failure

I’ve seen these patterns repeatedly. Don’t be a statistic:

“Set and forget” DR plans — Written three years ago, never updated, references infrastructure that no longer exists.

Incomplete documentation — If it’s not written down, it doesn’t exist. If the person who knows how to recover the system is on holiday, you’re in trouble.

Single points of failure — One backup location? All backups in the same region as production? Really?

Ignoring dependencies — Everything is connected. Your application might recover fine, but if the authentication service it depends on doesn’t, you’re still down.

Staff who don’t know the runbooks — Documentation is worthless if nobody’s trained on it.

Penny-pinching on DR — Saving money now, paying in reputation and regulatory fines later.

A Framework That Works

If you need a process — and you do — here’s one that’s stood up to real-world use:

1. Assess

Identify what actually matters. Run a business impact analysis. Talk to the business, not just IT. Find out which systems are genuinely critical and which ones people think are critical because they’ve never been asked to prioritise.

2. Design

Set realistic RTO/RPO targets based on the assessment. Pick recovery strategies appropriate to each tier. Don’t let technical preferences override business requirements.

3. Implement

Deploy the infrastructure. Automate everything possible. Lock down access. Implement immutable backups. Make sure your recovery environment is actually separate from production — ransomware that compromises your production admin credentials shouldn’t automatically compromise your DR environment too.
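
One cheap guard rail, assuming AWS: before any restore runs, check that you're operating in a separate account from production. The account ID below is a placeholder.

```python
# Sanity check (AWS, boto3): refuse to restore into the production account, so
# compromised production credentials don't automatically reach the DR estate.
import boto3

PRODUCTION_ACCOUNT_ID = "111111111111"  # placeholder

sts = boto3.client("sts")
current_account = sts.get_caller_identity()["Account"]

if current_account == PRODUCTION_ACCOUNT_ID:
    raise RuntimeError(
        "Recovery environment shares the production account - "
        "the blast radius includes your DR estate."
    )
print(f"Restoring into account {current_account}, separate from production.")
```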

4. Test

Relentlessly. See the testing section above. If you’re not testing, you’re hoping. And hope is not a strategy.

5. Maintain

Update documentation when things change. Train staff. Review after incidents. Improve based on test results. Threats evolve, and so should your resilience posture.

This isn’t a one-time project. It’s a continuous practice.

The Broader Point

Resilience isn’t optional. It’s not a nice-to-have that can wait until next financial year. It’s not something you can outsource entirely to your cloud provider and forget about.

Every organisation — public sector, private sector, large, small — is a target. The question isn’t whether you’ll face an incident, but whether you’ll recover from it.

Set clear targets. Test relentlessly. Document everything. Train your people. And stop pretending the cloud will save you from your own negligence.

If you can’t prove your recovery works, it doesn’t.


This post is adapted from my masterclass at FutureScot Digital Scotland on 19 November 2024. If you’re wrestling with DR strategy, or you’ve got war stories about recovery plans that didn’t survive contact with reality, I’d be interested to hear them.

Architecture Notes / Takeaways

  • Set RTO/RPO based on business impact analysis, not wishful thinking
  • Your cloud provider won't save you from your own misconfiguration
  • An untested backup is a fantasy, not a recovery plan
  • Match recovery strategies to workload criticality — Active/Active for everything is budget suicide
