RTO: The Recovery Time Objective (RTO) is the duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.
Perhaps it’s time for that definition to change.
The problem with the traditional definition of RTO is that it focuses on the time it takes to get back up and running at a disaster recovery site and doesn’t take into account the eventual move back to the production environment.
Most organizations consider only the former and are surprised when the disaster recovery solution they’ve implemented either requires a significant amount of downtime to “fail back” to production or doesn’t provide an automated way to do so.
Lengthy data “re-seeding” processes, for example, can mean that moving data back into production takes hours – even days. Complicated procedures for reversing the original failover process can introduce human error and downtime that wasn’t accounted for in the originally stated Recovery Time Objectives.
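To get a feel for how re-seeding stretches into days, here is a rough back-of-the-envelope sketch. The function name, the 70% link efficiency default, and the sample figures are all assumptions made for illustration, not measurements from any particular product or environment.

```python
def reseed_hours(data_tb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate the hours needed to copy data_tb terabytes back to production.

    Hypothetical helper: assumes decimal terabytes (1 TB = 1e12 bytes) and that
    only a fraction (`efficiency`) of the nominal link speed is usable.
    """
    if data_tb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("expected data_tb >= 0, link_mbps > 0, 0 < efficiency <= 1")
    total_bits = data_tb * 1e12 * 8
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return total_bits / usable_bits_per_second / 3600


# Illustrative figures only: 20 TB over a 1 Gbps link at 70% efficiency.
hours = reseed_hours(data_tb=20, link_mbps=1000)
print(f"Estimated re-seed time: {hours:.1f} hours ({hours / 24:.1f} days)")
```

With those assumed numbers the estimate comes out at roughly 63 hours, or more than two and a half days, before any verification or cutover work begins.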
Disaster recovery sites and computing infrastructures aren’t built to run at the same capacity and performance levels as production environments. We’re willing to accept a certain level of degraded service so long as we’re running in “recovery mode” because we aren’t willing to invest the amount of resources required to achieve parity for such a short amount of time.
So to ensure we’re making the right choices when selecting a disaster recovery approach, let’s broaden the scope of our RTO evaluations to include not just the time it takes to restore service after a disaster but also the time it takes to return production services to the state they were in before the disaster occurred.
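One way to put that broader evaluation into practice is to score each candidate approach on both legs of the journey. The sketch below is only an illustration: the `RecoveryPlan` class, the plan names, and the hour figures are hypothetical, and simply summing failover and failback time is one possible interpretation of a "round-trip" RTO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryPlan:
    """Hypothetical description of a disaster recovery approach."""
    name: str
    failover_hours: float  # time to get running at the recovery site
    failback_hours: float  # downtime needed to return to production

    @property
    def round_trip_hours(self) -> float:
        # Assumption: the broadened RTO is the sum of both legs.
        return self.failover_hours + self.failback_hours


def meets_objective(plan: RecoveryPlan, rto_hours: float) -> bool:
    """Check a plan against an objective that covers failover and failback."""
    return plan.round_trip_hours <= rto_hours


# Made-up numbers for two imaginary approaches.
plans = [
    RecoveryPlan("fast failover, manual re-seed", failover_hours=1, failback_hours=48),
    RecoveryPlan("slower failover, automated failback", failover_hours=2, failback_hours=0.5),
]

best_by_failover = min(plans, key=lambda p: p.failover_hours)
best_by_round_trip = min(plans, key=lambda p: p.round_trip_hours)
print("Best on failover alone:", best_by_failover.name)
print("Best on round trip:", best_by_round_trip.name)
```

In this made-up comparison the plan that looks best on failover time alone loses once failback is counted, which is exactly the surprise the broader definition is meant to prevent.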