Published: August 17, 2020
The 451 Take
Automation is another key area where organizations should focus: it not only makes DR plans easier to test, but also makes recovery operations faster and more consistent by minimizing human error. Another key cog in the disaster recovery modernization effort is public cloud storage, whose elasticity is attractive because it keeps costs down until a failover operation is necessary. This trend should continue, since few organizations have an appetite for creating and maintaining additional datacenters and the IT infrastructure within them.
Outages are costly, with many negative consequences
Nearly half of these outages (49%) led to losses of more than $100,000 for the affected companies. The largest organizations (10,000+ employees) had a higher proportion of costly outages: 13% of respondents at these organizations reported an outage costing over $1m, compared with 8% at midsized companies (1,000-9,999 employees).
Lost or unrecoverable backup data can hurt an organization in multiple ways (see Figure 1). Lost worker productivity was the most frequently cited negative impact, chosen by 49% of respondents, which underscores why speed of recovery matters: workers need to get back on track as soon as possible with minimal data loss. Data loss and outages also damage an organization's reputation (35% of respondents) and customer loyalty (19%), which should be a key concern for those focused on delivering a strong and consistent customer experience.
A wide range of causes leads to outages
Security issues such as ransomware and viruses resulted in outages for 17% of respondents. This has been a point of emphasis for storage vendors, which are looking to improve recoverability while ensuring that an uncompromised golden copy of data remains available when the data held in other recovery options, such as snapshots and short-term backups, is corrupted. Human error was a cause of failure for 15% of respondents, and proponents of automation claim that reducing the manual processes that often create these errors could substantially boost reliability and consistency.
Although facility power (15%) and network failures (6%) are often discussed as byproducts of natural disasters such as hurricanes, in the survey these two types of failures were lower on the list than the previously mentioned issues. Cloud or SaaS failures were responsible for only 2% of respondents' most recent outages, although we expect this figure to rise as a growing number of organizations use these services for production workloads.
- Improve disaster recovery testing and preparedness.
As we discussed in a previous report on DR testing, only 17% of organizations test their DR implementations more than twice a year, while 46% of respondents settle for annual testing. Given how rapidly production environments change through software and hardware updates, even biannual DR testing is not adequate to keep pace with changes in infrastructure and applications.
- Invest in automation.
In the study, 80% of respondents said they would allow artificial intelligence or automated tools to initiate a failover operation. The negative impact of outages is exacerbated by the length of downtime. With automation in place, organizations can restore their production environments quickly and consistently at a secondary site or potentially in a cloud environment. Automation can also help facilitate and accelerate DR testing and validation.
- Leverage AI-enhanced management and monitoring tools.
By using these tools, organizations can locate issues before they become major outages. These tools often provide recommendations that customers can use to proactively improve how they handle software and hardware updates and maintenance, which is important given that hardware and software failures accounted for 42% of the failures reported (see Figure 2).
- Implement cloud-based disaster recovery.
Replacing DR sites was the top driver for respondents using public cloud storage services, chosen by 37%, while 34% said they were using cloud as a replacement for tape for long-term storage. In the study, 39% of respondents were already deploying hybrid cloud data protection, combining local backups with long-term backup data stored in a cloud environment. Cloud-based disaster recovery is attractive because it takes advantage of cloud elasticity: production resources are consumed only when they are activated in a failover.
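As a rough illustration of the automation recommendation above, the following Python sketch shows one way an automated tool might decide when to initiate a failover: it triggers only after several consecutive failed health checks, so a single transient error does not cause an unnecessary switch. The class name, function name and threshold are hypothetical and are not drawn from the survey or from any specific product; a real deployment would rely on the DR tooling's own health probes and orchestration.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Tracks health-check results and decides when to fail over.

    All names and thresholds are hypothetical, for illustration only.
    """
    failure_threshold: int = 3    # consecutive failed checks before failover
    consecutive_failures: int = 0
    failed_over: bool = False

    def record_check(self, primary_healthy: bool) -> bool:
        """Record one health-check result; return True if failover triggers now."""
        if self.failed_over:
            return False  # already running on the secondary site
        if primary_healthy:
            self.consecutive_failures = 0  # a healthy check resets the streak
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.failed_over = True
            return True
        return False


def run_checks(results, threshold=3):
    """Return the index of the check that triggers failover, or None."""
    monitor = FailoverMonitor(failure_threshold=threshold)
    for i, healthy in enumerate(results):
        if monitor.record_check(healthy):
            return i
    return None
```

The same decision logic can be replayed against recorded or simulated health-check results, which is one way automation could also support more frequent DR testing without disrupting production.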
Henry Baltazar is a Research Director for the storage practice at 451 Research, a part of S&P Global Market Intelligence. Henry returned to 451 Research after spending nearly three years at Forrester Research as a senior analyst serving Infrastructure & Operations Professionals and advising Forrester clients on datacenter infrastructure technologies.