IT is the backbone on which business runs. Gone are the days when IT was considered simply a cost center; it is a vital business enablement platform and provides a competitive advantage. As the importance of IT increases, so do the threats against it. System failures, natural disasters, and rising criminal activity require businesses to review proactive recovery measures continually. Disaster recovery (DR) is one part of an overall business continuity plan. It consists of processes and procedures designed to restore vital data and services in the event of disruption or loss.
One important aspect of DR is data backup and systems recovery. Given the complexity of IT infrastructure, there are myriad data systems and devices requiring backups. These include servers, PCs, security devices, and an ever-expanding set of IoT devices.
Is essential data being backed up regularly, and how is it restored? Answering this question starts with a definition of what data is essential in your business environment. Core business functions such as finance and accounting, production, research and development, and shipping all have different types of data that must be prioritized based on:
- Business Impact – Who and what will be affected by a loss?
- Recovery Point Objective (RPO) – How “fresh” does the restored data need to be?
- Recovery Time Objective (RTO) – How quickly do you need specific systems up and running?
- Backup vs Replication – Are backups readily available to meet systems restoration requirements?
Business Impact Analysis (BIA)
We can determine the criticality of data and systems through business impact analysis. This process can vary from simple to complex depending on the organization’s size, infrastructure and data requirements, and standards compliance. At its most basic level, the goal is to determine the importance of systems and data based on their effect on business activities.
- Business Area or System Affected – This can be a department, function, product, service, or process. An example would be a data loss event that results in incomplete time entries from the preceding payroll cycle.
- Potential Impact – In this example, an impact would be the delay of payroll or processing of an incomplete payroll.
- Positive or Negative Impact – Of course, a delay in payroll is a negative impact.
- Impact Magnitude (small, medium, large) – In most cases, delayed payroll would be a large impact.
- Maximum Allowable Downtime – The longest period of downtime the business can tolerate.
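As a minimal sketch, the BIA fields above can be captured as simple records and sorted to give a first-pass priority list. The class name, the sort rule, and the sample entries with their downtime figures are illustrative assumptions, not part of any standard BIA method.

```python
from dataclasses import dataclass

# The three magnitude levels come from the list above; the numeric ranks are an assumption.
MAGNITUDE_RANK = {"small": 1, "medium": 2, "large": 3}


@dataclass
class ImpactEntry:
    area: str                  # business area or system affected
    potential_impact: str      # what happens if the data or system is lost
    negative: bool             # True when the impact is negative
    magnitude: str             # "small", "medium", or "large"
    max_downtime_hours: float  # maximum allowable downtime


def prioritize(entries):
    """Negative impacts first, then larger magnitude, then shorter allowable downtime."""
    return sorted(
        entries,
        key=lambda e: (not e.negative, -MAGNITUDE_RANK[e.magnitude], e.max_downtime_hours),
    )


# Hypothetical sample data for illustration only.
entries = [
    ImpactEntry("Marketing archive", "Old campaign files unavailable", True, "small", 72),
    ImpactEntry("Payroll", "Payroll delayed or processed incomplete", True, "large", 8),
    ImpactEntry("Shipping", "Orders cannot be dispatched", True, "large", 4),
]

ranked = [e.area for e in prioritize(entries)]
print(ranked)
```

A real BIA would weigh more factors, such as compliance requirements, but even a simple ranking like this gives the recovery plan a defensible starting order.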
Recovery Point Objective (RPO)
RPO is the maximum interval of time that might pass during an event before the quantity of data lost exceeds a predetermined allowable threshold. Simply put, you are asking: what will be our starting point once systems are restored? For example, if your RPO is four hours for Application X, then backing up your data once per day will not meet your RPO requirement, because a failure just before the next backup could lose up to 24 hours of data.
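The Application X example can be expressed as a small check. This is a sketch under a simplifying assumption: the worst-case data loss is taken to equal the time between backups, and the time a backup itself takes to complete is ignored. The function name is hypothetical.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case, a failure happens just before the next backup,
    so the data lost equals the full interval between backups."""
    return backup_interval <= rpo


rpo = timedelta(hours=4)              # RPO for Application X, from the example above
daily = timedelta(hours=24)           # one backup per day
every_three_hours = timedelta(hours=3)

print(meets_rpo(daily, rpo))              # False
print(meets_rpo(every_three_hours, rpo))  # True
```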
Recovery Time Objective (RTO)
RTO is the target duration of time, and the service level, within which a business process must be restored after a disaster. Your BIA should prioritize IT services and the order in which systems are restored. The questions to ask are: What are your priority systems? When do they need to return to operations? What dependencies need to be considered?
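One way to combine priorities and dependencies is to restore systems in dependency order and, among systems that are ready at the same time, start with the tightest RTO. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The system names, the dependency map, and the RTO values are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical environment: each system maps to the systems it depends on.
dependencies = {
    "network": set(),
    "directory": {"network"},
    "database": {"network"},
    "payroll_app": {"database", "directory"},
}

# Hypothetical RTO targets in hours.
rto_hours = {"network": 1, "directory": 2, "database": 4, "payroll_app": 8}


def restore_order(dependencies, rto_hours):
    """Return a restoration order that respects dependencies,
    preferring the shortest RTO among systems that are ready together."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda name: rto_hours[name])
        for system in ready:
            order.append(system)
            sorter.done(system)
    return order


plan = restore_order(dependencies, rto_hours)
print(plan)
```

A plan like this also exposes conflicts early: if a system has a short RTO but depends on something with a long one, the targets need to be revisited.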
Traditional Backups and Continuous Replication
Backups and replication processes are distinct. Backups are copies of data taken at specific points in time, at set intervals, for offsite storage. For example, data might be copied to a disk or a cloud service twice per day. Replication is a copy-move process that occurs in real time or at short intervals. Data is copied directly between sites, to hosts or storage in data centers or in a public or private cloud. While both approaches allow the restoration of data in the event of loss, each has its advantages and disadvantages.
|Backup|| || |
|Replication|| || |
As we noted in our post about moving backups to the cloud, regardless of the methodology chosen, make sure cybercriminals cannot reach and delete your backups. Keeping air-gapped (offline) backups is an extremely effective way to protect your data, especially if admin accounts are breached. Learn more about air-gapped backups in this post.
In the case of device failure or loss of equipment, hardware replacement may be required. An inventory of spare parts, appliances, PCs, and other equipment may be maintained when practical. To be effective, these spare assets must be deployed to the affected location quickly. Set up a plan for spares as well as a procurement plan for resources not maintained as spares. Images and configuration data may be required to restore business activity. Ensure that this system information is available for the recovery process. IoT devices are also a growing part of business networks. As such, their configuration and firmware may need to be considered part of a backup plan.
When your business begins disaster recovery planning, using a three-step approach for systems backup and restoration is a key place to start. First, plan using business impacts to support decision points. Second, determine if backup, replication, or a hybrid approach is best for your environment. Lastly, test the backup-restoration process regularly; waiting for a disaster is not the best way to determine if your DR plan works. Your business’s ability to re-open quickly after a disaster provides an edge over competitors and may open opportunities and further success in your industry.
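A restore test can be as simple as restoring a file and confirming it matches the original. The sketch below compares SHA-256 checksums. The file names are hypothetical, and the plain file copies stand in for whatever backup and restore tooling you actually use.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """True when the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "ledger.csv"          # hypothetical business file
    source.write_text("date,amount\n2024-01-01,100\n")

    backup = Path(tmp) / "ledger.csv.bak"
    shutil.copy2(source, backup)               # stand-in for the backup job

    restored = Path(tmp) / "restored.csv"
    shutil.copy2(backup, restored)             # stand-in for the restore job

    ok = restore_matches(source, restored)

print(ok)
```

In practice the original's checksum would be recorded at backup time, since the original may no longer exist when a real restore is needed.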