How to Create a Disaster Recovery Plan
By Tracey Wilson
IT professionals are often concerned with creating redundancy in their infrastructure and reducing single points of failure. But the major concern for an organization or company is disaster recovery and, in many cases, business continuity.
Disaster recovery is an important concept on the CompTIA Security+ (SY0-201) exam, and the CISSP devotes an entire focus element to disaster recovery and business continuity.
In this article, we will investigate the needs for disaster recovery and we will take a look at some of the best practices to ensure data can be protected and recovered during a disaster, thus ensuring that an organization can continue to function.
Why Do Organizations Need Disaster Recovery?
Disaster recovery of critical data and resumption of IT services is at the core of a full business continuity plan, but why is it needed?
Disasters can strike in many different ways including:
- Natural Disasters
  - Tornadoes, fires, earthquakes
- Terrorist Attacks
- Computer-Based Disasters
  - Viruses, worms, malicious code
- Critical equipment failure
Having detailed plans and architectures in place to recover critical data from disasters such as these is crucial for any organization. Its survival may well depend upon it. Let’s look at a few examples.
September 11, 2001 is one of the most infamous dates in the history of the United States. The attack on the World Trade Center buildings claimed thousands of lives, and their destruction also devastated thousands of businesses. Many of those businesses failed to survive because they did not have adequate backups and plans to recover critical data. Some had backups, but they were stored in the other tower rather than in a geographically remote location. A few companies did survive without a backup, but lost tremendous amounts of time and money recreating the data necessary to function.
In 2005, Hurricane Katrina slammed into the Deep South of the United States, wrecking infrastructure, homes, and businesses and leaving the region in turmoil for months. Communications were severely impacted as phones and networks were completely knocked out. Many local Points of Presence (POPs) and local data centers were damaged or under water. Some businesses were able to recover their data from remote backups, move to a temporary location, and use alternative communications for those sites because they had detailed disaster recovery and business continuity plans.
While major catastrophes are not common, smaller-scale disasters can still devastate an organization. One large computer virus that runs rampant in an IT organization can cripple functions and corrupt valuable data. Failures of redundant equipment and power can and do happen. The important thing to remember is that disasters (viruses, attacks, equipment failures, etc.) may be minimized, but they will happen. It is up to each organization to devise an effective recovery plan and test its effectiveness.
Creating a Disaster Recovery Plan for Your Needs
As disasters range in type, size, and scope, it is critical to recognize the key areas where the impact on the IT infrastructure can be prevented or minimized.
The first step to achieving a disaster recovery design is to effectively plan for what you need to protect. The IT plan should be an integral part of an organization’s overall business continuity plan and be aligned to the same goals for protecting crucial data, access, and continued function.
Some companies’ plans may be very simple due to scope and size, but plans become more complex for organizations with multiple departments and/or locations. The true purpose should be to develop a plan to restore services and data as quickly as possible in the event of a disaster.
There are several great planning books on disaster recovery and business continuity, but the steps below are common to many of the strategies for creating a recovery plan.
Step 1: Identify the Critical Needs
An IT Security Professional involved in disaster recovery planning needs a list of requirements before developing the plan. The list should reflect the critical needs of the IT organization, how they affect the larger company, and what will allow it to continue to function.
The key here is to look at larger concepts first, e.g., redundancy for power and equipment, backups of data, and potential alternative sites. Don’t drill down into the exact items yet; just look at the general picture.
The cost of a plan is also critically important and should not be overlooked in this process. A well-detailed plan that can protect everything and keep the company up and running is great, but if it is so expensive that it would bankrupt the organization, the plan will never be implemented.
Step 2: Identify Redundancy and Failover Potential
IT Security Professionals should work hand in hand with IT administrators to identify potential weaknesses in the organization and provide for redundancy and failover.
The focus here is to maintain access to key systems and data despite power and/or equipment failure. For power, redundant connections, Uninterruptible Power Supplies (UPS), and generators are the most common ways to ensure that systems can keep operating through power fluctuations or outages.
Equipment redundancy should involve redundant power supplies, management modules, RAID disk arrays with hot spare disks to cover disk failures, and redundant or alternative network connections. Whatever items are identified, remember that redundancy can be expensive and it is important to develop the plan to match your budget.
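As a rough illustration of this step, the sketch below scans an equipment inventory and flags anything that has no spare unit. The `Component` class, its fields, and the sample inventory are hypothetical and exist only for this example; a real inventory would come from the organization's own asset records.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One kind of equipment in a (hypothetical) inventory."""
    name: str
    instances: int     # interchangeable units installed
    required: int = 1  # units that must be working for service to continue


def single_points_of_failure(inventory):
    """Return the names of components with no spare unit at all."""
    return [c.name for c in inventory if c.instances - c.required < 1]


# Illustrative values only, not taken from any real site.
inventory = [
    Component("utility power feed", instances=1),
    Component("UPS", instances=2),
    Component("server power supply", instances=2),
    Component("core switch", instances=1),
    Component("RAID array disk", instances=6, required=5),
]

print(single_points_of_failure(inventory))
# prints ['utility power feed', 'core switch']
```

Each flagged item is a candidate for a redundant unit or an alternative path, subject to the budget constraints noted above.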
Step 3: Identify Backups and Backup Locations
Each organization will have different types of data, in varying amounts, that need to be backed up. It is important to recognize the types, since they drive not only the amount of storage required, but also how regularly backups are needed.
Types of data typically backed up are:
- User data files
- Operating System backups
- Financial Data
- Audit Data
- Transaction files
- Website data
- Organizational documents
- Customer lists
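One way to make the list above actionable is a small catalog that records, for each data type, its size, how often it is backed up, and how many copies are kept. The sketch below is only an illustration: the `DataType` class and every figure in it are assumptions, and it treats each retained copy as full-sized, which gives an upper bound on storage.

```python
from dataclasses import dataclass


@dataclass
class DataType:
    """One category of data to protect (all figures are hypothetical)."""
    name: str
    size_gb: int
    backups_per_week: int
    copies_kept: int


def storage_needed_gb(catalog):
    """Upper bound on backup storage, assuming every kept copy is full-sized."""
    return sum(d.size_gb * d.copies_kept for d in catalog)


def most_frequent(catalog):
    """Name of the data type that is backed up most often."""
    return max(catalog, key=lambda d: d.backups_per_week).name


catalog = [
    DataType("Transaction files", size_gb=50, backups_per_week=7, copies_kept=14),
    DataType("User data files", size_gb=400, backups_per_week=5, copies_kept=4),
    DataType("Operating System backups", size_gb=120, backups_per_week=1, copies_kept=2),
]

print(storage_needed_gb(catalog))  # prints 2540
print(most_frequent(catalog))      # prints Transaction files
```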
Depending on cost, copies of the data may be stored locally or at a remote facility.
Onsite storage could be on spinning disk arrays or in a tape library. In most common architectures, the data can be stored in a redundant file system or in an archive for backups. Archives can use Virtual Tape Libraries (VTLs), which combine spinning disk arrays and traditional tape libraries to store data long term. It is highly recommended that a backup of the data be stored at a geographically remote location, generally more than 50 miles from the primary location, because natural disasters like earthquakes and hurricanes tend to wreak havoc over large areas.
Now that you have identified the data and where to store it, it is important to address the method of backup and thus its restoration. Backups can consist of a full backup, an incremental backup, and a differential backup. A full backup is a comprehensive and complete copy of all files on a disk or file system at that point in time. Systems can be restored from a full backup alone, but this backup type is recommended on a less frequent basis (like once a week) due to its long process time and resource demand.
An incremental backup is a partial backup that only copies the information that has changed since the last full or incremental backup. This backup is less intensive than a full backup and can be run more frequently after a full backup has been performed. If a restore is initiated, the last full backup must be restored first, followed by each subsequent incremental backup in order, to ensure full recovery.
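The 50-mile guideline can be checked with a great-circle distance calculation. The following is a minimal sketch using the haversine formula; the function names are made up for this example and the city coordinates are approximate.

```python
import math

EARTH_RADIUS_MILES = 3958.8   # mean Earth radius
MIN_SEPARATION_MILES = 50.0   # guideline from the text


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def is_far_enough(primary, backup, minimum=MIN_SEPARATION_MILES):
    """True if the backup site is more than `minimum` miles from the primary."""
    return great_circle_miles(*primary, *backup) > minimum


# Approximate (latitude, longitude) pairs, for illustration only.
new_orleans = (29.95, -90.07)
baton_rouge = (30.45, -91.19)

print(is_far_enough(new_orleans, baton_rouge))  # prints True
```

Distance alone is not the whole story: two sites 60 miles apart on the same coastline can still be hit by one hurricane, so the threshold is best treated as a minimum rather than a guarantee.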
A differential backup is similar to an incremental backup, but it operates a little differently. This backup method copies every file that has been altered since the last full backup, whether or not the file was already captured by an earlier differential backup. Each differential is therefore larger than the one before it, but restoring is simpler: the administrator restores the last full backup and then only the most recent differential backup.
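The three backup types can be compared in a short sketch. It assumes file modification times and backup times share one increasing clock, and that a schedule pairs full backups with either incrementals or differentials, not both. The `Backup` class and both functions are hypothetical names for this illustration, not the interface of any backup product.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    kind: str  # "full", "incremental", or "differential"
    time: int  # when the backup ran


def files_to_copy(kind, mtimes, history):
    """Pick files for the next backup.

    mtimes maps path -> last modified time; history lists earlier
    backups, oldest first.
    """
    fulls = [b for b in history if b.kind == "full"]
    if kind == "full" or not fulls:
        return sorted(mtimes)       # no baseline yet: copy everything
    if kind == "incremental":
        since = history[-1].time    # changes since the most recent backup
    elif kind == "differential":
        since = fulls[-1].time      # changes since the most recent full
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(p for p, t in mtimes.items() if t > since)


def restore_chain(history):
    """Backups to restore, in order, to recover the latest state."""
    full_positions = [i for i, b in enumerate(history) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup to restore from")
    start = full_positions[-1]
    later = history[start + 1:]
    if later and later[-1].kind == "differential":
        return [history[start], later[-1]]  # full + latest differential
    return [history[start]] + later         # full + every incremental


mtimes = {"a.txt": 5, "b.txt": 12, "c.txt": 25}
print(files_to_copy("incremental", mtimes,
                    [Backup("full", 10), Backup("incremental", 20)]))
# prints ['c.txt']
print(files_to_copy("differential", mtimes,
                    [Backup("full", 10), Backup("differential", 20)]))
# prints ['b.txt', 'c.txt']
```

The trade-off shows up in `restore_chain`: an incremental schedule copies less each night but needs every backup since the last full one at restore time, while a differential schedule copies more each night but needs only two backup sets.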
Once the backup methods to be used have been identified, specific attention needs to be paid to the frequency and schedule. Some data will be backed up more frequently than other data, and these decisions will affect the equipment choices discussed in the next section.
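A schedule can be as simple as a rule that maps each calendar day to a backup type. The sketch below assumes one common pattern, a weekly full backup on Sunday with incrementals on the other days; the day chosen and the function names are assumptions for illustration.

```python
from datetime import date, timedelta

# date.weekday() numbers Monday as 0, so Sunday is 6.
FULL_BACKUP_WEEKDAY = 6


def backup_type_for(day):
    """Weekly full backup on Sunday, incremental on every other day."""
    return "full" if day.weekday() == FULL_BACKUP_WEEKDAY else "incremental"


def sets_needed_to_restore(day):
    """Backup sets required to restore after that day's backup has run."""
    days_since_full = (day.weekday() - FULL_BACKUP_WEEKDAY) % 7
    return 1 + days_since_full  # the full plus one incremental per day since


def week_plan(start):
    """(date, backup type) pairs for the seven days beginning at start."""
    days = [start + timedelta(days=i) for i in range(7)]
    return [(d, backup_type_for(d)) for d in days]


for day, kind in week_plan(date(2024, 1, 7)):
    print(day.isoformat(), kind, sets_needed_to_restore(day))
```

Under this rule a restore on Saturday needs seven backup sets, which is one reason some organizations prefer differentials or more frequent full backups for data that must come back quickly.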
Step 4: Backup Equipment
Specifying the correct and affordable equipment to provide access to your data is absolutely essential to running your organization, but of almost equal importance is the type of equipment used to back up and store that data. As part of the plan and design of the architecture, tremendous forethought is needed to address performance, growth in the amount of data, and supportability of the backup applications.
Backups need to complete without hindering production environments, network connections must be capable of moving all backups internally and externally without bottlenecks, and the architecture needs to accommodate growth as the amount of data increases.
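A quick calculation shows whether a network link can move a backup inside the available window. This sketch uses decimal units (1 GB = 8,000 megabits) and assumes that only 70% of the rated link speed is achieved in practice; that efficiency figure is a placeholder to be replaced with measured throughput.

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.7):
    """Hours to move data_gb gigabytes over a link rated at link_mbps.

    efficiency is the assumed fraction of rated speed actually achieved.
    """
    megabits = data_gb * 8 * 1000
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


def fits_window(data_gb, link_mbps, window_hours, efficiency=0.7):
    """True if the transfer finishes within the backup window."""
    return transfer_hours(data_gb, link_mbps, efficiency) <= window_hours


# Hypothetical case: 2,000 GB nightly with an 8-hour window.
print(round(transfer_hours(2000, 1000), 2))  # prints 6.35 (1 Gbps link)
print(fits_window(2000, 1000, 8))            # prints True
print(fits_window(2000, 100, 8))             # prints False (100 Mbps link)
```

Rerunning the same check with projected data volumes gives an early warning of when growth will push backups past the window.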
Care must also be taken to choose servers and components that will work with the applications performing backups or managing the archives. As always with this process, be mindful of the budget involved.
Step 5: Investigate Alternate Sites
One option is to identify an alternative site where the organization can function temporarily in the case of a disaster. Items to consider here are size and supportability for the organization and IT, network and phone connectivity, and cost.
Step 6: Test
Disaster recovery and business continuity plans are useless unless they are tested for effectiveness. IT Security Professionals should work with members of management to conduct disaster drills that test recovery of data, failovers, and access to remotely stored data.
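One concrete check during a drill is to confirm that restored files match the originals byte for byte. The sketch below records SHA-256 checksums before a backup and compares them after a test restore. The function names are hypothetical, and real backup software may offer its own verification features.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest, restored_root):
    """List (path, problem) pairs for files missing or altered after a restore."""
    restored_root = Path(restored_root)
    problems = []
    for rel, expected in manifest.items():
        target = restored_root / rel
        if not target.is_file():
            problems.append((rel, "missing"))
        elif sha256_of(target) != expected:
            problems.append((rel, "checksum mismatch"))
    return problems
```

In a drill, `build_manifest` would run against the production data when the backup is taken, and `verify_restore` against the directory produced by the test restore. An empty result means every file came back intact.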
The Need for Disaster Recovery
As you can see from these few steps, there is already a lot of detail required for disaster recovery plans. Detail is crucial to making sure that any disaster recovery plan is effective, can restore access to data, and allows an organization to continue to function.
There are issues here that we have not addressed concerning the security of data at a remote location or in transit to that location, but those issues will be addressed in a later article.
Today, United States federal agencies require all departments to have clearly defined business continuity plans that include detailed disaster recovery architectures and restoration plans. Many organizations and companies around the world have also developed their own plans, and several of those have had to act on them to restore their data and maintain function.
Developing a disaster recovery solution can be expensive, but properly managed, the cost can be justified. The question that should be asked of the management of an organization that doesn’t have a disaster recovery or business continuity plan is: “When disaster strikes, can we afford to start over?”
About the Author
Tracey Wilson (CCNA, JNCIS, SNIA, MCSE) has a B.S. in Electrical Engineering and experience in network administration, network architecture and disaster recovery solutions. He’s also an active participant in SCinet, the organization responsible for planning and implementing the “World’s Fastest Network”, as well as the IEEE Computer Society and the Association for Computing Machinery (ACM). Tracey currently serves as the technical lead and program manager for DICE - Data Intensive Computing Environment, evaluating new and emerging technologies to solve HPC and data management issues.