Issue of Availability for Business Continuity
Every minute of downtime can mean a significant loss of revenue for a company, which explains why most CIOs identify business continuity as one of their highest technology priorities.
IT is a critical component of business resiliency and continuity, and it's imperative that you develop and implement IT disaster recovery plans to protect your operations from downtime and data loss.
Any good system that targets the public or the enterprise these days must be built to expect the unexpected. No system is perfect, and at some point something will happen that renders a system inoperative (a fire, a hurricane, an earthquake, human error; the list goes on). Because systems can fail in so many different ways, they need to be designed with the expectation that failure will occur.
There are two related but often confused topics in system architecture that mitigate failure:
High Availability (HA) and Disaster Recovery (DR).
High availability, simply put, is about eliminating single points of failure, while disaster recovery is the process of getting a system back to an operational state after it has been rendered inoperative. In essence, disaster recovery picks up when high availability fails.
So, let's examine High Availability first.
High Availability
As mentioned, High Availability is about eliminating single points of failure, so it implies redundancy.
There are three kinds of redundancy implemented in most systems: hardware, software, and environmental.
1. Hardware Redundancy
Hardware redundancy was one of the first ways that high availability was introduced into computing. Before most apps were connected to the internet, they served enterprises on a LAN. These applications didn't need the scale of modern applications, which may handle thousands of simultaneous connections with 24/7 demand. They did, however, supply business-critical data, so they needed hardware that was fault tolerant.
Single points of failure were eliminated by manufacturers building servers that had:
- Redundant storage with RAID or similar technology, which ensured that data was written to, and could be read from, multiple physical disks. This prevented data loss and downtime when a disk failed.
- Redundant power, typically in the form of multiple power supplies, which enabled admins to connect servers to independent power sources so servers could remain powered on if one source lost power.
- Error correction, such as ECC RAM, which enabled corrupted data in memory to be detected and corrected.
- Redundant networking, such as multiple NICs connected to independent networks, to ensure that a server remained online in the event of network failures.
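To make the redundant storage idea concrete, here is a minimal sketch of XOR parity, the technique used by parity-based RAID levels. It is an illustration only, not how any particular RAID controller is implemented; the function names and the sample stripe are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single lost data block from the survivors plus parity."""
    return xor_blocks(surviving + [parity])


# Hypothetical stripe: three data disks plus one parity disk.
data_disks = [b"ACCT", b"2024", b"DATA"]
parity_disk = xor_blocks(data_disks)

# Simulate losing the second disk, then rebuild its contents.
recovered = rebuild_missing([data_disks[0], data_disks[2]], parity_disk)
print(recovered)
```

Because XOR is its own inverse, any one lost block can be rebuilt from the remaining blocks and the parity block. Losing two blocks at once is beyond what this simple scheme can recover.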
2. Software Redundancy
Software redundancy soon followed suit. Application designers worked to ensure that applications themselves could tolerate failures in a system, be it hardware failure, configuration errors, or any of the other problems that could take down a part of the software.
A few ways this has been accomplished include:
- Clustering technologies, such as database clusters, that spread workloads across multiple servers.
- Statelessness in applications for rapid scaling and easy-to-configure high availability.
- Load balancing with application monitoring by way of health probes. This allows incoming requests to be routed to healthy application nodes and raises events so that failures can be handled proactively.
- Self-healing systems that move workloads around or allocate additional capacity when failures are detected.
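The load balancing and health probe idea from the list above can be sketched roughly as follows. This is a simplified, in-memory illustration: the `Node` and `LoadBalancer` classes, the node names, and the probe functions are all hypothetical, and a real load balancer would probe over the network on a schedule.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Iterator


@dataclass
class Node:
    name: str
    probe: Callable[[], bool]  # hypothetical health check; True means healthy


class LoadBalancer:
    def __init__(self, nodes: list[Node]) -> None:
        self._nodes = nodes
        self._ring: Iterator[Node] = cycle(nodes)

    def route(self) -> Node:
        """Return the next healthy node in round-robin order."""
        for _ in range(len(self._nodes)):
            node = next(self._ring)
            if node.probe():
                return node
        raise RuntimeError("no healthy nodes available")


# Hypothetical cluster in which app-2 is currently failing its probe.
status = {"app-1": True, "app-2": False, "app-3": True}
nodes = [Node(name, lambda n=name: status[n]) for name in status]
lb = LoadBalancer(nodes)

picked = [lb.route().name for _ in range(4)]
print(picked)
```

Requests skip the unhealthy node and keep alternating between the two healthy ones. When every probe fails, the balancer raises an error, which is the point at which disaster recovery would have to take over.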
3. Environmental Redundancy
With the rise of cloud computing, cloud providers have taken high availability to a whole new level by adding large-scale environmental redundancy with:
- Hardware redundancy at the server-rack level within data centers, with discrete networking, power, and storage per rack, which allows users to spread workloads to mitigate single points of failure. Azure calls these "fault domains".
- Data center redundancy within a geographic region, typically referred to as "availability zones", which allows users to run applications in separate data centers that are located geographically close to one another.
All these domains (hardware, software, and environmental) seek to solve the same basic problem by eliminating single points of failure. As a result, providers can now offer service level agreements (SLAs) that limit unplanned downtime to less than 10 seconds in a given 24-hour period, which corresponds to roughly 99.99% availability.
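To put numbers on an SLA like the one above, the arithmetic can be sketched as below. The function names are hypothetical, and the redundancy formula assumes replicas fail independently of one another, which real systems only approximate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def allowed_downtime_seconds(availability_pct: float,
                             period_seconds: int = SECONDS_PER_DAY) -> float:
    """Downtime budget implied by an availability percentage over a period."""
    return period_seconds * (1 - availability_pct / 100)


def redundant_availability(single_pct: float, copies: int) -> float:
    """Availability (%) of `copies` replicas when any one of them suffices.

    Assumes failures are independent, which is a simplification.
    """
    all_fail = (1 - single_pct / 100) ** copies
    return (1 - all_fail) * 100


budget = allowed_downtime_seconds(99.99)
print(f"99.99% allows about {budget:.2f} s of downtime per day")

paired = redundant_availability(99.0, copies=2)
print(f"Two 99% replicas give about {paired:.2f}% availability")
```

A 99.99% target works out to about 8.64 seconds of downtime per day, which is consistent with the "less than 10 seconds" figure. The second function shows why redundancy helps: under the independence assumption, two 99% components in parallel reach about 99.99%.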
Disaster Recovery
Disaster recovery picks up where high availability fails.
Disaster recovery can be as simple as restoring from a backup, but it can also be very complex, depending on two factors: the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO).
A Recovery Time Objective is the maximum amount of time that a system can be down before it is recovered to an operational state. For some systems, the recovery time objective can be measured in hours or even days, but for more mission-critical systems it is typically measured in seconds.
A Recovery Point Objective is the amount of data loss, measured in time, that is tolerable in a disaster. For some systems, losing a day's worth of data might be acceptable, while for other systems this might be mere minutes or seconds. The lengths of the RTO and RPO have profound implications for how disaster recovery plans are implemented.
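As a rough illustration of how these two objectives constrain a plan, the sketch below compares a recovery plan against its targets. The class names, field names, and sample figures are hypothetical. It also simplifies by treating the worst-case data loss as the full interval between backups or replication points.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # maximum tolerable time to restore service
    rpo: timedelta  # maximum tolerable data loss, measured in time


@dataclass(frozen=True)
class RecoveryPlan:
    backup_interval: timedelta   # time between backups or replication points
    restore_duration: timedelta  # measured time to return to service


def unmet_objectives(plan: RecoveryPlan, objectives: RecoveryObjectives) -> list[str]:
    """Return the names of the objectives this plan cannot meet."""
    problems = []
    # Worst case: the disaster strikes just before the next backup is taken.
    if plan.backup_interval > objectives.rpo:
        problems.append("RPO")
    if plan.restore_duration > objectives.rto:
        problems.append("RTO")
    return problems


# Hypothetical plan: nightly backups that take six hours to restore.
nightly = RecoveryPlan(timedelta(hours=24), timedelta(hours=6))

strict = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
relaxed = RecoveryObjectives(rto=timedelta(days=2), rpo=timedelta(days=1))

print(unmet_objectives(nightly, strict))
print(unmet_objectives(nightly, relaxed))
```

The same nightly backup plan fails both strict objectives yet satisfies the relaxed ones, which is why the objectives have to be agreed on before the recovery approach is chosen.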
Short RTOs and RPOs require a system to replicate data actively between primary and recovery systems (for example, with database log shipping) and to maintain failover systems in a ready ("hot-hot") or near-ready ("hot-warm") state so they can take over in the event of a disaster. With objectives this short, the trigger for a disaster recovery failover is usually automated as well.
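An automated failover trigger of the kind described here might look something like the following sketch. The `FailoverMonitor` class and the threshold of three missed heartbeats are assumptions made for illustration; production systems typically add quorum checks and fencing to avoid failing over on a false alarm.

```python
class FailoverMonitor:
    """Switch to the standby after several consecutive missed heartbeats."""

    def __init__(self, max_missed: int = 3) -> None:
        self.max_missed = max_missed  # hypothetical threshold
        self.missed = 0
        self.active = "primary"

    def record_heartbeat(self, ok: bool) -> str:
        """Record one heartbeat result and return the currently active system."""
        if self.active == "standby":
            # Already failed over; failback is a separate, deliberate process.
            return self.active
        self.missed = 0 if ok else self.missed + 1
        if self.missed >= self.max_missed:
            self.active = "standby"
        return self.active


monitor = FailoverMonitor(max_missed=3)
heartbeats = [True, False, False, True, False, False, False]
history = [monitor.record_heartbeat(ok) for ok in heartbeats]
print(history)
```

Requiring consecutive misses keeps a single dropped heartbeat from triggering a failover. In this run, the two early misses are forgiven once a healthy heartbeat arrives, and only the final run of three misses switches traffic to the standby.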
For longer RTOs and RPOs, restoring systems from daily backups might be enough to meet the objectives. These backups might be backups of application servers, databases, or both. The process for restoring them may be manual, automated, or a mix of the two. Whenever backups are used to restore systems to an operational state, this is typically referred to as a "hot-cold" configuration. In any case, recovering a hot-cold configuration takes significantly longer than recovering a hot-warm or hot-hot one.
One of the biggest factors that prevents organizations from implementing high availability and short RTOs and RPOs is cost. Where HA is concerned, more redundancy requires more resources, which translates into higher costs. Similarly, short RTOs and RPOs require that capacity be available to handle a failover, which also translates into higher costs. There is always a balancing act between costs and system downtime. Sometimes the costs of HA, short RTOs, and short RPOs are not worth it for an app, while for others they are necessary no matter what the costs may be.
Fundamentally, High Availability and Disaster Recovery are aimed at the same problem: keeping systems up and running in an operational state. The main difference is that HA is intended to handle problems while a system is running, whereas DR is intended to handle problems after a system fails.
Regardless of how highly available a system is, any production application, no matter how trivial, needs at least some sort of disaster recovery plan in place.
Types of Disaster Recovery Plans
Disaster recovery is an essential part of keeping data safe and maintaining business continuity. However, with so many different disaster recovery plans available, finding the best fit can be overwhelming. Each business is different, so it's important to understand all of the choices available to you so that you can pick the plan that best suits your needs. To help you out, here's what you should consider about four types of disaster recovery plans:
1. Data Center Disaster Recovery
In this approach, the disaster recovery plan is not just limited to the computing facility it's housed in. The entire building plays a large role in data center DR. Features and tools within the building, such as physical security, support personnel, backup power, HVAC, utility providers, and fire suppression all have an effect on data center DR.
In the event of any sort of outage, these elements within the building must be in working order. With these components working, your data is at lower risk from intruders and cybercriminals. However, even if everything is functioning correctly, your data center can still be susceptible to a natural disaster.
2. Cloud Based Disaster Recovery
When using a cloud-based approach, you are able to cut costs by using a cloud provider's data center as a recovery site, rather than spending more on your own data center's facilities, personnel, and systems.
Users also benefit from the competition between cloud providers, as they continue to attempt to best each other in the market. Before committing to this method, determine the challenges that providers may have with your business's backup and recovery. The provider may be able to help you fix those problems before the cloud becomes a part of your DR plan.
3. Virtualization Disaster Recovery
Virtualization removes the need to reconstruct a physical server in the event of a disaster. You can also achieve your targeted recovery time objectives (RTOs) more easily by placing a virtual server on reserve capacity or in the cloud.
4. Disaster Recovery as a Service (DRaaS)
While Disaster Recovery as a Service (DRaaS) is often based in the cloud, it is not strictly cloud-based. Some DRaaS providers offer their solutions as a site-to-site service, in which they host and run a secondary hot site. Additionally, providers can rebuild and ship servers to an organization's site as a server replacement service.
On the other hand, cloud-based DRaaS enables users to fail over applications immediately, orchestrate failback to rebuilt servers, and reconnect users through VPN or Remote Desktop Protocol.
6 Benefits of Disaster Recovery for Businesses
Many businesses are implementing or switching to disaster recovery plans. Here are six key benefits:
1. Drastic reduction of restore times and lower RTO & RPO
Thanks to Disaster Recovery solutions, you can restore systems, services, and applications quickly and achieve a significantly lower RTO and RPO. Depending on the parameters defined in your DR plan, you can reduce restore times to match your needs to a degree that would not be practical without a Disaster Recovery solution.
2. Limit losses from reduced revenue and other costs
By reducing the restore times of business information systems, you can limit losses not only in revenue but also in related costs, such as damage caused by downtime and spending on management or technical assistance.
3. Minimize the interruption of critical processes and safeguard business operations
Each company has critical processes that must always be active and are vital for business continuity. With a Disaster Recovery solution, these processes are preserved and interruptions are minimized, allowing operations to resume quickly.
4. Avoid compromising the business reputation
Downtime caused by unexpected incidents seriously threatens a firm's reputation. A quick recovery strengthens the business and avoids irreversible damage to the corporate image.
5. Granular Management
A DR solution lets you manage replication in a granular way (meaning data can be restored at the file level or in even smaller units), with the aim of ensuring a complete recovery of data and services.
6. Regulatory Compliance
With a soundly managed IT recovery plan in place, your company will be better placed to remain in compliance with oversight requirements. For those in medical, financial services, or food production verticals, for example, various policies require continued compliance and protection of patients, customers, and the general public.
Still not sure which Disaster Recovery plan you should implement?
If you are worried about rains, earthquakes, power outages, and cyberattacks putting you out of business, please get in touch with our domain experts. We will provide a comprehensive risk assessment, understand your recovery objectives, and design a plan that minimizes downtime and ensures your peace of mind.
Source: https://www.reyamitech.com/disaster-recovery-high-availability-business-continuity/