Chapter 17 of 20

Disaster Recovery & High Availability

Domain 2 — Resilient Architectures (26%)
Question 1 (Knowledge)

What is the correct definition of RTO and RPO in disaster recovery planning?

Explanation

RTO = Recovery Time Objective: how long can the business tolerate being down? (e.g., "we must be back online within 4 hours"). RPO = Recovery Point Objective: how much data loss, measured as time since the last recoverable copy, is acceptable? (e.g., "we can lose at most 1 hour of data"). Lower RTO and RPO targets require a more expensive, always-ready DR solution.
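
As a worked illustration of the two definitions, the sketch below checks one incident against RTO and RPO targets. The function name and all timestamps are hypothetical values chosen for the example.

```python
from datetime import datetime, timedelta

def meets_objectives(last_backup, outage_start, service_restored, rto, rpo):
    """Return (rto_met, rpo_met) for a single incident."""
    downtime = service_restored - outage_start      # compared against RTO
    data_loss_window = outage_start - last_backup   # compared against RPO
    return downtime <= rto, data_loss_window <= rpo

# Hypothetical incident: last backup 09:30, failure 10:15, service restored 13:00.
rto_met, rpo_met = meets_objectives(
    last_backup=datetime(2024, 1, 1, 9, 30),
    outage_start=datetime(2024, 1, 1, 10, 15),
    service_restored=datetime(2024, 1, 1, 13, 0),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
)
print(rto_met, rpo_met)  # downtime 2h45m <= 4h, data loss 45m <= 1h -> True True
```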

Question 2 (Scenario)

A company has an RTO of 4 hours and RPO of 1 hour. They want the MOST cost-effective DR strategy. Which approach BEST meets these requirements without over-engineering?

Explanation

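DR strategies, ordered from cheapest to most expensive and from slowest to fastest RTO: Backup & Restore (RTO: hours/days) → Pilot Light (RTO: hours) → Warm Standby (RTO: minutes) → Multi-Site Active-Active (RTO: seconds). Pilot Light keeps the "core flame" lit (e.g., DB replication) and scales up the rest on failover. Continuous replication keeps its RPO in the range of minutes, well inside the 1-hour target, and it meets the 4-hour RTO at lower cost than Warm Standby. Backup & Restore is cheaper, but its RTO and RPO of hours or more risk missing both targets.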

Strategy                  RTO         RPO        Cost
Backup & Restore          Hours–Days  Hours      $
Pilot Light               Hours       Minutes    $$
Warm Standby              Minutes     Seconds    $$$
Multi-Site Active-Active  Seconds     Near Zero  $$$$
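
The selection logic behind the table above can be sketched as "walk the strategies from cheapest to most expensive and stop at the first one that meets both targets". The numeric bounds below are illustrative assumptions standing in for the table's qualitative ranges (hours, minutes, seconds); they are not published figures.

```python
from datetime import timedelta

# (name, assumed worst-case RTO, assumed worst-case RPO), ordered cheapest first.
# The durations are hypothetical stand-ins for the qualitative table entries.
STRATEGIES = [
    ("Backup & Restore", timedelta(hours=24), timedelta(hours=12)),
    ("Pilot Light", timedelta(hours=2), timedelta(minutes=10)),
    ("Warm Standby", timedelta(minutes=15), timedelta(seconds=30)),
    ("Multi-Site Active-Active", timedelta(seconds=30), timedelta(seconds=1)),
]

def cheapest_strategy(rto, rpo):
    """Return the first (cheapest) strategy whose assumed RTO and RPO fit the targets."""
    for name, strategy_rto, strategy_rpo in STRATEGIES:
        if strategy_rto <= rto and strategy_rpo <= rpo:
            return name
    return None  # no listed strategy satisfies the targets

print(cheapest_strategy(timedelta(hours=4), timedelta(hours=1)))  # Pilot Light
```

Under these assumed bounds, the 4-hour RTO and 1-hour RPO from the question rule out Backup & Restore and land on Pilot Light, matching the explanation.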

Question 3 (Scenario)

A company uses Route 53 Failover routing with a primary endpoint in us-east-1. What MUST be configured for Route 53 to automatically switch DNS to the secondary endpoint when the primary fails?

Explanation

A Route 53 health check must be associated with the primary failover record. Health checks actively monitor endpoints (HTTP/HTTPS/TCP) at a configurable interval (10 or 30 seconds). If the endpoint fails a threshold number of consecutive checks, Route 53 marks the record as unhealthy and answers DNS queries with the healthy secondary failover record. Health checks can also monitor CloudWatch alarms for application-level health signals.
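
A minimal sketch of the pieces involved, written as plain dictionaries in the shape that boto3's Route 53 `create_health_check` (`HealthCheckConfig`) and `change_resource_record_sets` (`ResourceRecordSet`) calls accept. The helper names, domain, IP addresses, and health check ID are hypothetical, and nothing here calls AWS.

```python
def health_check_config(domain, path="/health", interval=30, failure_threshold=3):
    """Build a HealthCheckConfig dict for an HTTPS endpoint check."""
    if interval not in (10, 30):
        raise ValueError("Route 53 request interval must be 10 or 30 seconds")
    return {
        "Type": "HTTPS",
        "FullyQualifiedDomainName": domain,
        "Port": 443,
        "ResourcePath": path,
        "RequestInterval": interval,
        "FailureThreshold": failure_threshold,
    }

def failover_record(name, ip, role, set_id, ttl=60, health_check_id=None):
    """Build a failover A record; role is 'PRIMARY' or 'SECONDARY'."""
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id  # ties the record to its health check
    return record

def approx_detection_seconds(config):
    """Rough time to mark an endpoint unhealthy: interval x consecutive failures."""
    return config["RequestInterval"] * config["FailureThreshold"]

config = health_check_config("app.example.com", interval=10, failure_threshold=3)
primary = failover_record("app.example.com", "192.0.2.10", "PRIMARY",
                          "use1-primary", health_check_id="hypothetical-check-id")
secondary = failover_record("app.example.com", "198.51.100.10", "SECONDARY",
                            "usw2-secondary")
print(approx_detection_seconds(config))  # 30
```

The detection estimate is only part of the picture: clients may keep using a cached answer until the record's TTL expires, so a short TTL on failover records is a common choice.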

Question 4 (Scenario)

A company wants centralised backup management for RDS, DynamoDB, EFS, EBS volumes, and EC2 instances across 20 AWS accounts in its AWS Organizations organisation. Which service simplifies this?

Explanation

AWS Backup provides a centralised place to configure and audit backup policies across AWS services (RDS, Aurora, DynamoDB, EFS, EBS, FSx, Storage Gateway, EC2) and AWS accounts. You create Backup Plans with schedules and retention rules, and assign resources. Cross-account backup and cross-region copy are supported for DR compliance.
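
To make "Backup Plans with schedules and retention rules" concrete, here is a sketch that builds a plan document and a tag-based resource selection in the shape accepted by boto3's AWS Backup `create_backup_plan` and `create_backup_selection` calls. The plan name, vault names, ARNs, account ID, and tag are hypothetical, and the code only constructs dictionaries.

```python
def build_backup_plan(name, vault, dr_vault_arn, retention_days=35):
    """Daily backup rule with a cross-Region copy to a DR vault."""
    return {
        "BackupPlanName": name,
        "Rules": [{
            "RuleName": "daily",
            "TargetBackupVaultName": vault,
            "ScheduleExpression": "cron(0 5 ? * * *)",  # every day at 05:00 UTC
            "Lifecycle": {"DeleteAfterDays": retention_days},
            "CopyActions": [{
                "DestinationBackupVaultArn": dr_vault_arn,
                "Lifecycle": {"DeleteAfterDays": retention_days},
            }],
        }],
    }

def build_tag_selection(role_arn, tag_key="backup", tag_value="daily"):
    """Assign every supported resource carrying the given tag to the plan."""
    return {
        "SelectionName": f"{tag_key}-{tag_value}",
        "IamRoleArn": role_arn,
        "ListOfTags": [{
            "ConditionType": "STRINGEQUALS",
            "ConditionKey": tag_key,
            "ConditionValue": tag_value,
        }],
    }

plan = build_backup_plan(
    "org-daily",
    "primary-vault",
    "arn:aws:backup:us-west-2:111122223333:backup-vault:dr-vault",
)
selection = build_tag_selection("arn:aws:iam::111122223333:role/HypotheticalBackupRole")
print(plan["Rules"][0]["ScheduleExpression"])
```

Selecting by tag rather than by resource ARN means newly created RDS, EFS, or EBS resources are picked up as soon as they are tagged, which suits a multi-account setup.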

Question 5 (Scenario)

During a Pilot Light DR failover, a company's DR team needs to promote the standby RDS Read Replica in the DR region to a primary database. After promotion, what else must be done to restore service?

Explanation

Pilot Light failover is not fully automatic. After promoting the Read Replica: (1) Scale up the EC2/ECS application tier in the DR region (it was stopped or running at minimal capacity). (2) Update connection strings or Route 53 records to point to the promoted RDS endpoint. (3) Verify that all application components are healthy. This manual orchestration is why Pilot Light has a longer RTO than Warm Standby.
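
The steps above can be sketched as a small runbook function. It takes the service clients as parameters so it can be exercised with stand-ins; with boto3 these would be `boto3.client("rds")`, `boto3.client("autoscaling")`, and `boto3.client("route53")` for the DR region. It assumes the application tier is an EC2 Auto Scaling group (an ECS tier would need a different scaling call) and that the database is reached through a CNAME record. All identifiers passed in are hypothetical.

```python
def pilot_light_failover(rds, autoscaling, route53, *, replica_id, asg_name,
                         desired_capacity, hosted_zone_id, db_record_name):
    """Run the Pilot Light failover steps in order; return the promoted DB endpoint."""
    # Promote the cross-Region read replica to a standalone primary.
    rds.promote_read_replica(DBInstanceIdentifier=replica_id)
    # Promotion is asynchronous, so wait until the instance is available again.
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=replica_id)

    # Step 1: scale the application tier up from its minimal footprint.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MinSize=desired_capacity,
        MaxSize=desired_capacity,
        DesiredCapacity=desired_capacity,
    )

    # Step 2: point the database DNS name at the promoted instance.
    instance = rds.describe_db_instances(DBInstanceIdentifier=replica_id)
    endpoint = instance["DBInstances"][0]["Endpoint"]["Address"]
    route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": db_record_name,
                "Type": "CNAME",
                "TTL": 60,
                "ResourceRecords": [{"Value": endpoint}],
            },
        }]},
    )

    # Step 3 (verifying application health) is left to the caller.
    return endpoint
```

Each call here takes time to complete, and an operator usually triggers and supervises the sequence, which is where the extra RTO relative to Warm Standby comes from.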