
Disaster Recovery

Comprehensive disaster recovery procedures and business continuity planning for the RCIIS DevOps platform.

Overview

This disaster recovery plan describes how to maintain business continuity when system failures, data loss, or infrastructure outages affect the RCIIS platform.

Recovery Objectives

Recovery Time Objective (RTO)

  • Critical Services: 15 minutes
  • Standard Services: 1 hour
  • Development Services: 4 hours
  • Full System Recovery: 4 hours

Recovery Point Objective (RPO)

  • Database: 15 minutes (transaction log backups)
  • Application State: 1 hour (application backups)
  • Configuration: 24 hours (daily Git snapshots)
  • File Storage: 1 hour (incremental backups)
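The RPO figures above only hold if the newest backup of each kind is never older than its objective. A minimal bash sketch of that check follows. `backup_within_rpo` is a hypothetical helper, and it assumes the caller already has the backup's timestamp as Unix epoch seconds:

```shell
# backup_within_rpo (hypothetical helper): succeeds when the newest backup
# is no older than the RPO, and prints the backup age either way.
#   $1 = backup time (Unix epoch seconds)
#   $2 = current time (Unix epoch seconds)
#   $3 = RPO in minutes (e.g. 15 for the database, 60 for file storage)
backup_within_rpo() {
  local backup_ts=$1 now_ts=$2 rpo_minutes=$3
  local age_seconds=$(( now_ts - backup_ts ))
  # Report the age so an alert can show how far behind the backups are
  echo "backup age: $(( age_seconds / 60 )) min (RPO ${rpo_minutes} min)"
  # Arithmetic test is the last command, so it becomes the return status
  (( age_seconds <= rpo_minutes * 60 ))
}
```

For example, on a host with GNU coreutils, `backup_within_rpo "$(stat -c %Y /backup/latest.bak)" "$(date +%s)" 15 || echo "database RPO breached"` checks the database target; the `/backup/latest.bak` path is borrowed from the restore commands in this document and may differ in practice.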

Disaster Scenarios

Infrastructure Failures

  • Complete cluster failure
  • Node hardware failures
  • Network connectivity loss
  • Storage system failures

Data Loss Events

  • Database corruption
  • Accidental data deletion
  • Ransomware attacks
  • Configuration corruption

Service Disruptions

  • Application failures
  • Security breaches
  • Third-party service outages
  • Natural disasters

Recovery Procedures

Database Recovery

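# Stop applications so nothing writes to the database during the restore
kubectl scale deployment nucleus --replicas=0 -n nucleus

# Restore from backup (WITH REPLACE overwrites the existing NucleusDB)
kubectl exec -it mssql-0 -n database -- \
  /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD" \
  -Q "RESTORE DATABASE [NucleusDB] FROM DISK = '/backup/latest.bak' WITH REPLACE"

# Verify restoration (-d selects NucleusDB; otherwise the query runs in the login's default database)
kubectl exec -it mssql-0 -n database -- \
  /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -d NucleusDB \
  -Q "SELECT COUNT(*) FROM Declarations"

# Restart applications and wait for the rollout to complete
kubectl scale deployment nucleus --replicas=2 -n nucleus
kubectl rollout status deployment nucleus -n nucleus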
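Run by hand, a failed RESTORE is easy to miss before the applications are scaled back up. The sketch below wraps the same steps in a bash function that stops at the first failing command. `restore_nucleus_db` is a hypothetical name; it assumes the pod, namespace, deployment, and `SA_PASSWORD` variable used above, drops `-it` because it is meant to run non-interactively, and adds sqlcmd's `-b` flag so SQL errors produce a non-zero exit code that `kubectl exec` passes back:

```shell
# restore_nucleus_db (hypothetical helper): runs the manual restore steps in
# order and returns non-zero at the first failing step, so the applications
# are never restarted against a failed restore.
#   $1 = optional backup file inside the pod (default /backup/latest.bak)
restore_nucleus_db() {
  local backup_file=${1:-/backup/latest.bak}
  if [[ -z "${SA_PASSWORD:-}" ]]; then
    echo "SA_PASSWORD is not set" >&2
    return 1
  fi
  # -b makes sqlcmd exit non-zero when the T-SQL batch fails
  local sqlcmd=(/opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -b)

  # Stop writers before touching the database
  kubectl scale deployment nucleus --replicas=0 -n nucleus || return 1

  # Restore over the existing NucleusDB
  kubectl exec mssql-0 -n database -- "${sqlcmd[@]}" \
    -Q "RESTORE DATABASE [NucleusDB] FROM DISK = '${backup_file}' WITH REPLACE" || return 1

  # Sanity check against the restored database
  kubectl exec mssql-0 -n database -- "${sqlcmd[@]}" -d NucleusDB \
    -Q "SELECT COUNT(*) FROM Declarations" || return 1

  # Bring the applications back and wait until the rollout completes
  kubectl scale deployment nucleus --replicas=2 -n nucleus || return 1
  kubectl rollout status deployment nucleus -n nucleus
}
```

Explicit `|| return 1` checks are used instead of `set -e` because bash ignores `set -e` inside a function that is itself called as part of a `||` or `if` test.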

Cluster Recovery

# Recreate cluster
kind delete cluster --name rciis-local
kind create cluster --config kind-config.yaml --name rciis-local

# Deploy infrastructure
kubectl apply -f apps/infra/

# Restore applications
argocd app sync --all
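The cluster rebuild can be scripted the same way, with explicit readiness gates so manifests are not applied to a cluster that is still starting. A sketch under these assumptions: `recreate_local_cluster` is a hypothetical helper, `kind create cluster --wait` and `kubectl wait` serve as the readiness gates, and the Argo CD sync step is left to run separately as in the procedure above, since the Argo CD server is presumably redeployed by the infrastructure manifests and may need a fresh `argocd login`:

```shell
# recreate_local_cluster (hypothetical helper): rebuilds the kind cluster and
# only continues once each step has succeeded.
#   $1 = cluster name (default rciis-local)
#   $2 = kind config file (default kind-config.yaml)
recreate_local_cluster() {
  local name=${1:-rciis-local}
  local config=${2:-kind-config.yaml}

  kind delete cluster --name "$name" || return 1
  # --wait blocks until the control plane reports Ready (or 5 minutes pass)
  kind create cluster --config "$config" --name "$name" --wait 5m || return 1

  # Make sure every node is Ready before applying workloads
  kubectl wait --for=condition=Ready nodes --all --timeout=300s || return 1

  kubectl apply -f apps/infra/
}
```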

Configuration Recovery

# Restore from Git backup into the directory that is applied next
git clone backup-repo.git restored-configs
kubectl apply -k restored-configs/
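Applying an unchecked configuration snapshot can make an outage worse. A hedged sketch of a more cautious restore: `restore_configs` is a hypothetical bash helper that clones the backup repository (optionally at a branch or tag), validates it with a client-side dry run, and applies it only if validation passes. The repository URL and the use of tags for snapshots are assumptions:

```shell
# restore_configs (hypothetical helper): clone, validate, then apply.
#   $1 = URL of the Git backup repository
#   $2 = optional branch or tag to restore (default: the remote's default branch)
restore_configs() {
  local repo_url=$1 ref=${2:-} dir=restored-configs
  local clone_args=(--depth 1)
  if [[ -n "$ref" ]]; then
    clone_args+=(--branch "$ref")
  fi

  git clone "${clone_args[@]}" "$repo_url" "$dir" || return 1
  # Render and validate the kustomization without changing the cluster
  kubectl apply -k "$dir/" --dry-run=client || return 1
  kubectl apply -k "$dir/"
}
```

`git clone` refuses to write into an existing non-empty directory, so a stale `restored-configs/` from an earlier attempt makes the function fail rather than mix old and new files.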

Backup Strategy

Automated Backups

  • Database: Every 15 minutes
  • Persistent Volumes: Daily snapshots
  • Configuration: Git commits
  • Secrets: Encrypted daily backups
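The 15-minute database schedule is what makes the 15-minute database RPO achievable, and for SQL Server that normally means transaction log backups between full backups. A sketch of a single log backup run, assuming NucleusDB uses the FULL (or BULK_LOGGED) recovery model and that a full backup already exists, both of which SQL Server requires before `BACKUP LOG` succeeds. `backup_nucleus_log` and the `/backup/NucleusDB_<timestamp>.trn` naming are hypothetical, and the 15-minute trigger (cron, a Kubernetes CronJob, or similar) is not shown:

```shell
# backup_nucleus_log (hypothetical helper): takes one transaction log backup
# to a timestamped file inside the mssql-0 pod and prints the file path.
#   $1 = optional timestamp (default: current UTC time, e.g. 20240101T001500Z)
backup_nucleus_log() {
  local ts=${1:-$(date -u +%Y%m%dT%H%M%SZ)}
  local file="/backup/NucleusDB_${ts}.trn"
  # -b makes sqlcmd exit non-zero if the BACKUP statement fails
  kubectl exec mssql-0 -n database -- \
    /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -b \
    -Q "BACKUP LOG [NucleusDB] TO DISK = '${file}' WITH CHECKSUM" || return 1
  echo "$file"
}
```

Restoring to a point in time from these files means restoring the full backup `WITH NORECOVERY` and then applying the log backups in order; the single `RESTORE ... WITH REPLACE` in the Database Recovery steps restores only the full backup.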

Backup Verification

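# Test database backup (sqlcmd runs inside the SQL Server pod; -b returns non-zero on failure)
kubectl exec mssql-0 -n database -- \
  /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -b \
  -Q "RESTORE VERIFYONLY FROM DISK = '/backup/latest.bak'"

# Test configuration restore (render the kustomization, then validate without applying)
kustomize build restored-configs/ > /dev/null
kubectl apply -k restored-configs/ --dry-run=client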

Testing and Validation

DR Testing Schedule

  • Monthly: Database restore testing
  • Quarterly: Full system recovery testing
  • Annually: Complete disaster simulation
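The cadence above can also be encoded so that a scheduled CI job knows which exercises to run in a given month. This is only an illustration: `dr_tests_due` and the test names it prints are hypothetical, and it assumes quarterly tests fall in January, April, July, and October and the annual simulation in January:

```shell
# dr_tests_due (hypothetical helper): prints the DR exercises due in a
# calendar month (1-12), one per line.
dr_tests_due() {
  # 10# forces base 10 so zero-padded months such as "08" are not read as octal
  local month=$(( 10#$1 ))
  echo "database-restore"            # monthly
  if (( (month - 1) % 3 == 0 )); then
    echo "full-system-recovery"      # quarterly: Jan, Apr, Jul, Oct
  fi
  if (( month == 1 )); then
    echo "disaster-simulation"       # annually: January
  fi
}
```

A job could call it as `dr_tests_due "$(date +%m)"` and dispatch each printed name to the matching test script.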

Test Procedures

# Simulate database failure
kubectl delete pod mssql-0 -n database

# Test backup restoration
./scripts/test-database-restore.sh

# Validate system functionality
./scripts/smoke-tests.sh
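A DR test is more informative when it records how long recovery took against the RTO. The sketch below times the pod-deletion test from the procedure above. `timed_pod_recovery` is a hypothetical bash helper, and it assumes `mssql-0` belongs to a StatefulSet (as its name suggests), so a replacement pod with the same name is created:

```shell
# timed_pod_recovery (hypothetical helper): deletes a pod, waits for its
# replacement to become Ready, and checks the elapsed time against the RTO.
#   $1 = pod name, $2 = namespace, $3 = RTO in minutes
timed_pod_recovery() {
  local pod=$1 ns=$2 rto_minutes=$3
  local start=$SECONDS

  # kubectl delete waits for the old pod to be gone (--wait defaults to true)
  kubectl delete pod "$pod" -n "$ns" || return 1

  # The replacement may take a moment to appear; poll until it exists
  until kubectl get pod "$pod" -n "$ns" >/dev/null 2>&1; do
    (( SECONDS - start > rto_minutes * 60 )) && return 1
    sleep 2
  done
  kubectl wait --for=condition=Ready "pod/$pod" -n "$ns" --timeout="${rto_minutes}m" || return 1

  local elapsed=$(( SECONDS - start ))
  echo "recovered ${pod} in ${elapsed}s (RTO ${rto_minutes}m)"
  (( elapsed <= rto_minutes * 60 ))
}
```

For example, `timed_pod_recovery mssql-0 database 15` checks the 15-minute critical-service RTO. The polling loop is there because `kubectl wait` on a named pod can fail straight away if that pod has not been recreated yet.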

For detailed backup procedures, refer to the Backup and Restore documentation.