Deployment Operations¶
Comprehensive deployment procedures and operational workflows for the RCIIS DevOps platform.
Overview¶
The RCIIS platform uses GitOps-based deployment with ArgoCD, ensuring declarative, version-controlled, and auditable deployments across all environments.
Deployment Architecture¶
GitOps Workflow¶
- Code Commit: Developers push code changes to feature branches
- CI Pipeline: Automated build, test, and image creation
- Chart Update: Automatic chart version bump and registry push
- GitOps Repository: Update deployment configurations
- ArgoCD Sync: Automatic deployment to target environments
- Validation: Post-deployment testing and monitoring
Environment Progression¶
- Local: Developer workstations for initial testing
- Testing: Automated CI/CD testing environment
- Staging: Pre-production validation and user acceptance
- Production: Live production environment (planned)
Deployment Procedures¶
Standard Application Deployment¶
Prerequisites: - Code merged to master branch - All tests passing in CI pipeline - Chart version updated - Deployment configuration reviewed
Deployment Steps:
# 1. Verify current state
argocd app list
argocd app get nucleus-staging
# 2. Check for drift
argocd app diff nucleus-staging
# 3. Sync application
argocd app sync nucleus-staging
# 4. Monitor deployment
kubectl get pods -n nucleus --watch
kubectl rollout status deployment/nucleus -n nucleus
# 5. Verify health
kubectl get ingress -n nucleus
curl -f https://nucleus-staging.devops.africa/health
Infrastructure Deployment¶
Core Infrastructure Components:
# Deploy infrastructure in order
argocd app sync cert-manager-staging
argocd app sync ingress-nginx-staging
argocd app sync argocd-staging
# Wait for readiness
kubectl wait --for=condition=ready pod -l app=cert-manager -n cert-manager --timeout=300s
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=ingress-nginx -n ingress-nginx --timeout=300s
# Verify certificates
kubectl get certificates -A
kubectl get clusterissuer
Database Deployment and Migration¶
SQL Server Deployment:
# Deploy database
helm install mssql bitnami/mssql \
--namespace database \
--create-namespace \
--values apps/rciis/database/staging/values.yaml
# Wait for database ready
kubectl wait --for=condition=ready pod -l app=mssql -n database --timeout=600s
# Run migrations
kubectl exec deployment/nucleus -n nucleus -- \
dotnet ef database update --project /app/Nucleus.Data.dll
Migration Verification:
# Check migration status
kubectl exec deployment/nucleus -n nucleus -- \
dotnet ef migrations list --project /app/Nucleus.Data.dll
# Verify database schema
kubectl exec -it mssql-0 -n database -- \
/opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P $SA_PASSWORD \
-Q "SELECT name FROM sys.tables"
Message Queue Deployment¶
Kafka Cluster Deployment:
# Deploy Strimzi operator
kubectl apply -f apps/rciis/strimzi/staging/
# Wait for operator
kubectl wait --for=condition=ready pod -l name=strimzi-cluster-operator -n kafka --timeout=300s
# Deploy Kafka cluster
kubectl apply -f apps/rciis/strimzi/staging/extra/kafka.yaml
# Wait for Kafka ready
kubectl wait --for=condition=Ready kafka/kafka-cluster -n kafka --timeout=600s
# Create topics and users
kubectl apply -f apps/rciis/strimzi/staging/extra/
Kafka Verification:
# Check cluster status
kubectl get kafka kafka-cluster -n kafka
# Verify topics
kubectl get kafkatopic -n kafka
# Test producer/consumer
kubectl exec kafka-cluster-kafka-0 -n kafka -- \
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
Rolling Updates¶
Application Updates¶
Zero-Downtime Deployment Strategy:
apiVersion: apps/v1
kind: Deployment
metadata:
name: nucleus
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 1
template:
spec:
containers:
- name: nucleus
image: harbor.devops.africa/rciis/nucleus:v1.2.3
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
Update Procedure:
# Update image tag
yq '.image.tag = "v1.2.3"' -i apps/rciis/nucleus/staging/values.yaml
# Commit changes
git add apps/rciis/nucleus/staging/values.yaml
git commit -m "Update nucleus to v1.2.3"
git push origin master
# Monitor ArgoCD sync
argocd app sync nucleus-staging --timeout 300
# Watch rollout
kubectl rollout status deployment/nucleus -n nucleus --timeout=300s
# Verify new version
kubectl get pods -n nucleus -o wide
kubectl exec deployment/nucleus -n nucleus -- cat /app/version.txt
Infrastructure Updates¶
Helm Chart Updates:
# Update Helm repositories
helm repo update
# Check for available updates
helm list -A
helm search repo ingress-nginx --versions
# Update chart version
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--values apps/infra/ingress-nginx/local/values.yaml \
--version 4.8.0
# Verify update
kubectl get pods -n ingress-nginx
kubectl get deployment ingress-nginx-controller -n ingress-nginx -o yaml | grep image:
Rollback Procedures¶
Application Rollback¶
Quick Rollback:
# Check rollout history
kubectl rollout history deployment/nucleus -n nucleus
# Rollback to previous version
kubectl rollout undo deployment/nucleus -n nucleus
# Rollback to specific revision
kubectl rollout undo deployment/nucleus -n nucleus --to-revision=2
# Verify rollback
kubectl rollout status deployment/nucleus -n nucleus
ArgoCD Rollback:
# View application history
argocd app history nucleus-staging
# Rollback to specific revision
argocd app rollback nucleus-staging 123
# Force sync if needed
argocd app sync nucleus-staging --force --prune
Database Rollback¶
Migration Rollback:
# List applied migrations
kubectl exec deployment/nucleus -n nucleus -- \
dotnet ef migrations list --project /app/Nucleus.Data.dll
# Rollback to specific migration
kubectl exec deployment/nucleus -n nucleus -- \
dotnet ef database update PreviousMigrationName --project /app/Nucleus.Data.dll
# Verify rollback
kubectl exec -it mssql-0 -n database -- \
/opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P $SA_PASSWORD \
-Q "SELECT * FROM __EFMigrationsHistory ORDER BY MigrationId DESC"
Blue-Green Deployments¶
Blue-Green Strategy Setup¶
Blue Environment (Current):
apiVersion: apps/v1
kind: Deployment
metadata:
name: nucleus-blue
labels:
app: nucleus
version: blue
spec:
replicas: 3
selector:
matchLabels:
app: nucleus
version: blue
Green Environment (New):
apiVersion: apps/v1
kind: Deployment
metadata:
name: nucleus-green
labels:
app: nucleus
version: green
spec:
replicas: 3
selector:
matchLabels:
app: nucleus
version: green
Traffic Switching:
# Deploy green environment
kubectl apply -f nucleus-green-deployment.yaml
# Wait for green pods ready
kubectl wait --for=condition=ready pod -l app=nucleus,version=green -n nucleus
# Test green environment
kubectl port-forward deployment/nucleus-green 8080:8080 -n nucleus
curl http://localhost:8080/health
# Switch traffic to green
kubectl patch service nucleus -p '{"spec":{"selector":{"version":"green"}}}' -n nucleus
# Verify traffic switch
kubectl get endpoints nucleus -n nucleus
# Remove blue environment
kubectl delete deployment nucleus-blue -n nucleus
Canary Deployments¶
Canary Strategy with Flagger¶
Canary Configuration:
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: nucleus
namespace: nucleus
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: nucleus
progressDeadlineSeconds: 60
service:
port: 80
targetPort: 8080
analysis:
interval: 30s
threshold: 5
maxWeight: 50
stepWeight: 10
metrics:
- name: request-success-rate
threshold: 99
interval: 1m
- name: request-duration
threshold: 500
interval: 30s
Canary Deployment Process:
# Start canary deployment
kubectl set image deployment/nucleus nucleus=harbor.devops.africa/rciis/nucleus:v1.2.3 -n nucleus
# Monitor canary progress
kubectl get canary nucleus -n nucleus --watch
# Check canary metrics
kubectl describe canary nucleus -n nucleus
# Promote or rollback based on metrics
flagger promote nucleus -n nucleus # or flagger rollback nucleus -n nucleus
Health Checks and Validation¶
Post-Deployment Validation¶
Application Health Checks:
# Health endpoint validation
curl -f https://nucleus-staging.devops.africa/health
# API functionality test
curl -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://nucleus-staging.devops.africa/api/status
# Database connectivity test
kubectl exec deployment/nucleus -n nucleus -- \
dotnet run --project /app/Tools/DatabaseCheck.dll
Infrastructure Validation:
# Certificate validation
kubectl get certificates -A
openssl s_client -connect nucleus-staging.devops.africa:443 -servername nucleus-staging.devops.africa < /dev/null
# Ingress validation
kubectl get ingress -A
kubectl describe ingress nucleus-ingress -n nucleus
# Service mesh validation (if applicable)
kubectl get virtualservice -A
kubectl get destinationrule -A
Smoke Tests¶
Automated Smoke Test Suite:
#!/bin/bash
# smoke-test.sh
set -e
echo "Running smoke tests for nucleus-staging..."
# Test health endpoint
echo "Testing health endpoint..."
curl -f https://nucleus-staging.devops.africa/health
# Test authentication
echo "Testing authentication..."
AUTH_RESPONSE=$(curl -s -X POST https://nucleus-staging.devops.africa/api/auth/login \
-H "Content-Type: application/json" \
-d '{"username":"[email protected]","password":"test123"}')
TOKEN=$(echo $AUTH_RESPONSE | jq -r '.token')
# Test protected endpoint
echo "Testing protected endpoint..."
curl -f -H "Authorization: Bearer $TOKEN" \
https://nucleus-staging.devops.africa/api/declarations
# Test database connection
echo "Testing database connection..."
kubectl exec deployment/nucleus -n nucleus -- \
/app/tools/db-check --connection-string "$DB_CONNECTION"
# Test Kafka connectivity
echo "Testing Kafka connectivity..."
kubectl exec deployment/nucleus -n nucleus -- \
/app/tools/kafka-check --bootstrap-servers "kafka-cluster-kafka-bootstrap:9092"
echo "All smoke tests passed!"
Monitoring Deployments¶
Deployment Metrics¶
Key Deployment Metrics:
# Deployment success rate
rate(deployment_status_condition{condition="Progressing",status="True"}[5m])
# Pod restart rate during deployment
rate(kube_pod_container_status_restarts_total[5m])
# Application response time during deployment
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# Error rate during deployment
rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m])
Deployment Alerts:
groups:
- name: deployment.rules
rules:
- alert: DeploymentFailed
expr: kube_deployment_status_condition{condition="Progressing",status="False"} == 1
for: 10m
labels:
severity: critical
annotations:
summary: "Deployment {{ $labels.deployment }} failed"
description: "Deployment has been failing for more than 10 minutes"
- alert: HighErrorRateDuringDeployment
expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.05
for: 2m
labels:
severity: warning
annotations:
summary: "High error rate during deployment"
description: "Error rate is {{ $value | humanizePercentage }}"
Troubleshooting Deployments¶
Common Deployment Issues¶
Pod Startup Failures:
# Check pod status
kubectl get pods -n nucleus
kubectl describe pod <pod-name> -n nucleus
# Check logs
kubectl logs <pod-name> -n nucleus --previous
kubectl logs deployment/nucleus -n nucleus --tail=100
# Check events
kubectl get events -n nucleus --sort-by=.metadata.creationTimestamp --field-selector involvedObject.name=<pod-name>
Image Pull Issues:
# Check image pull secrets
kubectl get secrets -n nucleus | grep docker
kubectl describe secret harbor-registry -n nucleus
# Test image pull manually
docker pull harbor.devops.africa/rciis/nucleus:v1.2.3
# Check registry connectivity
kubectl run debug --image=busybox -it --rm -- /bin/sh
# Inside pod: wget -O- https://harbor.devops.africa/v2/
Resource Constraints:
# Check resource usage
kubectl top nodes
kubectl top pods -n nucleus
# Check resource quotas
kubectl describe resourcequota -n nucleus
# Check node capacity
kubectl describe node <node-name>
Diagnostic Commands¶
# Check deployment status
kubectl get deployments -A
kubectl rollout status deployment/nucleus -n nucleus
# Check replica sets
kubectl get replicasets -n nucleus
kubectl describe replicaset <rs-name> -n nucleus
# Check ArgoCD application status
argocd app get nucleus-staging
argocd app diff nucleus-staging
# Check Helm releases
helm list -A
helm status nucleus -n nucleus
Best Practices¶
Deployment Strategy¶
- Automated Testing: Comprehensive test coverage before deployment
- Gradual Rollout: Use rolling updates or canary deployments
- Health Checks: Robust liveness and readiness probes
- Monitoring: Real-time monitoring during deployments
Change Management¶
- Code Review: All changes reviewed before deployment
- Documentation: Update documentation with changes
- Communication: Notify stakeholders of significant deployments
- Rollback Plan: Always have a rollback strategy
Security¶
- Image Scanning: Scan container images for vulnerabilities
- Secret Management: Secure handling of sensitive configuration
- Access Control: Limit deployment permissions
- Audit Trail: Maintain complete deployment history
Performance¶
- Resource Planning: Adequate resource allocation
- Scaling: Plan for traffic increases
- Optimization: Regular performance optimization
- Capacity Management: Monitor and plan capacity
For advanced deployment strategies and troubleshooting, refer to the Kubernetes and ArgoCD documentation.