fiddy/docs/09_DEPLOYMENT_EXECUTION_PLAYBOOK.md

# Deployment Execution Playbook (Hands-On Checkpoints)

Purpose: keep implementation work prepared in the repo, and ask for operator action only when a step needs access to local infrastructure.

## Status Icon Legend
Use these in execution updates for fast scanning:
- `🔄` in progress
- `✅` completed
- `🧪` test/lint/verification result
- `📄` documentation update
- `🗄️` database or migration change
- `🚀` deploy/release step
- `⚠️` risk, blocker, or manual operator action needed
- `❌` failed command or unsuccessful attempt
- `ℹ️` informational context
- `🧭` recommendation or next-step option

## Phase 0: Preflight (No Infra Changes)
- [ ] `npm run lint`
- [ ] `npm test`
- [ ] `npm run build`
- [ ] Confirm docs are up to date:
  - [ ] `docs/public-launch-runbook.md`
  - [ ] `docs/07_PUBLIC_LAUNCH_CHECKLIST.md`
  - [ ] `docs/08_NGINX_PROXY_MANAGER_SETUP.md`
  - [ ] `docs/06_SECURITY_REVIEW.md`
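
The three commands above can be chained so the preflight stops at the first failure. A minimal sketch, assuming it is run from the repo root; `run_steps` is a hypothetical helper, not a script that exists in this repo:

```shell
# Run each command in order and stop at the first one that fails.
# Each argument is one full command line, executed via `sh -c`.
run_steps() {
  for step in "$@"; do
    echo "==> $step"
    if ! sh -c "$step"; then
      echo "FAILED: $step" >&2
      return 1
    fi
  done
  echo "all steps passed"
}

# Usage (from the repo root):
#   run_steps "npm run lint" "npm test" "npm run build"
```

Stopping early keeps a failing lint from being buried under later test or build output.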

## Phase 1: Registry + SSH Compose Wiring (Operator Needed)

Hands-on checkpoints:
1. Create/verify secrets in Gitea:
   - `REGISTRY_USER`
   - `REGISTRY_PASS`
   - `DEPLOY_KEY`
   - `DEPLOY_HOST`
   - `DEPLOY_USER`
   - `DEPLOY_HEALTHCHECK_URL`
2. Prepare deploy host for SSH Compose:
   - install Docker Engine + Compose plugin
   - create `/opt/fiddy/.env` with production variables
   - run `docker login git.nicosaya.com` as the deploy user
3. Confirm production compose contract:
   - web image source: `git.nicosaya.com/nalalangan/fiddy/web`
   - scheduler image source: `git.nicosaya.com/nalalangan/fiddy/scheduler`
   - web publishes `3010:3000` (host port 3010 to container port 3000)
   - scheduler has no public port
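
The secrets from step 1 can be checked early in a workflow step so a missing value fails fast instead of surfacing mid-deploy. A minimal sketch, assuming the workflow exposes each secret as an environment variable of the same name (an assumption, not confirmed by the workflow file); `require_vars` is a hypothetical helper:

```shell
# Report every unset or empty variable; return 1 if any are missing.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable named by $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_vars REGISTRY_USER REGISTRY_PASS DEPLOY_KEY \
#     DEPLOY_HOST DEPLOY_USER DEPLOY_HEALTHCHECK_URL
```

The helper prints only variable names, never values, so it is safe to leave in CI logs.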

Validation:
- [ ] A push to `main` triggers `.gitea/workflows/deploy-ssh-compose.yml`
- [ ] SSH deploy updates both web and scheduler containers
- [ ] Deploy guard confirms web and scheduler are running
- [ ] Health gate completes via `scripts/wait-for-health.sh`
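
For context, a health gate is usually a bounded retry loop around a single probe. The sketch below illustrates that pattern only; it is not the contents of `scripts/wait-for-health.sh`, and `retry_until` with its attempt and delay parameters is hypothetical:

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: retry_until <attempts> <delay_seconds> <command> [args...]
retry_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    echo "attempt $i/$attempts failed" >&2
    # Skip the sleep after the final attempt.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage:
#   retry_until 30 2 curl -fsS -o /dev/null "$DEPLOY_HEALTHCHECK_URL"
```

With `curl -f`, any HTTP status of 400 or above counts as a failed attempt, so the gate passes only once the endpoint answers successfully.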

## Phase 2: NPM Edge Setup (Operator Needed)
Follow `docs/08_NGINX_PROXY_MANAGER_SETUP.md`. NPM here means Nginx Proxy Manager, not the `npm` CLI used in Phase 0.
Execution order helper: `docs/10_NPM_HANDS_ON_RUNSHEET.md`.

Hands-on checkpoints:
1. Proxy Host for the Fiddy domain is configured to forward to the internal app IP:port.
2. Proxy Host Advanced:
   - `docker/nginx/npm/proxy-host-advanced.conf.example`
3. Custom Location `/`:
   - `docker/nginx/npm/location-root-advanced.conf.example`
4. Custom auth/write locations:
   - `docker/nginx/npm/location-auth-advanced.conf.example`
   - `docker/nginx/npm/location-write-advanced.conf.example`
5. Global NPM `http` config includes:
   - `docker/nginx/npm/http_top.conf.example`

Validation:
- [ ] `scripts/smoke-public-launch.sh https://<domain>` passes
- [ ] Response header `X-Request-Id` present
- [ ] Response body includes `request_id`
- [ ] Rate limits are active under burst tests
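
The `X-Request-Id` check can be done by hand when the smoke script is not available. A minimal sketch; `has_header` is a hypothetical helper and `<domain>` stays a placeholder:

```shell
# Read HTTP response headers on stdin; succeed if the named header is present.
# Header names are case-insensitive (HTTP/2 responses print them lowercase),
# so the match ignores case.
has_header() {
  grep -qi "^$1:"
}

# Usage (dump response headers to stdout, discard the body):
#   curl -sS -D - -o /dev/null "https://<domain>/" | has_header "X-Request-Id" \
#     && echo "header present"
```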

## Phase 3: Host Security Baseline (Operator Needed)
Hands-on checkpoints:
1. Firewall baseline:
   - dry run: `SSH_ALLOW_CIDR=<cidr> DRY_RUN=1 scripts/harden-host-ufw.sh`
   - apply: `sudo SSH_ALLOW_CIDR=<cidr> DRY_RUN=0 scripts/harden-host-ufw.sh` (variables go after `sudo`, because sudo's default `env_reset` drops variables set before it)
2. Security snapshot:
   - `scripts/check-host-security.sh`
3. Auto-ban tooling:
   - fail2ban and/or CrowdSec using the configs under `docker/security/*`
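
The dry-run/apply split in step 1 follows a common wrapper pattern: print the command when `DRY_RUN=1`, execute it otherwise. The sketch below illustrates that pattern only; it is not the contents of `scripts/harden-host-ufw.sh`, and both the `run` helper and the default of `1` when `DRY_RUN` is unset are assumptions:

```shell
# Run a command, or only print it when DRY_RUN=1.
# Defaults to dry-run when DRY_RUN is unset (assumed safe default).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

# Example: preview a rule instead of applying it.
#   DRY_RUN=1
#   run ufw allow from "$SSH_ALLOW_CIDR" to any port 22 proto tcp
```

Reviewing the dry-run output before applying matters most for the SSH rule, since a wrong CIDR can lock the operator out of the host.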

Validation:
- [ ] Only expected public ports exposed (`80/443`)
- [ ] SSH restricted by allowlist/VPN
- [ ] Ban tooling sees nginx logs and can ban a test offender

## Phase 4: Observability + Alerts (Operator Needed)
Hands-on checkpoints:
1. Start stack:
   - `docker compose -f docker/observability/docker-compose.observability.yml up -d`
2. Grafana datasource:
   - Loki at `http://loki:3100`
3. Uptime Kuma monitors:
   - `/api/health/live`
   - `/api/health/ready`
   - `/`
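
Before creating the monitors, the three paths can be probed once from a shell to confirm they respond. A minimal sketch; `probe_endpoints` and the overridable `FETCH` variable are hypothetical, and the base URL is passed without a trailing slash:

```shell
# Command used to fetch a URL; override FETCH to stub it out in tests.
FETCH="${FETCH:-curl -fsS -o /dev/null --max-time 5}"

# Probe each monitored path under the given base URL.
# Prints one line per path and returns 1 if any probe failed.
probe_endpoints() {
  base="$1"
  failed=0
  for path in /api/health/live /api/health/ready /; do
    # $FETCH is intentionally unquoted so its flags split into separate words.
    if $FETCH "$base$path"; then
      echo "ok   $path"
    else
      echo "FAIL $path"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   probe_endpoints "https://<domain>"
```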

Validation:
- [ ] nginx logs appear in Loki (`job="nginx"`)
- [ ] alert rules configured (5xx/auth spikes/DB failures/resource pressure)

## Phase 5: Backup + DR (Operator Needed)
Hands-on checkpoints:
1. Schedule logical backups:
   - `scripts/backup-postgres.sh`
2. Schedule periodic base backups:
   - `PRIMARY_DATABASE_URL=<replication-url> scripts/basebackup-postgres.sh`
3. Run restore drill:
   - `scripts/restore-drill-postgres.sh <dump> <target_db_url>`
4. Log drill:
   - `scripts/log-restore-drill.sh <env> <dump> <target> <status> <rto_min> <notes>`
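
The `<rto_min>` value in step 4 can be measured rather than estimated by timing the drill in step 3. A minimal sketch; `elapsed_minutes` is a hypothetical helper, and the `pass`/`fail` status strings and the note text in the usage comment are assumptions about what the logging script accepts:

```shell
# Convert a start/end pair of epoch seconds into whole minutes, rounded up.
elapsed_minutes() {
  echo $(( ($2 - $1 + 59) / 60 ))
}

# Usage sketch (status values and note text are assumed, not confirmed):
#   start=$(date +%s)
#   if scripts/restore-drill-postgres.sh "$dump" "$target_db_url"; then
#     status=pass
#   else
#     status=fail
#   fi
#   rto_min=$(elapsed_minutes "$start" "$(date +%s)")
#   scripts/log-restore-drill.sh "$env" "$dump" "$target_db_url" \
#     "$status" "$rto_min" "scheduled drill"
```

Rounding up keeps the logged RTO conservative: a 61-second restore is recorded as 2 minutes, not 1.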

Validation:
- [ ] latest drill entry in `docs/restore-drill-log.csv`
- [ ] measured RTO acceptable

## Phase 6: Launch Gate
Run final checklist:
- `docs/07_PUBLIC_LAUNCH_CHECKLIST.md`

Go-live only after all required boxes are checked.

## Deferred Item (Intentional)
- NPM host-specific tailoring (domain/upstream/custom locations) is intentionally deferred and tracked for a later hands-on session.