# Deployment Execution Playbook (Hands-On Checkpoints)

Purpose: keep implementation work prepared in-repo, and call for operator actions only when local infrastructure access is required.

## Status Icon Legend
Use these in execution updates for fast scanning:
- `🔄` in progress
- `✅` completed
- `🧪` test/lint/verification result
- `📄` documentation update
- `🗄️` database or migration change
- `🚀` deploy/release step
- `⚠️` risk, blocker, or manual operator action needed
- `❌` failed command or unsuccessful attempt
- `ℹ️` informational context
- `🧭` recommendation or next-step option
## Phase 0: Preflight (No Infra Changes)
- [ ] `npm run lint`
- [ ] `npm test`
- [ ] `npm run build`
- [ ] Confirm docs are up to date:
  - [ ] `docs/public-launch-runbook.md`
  - [ ] `docs/07_PUBLIC_LAUNCH_CHECKLIST.md`
  - [ ] `docs/08_NGINX_PROXY_MANAGER_SETUP.md`
  - [ ] `docs/06_SECURITY_REVIEW.md`
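
The Phase 0 checks can be chained so that the first failure stops the run. The following is a minimal sketch, not a script that ships with the repo: the function names and the `NPM_BIN` override are hypothetical, and `check_docs` only confirms that the four files exist (whether they are up to date still needs a human read).

```shell
#!/usr/bin/env bash
# Preflight sketch: run the Phase 0 commands in order and stop at the first failure.
# NPM_BIN is a hypothetical override so the function can be exercised without npm.
preflight() {
  local npm_bin="${NPM_BIN:-npm}" step
  for step in lint test build; do
    echo "🔄 npm run ${step}"
    # `npm run test` is equivalent to `npm test`.
    if ! "$npm_bin" run "$step"; then
      echo "❌ npm run ${step} failed"
      return 1
    fi
    echo "✅ npm run ${step}"
  done
}

# Report which of the Phase 0 docs are missing under the given repo root.
# This checks presence only, not freshness.
check_docs() {
  local root="${1:-.}" missing=0 doc
  for doc in \
    docs/public-launch-runbook.md \
    docs/07_PUBLIC_LAUNCH_CHECKLIST.md \
    docs/08_NGINX_PROXY_MANAGER_SETUP.md \
    docs/06_SECURITY_REVIEW.md; do
    if [ ! -f "${root}/${doc}" ]; then
      echo "⚠️ missing ${doc}"
      missing=1
    fi
  done
  return "$missing"
}
```

From the repo root, `preflight && check_docs .` would run both steps; nothing here touches infrastructure.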
## Phase 1: Registry + SSH Compose Wiring (Operator Needed)

Hands-on checkpoints:
1. Create/verify secrets in Gitea:
   - `REGISTRY_USER`
   - `REGISTRY_PASS`
   - `DEPLOY_KEY`
   - `DEPLOY_HOST`
   - `DEPLOY_USER`
   - `DEPLOY_HEALTHCHECK_URL`
2. Prepare deploy host for SSH Compose:
   - install Docker Engine + Compose plugin
   - create `/opt/fiddy/.env` with production variables
   - run `docker login git.nicosaya.com` as the deploy user
3. Confirm production compose contract:
   - web image source: `git.nicosaya.com/nalalangan/fiddy/web`
   - scheduler image source: `git.nicosaya.com/nalalangan/fiddy/scheduler`
   - web publishes `3010:3000` (host port 3010 to container port 3000)
   - scheduler has no public port

Validation:
- [ ] Push-to-main triggers `.gitea/workflows/deploy-ssh-compose.yml`
- [ ] SSH deploy updates both web and scheduler containers
- [ ] Deploy guard confirms web and scheduler are running
- [ ] Health gate completes via `scripts/wait-for-health.sh`
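
Two of the checkpoints above lend themselves to a quick script: confirming that all six secrets are populated, and polling a health URL until it answers. The sketch below is illustrative only. It assumes the secrets are exposed to the job as environment variables with the same names, and `wait_for_health` is a stand-in to show the idea of a health gate, not the contents of `scripts/wait-for-health.sh`. The function names, retry count, and delay are hypothetical.

```shell
#!/usr/bin/env bash
# Secret names are taken from the Phase 1 list; the rest is a hypothetical sketch.
REQUIRED_SECRETS="REGISTRY_USER REGISTRY_PASS DEPLOY_KEY DEPLOY_HOST DEPLOY_USER DEPLOY_HEALTHCHECK_URL"

# Fail if any required variable is unset or empty. Only names are printed, never values.
check_required_secrets() {
  local name value missing=0
  for name in $REQUIRED_SECRETS; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "⚠️ ${name} is not set"
      missing=1
    fi
  done
  return "$missing"
}

# Single probe: curl -f makes HTTP status 400 and above count as a failure.
probe() {
  curl -fsS -o /dev/null --max-time 5 "$1"
}

# Poll the URL until a probe succeeds or the attempts run out.
wait_for_health() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    if probe "$url"; then
      echo "✅ healthy after ${i} attempt(s)"
      return 0
    fi
    echo "🔄 attempt ${i}/${attempts} failed, retrying in ${delay}s"
    sleep "$delay"
    i=$((i + 1))
  done
  echo "❌ ${url} did not become healthy"
  return 1
}
```

A deploy job could call `check_required_secrets && wait_for_health "$DEPLOY_HEALTHCHECK_URL"`; the real gate remains whatever `scripts/wait-for-health.sh` implements.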
## Phase 2: NPM Edge Setup (Operator Needed)
Use `docs/08_NGINX_PROXY_MANAGER_SETUP.md`. NPM in this phase means Nginx Proxy Manager, not the Node package manager used in Phase 0.
Execution order helper: `docs/10_NPM_HANDS_ON_RUNSHEET.md`.

Hands-on checkpoints:
1. Configure the Proxy Host for the Fiddy domain to forward to the internal app IP:port.
2. Proxy Host Advanced:
   - `docker/nginx/npm/proxy-host-advanced.conf.example`
3. Custom Location `/`:
   - `docker/nginx/npm/location-root-advanced.conf.example`
4. Custom auth/write locations:
   - `docker/nginx/npm/location-auth-advanced.conf.example`
   - `docker/nginx/npm/location-write-advanced.conf.example`
5. Global NPM `http` config includes:
   - `docker/nginx/npm/http_top.conf.example`

Validation:
- [ ] `scripts/smoke-public-launch.sh https://<domain>` passes
- [ ] Response header `X-Request-Id` present
- [ ] Response body includes `request_id`
- [ ] Rate limits are active under burst tests
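
The two request-ID checks can also be spot-checked by hand with `curl`. This is a rough sketch under stated assumptions, separate from `scripts/smoke-public-launch.sh`: the function names are hypothetical, and the body check simply looks for the literal text `request_id`, so the URL you pass should be an endpoint whose response is expected to carry that field.

```shell
#!/usr/bin/env bash
# Succeeds when the response on stdin has an X-Request-Id header (header names are case-insensitive).
has_request_id_header() {
  grep -qi '^x-request-id:'
}

# Succeeds when the response on stdin contains the text request_id.
# The header itself uses hyphens, so it does not satisfy this check.
has_request_id_body() {
  grep -q 'request_id'
}

# Fetch headers and body in one request (curl -i) and run both checks.
check_request_id() {
  local url="$1" response
  response="$(curl -sS -i "$url")" || return 1
  if ! printf '%s\n' "$response" | has_request_id_header; then
    echo "❌ X-Request-Id header missing"
    return 1
  fi
  if ! printf '%s\n' "$response" | has_request_id_body; then
    echo "❌ request_id missing from body"
    return 1
  fi
  echo "✅ request ID present in header and body"
}
```

Call it as `check_request_id https://<domain>/<path>` with a path taken from the runbook.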
## Phase 3: Host Security Baseline (Operator Needed)
Hands-on checkpoints:
1. Firewall baseline:
   - dry run: `SSH_ALLOW_CIDR=<cidr> DRY_RUN=1 scripts/harden-host-ufw.sh`
   - apply: `SSH_ALLOW_CIDR=<cidr> DRY_RUN=0 sudo scripts/harden-host-ufw.sh`
   - note: if `sudo` resets the environment on the host (the `env_reset` policy), variables set before `sudo` may not reach the script; in that case use `sudo SSH_ALLOW_CIDR=<cidr> DRY_RUN=0 scripts/harden-host-ufw.sh`
2. Security snapshot:
   - `scripts/check-host-security.sh`
3. Auto-ban tooling:
   - fail2ban and/or crowdsec using `docker/security/*`

Validation:
- [ ] Only expected public ports exposed (`80/443`)
- [ ] SSH restricted by allowlist/VPN
- [ ] Ban tooling sees nginx logs and can ban a test offender
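
As a rough aid for the exposed-ports check, the sketch below filters `ss` output for listeners bound to non-loopback addresses on ports outside an allowlist. It is an assumption-laden helper, separate from `scripts/check-host-security.sh`: the function name is hypothetical, and a listener it reports (for example SSH on 22, or the web container on 3010) is only a candidate, since the firewall decides what is actually reachable from outside.

```shell
#!/usr/bin/env bash
# Reads `ss -H -tln` output on stdin and prints each listening port that is
# bound to a non-loopback address and is not in the allowlist.
# The allowlist is a space-separated string; the default matches the 80/443 expectation.
unexpected_listeners() {
  local allowed="${1:-80 443}"
  awk -v allowed="$allowed" '
    BEGIN { n = split(allowed, a, " "); for (i = 1; i <= n; i++) ok[a[i]] = 1 }
    {
      addr = $4                                   # local address:port column
      if (addr ~ /^127\./ || addr ~ /^\[::1\]/) next  # loopback-only listeners
      port = addr
      sub(/.*:/, "", port)                        # keep the text after the last colon
      if (!(port in ok) && !(port in seen)) { seen[port] = 1; print port }
    }'
}
```

On the host, `ss -H -tln | unexpected_listeners "80 443"` lists ports to review against the firewall rules.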
## Phase 4: Observability + Alerts (Operator Needed)
Hands-on checkpoints:
1. Start stack:
   - `docker compose -f docker/observability/docker-compose.observability.yml up -d`
2. Grafana datasource:
   - Loki `http://loki:3100`
3. Uptime Kuma monitors:
   - `/api/health/live`
   - `/api/health/ready`
   - `/`

Validation:
- [ ] nginx logs appear in Loki (`job="nginx"`)
- [ ] alert rules configured (5xx/auth spikes/DB failures/resource pressure)
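
To confirm that nginx logs are reaching Loki without opening Grafana, the `job="nginx"` selector can be queried over Loki's HTTP API. A minimal sketch follows; the function name and defaults are hypothetical, and the `http://localhost:3100` default assumes port 3100 is published to the host, which this playbook does not state. `http://loki:3100` only resolves from inside the compose network.

```shell
#!/usr/bin/env bash
# Ask Loki for a few recent log lines from the nginx job.
# With no start/end parameters, Loki applies its default time window.
loki_recent_nginx() {
  local base="${1:-http://localhost:3100}" limit="${2:-5}"
  curl -sS -G "${base}/loki/api/v1/query_range" \
    --data-urlencode 'query={job="nginx"}' \
    --data-urlencode "limit=${limit}"
}
```

Running `loki_recent_nginx` returns JSON; an empty result list suggests the nginx log shipping is not wired up yet or that no requests were logged in the queried window.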
## Phase 5: Backup + DR (Operator Needed)
Hands-on checkpoints:
1. Schedule logical backups:
   - `scripts/backup-postgres.sh`
2. Schedule periodic base backups:
   - `PRIMARY_DATABASE_URL=<replication-url> scripts/basebackup-postgres.sh`
3. Run restore drill:
   - `scripts/restore-drill-postgres.sh <dump> <target_db_url>`
4. Log drill:
   - `scripts/log-restore-drill.sh <env> <dump> <target> <status> <rto_min> <notes>`

Validation:
- [ ] latest drill entry in `docs/restore-drill-log.csv`
- [ ] measured RTO acceptable
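
The `<rto_min>` argument has to come from somewhere; one option is to time the restore drill and round up to whole minutes. The helpers below are a hypothetical sketch (neither function is part of the repo), and `latest_drill_entry` just prints the last line of the CSV without assuming anything about its columns.

```shell
#!/usr/bin/env bash
# Minutes elapsed between two epoch-second timestamps, rounded up.
rto_minutes() {
  local start="$1" end="$2"
  echo $(( (end - start + 59) / 60 ))
}

# Print the most recent line of the drill log (path defaults to the one named above).
latest_drill_entry() {
  tail -n 1 "${1:-docs/restore-drill-log.csv}"
}

# Example flow, left commented out because it touches a real database:
#   start="$(date +%s)"
#   scripts/restore-drill-postgres.sh <dump> <target_db_url>
#   end="$(date +%s)"
#   scripts/log-restore-drill.sh <env> <dump> <target> <status> "$(rto_minutes "$start" "$end")" <notes>
```

Rounding up keeps the logged RTO conservative: a drill that takes 10 minutes and 1 second is recorded as 11.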
## Phase 6: Launch Gate
Run final checklist:
- `docs/07_PUBLIC_LAUNCH_CHECKLIST.md`

Go-live only after all required boxes are checked.
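
A small gate script can make the "all boxes checked" rule harder to skip. The sketch below assumes the launch checklist uses the same `- [ ]` task syntax as this playbook; the function name is hypothetical, and it counts every unchecked box, so any optional items in the checklist would need to be excluded by hand.

```shell
#!/usr/bin/env bash
# Exit 0 only when the checklist has no unchecked "- [ ]" items.
launch_gate() {
  local file="${1:-docs/07_PUBLIC_LAUNCH_CHECKLIST.md}" open
  if [ ! -f "$file" ]; then
    echo "❌ checklist not found: ${file}"
    return 2
  fi
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  open="$(grep -c '^[[:space:]]*- \[ \]' "$file" || true)"
  if [ "$open" -gt 0 ]; then
    echo "⚠️ ${open} unchecked item(s) in ${file}"
    return 1
  fi
  echo "✅ no unchecked items in ${file}"
}
```

Running `launch_gate` from the repo root before a release gives a quick yes/no answer; it does not replace reading the checklist.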
## Deferred Item (Intentional)
- NPM host-specific tailoring (domain/upstream/custom locations) is intentionally deferred and tracked for a later hands-on session.