Public Launch Runbook (Self-Hosted + Dokploy)

1) Goals

  • Deploy Fiddy publicly without a stack rewrite.
  • Keep Postgres self-hosted.
  • Enable fast rollback and basic operational visibility.
  • Keep the security baseline enforceable when the service is exposed directly on a home IP address.

2) Deploy Control Plane (Dokploy)

  1. Install Dokploy on your Proxmox Docker host.
  2. Add project in Dokploy and connect Gitea repository.
  3. Configure image source: git.nicosaya.com/nalalangan/fiddy/web.
  4. Deploy by immutable tag (the commit SHA, github.sha) and keep main only as a mutable convenience tag.
  5. Configure health check endpoint: /api/health/ready.
  6. Keep previous releases for rollback and verify that the rollback path (the Dokploy rollback button) works.
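
Steps 3 and 4 can be sketched as a small helper that derives both image references from a commit SHA. This is only an illustration: the registry path is the one given in step 3, while the function name `image_refs` and the idea of calling it from the CI workflow are assumptions.

```shell
# Hypothetical helper: print the two image references for a commit.
# IMAGE_REPO comes from this runbook; the function name is illustrative.
IMAGE_REPO="git.nicosaya.com/nalalangan/fiddy/web"

image_refs() {
  local sha="${1:-}"
  if [ -z "$sha" ]; then
    echo "usage: image_refs <commit-sha>" >&2
    return 1
  fi
  printf '%s:%s\n' "$IMAGE_REPO" "$sha"   # immutable tag, used for deploys and rollbacks
  printf '%s:%s\n' "$IMAGE_REPO" "main"   # mutable convenience tag
}
```

In a workflow, both lines would typically be passed to `docker build -t` and then pushed, with the SHA taken from the `github.sha` context. Dokploy should be pointed at the SHA-tagged reference so that a rollback always resolves to the same image.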

Required secrets/variables

  • DATABASE_URL
  • DATABASE_SSL
  • ALLOWED_DB_NAMES
  • SESSION_COOKIE_NAME
  • SESSION_TTL_DAYS
  • DEBUG_API=0
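
A preflight check along the following lines could catch a missing variable before a deploy is triggered. It is a sketch only: the variable names are the ones listed above, but the function name `check_required_env` is hypothetical and nothing here validates the values beyond DEBUG_API.

```shell
# Sketch: fail fast when a required variable is unset or empty.
# The function name is hypothetical; variable names come from this runbook.
check_required_env() {
  local missing=0 name value
  for name in DATABASE_URL DATABASE_SSL ALLOWED_DB_NAMES \
              SESSION_COOKIE_NAME SESSION_TTL_DAYS DEBUG_API; do
    # Look up the variable whose name is stored in $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # The runbook pins DEBUG_API to 0 for public deployments.
  if [ "${DEBUG_API:-}" != "0" ]; then
    echo "DEBUG_API must be 0" >&2
    missing=1
  fi
  return "$missing"
}
```

Calling `check_required_env || exit 1` at the top of a deploy script would stop the run and list every missing name on stderr.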

3) CI/CD (Gitea Actions)

  • Use .gitea/workflows/deploy-dokploy.yml.
  • Required secrets:
    • REGISTRY_USER
    • REGISTRY_PASS
    • DOKPLOY_DEPLOY_HOOK
    • DOKPLOY_HEALTHCHECK_URL
  • Health gate:
    • workflow calls scripts/wait-for-health.sh against DOKPLOY_HEALTHCHECK_URL
    • default retry window: 5 minutes (30 attempts x 10s)
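
The health gate amounts to a bounded retry loop. The sketch below is not the contents of scripts/wait-for-health.sh; it only illustrates the documented window (30 attempts, 10 seconds apart), and the function name, argument order, and per-request timeout are assumptions.

```shell
# Sketch of a health gate; not the actual scripts/wait-for-health.sh.
# Defaults mirror the documented window: 30 attempts x 10 s = 5 minutes.
wait_for_health() {
  local url="$1" attempts="${2:-30}" delay="${3:-10}" i=1 code
  while [ "$i" -le "$attempts" ]; do
    # -s silent, -o discard the body, -w print only the HTTP status code
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")"
    if [ "$code" = "200" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    echo "attempt $i/$attempts: status ${code:-none}" >&2
    i=$((i + 1))
    # Sleep only if another attempt will follow.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "not healthy after $attempts attempts" >&2
  return 1
}
```

A workflow step could call `wait_for_health "$DOKPLOY_HEALTHCHECK_URL"` and rely on the non-zero exit status to fail the job.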

4) Reverse Proxy + Network Hardening

  • Use docker/nginx/fiddy.conf as baseline.
  • Install a TLS certificate from Let's Encrypt.
  • Route 443 -> app container only.
  • Keep Postgres private; never expose 5432 publicly.
  • Restrict SSH to allowlist/VPN.
  • Add host firewall rules:
    • Allow inbound 80/443.
    • Deny all other inbound by default.
  • Confirm Nginx writes JSON logs:
    • /var/log/nginx/fiddy-access.log
    • /var/log/nginx/fiddy-error.log
  • Apply/verify host baseline using scripts:
    • dry-run firewall apply: SSH_ALLOW_CIDR=<your-cidr> DRY_RUN=1 scripts/harden-host-ufw.sh
    • real firewall apply: SSH_ALLOW_CIDR=<your-cidr> DRY_RUN=0 sudo scripts/harden-host-ufw.sh
    • host status audit: scripts/check-host-security.sh
  • Auto-ban templates:
    • fail2ban: docker/security/fail2ban/*
    • crowdsec (optional): docker/security/crowdsec/acquis.yaml
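
The firewall rules listed above map onto a handful of ufw commands. The following is a sketch under assumptions, not the contents of scripts/harden-host-ufw.sh: it reuses the SSH_ALLOW_CIDR and DRY_RUN variable names from this section, assumes SSH listens on port 22, and defaults to dry-run so that nothing is applied by accident.

```shell
# Sketch of the firewall baseline; not the actual scripts/harden-host-ufw.sh.
# With DRY_RUN=1 (the default here) the ufw commands are printed, not executed.
run_ufw() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "ufw $*"
  else
    ufw "$@"
  fi
}

apply_firewall_baseline() {
  if [ -z "${SSH_ALLOW_CIDR:-}" ]; then
    echo "SSH_ALLOW_CIDR is required" >&2
    return 1
  fi
  run_ufw default deny incoming          # deny all other inbound by default
  run_ufw allow 80/tcp                   # HTTP (redirects, certificate challenges)
  run_ufw allow 443/tcp                  # HTTPS
  # SSH only from the allowlisted range; port 22 is an assumption
  run_ufw allow from "$SSH_ALLOW_CIDR" to any port 22 proto tcp
  run_ufw --force enable
}
```

Ports published by Docker are handled by Docker's own iptables rules and can bypass ufw, so the Postgres container should additionally avoid publishing 5432 on a public interface.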

5) Observability

  • Bring up monitoring stack:
    • docker compose -f docker/observability/docker-compose.observability.yml up -d
  • Configure Grafana datasource to Loki (http://loki:3100).
  • Verify nginx logs are ingested by Promtail (job="nginx").
  • Add Uptime Kuma monitors:
    • /api/health/live
    • /api/health/ready
    • home page (/)
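
One way to confirm that Promtail is shipping nginx logs is to ask Loki which values it has seen for the `job` label. This is a sketch: the LOKI_URL variable and the function name are assumptions, and http://loki:3100 only resolves from inside the compose network.

```shell
# Sketch: check whether Loki knows a stream with job="nginx".
# LOKI_URL is an assumed variable; override it when running from the host.
LOKI_URL="${LOKI_URL:-http://loki:3100}"

nginx_job_ingested() {
  # GET /loki/api/v1/label/<name>/values returns JSON such as
  # {"status":"success","data":["nginx", ...]}
  curl -fsS "$LOKI_URL/loki/api/v1/label/job/values" | grep -q '"nginx"'
}
```

`nginx_job_ingested && echo "nginx logs present"` gives a quick yes/no answer; an empty result usually means Promtail is not reading the nginx log files or is labelling them differently.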

5.1) Deployment Smoke Check

  • Run after every deploy and rollback:
    • scripts/smoke-public-launch.sh https://your-domain
  • The script verifies:
    • /api/health/live and /api/health/ready return 200
    • both responses include X-Request-Id header
    • both response bodies include request_id

6) Backup + Restore

  • Daily backup command:
    • scripts/backup-postgres.sh
  • Periodic base backup (for faster full recovery):
    • PRIMARY_DATABASE_URL=<replication-url> scripts/basebackup-postgres.sh
  • Retention:
    • default 7 days (RETENTION_DAYS=7)
  • Restore drill:
    • scripts/restore-drill-postgres.sh backups/postgres/<file>.dump <target_database_url>
  • Run the restore drill against a non-production database before public launch.
  • Record drill outcome:
    • scripts/log-restore-drill.sh <environment> <backup_file> <restore_target> <status> <rto_minutes> <notes>
    • log file: docs/restore-drill-log.csv
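
The daily backup and its retention rule can be pictured as a dump followed by a prune. The sketch below is not the contents of scripts/backup-postgres.sh: the BACKUP_DIR variable, the file naming, and the function names are assumptions, while the backups/postgres directory, the .dump extension, and the 7-day default come from this section.

```shell
# Sketch of a dump-and-prune cycle; not the actual scripts/backup-postgres.sh.
BACKUP_DIR="${BACKUP_DIR:-backups/postgres}"
RETENTION_DAYS="${RETENTION_DAYS:-7}"

dump_database() {
  mkdir -p "$BACKUP_DIR"
  # -Fc writes pg_dump's custom format, which pg_restore can read.
  # The timestamped file name is an assumed convention.
  pg_dump -Fc --dbname="$DATABASE_URL" \
    --file="$BACKUP_DIR/fiddy-$(date -u +%Y%m%dT%H%M%SZ).dump"
}

prune_backups() {
  # -mtime +N matches files whose age, rounded down to whole days, is greater than N.
  find "$BACKUP_DIR" -type f -name '*.dump' -mtime +"$RETENTION_DAYS" -print -delete
}
```

Running `dump_database && prune_backups` from a daily cron job or systemd timer keeps pruning from happening when the dump itself fails, so an outage never deletes the last good backups.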

7) Incident Response Quick Flow

  1. Identify the failing request and its request_id.
  2. Correlate application logs (Loki) by request_id.
  3. Check /api/health/ready status and DB connectivity.
  4. Roll back to previous known-good Dokploy release if needed.
  5. Capture root cause and update this runbook/checklist.
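
Step 2 can be done from the command line through Loki's HTTP API with a LogQL line filter. This is a sketch under assumptions: LOKI_URL and the function name are hypothetical, and the default stream selector `{job="nginx"}` is used only because the application's own Loki labels are not specified in this runbook; pass the real selector as the second argument.

```shell
# Sketch: fetch log lines that contain a given request_id.
# LOKI_URL and the default selector are assumptions, not part of the runbook.
LOKI_URL="${LOKI_URL:-http://loki:3100}"

logs_for_request() {
  local request_id="${1:-}" selector="${2:-}"
  # Accept only simple IDs so the value cannot break out of the LogQL string.
  case "$request_id" in
    ''|*[!A-Za-z0-9_-]*) echo "invalid request_id" >&2; return 1 ;;
  esac
  if [ -z "$selector" ]; then
    selector='{job="nginx"}'
  fi
  # |= "text" is a LogQL line filter: keep lines containing the text.
  # -G sends the url-encoded parameters as a GET query string.
  curl -fsS -G "$LOKI_URL/loki/api/v1/query_range" \
    --data-urlencode "query=$selector |= \"$request_id\"" \
    --data-urlencode "limit=100"
}
```

With no explicit start and end parameters, Loki falls back to its default recent time window, so older incidents need those parameters added. The same LogQL expression can be pasted into Grafana's Explore view.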

8) Rollback Checklist

  1. Select previous healthy image in Dokploy release history.
  2. Trigger rollback and wait for deployment completion.
  3. Run scripts/smoke-public-launch.sh https://your-domain.
  4. Verify that the error rate drops in Grafana/Loki and confirm that the rolled-back image is compatible with the current database schema (no migration mismatch).
  5. Log the rolled-back version, timestamp, and reason.