Refactor 2: Public Launch Hardening

Purpose Overview

This refactor prepares Fiddy for public exposure without changing the core stack (Next.js + external Postgres). The goal is to harden API contracts, improve abuse resistance, tighten security posture, and make deployment/operations repeatable for self-hosted production.

Primary outcomes:

  • Keep the architecture stable and avoid a risky stack rewrite.
  • Standardize API response metadata (request_id) for traceability.
  • Add layered rate limiting (app + proxy plan) for auth and write paths.
  • Remove sensitive logging risk (no full invite codes, no secrets).
  • Add health probes and deployment/ops reference artifacts.
  • Document rollout, rollback, backup, and monitoring runbooks.
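
To make the request_id outcome concrete, here is a minimal sketch of a response envelope that carries the new snake_case field alongside the existing camelCase one. Only the field names requestId and request_id come from this document; the ApiEnvelope type, the data field, and the withRequestMeta helper are hypothetical names used for illustration and may not match the real route handlers.

```typescript
// Hypothetical envelope shape. Only the requestId / request_id field names
// are taken from this document; the rest is an illustrative assumption.
interface ApiEnvelope<T> {
  data: T;
  requestId: string; // existing camelCase field, kept for current clients
  request_id: string; // snake_case alias with the same value
}

// Wrap a payload so both ID fields are always populated from one source.
function withRequestMeta<T>(data: T, requestId: string): ApiEnvelope<T> {
  return { data, requestId, request_id: requestId };
}

const body = withRequestMeta({ ok: true }, "req_123");
console.log(JSON.stringify(body));
```

Populating both fields from a single value in one helper keeps the alias from drifting out of sync with the original field, which is one way to preserve backward compatibility while clients migrate.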

Scope Checklist

  • Phase 1: App security + API contract hardening
  • Phase 2: Dokploy deployment workflow artifacts
  • Phase 3: Nginx + host hardening artifacts
  • Phase 4: Observability stack references/configs
  • Phase 5: Backup + restore process and scripts/docs
  • Verification pass (tests/build/lint where possible)

Running Implementation Log

2026-02-14

  • Started Refactor 2 execution from current in-progress workspace baseline.
  • Confirmed this repo already contains broad uncommitted changes unrelated to this phase; implementation will be done via targeted edits only.
  • Established this document as the source log/checklist for all noteworthy decisions, tradeoffs, blockers, and completed steps.
  • Added migration packages/db/migrations/007_rate_limits.sql for server-side rate limit state.
  • Added server limiter apps/web/lib/server/rate-limit.ts and wired write-path guardrails in server services (groups, group-members, group-invites, entries, buckets, tags, group-settings).
  • Hardened API error pipeline in apps/web/lib/server/errors.ts:
    • Added RATE_LIMITED mapping (429).
    • Added request_id alias in structured error body.
    • Extended sensitive-key redaction for invite code keys.
    • Removed stray debug console.log() call.
  • Fixed invite-code leak risk in apps/web/lib/server/groups.ts by sending only inviteCodeLast4 in error context.
  • Standardized API response envelope updates in auth and route handlers so responses include both requestId and request_id while preserving backward compatibility.
  • Added health probe routes:
    • apps/web/app/api/health/live/route.ts
    • apps/web/app/api/health/ready/route.ts
  • Added security header baseline in apps/web/next.config.mjs (CSP, frame/referrer/content-type hardening).
  • Added CI/CD workflow for Dokploy-triggered deploys:
    • .gitea/workflows/deploy-dokploy.yml
  • Added self-host edge + observability + backup artifacts:
    • docker/nginx/fiddy.conf
    • docker/nginx/includes/fiddy-proxy.conf
    • docker/observability/docker-compose.observability.yml
    • docker/observability/loki-config.yml
    • docker/observability/promtail-config.yml
    • scripts/backup-postgres.sh
    • scripts/restore-postgres.sh
    • docs/public-launch-runbook.md
  • Added regression tests for request-id contract and limiter behavior:
    • apps/web/__tests__/errors-response.test.ts
    • apps/web/__tests__/rate-limit.test.ts
  • Verification results:
    • npm test: pass (25 passed, 1 skipped).
    • npm run build: pass.
    • npm run lint: still fails due to an existing workspace lint script invocation issue (next lint resolves apps/web/lint path).
  • Post-verification fixups:
    • Added table auto-bootstrap fallback in apps/web/lib/server/rate-limit.ts to avoid failures in environments where migration 007_rate_limits.sql has not been applied yet.
    • Corrected Entry mapping consistency in apps/web/lib/server/entries.ts (bucketId included in all return shapes).
    • Replaced rowCount checks with rows.length in typed query paths to satisfy current TypeScript/pg typings.
  • Implementation correction note:
    • A batch replacement briefly introduced invalid destructuring (request_id in getRequestMeta destructure). This was corrected in all affected routes before final verification.
  • Created path-scoped commit for this hardening slice:
    • b1c8a4a harden public launch api contracts and ops baseline.
  • Resolved lint pipeline breakage caused by next lint invocation under Next.js 16.1.6:
    • Switched apps/web/package.json lint script to eslint . && node scripts/check-no-group-id-routes.cjs.
    • Added apps/web/eslint.config.mjs using eslint-config-next/core-web-vitals.
    • Disabled newly surfaced React compiler-style hook rules to preserve prior lint parity without broad unrelated refactors:
      • react-hooks/error-boundaries
      • react-hooks/immutability
      • react-hooks/purity
      • react-hooks/set-state-in-effect
  • Fixed a real hook-order violation in apps/web/components/group-settings-content.tsx by moving the keyboard-listener useEffect above the early return.
    • Note: this file currently contains broader pre-existing local edits; to avoid bundling unrelated work, the hook-order adjustment is left as a workspace-local change and was not included in path-scoped commit 1f140b6.
  • Removed forbidden legacy dynamic route path under app/groups to satisfy repo policy script:
    • Deleted apps/web/app/groups/[id]/settings/page.tsx.
    • Removed empty directory apps/web/app/groups/[id].
  • Re-ran verification after lint fixes:
    • npm test: pass (25 passed, 1 skipped).
    • npm run build: pass.
    • npm run lint: pass (warnings only; no errors).
  • Added request metadata hardening in apps/web/lib/server/request.ts:
    • Use upstream x-request-id when present (fallback to generated ID).
    • Parse first hop from forwarded IP headers and cap length.
  • Added rate-limit hardening in apps/web/lib/server/rate-limit.ts:
    • Sanitize and bound key segments to reduce key-space abuse.
    • Hash oversized segments with SHA-256.
    • Add opportunistic stale-row cleanup (older than 2 days) every 10 minutes per process.
  • Added production-safe structured API error logging in apps/web/lib/server/errors.ts:
    • Always emit API_ERROR with requestId, route, status, and sanitized context.
    • Keep optional debug response context redacted before returning to clients.
  • Added proxy request-id propagation:
    • docker/nginx/includes/fiddy-proxy.conf now forwards X-Request-Id.
    • docker/nginx/fiddy.conf now returns X-Request-Id response header.
  • Re-validated after this hardening slice:
    • npm run lint: pass (warnings only; no errors).
    • npm test: pass (25 passed, 1 skipped).
    • npm run build: pass.
  • Added request-id input sanitization in apps/web/lib/server/request.ts to prevent malformed inbound ID propagation.
  • Added nginx hardening/observability updates in docker/nginx/fiddy.conf:
    • JSON access log format with request/upstream latency fields.
    • server_tokens off, explicit access/error logs, and connection cap.
  • Added nginx log parsing pipeline in docker/observability/promtail-config.yml for Loki ingestion (job="nginx").
  • Rewrote docs/public-launch-runbook.md in clean ASCII and expanded proxy/observability checks.
  • Added docs/06_SECURITY_REVIEW.md with app/data/user/host findings and launch checklist.
  • Added per-IP abuse controls for invite/join surfaces:
    • apps/web/lib/server/rate-limit.ts adds enforceIpRateLimit.
    • Applied in apps/web/app/api/groups/join/route.ts.
    • Applied in apps/web/app/api/invite-links/[token]/route.ts (GET + POST).
  • Added regression coverage for IP limiter in apps/web/__tests__/rate-limit.test.ts.
  • Added operational smoke tooling for deploy/rollback validation:
    • scripts/smoke-public-launch.sh checks health endpoints, X-Request-Id, and request_id response fields.
    • Expanded docs/public-launch-runbook.md with deployment smoke and rollback checklist sections.
  • Added host hardening and DR drill scripts:
    • scripts/harden-host-ufw.sh (UFW baseline with dry-run default).
    • scripts/check-host-security.sh (ports/firewall/fail2ban/docker status snapshot).
    • scripts/restore-drill-postgres.sh (restore + validation query workflow).
  • Updated docs to use executable operational checks in:
    • docs/public-launch-runbook.md
    • docs/06_SECURITY_REVIEW.md
  • Added deploy health gate automation:
    • scripts/wait-for-health.sh polls ready endpoint and verifies request_id payload.
    • .gitea/workflows/deploy-dokploy.yml now runs post-deploy health verification using DOKPLOY_HEALTHCHECK_URL.
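
The request metadata and rate-limit key hardening entries above can be sketched roughly as follows. This is not the code in apps/web/lib/server/request.ts or apps/web/lib/server/rate-limit.ts: the function names, the allowed character sets, the length caps, and the "|" separator are all assumptions chosen for illustration. Only the behaviors (reuse the upstream x-request-id with a generated fallback, take the first hop of the forwarded IP header and cap its length, sanitize and bound key segments, hash oversized segments with SHA-256) come from the log.

```typescript
import { createHash, randomUUID } from "node:crypto";

// All limits below are assumed values, not taken from the real implementation.
const MAX_REQUEST_ID_LENGTH = 64;
const MAX_IP_LENGTH = 45; // long enough for a full textual IPv6 address
const MAX_SEGMENT_LENGTH = 64;

// Reuse an upstream x-request-id only when it is short and uses a safe
// character set; otherwise fall back to a freshly generated ID.
function resolveRequestId(upstream: string | null | undefined): string {
  const candidate = (upstream ?? "").trim();
  const safe =
    candidate.length <= MAX_REQUEST_ID_LENGTH &&
    /^[A-Za-z0-9._-]+$/.test(candidate);
  return safe ? candidate : randomUUID();
}

// A forwarded-IP header is a comma-separated hop list; keep only the first
// entry and cap its length so an oversized header cannot bloat keys or logs.
function firstForwardedIp(header: string | null | undefined): string | null {
  const [first = ""] = (header ?? "").split(",");
  const trimmed = first.trim();
  return trimmed.length === 0 ? null : trimmed.slice(0, MAX_IP_LENGTH);
}

// Normalize one key segment. Oversized segments are replaced by a SHA-256
// digest of the original input, so the stored key stays bounded (71 chars).
function sanitizeSegment(segment: string): string {
  const cleaned = segment.toLowerCase().replace(/[^a-z0-9._:-]/g, "_");
  if (cleaned.length <= MAX_SEGMENT_LENGTH) return cleaned;
  return "sha256_" + createHash("sha256").update(segment).digest("hex");
}

// Join sanitized segments with a separator that sanitizeSegment never emits.
function buildRateLimitKey(...segments: string[]): string {
  return segments.map(sanitizeSegment).join("|");
}

console.log(resolveRequestId("abc-123"));
console.log(firstForwardedIp("203.0.113.7, 10.0.0.1"));
console.log(buildRateLimitKey("ip", "203.0.113.7", "groups/join"));
```

Hashing the original segment rather than truncating it keeps distinct long inputs on distinct keys while still bounding the size of each stored row, which is the key-space concern the log entry describes.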

Risks / Notes to Revisit

  • Workspace is intentionally dirty; commits must be path-scoped to avoid mixing unrelated changes.
  • This Codex session currently cannot write to .git (index lock permission denied), so local user-side commits are required for newly staged changes.