130 lines
8.6 KiB
Markdown
130 lines
8.6 KiB
Markdown
# Refactor 2: Public Launch Hardening
|
|
|
|
## Purpose Overview
|
|
This refactor prepares Fiddy for public exposure without changing the core stack (Next.js + external Postgres). The goal is to harden API contracts, improve abuse resistance, tighten security posture, and make deployment/operations repeatable for self-hosted production.
|
|
|
|
Primary outcomes:
|
|
- Keep architecture stable and avoid risky stack rewrite.
|
|
- Standardize API response metadata (`request_id`) for traceability.
|
|
- Add layered rate limiting (app + proxy plan) for auth and write paths.
|
|
- Remove sensitive logging risk (no full invite codes, no secrets).
|
|
- Add health probes and deployment/ops reference artifacts.
|
|
- Document rollout, rollback, backup, and monitoring runbooks.
|
|
|
|
## Scope Checklist
|
|
- [x] Phase 1: App security + API contract hardening
|
|
- [x] Phase 2: Dokploy deployment workflow artifacts
|
|
- [x] Phase 3: Nginx + host hardening artifacts
|
|
- [x] Phase 4: Observability stack references/configs
|
|
- [x] Phase 5: Backup + restore process and scripts/docs
|
|
- [x] Verification pass (tests/build/lint where possible)
|
|
|
|
## Running Implementation Log
|
|
|
|
### 2026-02-14
|
|
- Started `Refactor 2` execution from current in-progress workspace baseline.
|
|
- Confirmed this repo already contains broad uncommitted changes unrelated to this phase; implementation will be done via targeted edits only.
|
|
- Established this document as the source log/checklist for all noteworthy decisions, tradeoffs, blockers, and completed steps.
|
|
- Added migration `packages/db/migrations/007_rate_limits.sql` for server-side rate limit state.
|
|
- Added server limiter `apps/web/lib/server/rate-limit.ts` and wired write-path guardrails in server services (`groups`, `group-members`, `group-invites`, `entries`, `buckets`, `tags`, `group-settings`).
|
|
- Hardened API error pipeline in `apps/web/lib/server/errors.ts`:
|
|
- Added `RATE_LIMITED` mapping (`429`).
|
|
- Added `request_id` alias in structured error body.
|
|
- Extended sensitive-key redaction for invite code keys.
|
|
- Removed stray debug `console.log()` call.
|
|
- Fixed invite-code leak risk in `apps/web/lib/server/groups.ts` by sending only `inviteCodeLast4` in error context.
|
|
- Standardized API response envelope updates in auth and route handlers so responses include both `requestId` and `request_id` while preserving backward compatibility.
|
|
- Added health probe routes:
|
|
- `apps/web/app/api/health/live/route.ts`
|
|
- `apps/web/app/api/health/ready/route.ts`
|
|
- Added security header baseline in `apps/web/next.config.mjs` (CSP, frame/referrer/content-type hardening).
|
|
- Added CI/CD workflow for Dokploy-triggered deploys:
|
|
- `.gitea/workflows/deploy-dokploy.yml`
|
|
- Added self-host edge + observability + backup artifacts:
|
|
- `docker/nginx/fiddy.conf`
|
|
- `docker/nginx/includes/fiddy-proxy.conf`
|
|
- `docker/observability/docker-compose.observability.yml`
|
|
- `docker/observability/loki-config.yml`
|
|
- `docker/observability/promtail-config.yml`
|
|
- `scripts/backup-postgres.sh`
|
|
- `scripts/restore-postgres.sh`
|
|
- `docs/public-launch-runbook.md`
|
|
- Added regression tests for request-id contract and limiter behavior:
|
|
- `apps/web/__tests__/errors-response.test.ts`
|
|
- `apps/web/__tests__/rate-limit.test.ts`
|
|
- Verification results:
|
|
- `npm test`: pass (`25 passed`, `1 skipped`).
|
|
- `npm run build`: pass.
|
|
- `npm run lint`: still fails due existing workspace lint script invocation issue (`next lint` resolves `apps/web/lint` path).
|
|
- Post-verification fixups:
|
|
- Added table auto-bootstrap fallback in `apps/web/lib/server/rate-limit.ts` to avoid failures in environments where migration `007_rate_limits.sql` has not been applied yet.
|
|
- Corrected `Entry` mapping consistency in `apps/web/lib/server/entries.ts` (`bucketId` included in all return shapes).
|
|
- Replaced `rowCount` checks with `rows.length` in typed query paths to satisfy current TypeScript/`pg` typings.
|
|
- Implementation correction note:
|
|
- A batch replacement briefly introduced invalid destructuring (`request_id` in `getRequestMeta` destructure). This was corrected in all affected routes before final verification.
|
|
- Created path-scoped commit for this hardening slice:
|
|
- `b1c8a4a` — `harden public launch api contracts and ops baseline`.
|
|
- Resolved lint pipeline breakage caused by `next lint` invocation under Next.js `16.1.6`:
|
|
- Switched `apps/web/package.json` lint script to `eslint . && node scripts/check-no-group-id-routes.cjs`.
|
|
- Added `apps/web/eslint.config.mjs` using `eslint-config-next/core-web-vitals`.
|
|
- Disabled newly surfaced React compiler-style hook rules to preserve prior lint parity without broad unrelated refactors:
|
|
- `react-hooks/error-boundaries`
|
|
- `react-hooks/immutability`
|
|
- `react-hooks/purity`
|
|
- `react-hooks/set-state-in-effect`
|
|
- Fixed a real hook-order violation in `apps/web/components/group-settings-content.tsx` by moving the keyboard-listener `useEffect` above the early `return`.
|
|
- Note: this file currently contains broader pre-existing local edits; to avoid bundling unrelated work, the hook-order adjustment is left as a workspace-local change and was not included in path-scoped commit `1f140b6`.
|
|
- Removed forbidden legacy dynamic route path under `app/groups` to satisfy repo policy script:
|
|
- Deleted `apps/web/app/groups/[id]/settings/page.tsx`.
|
|
- Removed empty directory `apps/web/app/groups/[id]`.
|
|
- Re-ran verification after lint fixes:
|
|
- `npm test`: pass (`25 passed`, `1 skipped`).
|
|
- `npm run build`: pass.
|
|
- `npm run lint`: pass (warnings only; no errors).
|
|
- Added request metadata hardening in `apps/web/lib/server/request.ts`:
|
|
- Use upstream `x-request-id` when present (fallback to generated ID).
|
|
- Parse first hop from forwarded IP headers and cap length.
|
|
- Added rate-limit hardening in `apps/web/lib/server/rate-limit.ts`:
|
|
- Sanitize and bound key segments to reduce key-space abuse.
|
|
- Hash oversized segments with SHA-256.
|
|
- Add opportunistic stale-row cleanup (older than 2 days) every 10 minutes per process.
|
|
- Added production-safe structured API error logging in `apps/web/lib/server/errors.ts`:
|
|
- Always emit `API_ERROR` with `requestId`, route, status, and sanitized context.
|
|
- Keep optional debug response context redacted before returning to clients.
|
|
- Added proxy request-id propagation:
|
|
- `docker/nginx/includes/fiddy-proxy.conf` now forwards `X-Request-Id`.
|
|
- `docker/nginx/fiddy.conf` now returns `X-Request-Id` response header.
|
|
- Re-validated after this hardening slice:
|
|
- `npm run lint`: pass (warnings only; no errors).
|
|
- `npm test`: pass (`25 passed`, `1 skipped`).
|
|
- `npm run build`: pass.
|
|
- Added request-id input sanitization in `apps/web/lib/server/request.ts` to prevent malformed inbound ID propagation.
|
|
- Added nginx hardening/observability updates in `docker/nginx/fiddy.conf`:
|
|
- JSON access log format with request/upstream latency fields.
|
|
- `server_tokens off`, explicit access/error logs, and connection cap.
|
|
- Added nginx log parsing pipeline in `docker/observability/promtail-config.yml` for Loki ingestion (`job="nginx"`).
|
|
- Rewrote `docs/public-launch-runbook.md` in clean ASCII and expanded proxy/observability checks.
|
|
- Added `docs/06_SECURITY_REVIEW.md` with app/data/user/host findings and launch checklist.
|
|
- Added per-IP abuse controls for invite/join surfaces:
|
|
- `apps/web/lib/server/rate-limit.ts` adds `enforceIpRateLimit`.
|
|
- Applied in `apps/web/app/api/groups/join/route.ts`.
|
|
- Applied in `apps/web/app/api/invite-links/[token]/route.ts` (`GET` + `POST`).
|
|
- Added regression coverage for IP limiter in `apps/web/__tests__/rate-limit.test.ts`.
|
|
- Added operational smoke tooling for deploy/rollback validation:
|
|
- `scripts/smoke-public-launch.sh` checks health endpoints, `X-Request-Id`, and `request_id` response fields.
|
|
- Expanded `docs/public-launch-runbook.md` with deployment smoke and rollback checklist sections.
|
|
- Added host hardening and DR drill scripts:
|
|
- `scripts/harden-host-ufw.sh` (UFW baseline with dry-run default).
|
|
- `scripts/check-host-security.sh` (ports/firewall/fail2ban/docker status snapshot).
|
|
- `scripts/restore-drill-postgres.sh` (restore + validation query workflow).
|
|
- Updated docs to use executable operational checks in:
|
|
- `docs/public-launch-runbook.md`
|
|
- `docs/06_SECURITY_REVIEW.md`
|
|
- Added deploy health gate automation:
|
|
- `scripts/wait-for-health.sh` polls ready endpoint and verifies `request_id` payload.
|
|
- `.gitea/workflows/deploy-dokploy.yml` now runs post-deploy health verification using `DOKPLOY_HEALTHCHECK_URL`.
|
|
|
|
### Risks / Notes to Revisit
|
|
- Workspace is intentionally dirty; commits must be path-scoped to avoid mixing unrelated changes.
|
|
- This Codex session currently cannot write to `.git` (index lock permission denied), so local user-side commits are required for newly staged changes.
|