10 KiB
10 KiB
Refactor 2: Public Launch Hardening
Purpose Overview
This refactor prepares Fiddy for public exposure without changing the core stack (Next.js + external Postgres). The goal is to harden API contracts, improve abuse resistance, tighten security posture, and make deployment/operations repeatable for self-hosted production.
Primary outcomes:
- Keep architecture stable and avoid risky stack rewrite.
- Standardize API response metadata (
request_id) for traceability. - Add layered rate limiting (app + proxy plan) for auth and write paths.
- Remove sensitive logging risk (no full invite codes, no secrets).
- Add health probes and deployment/ops reference artifacts.
- Document rollout, rollback, backup, and monitoring runbooks.
Scope Checklist
- Phase 1: App security + API contract hardening
- Phase 2: Dokploy deployment workflow artifacts
- Phase 3: Nginx + host hardening artifacts
- Phase 4: Observability stack references/configs
- Phase 5: Backup + restore process and scripts/docs
- Verification pass (tests/build/lint where possible)
Running Implementation Log
2026-02-14
- Started
Refactor 2execution from current in-progress workspace baseline. - Confirmed this repo already contains broad uncommitted changes unrelated to this phase; implementation will be done via targeted edits only.
- Established this document as the source log/checklist for all noteworthy decisions, tradeoffs, blockers, and completed steps.
- Added migration
packages/db/migrations/007_rate_limits.sqlfor server-side rate limit state. - Added server limiter
apps/web/lib/server/rate-limit.tsand wired write-path guardrails in server services (groups,group-members,group-invites,entries,buckets,tags,group-settings). - Hardened API error pipeline in
apps/web/lib/server/errors.ts:- Added
RATE_LIMITEDmapping (429). - Added
request_idalias in structured error body. - Extended sensitive-key redaction for invite code keys.
- Removed stray debug
console.log()call.
- Added
- Fixed invite-code leak risk in
apps/web/lib/server/groups.tsby sending onlyinviteCodeLast4in error context. - Standardized API response envelope updates in auth and route handlers so responses include both
requestIdandrequest_idwhile preserving backward compatibility. - Added health probe routes:
apps/web/app/api/health/live/route.tsapps/web/app/api/health/ready/route.ts
- Added security header baseline in
apps/web/next.config.mjs(CSP, frame/referrer/content-type hardening). - Added CI/CD workflow for Dokploy-triggered deploys:
.gitea/workflows/deploy-dokploy.yml
- Added self-host edge + observability + backup artifacts:
docker/nginx/fiddy.confdocker/nginx/includes/fiddy-proxy.confdocker/observability/docker-compose.observability.ymldocker/observability/loki-config.ymldocker/observability/promtail-config.ymlscripts/backup-postgres.shscripts/restore-postgres.shdocs/public-launch-runbook.md
- Added regression tests for request-id contract and limiter behavior:
apps/web/__tests__/errors-response.test.tsapps/web/__tests__/rate-limit.test.ts
- Verification results:
npm test: pass (25 passed,1 skipped).npm run build: pass.npm run lint: still fails due existing workspace lint script invocation issue (next lintresolvesapps/web/lintpath).
- Post-verification fixups:
- Added table auto-bootstrap fallback in
apps/web/lib/server/rate-limit.tsto avoid failures in environments where migration007_rate_limits.sqlhas not been applied yet. - Corrected
Entrymapping consistency inapps/web/lib/server/entries.ts(bucketIdincluded in all return shapes). - Replaced
rowCountchecks withrows.lengthin typed query paths to satisfy current TypeScript/pgtypings.
- Added table auto-bootstrap fallback in
- Implementation correction note:
- A batch replacement briefly introduced invalid destructuring (
request_idingetRequestMetadestructure). This was corrected in all affected routes before final verification.
- A batch replacement briefly introduced invalid destructuring (
- Created path-scoped commit for this hardening slice:
b1c8a4a—harden public launch api contracts and ops baseline.
- Resolved lint pipeline breakage caused by
next lintinvocation under Next.js16.1.6:- Switched
apps/web/package.jsonlint script toeslint . && node scripts/check-no-group-id-routes.cjs. - Added
apps/web/eslint.config.mjsusingeslint-config-next/core-web-vitals. - Disabled newly surfaced React compiler-style hook rules to preserve prior lint parity without broad unrelated refactors:
react-hooks/error-boundariesreact-hooks/immutabilityreact-hooks/purityreact-hooks/set-state-in-effect
- Switched
- Fixed a real hook-order violation in
apps/web/components/group-settings-content.tsxby moving the keyboard-listeneruseEffectabove the earlyreturn.- Note: this file currently contains broader pre-existing local edits; to avoid bundling unrelated work, the hook-order adjustment is left as a workspace-local change and was not included in path-scoped commit
1f140b6.
- Note: this file currently contains broader pre-existing local edits; to avoid bundling unrelated work, the hook-order adjustment is left as a workspace-local change and was not included in path-scoped commit
- Removed forbidden legacy dynamic route path under
app/groupsto satisfy repo policy script:- Deleted
apps/web/app/groups/[id]/settings/page.tsx. - Removed empty directory
apps/web/app/groups/[id].
- Deleted
- Re-ran verification after lint fixes:
npm test: pass (25 passed,1 skipped).npm run build: pass.npm run lint: pass (warnings only; no errors).
- Added request metadata hardening in
apps/web/lib/server/request.ts:- Use upstream
x-request-idwhen present (fallback to generated ID). - Parse first hop from forwarded IP headers and cap length.
- Use upstream
- Added rate-limit hardening in
apps/web/lib/server/rate-limit.ts:- Sanitize and bound key segments to reduce key-space abuse.
- Hash oversized segments with SHA-256.
- Add opportunistic stale-row cleanup (older than 2 days) every 10 minutes per process.
- Added production-safe structured API error logging in
apps/web/lib/server/errors.ts:- Always emit
API_ERRORwithrequestId, route, status, and sanitized context. - Keep optional debug response context redacted before returning to clients.
- Always emit
- Added proxy request-id propagation:
docker/nginx/includes/fiddy-proxy.confnow forwardsX-Request-Id.docker/nginx/fiddy.confnow returnsX-Request-Idresponse header.
- Re-validated after this hardening slice:
npm run lint: pass (warnings only; no errors).npm test: pass (25 passed,1 skipped).npm run build: pass.
- Added request-id input sanitization in
apps/web/lib/server/request.tsto prevent malformed inbound ID propagation. - Added nginx hardening/observability updates in
docker/nginx/fiddy.conf:- JSON access log format with request/upstream latency fields.
server_tokens off, explicit access/error logs, and connection cap.
- Added nginx log parsing pipeline in
docker/observability/promtail-config.ymlfor Loki ingestion (job="nginx"). - Rewrote
docs/public-launch-runbook.mdin clean ASCII and expanded proxy/observability checks. - Added
docs/06_SECURITY_REVIEW.mdwith app/data/user/host findings and launch checklist. - Added per-IP abuse controls for invite/join surfaces:
apps/web/lib/server/rate-limit.tsaddsenforceIpRateLimit.- Applied in
apps/web/app/api/groups/join/route.ts. - Applied in
apps/web/app/api/invite-links/[token]/route.ts(GET+POST).
- Added regression coverage for IP limiter in
apps/web/__tests__/rate-limit.test.ts. - Added operational smoke tooling for deploy/rollback validation:
scripts/smoke-public-launch.shchecks health endpoints,X-Request-Id, andrequest_idresponse fields.- Expanded
docs/public-launch-runbook.mdwith deployment smoke and rollback checklist sections.
- Added host hardening and DR drill scripts:
scripts/harden-host-ufw.sh(UFW baseline with dry-run default).scripts/check-host-security.sh(ports/firewall/fail2ban/docker status snapshot).scripts/restore-drill-postgres.sh(restore + validation query workflow).
- Updated docs to use executable operational checks in:
docs/public-launch-runbook.mddocs/06_SECURITY_REVIEW.md
- Added deploy health gate automation:
scripts/wait-for-health.shpolls ready endpoint and verifiesrequest_idpayload..gitea/workflows/deploy-dokploy.ymlnow runs post-deploy health verification usingDOKPLOY_HEALTHCHECK_URL.
- Added DR + host-ban operational templates:
scripts/basebackup-postgres.shfor periodicpg_basebackupsnapshots.docker/security/fail2ban/*for auth/join/invite abuse bans from nginx JSON logs.docker/security/crowdsec/acquis.yamlas optional CrowdSec ingestion baseline.docker/security/README.mdto document security template usage.
- Added restore drill logging artifacts:
docs/restore-drill-log.csvas evidence log template.scripts/log-restore-drill.shto append timestamped restore outcomes and measured RTO.
- Added consolidated execution checklist:
docs/07_PUBLIC_LAUNCH_CHECKLIST.mdfor go-live gating across infra, deploy, security, observability, DR, and rollback.
- Clarified deployment assumption:
- use existing external Nginx edge;
docker/nginx/*is now documented as template/reference config, not a required new Nginx runtime.
- use existing external Nginx edge;
- Added Nginx Proxy Manager setup pack:
docs/08_NGINX_PROXY_MANAGER_SETUP.md(UI + SSH fallback instructions).docker/nginx/npm/*.examplesnippets for host advanced config, location limits, and http-level zones.
- Adjusted NPM guidance based on real NPM behavior:
- moved header directives to custom location snippets (especially
/) because Proxy Host Advanced does not reliably applyadd_header/proxy_set_header.
- moved header directives to custom location snippets (especially
- Added staged operator playbook:
docs/09_DEPLOYMENT_EXECUTION_PLAYBOOK.mdto separate no-touch prep from hands-on infra steps.
- Deferred (tracked) for later session:
- host-specific NPM tailoring for exact domain/upstream/custom-location layout.
- Added NPM execution runsheet:
docs/10_NPM_HANDS_ON_RUNSHEET.mdwith strict run order and verification checkpoints.
- Added first-time Dokploy onboarding guide:
docs/11_DOKPLOY_FIRST_TIME_WALKTHROUGH.mdwith install, app wiring, webhook, and rollback test flow.
Risks / Notes to Revisit
- Workspace is intentionally dirty; commits must be path-scoped to avoid mixing unrelated changes.
- This Codex session currently cannot write to
.git(index lock permission denied), so local user-side commits are required for newly staged changes.