Commit Graph

135 Commits

Author SHA1 Message Date
220b28f0c4 haproxy: use req.hdr_ip for real-IP resolution (string-IP crashed Coraza SPOA)
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 55s
2026-05-14 08:57:05 -07:00
9770398ab0 coraza: pass var(txn.real_ip) instead of src to Coraza (real client IP in WAF logs)
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 55s
2026-05-14 08:52:01 -07:00
633d9390f2 coraza: pin go.mod to 1.23 (matches go mod tidy output; Dockerfile still uses 1.25 image)
All checks were successful
Build and push coraza-spoa / Build-and-Push (push) Successful in 42s
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 54s
2026-05-14 08:08:38 -07:00
6d43308073 coraza: pre-CRS Include for runtime per-host exemptions (load-order fix)
All checks were successful
Build and push coraza-spoa / Build-and-Push (push) Successful in 41s
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 54s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 07:55:51 -07:00
489290ed33 coraza: ship rules-catalog.json generated from bundled CRS at build time
All checks were successful
Build and push coraza-spoa / Build-and-Push (push) Successful in 44s
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 52s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 06:57:42 -07:00
b2adcdbed9 coraza: reserve rule-ID range 990000000-990999999 for WHP-generated rules 2026-05-14 06:53:37 -07:00
1f1bc1837e coraza: add second Include for runtime-managed local-overrides.conf 2026-05-14 06:51:24 -07:00
753743de20 coraza: drop 913xxx scanner-UA from enforce list (FP on Mastodon + SiteLock)
All checks were successful
Build and push coraza-spoa / Build-and-Push (push) Successful in 40s
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 54s
A 25h burn-in on whp01 (2026-05-13) found an ~11% FP rate on rule 913100:
ActivityPub federation pulls (Mastodon UA "...Bot" on hackerpublicradio.org
and blog.anti-social.online) and SiteLockSpider scans (a customer-paid
security service hitting greggfranklin.com + suchascream.net). The other
six promoted rule families (930120, 932100-160, 933170-200, 944100-300,
920440, 930130) showed zero FPs across the same window and stay enforced.

Detection-only still feeds the anomaly score, so demoting this family
costs essentially no real blocking value.
2026-05-13 19:13:22 -07:00
5e5234cb14 refactor(suspension): serve via /suspended route on default-backend, drop bk_suspended
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 53s
The previous design used a separate whp-suspended container (nginx:alpine
serving a static 503 page) reachable via a dedicated bk_suspended backend.
That was over-engineered — haproxy-manager-base already ships a default-app
Flask server on :8080 that serves /default-page and /blocked-ip via
path-rewrite ACLs. Mirroring that pattern lets the suspension page live
in the SAME container: no extra image to build, no extra container to
run or health-monitor.

Changes:
- Add /suspended Flask route on default_app returning 503 + suspended_page.html
- Add templates/suspended_page.html (dark-themed 503 page)
- hap_listener.tpl: 'http-request set-path /suspended' + 'use_backend
  default-backend' when host is in suspended_domains.list (same pattern
  as is_blocked_ip)
- Rename env var from HAPROXY_SUSPENSION_BACKEND (a target hostport) to
  HAPROXY_SUSPENSION_ENABLED (a bool); accepts 1/true/yes/on (case-insensitive)
- Remove hap_suspended_backend.tpl and its rendering in generate_config
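
A minimal sketch of the new route (handler shape assumed; the real
default_app code may differ in detail):

    from flask import render_template

    @default_app.route('/suspended')
    def suspended_page():
        # 503 so crawlers treat the suspension as temporary
        return render_template('suspended_page.html'), 503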

Non-WHP deployments (env var unset) see byte-identical haproxy.cfg as before
(verified via jinja2 render diff).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:08:45 -07:00
6fd07b4c54 fix(suspended): tolerate startup DNS failure + use docker_dns resolvers
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 52s
If the upstream container isn't up when haproxy-manager starts (e.g. when
haproxy is recreated before whp-suspended), the default `init-addr libc` mode
makes haproxy refuse to start — taking down the whole proxy. Switched to
`init-addr last,none` (use last known address, fall back to 0.0.0.0 = DOWN)
and added `resolvers docker_dns` (defined in hap_header.tpl) so the real IP
is picked up once DNS becomes resolvable.
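
Illustrative server line after this change (server name assumed):

    server whp-suspended whp-suspended:80 check resolvers docker_dns init-addr last,none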

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:52:50 -07:00
2ef582a3de feat(suspension): opt-in routing for suspended hosts via bk_suspended backend
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 56s
Adds a new env var HAPROXY_SUSPENSION_BACKEND (default unset). When set
(e.g. "whp-suspended:80"), generate_config() renders:
- A bk_suspended backend pointing at the configured upstream
- An ACL `acl is_suspended_domain hdr(host),lower -f /etc/haproxy/suspended_domains.list`
  + `use_backend bk_suspended if is_suspended_domain` in the frontend,
  sitting after IP-blocking and before any per-domain routing
- An empty /etc/haproxy/suspended_domains.list if missing (haproxy refuses
  to start with -f pointing at a non-existent file)
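
In template terms the frontend insertion is presumably gated like this
(Jinja2 variable name assumed):

    {%- if suspension_backend %}
    acl is_suspended_domain hdr(host),lower -f /etc/haproxy/suspended_domains.list
    use_backend bk_suspended if is_suspended_domain
    {%- endif %}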

External tooling (e.g. WHP's site_disable.php) maintains the list via
`docker cp` and HUP-reloads the container.

Non-WHP deployments (home networks, standalone use) leave the env var
unset and see byte-identical haproxy.cfg output. Same opt-in shape as
the existing HAPROXY_CORAZA_SPOE_BACKEND integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:46:18 -07:00
3572c66fb7 coraza: promote 920440 + 930130 to enforce list (empirical detect-only data)
All checks were successful
Build and push coraza-spoa / Build-and-Push (push) Successful in 1m17s
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 53s
After ~30 min of detect-only on whp01 we have actionable data on what
fires against legitimate customer traffic vs. attacker recon. Two rules
demonstrably catch only the latter and earn promotion to the day-one
enforce list:

  920440 — URL file extension restricted by policy
    Caught 124 events in the sample window, ALL backup/config-file
    disclosure probes (`/wp-config.php.old`, `/db_backup.sql`,
    `/.env.save`, `/releases.sql` ...) from a single GCP-hosted scanner
    hammering joshuaknapp.net. Match patterns: .sql (×62), .bak (×5),
    .old (×3), .save (×2), .backup, .dist. No legitimate URL on
    WP/WooCommerce/Divi/HPR ends in these.

  930130 — Restricted File Access Attempt
    Caught 117 events, ALL dotfile/VCS/config-disclosure probes
    (`/.env`, `/.env.local`, `/.env.bak`, `/.git/config`, `/config.php`,
    `/admin/.env`, `/backend/.env` ...). Spread across joshuaknapp.net,
    cgdannyb.com, onlinesupplements.net. Notably, HPR's
    `/ccdn.php?filename=/eps/...` legitimate audio-delivery URL does NOT
    trigger this rule — verified empirically.

Also documented in the "intentionally detect-only" comment block: 933150
fires on WooCommerce checkout when literal `session_start` appears in
billing form data (alphaoneaminos.com saw 2 such events). That's a
canonical CRS false positive on WooCommerce; left detect-only.

Net effect: existing detect_only deployments stay detect-only (the WHP
apply script bind-mounts an empty overrides over the baked-in file).
When operators next flip a server to enforce, these two extra ranges
activate alongside the original day-one list.
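
One plausible shape for the promotion entries in overrides.conf
(SecRuleUpdateActionById is standard Seclang; whether our overrides use
exactly this directive is an assumption):

    # promote from DetectionOnly to blocking
    SecRuleUpdateActionById 920440 "deny,status:403"
    SecRuleUpdateActionById 930130 "deny,status:403"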

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 18:00:21 -07:00
ba4c101135 fix(coraza): add deny rules that act on Coraza's verdict + spop-check on backend
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 55s
Two fixes that complete the SPOE enforcement path:

1. Listener was sending requests to Coraza for inspection but never reading
   the result. Coraza-SPOA sets var(txn.coraza.action) to "deny" / "drop"
   / "redirect" when a rule with that disruptive action fires; HAProxy
   needs explicit rules that READ the variable and apply the action.
   Without them, the audit log shows "Access denied" but the request
   still gets HTTP 200 (verified on staging: sqlmap/JNDI/shellinj all
   detected, all returned 200).

   Added the standard six rules from upstream's example/haproxy/haproxy.cfg
   covering http-request + http-response phases for each of deny/drop/
   redirect. Same set the upstream Coraza-SPOA docs recommend.

   Intentionally did NOT add the upstream's fail-CLOSED rule
   `http-request deny deny_status 500 if { var(txn.coraza.error) -m int gt 0 }`
   — for a hosting platform we want fail-open. Documented inline.

2. Backend health check switched from plain TCP `check` to `option
   spop-check`. The spop-check actually negotiates a SPOE session against
   the agent, so HAProxy detects a half-broken SPOA that's listening on
   :9000 but failing protocol handshakes. Plain `check` would miss that.
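
The request-phase half of those verdict rules, approximately as upstream
ships them (status code and redirect fetch are assumptions; the
http-response counterparts mirror these):

    http-request deny deny_status 403 if { var(txn.coraza.action) -m str deny }
    http-request silent-drop if { var(txn.coraza.action) -m str drop }
    http-request redirect code 302 location %[var(txn.coraza.data)] if { var(txn.coraza.action) -m str redirect }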

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:16:03 -07:00
f1e9bb2c63 fix(coraza-spoe): match upstream's required spoe shape (groups, arg order, names)
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m18s
Three real bugs in the SPOE config caught when HAProxy validated the
generated file:

1. spoe-agent must declare `groups` not `messages`. The `messages` form
   doesn't make the message reachable via `send-spoe-group`; HAProxy
   complained:
     unable to find SPOE group 'coraza-check' into SPOE engine 'coraza'

2. send-spoe-group references a spoe-GROUP name, which needs its own
   block. Added `spoe-group coraza-req { messages coraza-req }` as
   the indirection layer.

3. Arg names + ORDER are required to match what Coraza-SPOA parses
   positionally. My version had `dest-ip`/`dest-port`; upstream's
   example/haproxy/coraza.cfg (v0.7.1) uses `dst-ip`/`dst-port`.
   Renamed and reordered to match upstream verbatim, including the
   `app=str(haproxy)` literal that matches our config.yaml application
   name.

Also corrected misleading comment about `set-on-error continue`: that
option actually sets a variable on error; the fail-open behavior comes
from us deliberately NOT adding a `http-request deny if errored` rule
in the frontend. Renamed the variable to `error` (matching upstream)
and updated comments to be accurate.

Listener template's send-spoe-group action updated to reference the
new group name `coraza-req`.
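
Net shape of the corrected SPOE config (reconstructed from the three
fixes above; agent options, arg order past app=, and the full arg list
are elided or assumed):

    spoe-agent coraza
        groups coraza-req
        ...

    spoe-group coraza-req
        messages coraza-req

    spoe-message coraza-req
        args app=str(haproxy) src-ip=src src-port=src_port dst-ip=dst dst-port=dst_port ...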

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:12:09 -07:00
061309675b fix(coraza-spoe): collapse args to one line + ensure trailing LF on spoe cfg
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 2m11s
Two HAProxy parse errors caught in staging functional test:

1. coraza-spoe.cfg:39 'args': missing fetch method
   The args directive had backslash line continuations. HAProxy doesn't
   support those in SPOE configs — args must be one physical line.
   Collapsed to a single line.

2. coraza-spoe.cfg:50 Missing LF on last line
   Same trailing-LF issue we hit on haproxy.cfg one commit ago. The
   Jinja2 template ends with content rather than a newline, and write()
   doesn't add one. Belt-and-suspenders: explicitly append '\n' before
   writing if not already there.
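
A minimal sketch of the belt-and-suspenders write from item 2 (function
shape assumed):

    rendered = template.render(**ctx)
    if not rendered.endswith('\n'):
        rendered += '\n'          # HAProxy requires a trailing LF
    with open(spoe_cfg_path, 'w') as f:
        f.write(rendered)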

After this commit HAProxy validates the generated config cleanly. Will
verify on staging now (combined SPOE injection + fail-open + active
attack-detection tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:07:12 -07:00
4769f67fe9 fix(coraza): ensure haproxy.cfg ends with LF when SPOE backend appended
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 54s
The SPOE backend block from hap_coraza_spoa_backend.tpl was being appended
last to config_parts. The template's render output doesn't end with a
newline (and config_parts is joined with '\n' BETWEEN elements, not after
the last one), so the resulting haproxy.cfg ended on `server coraza-spoa
...` with no trailing LF. HAProxy refuses to parse such files:

    [ALERT] config: parsing [/etc/haproxy/haproxy.cfg:288]: Missing LF
    on last line, file might have been truncated at position 70.

Match the existing pattern at the previous-last config_parts.append
(line 1850 uses `'\n'.join(config_backends) + '\n'`) and add an explicit
'\n' on the coraza block append.
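
For reference, str.join only separates elements; it never terminates
the last one:

    >>> '\n'.join(['frontend ...', 'backend ...'])
    'frontend ...\nbackend ...'    # no trailing LF after the final element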

Caught immediately on staging: HTTP 000 to localhost:80 because HAProxy
never started; gunicorn/management API kept serving on :8000 fine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:03:56 -07:00
3e1f9dda2b fix(template): strip Jinja2 whitespace so no-env-var listener is byte-identical
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m23s
Default Jinja2 {% if %}{% endif %} block syntax leaves a trailing newline
even when the conditional doesn't render. Staging verification of PR 2
showed the resulting haproxy.cfg differed from the pre-PR2 version by
exactly 1 blank line — semantically identical but not byte-identical,
which violates the design promise that haproxy-manager-base's default
output stays unchanged for home/standalone deployments.

Use {%- if -%}/{%- endif %} (the whitespace-stripping variants) so the
block contributes zero bytes when coraza_spoe_backend is unset.
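
A minimal repro of the difference (illustrative, not the actual template):

    >>> from jinja2 import Template
    >>> Template('a\n{% if x %}\nb\n{% endif %}\n').render(x=False)
    'a\n\n'
    >>> Template('a\n{%- if x -%}\nb\n{%- endif %}').render(x=False)
    'a'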

Verified locally: without env var = 55 lines, ends cleanly on the
is_blocked_ip rule. With env var = 62 lines, +7 for the SPOE block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:59:40 -07:00
73b9104565 PR 2/3: opt-in SPOE integration for Coraza WAF
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m59s
Adds the plumbing that lets haproxy-manager talk to the coraza-spoa sidecar
added in PR 1, while keeping the default behavior bit-identical for any
deployment that doesn't set the new env var (the home network / standalone
use cases).

Single gate: HAPROXY_CORAZA_SPOE_BACKEND env var on the haproxy-manager
container. Unset (default) = generate_config() renders zero SPOE-related
output. Set (e.g. "coraza-spoa:9000") = three things happen at config
generation time:

  1. hap_listener.tpl injects 5 lines at the end of the frontend block:
       filter spoe engine coraza config /etc/haproxy/coraza-spoe.cfg
       http-request send-spoe-group coraza coraza-check
     ...placed AFTER rate-limit and IP-block guards so we don't waste WAF
     calls on requests we were going to drop anyway.

  2. A new TCP backend (hap_coraza_spoa_backend.tpl) is appended:
       backend coraza-spoa-backend
           mode tcp
           server coraza-spoa <env-var-target> check ...

  3. The SPOE engine config (hap_coraza_spoe_engine.tpl) is rendered and
     written to /etc/haproxy/coraza-spoe.cfg, defining the spoe-agent
     "coraza" + spoe-message "coraza-check". This sets:
       - option set-on-error continue   (FAIL-OPEN if SPOA is unreachable)
       - timeout processing 100ms       (per-request inspection budget)
       - app=str(haproxy)               (matches sidecar's application name)
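
The gate in generate_config() presumably reduces to something like this
(render()/write_file() are stand-ins for the real template helpers):

    coraza_spoe_backend = os.environ.get('HAPROXY_CORAZA_SPOE_BACKEND')
    if coraza_spoe_backend:
        # stand-in helpers; real code renders Jinja2 templates to these targets
        config_parts.append(render('hap_coraza_spoa_backend.tpl',
                                   agent_target=coraza_spoe_backend))
        write_file('/etc/haproxy/coraza-spoe.cfg',
                   render('hap_coraza_spoe_engine.tpl',
                          agent_target=coraza_spoe_backend))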

Verification (template render only, before staging deploy):
  - hap_listener.tpl with no env var: 55 lines, zero SPOE references
  - hap_listener.tpl with env var:    62 lines, filter + send-spoe-group present
  - Engine cfg + backend block render with correct agent_target substitution

Next: PR 3 wires this into WHP (sidecar deploy via container-manager.sh
extension, server-settings UI for on/off, AI Monitor source for the audit
log). Staging verification of PR 1 + PR 2 together happens after PR 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:49:29 -07:00
4e0c22e9c9 ci: mirror golang:1.25 alongside python:3.12-slim, switch coraza-spoa FROM
All checks were successful
Build and push coraza-spoa / Build-and-Push (push) Successful in 1m16s
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m18s
Cloudflare's bot-management incident on 2026-05-12 took out docker.io blob
pulls twice in one day — first for python:3.12-slim (mirrored in 5a2ebf9),
then again for golang:1.25 when the PR 1 coraza-spoa build hit the same
R2-via-Cloudflare failure on the build stage's base image.

Restructure .gitea/workflows/mirror-base-image.yaml into a matrix that
iterates over a list of (src, dst_path, tag) entries. Adding a new base
image is now a one-line matrix entry. fail-fast: false so one image's
upstream being down doesn't block refreshing the others.
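
The matrix shape, roughly (field names per the commit; exact YAML keys
are an assumption):

    strategy:
      fail-fast: false
      matrix:
        include:
          - { src: python:3.12-slim, dst_path: cloud-hosting-platform/python, tag: 3.12-slim }
          - { src: golang:1.25,      dst_path: cloud-hosting-platform/golang, tag: "1.25" }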

Switch coraza-spoa/Dockerfile's build stage FROM to the in-house golang
mirror. Runtime FROM (gcr.io/distroless/static-debian12:nonroot) stays
on upstream — distroless is on Google's registry, separate from Docker
Hub's Cloudflare R2 setup, and didn't fail during today's incident.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:40:42 -07:00
e4c506bcd9 PR 1/3: add coraza-spoa sidecar image
Some checks failed
Build and push coraza-spoa / Build-and-Push (push) Failing after 24s
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 55s
Self-contained sidecar that runs Coraza-SPOA v0.7.1 (latest upstream as of
2026-05-08, with OWASP CRS bundled in the binary). HAProxy will consult it
per-request via SPOE in PR 2; for now this PR ships the image only.

Defines:
- coraza-spoa/Dockerfile       — multi-stage build (golang:1.25 -> distroless),
                                 pinned to v0.7.1, ARG-overridable
- coraza-spoa/config.yaml      — single application "haproxy", JSON audit log
                                 to /var/log/coraza/audit.log, SecRuleEngine
                                 DetectionOnly globally
- coraza-spoa/overrides.conf   — day-one enforce list: scanner UAs (913xxx),
                                 RCE shell injection (932100-932160),
                                 webshell paths (933170-933200), targeted LFI
                                 (930120), Log4Shell/JNDI (944100-944300).
                                 Rationale per-range documented inline.
                                 Detect-only for XSS/SQLi/protocol (high FP
                                 on WP/WooCommerce/Divi customer mix).
- coraza-spoa/README.md        — deployment shape, audit log location, pin
                                 upgrade procedure, false-positive tuning.
- .gitea/workflows/build-push-coraza.yaml — Gitea Action triggered on
                                 coraza-spoa/** changes, publishes
                                 repo.anhonesthost.net/cloud-hosting-platform/
                                 coraza-spoa:latest. Path-scoped so it
                                 doesn't fire on every haproxy-manager push.
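
Build-stage shape of the Dockerfile (sketch; checkout/build steps and
the binary path are assumptions):

    FROM golang:1.25 AS build
    ARG CORAZA_SPOA_VERSION=v0.7.1
    # fetch + build coraza-spoa at the pinned tag
    ...
    FROM gcr.io/distroless/static-debian12:nonroot
    COPY --from=build /coraza-spoa /coraza-spoa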

No changes to haproxy-manager-base itself in this PR — the existing image
stays bit-identical, used standalone in home networks and other projects
without dependency on this sidecar. PR 2 will add the OPT-IN template
plumbing that lets haproxy-manager call out to this agent when an env var
is set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:28:44 -07:00
55670daf5b ci: add weekly Gitea Action to mirror python:3.12-slim into in-house registry
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m16s
Companion to the Dockerfile change in 5a2ebf9. The previous manual refresh
note in the Dockerfile becomes automated: a workflow_dispatch + weekly cron
that pulls python:3.12-slim from docker.io and re-pushes it to
repo.anhonesthost.net/cloud-hosting-platform/python:3.12-slim.

Workflow can also be triggered manually from the Gitea UI when Python
publishes patches between cron firings. Logs the upstream and mirror digests
so it's easy to verify "did the mirror really update" after a run.
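
Trigger block, roughly (the cron slot itself is an assumption):

    on:
      schedule:
        - cron: '0 6 * * 1'   # weekly
      workflow_dispatch: {}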

If more base images need mirroring later (haproxy itself, alpine, etc.),
this workflow should be promoted to a matrix or moved to a dedicated infra
repo — keeping it co-located with haproxy-manager-base for now since it's
the only consumer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:18:32 -07:00
5a2ebf991c ci: mirror python:3.12-slim into in-house registry
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m58s
docker.io serves image blobs from Cloudflare R2. The 2026-05-12 Cloudflare
incident took out blob pulls for hours and broke this image's Gitea CI
build mid-way through the haproxy-manager gunicorn migration (commit
bdd7d2f). With the base image mirrored at repo.anhonesthost.net,
CI builds no longer depend on docker.io reachability.

Refresh procedure documented in the Dockerfile comment block. Manual
re-push monthly or when Python patches drop. A future Gitea Action could
automate the pull-tag-push so we always have a current base.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:08:44 -07:00
bdd7d2f098 swap werkzeug dev server for gunicorn + accept all HTTP methods on default/blocked pages
Some checks failed
HAProxy Manager Build and Push / Build-and-Push (push) Failing after 13s
Two related fixes for the issues the AI Monitor surfaced on whp01 on
2026-05-12 (haproxy-manager going "healthy but stalled" after long
uptime, and noise from POST /blocked-ip returning 405):

1. Production WSGI server. The Flask app was running on werkzeug's
   built-in dev server (the one that prints "WARNING: This is a
   development server" on every startup). werkzeug is single-threaded
   and accumulates worker state over long uptimes; after ~24h on whp01
   the health endpoint stops responding while the container still
   reports "healthy" because Docker's HEALTHCHECK uses an HTTP probe
   from inside the same werkzeug process that's stalled.

   Replace with gunicorn (gthread worker class, --max-requests=1000
   with jitter so workers recycle periodically). Two gunicorn instances,
   one per Flask app — port 8000 for the management API, port 8080 for
   the default/blocked-ip page server. Both lift their app objects from
   the haproxy_manager module so gunicorn can import them.

   Required structural change: default_app was created INSIDE the
   __name__ == '__main__' block at module bottom, where gunicorn could
   never reach it. Moved to module level. The __main__ block now stays
   only for `python haproxy_manager.py` local-dev workflow.

   Container init (init_db, certbot register, generate_config,
   start_haproxy) extracted into a do_initial_setup() function called
   from a new scripts/init.py. start-up.sh runs init.py to completion
   before either gunicorn binds, which keeps HAProxy startup off the
   WSGI workers' fork paths (no race between workers all trying to
   start_haproxy() at once).

2. /blocked-ip and / accept ALL methods. HAProxy proxies blocked-IP
   traffic to default_app preserving the original verb, so a blocked
   POST request used to hit Flask's GET-only route and draw a 405,
   which the AI Monitor then flagged as noise. Adding the full method
   list lets the 403 page render regardless of verb.

Gunicorn settings tunable via env (workers, timeout, max-requests).
API gets --timeout 120 because ACME cert issuance can be slow; the
default page server stays on the gunicorn default 30s.
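
Roughly what the API server's invocation looks like now (worker/thread
counts and the app object name are assumptions):

    gunicorn --worker-class gthread --threads 4 \
        --max-requests 1000 --max-requests-jitter 100 \
        --timeout 120 --bind 0.0.0.0:8000 \
        haproxy_manager:app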

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:24:28 -07:00
8a86beac73 feat: clear stale certbot lock files before each ACME run + at startup
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 53s
certbot uses fasteners (fcntl-based locking) to serialize concurrent
invocations. The kernel auto-releases fcntl locks when the holding
process exits, but the .certbot.lock FILES persist on disk — and we've
seen real cases where subsequent runs report "Another instance of
Certbot is already running" even when no certbot process is alive.
Observed during the 2026-05-09 bundling rollout when a hung worker
held a lock across container-internal Python crashes.

This is high-impact when it blocks SSL issuance for a customer site:
a stale certbot lock sits there until somebody manually deletes it.

clear_stale_certbot_locks():
  - probes each known lock path with fcntl.LOCK_NB
  - if the lock is unheld → file is stale → delete it
  - if the lock IS held → leave it alone (real certbot is running)
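
A minimal sketch of that probe-then-delete (fcntl.lockf shown; the real
helper's details may differ):

    import fcntl, os

    def clear_stale_certbot_locks(lock_paths):
        for path in lock_paths:
            try:
                with open(path, 'r+') as f:
                    try:
                        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    except OSError:
                        continue        # held: real certbot is running
                os.unlink(path)         # unheld: stale, safe to delete
            except FileNotFoundError:
                pass                    # no lock file, nothing to do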

Wired in:
  - container startup (init block)
  - /api/ssl single-domain handler
  - /api/ssl/bundle handler
  - /api/certificates/renew handler

Safe to call repeatedly; never deletes a lock a real process holds, so
can never trigger concurrent certbot runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 12:09:19 -07:00
f7ef34b988 feat(api/ssl/bundle): clean up superseded lineages after issuance
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 53s
The bundle endpoint correctly issued multi-SAN certs but left old
single-SAN .pem files (e.g. <name>-0001.pem) in /etc/haproxy/certs/.
HAProxy's `bind ... ssl crt /etc/haproxy/certs` loads everything in the
directory and picked the alphabetically-first matching file — typically
the older single-SAN one — so the new bundle had no effect on what was
served. Repro on peptidesaver.net: bundle covered 4 SANs but HAProxy
kept serving peptidesaver.net-0001.pem (single SAN, April-issued).

After a successful bundle write, walk SSL_CERTS_DIR and remove any
.pem whose CN is in the new bundle's name list (excluding the bundle's
own combined file). Drop the matching certbot lineage with
`certbot delete --cert-name <X> -n` so `certbot renew` stops touching
the dead lineage too.

Returns a `cleanup` summary in the API response so callers can log /
display what was deleted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 11:58:21 -07:00
90255cc4b3 feat(api): add /api/ssl/bundle for per-site SAN cert issuance
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m1s
WHP's renewal orchestrator now bundles a site's domains into one cert
covering all SANs, instead of N separate single-domain orders. A single
ACME order behaves better under Let's Encrypt's new-order rate limit
(300 per account per 3 hours) when many domains need attention at once.

Endpoint: POST /api/ssl/bundle
Body: {"primary": "example.com", "sans": ["www.example.com", ...]}

- Uses --cert-name <primary> so the lineage stays stable across renewals
  (no -0001/-0002 proliferation seen with the legacy single-domain flow).
- Single combined .pem at /etc/haproxy/certs/<primary>.pem; HAProxy SNI-
  matches against the cert's SAN list, so one file serves all included
  hostnames.
- Updates the domains table for every SAN in the bundle.
- Hard cap at 100 SANs (LE limit).
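
Example call (host/port assumed; 8000 is the management API port):

    curl -X POST http://haproxy-manager:8000/api/ssl/bundle \
        -H 'Content-Type: application/json' \
        -d '{"primary": "example.com", "sans": ["www.example.com", "shop.example.com"]}'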

Existing /api/ssl single-domain endpoint kept for backwards compat.
The WHP haproxy_manager::bundleSSL() helper falls back to a per-domain
loop if /api/ssl/bundle returns 404, so the WHP side keeps working
during the rolling image upgrade window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 11:32:15 -07:00
b731feab12 Self-heal trusted IP whitelist files at startup
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 3m26s
Volume-mounted /etc/haproxy can shadow the image-baked
trusted_ips.list/trusted_ips.map, causing HAProxy to fail
config validation with "failed to open pattern file" on
non-WHP deployments. Touch empty files if they don't exist
so the ACLs always parse.
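
The self-heal itself is just a touch (paths per this commit):

    for path in ('/etc/haproxy/trusted_ips.list', '/etc/haproxy/trusted_ips.map'):
        if not os.path.exists(path):
            open(path, 'a').close()   # empty file is enough for the ACLs to parse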

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 10:02:16 -07:00
615044fa14 Fix resolvers block placement — must be outside global section
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 53s
The resolvers section was inserted inside the global section, causing
HAProxy to parse global directives (pidfile, maxconn, etc.) as
resolver keywords. Moved resolvers to its own top-level section
between global and defaults where HAProxy expects it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 05:18:48 -07:00
cf4eb5092c Add DNS resolver for automatic container IP re-resolution
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 54s
When Docker containers restart, they can get new IPs on the bridge
network. HAProxy caches DNS at config load time, so stale IPs cause
503s until config is regenerated.

Added a 'docker_dns' resolvers section pointing to Docker's embedded
DNS (127.0.0.11) with 10s hold time. Backend servers now use
'resolvers docker_dns init-addr last,libc,none' so HAProxy:
- Re-resolves container names every 10 seconds
- Falls back to last known IP if DNS is temporarily unavailable
- Starts even if a backend can't be resolved yet (init-addr none)

This eliminates 503s from container restarts, scaling, and recreation
without requiring a HAProxy config regeneration.
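
The section plus a typical server line (nameserver label and server
names illustrative):

    resolvers docker_dns
        nameserver dns1 127.0.0.11:53
        hold valid 10s

    server app1 my-container:8080 check resolvers docker_dns init-addr last,libc,none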

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 22:27:07 -07:00
ecf891ff02 Don't abort cert renewal when a single domain fails
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 2m11s
The renewal script was exiting immediately when certbot returned a
non-zero exit code, which happens when ANY cert fails to renew. A
single dead domain (e.g., DNS no longer pointed here) would block
ALL other certificates from being processed and combined for HAProxy.

Now logs the failures but continues to copy/combine successfully
renewed certificates and reload HAProxy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 15:17:15 -07:00
3da5df67d0 Update CLAUDE.md with HAProxy hardening and AI log monitor docs
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 2m36s
Documents HAProxy health checks, watchdog, rate limiting, trusted IP
whitelist, timeout hardening, HTTP/2 protection, and the AI-powered
log monitor system with two-tier analysis, auto-remediation, and
notification support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 08:16:44 -07:00
da40328438 Fix: remove comments from trusted IP files breaking HAProxy startup
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 54s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:19:29 -07:00
13a5be636e Raise rate limits further for media-heavy sites
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 53s
Generous thresholds that accommodate sites with many images/assets
while still catching obvious automated floods:
- Request rate: tarpit at 300 req/s, block at 500 req/s
- Connection rate: 500/10s
- Concurrent connections: 500
- Error rate: 100/30s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:12:24 -07:00
5390ebb8a6 Raise rate limit thresholds to avoid false positives on normal traffic
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m22s
Previous thresholds (200/500 req/10s) were too aggressive — WordPress
login pages with their CSS/JS/image assets can easily burst 30-50
requests per page load, triggering tarpits and blocks on legitimate
users.

New thresholds:
- Request rate: tarpit at 1000/10s (100 req/s), block at 2000/10s (200 req/s)
- Connection rate: 300/10s (was 150)
- Concurrent connections: 200 (was 100)
- Error rate: 50/30s (was 20)

These still catch real floods and scanners while giving normal web
traffic plenty of headroom.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:10:53 -07:00
53d259bd3f Add trusted IP whitelist for rate limit bypass
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m25s
Adds trusted_ips.list and trusted_ips.map files that exempt specific
IPs from all rate limiting rules. Supports both direct source IP
matching (is_trusted_ip) and proxy-header real IP matching
(is_whitelisted). Files are baked into the image and can be updated
by editing and rebuilding.
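
Illustrative shape of the bypass (the map-based is_whitelisted variant
matches a proxied real-IP header the same way; exact placement relative
to the rate-limit rules assumed):

    acl is_trusted_ip src -f /etc/haproxy/trusted_ips.list
    http-request allow if is_trusted_ip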

Adds phone system IP 172.116.197.166 to the whitelist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 13:39:41 -07:00
2ba8f87c2c Raise connection rate limit from 60 to 150 per 10s
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 56s
Gives more headroom for customers with code that makes frequent
callbacks to itself, while still catching connection floods.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 12:25:53 -07:00
a3b19ce352 Add rate limiting, connection limits, and timeout hardening
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m33s
Activate HAProxy's built-in attack prevention to stop floods that cause
the container to become unresponsive:

- Stick table tracks per-IP: conn_cur, conn_rate, http_req_rate, http_err_rate
- Rate limit rules: deny at 50 req/s, tarpit at 20 req/s, connection
  rate limit at 60/10s, concurrent connection cap at 100, error rate
  tarpit at 20 errors/30s
- Harden timeouts: http-request 300s→30s, connect 120s→10s, client
  10m→5m, keep-alive 120s→30s
- HTTP/2 Rapid Reset protection (CVE-2023-44487): stream and glitch limits
- Stats frontend on localhost:8404 for monitoring
- HEALTHCHECK now validates both port 80 (HAProxy) and 8000 (API)
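
In config terms, approximately (table size is an assumption; per-10s
thresholds derived from the per-second figures above):

    stick-table type ip size 100k expire 10m store conn_cur,conn_rate(10s),http_req_rate(10s),http_err_rate(30s)
    http-request track-sc0 src
    # 50 req/s over the 10s window = 500; 20 req/s = 200
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 500 }
    http-request tarpit if { sc_http_req_rate(0) gt 200 }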

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 10:00:53 -07:00
94af4e47c1 Add Host header capture to frontend for connection debugging
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 56s
Captures the Host header in HAProxy httplog output so high-connection
alerts can be correlated to specific domains.
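
The directive itself (capture length is an assumption):

    capture request header Host len 64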

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 15:31:14 -08:00
124a5373d2 Fix wildcard SSL cert: find certbot -NNNN dirs and use _wildcard_ filename
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m1s
Add find_certbot_live_dir() helper to locate the most recent certbot live
directory for a domain, handling -NNNN suffixed dirs from repeated requests.
Fix combined cert filename from *.domain.pem to _wildcard_.domain.pem.
Apply the helper across all SSL endpoints (request, renew, verify, download).
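
A sketch of the helper's selection logic (the real implementation may
sort differently):

    import glob

    def find_certbot_live_dir(domain):
        # matches /etc/letsencrypt/live/<domain> plus -NNNN variants;
        # zero-padded suffixes let a lexical sort pick the newest dir
        candidates = sorted(glob.glob(f'/etc/letsencrypt/live/{domain}*'))
        return candidates[-1] if candidates else None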

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 06:38:28 -08:00
657cd28344 Fix certbot hook script paths and add logging
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 3m4s
Hook scripts are at /haproxy/scripts/ inside the container (per
Dockerfile COPY), not /app/scripts/. Also added logging of certbot
stdout/stderr so failures are visible in haproxy-manager.log.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 06:18:14 -08:00
91c92dd07e Add wildcard domain support with DNS-01 ACME challenge flow
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m17s
Support wildcard domains (*.domain.tld) in HAProxy config generation
with exact-match ACLs prioritized over wildcard ACLs. Add DNS-01
challenge endpoints that coordinate with certbot via auth/cleanup
hook scripts for wildcard SSL certificate issuance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 13:06:08 -08:00
6cd64295d2 Add separate SSE backend for secure Server-Sent Events support
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 52s
Creates two backends per domain:
1. Regular backend - Uses http-server-close for better security and
   connection management (prevents connection exhaustion attacks)
2. SSE backend - Optimized for Server-Sent Events with:
   - no option http-server-close (allows long-lived connections)
   - option http-no-delay (immediate data transmission)
   - 6-hour timeouts (supports long streaming sessions)

Frontend routing logic:
- Detects SSE via Accept: text/event-stream header or ?action=stream param
- Routes SSE traffic to SSE-optimized backend
- Routes regular HTTP traffic to standard secure backend

This approach provides full SSE support while maintaining security for
regular HTTP traffic (preventing DDoS/connection flooding attacks).
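
Routing shape, approximately (ACL/backend names assumed):

    acl is_sse hdr(Accept) -i text/event-stream
    acl is_sse url_param(action) -i stream
    use_backend be_example_sse if is_sse

    backend be_example_sse
        no option http-server-close
        option http-no-delay
        timeout server 6h
        timeout tunnel 6h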

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 13:48:24 -08:00
eadd6b798f Adding support for SSE Streaming
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m23s
2025-12-26 13:07:29 -08:00
6902daaea1 Add automatic SSE detection and support to backend template
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m27s
Changes:
- Detect SSE via Accept header (text/event-stream) or ?action=stream parameter
- Disable http-server-close to allow long-lived SSE connections
- Enable http-no-delay for immediate event delivery
- Set 1-hour timeouts for SSE support (also fine for normal requests)
- Force Connection: keep-alive for detected SSE requests

Benefits:
- SSE now works automatically without special backend configuration
- Fixes transcription server display disconnection issues
- Normal HTTP requests still work perfectly
- No need for separate SSE-specific backends

Fixes: Server-Sent Events timing out through HAProxy
2025-12-26 13:02:04 -08:00
1fcb25bb88 Update SQL logic to update instead of delete and re-add
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m18s
2025-12-18 12:23:06 -08:00
bff18d358b Remove set -e and database dependency from certificate scripts
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 56s
Improved certificate renewal and sync scripts to be more resilient:
- Removed 'set -e' to prevent silent failures when individual domains error
- Scripts now continue processing remaining domains even if one fails
- Replaced database queries with direct filesystem scanning of /etc/letsencrypt/live/
- Uses 'find' command to discover all domains with Let's Encrypt certificates
- More reliable as it works even if database is out of sync

Benefits:
- No silent failures - errors are logged but don't stop the entire process
- Works independently of database state
- Simpler and more straightforward
- All domains with certificates get processed regardless of database

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 08:50:24 -08:00
1d22d789b8 Simplify certificate renewal scripts and add certbot cleanup
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 59s
Simplified all certificate renewal scripts to be more straightforward and reliable:
- Scripts now just run certbot renew and copy cert+key files to HAProxy format
- Removed overly complex retry logic and error handling
- Both in-container and host-side scripts work with cron scheduling

Added automatic certbot cleanup when domains are removed:
- When a domain is deleted via API, certbot certificate is also removed
- Prevents renewal errors for domains that no longer exist in HAProxy
- Cleans up both HAProxy combined cert and Let's Encrypt certificate

Script changes:
- renew-certificates.sh: Simplified to 87 lines (from 215)
- sync-certificates.sh: Simplified to 79 lines (from 200+)
- host-renew-certificates.sh: Simplified to 36 lines (from 40)
- All scripts use same pattern: query DB, copy certs, reload HAProxy

Python changes:
- remove_domain() now calls 'certbot delete' to remove certificates
- Prevents orphaned certificates from causing renewal failures
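
The cleanup call, roughly (subprocess shape assumed; flags match the
`certbot delete --cert-name <X> -n` usage quoted elsewhere in this log):

    subprocess.run(['certbot', 'delete', '--cert-name', domain, '-n'], check=False)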

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 09:56:56 -08:00
adc20d6d0b Improve certificate renewal script with atomic file updates
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 59s
- Write combined certificates to temporary file first
- Verify file is not empty before moving to final location
- Use atomic mv operation to prevent HAProxy from reading partial files
- Add proper cleanup of temporary files on all error paths
- Matches robust patterns from haproxy_manager.py
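
The same pattern in Python terms (a sketch mirroring the
haproxy_manager.py approach this script copies; names assumed):

    import os, tempfile

    def install_cert(path, pem_data):
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        try:
            with os.fdopen(fd, 'w') as f:
                f.write(pem_data)
            if os.path.getsize(tmp) == 0:
                raise ValueError('refusing to install empty cert')
            os.replace(tmp, path)   # atomic: HAProxy never sees a partial file
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise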

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 19:27:40 -08:00
71f4b9ef05 Add CIDR notation support for IP blocking
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 2m1s
- Update map file format to include value (IP/CIDR 1)
- Fix HAProxy template to use map_ip() for CIDR support
- Update runtime map commands to include value
- Document CIDR range blocking in API documentation
- Support blocking entire network ranges (e.g., 192.168.1.0/24)

This allows blocking compromised ISP ranges and other large-scale attacks.
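
Map-file and ACL shape (file path and deny status assumed):

    # /etc/haproxy/blocked_ips.map: "<IP-or-CIDR> <value>" per line
    203.0.113.7     1
    192.168.1.0/24  1

    acl is_blocked_ip src,map_ip(/etc/haproxy/blocked_ips.map) -m found
    http-request deny deny_status 403 if is_blocked_ip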
2025-11-17 12:07:32 -08:00
8d732318b4 Fix certificate renewal to properly update HAProxy combined certificate files
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m4s
After certbot renews certificates, the separate fullchain.pem and privkey.pem
files must be combined into a single .pem file for HAProxy. The renewal script
was missing this critical step, causing HAProxy to continue using old certificates.

Changes:
- Add update_combined_certificates() function to renew-certificates.sh
- Query database for all SSL-enabled domains
- Combine Let's Encrypt cert + key files using cat (matches haproxy_manager.py pattern)
- Always update combined certs after renewal, even if certbot says no renewal needed
- Add new sync-certificates.sh script for syncing all existing certificates
- Smart update detection in sync script (only updates when source is newer)

This ensures HAProxy always gets properly formatted certificate files after renewal.
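
The combine step itself (paths per the surrounding commits; domain
illustrative):

    cat /etc/letsencrypt/live/example.com/fullchain.pem \
        /etc/letsencrypt/live/example.com/privkey.pem \
        > /etc/haproxy/certs/example.com.pem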

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 20:10:58 -08:00