Commit Graph

135 Commits

Author SHA1 Message Date
220b28f0c4 haproxy: use req.hdr_ip for real-IP resolution (string-IP crashed Coraza SPOA)
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 55s
2026-05-14 08:57:05 -07:00
9770398ab0 coraza: pass var(txn.real_ip) instead of src to Coraza (real client IP in WAF logs)
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 55s
2026-05-14 08:52:01 -07:00
633d9390f2 coraza: pin go.mod to 1.23 (matches go mod tidy output; Dockerfile still uses 1.25 image)
All checks were successful
Build and push coraza-spoa / Build-and-Push (push) Successful in 42s
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 54s
2026-05-14 08:08:38 -07:00
6d43308073 coraza: pre-CRS Include for runtime per-host exemptions (load-order fix)
All checks were successful
Build and push coraza-spoa / Build-and-Push (push) Successful in 41s
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 54s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 07:55:51 -07:00
489290ed33 coraza: ship rules-catalog.json generated from bundled CRS at build time
All checks were successful
Build and push coraza-spoa / Build-and-Push (push) Successful in 44s
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 52s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 06:57:42 -07:00
b2adcdbed9 coraza: reserve rule-ID range 990000000-990999999 for WHP-generated rules 2026-05-14 06:53:37 -07:00
1f1bc1837e coraza: add second Include for runtime-managed local-overrides.conf 2026-05-14 06:51:24 -07:00
753743de20 coraza: drop 913xxx scanner-UA from enforce list (FP on Mastodon + SiteLock)
All checks were successful
Build and push coraza-spoa / Build-and-Push (push) Successful in 40s
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 54s
A 25h burn-in on whp01 (2026-05-13) found an ~11% FP rate on rule 913100:
ActivityPub federation pulls (Mastodon UA "...Bot" on hackerpublicradio.org
and blog.anti-social.online) and SiteLockSpider scans (a customer-paid
security service hitting greggfranklin.com + suchascream.net). The other
six promoted rule families (930120, 932100-160, 933170-200, 944100-300,
920440, 930130) showed zero FPs across the same window and stay enforced.

Detection-only still feeds the anomaly score, so demoting this family
costs essentially no real blocking value.
2026-05-13 19:13:22 -07:00
5e5234cb14 refactor(suspension): serve via /suspended route on default-backend, drop bk_suspended
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 53s
The previous design used a separate whp-suspended container (nginx:alpine
serving a static 503 page) reachable via a dedicated bk_suspended backend.
That was over-engineered — haproxy-manager-base already ships a default-app
Flask server on :8080 that serves /default-page and /blocked-ip via
path-rewrite ACLs. Mirroring that pattern lets the suspension page live
in the SAME container: no extra image to build, no extra container to
run or health-monitor.

Changes:
- Add /suspended Flask route on default_app returning 503 + suspended_page.html
- Add templates/suspended_page.html (dark-themed 503 page)
- hap_listener.tpl: 'http-request set-path /suspended' + 'use_backend
  default-backend' when host is in suspended_domains.list (same pattern
  as is_blocked_ip)
- Rename env var from HAPROXY_SUSPENSION_BACKEND (a target hostport) to
  HAPROXY_SUSPENSION_ENABLED (a bool); accepts 1/true/yes/on (case-insensitive)
- Remove hap_suspended_backend.tpl and its rendering in generate_config
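
A minimal sketch of the new route (handler shape assumed; the real
default_app code may differ in detail):

    from flask import render_template

    @default_app.route('/suspended')
    def suspended_page():
        # 503 so crawlers treat the suspension as temporary
        return render_template('suspended_page.html'), 503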

Non-WHP deployments (env var unset) see byte-identical haproxy.cfg as before
(verified via jinja2 render diff).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:08:45 -07:00
6fd07b4c54 fix(suspended): tolerate startup DNS failure + use docker_dns resolvers
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 52s
If the upstream container isn't up when haproxy-manager starts (e.g. when
haproxy is recreated before whp-suspended), the default `init-addr libc` mode
makes haproxy refuse to start — taking down the whole proxy. Switched to
`init-addr last,none` (use last known address, fall back to 0.0.0.0 = DOWN)
and added `resolvers docker_dns` (defined in hap_header.tpl) so the real IP
is picked up once DNS becomes resolvable.
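
Illustrative server line after this change (server name assumed):

    server whp-suspended whp-suspended:80 check resolvers docker_dns init-addr last,none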

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:52:50 -07:00
2ef582a3de feat(suspension): opt-in routing for suspended hosts via bk_suspended backend
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 56s
Adds a new env var HAPROXY_SUSPENSION_BACKEND (default unset). When set
(e.g. "whp-suspended:80"), generate_config() renders:
- A bk_suspended backend pointing at the configured upstream
- An ACL `acl is_suspended_domain hdr(host),lower -f /etc/haproxy/suspended_domains.list`
  + `use_backend bk_suspended if is_suspended_domain` in the frontend,
  sitting after IP-blocking and before any per-domain routing
- An empty /etc/haproxy/suspended_domains.list if missing (haproxy refuses
  to start with -f pointing at a non-existent file)
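
In template terms the frontend insertion is presumably gated like this
(Jinja2 variable name assumed):

    {%- if suspension_backend %}
    acl is_suspended_domain hdr(host),lower -f /etc/haproxy/suspended_domains.list
    use_backend bk_suspended if is_suspended_domain
    {%- endif %}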

External tooling (e.g. WHP's site_disable.php) maintains the list via
`docker cp` and HUP-reloads the container.

Non-WHP deployments (home networks, standalone use) leave the env var
unset and see byte-identical haproxy.cfg output. Same opt-in shape as
the existing HAPROXY_CORAZA_SPOE_BACKEND integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:46:18 -07:00
3572c66fb7 coraza: promote 920440 + 930130 to enforce list (empirical detect-only data)
All checks were successful
Build and push coraza-spoa / Build-and-Push (push) Successful in 1m17s
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 53s
After ~30 min of detect-only on whp01 we have actionable data on what
fires against legitimate customer traffic vs. attacker recon. Two rules
demonstrably catch only the latter and earn promotion to the day-one
enforce list:

  920440 — URL file extension restricted by policy
    Caught 124 events in the sample window, ALL backup/config-file
    disclosure probes (`/wp-config.php.old`, `/db_backup.sql`,
    `/.env.save`, `/releases.sql` ...) from a single GCP-hosted scanner
    hammering joshuaknapp.net. Match patterns: .sql (×62), .bak (×5),
    .old (×3), .save (×2), .backup, .dist. No legitimate URL on
    WP/WooCommerce/Divi/HPR ends in these.

  930130 — Restricted File Access Attempt
    Caught 117 events, ALL dotfile/VCS/config-disclosure probes
    (`/.env`, `/.env.local`, `/.env.bak`, `/.git/config`, `/config.php`,
    `/admin/.env`, `/backend/.env` ...). Spread across joshuaknapp.net,
    cgdannyb.com, onlinesupplements.net. Notably, HPR's
    `/ccdn.php?filename=/eps/...` legitimate audio-delivery URL does NOT
    trigger this rule — verified empirically.

Also documented in the "intentionally detect-only" comment block: 933150
fires on WooCommerce checkout when literal `session_start` appears in
billing form data (alphaoneaminos.com saw 2 such events). That's a
canonical CRS false positive on WooCommerce; left detect-only.

Net effect: existing detect_only deployments stay detect-only (the WHP
apply script bind-mounts an empty overrides over the baked-in file).
When operators next flip a server to enforce, these two extra ranges
activate alongside the original day-one list.
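
One plausible shape for the promotion entries in overrides.conf
(SecRuleUpdateActionById is standard Seclang; whether our overrides use
exactly this directive is an assumption):

    # promote from DetectionOnly to blocking
    SecRuleUpdateActionById 920440 "deny,status:403"
    SecRuleUpdateActionById 930130 "deny,status:403"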

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 18:00:21 -07:00
ba4c101135 fix(coraza): add deny rules that act on Coraza's verdict + spop-check on backend
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 55s
Two fixes that complete the SPOE enforcement path:

1. Listener was sending requests to Coraza for inspection but never reading
   the result. Coraza-SPOA sets var(txn.coraza.action) to "deny" / "drop"
   / "redirect" when a rule with that disruptive action fires; HAProxy
   needs explicit rules that READ the variable and apply the action.
   Without them, the audit log shows "Access denied" but the request
   still gets HTTP 200 (verified on staging: sqlmap/JNDI/shellinj all
   detected, all returned 200).

   Added the standard six rules from upstream's example/haproxy/haproxy.cfg
   covering http-request + http-response phases for each of deny/drop/
   redirect. Same set the upstream Coraza-SPOA docs recommend.

   Intentionally did NOT add the upstream's fail-CLOSED rule
   `http-request deny deny_status 500 if { var(txn.coraza.error) -m int gt 0 }`
   — for a hosting platform we want fail-open. Documented inline.

2. Backend health check switched from plain TCP `check` to `option
   spop-check`. The spop-check actually negotiates a SPOE session against
   the agent, so HAProxy detects a half-broken SPOA that's listening on
   :9000 but failing protocol handshakes. Plain `check` would miss that.
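
The request-phase half of those verdict rules, approximately as upstream
ships them (status code and redirect fetch are assumptions; the
http-response counterparts mirror these):

    http-request deny deny_status 403 if { var(txn.coraza.action) -m str deny }
    http-request silent-drop if { var(txn.coraza.action) -m str drop }
    http-request redirect code 302 location %[var(txn.coraza.data)] if { var(txn.coraza.action) -m str redirect }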

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:16:03 -07:00
f1e9bb2c63 fix(coraza-spoe): match upstream's required spoe shape (groups, arg order, names)
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m18s
Three real bugs in the SPOE config caught when HAProxy validated the
generated file:

1. spoe-agent must declare `groups` not `messages`. The `messages` form
   doesn't make the message reachable via `send-spoe-group`; HAProxy
   complained:
     unable to find SPOE group 'coraza-check' into SPOE engine 'coraza'

2. send-spoe-group references a spoe-GROUP name, which needs its own
   block. Added `spoe-group coraza-req { messages coraza-req }` as
   the indirection layer.

3. Arg names + ORDER are required to match what Coraza-SPOA parses
   positionally. My version had `dest-ip`/`dest-port`; upstream's
   example/haproxy/coraza.cfg (v0.7.1) uses `dst-ip`/`dst-port`.
   Renamed and reordered to match upstream verbatim, including the
   `app=str(haproxy)` literal that matches our config.yaml application
   name.

Also corrected misleading comment about `set-on-error continue`: that
option actually sets a variable on error; the fail-open behavior comes
from us deliberately NOT adding a `http-request deny if errored` rule
in the frontend. Renamed the variable to `error` (matching upstream)
and updated comments to be accurate.

Listener template's send-spoe-group action updated to reference the
new group name `coraza-req`.
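
Net shape of the corrected SPOE config (reconstructed from the three
fixes above; agent options, arg order past app=, and the full arg list
are elided or assumed):

    spoe-agent coraza
        groups coraza-req
        ...

    spoe-group coraza-req
        messages coraza-req

    spoe-message coraza-req
        args app=str(haproxy) src-ip=src src-port=src_port dst-ip=dst dst-port=dst_port ...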

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:12:09 -07:00
061309675b fix(coraza-spoe): collapse args to one line + ensure trailing LF on spoe cfg
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 2m11s
Two HAProxy parse errors caught in staging functional test:

1. coraza-spoe.cfg:39 'args': missing fetch method
   The args directive had backslash line continuations. HAProxy doesn't
   support those in SPOE configs — args must be one physical line.
   Collapsed to a single line.

2. coraza-spoe.cfg:50 Missing LF on last line
   Same trailing-LF issue we hit on haproxy.cfg one commit ago. The
   Jinja2 template ends with content rather than a newline, and write()
   doesn't add one. Belt-and-suspenders: explicitly append '\n' before
   writing if not already there.
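
A minimal sketch of the belt-and-suspenders write from item 2 (function
shape assumed):

    rendered = template.render(**ctx)
    if not rendered.endswith('\n'):
        rendered += '\n'          # HAProxy requires a trailing LF
    with open(spoe_cfg_path, 'w') as f:
        f.write(rendered)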

After this commit HAProxy validates the generated config cleanly. Will
verify on staging now (combined SPOE injection + fail-open + active
attack-detection tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:07:12 -07:00
4769f67fe9 fix(coraza): ensure haproxy.cfg ends with LF when SPOE backend appended
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 54s
The SPOE backend block from hap_coraza_spoa_backend.tpl was being appended
last to config_parts. The template's render output doesn't end with a
newline (and config_parts is joined with '\n' BETWEEN elements, not after
the last one), so the resulting haproxy.cfg ended on `server coraza-spoa
...` with no trailing LF. HAProxy refuses to parse such files:

    [ALERT] config: parsing [/etc/haproxy/haproxy.cfg:288]: Missing LF
    on last line, file might have been truncated at position 70.

Match the existing pattern at the previous-last config_parts.append
(line 1850 uses `'\n'.join(config_backends) + '\n'`) and add an explicit
'\n' on the coraza block append.
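
For reference, str.join only separates elements; it never terminates
the last one:

    >>> '\n'.join(['frontend ...', 'backend ...'])
    'frontend ...\nbackend ...'    # no trailing LF after the final element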

Caught immediately on staging: HTTP 000 to localhost:80 because HAProxy
never started; gunicorn/management API kept serving on :8000 fine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:03:56 -07:00
3e1f9dda2b fix(template): strip Jinja2 whitespace so no-env-var listener is byte-identical
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m23s
Default Jinja2 {% if %}{% endif %} block syntax leaves a trailing newline
even when the conditional doesn't render. Staging verification of PR 2
showed the resulting haproxy.cfg differed from the pre-PR2 version by
exactly 1 blank line — semantically identical but not byte-identical,
which violates the design promise that haproxy-manager-base's default
output stays unchanged for home/standalone deployments.

Use {%- if -%}/{%- endif %} (the whitespace-stripping variants) so the
block contributes zero bytes when coraza_spoe_backend is unset.
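
A minimal repro of the difference (illustrative, not the actual template):

    >>> from jinja2 import Template
    >>> Template('a\n{% if x %}\nb\n{% endif %}\n').render(x=False)
    'a\n\n'
    >>> Template('a\n{%- if x -%}\nb\n{%- endif %}').render(x=False)
    'a'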

Verified locally: without env var = 55 lines, ends cleanly on the
is_blocked_ip rule. With env var = 62 lines, +7 for the SPOE block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:59:40 -07:00
73b9104565 PR 2/3: opt-in SPOE integration for Coraza WAF
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m59s
Adds the plumbing that lets haproxy-manager talk to the coraza-spoa sidecar
added in PR 1, while keeping the default behavior bit-identical for any
deployment that doesn't set the new env var (the home network / standalone
use cases).

Single gate: HAPROXY_CORAZA_SPOE_BACKEND env var on the haproxy-manager
container. Unset (default) = generate_config() renders zero SPOE-related
output. Set (e.g. "coraza-spoa:9000") = three things happen at config
generation time:

  1. hap_listener.tpl injects 5 lines at the end of the frontend block:
       filter spoe engine coraza config /etc/haproxy/coraza-spoe.cfg
       http-request send-spoe-group coraza coraza-check
     ...placed AFTER rate-limit and IP-block guards so we don't waste WAF
     calls on requests we were going to drop anyway.

  2. A new TCP backend (hap_coraza_spoa_backend.tpl) is appended:
       backend coraza-spoa-backend
           mode tcp
           server coraza-spoa <env-var-target> check ...

  3. The SPOE engine config (hap_coraza_spoe_engine.tpl) is rendered and
     written to /etc/haproxy/coraza-spoe.cfg, defining the spoe-agent
     "coraza" + spoe-message "coraza-check". This sets:
       - option set-on-error continue   (FAIL-OPEN if SPOA is unreachable)
       - timeout processing 100ms       (per-request inspection budget)
       - app=str(haproxy)               (matches sidecar's application name)
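
The gate in generate_config() presumably reduces to something like this
(render()/write_file() are stand-ins for the real template helpers):

    coraza_spoe_backend = os.environ.get('HAPROXY_CORAZA_SPOE_BACKEND')
    if coraza_spoe_backend:
        # stand-in helpers; real code renders Jinja2 templates to these targets
        config_parts.append(render('hap_coraza_spoa_backend.tpl',
                                   agent_target=coraza_spoe_backend))
        write_file('/etc/haproxy/coraza-spoe.cfg',
                   render('hap_coraza_spoe_engine.tpl',
                          agent_target=coraza_spoe_backend))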

Verification (template render only, before staging deploy):
  - hap_listener.tpl with no env var: 55 lines, zero SPOE references
  - hap_listener.tpl with env var:    62 lines, filter + send-spoe-group present
  - Engine cfg + backend block render with correct agent_target substitution

Next: PR 3 wires this into WHP (sidecar deploy via container-manager.sh
extension, server-settings UI for on/off, AI Monitor source for the audit
log). Staging verification of PR 1 + PR 2 together happens after PR 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:49:29 -07:00
4e0c22e9c9 ci: mirror golang:1.25 alongside python:3.12-slim, switch coraza-spoa FROM
All checks were successful
Build and push coraza-spoa / Build-and-Push (push) Successful in 1m16s
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m18s
Cloudflare's bot-management incident on 2026-05-12 took out docker.io blob
pulls twice in one day — first for python:3.12-slim (mirrored in 5a2ebf9),
then again for golang:1.25 when the PR 1 coraza-spoa build hit the same
R2-via-Cloudflare failure on the build stage's base image.

Restructure .gitea/workflows/mirror-base-image.yaml into a matrix that
iterates over a list of (src, dst_path, tag) entries. Adding a new base
image is now a one-line matrix entry. fail-fast: false so one image's
upstream being down doesn't block refreshing the others.
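
The matrix shape, roughly (field names per the commit; exact YAML keys
are an assumption):

    strategy:
      fail-fast: false
      matrix:
        include:
          - { src: python:3.12-slim, dst_path: cloud-hosting-platform/python, tag: 3.12-slim }
          - { src: golang:1.25,      dst_path: cloud-hosting-platform/golang, tag: "1.25" }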

Switch coraza-spoa/Dockerfile's build stage FROM to the in-house golang
mirror. Runtime FROM (gcr.io/distroless/static-debian12:nonroot) stays
on upstream — distroless is on Google's registry, separate from Docker
Hub's Cloudflare R2 setup, and didn't fail during today's incident.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:40:42 -07:00
e4c506bcd9 PR 1/3: add coraza-spoa sidecar image
Some checks failed
Build and push coraza-spoa / Build-and-Push (push) Failing after 24s
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 55s
Self-contained sidecar that runs Coraza-SPOA v0.7.1 (latest upstream as of
2026-05-08, with OWASP CRS bundled in the binary). HAProxy will consult it
per-request via SPOE in PR 2; for now this PR ships the image only.

Defines:
- coraza-spoa/Dockerfile       — multi-stage build (golang:1.25 -> distroless),
                                 pinned to v0.7.1, ARG-overridable
- coraza-spoa/config.yaml      — single application "haproxy", JSON audit log
                                 to /var/log/coraza/audit.log, SecRuleEngine
                                 DetectionOnly globally
- coraza-spoa/overrides.conf   — day-one enforce list: scanner UAs (913xxx),
                                 RCE shell injection (932100-932160),
                                 webshell paths (933170-933200), targeted LFI
                                 (930120), Log4Shell/JNDI (944100-944300).
                                 Rationale per-range documented inline.
                                 Detect-only for XSS/SQLi/protocol (high FP
                                 on WP/WooCommerce/Divi customer mix).
- coraza-spoa/README.md        — deployment shape, audit log location, pin
                                 upgrade procedure, false-positive tuning.
- .gitea/workflows/build-push-coraza.yaml — Gitea Action triggered on
                                 coraza-spoa/** changes, publishes
                                 repo.anhonesthost.net/cloud-hosting-platform/
                                 coraza-spoa:latest. Path-scoped so it
                                 doesn't fire on every haproxy-manager push.
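
Build-stage shape of the Dockerfile (sketch; checkout/build steps and
the binary path are assumptions):

    FROM golang:1.25 AS build
    ARG CORAZA_SPOA_VERSION=v0.7.1
    # fetch + build coraza-spoa at the pinned tag
    ...
    FROM gcr.io/distroless/static-debian12:nonroot
    COPY --from=build /coraza-spoa /coraza-spoa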

No changes to haproxy-manager-base itself in this PR — the existing image
stays bit-identical, used standalone in home networks and other projects
without dependency on this sidecar. PR 2 will add the OPT-IN template
plumbing that lets haproxy-manager call out to this agent when an env var
is set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:28:44 -07:00
55670daf5b ci: add weekly Gitea Action to mirror python:3.12-slim into in-house registry
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m16s
Companion to the Dockerfile change in 5a2ebf9. The previous manual refresh
note in the Dockerfile becomes automated: a workflow_dispatch + weekly cron
that pulls python:3.12-slim from docker.io and re-pushes it to
repo.anhonesthost.net/cloud-hosting-platform/python:3.12-slim.

Workflow can also be triggered manually from the Gitea UI when Python
publishes patches between cron firings. Logs the upstream and mirror digests
so it's easy to verify "did the mirror really update" after a run.
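
Trigger block, roughly (the cron slot itself is an assumption):

    on:
      schedule:
        - cron: '0 6 * * 1'   # weekly
      workflow_dispatch: {}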

If more base images need mirroring later (haproxy itself, alpine, etc.),
this workflow should be promoted to a matrix or moved to a dedicated infra
repo — keeping it co-located with haproxy-manager-base for now since it's
the only consumer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:18:32 -07:00
5a2ebf991c ci: mirror python:3.12-slim into in-house registry
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m58s
docker.io serves image blobs from Cloudflare R2. The 2026-05-12 Cloudflare
incident took out blob pulls for hours and broke this image's Gitea CI
build mid-way through the haproxy-manager gunicorn migration (commit
bdd7d2f). With the base image mirrored at repo.anhonesthost.net,
CI builds no longer depend on docker.io reachability.

Refresh procedure documented in the Dockerfile comment block. Manual
re-push monthly or when Python patches drop. A future Gitea Action could
automate the pull-tag-push so we always have a current base.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 16:08:44 -07:00
bdd7d2f098 swap werkzeug dev server for gunicorn + accept all HTTP methods on default/blocked pages
Some checks failed
HAProxy Manager Build and Push / Build-and-Push (push) Failing after 13s
Two related fixes for the issues the AI Monitor surfaced on whp01 on
2026-05-12 (haproxy-manager going "healthy but stalled" after long
uptime, and noise from POST /blocked-ip returning 405):

1. Production WSGI server. The Flask app was running on werkzeug's
   built-in dev server (the one that prints "WARNING: This is a
   development server" on every startup). werkzeug is single-threaded
   and accumulates worker state over long uptimes; after ~24h on whp01
   the health endpoint stops responding while the container still
   reports "healthy" because Docker's HEALTHCHECK uses an HTTP probe
   from inside the same werkzeug process that's stalled.

   Replace with gunicorn (gthread worker class, --max-requests=1000
   with jitter so workers recycle periodically). Two gunicorn instances,
   one per Flask app — port 8000 for the management API, port 8080 for
   the default/blocked-ip page server. Both lift their app objects from
   the haproxy_manager module so gunicorn can import them.

   Required structural change: default_app was created INSIDE the
   __name__ == '__main__' block at module bottom, where gunicorn could
   never reach it. Moved to module level. The __main__ block now stays
   only for `python haproxy_manager.py` local-dev workflow.

   Container init (init_db, certbot register, generate_config,
   start_haproxy) extracted into a do_initial_setup() function called
   from a new scripts/init.py. start-up.sh runs init.py to completion
   before either gunicorn binds, which keeps HAProxy startup off the
   WSGI workers' fork paths (no race between workers all trying to
   start_haproxy() at once).

2. /blocked-ip and / accept ALL methods. HAProxy proxies blocked-IP
   traffic to default_app preserving the original verb, so a blocked
   POST request used to hit Flask's GET-only route and draw a 405,
   which the AI Monitor then flagged as noise. Adding the full method
   list lets the 403 page render regardless of verb.

Gunicorn settings tunable via env (workers, timeout, max-requests).
API gets --timeout 120 because ACME cert issuance can be slow; the
default page server stays on the gunicorn default 30s.
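
Roughly what the API server's invocation looks like now (worker/thread
counts and the app object name are assumptions):

    gunicorn --worker-class gthread --threads 4 \
        --max-requests 1000 --max-requests-jitter 100 \
        --timeout 120 --bind 0.0.0.0:8000 \
        haproxy_manager:app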

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:24:28 -07:00
8a86beac73 feat: clear stale certbot lock files before each ACME run + at startup
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 53s
certbot uses fasteners (fcntl-based locking) to serialize concurrent
invocations. The kernel auto-releases fcntl locks when the holding
process exits, but the .certbot.lock FILES persist on disk — and we've
seen real cases where subsequent runs report "Another instance of
Certbot is already running" even when no certbot process is alive.
Observed during the 2026-05-09 bundling rollout when a hung worker
held a lock across container-internal Python crashes.

This is high-impact when it blocks SSL issuance for a customer site:
a stale certbot lock sits there until somebody manually deletes it.

clear_stale_certbot_locks():
  - probes each known lock path with fcntl.LOCK_NB
  - if the lock is unheld → file is stale → delete it
  - if the lock IS held → leave it alone (real certbot is running)
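
A minimal sketch of that probe-then-delete (fcntl.lockf shown; the real
helper's details may differ):

    import fcntl, os

    def clear_stale_certbot_locks(lock_paths):
        for path in lock_paths:
            try:
                with open(path, 'r+') as f:
                    try:
                        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    except OSError:
                        continue        # held: real certbot is running
                os.unlink(path)         # unheld: stale, safe to delete
            except FileNotFoundError:
                pass                    # no lock file, nothing to do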

Wired in:
  - container startup (init block)
  - /api/ssl single-domain handler
  - /api/ssl/bundle handler
  - /api/certificates/renew handler

Safe to call repeatedly; never deletes a lock a real process holds, so
can never trigger concurrent certbot runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 12:09:19 -07:00
f7ef34b988 feat(api/ssl/bundle): clean up superseded lineages after issuance
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 53s
The bundle endpoint correctly issued multi-SAN certs but left old
single-SAN .pem files (e.g. <name>-0001.pem) in /etc/haproxy/certs/.
HAProxy's `bind ... ssl crt /etc/haproxy/certs` loads everything in the
directory and picked the alphabetically-first matching file — typically
the older single-SAN one — so the new bundle had no effect on what was
served. Repro on peptidesaver.net: bundle covered 4 SANs but HAProxy
kept serving peptidesaver.net-0001.pem (single SAN, April-issued).

After a successful bundle write, walk SSL_CERTS_DIR and remove any
.pem whose CN is in the new bundle's name list (excluding the bundle's
own combined file). Drop the matching certbot lineage with
`certbot delete --cert-name <X> -n` so `certbot renew` stops touching
the dead lineage too.

Returns a `cleanup` summary in the API response so callers can log /
display what was deleted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 11:58:21 -07:00
90255cc4b3 feat(api): add /api/ssl/bundle for per-site SAN cert issuance
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m1s
WHP's renewal orchestrator now bundles a site's domains into one cert
covering all SANs, instead of N separate single-domain orders. A single
ACME order behaves better under Let's Encrypt's new-order rate limit
(300 per account per 3 hours) when many domains need attention at once.

Endpoint: POST /api/ssl/bundle
Body: {"primary": "example.com", "sans": ["www.example.com", ...]}

- Uses --cert-name <primary> so the lineage stays stable across renewals
  (no -0001/-0002 proliferation seen with the legacy single-domain flow).
- Single combined .pem at /etc/haproxy/certs/<primary>.pem; HAProxy SNI-
  matches against the cert's SAN list, so one file serves all included
  hostnames.
- Updates the domains table for every SAN in the bundle.
- Hard cap at 100 SANs (LE limit).
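
Example call (host/port assumed; 8000 is the management API port):

    curl -X POST http://haproxy-manager:8000/api/ssl/bundle \
        -H 'Content-Type: application/json' \
        -d '{"primary": "example.com", "sans": ["www.example.com", "shop.example.com"]}'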

Existing /api/ssl single-domain endpoint kept for backwards compat.
The WHP haproxy_manager::bundleSSL() helper falls back to a per-domain
loop if /api/ssl/bundle returns 404, so the WHP side keeps working
during the rolling image upgrade window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 11:32:15 -07:00
b731feab12 Self-heal trusted IP whitelist files at startup
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 3m26s
Volume-mounted /etc/haproxy can shadow the image-baked
trusted_ips.list/trusted_ips.map, causing HAProxy to fail
config validation with "failed to open pattern file" on
non-WHP deployments. Touch empty files if they don't exist
so the ACLs always parse.
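
The self-heal itself is just a touch (paths per this commit):

    for path in ('/etc/haproxy/trusted_ips.list', '/etc/haproxy/trusted_ips.map'):
        if not os.path.exists(path):
            open(path, 'a').close()   # empty file is enough for the ACLs to parse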

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 10:02:16 -07:00
615044fa14 Fix resolvers block placement — must be outside global section
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 53s
The resolvers section was inserted inside the global section, causing
HAProxy to parse global directives (pidfile, maxconn, etc.) as
resolver keywords. Moved resolvers to its own top-level section
between global and defaults where HAProxy expects it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 05:18:48 -07:00
cf4eb5092c Add DNS resolver for automatic container IP re-resolution
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 54s
When Docker containers restart, they can get new IPs on the bridge
network. HAProxy caches DNS at config load time, so stale IPs cause
503s until config is regenerated.

Added a 'docker_dns' resolvers section pointing to Docker's embedded
DNS (127.0.0.11) with 10s hold time. Backend servers now use
'resolvers docker_dns init-addr last,libc,none' so HAProxy:
- Re-resolves container names every 10 seconds
- Falls back to last known IP if DNS is temporarily unavailable
- Starts even if a backend can't be resolved yet (init-addr none)

This eliminates 503s from container restarts, scaling, and recreation
without requiring a HAProxy config regeneration.
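
The section plus a typical server line (nameserver label and server
names illustrative):

    resolvers docker_dns
        nameserver dns1 127.0.0.11:53
        hold valid 10s

    server app1 my-container:8080 check resolvers docker_dns init-addr last,libc,none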

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 22:27:07 -07:00
ecf891ff02 Don't abort cert renewal when a single domain fails
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 2m11s
The renewal script was exiting immediately when certbot returned a
non-zero exit code, which happens when ANY cert fails to renew. A
single dead domain (e.g., DNS no longer pointed here) would block
ALL other certificates from being processed and combined for HAProxy.

Now logs the failures but continues to copy/combine successfully
renewed certificates and reload HAProxy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 15:17:15 -07:00
3da5df67d0 Update CLAUDE.md with HAProxy hardening and AI log monitor docs
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 2m36s
Documents HAProxy health checks, watchdog, rate limiting, trusted IP
whitelist, timeout hardening, HTTP/2 protection, and the AI-powered
log monitor system with two-tier analysis, auto-remediation, and
notification support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 08:16:44 -07:00
da40328438 Fix: remove comments from trusted IP files breaking HAProxy startup
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 54s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:19:29 -07:00
13a5be636e Raise rate limits further for media-heavy sites
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 53s
Generous thresholds that accommodate sites with many images/assets
while still catching obvious automated floods:
- Request rate: tarpit at 300 req/s, block at 500 req/s
- Connection rate: 500/10s
- Concurrent connections: 500
- Error rate: 100/30s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:12:24 -07:00
5390ebb8a6 Raise rate limit thresholds to avoid false positives on normal traffic
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m22s
Previous thresholds (200/500 req/10s) were too aggressive — WordPress
login pages with their CSS/JS/image assets can easily burst 30-50
requests per page load, triggering tarpits and blocks on legitimate
users.

New thresholds:
- Request rate: tarpit at 1000/10s (100 req/s), block at 2000/10s (200 req/s)
- Connection rate: 300/10s (was 150)
- Concurrent connections: 200 (was 100)
- Error rate: 50/30s (was 20)

These still catch real floods and scanners while giving normal web
traffic plenty of headroom.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:10:53 -07:00
53d259bd3f Add trusted IP whitelist for rate limit bypass
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m25s
Adds trusted_ips.list and trusted_ips.map files that exempt specific
IPs from all rate limiting rules. Supports both direct source IP
matching (is_trusted_ip) and proxy-header real IP matching
(is_whitelisted). Files are baked into the image and can be updated
by editing and rebuilding.
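
Illustrative shape of the bypass (the map-based is_whitelisted variant
matches a proxied real-IP header the same way; exact placement relative
to the rate-limit rules assumed):

    acl is_trusted_ip src -f /etc/haproxy/trusted_ips.list
    http-request allow if is_trusted_ip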

Adds phone system IP 172.116.197.166 to the whitelist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 13:39:41 -07:00
2ba8f87c2c Raise connection rate limit from 60 to 150 per 10s
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 56s
Gives more headroom for customers with code that makes frequent
callbacks to itself, while still catching connection floods.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 12:25:53 -07:00
a3b19ce352 Add rate limiting, connection limits, and timeout hardening
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m33s
Activate HAProxy's built-in attack prevention to stop floods that cause
the container to become unresponsive:

- Stick table tracks per-IP: conn_cur, conn_rate, http_req_rate, http_err_rate
- Rate limit rules: deny at 50 req/s, tarpit at 20 req/s, connection
  rate limit at 60/10s, concurrent connection cap at 100, error rate
  tarpit at 20 errors/30s
- Harden timeouts: http-request 300s→30s, connect 120s→10s, client
  10m→5m, keep-alive 120s→30s
- HTTP/2 Rapid Reset protection (CVE-2023-44487): stream and glitch limits
- Stats frontend on localhost:8404 for monitoring
- HEALTHCHECK now validates both port 80 (HAProxy) and 8000 (API)
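
In config terms, approximately (table size is an assumption; per-10s
thresholds derived from the per-second figures above):

    stick-table type ip size 100k expire 10m store conn_cur,conn_rate(10s),http_req_rate(10s),http_err_rate(30s)
    http-request track-sc0 src
    # 50 req/s over the 10s window = 500; 20 req/s = 200
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 500 }
    http-request tarpit if { sc_http_req_rate(0) gt 200 }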

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 10:00:53 -07:00
94af4e47c1 Add Host header capture to frontend for connection debugging
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 56s
Captures the Host header in HAProxy httplog output so high-connection
alerts can be correlated to specific domains.
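
The directive itself (capture length is an assumption):

    capture request header Host len 64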

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 15:31:14 -08:00
124a5373d2 Fix wildcard SSL cert: find certbot -NNNN dirs and use _wildcard_ filename
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m1s
Add find_certbot_live_dir() helper to locate the most recent certbot live
directory for a domain, handling -NNNN suffixed dirs from repeated requests.
Fix combined cert filename from *.domain.pem to _wildcard_.domain.pem.
Apply the helper across all SSL endpoints (request, renew, verify, download).
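
A sketch of the helper's selection logic (the real implementation may
sort differently):

    import glob

    def find_certbot_live_dir(domain):
        # matches /etc/letsencrypt/live/<domain> plus -NNNN variants;
        # zero-padded suffixes let a lexical sort pick the newest dir
        candidates = sorted(glob.glob(f'/etc/letsencrypt/live/{domain}*'))
        return candidates[-1] if candidates else None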

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 06:38:28 -08:00
657cd28344 Fix certbot hook script paths and add logging
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 3m4s
Hook scripts are at /haproxy/scripts/ inside the container (per
Dockerfile COPY), not /app/scripts/. Also added logging of certbot
stdout/stderr so failures are visible in haproxy-manager.log.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 06:18:14 -08:00
91c92dd07e Add wildcard domain support with DNS-01 ACME challenge flow
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m17s
Support wildcard domains (*.domain.tld) in HAProxy config generation
with exact-match ACLs prioritized over wildcard ACLs. Add DNS-01
challenge endpoints that coordinate with certbot via auth/cleanup
hook scripts for wildcard SSL certificate issuance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 13:06:08 -08:00
6cd64295d2 Add separate SSE backend for secure Server-Sent Events support
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 52s
Creates two backends per domain:
1. Regular backend - Uses http-server-close for better security and
   connection management (prevents connection exhaustion attacks)
2. SSE backend - Optimized for Server-Sent Events with:
   - no option http-server-close (allows long-lived connections)
   - option http-no-delay (immediate data transmission)
   - 6-hour timeouts (supports long streaming sessions)

Frontend routing logic:
- Detects SSE via Accept: text/event-stream header or ?action=stream param
- Routes SSE traffic to SSE-optimized backend
- Routes regular HTTP traffic to standard secure backend

This approach provides full SSE support while maintaining security for
regular HTTP traffic (preventing DDoS/connection flooding attacks).
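
Routing shape, approximately (ACL/backend names assumed):

    acl is_sse hdr(Accept) -i text/event-stream
    acl is_sse url_param(action) -i stream
    use_backend be_example_sse if is_sse

    backend be_example_sse
        no option http-server-close
        option http-no-delay
        timeout server 6h
        timeout tunnel 6h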

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-26 13:48:24 -08:00
eadd6b798f Adding support for SSE Streaming
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m23s
2025-12-26 13:07:29 -08:00
6902daaea1 Add automatic SSE detection and support to backend template
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m27s
Changes:
- Detect SSE via Accept header (text/event-stream) or ?action=stream parameter
- Disable http-server-close to allow long-lived SSE connections
- Enable http-no-delay for immediate event delivery
- Set 1-hour timeouts for SSE support (also fine for normal requests)
- Force Connection: keep-alive for detected SSE requests

Benefits:
- SSE now works automatically without special backend configuration
- Fixes transcription server display disconnection issues
- Normal HTTP requests still work perfectly
- No need for separate SSE-specific backends

Fixes: Server-Sent Events timing out through HAProxy
2025-12-26 13:02:04 -08:00
1fcb25bb88 Update SQL logic to update instead of delete and re-add
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m18s
2025-12-18 12:23:06 -08:00
bff18d358b Remove set -e and database dependency from certificate scripts
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 56s
Improved certificate renewal and sync scripts to be more resilient:
- Removed 'set -e' to prevent silent failures when individual domains error
- Scripts now continue processing remaining domains even if one fails
- Replaced database queries with direct filesystem scanning of /etc/letsencrypt/live/
- Uses 'find' command to discover all domains with Let's Encrypt certificates
- More reliable as it works even if database is out of sync

Benefits:
- No silent failures - errors are logged but don't stop the entire process
- Works independently of database state
- Simpler and more straightforward
- All domains with certificates get processed regardless of database

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 08:50:24 -08:00
1d22d789b8 Simplify certificate renewal scripts and add certbot cleanup
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 59s
Simplified all certificate renewal scripts to be more straightforward and reliable:
- Scripts now just run certbot renew and copy cert+key files to HAProxy format
- Removed overly complex retry logic and error handling
- Both in-container and host-side scripts work with cron scheduling

Added automatic certbot cleanup when domains are removed:
- When a domain is deleted via API, certbot certificate is also removed
- Prevents renewal errors for domains that no longer exist in HAProxy
- Cleans up both HAProxy combined cert and Let's Encrypt certificate

Script changes:
- renew-certificates.sh: Simplified to 87 lines (from 215)
- sync-certificates.sh: Simplified to 79 lines (from 200+)
- host-renew-certificates.sh: Simplified to 36 lines (from 40)
- All scripts use same pattern: query DB, copy certs, reload HAProxy

Python changes:
- remove_domain() now calls 'certbot delete' to remove certificates
- Prevents orphaned certificates from causing renewal failures
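
The cleanup call, roughly (subprocess shape assumed; flags match the
`certbot delete --cert-name <X> -n` usage quoted elsewhere in this log):

    subprocess.run(['certbot', 'delete', '--cert-name', domain, '-n'], check=False)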

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 09:56:56 -08:00
adc20d6d0b Improve certificate renewal script with atomic file updates
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 59s
- Write combined certificates to temporary file first
- Verify file is not empty before moving to final location
- Use atomic mv operation to prevent HAProxy from reading partial files
- Add proper cleanup of temporary files on all error paths
- Matches robust patterns from haproxy_manager.py
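
The same pattern in Python terms (a sketch mirroring the
haproxy_manager.py approach this script copies; names assumed):

    import os, tempfile

    def install_cert(path, pem_data):
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        try:
            with os.fdopen(fd, 'w') as f:
                f.write(pem_data)
            if os.path.getsize(tmp) == 0:
                raise ValueError('refusing to install empty cert')
            os.replace(tmp, path)   # atomic: HAProxy never sees a partial file
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise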

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 19:27:40 -08:00
71f4b9ef05 Add CIDR notation support for IP blocking
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 2m1s
- Update map file format to include value (IP/CIDR 1)
- Fix HAProxy template to use map_ip() for CIDR support
- Update runtime map commands to include value
- Document CIDR range blocking in API documentation
- Support blocking entire network ranges (e.g., 192.168.1.0/24)

This allows blocking compromised ISP ranges and other large-scale attacks.
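
Map-file and ACL shape (file path and deny status assumed):

    # /etc/haproxy/blocked_ips.map: "<IP-or-CIDR> <value>" per line
    203.0.113.7     1
    192.168.1.0/24  1

    acl is_blocked_ip src,map_ip(/etc/haproxy/blocked_ips.map) -m found
    http-request deny deny_status 403 if is_blocked_ip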
2025-11-17 12:07:32 -08:00
8d732318b4 Fix certificate renewal to properly update HAProxy combined certificate files
All checks were successful
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m4s
After certbot renews certificates, the separate fullchain.pem and privkey.pem
files must be combined into a single .pem file for HAProxy. The renewal script
was missing this critical step, causing HAProxy to continue using old certificates.

Changes:
- Add update_combined_certificates() function to renew-certificates.sh
- Query database for all SSL-enabled domains
- Combine Let's Encrypt cert + key files using cat (matches haproxy_manager.py pattern)
- Always update combined certs after renewal, even if certbot says no renewal needed
- Add new sync-certificates.sh script for syncing all existing certificates
- Smart update detection in sync script (only updates when source is newer)

This ensures HAProxy always gets properly formatted certificate files after renewal.
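
The combine step itself (paths per the surrounding commits; domain
illustrative):

    cat /etc/letsencrypt/live/example.com/fullchain.pem \
        /etc/letsencrypt/live/example.com/privkey.pem \
        > /etc/haproxy/certs/example.com.pem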

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 20:10:58 -08:00