Adds the plumbing that lets haproxy-manager talk to the coraza-spoa sidecar
added in PR 1, while keeping the default behavior bit-identical for any
deployment that doesn't set the new env var (the home network / standalone
use cases).
Single gate: HAPROXY_CORAZA_SPOE_BACKEND env var on the haproxy-manager
container. Unset (default) = generate_config() renders zero SPOE-related
output. Set (e.g. "coraza-spoa:9000") = three things happen at config
generation time:
1. hap_listener.tpl injects 5 lines at the end of the frontend block:
filter spoe engine coraza config /etc/haproxy/coraza-spoe.cfg
http-request send-spoe-group coraza coraza-check
...placed AFTER rate-limit and IP-block guards so we don't waste WAF
calls on requests we were going to drop anyway.
2. A new TCP backend (hap_coraza_spoa_backend.tpl) is appended:
backend coraza-spoa-backend
mode tcp
server coraza-spoa <env-var-target> check ...
3. The SPOE engine config (hap_coraza_spoe_engine.tpl) is rendered and
written to /etc/haproxy/coraza-spoe.cfg, defining the spoe-agent
"coraza" + spoe-message "coraza-check". This sets:
- option set-on-error continue (FAIL-OPEN if SPOA is unreachable)
- timeout processing 100ms (per-request inspection budget)
- app=str(haproxy) (matches sidecar's application name)
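A minimal sketch of the single env-var gate, with hypothetical helper names; the real fragments come from hap_listener.tpl, hap_coraza_spoa_backend.tpl and hap_coraza_spoe_engine.tpl, and the engine text here is illustrative rather than a verbatim SPOE engine file:

```python
import os

SPOE_ENV = "HAPROXY_CORAZA_SPOE_BACKEND"

def render_spoe_parts(env=os.environ):
    """Return the extra config fragments, all empty when the gate is unset."""
    target = env.get(SPOE_ENV)
    if not target:
        # Default path: zero SPOE-related output, config stays bit-identical.
        return {"frontend_extra": "", "backend_extra": "", "engine_cfg": ""}
    frontend_extra = (
        "    filter spoe engine coraza config /etc/haproxy/coraza-spoe.cfg\n"
        "    http-request send-spoe-group coraza coraza-check\n"
    )
    backend_extra = (
        "backend coraza-spoa-backend\n"
        "    mode tcp\n"
        f"    server coraza-spoa {target} check\n"
    )
    engine_cfg = (
        "[coraza]\n"
        "spoe-agent coraza\n"
        "    messages coraza-check\n"
        "    option set-on-error continue\n"
        "    timeout processing 100ms\n"
        "    use-backend coraza-spoa-backend\n"
        "spoe-message coraza-check\n"
        "    args app=str(haproxy)\n"
    )
    return {"frontend_extra": frontend_extra,
            "backend_extra": backend_extra,
            "engine_cfg": engine_cfg}
```

The point of the shape: one lookup decides everything, so the unset path cannot leak partial SPOE output.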
Verification (template render only, before staging deploy):
- hap_listener.tpl with no env var: 55 lines, zero SPOE references
- hap_listener.tpl with env var: 62 lines, filter + send-spoe-group present
- Engine cfg + backend block render with correct agent_target substitution
Next: PR 3 wires this into WHP (sidecar deploy via container-manager.sh
extension, server-settings UI for on/off, AI Monitor source for the audit
log). Staging verification of PR 1 + PR 2 together happens after PR 3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related fixes for the issues the AI Monitor surfaced on whp01 on
2026-05-12 (haproxy-manager going "healthy but stalled" after long
uptime, and noise from POST /blocked-ip returning 405):
1. Production WSGI server. The Flask app was running on werkzeug's
built-in dev server (the one that prints "WARNING: This is a
development server" on every startup). werkzeug is single-threaded
and accumulates worker state over long uptimes; after ~24h on whp01
the health endpoint stops responding while the container still
reports "healthy" because Docker's HEALTHCHECK uses an HTTP probe
from inside the same werkzeug process that's stalled.
Replace with gunicorn (gthread worker class, --max-requests=1000
with jitter so workers recycle periodically). Two gunicorn instances,
one per Flask app: port 8000 for the management API, port 8080 for
the default/blocked-ip page server. Both app objects are exposed at
module level in haproxy_manager so gunicorn can import them.
Required structural change: default_app was created INSIDE the
__name__ == '__main__' block at module bottom, where gunicorn could
never reach it. Moved to module level. The __main__ block now remains
only for the `python haproxy_manager.py` local-dev workflow.
Container init (init_db, certbot register, generate_config,
start_haproxy) extracted into a do_initial_setup() function called
from a new scripts/init.py. start-up.sh runs init.py to completion
before either gunicorn binds, which keeps HAProxy startup off the
WSGI workers' fork paths (no race between workers all trying to
start_haproxy() at once).
2. /blocked-ip and / accept ALL methods. HAProxy proxies blocked-IP
traffic to default_app preserving the original verb, so a blocked
POST used to hit Flask's GET-only route and return a 405, which
the AI Monitor flagged as noise. Accepting the full method list
lets the 403 page render regardless of verb.
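A minimal Flask sketch of the fix in item 2; the route body is a stand-in for the real 403 page:

```python
from flask import Flask

app = Flask(__name__)  # stand-in for default_app

# List every verb HAProxy may forward, so a blocked POST still renders
# the 403 page instead of a 405 Method Not Allowed.
@app.route("/blocked-ip",
           methods=["GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS"])
def blocked_ip():
    return "<h1>403 Forbidden</h1>", 403
```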
Gunicorn settings tunable via env (workers, timeout, max-requests).
API gets --timeout 120 because ACME cert issuance can be slow; the
default page server stays on the gunicorn default 30s.
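The init sequencing from item 1 can be sketched with injectable steps; the stand-in functions below mirror the real sequence (init_db, certbot register, generate_config, start_haproxy) without claiming its signatures:

```python
def do_initial_setup(steps):
    """Run every one-time setup step to completion, in order.

    scripts/init.py calls this before either gunicorn instance binds,
    so no WSGI worker ever races to start_haproxy().
    """
    completed = []
    for step in steps:
        step()  # sequential: no workers exist yet, hence no race
        completed.append(step.__name__)
    return completed

def init_db(): pass           # stand-in
def register_certbot(): pass  # stand-in
def generate_config(): pass   # stand-in
def start_haproxy(): pass     # stand-in

if __name__ == "__main__":
    # Local-dev path only; production runs init.py, then gunicorn.
    do_initial_setup([init_db, register_certbot, generate_config, start_haproxy])
```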
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
certbot uses fasteners (fcntl-based locking) to serialize concurrent
invocations. The kernel auto-releases fcntl locks when the holding
process exits, but the .certbot.lock FILES persist on disk — and we've
seen real cases where subsequent runs report "Another instance of
Certbot is already running" even when no certbot process is alive.
Observed during the 2026-05-09 bundling rollout when a hung worker
held a lock across container-internal Python crashes.
When SSL is blocked on a customer site, this is high-impact: the
certbot lock can sit stale until somebody manually deletes it.
clear_stale_certbot_locks():
- probes each known lock path with fcntl.LOCK_NB
- if the lock is unheld → file is stale → delete it
- if the lock IS held → leave it alone (real certbot is running)
Wired in:
- container startup (init block)
- /api/ssl single-domain handler
- /api/ssl/bundle handler
- /api/certificates/renew handler
Safe to call repeatedly; never deletes a lock a real process holds, so
can never trigger concurrent certbot runs.
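A sketch of the probe-then-delete logic, using flock-style locking for illustration (certbot's own fcntl-based locking may differ in detail; the lock paths shown are the usual letsencrypt locations but are assumptions here):

```python
import fcntl
import os

LOCK_PATHS = ["/var/lib/letsencrypt/.certbot.lock",
              "/var/log/letsencrypt/.certbot.lock",
              "/etc/letsencrypt/.certbot.lock"]

def clear_stale_certbot_locks(paths=LOCK_PATHS):
    removed = []
    for path in paths:
        try:
            fd = os.open(path, os.O_WRONLY)
        except FileNotFoundError:
            continue  # no lock file at all
        try:
            # Non-blocking probe: succeeds only if nobody holds the lock.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            continue  # a live certbot holds it; leave it alone
        else:
            os.unlink(path)  # unheld lock file is a stale leftover
            removed.append(path)
        finally:
            os.close(fd)
    return removed
```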
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bundle endpoint correctly issued multi-SAN certs but left old
single-SAN .pem files (e.g. <name>-0001.pem) in /etc/haproxy/certs/.
HAProxy's `bind ... ssl crt /etc/haproxy/certs` loads everything in the
directory and picked the alphabetically-first matching file — typically
the older single-SAN one — so the new bundle had no effect on what was
served. Repro on peptidesaver.net: bundle covered 4 SANs but HAProxy
kept serving peptidesaver.net-0001.pem (single SAN, April-issued).
After a successful bundle write, walk SSL_CERTS_DIR and remove any
.pem whose CN is in the new bundle's name list (excluding the bundle's
own combined file). Drop the matching certbot lineage with
`certbot delete --cert-name <X> -n` so `certbot renew` stops touching
the dead lineage too.
Returns a `cleanup` summary in the API response so callers can log /
display what was deleted.
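A simplified sketch of the cleanup: the real code matches each .pem by its certificate CN, while this illustration matches by filename stem (`<name>.pem` / `<name>-0001.pem`) and takes an injectable runner for the `certbot delete` call:

```python
import os
import re
import subprocess

def cleanup_stale_pems(certs_dir, primary, names, run=subprocess.run):
    bundle_file = f"{primary}.pem"
    deleted = []
    for fname in sorted(os.listdir(certs_dir)):
        if not fname.endswith(".pem") or fname == bundle_file:
            continue  # skip non-certs and the bundle's own combined file
        stem = fname[: -len(".pem")]
        base = re.sub(r"-\d{4}$", "", stem)  # strip certbot's -0001 suffix
        if base in names:
            os.unlink(os.path.join(certs_dir, fname))
            # Drop the matching lineage so `certbot renew` stops touching it.
            run(["certbot", "delete", "--cert-name", stem, "-n"])
            deleted.append(fname)
    return {"deleted": deleted}
```

The returned dict is the shape of the `cleanup` summary mentioned above.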
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WHP's renewal orchestrator now bundles a site's domains into one cert
covering all SANs, instead of N separate single-domain orders. Single
ACME order = better behavior under Let's Encrypt's 50/hour orders limit
when many domains need attention at once.
Endpoint: POST /api/ssl/bundle
Body: {"primary": "example.com", "sans": ["www.example.com", ...]}
- Uses --cert-name <primary> so the lineage stays stable across renewals
(no -0001/-0002 proliferation seen with the legacy single-domain flow).
- Single combined .pem at /etc/haproxy/certs/<primary>.pem; HAProxy SNI-
matches against the cert's SAN list, so one file serves all included
hostnames.
- Updates the domains table for every SAN in the bundle.
- Hard cap at 100 SANs (LE limit).
Existing /api/ssl single-domain endpoint kept for backwards compat.
The WHP haproxy_manager::bundleSSL() helper falls back to a per-domain
loop if /api/ssl/bundle returns 404, so the WHP side keeps working
during the rolling image upgrade window.
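How a bundle request could map onto a single certbot order; the flag set is illustrative, not the endpoint's verbatim invocation:

```python
MAX_SANS = 100  # Let's Encrypt per-certificate SAN limit

def build_bundle_command(primary, sans):
    # Primary first, de-duplicated against the SAN list.
    domains = [primary] + [s for s in sans if s != primary]
    if len(domains) > MAX_SANS:
        raise ValueError(f"bundle exceeds {MAX_SANS} SANs")
    # --cert-name keeps the lineage stable across renewals.
    cmd = ["certbot", "certonly", "-n", "--cert-name", primary]
    for d in domains:
        cmd += ["-d", d]
    return cmd
```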
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add find_certbot_live_dir() helper to locate the most recent certbot live
directory for a domain, handling -NNNN suffixed dirs from repeated requests.
Fix combined cert filename from *.domain.pem to _wildcard_.domain.pem.
Apply the helper across all SSL endpoints (request, renew, verify, download).
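A sketch of the helper: treat the bare directory as suffix 0 and pick the highest -NNNN; the live root is parameterized here for testing:

```python
import os
import re

def find_certbot_live_dir(domain, live_root="/etc/letsencrypt/live"):
    """Return the most recent live dir for `domain`, or None if absent."""
    pattern = re.compile(re.escape(domain) + r"(?:-(\d+))?$")
    best, best_n = None, -1
    for entry in os.listdir(live_root):
        m = pattern.match(entry)
        if m:
            n = int(m.group(1) or 0)  # bare name counts as suffix 0
            if n > best_n:
                best, best_n = entry, n
    return os.path.join(live_root, best) if best else None
```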
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Hook scripts are at /haproxy/scripts/ inside the container (per
Dockerfile COPY), not /app/scripts/. Also added logging of certbot
stdout/stderr so failures are visible in haproxy-manager.log.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Support wildcard domains (*.domain.tld) in HAProxy config generation
with exact-match ACLs prioritized over wildcard ACLs. Add DNS-01
challenge endpoints that coordinate with certbot via auth/cleanup
hook scripts for wildcard SSL certificate issuance.
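The exact-before-wildcard ordering can be sketched as follows; the ACL style and backend naming are illustrative, not the template's actual output:

```python
def order_domains(domains):
    # Exact hostnames first, wildcards last, so a specific host always
    # matches before *.domain.tld does.
    exact = sorted(d for d in domains if not d.startswith("*."))
    wild = sorted(d for d in domains if d.startswith("*."))
    return exact + wild

def render_host_rules(domains):
    lines = []
    for d in order_domains(domains):
        if d.startswith("*."):
            # *.example.com becomes a suffix match on the Host header.
            lines.append(f"    use_backend be_{d[2:]} if "
                         f"{{ hdr(host) -i -m end .{d[2:]} }}")
        else:
            lines.append(f"    use_backend be_{d} if {{ hdr(host) -i {d} }}")
    return lines
```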
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Simplified all certificate renewal scripts to make them more reliable:
- Scripts now just run certbot renew and copy cert+key files to HAProxy format
- Removed overly complex retry logic and error handling
- Both in-container and host-side scripts work with cron scheduling
Added automatic certbot cleanup when domains are removed:
- When a domain is deleted via API, certbot certificate is also removed
- Prevents renewal errors for domains that no longer exist in HAProxy
- Cleans up both HAProxy combined cert and Let's Encrypt certificate
Script changes:
- renew-certificates.sh: Simplified to 87 lines (from 215)
- sync-certificates.sh: Simplified to 79 lines (from 200+)
- host-renew-certificates.sh: Simplified to 36 lines (from 40)
- All scripts use same pattern: query DB, copy certs, reload HAProxy
Python changes:
- remove_domain() now calls 'certbot delete' to remove certificates
- Prevents orphaned certificates from causing renewal failures
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Update map file format to include a value column (`<ip-or-cidr> 1`)
- Fix HAProxy template to use map_ip() for CIDR support
- Update runtime map commands to include value
- Document CIDR range blocking in API documentation
- Support blocking entire network ranges (e.g., 192.168.1.0/24)
This allows blocking compromised ISP ranges and other large-scale attacks.
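A sketch of the map-file value format described above; the ipaddress validity check is an assumption about where input checking could live, not a claim about the real code:

```python
import ipaddress

def render_blocked_ip_map(entries):
    """Render "<ip-or-cidr> 1" lines for HAProxy's map_ip() converter."""
    lines = []
    for entry in entries:
        # Raises ValueError on anything that is not an IP or CIDR range.
        ipaddress.ip_network(entry, strict=False)
        lines.append(f"{entry} 1")
    return "\n".join(lines) + "\n"
```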
- Modified /blocked-ip route to return 403 Forbidden status with HTML page
- Added HAProxy reload after adding blocked IP to ensure consistency
- Added HAProxy reload after removing blocked IP to ensure consistency
- Includes error handling for reload failures without breaking the operation
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add blocked_ips database table to store blocked IP addresses
- Implement API endpoints for IP blocking management:
- GET /api/blocked-ips: List all blocked IPs
- POST /api/blocked-ips: Block an IP address
- DELETE /api/blocked-ips: Unblock an IP address
- Update HAProxy configuration generation to include blocked IP ACLs
- Create blocked IP page template for denied access
- Add comprehensive API documentation for WHP integration
- Include test script for IP blocking functionality
- Update .gitignore with Python patterns
- Add CLAUDE.md for codebase documentation
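A minimal sketch of the endpoint trio; the real handlers persist to the blocked_ips table and regenerate the HAProxy config, rather than using an in-memory set:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
BLOCKED = set()  # stand-in for the blocked_ips table

@app.route("/api/blocked-ips", methods=["GET", "POST", "DELETE"])
def blocked_ips():
    if request.method == "GET":
        return jsonify(sorted(BLOCKED))
    ip = (request.get_json(silent=True) or {}).get("ip")
    if not ip:
        return jsonify(error="missing ip"), 400
    if request.method == "POST":
        BLOCKED.add(ip)       # block
    else:
        BLOCKED.discard(ip)   # unblock
    return jsonify(sorted(BLOCKED))
```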
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add configuration regeneration before HAProxy startup
- Add configuration validation before starting HAProxy
- Add automatic configuration regeneration if invalid config detected
- Prevent container crashes when HAProxy fails to start
- Allow container to continue running even if HAProxy is not available
- Add better error handling and logging for startup issues
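The startup guard can be sketched with injectable callables, so the regenerate-if-invalid flow is visible without a real `haproxy -c -f <cfg>` check:

```python
def safe_start(validate, regenerate, start, log=print):
    regenerate()                 # always regenerate before startup
    if not validate():
        log("invalid config detected; regenerating")
        regenerate()             # automatic regeneration on bad config
        if not validate():
            log("config still invalid; continuing without HAProxy")
            return False         # keep the container alive regardless
    try:
        start()
    except Exception as exc:     # never let a start failure kill init
        log(f"haproxy failed to start: {exc}")
        return False
    return True
```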
- Replace http-response set-body (HAProxy 2.8+) with local server approach
- Add separate Flask server on port 8080 to serve default page
- Update default backend template to use local server instead of inline HTML
- Maintain all customization features via environment variables
- Fix JavaScript error handling for domains API response