Default Jinja2 {% if %}{% endif %} block syntax leaves a trailing newline
even when the conditional doesn't render. Staging verification of PR 2
showed the resulting haproxy.cfg differed from the pre-PR2 version by
exactly 1 blank line — semantically identical but not byte-identical,
which violates the design promise that haproxy-manager-base's default
output stays unchanged for home/standalone deployments.
Use {%- if -%}/{%- endif %} (the whitespace-stripping variants) so the
block contributes zero bytes when coraza_spoe_backend is unset.
Verified locally: without env var = 55 lines, ends cleanly on the
is_blocked_ip rule. With env var = 62 lines, +7 for the SPOE block.
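A minimal repro of the whitespace behavior, using an illustrative template rather than the real hap_listener.tpl:

```python
from jinja2 import Template

# Default block tags: the newline that followed {% if %} survives even
# when the body does not render, leaving a stray blank line.
PLAIN = """frontend http
{% if spoe %}
    filter spoe engine coraza config /etc/haproxy/coraza-spoe.cfg
{% endif %}
    use_backend app"""

# {%- variants strip the adjacent whitespace, so the unset case is
# byte-identical to a template with no conditional at all.
TRIMMED = """frontend http
{%- if spoe %}
    filter spoe engine coraza config /etc/haproxy/coraza-spoe.cfg
{%- endif %}
    use_backend app"""

print(repr(Template(PLAIN).render(spoe=False)))    # note the blank line
print(repr(Template(TRIMMED).render(spoe=False)))  # zero-byte contribution
```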
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the plumbing that lets haproxy-manager talk to the coraza-spoa sidecar
added in PR 1, while keeping the default behavior bit-identical for any
deployment that doesn't set the new env var (the home network / standalone
use cases).
Single gate: HAPROXY_CORAZA_SPOE_BACKEND env var on the haproxy-manager
container. Unset (default) = generate_config() renders zero SPOE-related
output. Set (e.g. "coraza-spoa:9000") = three things happen at config
generation time:
1. hap_listener.tpl injects 5 lines at the end of the frontend block:
filter spoe engine coraza config /etc/haproxy/coraza-spoe.cfg
http-request send-spoe-group coraza coraza-check
...placed AFTER rate-limit and IP-block guards so we don't waste WAF
calls on requests we were going to drop anyway.
2. A new TCP backend (hap_coraza_spoa_backend.tpl) is appended:
backend coraza-spoa-backend
mode tcp
server coraza-spoa <env-var-target> check ...
3. The SPOE engine config (hap_coraza_spoe_engine.tpl) is rendered and
written to /etc/haproxy/coraza-spoe.cfg, defining the spoe-agent
"coraza" + spoe-message "coraza-check". This sets:
- option set-on-error continue (FAIL-OPEN if SPOA is unreachable)
- timeout processing 100ms (per-request inspection budget)
- app=str(haproxy) (matches sidecar's application name)
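The three rendered pieces, sketched as HAProxy config. Directive spellings follow this commit's text and the message args beyond app= are illustrative; the real templates (and HAProxy's SPOE spec) are authoritative:

```haproxy
# hap_listener.tpl — injected after the rate-limit / IP-block guards:
    filter spoe engine coraza config /etc/haproxy/coraza-spoe.cfg
    http-request send-spoe-group coraza coraza-check

# hap_coraza_spoa_backend.tpl — appended TCP backend:
backend coraza-spoa-backend
    mode tcp
    server coraza-spoa coraza-spoa:9000 check

# hap_coraza_spoe_engine.tpl -> /etc/haproxy/coraza-spoe.cfg:
[coraza]
spoe-agent coraza
    messages coraza-check
    option set-on-error continue      # per this commit: fail-open on SPOA errors
    timeout processing 100ms
    use-backend coraza-spoa-backend

spoe-message coraza-check
    args app=str(haproxy) src-ip=src
    event on-frontend-http-request
```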
Verification (template render only, before staging deploy):
- hap_listener.tpl with no env var: 55 lines, zero SPOE references
- hap_listener.tpl with env var: 62 lines, filter + send-spoe-group present
- Engine cfg + backend block render with correct agent_target substitution
Next: PR 3 wires this into WHP (sidecar deploy via container-manager.sh
extension, server-settings UI for on/off, AI Monitor source for the audit
log). Staging verification of PR 1 + PR 2 together happens after PR 3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cloudflare's bot-management incident on 2026-05-12 took out docker.io blob
pulls twice in one day — first for python:3.12-slim (mirrored in 5a2ebf9),
then again for golang:1.25 when the PR 1 coraza-spoa build hit the same
R2-via-Cloudflare failure on the build stage's base image.
Restructure .gitea/workflows/mirror-base-image.yaml into a matrix that
iterates over a list of (src, dst_path, tag) entries. Adding a new base
image is now a one-line matrix entry. fail-fast: false so one image's
upstream being down doesn't block refreshing the others.
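A hedged sketch of the matrix shape (step syntax approximated from GitHub Actions conventions, which Gitea Actions mirrors; here the (dst_path, tag) pair is collapsed into one dst field for brevity):

```yaml
jobs:
  mirror:
    runs-on: docker
    strategy:
      fail-fast: false     # one upstream outage must not block the rest
      matrix:
        include:
          - src: docker.io/library/python:3.12-slim
            dst: repo.anhonesthost.net/cloud-hosting-platform/python:3.12-slim
          - src: docker.io/library/golang:1.25
            dst: repo.anhonesthost.net/cloud-hosting-platform/golang:1.25
    steps:
      - name: pull, retag, push
        run: |
          docker pull "${{ matrix.src }}"
          docker tag  "${{ matrix.src }}" "${{ matrix.dst }}"
          docker push "${{ matrix.dst }}"
```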
Switch coraza-spoa/Dockerfile's build stage FROM to the in-house golang
mirror. Runtime FROM (gcr.io/distroless/static-debian12:nonroot) stays
on upstream — distroless is on Google's registry, separate from Docker
Hub's Cloudflare R2 setup, and didn't fail during today's incident.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-contained sidecar that runs Coraza-SPOA v0.7.1 (latest upstream as of
2026-05-08, with OWASP CRS bundled in the binary). HAProxy will consult it
per-request via SPOE in PR 2; for now this PR ships the image only.
Defines:
- coraza-spoa/Dockerfile — multi-stage build (golang:1.25 -> distroless),
pinned to v0.7.1, ARG-overridable
- coraza-spoa/config.yaml — single application "haproxy", JSON audit log
to /var/log/coraza/audit.log, SecRuleEngine
DetectionOnly globally
- coraza-spoa/overrides.conf — day-one enforce list: scanner UAs (913xxx),
RCE shell injection (932100-932160),
webshell paths (933170-933200), targeted LFI
(930120), Log4Shell/JNDI (944100-944300).
Rationale per-range documented inline.
Detect-only for XSS/SQLi/protocol (high FP
on WP/WooCommerce/Divi customer mix).
- coraza-spoa/README.md — deployment shape, audit log location, pin
upgrade procedure, false-positive tuning.
- .gitea/workflows/build-push-coraza.yaml — Gitea Action triggered on
coraza-spoa/** changes, publishes
repo.anhonesthost.net/cloud-hosting-platform/
coraza-spoa:latest. Path-scoped so it
doesn't fire on every haproxy-manager push.
No changes to haproxy-manager-base itself in this PR — the existing image
stays bit-identical, used standalone in home networks and other projects
without dependency on this sidecar. PR 2 will add the OPT-IN template
plumbing that lets haproxy-manager call out to this agent when an env var
is set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Companion to the Dockerfile change in 5a2ebf9. The previous manual refresh
note in the Dockerfile becomes automated: a workflow_dispatch + weekly cron
that pulls python:3.12-slim from docker.io and re-pushes it to
repo.anhonesthost.net/cloud-hosting-platform/python:3.12-slim.
Workflow can also be triggered manually from the Gitea UI when Python
publishes patches between cron firings. Logs the upstream and mirror digests
so it's easy to verify "did the mirror really update" after a run.
If more base images need mirroring later (haproxy itself, alpine, etc.),
this workflow should be promoted to a matrix or moved to a dedicated infra
repo — keeping it co-located with haproxy-manager-base for now since it's
the only consumer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docker.io serves image blobs from Cloudflare R2. The 2026-05-12 Cloudflare
incident took out blob pulls for hours and broke this image's Gitea CI
build mid-way through the haproxy-manager gunicorn migration (commit
bdd7d2f). With the base image mirrored at repo.anhonesthost.net,
CI builds no longer depend on docker.io reachability.
Refresh procedure documented in the Dockerfile comment block. Manual
re-push monthly or when Python patches drop. A future Gitea Action could
automate the pull-tag-push so we always have a current base.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related fixes for the issues the AI Monitor surfaced on whp01 on
2026-05-12 (haproxy-manager going "healthy but stalled" after long
uptime, and noise from POST /blocked-ip returning 405):
1. Production WSGI server. The Flask app was running on werkzeug's
built-in dev server (the one that prints "WARNING: This is a
development server" on every startup). werkzeug is single-threaded
and accumulates worker state over long uptimes; after ~24h on whp01
the health endpoint stops responding while the container still
reports "healthy" because Docker's HEALTHCHECK uses an HTTP probe
from inside the same werkzeug process that's stalled.
Replace with gunicorn (gthread worker class, --max-requests=1000
with jitter so workers recycle periodically). Two gunicorn instances,
one per Flask app — port 8000 for the management API, port 8080 for
the default/blocked-ip page server. Both lift their app objects from
the haproxy_manager module so gunicorn can import them.
Required structural change: default_app was created INSIDE the
__name__ == '__main__' block at module bottom, where gunicorn could
never reach it. Moved to module level. The __main__ block now stays
only for `python haproxy_manager.py` local-dev workflow.
Container init (init_db, certbot register, generate_config,
start_haproxy) extracted into a do_initial_setup() function called
from a new scripts/init.py. start-up.sh runs init.py to completion
before either gunicorn binds, which keeps HAProxy startup off the
WSGI workers' fork paths (no race between workers all trying to
start_haproxy() at once).
2. /blocked-ip and / accept ALL methods. HAProxy proxies blocked-IP
   traffic to default_app preserving the original verb, so a blocked
   POST request used to hit Flask's GET-only route and draw a 405,
   which the AI Monitor then flagged as noise. Accepting the full
   method list lets the 403 page render regardless of verb.
Gunicorn settings tunable via env (workers, timeout, max-requests).
API gets --timeout 120 because ACME cert issuance can be slow; the
default page server stays on the gunicorn default 30s.
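The layout from item 1 can be sketched as a start-up fragment. Flag values follow this commit where stated (the jitter value is illustrative); script and module paths are assumptions, not the repo's actual layout:

```shell
python scripts/init.py   # do_initial_setup(): init_db, certbot register,
                         # generate_config, start_haproxy — completes
                         # before any WSGI worker forks

gunicorn haproxy_manager:app \
    --worker-class gthread --workers "${GUNICORN_WORKERS:-2}" \
    --max-requests 1000 --max-requests-jitter 100 \
    --timeout 120 --bind 0.0.0.0:8000 &    # management API (slow ACME ops)

gunicorn haproxy_manager:default_app \
    --worker-class gthread --workers "${GUNICORN_WORKERS:-2}" \
    --max-requests 1000 --max-requests-jitter 100 \
    --bind 0.0.0.0:8080 &                  # default/blocked-ip pages, 30s default timeout

wait -n   # if either instance dies, let the container exit and restart
```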
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
certbot uses fasteners (fcntl-based locking) to serialize concurrent
invocations. The kernel auto-releases fcntl locks when the holding
process exits, but the .certbot.lock FILES persist on disk — and we've
seen real cases where subsequent runs report "Another instance of
Certbot is already running" even when no certbot process is alive.
Observed during the 2026-05-09 bundling rollout when a hung worker
held a lock across container-internal Python crashes.
When a customer site is stuck waiting on SSL issuance, this is
high-impact: the stale certbot lock can sit there until somebody
manually deletes it.
clear_stale_certbot_locks():
- probes each known lock path with fcntl.LOCK_NB
- if the lock is unheld → file is stale → delete it
- if the lock IS held → leave it alone (real certbot is running)
Wired in:
- container startup (init block)
- /api/ssl single-domain handler
- /api/ssl/bundle handler
- /api/certificates/renew handler
Safe to call repeatedly; never deletes a lock a real process holds, so
can never trigger concurrent certbot runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bundle endpoint correctly issued multi-SAN certs but left old
single-SAN .pem files (e.g. <name>-0001.pem) in /etc/haproxy/certs/.
HAProxy's `bind ... ssl crt /etc/haproxy/certs` loads everything in the
directory and picked the alphabetically-first matching file — typically
the older single-SAN one — so the new bundle had no effect on what was
served. Repro on peptidesaver.net: bundle covered 4 SANs but HAProxy
kept serving peptidesaver.net-0001.pem (single SAN, April-issued).
After a successful bundle write, walk SSL_CERTS_DIR and remove any
.pem whose CN is in the new bundle's name list (excluding the bundle's
own combined file). Drop the matching certbot lineage with
`certbot delete --cert-name <X> -n` so `certbot renew` stops touching
the dead lineage too.
Returns a `cleanup` summary in the API response so callers can log /
display what was deleted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WHP's renewal orchestrator now bundles a site's domains into one cert
covering all SANs, instead of N separate single-domain orders. A single
ACME order behaves better under Let's Encrypt's 50-orders-per-hour rate
limit when many domains need attention at once.
Endpoint: POST /api/ssl/bundle
Body: {"primary": "example.com", "sans": ["www.example.com", ...]}
- Uses --cert-name <primary> so the lineage stays stable across renewals
(no -0001/-0002 proliferation seen with the legacy single-domain flow).
- Single combined .pem at /etc/haproxy/certs/<primary>.pem; HAProxy SNI-
matches against the cert's SAN list, so one file serves all included
hostnames.
- Updates the domains table for every SAN in the bundle.
- Hard cap at 100 SANs (LE limit).
Existing /api/ssl single-domain endpoint kept for backwards compat.
The WHP haproxy_manager::bundleSSL() helper falls back to a per-domain
loop if /api/ssl/bundle returns 404, so the WHP side keeps working
during the rolling image upgrade window.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Volume-mounted /etc/haproxy can shadow the image-baked
trusted_ips.list/trusted_ips.map, causing HAProxy to fail
config validation with "failed to open pattern file" on
non-WHP deployments. Touch empty files if they don't exist
so the ACLs always parse.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The resolvers section was inserted inside the global section, causing
HAProxy to parse global directives (pidfile, maxconn, etc.) as
resolver keywords. Moved resolvers to its own top-level section
between global and defaults where HAProxy expects it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When Docker containers restart, they can get new IPs on the bridge
network. HAProxy caches DNS at config load time, so stale IPs cause
503s until config is regenerated.
Added a 'docker_dns' resolvers section pointing to Docker's embedded
DNS (127.0.0.11) with 10s hold time. Backend servers now use
'resolvers docker_dns init-addr last,libc,none' so HAProxy:
- Re-resolves container names every 10 seconds
- Falls back to last known IP if DNS is temporarily unavailable
- Starts even if a backend can't be resolved yet (init-addr none)
This eliminates 503s from container restarts, scaling, and recreation
without requiring a HAProxy config regeneration.
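Sketched as rendered config (backend and server names illustrative; section placement per the follow-up fix: top level, between global and defaults):

```haproxy
resolvers docker_dns
    nameserver docker 127.0.0.11:53
    hold valid 10s

backend example-site
    mode http
    server web1 my-container:8080 check resolvers docker_dns init-addr last,libc,none
```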
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The renewal script was exiting immediately when certbot returned a
non-zero exit code, which happens when ANY cert fails to renew. A
single dead domain (e.g., DNS no longer pointed here) would block
ALL other certificates from being processed and combined for HAProxy.
Now logs the failures but continues to copy/combine successfully
renewed certificates and reload HAProxy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents HAProxy health checks, watchdog, rate limiting, trusted IP
whitelist, timeout hardening, HTTP/2 protection, and the AI-powered
log monitor system with two-tier analysis, auto-remediation, and
notification support.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Generous thresholds that accommodate sites with many images/assets
while still catching obvious automated floods:
- Request rate: tarpit at 300 req/s, block at 500 req/s
- Connection rate: 500/10s
- Concurrent connections: 500
- Error rate: 100/30s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previous thresholds (200/500 req/10s) were too aggressive — WordPress
login pages with their CSS/JS/image assets can easily burst 30-50
requests per page load, triggering tarpits and blocks on legitimate
users.
New thresholds:
- Request rate: tarpit at 1000/10s (100 req/s), block at 2000/10s (200 req/s)
- Connection rate: 300/10s (was 150)
- Concurrent connections: 200 (was 100)
- Error rate: 50/30s (was 20)
These still catch real floods and scanners while giving normal web
traffic plenty of headroom.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds trusted_ips.list and trusted_ips.map files that exempt specific
IPs from all rate limiting rules. Supports both direct source IP
matching (is_trusted_ip) and proxy-header real IP matching
(is_whitelisted). Files are baked into the image and can be updated
by editing and rebuilding.
Adds phone system IP 172.116.197.166 to the whitelist.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gives more headroom for customers with code that makes frequent
callbacks to itself, while still catching connection floods.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Activate HAProxy's built-in attack prevention to stop floods that cause
the container to become unresponsive:
- Stick table tracks per-IP: conn_cur, conn_rate, http_req_rate, http_err_rate
- Rate limit rules: deny at 50 req/s, tarpit at 20 req/s, connection
rate limit at 60/10s, concurrent connection cap at 100, error rate
tarpit at 20 errors/30s
- Harden timeouts: http-request 300s→30s, connect 120s→10s, client
10m→5m, keep-alive 120s→30s
- HTTP/2 Rapid Reset protection (CVE-2023-44487): stream and glitch limits
- Stats frontend on localhost:8404 for monitoring
- HEALTHCHECK now validates both port 80 (HAProxy) and 8000 (API)
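The shape of the rules, sketched with this commit's numbers over a 10s sampling window (so 50 req/s = 500/10s; table sizing and exact directive placement are illustrative):

```haproxy
backend st_per_ip
    stick-table type ip size 100k expire 10m store conn_cur,conn_rate(10s),http_req_rate(10s),http_err_rate(30s)

frontend http-in
    bind :80
    http-request track-sc0 src table st_per_ip
    http-request deny   if { sc_http_req_rate(0) gt 500 }   # 50 req/s
    http-request tarpit if { sc_http_req_rate(0) gt 200 }   # 20 req/s
    http-request deny   if { sc_conn_rate(0) gt 60 }        # 60 conns/10s
    http-request deny   if { sc_conn_cur(0) gt 100 }        # concurrent cap
    http-request tarpit if { sc_http_err_rate(0) gt 20 }    # 20 errors/30s
```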
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Captures the Host header in HAProxy httplog output so high-connection
alerts can be correlated to specific domains.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add find_certbot_live_dir() helper to locate the most recent certbot live
directory for a domain, handling -NNNN suffixed dirs from repeated requests.
Fix combined cert filename from *.domain.pem to _wildcard_.domain.pem.
Apply the helper across all SSL endpoints (request, renew, verify, download).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Hook scripts are at /haproxy/scripts/ inside the container (per
Dockerfile COPY), not /app/scripts/. Also added logging of certbot
stdout/stderr so failures are visible in haproxy-manager.log.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Support wildcard domains (*.domain.tld) in HAProxy config generation
with exact-match ACLs prioritized over wildcard ACLs. Add DNS-01
challenge endpoints that coordinate with certbot via auth/cleanup
hook scripts for wildcard SSL certificate issuance.
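The ordering rule can be sketched as (hostnames and backend names illustrative):

```haproxy
acl host_app_exact hdr(host) -i app.example.com
acl host_app_wild  hdr(host) -m end .example.com

# exact-match ACLs evaluated first, so a specific domain wins over
# the wildcard that would also cover it:
use_backend app_exact_backend if host_app_exact
use_backend wildcard_backend  if host_app_wild
```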
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Changes:
- Detect SSE via Accept header (text/event-stream) or ?action=stream parameter
- Disable http-server-close to allow long-lived SSE connections
- Enable http-no-delay for immediate event delivery
- Set 1-hour timeouts for SSE support (also fine for normal requests)
- Force Connection: keep-alive for detected SSE requests
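Sketched as frontend config (directive choices follow the list above; whether the Connection header is forced on the request or response side is an assumption here):

```haproxy
frontend http-in
    acl is_sse req.hdr(Accept) -m sub text/event-stream
    acl is_sse url_param(action) -m str stream
    option http-no-delay            # immediate event delivery
    no option http-server-close     # allow long-lived connections
    timeout client 1h
    timeout server 1h
    http-request set-header Connection keep-alive if is_sse
```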
Benefits:
- SSE now works automatically without special backend configuration
- Fixes transcription server display disconnection issues
- Normal HTTP requests still work perfectly
- No need for separate SSE-specific backends
Fixes: Server-Sent Events timing out through HAProxy
Improved certificate renewal and sync scripts to be more resilient:
- Removed 'set -e' to prevent silent failures when individual domains error
- Scripts now continue processing remaining domains even if one fails
- Replaced database queries with direct filesystem scanning of /etc/letsencrypt/live/
- Uses 'find' command to discover all domains with Let's Encrypt certificates
- More reliable as it works even if database is out of sync
Benefits:
- No silent failures - errors are logged but don't stop the entire process
- Works independently of database state
- Simpler and more straightforward
- All domains with certificates get processed regardless of database
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Simplified all certificate renewal scripts to be more straightforward and reliable:
- Scripts now just run certbot renew and copy cert+key files to HAProxy format
- Removed overly complex retry logic and error handling
- Both in-container and host-side scripts work with cron scheduling
Added automatic certbot cleanup when domains are removed:
- When a domain is deleted via API, certbot certificate is also removed
- Prevents renewal errors for domains that no longer exist in HAProxy
- Cleans up both HAProxy combined cert and Let's Encrypt certificate
Script changes:
- renew-certificates.sh: Simplified to 87 lines (from 215)
- sync-certificates.sh: Simplified to 79 lines (from 200+)
- host-renew-certificates.sh: Simplified to 36 lines (from 40)
- All scripts use same pattern: query DB, copy certs, reload HAProxy
Python changes:
- remove_domain() now calls 'certbot delete' to remove certificates
- Prevents orphaned certificates from causing renewal failures
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Write combined certificates to temporary file first
- Verify file is not empty before moving to final location
- Use atomic mv operation to prevent HAProxy from reading partial files
- Add proper cleanup of temporary files on all error paths
- Matches robust patterns from haproxy_manager.py
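The temp-file-then-rename pattern, sketched with illustrative paths:

```shell
certs_dir=$(mktemp -d)
live_dir=$(mktemp -d)
printf '%s\n' '-----CERT-----' > "$live_dir/fullchain.pem"
printf '%s\n' '-----KEY-----'  > "$live_dir/privkey.pem"

tmp=$(mktemp "$certs_dir/.example.com.pem.XXXXXX")
trap 'rm -f "$tmp"' EXIT                      # cleanup on any error path
cat "$live_dir/fullchain.pem" "$live_dir/privkey.pem" > "$tmp"
[ -s "$tmp" ] || exit 1                       # never install an empty file
mv "$tmp" "$certs_dir/example.com.pem"        # rename is atomic on one filesystem
trap - EXIT
```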
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Update map file format to include a value column (`<IP/CIDR> 1`)
- Fix HAProxy template to use map_ip() for CIDR support
- Update runtime map commands to include value
- Document CIDR range blocking in API documentation
- Support blocking entire network ranges (e.g., 192.168.1.0/24)
This allows blocking compromised ISP ranges and other large-scale attacks.
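Sketched (addresses illustrative):

```haproxy
# /etc/haproxy/blocked_ips.map — key plus value column:
#   192.0.2.7        1
#   198.51.100.0/24  1

# template: map_ip() understands CIDR keys, where a plain str map does not:
http-request deny if { src,map_ip(/etc/haproxy/blocked_ips.map) -m found }

# runtime add without reload, via the stats socket:
#   echo "add map /etc/haproxy/blocked_ips.map 198.51.100.0/24 1" | \
#       socat stdio /var/run/haproxy.sock
```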
After certbot renews certificates, the separate fullchain.pem and privkey.pem
files must be combined into a single .pem file for HAProxy. The renewal script
was missing this critical step, causing HAProxy to continue using old certificates.
Changes:
- Add update_combined_certificates() function to renew-certificates.sh
- Query database for all SSL-enabled domains
- Combine Let's Encrypt cert + key files using cat (matches haproxy_manager.py pattern)
- Always update combined certs after renewal, even if certbot says no renewal needed
- Add new sync-certificates.sh script for syncing all existing certificates
- Smart update detection in sync script (only updates when source is newer)
This ensures HAProxy always gets properly formatted certificate files after renewal.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit simplifies the HAProxy configuration by removing automatic
threat detection and blocking rules while preserving essential functionality.
Changes:
- Removed all automatic ACL-based security rules (SQL injection detection,
scanner detection, rate limiting, brute force protection, etc.)
- Removed complex stick-table tracking with 15 GPC counters
- Removed graduated threat response system (tarpit, deny based on threat scores)
- Removed HTTP/2 security tuning parameters specific to threat detection
- Commented out IP header forwarding in hap_backend_basic.tpl
Preserved functionality:
- Real client IP detection from proxy headers (CF-Connecting-IP, X-Real-IP,
X-Forwarded-For) with proper fallback to source IP
- Manual IP blocking via map file (/etc/haproxy/blocked_ips.map)
- Runtime map updates for immediate blocking without reload
- Backend IP forwarding capabilities (available in hap_backend.tpl)
The configuration now focuses on manual IP blocking only, which can be
managed through the API endpoints (/api/blocked-ips).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fixed crontab permissions (600) and ownership for proper cron execution
- Added PATH environment variable to crontab to prevent command not found issues
- Created dedicated renewal script with comprehensive logging and error handling
- Added retry logic (3 attempts) for HAProxy reload with socket health checks
- Implemented host-side renewal script for external cron scheduling via docker exec
- Added crontab configuration examples for various renewal schedules
- Updated README with detailed certificate renewal documentation
This resolves issues where the cron job would not run or hang during execution.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove semicolons from variable initialization in AWK scripts
- Each variable now on separate line to prevent syntax errors
- Fixes "syntax error at or near ," in monitor-attacks.sh and manage-blocked-ips.sh
- Scripts now properly parse HAProxy 3.0.11 threat intelligence data
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove reference to non-existent security_blacklist table
- Use single table tracking with consolidated array-based GPC system
- Remove res.hdr(X-Threat-Level) from log-format as response headers not available in request phase
- Maintains threat intelligence logging with available request-phase data
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Replace compound ACL xmlrpc_abuse with separate conditions
- Use xmlrpc_rate_abuse for rate detection and combine with is_xmlrpc in http-request rule
- Prevents ACL-to-ACL reference which is not supported in HAProxy 3.0.11
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add -m int matcher for all var(txn.threat_score) comparisons
- Fix set-header, tarpit, deny, and set-log-level conditions
- Ensures proper variable type matching for HAProxy 3.0.11
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix tune.h2.fe-max-total-streams parameter name in global config
- Fix stick-table multiline syntax by removing line continuations
- Replace sc0_get_gpc with sc_get_gpc for proper 3.0.11 syntax
- Replace sc-set-gpc with sc-set-gpt for value assignments
- Update ACL definitions to use correct GPT fetch methods
- Simplify threat scoring to avoid unsupported add-var operations
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Escape inner quotes in the certbot renewal cron job to properly
send reload command to HAProxy via socat after certificate renewal.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>