fix(supervisor): restart haproxy in-place if it dies while container lives
HAProxy Manager Build and Push / Build-and-Push (push) Successful in 1m11s
haproxy runs as a background child of PID 1 (gunicorn) with nothing watching it after init. If the haproxy master dies mid-life (observed 2026-07-01 on whp01: SIGABRT -> exit 134, reaped by gunicorn and logged as "Worker (pid:22) exited"), the container stays "up", Docker's --restart never fires, and haproxy is down until the external host watchdog full-restarts the whole container minutes later (dropping every connection).

Add an in-container supervisor loop in start-up.sh (Phase 1.5) that runs scripts/ensure_haproxy.py every HAPROXY_SUPERVISOR_INTERVAL seconds (default 15). ensure_haproxy.py calls the existing, idempotent start_haproxy() only when haproxy isn't running (psutil guard), reviving it in place within one interval with no container restart.

Same entrypoint-supervision pattern shipped for cac-litespeed.

Validated locally: killing haproxy -> revived with new PIDs in ~one interval, container stayed healthy, no spurious restarts while healthy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -27,6 +27,22 @@ cron &
 # Phase 1: container init
 python /haproxy/scripts/init.py
 
+# Phase 1.5: in-container haproxy supervisor.
+# haproxy runs as a background child of PID 1 (gunicorn) with NOTHING watching
+# it after init. If the haproxy master dies mid-life (e.g. SIGABRT -> exit 134,
+# segfault), the container stays "up" (gunicorn is PID 1), Docker's --restart
+# policy never fires, and haproxy is down until the external host watchdog
+# full-restarts the whole container minutes later (dropping every connection).
+# This loop revives haproxy in place within one interval. ensure_haproxy.py is
+# idempotent — a cheap no-op whenever haproxy is already running.
+HAPROXY_SUPERVISOR_INTERVAL="${HAPROXY_SUPERVISOR_INTERVAL:-15}"
+(
+    while true; do
+        sleep "${HAPROXY_SUPERVISOR_INTERVAL}"
+        python /haproxy/scripts/ensure_haproxy.py 2>&1 || true
+    done
+) &
+
 # Phase 2: WSGI servers
 # Tunable via env: HAPROXY_MGR_API_WORKERS (default 1), HAPROXY_MGR_API_TIMEOUT
 # (default 120 — API can do slow ACME calls), HAPROXY_MGR_MAX_REQUESTS (default
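The contents of scripts/ensure_haproxy.py are not shown on this page. The commit message describes it as a psutil guard around the existing start_haproxy(), so the following is only a rough sketch of that check-then-start pattern, not the actual script. The names `haproxy_is_running` and `ensure_running` are hypothetical, as is the assumption that the process is named exactly `haproxy`; the real start function is passed in as a callable so the sketch stays self-contained.

```python
"""Hypothetical sketch of a check-then-start guard; not the real ensure_haproxy.py."""
from typing import Callable


def haproxy_is_running() -> bool:
    """Return True if any visible process is named 'haproxy' (assumed name)."""
    import psutil  # third-party; imported lazily so the sketch loads without it

    for proc in psutil.process_iter(["name"]):
        if proc.info["name"] == "haproxy":
            return True
    return False


def ensure_running(is_running: Callable[[], bool], start: Callable[[], None]) -> bool:
    """Call start() only when is_running() reports False.

    Returns True if a start was attempted, False if the call was a no-op.
    """
    if is_running():
        return False
    start()
    return True


if __name__ == "__main__":
    # Demo with stand-ins: the first check sees haproxy down, later ones see it up.
    state = {"up": False, "starts": 0}

    def fake_start() -> None:
        state["up"] = True
        state["starts"] += 1

    for _ in range(3):
        ensure_running(lambda: state["up"], fake_start)
    print(state["starts"])  # prints 1
```

In a real script the call would presumably look like `ensure_running(haproxy_is_running, start_haproxy)`, with start_haproxy imported from wherever the project defines it. Because the guard returns early when haproxy is up, running it every interval from the supervisor loop costs little more than one process-table scan.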