Files
cloud-apache-container/scripts
jknapp 50202538e4
All checks were successful
Cloud Apache Container / Build-and-Push (74) (push) Successful in 1m24s
Cloud Apache Container / Build-and-Push (80) (push) Successful in 1m22s
Cloud Apache Container / Build-and-Push (81) (push) Successful in 1m17s
Cloud Apache Container / Build-and-Push (82) (push) Successful in 1m25s
Cloud Apache Container / Build-and-Push (83) (push) Successful in 1m21s
Cloud Apache Container / Build-and-Push (84) (push) Successful in 1m15s
Cloud Apache Container / Build-and-Push (85) (push) Successful in 1m18s
Cloud Apache Container / Build-FPM-Images (74) (push) Successful in 1m23s
Cloud Apache Container / Build-FPM-Images (80) (push) Successful in 1m17s
Cloud Apache Container / Build-FPM-Images (81) (push) Successful in 1m16s
Cloud Apache Container / Build-FPM-Images (82) (push) Successful in 1m15s
Cloud Apache Container / Build-FPM-Images (83) (push) Successful in 1m33s
Cloud Apache Container / Build-FPM-Images (84) (push) Successful in 1m19s
Cloud Apache Container / Build-FPM-Images (85) (push) Successful in 1m24s
Cloud Apache Container / Build-LiteSpeed-Images (81) (push) Successful in 30s
Cloud Apache Container / Build-LiteSpeed-Images (82) (push) Successful in 31s
Cloud Apache Container / Build-LiteSpeed-Images (83) (push) Successful in 29s
Cloud Apache Container / Build-LiteSpeed-Images (84) (push) Successful in 31s
Cloud Apache Container / Build-LiteSpeed-Images (85) (push) Successful in 32s
Cloud Apache Container / Build-Shared-httpd (push) Successful in 28s
cac-litespeed: supervise OLS in daemon mode so self-restarts don't kill PID 1
cac-litespeed containers were dying at random intervals and staying 503 until
manually restarted. Root-caused on whp02 (alsacorp, 2026-06-06): the LiteSpeed
Cache / QUIC.cloud integration refreshes the QUIC.cloud IP allowlist on a
schedule and, when it changes, sends SIGUSR1 → "request a graceful server
restart". The entrypoint ran `openlitespeed -n & wait "$OLS_PID"`, so when the
OLD main PID exited after the zero-downtime handoff, `wait` returned, PID 1
(bash) exited, and the whole container went down. The exit was clean (code 0),
so even a restart policy wouldn't reliably catch it — HAProxy just served 503
until someone ran `docker start`.

Replace the `-n` foreground+wait model with a daemon-mode supervisor: start OLS
via `lswsctrl start` (its native model, where it owns the SIGUSR1 handoff and
keeps listeners bound across generations) and have PID 1 follow `lswsctrl
status`. A graceful self-restart is now invisible here (verified zero-downtime);
PID 1 only relaunches on a genuine crash (no live main), with a 5-in-60s
crash-loop cap that bails out to Docker's restart policy / the site monitor.
SIGTERM still drains and exits cleanly for docker stop / recreate.

Verified on a scratch php85 container: survives `lswsctrl restart`, survives a
raw SIGUSR1 to the main (the exact QUIC.cloud path that used to kill it),
relaunches after `kill -9` of the main, and stops cleanly in ~6s on docker stop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 19:15:25 -07:00
..
2026-02-08 07:57:04 -08:00