7 Commits

Author SHA1 Message Date
50202538e4 cac-litespeed: supervise OLS in daemon mode so self-restarts don't kill PID 1
All checks were successful
cac-litespeed containers were dying at random intervals and staying 503 until
manually restarted. Root-caused on whp02 (alsacorp, 2026-06-06): the LiteSpeed
Cache / QUIC.cloud integration refreshes the QUIC.cloud IP allowlist on a
schedule and, when it changes, sends SIGUSR1 → "request a graceful server
restart". The entrypoint ran `openlitespeed -n & wait "$OLS_PID"`, so when the
OLD main PID exited after the zero-downtime handoff, `wait` returned, PID 1
(bash) exited, and the whole container went down. The exit was clean (code 0),
so even a restart policy wouldn't reliably catch it — HAProxy just served 503
until someone ran `docker start`.

Replace the `-n` foreground+wait model with a daemon-mode supervisor: start OLS
via `lswsctrl start` (its native model, where it owns the SIGUSR1 handoff and
keeps listeners bound across generations) and have PID 1 follow `lswsctrl
status`. A graceful self-restart is now invisible here (verified zero-downtime);
PID 1 only relaunches on a genuine crash (no live main), with a 5-in-60s
crash-loop cap that bails out to Docker's restart policy / the site monitor.
SIGTERM still drains and exits cleanly for docker stop / recreate.
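
A minimal sketch of the supervisor shape described above, not the actual
entrypoint. Assumptions: `lswsctrl` lives at /usr/local/lsws/bin/lswsctrl,
`lswsctrl status` prints "is running" while a main process is alive, the poll
interval is 2s, and the fifth crash inside the 60s window is the one that
bails. All function and variable names are hypothetical.

```shell
# Hypothetical daemon-mode supervisor for PID 1 (names and poll interval assumed).
LSWSCTRL=${LSWSCTRL:-/usr/local/lsws/bin/lswsctrl}
MAX_CRASHES=5     # "5-in-60s" cap from the commit message
WINDOW_SECS=60
POLL_SECS=2       # assumed poll interval
crash_times=""    # space-separated epoch seconds of recent crashes

# record_crash NOW: remember a crash at epoch NOW and forget crashes older
# than WINDOW_SECS. Returns 0 while a relaunch is allowed, 1 once the cap
# is reached (assumed: the fifth crash inside the window).
record_crash() {
  local now=$1 t kept="" count=0
  for t in $crash_times; do
    if [ $(( now - t )) -lt "$WINDOW_SECS" ]; then
      kept="$kept $t"
      count=$(( count + 1 ))
    fi
  done
  crash_times="$kept $now"
  [ $(( count + 1 )) -lt "$MAX_CRASHES" ]
}

# ols_running: assumed liveness check based on the status output text.
ols_running() {
  "$LSWSCTRL" status 2>/dev/null | grep -q "is running"
}

# supervise: start OLS as a daemon and follow its status instead of waiting
# on one PID, so a graceful handoff to a new main process goes unnoticed.
supervise() {
  trap '"$LSWSCTRL" stop; exit 0' TERM INT   # docker stop: drain, exit clean
  "$LSWSCTRL" start
  while true; do
    sleep "$POLL_SECS" & wait $!             # wait so the trap fires promptly
    if ! ols_running; then
      # Genuine crash: relaunch unless we are crash-looping.
      record_crash "$(date +%s)" || exit 1   # hand over to the restart policy
      "$LSWSCTRL" start
    fi
  done
}
```

In this sketch the entrypoint would end by calling `supervise`; the non-zero
exit on a crash loop is a choice made here so that a restart policy can see it.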

Verified on a scratch php85 container: survives `lswsctrl restart`, survives a
raw SIGUSR1 to the main (the exact QUIC.cloud path that used to kill it),
relaunches after `kill -9` of the main, and stops cleanly in ~6s on docker stop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 19:15:25 -07:00
cfdaae116a tune(litespeed): bump opcache 32→64 MB / 4000→8000 files + add per-site override
All checks were successful
32M/4000 was too aggressive for heavy WP+Divi+WC sites: 3000+4000 unique
PHP files each blow through max_accelerated_files, causing constant
eviction + recompilation thrash. Manifested 2026-06-03 as ~40% sustained
CPU on alphaoneaminos and 5378 oom_kills/9h on brain-jar.

64M/8000 fits Divi + WC + WP core bytecode without eviction. N lsphp ×
64 MB ≈ 512 MiB shmem worst case (N = 8), still under the per-instance
setUIDMode fan-out from the original 128M problem (which was 1+ GiB).

Per-site override (OPCACHE_MEMORY_MB / OPCACHE_MAX_FILES env vars) lets the
panel push down for low-traffic sites or up for outliers without rebuilding
the image. WHP panel UI ships in a follow-up commit.
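
A rough sketch of how the override could be rendered at container start. The
env var names and the 64 MB / 8000 defaults come from this commit; the
function name and the integer validation are assumptions, and where the
output gets written is left to the caller.

```shell
# render_opcache_ini: print opcache settings, honouring per-site overrides.
# Hypothetical helper; falls back to the image defaults (64 MB / 8000 files)
# when an override is unset, empty, zero, or not a plain integer.
render_opcache_ini() {
  local mem=${OPCACHE_MEMORY_MB:-64}
  local files=${OPCACHE_MAX_FILES:-8000}
  case $mem in ''|*[!0-9]*|0) mem=64 ;; esac
  case $files in ''|*[!0-9]*|0) files=8000 ;; esac
  # opcache.memory_consumption is expressed in megabytes.
  printf 'opcache.memory_consumption=%s\n' "$mem"
  printf 'opcache.max_accelerated_files=%s\n' "$files"
}
```

For example, `OPCACHE_MEMORY_MB=128 render_opcache_ini` prints a 128 MB
setting while leaving the file limit at 8000.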

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 06:21:37 -07:00
87f154cdc8 refactor(litespeed): drop setUIDMode for shared lsphp + cut opcache 128→32M
All checks were successful
OLS runs as the customer user end-to-end (server-level user/group set by
create-vhost-litespeed.sh), so lsphp inherits that uid without per-request
suEXEC. Eliminates the per-httpd-worker lsphp instance fan-out — one shared
lsphp parent now serves all httpd workers via the shared socket.

Combined with opcache.memory_consumption 128→32M, shmem measured on brain-jar
dropped from ~880 MiB to 32 MiB and memory.current from ~1.1 GiB to 67 MiB,
against the 1.5 GiB cap. No new oom_kills since the change.

Safe because cac-litespeed is one-customer-per-container — the container
boundary is the privsep boundary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 20:06:56 -07:00
03cca745f7 feat(litespeed): wire up dynamic LSAPI tuning + idle reduction
All checks were successful
Two correctness fixes and a tuning improvement.

CORRECTNESS:
1. Strip the stock 'extProcessor lsphp' from httpd_config.conf before
   appending ours. Previously the stock block (hard-coded
   PHP_LSAPI_CHILDREN=10 regardless of container memory) always won
   because our APPEND fragment didn't include an extProcessor block.
   detect-memory-litespeed.sh was computing LSAPI_CHILDREN but never
   plumbing it anywhere — silent dead code.

2. Bump LSPHP_WORKER_ESTIMATE_MB from 96 to 115 per the 2026-06-02
   memory-sizing finding (vantagehealth OOM-spawn loop). Each lsphp
   worker is accounted ~115 MB of shmem-rss, so 115 MB matches the
   real per-worker baseline.
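
One way the stock block from fix 1 could be stripped, as a sketch only. It
assumes the block opens with a line like `extprocessor lsphp {`, closes with
a line starting with `}`, and contains no nested braces; the function name is
hypothetical and the real script may do this differently.

```shell
# strip_stock_lsphp: copy stdin to stdout, dropping the stock
# "extprocessor lsphp { ... }" block (matched case-insensitively).
strip_stock_lsphp() {
  awk '
    tolower($0) ~ /^[ \t]*extprocessor[ \t]+lsphp[ \t]*[{]/ { skipping = 1; next }
    skipping && /^[ \t]*[}]/ { skipping = 0; next }
    !skipping { print }
  '
}
```

The filtered output would be written to a temporary file, the custom
fragment (with its own extProcessor block) appended, and the result moved
over httpd_config.conf.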

TUNING (idle reduction, the original ask):
- LSAPI_MAX_IDLE_CHILDREN=2  (was CHILDREN/2 = 5 default)
- LSAPI_MAX_IDLE=60s         (was 300s default)
- PHP_LSAPI_MAX_REQUESTS=500 (recycle workers, prevents bloat)
- memSoftLimit=1024M / memHardLimit=1500M per worker (RLIMIT_AS;
  catches runaway scripts at the worker level, cgroup still backstops
  the container)

Effective LSAPI_CHILDREN per container:
  2 GiB → ~17 (was 10 — brain-jar was saturating)
  1 GiB → ~8
  512 MiB → ~3 (cap-marginal per the memory note; bump container if
                site grows)
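
The figures above can be approximated with simple budget arithmetic. This is
a sketch, not detect-memory-litespeed.sh itself: the 64 MB reserve for OLS is
an assumption picked only because it reproduces the three numbers listed, and
the function name is hypothetical.

```shell
LSPHP_WORKER_ESTIMATE_MB=115   # per-worker estimate from this commit
RESERVE_MB=64                  # assumed headroom for OLS itself; not from the source

# lsapi_children CONTAINER_MB: number of lsphp workers that fit in the
# container memory after the reserve, never less than 1.
lsapi_children() {
  local budget=$(( $1 - RESERVE_MB ))
  local n=$(( budget / LSPHP_WORKER_ESTIMATE_MB ))
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}
```

With these assumptions, `lsapi_children 2048` gives 17, `lsapi_children 1024`
gives 8, and `lsapi_children 512` gives 3.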

Dropped LSAPI_MEM_SOFT/HARD computation in detect-memory: AVAILABLE/CHILDREN
was conflating VSZ with RSS-budget arithmetic and would have killed
legitimate workers. The 1024/1500 hard-coded values in the template
comfortably fit typical Divi/WooCommerce VSZ (280-365 MB).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 16:36:25 -07:00
d1c3cfadc0 feat(litespeed): make log paths drop-in compatible with cac:phpNN
All checks were successful
OLS now writes:
  access -> /home/$user/logs/apache/access_log
  error  -> /home/$user/logs/apache/error_log
  PHP    -> /home/$user/logs/php-fpm/error.log

Matches the cac:phpNN bundled image convention exactly, so existing WHP
log-gathering code (whp-traffic-aggregator.php, process-log-review.php)
works for migrated sites without any panel-side changes. Customer-facing
paths are stable across migrations — "where do I find my access log?"
gets the same answer regardless of image family.

Server-level OLS logs (/usr/local/lsws/logs/) are unchanged — those are
internal diagnostics, not customer-relevant.

PHP error_log is set via a runtime-rendered tiny ini in lsphp's scan dir
(can't be in the static lsphp-overrides.ini because the path is
per-customer).
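
A sketch of what the runtime-rendered ini might look like. The log path is
the one listed in this commit; the function name, the file name, and passing
the scan dir as an argument are assumptions made for illustration.

```shell
# render_php_error_log_ini USER SCAN_DIR: write a per-customer ini fragment
# that points PHP's error_log at the customer's log directory.
# The file name below is hypothetical.
render_php_error_log_ini() {
  local user=$1 scan_dir=$2
  local ini="$scan_dir/99-customer-error-log.ini"
  printf 'log_errors = On\nerror_log = /home/%s/logs/php-fpm/error.log\n' \
    "$user" > "$ini"
}
```

The entrypoint would call this once per start with the customer user and
lsphp's scan dir, before OLS is launched.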

Customers on the four whp01 migrations (alphaone, peptides, shadowdao,
brain-jar) need a container recreate after CI publishes the new tags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 10:53:44 -07:00
80fa06592b perf(litespeed): defer mariadb-server + memcached install to DEV runtime
All checks were successful
Drops mariadb-server and memcached from the build-time apt install in
Dockerfile.litespeed; they now install at entrypoint time only when
environment=DEV, guarded by 'command -v mysqld' so container restarts skip
the apt step.
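
The guard can be sketched roughly as follows. This assumes the setting is
exposed as a shell variable named `environment`; the function names and the
exact apt flags are illustrative, not taken from the entrypoint.

```shell
# needs_dev_install [PROBE]: succeed only when this is a DEV container and
# the probe binary (mysqld by default) is not installed yet.
needs_dev_install() {
  local probe=${1:-mysqld}
  [ "${environment:-}" = "DEV" ] || return 1
  if command -v "$probe" >/dev/null 2>&1; then return 1; fi
  return 0
}

# install_dev_services: one-time apt step; a no-op on PROD and on restarts
# of a DEV container where mysqld is already present.
install_dev_services() {
  needs_dev_install || return 0
  apt-get update
  apt-get install -y --no-install-recommends mariadb-server memcached
}
```

Probing for the binary rather than keeping a marker file means the check
stays correct even if the container filesystem is recreated.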

Mirrors the cac:phpNN pattern. The mysql CLI client is already in the
litespeedtech/openlitespeed base, so wp-cli + DEV creds-bootstrap still work
without a build-time client install.

Measured (php83 / OLS 1.8.4):
  PROD image: 1.64 GB -> 1.20 GB (~440 MB savings)
  PROD first-200 boot: unchanged at ~1.5s
  DEV first boot:  ~51s (apt install cost — one-time per container)
  DEV second boot: ~6s (cache hit, same as PROD)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 08:26:19 -07:00
55c28a0c11 Add cac-litespeed image family (OpenLiteSpeed, native LSAPI)
New paid-tier per-customer image built on litespeedtech/openlitespeed:1.8.4-lsphpNN.
Matrix: 8.1-8.5. Native LSAPI suexec to customer uid, server-level LSCache,
all WP/WooCommerce extensions (memcached, redis, imagick, mbstring, etc.) baked in.

Files:
- Dockerfile.litespeed (FROM prebuilt LiteSpeed base, layers wp-cli/composer/mariadb)
- configs/litespeed/{httpd_config,site-template,lsphp-overrides}.tpl
- scripts/{entrypoint,create-vhost,detect-memory}-litespeed.sh + install-lscache-wp.sh

CI: new Build-LiteSpeed-Images matrix job. OLS_VERSION pinned to 1.8.4 (only
release with prebuilt images for all 5 PHP versions on Docker Hub).

Spec: whp/docs/superpowers/specs/2026-06-01-cac-litespeed-design.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 07:32:47 -07:00