Surfaced on whp02 alsacorp retry: scan-dbs.php hit PHP fatal at line 86
"Allowed memory size of 134217728 bytes exhausted (tried to allocate
5488440384 bytes)" while loading alsacorp_alsa1.sql via
file_get_contents. The dump is multi-GB (typical for WooCommerce stores
with media metadata); the 128MB-default PHP memory_limit + the 2GB
cgroup on the container both fail well below the actual file size.
Rewrote the per-DB pass as a streaming loop over 4MB chunks:
- engine_swap_chunk: same `\bENGINE=MyISAM\b` regex, mutates a
per-DB counter via reference so the per-chunk counts accumulate
into a single myisam_to_innodb total.
- is_wp_chunk_scan: OR-folds the four WP fingerprint regexes
(CREATE TABLE *_options, *_posts, *_users + the
'siteurl|home|template|stylesheet' sentinel) into a state dict;
any chunk that flips a flag from false to true keeps it true for
the rest of the file. Caller AND-folds at finalization.
- wp_options_chunk_scan: extracts (option_name, option_value)
tuples from INSERT INTO options statements as they pass through.
First occurrence wins so we keep the live value, not later
duplicates.
- wp_content_scan_from_values: extracted the finalization logic
from the legacy wp_content_scan() so the streaming path can
submit a pre-built option-values map instead of re-scanning the
full string.
Per-chunk carry: a 128-byte buffer at the end of each chunk is held
back and prepended to the next chunk so a pattern split across a
chunk boundary (e.g. "ENGINE=" at byte 4194302, "MyISAM" at byte
4194304) is still seen by the regex. 128 bytes is generous for our
patterns (longest is "ENGINE = MyISAM" with whitespace flex).
Output goes to a `<db>.sql.tmp` first, then renamed to
`<db>.sql{,.flagged}` once we know the flag verdict — avoids a
partial file if the scan dies mid-stream.
Legacy `engine_swap`, `is_wordpress_dump`, and the unused
`wp_content_scan`+`extract_wp_options` are kept in place for the
small-file path (none of them currently called from the new
streaming loop, but they're public-ish helpers the next dbsanitize
revision could reuse).
Resident memory now bounded to <16 MB per DB regardless of input
file size — should handle the 30 GB+ outliers we'll inevitably see.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Skeleton for the cpanel-importer Docker container — a one-shot
sandbox the WHP panel invokes BEFORE extracting a customer cpmove
tarball. See cpanel-import-container-spec.md (in /workspace/) for the
full design.
What this ships in v1.0:
- Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 +
SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as
UID 999 (whp-import) via the panel-side --user 999:999 flag.
- scripts/entrypoint.sh: validates env, runs (optional) freshclam,
drives extract -> scan-files -> scan-dbs -> rsync -> report.json.
- scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract
symlink scan ported standalone from
web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29
whp02 destruction-vector fix). Aborts with exit 3 before tar runs
if any DANGEROUS symlink is found.
- scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0
ships with an empty cleaner registry — every hit is
QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1.
- scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always
applied), WordPress identification, and ONE WP content scan check
(siteurl_external_domain). v1.1 will grow the check set.
- scripts/lib/safety-net.php: container-narrow open_basedir
allow-list, much tighter than the panel-side one.
- .gitea/workflows/build-push.yaml: builds + smoke-tests +
PHP-syntax-checks + bash-syntax-checks before pushing to
repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer.
- tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP
dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc
vector) for local end-to-end testing.
- README.md / CONTRIBUTING.md: docker-run invocation, bind-mount
catalog, report.json schema, how to add a cleaner pattern or a WP
scan signature.
Local acceptance test results:
- clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0
- ALFA fixture -> exit 1, status=failed, failed_stage=extract,
"tarball contains dangerous symlinks; aborting" on stderr
- compromised-siteurl fixture -> imported_into_new_server=false,
.flagged file written, summary_for_panel.show_alert=true
Image size: 197 MB compressed (gzipped docker save), ~397 MB unique
layers extracted. Well under the spec's 600 MB compressed / 1.2 GB
extracted budget.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>