Skeleton for the cpanel-importer Docker container: a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design.

What this ships in v1.0:
- Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag.
- scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json.
- scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found.
- scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry, so every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1.
- scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set.
- scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one.
- .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer.
- tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing.
- README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature.
Local acceptance test results:
- clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, exit 0
- ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr
- compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true

Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributing — cpanel-importer
How to add an auto-cleaner pattern
Auto-cleaners live in scripts/scan-files.php, in the $cleaners
registry at the top of the main flow.
A cleaner has three parts:
$cleaners['short-cleaner-name'] = [
    'class' => 'KNOWN_REMOVABLE', // or 'REMOVABLE_WITH_BACKUP'
    'match' => fn(string $sig): bool => str_contains($sig, 'PHP.Trojan.EvalB64'),
    'clean' => function (string $path): bool {
        // Read $path, transform, write back; return true on success.
        // The file at $path is the LIVE extracted file, so your edit
        // here is what ends up in /host/sanitized/<id>/extracted/.
        // The original has ALREADY been backed up to <path>.original
        // by the orchestrator before this is called.
        //
        // Placeholder body: returning false tells the orchestrator to
        // fall back to quarantining. Replace it with your transform.
        return false;
    },
];
Safety checklist before merging a new cleaner
- Backup is guaranteed. The orchestrator copies the file to <quarantine>/<relpath>.original BEFORE calling clean(). Verify this is still true in scan-files.php if you refactor the dispatch.
- Cleaner is idempotent. Running it twice on the same file must produce the same output the second time as the first.
- Cleaner is conservative. If the file does NOT match your transform exactly, return false (the orchestrator will fall back to quarantining). Never "best-effort" a half-clean.
- Cleaner has a regression test. Add a fixture under tests/fixtures/cleaner-<name>/ with input + expected output, and exercise it from tests/run-tests.sh (or your CI step).
- Cleaner classification is correct. KNOWN_REMOVABLE = the whole pattern is known-safe to strip. REMOVABLE_WITH_BACKUP = legit file with injected lines; we are confident in surgical removal but back up anyway. QUARANTINE_ONLY = no clean variant; don't write a clean().
- Signature match is tight. Prefer str_contains($sig, 'specific-sig-name') over broad regex matches. A false-positive cleaner can corrupt customer files.
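The idempotency item in the checklist above can be checked mechanically. The following is a minimal sketch, not part of the repository: check_idempotent is a hypothetical helper, and it assumes the cleaner can be driven as a command that rewrites the file named by its single argument in place.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_idempotent CLEANER_CMD INPUT_FILE
# CLEANER_CMD is a hypothetical command (or shell function) that rewrites
# the file passed as its only argument in place. The check runs it twice on
# scratch copies and succeeds only if the second pass changes nothing.
check_idempotent() {
    local cleaner="$1"
    local input="$2"
    local workdir
    workdir="$(mktemp -d)"

    # Work on copies so the fixture itself is never modified.
    cp "$input" "$workdir/pass1"
    "$cleaner" "$workdir/pass1" || { rm -rf "$workdir"; return 2; }
    cp "$workdir/pass1" "$workdir/pass2"
    "$cleaner" "$workdir/pass2" || { rm -rf "$workdir"; return 2; }

    local rc=0
    # cmp -s is silent and exits non-zero when the two files differ.
    cmp -s "$workdir/pass1" "$workdir/pass2" || rc=1
    rm -rf "$workdir"
    return "$rc"
}
```

A return code of 1 means the second pass changed the file; 2 means the cleaner command itself failed.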
Manual test loop
docker build -t cpanel-importer:dev .
# Place a known-infected synthetic file under tests/fixtures/cleaner-X/in/
# Run scan-files.php directly against it:
docker run --rm \
    --entrypoint /scripts/scan-files.php \
    -v "$PWD/tests/fixtures/cleaner-X/in:/tmp/extract" \
    -v "$PWD/tests/fixtures/cleaner-X/quarantine:/host/quarantine" \
    cpanel-importer:dev \
    --extract /tmp/extract --quarantine /host/quarantine \
    --report /tmp/r.json --import-id test
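After a manual run like the one above, the cleaned tree can be compared against the fixture's expected output. This is a sketch only: assert_fixture_matches is a hypothetical helper, and the expected/ directory name is an assumption (the checklist only says the fixture holds input plus expected output).

```shell
#!/usr/bin/env bash
set -euo pipefail

# assert_fixture_matches ACTUAL_DIR EXPECTED_DIR
# Succeeds only when both trees contain the same files with the same bytes.
assert_fixture_matches() {
    local actual="$1"
    local expected="$2"
    # diff -r walks both trees; -q reports only which files differ.
    if diff -rq "$expected" "$actual" >&2; then
        return 0
    fi
    echo "cleaner output differs from expected fixture" >&2
    return 1
}

# Hypothetical usage after the docker run above:
#   assert_fixture_matches "$PWD/tests/fixtures/cleaner-X/in" \
#                          "$PWD/tests/fixtures/cleaner-X/expected"
```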
How to add a WordPress content scan signature
Scan checks live in scripts/scan-dbs.php, in wp_content_scan().
Each check should produce a flag dict on hit:
$flags[] = [
    'severity' => 'high', // 'high' refuses the DB (per default threshold N=1)
                          // 'medium' / 'low' flag in the report but allow import
    'code' => 'short_machine_readable_code',
    'details' => 'Human-readable explanation including the matched value(s).',
];
Safety checklist
- Severity reflects confidence. Use high only when a false positive is acceptable for the customer (they can re-import via the "import anyway" UI button). Misjudged severity here translates directly into admin support tickets.
- Check is fast. The whole .sql dump is in memory as a string; prefer preg_match on the raw string or a pre-built map (see extract_wp_options()) over re-parsing the full dump.
- Check is well-tested. Add a fixture under tests/fixtures/wp-scan-<code>/ with a synthetic dump that triggers the flag and one that does not.
- Allow-list awareness. If the check compares a value against the customer's domain list, use domain_in_allowlist($host, $allowedDomains) so subdomain matches work consistently with the rest of the scanner.
- Don't break engine swap. wp_content_scan() runs AFTER the engine swap on the same $rewritten string. Both your check and the engine swap must be tolerant of each other's output.
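To illustrate why the allow-list item asks for one shared helper, here is a shell sketch of a subdomain-aware match. It is not the PHP domain_in_allowlist() from scan-dbs.php, whose exact rules are not shown here. host_in_allowlist is a hypothetical name, and the rule (case-insensitive exact match, or a suffix match at a dot boundary) is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

# host_in_allowlist HOST ALLOWED_DOMAIN...
# Hypothetical shell analogue of a subdomain-aware allow-list match; the
# real PHP helper may apply different rules.
host_in_allowlist() {
    local host
    host="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    shift
    local allowed
    for allowed in "$@"; do
        allowed="$(printf '%s' "$allowed" | tr '[:upper:]' '[:lower:]')"
        # Exact match, or a subdomain. The suffix must begin at a dot so
        # that "evilexample.com" does not pass as "example.com".
        if [[ "$host" == "$allowed" || "$host" == *".$allowed" ]]; then
            return 0
        fi
    done
    return 1
}
```

A naive substring or plain suffix comparison would accept look-alike hosts, which is the kind of inconsistency a single shared helper avoids.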
How to test locally
Build the image
docker build -t cpanel-importer:dev .
Confirm the image is under the budget:
docker images cpanel-importer:dev --format '{{.Size}}'
Target: < 1 GB extracted (spec asks < 600 MB compressed for prod, but local builds typically come in around 700–900 MB extracted including ClamAV signature DBs).
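docker images prints a human-readable size, which is awkward to compare in a script. A sketch of a scripted budget check follows. size_within_budget is a hypothetical helper, and the 1000 MB limit is just the "< 1 GB extracted" target above expressed in decimal megabytes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# size_within_budget BYTES LIMIT_MB
# Succeeds when BYTES is strictly below LIMIT_MB decimal megabytes.
size_within_budget() {
    local bytes="$1"
    local limit_mb="$2"
    (( bytes < limit_mb * 1000 * 1000 ))
}

# Hypothetical usage; docker image inspect reports the size in bytes:
#   bytes="$(docker image inspect --format '{{.Size}}' cpanel-importer:dev)"
#   size_within_budget "$bytes" 1000 || echo "image is over budget" >&2
```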
Build the fixtures
bash tests/build-fixtures.sh
Two tarballs land under tests/fixtures/:
- cpmove-clean.tar.gz: a benign cpmove with a WordPress MyISAM dump.
- cpmove-alfa.tar.gz: same shape PLUS an ALFA-style symlink to /etc.
Run against the clean fixture
mkdir -p /tmp/test-quarantine /tmp/test-sanitized
docker run --rm \
    -e IMPORT_ID=test-clean \
    -e IMPORT_USERNAME=testuser \
    -e IMPORT_BACKUP_FILE=/host/backup/cpmove-clean.tar.gz \
    -e CLAMAV_REFRESH=false \
    -v "$PWD/tests/fixtures/cpmove-clean.tar.gz:/host/backup/cpmove-clean.tar.gz:ro" \
    -v /tmp/test-quarantine:/host/quarantine \
    -v /tmp/test-sanitized:/host/sanitized \
    cpanel-importer:dev
Expect status=completed, MyISAM count > 0, no flags, exit 0.
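The expectations above can be asserted against report.json instead of read by eye. This is a rough sketch: report_field is a hypothetical helper, the report's location on the host is not stated here (see the schema in README.md), and the sed expression only handles simple "key": "value" string pairs. It is not a real JSON parser.

```shell
#!/usr/bin/env bash
set -euo pipefail

# report_field REPORT_PATH KEY
# Prints the first string value found for KEY, or nothing if KEY is absent.
# Assumes plain "key": "value" pairs; nested or escaped values are not handled.
report_field() {
    local report="$1"
    local key="$2"
    sed -n "s/.*\"${key}\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$report" | head -n 1
}

# Hypothetical usage, with REPORT set to wherever the run wrote report.json:
#   [ "$(report_field "$REPORT" status)" = "completed" ] || echo "unexpected status" >&2
```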
Run against the ALFA fixture
docker run --rm \
    -e IMPORT_ID=test-alfa \
    -e IMPORT_USERNAME=testuser \
    -e IMPORT_BACKUP_FILE=/host/backup/cpmove-alfa.tar.gz \
    -e CLAMAV_REFRESH=false \
    -v "$PWD/tests/fixtures/cpmove-alfa.tar.gz:/host/backup/cpmove-alfa.tar.gz:ro" \
    -v /tmp/test-quarantine:/host/quarantine \
    -v /tmp/test-sanitized:/host/sanitized \
    cpanel-importer:dev
Expect non-zero exit, status=failed, failed_stage=extract, and
stderr from inside the container containing
"tarball contains dangerous symlinks; aborting".
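Both halves of that expectation (non-zero exit and the stderr message) can be checked in one step. A minimal sketch, assuming nothing about the container beyond what is stated above; expect_abort is a hypothetical helper that wraps any command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# expect_abort MESSAGE CMD [ARGS...]
# Runs CMD and succeeds only if it exits non-zero AND its stderr contains
# MESSAGE as a literal string.
expect_abort() {
    local message="$1"
    shift
    local errfile
    errfile="$(mktemp)"
    local rc=0
    # Capture stderr to a file; stdout is not needed for this check.
    "$@" 2>"$errfile" >/dev/null || rc=$?
    local result=1
    if (( rc != 0 )) && grep -qF -- "$message" "$errfile"; then
        result=0
    fi
    rm -f "$errfile"
    return "$result"
}

# Hypothetical usage: prefix the ALFA docker run command with
#   expect_abort 'tarball contains dangerous symlinks; aborting' docker run --rm ...
```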
Iterating on PHP / shell scripts
The scripts/ directory is COPYed in late in the Dockerfile, so
edits there only re-trigger the last layer of the build — typical
turnaround is ~5 seconds.
Code style
- Bash scripts: set -euo pipefail, absolute paths only, every external command on its own logical line, comment each non-obvious flag.
- PHP scripts: 4-space indent, single quotes for non-interpolated strings, <?php opener on line 1, no closing ?>.
- All scripts must be idempotent: the worker may be re-run against the same IMPORT_ID on retry; second runs must overwrite the prior report.json cleanly.
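One way to meet the "overwrite the prior report.json cleanly" rule in a bash script is to write to a temporary file and rename it into place, so a retry never leaves a half-written report behind. This is a sketch of that general technique, not a description of what entrypoint.sh currently does; write_report_atomically is a hypothetical helper.

```shell
#!/usr/bin/env bash
set -euo pipefail

# write_report_atomically DEST
# Reads the report body from stdin and replaces DEST in a single rename.
write_report_atomically() {
    local dest="$1"
    local tmp
    # Create the temp file next to DEST so the rename stays on one
    # filesystem, which is what makes mv an atomic replace.
    tmp="$(mktemp "${dest}.XXXXXX")"
    cat > "$tmp"
    mv -f "$tmp" "$dest"
}
```

A reader of the report then sees either the previous run's file or the new one, never a partial write.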
CI
Pushes to trunk build + push the image to
repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer:latest and
...:<sha>. Pushes of a YYYY.MM.NNN tag additionally tag
...:YYYY.MM.NNN. CI runs the smoke test (image starts and
echo ok runs) and php -l / bash -n syntax checks on every script
before pushing.
See .gitea/workflows/build-push.yaml.
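The same syntax checks can be run locally before pushing. The sketch below is not copied from the workflow file. syntax_check_dir is a hypothetical helper, and it skips PHP files when php is not installed on the host.

```shell
#!/usr/bin/env bash
set -euo pipefail

# syntax_check_dir DIR
# Runs bash -n on every *.sh file and php -l on every *.php file under DIR.
# Returns non-zero if any file fails its check.
syntax_check_dir() {
    local dir="$1"
    local failed=0
    local file
    while IFS= read -r -d '' file; do
        case "$file" in
            *.sh)
                # bash -n parses the script without executing it.
                bash -n "$file" 2>/dev/null || { echo "bash -n failed: $file" >&2; failed=1; }
                ;;
            *.php)
                # php may be missing on the host; skip rather than fail.
                if command -v php >/dev/null 2>&1; then
                    php -l "$file" >/dev/null 2>&1 || { echo "php -l failed: $file" >&2; failed=1; }
                fi
                ;;
        esac
    done < <(find "$dir" -type f \( -name '*.sh' -o -name '*.php' \) -print0)
    return "$failed"
}

# Hypothetical usage from the repository root:
#   syntax_check_dir "$PWD/scripts"
```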