cpanel-importer/scripts/lib/scan-symlinks.php

202 lines
7.9 KiB
PHP
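The classification that scan-symlinks.php applies to each `tar -tvf` symlink entry can be condensed into a few pure functions, which makes the rules easy to test in isolation. This is a hedged sketch, not code from the repository. The function names parseTarSymlinkLine(), normalizeAbsolute() and classifyTarget(), the `alice` account and the sample listing line are all hypothetical. The lexical collapsing of `.`, `..` and repeated slashes is also an assumption of this sketch. The sketch returns null for relative and user-internal targets (which the script skips). It returns DANGEROUS for `/` or anything under /etc, /root, /boot, /proc, /sys or /dev, and UNCERTAIN otherwise.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers mirroring scan-symlinks.php's rules; not repository code.
const DANGEROUS_PREFIXES = ['/etc', '/root', '/boot', '/proc', '/sys', '/dev'];

/** Return [archivePath, target] for a GNU `tar -tv` symlink line, else null. */
function parseTarSymlinkLine(string $line): ?array
{
    if ($line === '' || $line[0] !== 'l') {
        return null; // only symlink entries start with "l"
    }
    $arrow = strpos($line, ' -> ');
    if ($arrow === false) {
        return null;
    }
    // Columns: mode, owner/group, size, date, time, path (path may contain spaces).
    $parts = preg_split('/\s+/', substr($line, 0, $arrow), 6);
    if ($parts === false || count($parts) < 6) {
        return null;
    }
    return [$parts[5], rtrim(substr($line, $arrow + 4), "\r\n")];
}

/** Collapse "//", "." and ".." lexically (an assumption of this sketch). */
function normalizeAbsolute(string $path): string
{
    $segments = [];
    foreach (explode('/', $path) as $seg) {
        if ($seg === '' || $seg === '.') {
            continue;
        }
        if ($seg === '..') {
            array_pop($segments); // ".." above the root stays at the root
            continue;
        }
        $segments[] = $seg;
    }
    return '/' . implode('/', $segments);
}

/** DANGEROUS, UNCERTAIN, or null for relative / user-internal targets. */
function classifyTarget(string $target, array $usernames): ?string
{
    if ($target === '' || $target[0] !== '/') {
        return null; // relative targets are not scanned
    }
    $resolved = normalizeAbsolute($target);
    foreach ($usernames as $u) {
        if ($u === '') {
            continue;
        }
        $home = '/home/' . $u;
        if ($resolved === $home || str_starts_with($resolved, $home . '/')
            || preg_match('#^/home\d+/' . preg_quote($u, '#') . '(/|$)#', $resolved) === 1) {
            return null; // points into the account's own home tree
        }
    }
    if ($resolved === '/') {
        return 'DANGEROUS';
    }
    foreach (DANGEROUS_PREFIXES as $p) {
        if ($resolved === $p || str_starts_with($resolved, $p . '/')) {
            return 'DANGEROUS';
        }
    }
    return 'UNCERTAIN';
}

$line = "lrwxrwxrwx alice/alice 0 2026-05-30 19:56 cpmove-alice/homedir/alfasymlink/root -> /\n";
[$path, $target] = parseTarSymlinkLine($line);
echo $path, ' => ', classifyTarget($target, ['alice']), "\n";
// prints: cpmove-alice/homedir/alfasymlink/root => DANGEROUS
```

Keeping the split limit at 6 means archive paths that contain spaces survive intact. Targets are only compared as strings, so nothing on the scanning host's filesystem is consulted.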

Initial bootstrap: cpanel-importer sanitization sandbox

Skeleton for the cpanel-importer Docker container, a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design.

What this ships in v1.0:
- Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag.
- scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json.
- scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found.
- scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry: every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1.
- scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set.
- scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one.
- .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer.
- tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing.
- README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature.

Local acceptance test results:
- clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0
- ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr
- compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true

Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00
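On the consuming side, the sandbox's extract step reads the --report JSON and collects the archive_path of every DANGEROUS finding into a newline-delimited list. It then passes that list to `tar --exclude-from=`. The real extract.sh is a shell script. The PHP below is only a hypothetical illustration of that transformation. The function name buildExcludeList() and the sample report contents (paths, tarball location, the `alice` account) are made up for the example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build the newline-delimited list handed to
 * `tar --exclude-from=` from a decoded scan-symlinks report. Only
 * DANGEROUS findings are excluded; UNCERTAIN ones are still extracted.
 */
function buildExcludeList(array $report): string
{
    $paths = [];
    foreach ($report['findings'] ?? [] as $finding) {
        if (($finding['type'] ?? '') === 'DANGEROUS') {
            $paths[] = (string) $finding['archive_path'];
        }
    }
    // One archive path per line with a trailing newline; empty if nothing to strip.
    return $paths === [] ? '' : implode("\n", $paths) . "\n";
}

// Made-up report in the shape scan-symlinks.php writes.
$json = <<<'JSON'
{
  "tarball": "/import/cpmove-alice.tar.gz",
  "total_findings": 2,
  "dangerous_count": 1,
  "uncertain_count": 1,
  "findings": [
    {"type": "DANGEROUS", "archive_path": "cpmove-alice/homedir/alfasymlink/root", "target": "/", "reason": "absolute target is root /"},
    {"type": "UNCERTAIN", "archive_path": "cpmove-alice/homedir/access-logs", "target": "/usr/local/apache/domlogs/alice", "reason": "absolute target outside user tree and not on dangerous-prefix list"}
  ]
}
JSON;

$report = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
echo buildExcludeList($report);
// prints: cpmove-alice/homedir/alfasymlink/root
```

GNU tar treats each --exclude-from line as a glob pattern. For exclusions it matches unanchored by default (see --no-wildcards and --anchored). A production consumer may therefore need to account for archive paths that contain `*`, `?` or `[`.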
<?php
/**
 * scan-symlinks.php: standalone port of
 * CpanelBackupImporter::scanTarballForDangerousSymlinks().
 *
 * This started as the same classification logic that ships in the WHP panel
 * today (web-files/libs/CpanelBackupImporter.php, ~line 2438); the
 * DANGEROUS prefix list has since been narrowed for the container (see the
 * threat-model comment below). Lifted into a standalone CLI so the
 * container can run it as an independent pre-extract gate without dragging
 * in the rest of the importer.
 *
 * History (2026-05-31 11:13:57 -07:00):
 * "sanitize-dont-refuse: strip dangerous symlinks via tar --exclude".
 * Shifts the sandbox's symlink handling from "refuse the whole tarball" to
 * "drop the dangerous entries from extraction and record them as
 * quarantine actions". This is what sandbox mode is supposed to do: make
 * malicious cpmoves safe to import rather than gate-keeping them. Three
 * coordinated changes:
 *   1. scan-symlinks.php: exit 0 even when DANGEROUS findings exist. The
 *      JSON report is the source of truth; the caller decides what to do
 *      with it. Usage/IO errors still exit 2. STDERR still names each
 *      finding (now "STRIP X -> Y" instead of "refusing tarball") so the
 *      streamed [container] log on the panel side surfaces them.
 *   2. extract.sh: reads the scan-symlinks report, builds a
 *      newline-delimited exclude list of DANGEROUS archive_paths, and
 *      passes it to `tar --exclude-from=`. The stripped entries never reach
 *      the filesystem; tar skips them silently. Also writes a small JSON
 *      sidecar at $EXTRACT_DIR/.cpanel-importer-stripped-symlinks.json
 *      describing each strip-action so the merge step can surface them in
 *      report.json without re-parsing scan-symlinks output.
 *   3. entrypoint.sh write_report: reads the sidecar, prepends each
 *      stripped_dangerous_symlink action to the actions[] list, bumps
 *      files_quarantined by the strip-count, and rewrites
 *      summary_for_panel.alert_message to call them out distinctly:
 *      "N dangerous symlink(s) stripped during extract; M files
 *      quarantined; K cleaned in place. Customer site may have been
 *      compromised at the source — recommend review."
 * Result on darkside: instead of the import failing on the ALFA
 * alfasymlink/root entry, that entry is silently skipped during extract,
 * recorded as `stripped_dangerous_symlink path=... target=/
 * reason=absolute target is root /`, and the rest of the tarball extracts
 * normally. Subsequent ClamAV scan + DB sanitization run to completion;
 * panel sees a verdict-completed import with the stripped symlinks visible
 * in the Sanitization Sandbox panel on the results page.
 * Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
 *
 * Exit codes:
 *   0  scan completed successfully (with or without DANGEROUS findings).
 *      Findings are recorded in --report; extract.sh inspects the report
 *      to decide which entries to --exclude from `tar -xzf`. Sandbox-mode
 *      posture is "sanitize, don't refuse": the container drops the
 *      dangerous symlinks from extraction and records the actions in
 *      report.json instead of aborting the whole import.
 *   2  usage / I/O error (couldn't read tarball, couldn't write report).
 *
 * Always writes a JSON report to --report describing every absolute-target
 * symlink seen (other than links into the account's own home tree, which
 * are skipped) and its classification verdict.
 *
 * SECURITY NOTE: besides the narrower DANGEROUS prefix list (see the
 * threat-model comment below), this differs from the panel implementation
 * in one way. The panel uses file_exists($target) on the *host* to decide
 * whether a target under a dangerous prefix is BENIGN_DANGLING vs
 * DANGEROUS. We are running INSIDE the container, so /etc and /usr DO
 * exist (they're the container's own), and file_exists() here would only
 * consult the container's filesystem, never the destination host. At the
 * same time, `--read-only --tmpfs /tmp` plus the worker running as UID 999
 * means even DANGEROUS targets cannot reach the host.
 *
 * We therefore treat any absolute-target symlink under a dangerous prefix
 * as DANGEROUS regardless of file_exists(). This is a stricter check than
 * the panel's, because in the container we can safely strip such entries
 * from extraction instead of trying to prove them harmless.
 */
require __DIR__ . '/safety-net.php';
$opts = getopt('', ['tarball:', 'username:', 'report:']);
if (!isset($opts['tarball']) || !isset($opts['report'])) {
    fwrite(STDERR, "usage: scan-symlinks.php --tarball <path> --report <out.json> [--username <u>]\n");
    exit(2);
}
$tarPath = $opts['tarball'];
$reportPath = $opts['report'];
$username = $opts['username'] ?? '';
if (!is_file($tarPath) || !is_readable($tarPath)) {
    fwrite(STDERR, "scan-symlinks: not a readable file: $tarPath\n");
    exit(2);
}
// History (2026-05-31 10:07:23 -07:00):
// "scan-symlinks: tighten DANGEROUS prefix list to actual destruction class".
// Previous version of scan-symlinks.php was a verbatim port of the panel's
// scanTarballForDangerousSymlinks(), which flagged every symlink whose
// target sits under /etc, /usr, /bin, /sbin, /lib, /lib64, /var/lib,
// /var/log, /var/cache, or /var/spool. That's the right posture for the
// panel's pre-extract scan in DIRECT mode (refuse before extract), but it
// makes the container REFUSE every cpmove that comes from a real cPanel
// source server, including totally clean ones. Standard cPanel accounts
// ship with stock symlinks like:
//   homedir/access-logs -> /usr/local/apache/domlogs/<user>
//   homedir/var/cpanel/styled/current_style -> /usr/local/cpanel/base/frontend/...
//   homedir/.cpanel/email -> /usr/local/cpanel/...
//   homedir/etc -> /var/cpanel/userhomes/<user>/etc
// Every customer tarball has 5-20 of these. Treating them as DANGEROUS
// made the container abort with verdict=refused before extract.sh ever
// ran. Surfaced on darkside import to whp02: scan-symlinks found
// homedir/access-logs (a textbook cPanel symlink) and the import bombed.
// The real destruction class (what ALFA TEaM Shell uses, what we saw brick
// whp02 in May) is symlinks whose target is the exact filesystem root or
// under one of the genuinely catastrophic system trees that either escape
// the customer account or clobber boot/config/proc state:
//   /       exact root (the classic alfasymlink/root)
//   /etc    config tampering, /etc/shadow exfil
//   /root   root home dir
//   /boot   bootloader / kernel
//   /proc   process info / kernel knobs
//   /sys    sysfs
//   /dev    device nodes
// Everything else (notably /usr, /var) becomes UNCERTAIN: reported in the
// JSON output but doesn't refuse the tarball. With --cap-drop=ALL
// --read-only --network none --user 999, a /usr-targeting symlink in the
// container's sandbox can at worst dangle on extract; it can't touch the
// host.
// Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
//
// Threat model: an "ALFA TEaM Shell"-style payload links into a path that,
// when a recursive walker follows it (or when something writes through it),
// either ESCAPES the customer's account on the destination server OR
// CLOBBERS critical system state. The classification needs to be tight
// enough to catch those — and loose enough to NOT flag the dozens of
// standard cPanel-internal symlinks every customer tarball contains
// (access-logs -> /usr/local/apache/domlogs/<user>, var/cpanel/styled/...
// -> /usr/local/cpanel/base/frontend/..., mailman, etc.).
//
// Earlier versions of this file used the panel's broader list (everything
// under /etc, /usr, /bin, /sbin, /lib, /lib64, /var/lib, /var/log,
// /var/cache, /var/spool) which made the container REFUSE every cpmove
// from a real cPanel source server, including clean ones. The panel
// could afford to be permissive in its UNCERTAIN handling because it never
// actually follows the links (removeDirectory now shells out to rm instead
// of walking the tree recursively in PHP). The container is supposed to
// QUARANTINE the truly destructive ones and let the rest through.
//
// Real-world dangerous prefixes (escapes/clobbers):
// / exact root — ALFA "alfasymlink/root -> /"
// /etc config tampering, /etc/shadow exfil
// /root root home dir
// /boot bootloader / kernel
// /proc process info / kernel knobs
// /sys sysfs
// /dev device nodes
//
// Notably NOT in the list (cPanel-legitimate, kept as UNCERTAIN):
// /usr/local/apache/... access logs
// /usr/local/cpanel/... UI styling, plugins, mailman
// /var/log/... per-user mail logs
// /bin, /sbin customer "fix shell" symlinks (rare but seen)
$dangerousPrefixes = [
    '/etc',
    '/root',
    '/boot',
    '/proc',
    '/sys',
    '/dev',
];
$findings = [];
$cpanelUsername = null;
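$cmd = 'tar -tvf ' . escapeshellarg($tarPath) . ' 2>/dev/null';
// GNU tar detects compression automatically when reading an archive, so
// listing a cpmove .tar.gz works without -z.
$fh = @popen($cmd, 'r');
if (!$fh) {
    fwrite(STDERR, "scan-symlinks: failed to spawn tar -tvf on $tarPath\n");
    exit(2);
}
while (($line = fgets($fh)) !== false) {
    // Only symlink entries ("l" in the mode column) are of interest.
    if ($line === '' || $line[0] !== 'l') continue;
    $arrow = strpos($line, ' -> ');
    if ($arrow === false) continue;
    $left = substr($line, 0, $arrow);
    $right = rtrim(substr($line, $arrow + 4), "\r\n");
    // `tar -tv` columns: mode, owner/group, size, date, time, path. The
    // split limit of 6 keeps archive paths that contain spaces intact.
    $parts = preg_split('/\s+/', $left, 6);
    if (count($parts) < 6) continue;
    $archivePath = $parts[5];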
    $target = $right;
    if ($target === '' || $target[0] !== '/') continue;
    // Classify on a lexically normalized copy of the target ("//", "." and
    // ".." collapsed) so that absolute targets such as
    // /home/<user>/../../etc/shadow or //etc are judged by where they
    // lexically point, not by their leading characters. The raw target is
    // still what gets recorded in the report.
    $segments = [];
    foreach (explode('/', $target) as $seg) {
        if ($seg === '' || $seg === '.') continue;
        if ($seg === '..') {
            array_pop($segments); // ".." above the root stays at the root
            continue;
        }
        $segments[] = $seg;
    }
    $resolved = '/' . implode('/', $segments);
    if ($cpanelUsername === null) {
        if (preg_match('#^cpmove-([^/]+)/#', $archivePath, $m)) {
            $cpanelUsername = $m[1];
        }
    }
    // (1) User-internal: accept symlinks pointing into the customer's
    // own /home/<user>/ (or /homeN/<user>/) tree. The panel rewrites these
    // on extract.
    $userInternal = false;
    $usernames = [];
    if ($cpanelUsername !== null && $cpanelUsername !== '') $usernames[] = $cpanelUsername;
    if ($username !== '') $usernames[] = $username;
    foreach ($usernames as $u) {
        $prefix = '/home/' . $u . '/';
        if (strpos($resolved, $prefix) === 0 || $resolved === rtrim($prefix, '/')) {
            $userInternal = true;
            break;
        }
        if (preg_match('#^/home\d+/' . preg_quote($u, '#') . '(/|$)#', $resolved)) {
            $userInternal = true;
            break;
        }
    }
    if ($userInternal) continue;
    // (2) Exact root.
    $type = null;
    $reason = '';
    if ($resolved === '/') {
        $type = 'DANGEROUS';
        $reason = 'absolute target is root /';
    } else {
        // (3) In the container, every dangerous-prefix target is treated
        // as DANGEROUS without a file_exists() check (see the security
        // note at the top of the file).
        foreach ($dangerousPrefixes as $p) {
            if ($resolved === $p || strpos($resolved, $p . '/') === 0) {
                $type = 'DANGEROUS';
                $reason = "absolute target resolves under system path $p";
                break;
            }
        }
        if ($type === null) {
            // Target is absolute, not user-internal, and not under a known
            // dangerous prefix. Operators want to know about these.
            $type = 'UNCERTAIN';
            $reason = 'absolute target outside user tree and not on dangerous-prefix list';
        }
    }
    $findings[] = [
        'type' => $type,
        'archive_path' => $archivePath,
        'target' => $target,
        'reason' => $reason,
    ];
}
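$tarStatus = pclose($fh);
if ($tarStatus !== 0) {
    // A listing that tar could not complete may have missed entries; treat
    // it as the documented "couldn't read tarball" case (exit 2) rather
    // than hand extract.sh an incomplete report.
    fwrite(STDERR, "scan-symlinks: tar -tvf exited with status $tarStatus on $tarPath\n");
    exit(2);
}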
$dangerousCount = count(array_filter($findings, fn($f) => $f['type'] === 'DANGEROUS'));
$uncertainCount = count(array_filter($findings, fn($f) => $f['type'] === 'UNCERTAIN'));
$report = [
    'tarball' => $tarPath,
    'total_findings' => count($findings),
    'dangerous_count' => $dangerousCount,
    'uncertain_count' => $uncertainCount,
    'findings' => $findings,
];
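$json = json_encode($report, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
// json_encode() returns false on non-UTF-8 archive paths; exit 2 (the
// documented "couldn't write report" case) instead of silently writing an
// empty report.
if ($json === false || @file_put_contents($reportPath, $json . "\n") === false) {
    fwrite(STDERR, "scan-symlinks: failed to encode or write report to $reportPath\n");
    exit(2);
}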
// Sandbox-mode posture: never refuse. Log every DANGEROUS finding to
// stderr so the panel sees it in the streamed [container] log, and let
// extract.sh inspect --report to decide which entries to exclude from
// the tar extraction. The caller treats exit 0 as "scan completed;
// consult the report".
if ($dangerousCount > 0) {
    fwrite(STDERR, "scan-symlinks: $dangerousCount DANGEROUS finding(s) will be stripped during extract\n");
    foreach ($findings as $f) {
        if ($f['type'] === 'DANGEROUS') {
fwrite(STDERR, sprintf(" STRIP %s -> %s (%s)\n", $f['archive_path'], $f['target'], $f['reason']));
Initial bootstrap: cpanel-importer sanitization sandbox

Skeleton for the cpanel-importer Docker container: a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design.

What this ships in v1.0:
- Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag.
- scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json.
- scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found.
- scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry; every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1.
- scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set.
- scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one.
- .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer.
- tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing.
- README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature.

Local acceptance test results:
- clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0
- ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr
- compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true

Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00
}
}
}
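The loop above only logs each DANGEROUS finding to STDERR; according to the sanitize-dont-refuse commit, the actual stripping happens in extract.sh, which turns the DANGEROUS archive_paths into a newline-delimited list for `tar --exclude-from=` and writes a sidecar describing each strip action. The following is a minimal PHP sketch of that transformation. It assumes the findings are already decoded into arrays carrying the `type`, `archive_path`, `target` and `reason` keys used in this file. The function name `buildStripPlan` and the sidecar field names (`action`, `path`, `target`, `reason`) are hypothetical, chosen to mirror the `stripped_dangerous_symlink path=... target=... reason=...` record the commit describes. The real extract.sh is a shell script and may shape its output differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn scan-symlinks findings into (1) the text of a
 * tar --exclude-from file and (2) a list of strip-action records for a sidecar.
 *
 * @param array<int, array<string, string>> $findings
 * @return array{0: string, 1: list<array<string, string>>}
 */
function buildStripPlan(array $findings): array
{
    $excludeLines = [];
    $actions = [];
    foreach ($findings as $f) {
        // Only DANGEROUS entries are stripped; other findings extract normally.
        if (($f['type'] ?? '') !== 'DANGEROUS') {
            continue;
        }
        $excludeLines[] = $f['archive_path'];
        // Assumed sidecar shape, modeled on the report line in the commit message.
        $actions[] = [
            'action' => 'stripped_dangerous_symlink',
            'path'   => $f['archive_path'],
            'target' => $f['target'],
            'reason' => $f['reason'],
        ];
    }
    // tar --exclude-from reads one pattern per line; emit nothing if there are none.
    $excludeList = $excludeLines === [] ? '' : implode("\n", $excludeLines) . "\n";
    return [$excludeList, $actions];
}
```

One caveat worth checking before relying on this shape: GNU tar treats exclude patterns as shell-style wildcards by default, so an archive path containing `*`, `?` or `[` could match more entries than intended (GNU tar's `--no-wildcards` option is one thing to evaluate), and a symlink name containing a newline cannot be represented in a newline-delimited list at all.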
fwrite(STDERR, "scan-symlinks: scan complete (uncertain=$uncertainCount, dangerous=$dangerousCount)\n");
exit(0);
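The third change in the sanitize-dont-refuse commit, the write_report merge in entrypoint.sh, can be sketched in PHP as well. This is a hedged approximation, not the shell code. It assumes the report is a decoded associative array, that `files_cleaned` is the counter behind "K cleaned in place" (that key name is a guess), and that "M files quarantined" means the count from the file scan before the strip-count is added. It also sets `show_alert`, which the commit does not state explicitly. The function name `mergeStrippedSymlinks` is hypothetical; `actions`, `files_quarantined`, `summary_for_panel.alert_message` and `summary_for_panel.show_alert` are the names the commit messages use.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of merging stripped-symlink actions into report.json data.
 *
 * @param array<string, mixed>               $report   decoded report
 * @param list<array<string, string>>        $stripped records from the sidecar
 * @return array<string, mixed>
 */
function mergeStrippedSymlinks(array $report, array $stripped): array
{
    $n = count($stripped);
    if ($n === 0) {
        return $report; // nothing stripped: leave a clean report untouched
    }

    // Stripped symlinks are prepended so they lead the actions[] list.
    $report['actions'] = array_merge($stripped, $report['actions'] ?? []);

    // Assumption: M in the alert is the file-scan count before the bump.
    $m = (int) ($report['files_quarantined'] ?? 0);
    $report['files_quarantined'] = $m + $n;
    $k = (int) ($report['files_cleaned'] ?? 0); // assumed key name

    $report['summary_for_panel']['show_alert'] = true;
    $report['summary_for_panel']['alert_message'] = sprintf(
        "%d dangerous symlink(s) stripped during extract; %d files quarantined; %d cleaned in place. "
        . "Customer site may have been compromised at the source \u{2014} recommend review.",
        $n,
        $m,
        $k
    );
    return $report;
}
```

Returning the report unchanged when nothing was stripped keeps clean imports from picking up a spurious alert.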