Initial bootstrap: cpanel-importer sanitization sandbox

Skeleton for the cpanel-importer Docker container — a one-shot
sandbox the WHP panel invokes BEFORE extracting a customer cpmove
tarball. See cpanel-import-container-spec.md (in /workspace/) for the
full design.

What this ships in v1.0:

- Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 +
  SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as
  UID 999 (whp-import) via the panel-side --user 999:999 flag.

- scripts/entrypoint.sh: validates env, runs (optional) freshclam,
  drives extract -> scan-files -> scan-dbs -> rsync -> report.json.

- scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract
  symlink scan ported standalone from
  web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29
  whp02 destruction-vector fix). Aborts with exit 3 before tar runs
  if any DANGEROUS symlink is found.

- scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0
  ships with an empty cleaner registry — every hit is
  QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1.

- scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always
  applied), WordPress identification, and ONE WP content scan check
  (siteurl_external_domain). v1.1 will grow the check set.

- scripts/lib/safety-net.php: container-narrow open_basedir
  allow-list, much tighter than the panel-side one.

- .gitea/workflows/build-push.yaml: builds + smoke-tests +
  PHP-syntax-checks + bash-syntax-checks before pushing to
  repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer.

- tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP
  dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc
  vector) for local end-to-end testing.

- README.md / CONTRIBUTING.md: docker-run invocation, bind-mount
  catalog, report.json schema, how to add a cleaner pattern or a WP
  scan signature.
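The MyISAM -> InnoDB rewrite in scripts/scan-dbs.php is described above only as "regex". Below is a minimal sketch of one way such a rewrite could work. The function name `rewriteMyisamToInnodb()` is hypothetical, and the sketch assumes mysqldump-style output, where each CREATE TABLE closes with a line beginning `) ENGINE=...`. It is not the actual scan-dbs.php pattern.

```php
<?php
// Hypothetical sketch, not the actual scan-dbs.php code. Assumes mysqldump
// layout: each CREATE TABLE ends with a line like ") ENGINE=MyISAM ...;".
function rewriteMyisamToInnodb(string $sql, ?int &$count = null): string
{
    // /m makes ^ match at line starts, so only table-option lines that begin
    // with ")" are rewritten; /i tolerates "engine=myisam"; \b ends the name.
    $out = preg_replace('/^(\)\s*)ENGINE\s*=\s*MyISAM\b/mi', '${1}ENGINE=InnoDB', $sql, -1, $count);
    if ($out === null) {
        throw new RuntimeException('preg_replace failed: ' . preg_last_error_msg());
    }
    return $out;
}

$dump = <<<'SQL'
CREATE TABLE `wp_options` (
  `option_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`option_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4;
INSERT INTO `wp_posts` VALUES (1,'Tip: avoid ENGINE=MyISAM for new tables');
SQL;

$rewritten = rewriteMyisamToInnodb($dump, $n);
echo "rewrote $n table(s)\n"; // prints "rewrote 1 table(s)"
```

Anchoring on the leading `)` is the design choice that matters here. An unanchored `ENGINE=MyISAM` pattern would also rewrite the post text stored in the INSERT row.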

Local acceptance test results:
- clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0
- ALFA fixture -> exit 1, status=failed, failed_stage=extract,
  "tarball contains dangerous symlinks; aborting" on stderr
- compromised-siteurl fixture -> imported_into_new_server=false,
  .flagged file written, summary_for_panel.show_alert=true
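The single WP content check, `siteurl_external_domain`, is what flagged the compromised-siteurl fixture. Below is a minimal sketch of what such a check could do. It assumes the importer already knows the account's domains (for example, from the cpmove metadata) and has read `siteurl` from `wp_options`. The function name and return shape are hypothetical, not the scan-dbs.php implementation.

```php
<?php
// Hypothetical sketch of a siteurl_external_domain-style check; the real
// scan-dbs.php check and its flag format may differ.
function checkSiteurlExternalDomain(string $siteurl, array $accountDomains): ?array
{
    $host = parse_url($siteurl, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return ['check' => 'siteurl_external_domain', 'siteurl' => $siteurl,
                'reason' => 'siteurl has no parseable host'];
    }
    // DNS names are case-insensitive, and a trailing dot names the same host.
    $host = strtolower(rtrim($host, '.'));
    foreach ($accountDomains as $domain) {
        $domain = strtolower(rtrim($domain, '.'));
        // Accept the domain itself or any subdomain. Requiring "." before the
        // domain stops "notexample.com" from matching "example.com".
        if ($host === $domain || str_ends_with($host, '.' . $domain)) {
            return null;
        }
    }
    return ['check' => 'siteurl_external_domain', 'siteurl' => $siteurl,
            'host' => $host, 'reason' => 'siteurl host is not an account domain'];
}

$flag = checkSiteurlExternalDomain('https://cdn-evil.test/wp', ['example.com']);
var_dump($flag === null); // bool(false): this site would be flagged
```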

Image size: 197 MB compressed (gzipped docker save), ~397 MB unique
layers extracted. Well under the spec's 600 MB compressed / 1.2 GB
extracted budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude (bootstrap)
2026-05-30 19:56:57 -07:00
commit 5487dfc8f1
17 changed files with 2008 additions and 0 deletions

scripts/lib/safety-net.php

@@ -0,0 +1,46 @@
<?php
/**
 * safety-net.php — container-narrow open_basedir allow-list.
 *
 * The sibling at /workspace/whp/web-files/includes/safety-net.php is the
 * panel's allow-list — it includes /docker, /root/whp, /etc/whp, etc.,
 * because the panel legitimately reads from those.
 *
 * Inside this container, the worker needs a much smaller set of paths.
 * Anything outside this list is blocked at the PHP filesystem-function
 * level. PHP enforces open_basedir in unlink/scandir/fopen/
 * RecursiveDirectoryIterator/etc. AFTER symlink resolution, so a planted
 * symlink into /proc (outside the allowed /proc/self), or into any other
 * unlisted path, cannot escape the allow-list.
 *
 * HISTORY — the same destruction-bug class that motivated the panel-side
 * safety-net (whp02 /usr/bin + /etc wipe, 2026-05-28/29) is the reason
 * this exists. In the container the host's /etc, /usr and /root are not
 * bind-mounted, but open_basedir adds belt-and-suspenders enforcement
 * against any extracted-archive symlink walker we add later.
 */
if (function_exists('ini_set')) {
    // Container-internal paths only. Notable absences:
    //   - /etc, /usr, /var, /root as whole trees — never written to by this
    //     container's PHP code; only the three ClamAV paths below are
    //     carved out of them
    //   - /docker — there is no /docker in this image
    //   - /home — there is no /home in this image
    $allowed = implode(PATH_SEPARATOR, [
        '/host', // /host/backup (RO), /host/quarantine, /host/sanitized
        '/tmp', // tmpfs scratch space
        '/opt/whp', // WORKDIR + per-run state
        '/scripts', // our own code
        '/var/lib/clamav', // ClamAV signature DB
        '/var/log/clamav', // freshclam log
        '/etc/freshclam.conf', // single file, read-only
        '/proc/self', // pid/cgroup introspection
    ]);
    if ((string) ini_get('open_basedir') === '') {
        @ini_set('open_basedir', $allowed);
    }
    // Realpath cache tuning matches the panel — open_basedir adds a
    // realpath() to every fs op, so a bigger cache pays back fast.
    @ini_set('realpath_cache_size', '512K');
    @ini_set('realpath_cache_ttl', '600');
}
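The allow-list depends on PHP comparing the symlink-resolved path against each open_basedir entry as a directory, not as a raw string prefix. The function below is a simplified model of that matching, useful for reasoning about which paths a planted symlink could reach. `isUnderAllowList()` is a hypothetical helper, not part of PHP, and it expects a path that has already been resolved, as realpath() would do.

```php
<?php
// Simplified model of open_basedir matching; PHP's real check is written in
// C and has extra cases (such as the special "." entry).
function isUnderAllowList(string $resolvedPath, array $allowList): bool
{
    foreach ($allowList as $entry) {
        $entry = rtrim($entry, '/');
        // Exact match (which also covers file entries), or a path strictly
        // beneath the entry. Requiring "/" keeps "/host" from admitting "/hostile".
        if ($resolvedPath === $entry || str_starts_with($resolvedPath, $entry . '/')) {
            return true;
        }
    }
    return false;
}

// Same entries as the allow-list in safety-net.php.
$allowList = ['/host', '/tmp', '/opt/whp', '/scripts', '/var/lib/clamav',
              '/var/log/clamav', '/etc/freshclam.conf', '/proc/self'];

// A planted link such as /tmp/x/evil -> /etc resolves to /etc/..., so the
// check sees the link's target, not the link's own location under /tmp.
var_dump(isUnderAllowList('/etc/shadow', $allowList)); // bool(false)
```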

scripts/lib/scan-symlinks.php

@@ -0,0 +1,161 @@
<?php
/**
 * scan-symlinks.php — standalone port of
 * CpanelBackupImporter::scanTarballForDangerousSymlinks().
 *
 * This is the same classification logic that ships in the WHP panel today
 * (web-files/libs/CpanelBackupImporter.php, ~line 2438). Lifted into a
 * standalone CLI so the container can run it as an independent pre-extract
 * gate without dragging in the rest of the importer.
 *
 * Exit codes:
 *   0 — clean (no DANGEROUS findings)
 *   1 — one or more DANGEROUS findings; tarball MUST NOT be extracted
 *   2 — usage / I/O error
 *
 * Unless it exits early with a usage / I/O error, writes a JSON report to
 * --report listing every absolute-target symlink outside the customer's
 * own /home/<user> tree, with its classification verdict.
 *
 * SECURITY NOTE: this differs from the panel implementation in ONE way.
 * The panel calls file_exists($target) on the *host* to decide whether a
 * target under a dangerous prefix is BENIGN_DANGLING or DANGEROUS. Inside
 * the container, /etc and /usr DO exist (they are the container's own),
 * so file_exists() would describe the container's filesystem, not the
 * host's. (Even a DANGEROUS target could not reach the host here: the
 * container runs with `--read-only --tmpfs /tmp` and the worker runs as
 * UID 999.)
 *
 * We therefore treat any absolute-target symlink under a dangerous prefix
 * as DANGEROUS regardless of file_exists(). This is stricter than the
 * panel's check; in the container we *can* safely refuse to even attempt
 * the extract on a clearly malicious tarball.
 */
require __DIR__ . '/safety-net.php';

$opts = getopt('', ['tarball:', 'username:', 'report:']);
if (!isset($opts['tarball']) || !isset($opts['report'])) {
    fwrite(STDERR, "usage: scan-symlinks.php --tarball <path> --report <out.json> [--username <u>]\n");
    exit(2);
}
$tarPath = $opts['tarball'];
$reportPath = $opts['report'];
$username = $opts['username'] ?? '';
if (!is_file($tarPath) || !is_readable($tarPath)) {
    fwrite(STDERR, "scan-symlinks: not a readable file: $tarPath\n");
    exit(2);
}

// Same prefix list as the panel.
$dangerousPrefixes = [
    '/etc', '/usr', '/bin', '/sbin', '/lib', '/lib64',
    '/boot', '/root',
    '/var/lib', '/var/log', '/var/cache', '/var/spool',
];

$findings = [];
$cpanelUsername = null;

$cmd = 'tar -tvf ' . escapeshellarg($tarPath) . ' 2>/dev/null';
$fh = @popen($cmd, 'r');
if (!$fh) {
    fwrite(STDERR, "scan-symlinks: failed to spawn tar -tvf on $tarPath\n");
    exit(2);
}

while (($line = fgets($fh)) !== false) {
    if ($line === '' || $line[0] !== 'l') continue;
    $arrow = strpos($line, ' -> ');
    if ($arrow === false) continue;
    $left = substr($line, 0, $arrow);
    $right = rtrim(substr($line, $arrow + 4), "\r\n");
    $parts = preg_split('/\s+/', $left, 6);
    if (count($parts) < 6) continue;
    $archivePath = $parts[5];
    $target = $right;
    if ($target === '' || $target[0] !== '/') continue;

    if ($cpanelUsername === null) {
        if (preg_match('#^cpmove-([^/]+)/#', $archivePath, $m)) {
            $cpanelUsername = $m[1];
        }
    }

    // (1) user-internal — accept symlinks pointing into the customer's
    // own /home/<user>/ tree. The panel rewrites these on extract.
    $userInternal = false;
    $usernames = [];
    if ($cpanelUsername !== null && $cpanelUsername !== '') $usernames[] = $cpanelUsername;
    if ($username !== '') $usernames[] = $username;
    foreach ($usernames as $u) {
        $prefix = '/home/' . $u . '/';
        if (strpos($target, $prefix) === 0 || $target === rtrim($prefix, '/')) {
            $userInternal = true;
            break;
        }
        if (preg_match('#^/home\d+/' . preg_quote($u, '#') . '(/|$)#', $target)) {
            $userInternal = true;
            break;
        }
    }
    if ($userInternal) continue;

    // (2) exact root.
    $type = null;
    $reason = '';
    if ($target === '/') {
        $type = 'DANGEROUS';
        $reason = 'absolute target is root /';
    } else {
        // (3) — in container, every dangerous-prefix target is treated
        // as DANGEROUS without a file_exists() check (see security note
        // at top of file).
        foreach ($dangerousPrefixes as $p) {
            if ($target === $p || strpos($target, $p . '/') === 0) {
                $type = 'DANGEROUS';
                $reason = "absolute target resolves under system path $p";
                break;
            }
        }
        if ($type === null) {
            // Target is absolute, not user-internal, not under a known
            // dangerous prefix. Operators want to know about these.
            $type = 'UNCERTAIN';
            $reason = 'absolute target outside user tree and not on dangerous-prefix list';
        }
    }

    $findings[] = [
        'type' => $type,
        'archive_path' => $archivePath,
        'target' => $target,
        'reason' => $reason,
    ];
}
$tarStatus = pclose($fh);

$dangerousCount = count(array_filter($findings, fn($f) => $f['type'] === 'DANGEROUS'));
$uncertainCount = count(array_filter($findings, fn($f) => $f['type'] === 'UNCERTAIN'));
$report = [
    'tarball' => $tarPath,
    'total_findings' => count($findings),
    'dangerous_count' => $dangerousCount,
    'uncertain_count' => $uncertainCount,
    'findings' => $findings,
];
@file_put_contents($reportPath, json_encode($report, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES) . "\n");

if ($dangerousCount > 0) {
    fwrite(STDERR, "scan-symlinks: $dangerousCount DANGEROUS finding(s); refusing tarball\n");
    foreach ($findings as $f) {
        if ($f['type'] === 'DANGEROUS') {
            fwrite(STDERR, sprintf("  %s -> %s (%s)\n", $f['archive_path'], $f['target'], $f['reason']));
        }
    }
    exit(1);
}

// A non-zero tar status means the listing may be incomplete, so "no
// DANGEROUS findings" cannot be trusted; fail closed as an I/O error.
if ($tarStatus !== 0) {
    fwrite(STDERR, "scan-symlinks: tar -tvf exited with status $tarStatus; listing may be incomplete, refusing tarball\n");
    exit(2);
}

fwrite(STDERR, "scan-symlinks: clean (uncertain=$uncertainCount, dangerous=0)\n");
exit(0);
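The main loop mixes tar I/O with classification, which makes the rules hard to unit-test. Below is a sketch that restates the same parsing and classification rules as pure functions, so they can be exercised with canned `tar -tvf` lines. `parseSymlinkLine()`, `classifyTarget()` and `DANGEROUS_PREFIXES` are hypothetical names. The parser assumes GNU tar's verbose layout (mode, owner/group, size, date, time, name), as the original does.

```php
<?php
// Hypothetical restatement of the scan-symlinks.php loop as pure functions.
// Assumes GNU tar's `tar -tvf` layout: mode owner/group size date time name.
const DANGEROUS_PREFIXES = [
    '/etc', '/usr', '/bin', '/sbin', '/lib', '/lib64',
    '/boot', '/root',
    '/var/lib', '/var/log', '/var/cache', '/var/spool',
];

/** Split one symlink line into [archivePath, target]; null for other lines. */
function parseSymlinkLine(string $line): ?array
{
    if ($line === '' || $line[0] !== 'l') {
        return null; // not a symlink entry
    }
    $arrow = strpos($line, ' -> ');
    if ($arrow === false) {
        return null;
    }
    // Limit 6 keeps any spaces inside the member name in the last field.
    $parts = preg_split('/\s+/', substr($line, 0, $arrow), 6);
    if (count($parts) < 6) {
        return null;
    }
    return [$parts[5], rtrim(substr($line, $arrow + 4), "\r\n")];
}

/** 'DANGEROUS', 'UNCERTAIN', or null (relative or user-internal target). */
function classifyTarget(string $target, array $usernames): ?string
{
    if ($target === '' || $target[0] !== '/') {
        return null; // relative targets are not examined, as in the original
    }
    foreach ($usernames as $u) {
        if ($u === '') {
            continue;
        }
        if ($target === "/home/$u" || str_starts_with($target, "/home/$u/")
            || preg_match('#^/home\d+/' . preg_quote($u, '#') . '(/|$)#', $target) === 1) {
            return null; // inside the customer's own home tree
        }
    }
    if ($target === '/') {
        return 'DANGEROUS';
    }
    foreach (DANGEROUS_PREFIXES as $p) {
        if ($target === $p || str_starts_with($target, $p . '/')) {
            return 'DANGEROUS';
        }
    }
    return 'UNCERTAIN'; // absolute, outside the user tree, not a known system path
}

$line = "lrwxrwxrwx alice/alice 0 2026-05-29 12:00 cpmove-alice/homedir/public_html/x -> /etc\n";
[$path, $target] = parseSymlinkLine($line);
echo classifyTarget($target, ['alice']), "\n"; // prints "DANGEROUS"
```

Comparing against `$p . '/'` rather than a bare prefix is what keeps a target like `/etcetera` out of the DANGEROUS bucket. That target lands in UNCERTAIN instead.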