Initial bootstrap: cpanel-importer sanitization sandbox

Skeleton for the cpanel-importer Docker container — a one-shot
sandbox the WHP panel invokes BEFORE extracting a customer cpmove
tarball. See cpanel-import-container-spec.md (in /workspace/) for the
full design.

What this ships in v1.0:

- Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 +
  SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as
  UID 999 (whp-import) via the panel-side --user 999:999 flag.

- scripts/entrypoint.sh: validates env, runs (optional) freshclam,
  drives extract -> scan-files -> scan-dbs -> rsync -> report.json.

- scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract
  symlink scan ported standalone from
  web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29
  whp02 destruction-vector fix). Aborts with exit 3 before tar runs
  if any DANGEROUS symlink is found.

- scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0
  ships with an empty cleaner registry — every hit is
  QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1.

- scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always
  applied), WordPress identification, and ONE WP content scan check
  (siteurl_external_domain). v1.1 will grow the check set.

- scripts/lib/safety-net.php: container-narrow open_basedir
  allow-list, much tighter than the panel-side one.

- .gitea/workflows/build-push.yaml: builds + smoke-tests +
  PHP-syntax-checks + bash-syntax-checks before pushing to
  repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer.

- tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP
  dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc
  vector) for local end-to-end testing.

- README.md / CONTRIBUTING.md: docker-run invocation, bind-mount
  catalog, report.json schema, how to add a cleaner pattern or a WP
  scan signature.
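
The MyISAM -> InnoDB rewrite described for scripts/scan-dbs.php could be
approximated in shell as below. This is a minimal sketch, not the shipped
PHP: the function names count_myisam and rewrite_myisam_to_innodb are
hypothetical, and it assumes mysqldump-style `ENGINE=MyISAM` table options
with exactly that capitalization.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the shipped logic lives in scripts/scan-dbs.php.

# Print the number of lines in a SQL dump that declare a MyISAM engine.
# grep -c exits 1 when the count is 0, so "|| true" keeps callers that
# run under "set -e" from aborting on a clean dump.
count_myisam() {
  grep -cE 'ENGINE[[:space:]]*=[[:space:]]*MyISAM' "$1" || true
}

# Read a SQL dump on stdin and write it to stdout with every
# ENGINE=MyISAM table option replaced by ENGINE=InnoDB.
rewrite_myisam_to_innodb() {
  sed -E 's/ENGINE[[:space:]]*=[[:space:]]*MyISAM/ENGINE=InnoDB/g'
}
```

A line-oriented regex like this would also rewrite the string if it
appeared inside row data, so a real importer may want to restrict the
match to CREATE TABLE statements.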

Local acceptance test results:
- clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0
- ALFA fixture -> exit 1, status=failed, failed_stage=extract,
  "tarball contains dangerous symlinks; aborting" on stderr
- compromised-siteurl fixture -> imported_into_new_server=false,
  .flagged file written, summary_for_panel.show_alert=true

Image size: 197 MB compressed (gzipped docker save), ~397 MB unique
layers extracted. Well under the spec's 600 MB compressed / 1.2 GB
extracted budget.
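
One way such a budget check could be scripted, as a hedged sketch: the
helper name within_budget_mb is hypothetical, 1 MB is assumed to mean
1,000,000 bytes, and the 1.2 GB limit is assumed to mean 1200 MB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if <bytes> fits within
# <budget_mb> megabytes, assuming 1 MB = 1,000,000 bytes.
within_budget_mb() {
  local bytes="$1" budget_mb="$2"
  [ "$bytes" -le $(( budget_mb * 1000000 )) ]
}

# Example measurement of the compressed size. It needs Docker and an
# IMAGE variable, so it is left commented out here:
#   bytes=$(docker save "$IMAGE" | gzip -c | wc -c)
#   within_budget_mb "$bytes" 600 || echo "compressed image over budget" >&2
```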

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude (bootstrap)
2026-05-30 19:56:57 -07:00
commit 5487dfc8f1
17 changed files with 2008 additions and 0 deletions

scripts/extract.sh Executable file

@@ -0,0 +1,64 @@
#!/usr/bin/env bash
#
# extract.sh — pre-extract symlink scan + cpmove untar.
#
# Usage: extract.sh <tarball> <dest> <username>
#
# Calls scripts/lib/scan-symlinks.php first; if it reports any DANGEROUS
# findings we abort BEFORE tar runs (per spec §0 step 2). On clean,
# extracts with the same hardening flags CpanelBackupImporter::extractBackup
# uses on the panel today (see web-files/libs/CpanelBackupImporter.php).
set -euo pipefail
TARBALL="${1:?usage: extract.sh <tarball> <dest> <username>}"
DEST="${2:?usage: extract.sh <tarball> <dest> <username>}"
USERNAME="${3:?usage: extract.sh <tarball> <dest> <username>}"
ts() { date -u +'%Y-%m-%dT%H:%M:%SZ'; }
log() { printf '[%s] extract: %s\n' "$(ts)" "$*"; }
[[ -f "$TARBALL" ]] || { log "tarball not found: $TARBALL"; exit 2; }
mkdir -p "$DEST"
# --- pre-extract symlink scan ---------------------------------------------
log "scanning tarball for dangerous symlinks (cpmove vector check)"
SYMLINK_REPORT=$(mktemp -p /tmp scan-symlinks.XXXXXX.json)
if ! php /scripts/lib/scan-symlinks.php \
    --tarball "$TARBALL" \
    --username "$USERNAME" \
    --report "$SYMLINK_REPORT"; then
  log "scan-symlinks.php exited non-zero"
  # Dump the report to stderr so entrypoint.sh can include it in the
  # failure record.
  cat "$SYMLINK_REPORT" >&2 || true
  log "ABORT: tarball contains dangerous symlinks; aborting"
  exit 3
fi
log "symlink scan clean (no DANGEROUS findings)"
# --- extract --------------------------------------------------------------
# Detect compression from the file extension. cpmove can be
# .tar.gz / .tar.bz2 / .tar.xz / .tar; anything else falls back to -xf.
TAR_FLAGS="-xf"
case "$TARBALL" in
  *.tar.gz|*.tgz)   TAR_FLAGS="-xzf" ;;
  *.tar.bz2|*.tbz2) TAR_FLAGS="-xjf" ;;
  *.tar.xz|*.txz)   TAR_FLAGS="-xJf" ;;
  *.tar)            TAR_FLAGS="-xf" ;;
esac
log "extracting with hardened tar flags into $DEST"
# Hardening flags (mirrored from CpanelBackupImporter::extractBackup):
# --no-same-owner / --no-same-permissions: drop archive-recorded
# uid/perm bits so the cpmove can't drop setuid binaries at us.
# --no-overwrite-dir: refuse to clobber existing directory metadata,
# closing one historical tar-symlink-escape vector.
# --absolute-names is NOT used — leading / in a member name is stripped.
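# Use -C instead of cd so that a relative $TARBALL path still resolves
# against the caller's working directory.
tar --no-same-owner --no-same-permissions --no-overwrite-dir -C "$DEST" $TAR_FLAGS "$TARBALL"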
log "extracted OK ($(find "$DEST" -type f | wc -l) files)"
exit 0
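
The pre-extract scan above is delegated to scripts/lib/scan-symlinks.php,
which is not shown in this diff. As a rough illustration of the idea only,
the sketch below lists symlink members without extracting them and prints
the targets that look risky. The function name list_risky_symlink_targets
is hypothetical, and the rule used here (absolute target, or a ".." path
component) is an assumption, not the PHP scanner's actual DANGEROUS
classification.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; NOT the logic of scripts/lib/scan-symlinks.php.
#
# list_risky_symlink_targets <tarball>
# Prints one symlink target per line for every symlink member whose
# target is absolute or contains a ".." path component. Nothing is
# extracted; only the verbose listing is read.
list_risky_symlink_targets() {
  local tarball="$1"
  # In a verbose listing, symlink members start with "l" and end with
  # "name -> target". A member name that itself contains " -> " would
  # confuse this simple parse.
  tar -tvf "$tarball" | awk '
    /^l/ {
      idx = index($0, " -> ")
      if (idx == 0) next
      target = substr($0, idx + 4)
      if (target ~ /^\// || target ~ /(^|\/)[.][.](\/|$)/) print target
    }'
}
```

A caller could treat any output as a reason to abort, for example
`[ -z "$(list_risky_symlink_targets "$TARBALL")" ] || exit 3`, which
mirrors the exit code extract.sh uses for a failed scan.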