Files
cpanel-importer/scripts/entrypoint.sh

271 lines
11 KiB
Bash
Raw Normal View History

Initial bootstrap: cpanel-importer sanitization sandbox Skeleton for the cpanel-importer Docker container — a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design. What this ships in v1.0: - Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag. - scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json. - scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found. - scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry — every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1. - scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set. - scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one. - .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer. - tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing. - README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature. Local acceptance test results: - clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0 - ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr - compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00
#!/usr/bin/env bash
#
# entrypoint.sh — main controller for the cpanel-importer sandbox.
#
# Inputs (env, set by the panel's docker run):
# IMPORT_ID unique id for this run; used in quarantine + report paths
# IMPORT_USERNAME cPanel/WHP username the cpmove belongs to
# IMPORT_BACKUP_FILE absolute path inside the container, typically
# /host/backup/cpmove-<user>.tar.gz
# CLAMAV_REFRESH "true" to run freshclam at start (default: true)
#
# Flow (spec §0):
# 1. validate env
# 2. (optional) refresh ClamAV signatures
# 3. extract → /tmp/extract/
# 4. file scan → /tmp/scan-files-report.json
# 5. DB sanitize → /tmp/sanitized/mysql/, /tmp/scan-dbs-report.json
# 6. rsync /tmp/sanitized/ → /host/sanitized/<importid>/
# 7. write /host/sanitized/<importid>/report.json (merged)
#
# On failure at any stage we still write a partial report.json with
# status="failed" + the stage that broke, then exit non-zero.
set -euo pipefail
# --- logging ---------------------------------------------------------------
ts() { date -u +'%Y-%m-%dT%H:%M:%SZ'; }
log() { printf '[%s] %s\n' "$(ts)" "$*"; }
die() { log "FATAL: $*"; write_failure_report "$STAGE" "$*"; exit 1; }
# Buffered partial state. The final report.json is written by the merge
# step (see write_final_report); if we crash before then, write_failure_report
# emits whatever partial pieces exist.
STAGE="init"
START_TS="$(date -u +%s)"
write_failure_report() {
local stage="$1"
local msg="$2"
local out_dir="/host/sanitized/${IMPORT_ID:-unknown}"
# mkdir AND the report write can both fail (mount RO, missing
# /host/sanitized, etc.); we log every failure to stderr and never
# let the report-writer abort the script.
if ! mkdir -p "$out_dir" 2>/dev/null; then
log "WARN: failure-report mkdir failed for $out_dir; report will not be persisted"
return 0
fi
if ! cat > "$out_dir/report.json" 2>/dev/null <<JSON
{
"import_id": "${IMPORT_ID:-unknown}",
"status": "failed",
"failed_stage": "$stage",
"error": $(printf '%s' "$msg" | php -r 'echo json_encode(stream_get_contents(STDIN));' 2>/dev/null || echo '"(unencodable)"'),
"scan_duration_seconds": $(( $(date -u +%s) - START_TS )),
"files": null,
"databases": null
}
JSON
then
log "WARN: failure-report write failed for $out_dir/report.json"
fi
}
# --- env validation --------------------------------------------------------
STAGE="validate_env"
log "cpanel-importer starting (container UID=$(id -u) GID=$(id -g))"
: "${IMPORT_ID:?IMPORT_ID env var is required}"
: "${IMPORT_USERNAME:?IMPORT_USERNAME env var is required}"
: "${IMPORT_BACKUP_FILE:?IMPORT_BACKUP_FILE env var is required}"
CLAMAV_REFRESH="${CLAMAV_REFRESH:-true}"
log "import_id=$IMPORT_ID username=$IMPORT_USERNAME backup=$IMPORT_BACKUP_FILE"
if [[ ! -f "$IMPORT_BACKUP_FILE" ]]; then
die "backup file does not exist or is not a regular file: $IMPORT_BACKUP_FILE"
fi
# Make sure the output dirs exist (they're bind mounts, so we trust the
# host to have created them, but mkdir -p is harmless).
QUARANTINE_DIR="/host/quarantine/$IMPORT_ID"
SANITIZED_DIR="/host/sanitized/$IMPORT_ID"
mkdir -p "$QUARANTINE_DIR" "$SANITIZED_DIR" \
|| die "cannot create quarantine/sanitized output dirs (are the bind mounts RW?)"
fix: move EXTRACT_DIR + WORK_DIR off tmpfs onto disk-backed bind mount rc=137 OOM kill triaged on whp02 darkside import. dmesg confirmed: memory: usage 2097100kB, limit 2097152kB, failcnt 132 oom_kill_process ... task=bash uid=999 Root cause: extract.sh untars the cpmove into EXTRACT_DIR which was /tmp/extract — a tmpfs mount (RAM-backed). The container's --memory 2g cgroup ceiling counts tmpfs writes against RSS, so the 3 GB cpmove decompressing into tmpfs hit the limit at ~7s into tar and the kernel killed the bash process running extract.sh. Fix is structural, not a memory bump: the disk-backed bind mount at /host/sanitized (mapped to /var/lib/whp/cpanel-importer-extract on host) has effectively unlimited capacity and doesn't count against the cgroup memory limit. Moving the working dirs there sidesteps the OOM class entirely. Layout change: EXTRACT_DIR /tmp/extract -> $SANITIZED_DIR/extract-work WORK_DIR /tmp/sanitized -> $SANITIZED_DIR/work Two ripple changes: - The old rsync_out stage cross-filesystem-copied ~10 GB from tmpfs to /host/sanitized/<id>/extracted. That's now a same-filesystem `mv` (constant-time rename) since extract-work IS already inside /host/sanitized/<id>/. Stage renamed to finalize_layout for clarity; pre-existing wipe of extracted/ + mysql/ guards against partial-run residue. - The stripped-symlinks actions sidecar moved to /tmp explicitly (entrypoint.sh passes the 4th arg to extract.sh) so finalize's rename doesn't (a) carry a dotfile into the cleaned tree the panel imports and (b) move it out from under write_report's read. Also fixes the unrelated-but-cosmetic freshclam warning by cd'ing to /var/lib/clamav (the configured DatabaseDirectory, tmpfs writable) before invoking freshclam in a subshell. The "Can't create freshclam.dat in /opt/whp" errors were because /opt/whp is the container WORKDIR which lives on the read-only rootfs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 11:29:31 -07:00
# Working scratch lives inside the disk-backed bind mount, NOT under /tmp.
# /tmp is mounted as tmpfs (RAM-backed) by the panel for fast small-file
# scratch (per-stage reports, exclude lists). Putting the multi-GB cpmove
# extract there blew the container's --memory 2g cgroup ceiling (tmpfs
# writes count against cgroup RSS), surfaced as rc=137 OOM kills mid-tar.
#
# Layout:
# EXTRACT_DIR $SANITIZED_DIR/extract-work — tar untars here. After
# scan-files quarantines bad files, this is the cleaned
# tree. Renamed to $SANITIZED_DIR/extracted at the end of
# the run so the panel can find it at the expected path.
# WORK_DIR $SANITIZED_DIR/work — scan-dbs writes cleaned
# SQL dumps here; folded into $SANITIZED_DIR/mysql at the
# end of the run.
EXTRACT_DIR="$SANITIZED_DIR/extract-work"
WORK_DIR="$SANITIZED_DIR/work"
Initial bootstrap: cpanel-importer sanitization sandbox Skeleton for the cpanel-importer Docker container — a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design. What this ships in v1.0: - Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag. - scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json. - scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found. - scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry — every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1. - scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set. - scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one. - .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer. - tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing. - README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature. Local acceptance test results: - clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0 - ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr - compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00
mkdir -p "$EXTRACT_DIR" "$WORK_DIR/mysql"
# --- refresh ClamAV signatures --------------------------------------------
STAGE="freshclam"
if [[ "$CLAMAV_REFRESH" == "true" ]]; then
log "refreshing ClamAV signatures (freshclam)"
fix: move EXTRACT_DIR + WORK_DIR off tmpfs onto disk-backed bind mount rc=137 OOM kill triaged on whp02 darkside import. dmesg confirmed: memory: usage 2097100kB, limit 2097152kB, failcnt 132 oom_kill_process ... task=bash uid=999 Root cause: extract.sh untars the cpmove into EXTRACT_DIR which was /tmp/extract — a tmpfs mount (RAM-backed). The container's --memory 2g cgroup ceiling counts tmpfs writes against RSS, so the 3 GB cpmove decompressing into tmpfs hit the limit at ~7s into tar and the kernel killed the bash process running extract.sh. Fix is structural, not a memory bump: the disk-backed bind mount at /host/sanitized (mapped to /var/lib/whp/cpanel-importer-extract on host) has effectively unlimited capacity and doesn't count against the cgroup memory limit. Moving the working dirs there sidesteps the OOM class entirely. Layout change: EXTRACT_DIR /tmp/extract -> $SANITIZED_DIR/extract-work WORK_DIR /tmp/sanitized -> $SANITIZED_DIR/work Two ripple changes: - The old rsync_out stage cross-filesystem-copied ~10 GB from tmpfs to /host/sanitized/<id>/extracted. That's now a same-filesystem `mv` (constant-time rename) since extract-work IS already inside /host/sanitized/<id>/. Stage renamed to finalize_layout for clarity; pre-existing wipe of extracted/ + mysql/ guards against partial-run residue. - The stripped-symlinks actions sidecar moved to /tmp explicitly (entrypoint.sh passes the 4th arg to extract.sh) so finalize's rename doesn't (a) carry a dotfile into the cleaned tree the panel imports and (b) move it out from under write_report's read. Also fixes the unrelated-but-cosmetic freshclam warning by cd'ing to /var/lib/clamav (the configured DatabaseDirectory, tmpfs writable) before invoking freshclam in a subshell. The "Can't create freshclam.dat in /opt/whp" errors were because /opt/whp is the container WORKDIR which lives on the read-only rootfs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 11:29:31 -07:00
# freshclam writes freshclam.dat to its CWD; the container's WORKDIR
# is /opt/whp which lives on the read-only rootfs, so freshclam errors
# with "Can't create freshclam.dat in /opt/whp" before it ever reaches
# the database directory. Subshell + cd to the tmpfs at /var/lib/clamav
# (the DatabaseDirectory configured in /etc/freshclam.conf) keeps the
# entrypoint's CWD intact for later stages.
Initial bootstrap: cpanel-importer sanitization sandbox Skeleton for the cpanel-importer Docker container — a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design. What this ships in v1.0: - Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag. - scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json. - scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found. - scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry — every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1. - scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set. - scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one. - .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer. - tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing. - README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature. Local acceptance test results: - clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0 - ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr - compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00
# freshclam is allowed to fail (e.g., container has no outbound net);
# we proceed with the baseline rules from build time + log a warning.
fix: move EXTRACT_DIR + WORK_DIR off tmpfs onto disk-backed bind mount rc=137 OOM kill triaged on whp02 darkside import. dmesg confirmed: memory: usage 2097100kB, limit 2097152kB, failcnt 132 oom_kill_process ... task=bash uid=999 Root cause: extract.sh untars the cpmove into EXTRACT_DIR which was /tmp/extract — a tmpfs mount (RAM-backed). The container's --memory 2g cgroup ceiling counts tmpfs writes against RSS, so the 3 GB cpmove decompressing into tmpfs hit the limit at ~7s into tar and the kernel killed the bash process running extract.sh. Fix is structural, not a memory bump: the disk-backed bind mount at /host/sanitized (mapped to /var/lib/whp/cpanel-importer-extract on host) has effectively unlimited capacity and doesn't count against the cgroup memory limit. Moving the working dirs there sidesteps the OOM class entirely. Layout change: EXTRACT_DIR /tmp/extract -> $SANITIZED_DIR/extract-work WORK_DIR /tmp/sanitized -> $SANITIZED_DIR/work Two ripple changes: - The old rsync_out stage cross-filesystem-copied ~10 GB from tmpfs to /host/sanitized/<id>/extracted. That's now a same-filesystem `mv` (constant-time rename) since extract-work IS already inside /host/sanitized/<id>/. Stage renamed to finalize_layout for clarity; pre-existing wipe of extracted/ + mysql/ guards against partial-run residue. - The stripped-symlinks actions sidecar moved to /tmp explicitly (entrypoint.sh passes the 4th arg to extract.sh) so finalize's rename doesn't (a) carry a dotfile into the cleaned tree the panel imports and (b) move it out from under write_report's read. Also fixes the unrelated-but-cosmetic freshclam warning by cd'ing to /var/lib/clamav (the configured DatabaseDirectory, tmpfs writable) before invoking freshclam in a subshell. The "Can't create freshclam.dat in /opt/whp" errors were because /opt/whp is the container WORKDIR which lives on the read-only rootfs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 11:29:31 -07:00
if ! ( cd /var/lib/clamav && freshclam --no-warnings >/tmp/freshclam.log 2>&1 ); then
Initial bootstrap: cpanel-importer sanitization sandbox Skeleton for the cpanel-importer Docker container — a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design. What this ships in v1.0: - Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag. - scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json. - scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found. - scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry — every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1. - scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set. - scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one. - .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer. - tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing. - README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature. Local acceptance test results: - clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0 - ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr - compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00
log "WARN: freshclam failed; proceeding with build-time signature DB"
tail -20 /tmp/freshclam.log || true
fi
else
log "CLAMAV_REFRESH=false; skipping freshclam"
fi
# --- extract the cpmove ----------------------------------------------------
STAGE="extract"
log "stage: extract"
fix: move EXTRACT_DIR + WORK_DIR off tmpfs onto disk-backed bind mount rc=137 OOM kill triaged on whp02 darkside import. dmesg confirmed: memory: usage 2097100kB, limit 2097152kB, failcnt 132 oom_kill_process ... task=bash uid=999 Root cause: extract.sh untars the cpmove into EXTRACT_DIR which was /tmp/extract — a tmpfs mount (RAM-backed). The container's --memory 2g cgroup ceiling counts tmpfs writes against RSS, so the 3 GB cpmove decompressing into tmpfs hit the limit at ~7s into tar and the kernel killed the bash process running extract.sh. Fix is structural, not a memory bump: the disk-backed bind mount at /host/sanitized (mapped to /var/lib/whp/cpanel-importer-extract on host) has effectively unlimited capacity and doesn't count against the cgroup memory limit. Moving the working dirs there sidesteps the OOM class entirely. Layout change: EXTRACT_DIR /tmp/extract -> $SANITIZED_DIR/extract-work WORK_DIR /tmp/sanitized -> $SANITIZED_DIR/work Two ripple changes: - The old rsync_out stage cross-filesystem-copied ~10 GB from tmpfs to /host/sanitized/<id>/extracted. That's now a same-filesystem `mv` (constant-time rename) since extract-work IS already inside /host/sanitized/<id>/. Stage renamed to finalize_layout for clarity; pre-existing wipe of extracted/ + mysql/ guards against partial-run residue. - The stripped-symlinks actions sidecar moved to /tmp explicitly (entrypoint.sh passes the 4th arg to extract.sh) so finalize's rename doesn't (a) carry a dotfile into the cleaned tree the panel imports and (b) move it out from under write_report's read. Also fixes the unrelated-but-cosmetic freshclam warning by cd'ing to /var/lib/clamav (the configured DatabaseDirectory, tmpfs writable) before invoking freshclam in a subshell. The "Can't create freshclam.dat in /opt/whp" errors were because /opt/whp is the container WORKDIR which lives on the read-only rootfs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 11:29:31 -07:00
# 4th arg pins the stripped-symlinks actions sidecar to /tmp (not inside
# $EXTRACT_DIR) so finalize_layout's mv doesn't carry an importer dotfile
# into the cleaned tree and so write_report can read it after the rename.
STRIPPED_SYMLINKS_FILE="/tmp/stripped-symlinks.json"
if ! /scripts/extract.sh "$IMPORT_BACKUP_FILE" "$EXTRACT_DIR" "$IMPORT_USERNAME" "$STRIPPED_SYMLINKS_FILE"; then
Initial bootstrap: cpanel-importer sanitization sandbox Skeleton for the cpanel-importer Docker container — a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design. What this ships in v1.0: - Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag. - scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json. - scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found. - scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry — every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1. - scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set. - scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one. - .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer. - tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing. - README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature. Local acceptance test results: - clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0 - ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr - compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00
die "extract.sh failed; see stderr above"
fi
# --- ClamAV scan + auto-clean/quarantine ----------------------------------
STAGE="scan_files"
log "stage: scan_files"
php /scripts/scan-files.php \
--extract "$EXTRACT_DIR" \
--quarantine "$QUARANTINE_DIR" \
--report /tmp/scan-files-report.json \
--import-id "$IMPORT_ID" \
|| die "scan-files.php failed; see stderr above"
# --- DB engine swap + WP content scan -------------------------------------
STAGE="scan_dbs"
log "stage: scan_dbs"
php /scripts/scan-dbs.php \
--extract "$EXTRACT_DIR" \
--out "$WORK_DIR/mysql" \
--final-prefix "$SANITIZED_DIR/mysql" \
--report /tmp/scan-dbs-report.json \
--import-id "$IMPORT_ID" \
--username "$IMPORT_USERNAME" \
|| die "scan-dbs.php failed; see stderr above"
fix: move EXTRACT_DIR + WORK_DIR off tmpfs onto disk-backed bind mount rc=137 OOM kill triaged on whp02 darkside import. dmesg confirmed: memory: usage 2097100kB, limit 2097152kB, failcnt 132 oom_kill_process ... task=bash uid=999 Root cause: extract.sh untars the cpmove into EXTRACT_DIR which was /tmp/extract — a tmpfs mount (RAM-backed). The container's --memory 2g cgroup ceiling counts tmpfs writes against RSS, so the 3 GB cpmove decompressing into tmpfs hit the limit at ~7s into tar and the kernel killed the bash process running extract.sh. Fix is structural, not a memory bump: the disk-backed bind mount at /host/sanitized (mapped to /var/lib/whp/cpanel-importer-extract on host) has effectively unlimited capacity and doesn't count against the cgroup memory limit. Moving the working dirs there sidesteps the OOM class entirely. Layout change: EXTRACT_DIR /tmp/extract -> $SANITIZED_DIR/extract-work WORK_DIR /tmp/sanitized -> $SANITIZED_DIR/work Two ripple changes: - The old rsync_out stage cross-filesystem-copied ~10 GB from tmpfs to /host/sanitized/<id>/extracted. That's now a same-filesystem `mv` (constant-time rename) since extract-work IS already inside /host/sanitized/<id>/. Stage renamed to finalize_layout for clarity; pre-existing wipe of extracted/ + mysql/ guards against partial-run residue. - The stripped-symlinks actions sidecar moved to /tmp explicitly (entrypoint.sh passes the 4th arg to extract.sh) so finalize's rename doesn't (a) carry a dotfile into the cleaned tree the panel imports and (b) move it out from under write_report's read. Also fixes the unrelated-but-cosmetic freshclam warning by cd'ing to /var/lib/clamav (the configured DatabaseDirectory, tmpfs writable) before invoking freshclam in a subshell. The "Can't create freshclam.dat in /opt/whp" errors were because /opt/whp is the container WORKDIR which lives on the read-only rootfs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 11:29:31 -07:00
# --- finalize cleaned tree into /host/sanitized/<id>/ ---------------------
Initial bootstrap: cpanel-importer sanitization sandbox Skeleton for the cpanel-importer Docker container — a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design. What this ships in v1.0: - Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag. - scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json. - scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found. - scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry — every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1. - scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set. - scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one. - .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer. - tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing. - README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature. Local acceptance test results: - clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0 - ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr - compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00
fix: move EXTRACT_DIR + WORK_DIR off tmpfs onto disk-backed bind mount rc=137 OOM kill triaged on whp02 darkside import. dmesg confirmed: memory: usage 2097100kB, limit 2097152kB, failcnt 132 oom_kill_process ... task=bash uid=999 Root cause: extract.sh untars the cpmove into EXTRACT_DIR which was /tmp/extract — a tmpfs mount (RAM-backed). The container's --memory 2g cgroup ceiling counts tmpfs writes against RSS, so the 3 GB cpmove decompressing into tmpfs hit the limit at ~7s into tar and the kernel killed the bash process running extract.sh. Fix is structural, not a memory bump: the disk-backed bind mount at /host/sanitized (mapped to /var/lib/whp/cpanel-importer-extract on host) has effectively unlimited capacity and doesn't count against the cgroup memory limit. Moving the working dirs there sidesteps the OOM class entirely. Layout change: EXTRACT_DIR /tmp/extract -> $SANITIZED_DIR/extract-work WORK_DIR /tmp/sanitized -> $SANITIZED_DIR/work Two ripple changes: - The old rsync_out stage cross-filesystem-copied ~10 GB from tmpfs to /host/sanitized/<id>/extracted. That's now a same-filesystem `mv` (constant-time rename) since extract-work IS already inside /host/sanitized/<id>/. Stage renamed to finalize_layout for clarity; pre-existing wipe of extracted/ + mysql/ guards against partial-run residue. - The stripped-symlinks actions sidecar moved to /tmp explicitly (entrypoint.sh passes the 4th arg to extract.sh) so finalize's rename doesn't (a) carry a dotfile into the cleaned tree the panel imports and (b) move it out from under write_report's read. Also fixes the unrelated-but-cosmetic freshclam warning by cd'ing to /var/lib/clamav (the configured DatabaseDirectory, tmpfs writable) before invoking freshclam in a subshell. The "Can't create freshclam.dat in /opt/whp" errors were because /opt/whp is the container WORKDIR which lives on the read-only rootfs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 11:29:31 -07:00
STAGE="finalize_layout"
log "stage: finalize_layout"
# Both EXTRACT_DIR and WORK_DIR already live INSIDE $SANITIZED_DIR (the
# bind-mounted disk-backed output root), so we don't need to cross-filesystem
# rsync 10GB+ of cleaned files. A same-filesystem `mv` is constant-time
# (just a rename) — turns what used to be a multi-minute rsync into a
# fraction of a second.
#
# Cleanup posture: if a previous run partially populated `extracted/` or
# `mysql/`, we wipe them first so the rename can't fail with EEXIST. The
# container's --read-only rootfs makes accidentally removing the wrong
# path impossible — these are under the per-import bind mount only.
rm -rf "$SANITIZED_DIR/extracted" "$SANITIZED_DIR/mysql"
mv "$EXTRACT_DIR" "$SANITIZED_DIR/extracted" || die "finalize: rename extract-work failed"
mv "$WORK_DIR/mysql" "$SANITIZED_DIR/mysql" || die "finalize: rename work/mysql failed"
# Tidy up the now-empty WORK_DIR shell.
rmdir "$WORK_DIR" 2>/dev/null || true
Initial bootstrap: cpanel-importer sanitization sandbox Skeleton for the cpanel-importer Docker container — a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design. What this ships in v1.0: - Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag. - scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json. - scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found. - scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry — every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1. - scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set. - scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one. - .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer. - tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing. - README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature. Local acceptance test results: - clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0 - ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr - compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00
# --- merge per-stage reports into the final report.json -------------------
STAGE="write_report"
log "stage: write_report"
DURATION=$(( $(date -u +%s) - START_TS ))
php -r '
sanitize-dont-refuse: strip dangerous symlinks via tar --exclude Shifts the sandbox's symlink handling from "refuse the whole tarball" to "drop the dangerous entries from extraction and record them as quarantine actions". This is what sandbox mode is supposed to do — make malicious cpmoves safe to import rather than gate-keeping them. Three coordinated changes: 1. scan-symlinks.php — exit 0 even when DANGEROUS findings exist. The JSON report is the source of truth; the caller decides what to do with it. Usage/IO errors still exit 2. STDERR still names each finding (now "STRIP X -> Y" instead of "refusing tarball") so the streamed [container] log on the panel side surfaces them. 2. extract.sh — reads the scan-symlinks report, builds a newline-delimited exclude list of DANGEROUS archive_paths, and passes it to `tar --exclude-from=`. The stripped entries never reach the filesystem; tar skips them silently. Also writes a small JSON sidecar at $EXTRACT_DIR/.cpanel-importer-stripped-symlinks.json describing each strip-action so the merge step can surface them in report.json without re-parsing scan-symlinks output. 3. entrypoint.sh write_report — reads the sidecar, prepends each stripped_dangerous_symlink action to the actions[] list, bumps files_quarantined by the strip-count, and rewrites summary_for_panel.alert_message to call them out distinctly: "N dangerous symlink(s) stripped during extract; M files quarantined; K cleaned in place. Customer site may have been compromised at the source — recommend review." Result on darkside: instead of the import failing on the ALFA alfasymlink/root entry, that entry is silently skipped during extract, recorded as `stripped_dangerous_symlink path=... target=/ reason=absolute target is root /`, and the rest of the tarball extracts normally. Subsequent ClamAV scan + DB sanitization run to completion; panel sees a verdict-completed import with the stripped symlinks visible in the Sanitization Sandbox panel on the results page. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 11:13:57 -07:00
$importId = $argv[1];
$duration = (int) $argv[2];
$filesPath = $argv[3];
$dbsPath = $argv[4];
$strippedPath = $argv[5];
$outPath = $argv[6];
Initial bootstrap: cpanel-importer sanitization sandbox Skeleton for the cpanel-importer Docker container — a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design. What this ships in v1.0: - Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag. - scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json. - scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found. - scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry — every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1. - scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set. - scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one. - .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer. - tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing. - README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature. Local acceptance test results: - clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0 - ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr - compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00
sanitize-dont-refuse: strip dangerous symlinks via tar --exclude Shifts the sandbox's symlink handling from "refuse the whole tarball" to "drop the dangerous entries from extraction and record them as quarantine actions". This is what sandbox mode is supposed to do — make malicious cpmoves safe to import rather than gate-keeping them. Three coordinated changes: 1. scan-symlinks.php — exit 0 even when DANGEROUS findings exist. The JSON report is the source of truth; the caller decides what to do with it. Usage/IO errors still exit 2. STDERR still names each finding (now "STRIP X -> Y" instead of "refusing tarball") so the streamed [container] log on the panel side surfaces them. 2. extract.sh — reads the scan-symlinks report, builds a newline-delimited exclude list of DANGEROUS archive_paths, and passes it to `tar --exclude-from=`. The stripped entries never reach the filesystem; tar skips them silently. Also writes a small JSON sidecar at $EXTRACT_DIR/.cpanel-importer-stripped-symlinks.json describing each strip-action so the merge step can surface them in report.json without re-parsing scan-symlinks output. 3. entrypoint.sh write_report — reads the sidecar, prepends each stripped_dangerous_symlink action to the actions[] list, bumps files_quarantined by the strip-count, and rewrites summary_for_panel.alert_message to call them out distinctly: "N dangerous symlink(s) stripped during extract; M files quarantined; K cleaned in place. Customer site may have been compromised at the source — recommend review." Result on darkside: instead of the import failing on the ALFA alfasymlink/root entry, that entry is silently skipped during extract, recorded as `stripped_dangerous_symlink path=... target=/ reason=absolute target is root /`, and the rest of the tarball extracts normally. Subsequent ClamAV scan + DB sanitization run to completion; panel sees a verdict-completed import with the stripped symlinks visible in the Sanitization Sandbox panel on the results page. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 11:13:57 -07:00
$files = is_file($filesPath) ? json_decode(file_get_contents($filesPath), true) : null;
$dbs = is_file($dbsPath) ? json_decode(file_get_contents($dbsPath), true) : null;
$stripped = is_file($strippedPath) ? json_decode(file_get_contents($strippedPath), true) : null;
Initial bootstrap: cpanel-importer sanitization sandbox Skeleton for the cpanel-importer Docker container — a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design. What this ships in v1.0: - Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag. - scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json. - scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found. - scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry — every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1. - scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set. - scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one. - .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer. - tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing. - README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature. Local acceptance test results: - clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0 - ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr - compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00
$filesScanned = $files["files_scanned"] ?? 0;
$filesClean = $files["files_clean"] ?? 0;
$filesCleaned = $files["files_cleaned"] ?? 0;
$filesQuarantined = $files["files_quarantined"] ?? 0;
$actions = $files["actions"] ?? [];
$databases = $dbs["databases"] ?? [];
sanitize-dont-refuse: strip dangerous symlinks via tar --exclude Shifts the sandbox's symlink handling from "refuse the whole tarball" to "drop the dangerous entries from extraction and record them as quarantine actions". This is what sandbox mode is supposed to do — make malicious cpmoves safe to import rather than gate-keeping them. Three coordinated changes: 1. scan-symlinks.php — exit 0 even when DANGEROUS findings exist. The JSON report is the source of truth; the caller decides what to do with it. Usage/IO errors still exit 2. STDERR still names each finding (now "STRIP X -> Y" instead of "refusing tarball") so the streamed [container] log on the panel side surfaces them. 2. extract.sh — reads the scan-symlinks report, builds a newline-delimited exclude list of DANGEROUS archive_paths, and passes it to `tar --exclude-from=`. The stripped entries never reach the filesystem; tar skips them silently. Also writes a small JSON sidecar at $EXTRACT_DIR/.cpanel-importer-stripped-symlinks.json describing each strip-action so the merge step can surface them in report.json without re-parsing scan-symlinks output. 3. entrypoint.sh write_report — reads the sidecar, prepends each stripped_dangerous_symlink action to the actions[] list, bumps files_quarantined by the strip-count, and rewrites summary_for_panel.alert_message to call them out distinctly: "N dangerous symlink(s) stripped during extract; M files quarantined; K cleaned in place. Customer site may have been compromised at the source — recommend review." Result on darkside: instead of the import failing on the ALFA alfasymlink/root entry, that entry is silently skipped during extract, recorded as `stripped_dangerous_symlink path=... target=/ reason=absolute target is root /`, and the rest of the tarball extracts normally. Subsequent ClamAV scan + DB sanitization run to completion; panel sees a verdict-completed import with the stripped symlinks visible in the Sanitization Sandbox panel on the results page. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 11:13:57 -07:00
// Prepend the stripped-symlinks actions from extract.sh so the operator
// sees them at the top of the actions[] table on the results page. Bumps
// files_quarantined because the strip-action is morally equivalent to a
// quarantine - the entry was not extracted, the symlink file is "in the
// archive but absent from the cleaned tree".
$strippedActions = $stripped["actions"] ?? [];
$strippedCount = count($strippedActions);
if ($strippedCount > 0) {
$actions = array_merge($strippedActions, $actions);
$filesQuarantined += $strippedCount;
}
Initial bootstrap: cpanel-importer sanitization sandbox Skeleton for the cpanel-importer Docker container — a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design. What this ships in v1.0: - Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag. - scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json. - scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found. - scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry — every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1. - scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set. - scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one. - .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer. - tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing. - README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature. Local acceptance test results: - clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0 - ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr - compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00
$dbRefused = 0;
foreach ($databases as $db) {
if (($db["imported_into_new_server"] ?? true) === false) $dbRefused++;
}
$severity = "info";
$alert = false;
$msg = "Sanitization clean: no malware signatures detected.";
sanitize-dont-refuse: strip dangerous symlinks via tar --exclude Shifts the sandbox's symlink handling from "refuse the whole tarball" to "drop the dangerous entries from extraction and record them as quarantine actions". This is what sandbox mode is supposed to do — make malicious cpmoves safe to import rather than gate-keeping them. Three coordinated changes: 1. scan-symlinks.php — exit 0 even when DANGEROUS findings exist. The JSON report is the source of truth; the caller decides what to do with it. Usage/IO errors still exit 2. STDERR still names each finding (now "STRIP X -> Y" instead of "refusing tarball") so the streamed [container] log on the panel side surfaces them. 2. extract.sh — reads the scan-symlinks report, builds a newline-delimited exclude list of DANGEROUS archive_paths, and passes it to `tar --exclude-from=`. The stripped entries never reach the filesystem; tar skips them silently. Also writes a small JSON sidecar at $EXTRACT_DIR/.cpanel-importer-stripped-symlinks.json describing each strip-action so the merge step can surface them in report.json without re-parsing scan-symlinks output. 3. entrypoint.sh write_report — reads the sidecar, prepends each stripped_dangerous_symlink action to the actions[] list, bumps files_quarantined by the strip-count, and rewrites summary_for_panel.alert_message to call them out distinctly: "N dangerous symlink(s) stripped during extract; M files quarantined; K cleaned in place. Customer site may have been compromised at the source — recommend review." Result on darkside: instead of the import failing on the ALFA alfasymlink/root entry, that entry is silently skipped during extract, recorded as `stripped_dangerous_symlink path=... target=/ reason=absolute target is root /`, and the rest of the tarball extracts normally. Subsequent ClamAV scan + DB sanitization run to completion; panel sees a verdict-completed import with the stripped symlinks visible in the Sanitization Sandbox panel on the results page. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 11:13:57 -07:00
if ($filesQuarantined > 0 || $dbRefused > 0 || $strippedCount > 0) {
Initial bootstrap: cpanel-importer sanitization sandbox Skeleton for the cpanel-importer Docker container — a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design. What this ships in v1.0: - Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag. - scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json. - scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found. - scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry — every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1. - scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set. - scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one. - .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer. - tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing. - README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature. Local acceptance test results: - clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0 - ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr - compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00
$alert = true;
sanitize-dont-refuse: strip dangerous symlinks via tar --exclude Shifts the sandbox's symlink handling from "refuse the whole tarball" to "drop the dangerous entries from extraction and record them as quarantine actions". This is what sandbox mode is supposed to do — make malicious cpmoves safe to import rather than gate-keeping them. Three coordinated changes: 1. scan-symlinks.php — exit 0 even when DANGEROUS findings exist. The JSON report is the source of truth; the caller decides what to do with it. Usage/IO errors still exit 2. STDERR still names each finding (now "STRIP X -> Y" instead of "refusing tarball") so the streamed [container] log on the panel side surfaces them. 2. extract.sh — reads the scan-symlinks report, builds a newline-delimited exclude list of DANGEROUS archive_paths, and passes it to `tar --exclude-from=`. The stripped entries never reach the filesystem; tar skips them silently. Also writes a small JSON sidecar at $EXTRACT_DIR/.cpanel-importer-stripped-symlinks.json describing each strip-action so the merge step can surface them in report.json without re-parsing scan-symlinks output. 3. entrypoint.sh write_report — reads the sidecar, prepends each stripped_dangerous_symlink action to the actions[] list, bumps files_quarantined by the strip-count, and rewrites summary_for_panel.alert_message to call them out distinctly: "N dangerous symlink(s) stripped during extract; M files quarantined; K cleaned in place. Customer site may have been compromised at the source — recommend review." Result on darkside: instead of the import failing on the ALFA alfasymlink/root entry, that entry is silently skipped during extract, recorded as `stripped_dangerous_symlink path=... target=/ reason=absolute target is root /`, and the rest of the tarball extracts normally. Subsequent ClamAV scan + DB sanitization run to completion; panel sees a verdict-completed import with the stripped symlinks visible in the Sanitization Sandbox panel on the results page. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 11:13:57 -07:00
$severity = ($filesQuarantined > 50 || $dbRefused > 0 || $strippedCount > 0) ? "warning" : "info";
$parts = [];
if ($strippedCount > 0) {
$parts[] = sprintf("%d dangerous symlink(s) stripped during extract", $strippedCount);
}
if ($filesQuarantined - $strippedCount > 0) {
$parts[] = sprintf("%d files quarantined", $filesQuarantined - $strippedCount);
}
if ($filesCleaned > 0) {
$parts[] = sprintf("%d cleaned in place", $filesCleaned);
}
if ($dbRefused > 0) {
$parts[] = sprintf("%d database(s) refused as compromised", $dbRefused);
}
$msg = implode("; ", $parts)
. ". Customer site may have been compromised at the source — recommend review.";
Initial bootstrap: cpanel-importer sanitization sandbox Skeleton for the cpanel-importer Docker container — a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design. What this ships in v1.0: - Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag. - scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json. - scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found. - scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry — every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1. - scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set. - scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one. - .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer. - tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing. - README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature. Local acceptance test results: - clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0 - ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr - compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00
}
$report = [
"import_id" => $importId,
"status" => "completed",
"scan_duration_seconds" => $duration,
"files_scanned" => $filesScanned,
"files_clean" => $filesClean,
"files_cleaned" => $filesCleaned,
"files_quarantined" => $filesQuarantined,
"actions" => $actions,
"databases" => $databases,
"summary_for_panel" => [
"show_alert" => $alert,
"alert_severity" => $severity,
"alert_message" => $msg,
],
];
file_put_contents($outPath, json_encode($report, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES) . "\n");
fprintf(STDERR, "report written: %s\n", $outPath);
sanitize-dont-refuse: strip dangerous symlinks via tar --exclude Shifts the sandbox's symlink handling from "refuse the whole tarball" to "drop the dangerous entries from extraction and record them as quarantine actions". This is what sandbox mode is supposed to do — make malicious cpmoves safe to import rather than gate-keeping them. Three coordinated changes: 1. scan-symlinks.php — exit 0 even when DANGEROUS findings exist. The JSON report is the source of truth; the caller decides what to do with it. Usage/IO errors still exit 2. STDERR still names each finding (now "STRIP X -> Y" instead of "refusing tarball") so the streamed [container] log on the panel side surfaces them. 2. extract.sh — reads the scan-symlinks report, builds a newline-delimited exclude list of DANGEROUS archive_paths, and passes it to `tar --exclude-from=`. The stripped entries never reach the filesystem; tar skips them silently. Also writes a small JSON sidecar at $EXTRACT_DIR/.cpanel-importer-stripped-symlinks.json describing each strip-action so the merge step can surface them in report.json without re-parsing scan-symlinks output. 3. entrypoint.sh write_report — reads the sidecar, prepends each stripped_dangerous_symlink action to the actions[] list, bumps files_quarantined by the strip-count, and rewrites summary_for_panel.alert_message to call them out distinctly: "N dangerous symlink(s) stripped during extract; M files quarantined; K cleaned in place. Customer site may have been compromised at the source — recommend review." Result on darkside: instead of the import failing on the ALFA alfasymlink/root entry, that entry is silently skipped during extract, recorded as `stripped_dangerous_symlink path=... target=/ reason=absolute target is root /`, and the rest of the tarball extracts normally. Subsequent ClamAV scan + DB sanitization run to completion; panel sees a verdict-completed import with the stripped symlinks visible in the Sanitization Sandbox panel on the results page. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 11:13:57 -07:00
' "$IMPORT_ID" "$DURATION" /tmp/scan-files-report.json /tmp/scan-dbs-report.json "$STRIPPED_SYMLINKS_FILE" "$SANITIZED_DIR/report.json" \
Initial bootstrap: cpanel-importer sanitization sandbox Skeleton for the cpanel-importer Docker container — a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design. What this ships in v1.0: - Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag. - scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json. - scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found. - scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry — every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1. - scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set. - scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one. - .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer. - tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing. - README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature. Local acceptance test results: - clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0 - ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr - compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00
|| die "report merge failed"
log "done — exited cleanly after ${DURATION}s"
exit 0