Claude (bootstrap) 5487dfc8f1 Initial bootstrap: cpanel-importer sanitization sandbox
Skeleton for the cpanel-importer Docker container — a one-shot
sandbox the WHP panel invokes BEFORE extracting a customer cpmove
tarball. See cpanel-import-container-spec.md (in /workspace/) for the
full design.

What this ships in v1.0:

- Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 +
  SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as
  UID 999 (whp-import) via the panel-side --user 999:999 flag.

- scripts/entrypoint.sh: validates env, runs (optional) freshclam,
  drives extract -> scan-files -> scan-dbs -> rsync -> report.json.

- scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract
  symlink scan ported standalone from
  web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29
  whp02 destruction-vector fix). Aborts with exit 3 before tar runs
  if any DANGEROUS symlink is found.

- scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0
  ships with an empty cleaner registry — every hit is
  QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1.

- scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always
  applied), WordPress identification, and ONE WP content scan check
  (siteurl_external_domain). v1.1 will grow the check set.

- scripts/lib/safety-net.php: container-narrow open_basedir
  allow-list, much tighter than the panel-side one.

- .gitea/workflows/build-push.yaml: builds + smoke-tests +
  PHP-syntax-checks + bash-syntax-checks before pushing to
  repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer.

- tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP
  dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc
  vector) for local end-to-end testing.

- README.md / CONTRIBUTING.md: docker-run invocation, bind-mount
  catalog, report.json schema, how to add a cleaner pattern or a WP
  scan signature.

Local acceptance test results:
- clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0
- ALFA fixture -> exit 1, status=failed, failed_stage=extract,
  "tarball contains dangerous symlinks; aborting" on stderr
- compromised-siteurl fixture -> imported_into_new_server=false,
  .flagged file written, summary_for_panel.show_alert=true

Image size: 197 MB compressed (gzipped docker save), ~397 MB unique
layers extracted. Well under the spec's 600 MB compressed / 1.2 GB
extracted budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00

# cpanel-importer

A **sanitization sandbox** for cPanel `cpmove` tarballs, run as a one-shot
Docker container before WHP imports a customer site.

It is **not** a full importer. The container:

1. extracts the cpmove tarball into a tmpfs scratch dir (after a
   pre-extract symlink scan),
2. runs ClamAV (with SaneSecurity PHP-malware rules) over every file,
   quarantining hits,
3. rewrites `ENGINE=MyISAM` → `ENGINE=InnoDB` in every `.sql` dump,
4. runs a WordPress content scan on each WP dump and refuses dumps with
   high-confidence malware signals (e.g. `siteurl` pointing at a
   non-customer domain),
5. rsyncs the cleaned tree to `/host/sanitized/<importid>/`,
6. emits a JSON report describing every action taken.

The WHP panel reads `/host/sanitized/<importid>/report.json` after the
container exits and hands the cleaned files off to the existing
`CpanelBackupImporter` flow (Linux-user create, MySQL DB create, file
rsync, DNS push, container provision, etc.).
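
The engine rewrite in step 3 can be illustrated with a small shell sketch. This is not the logic in `scripts/scan-dbs.php`; the function names `rewrite_engine` and `count_myisam` are hypothetical, and the sketch assumes `mysqldump`-style output where each table definition closes with a line beginning `) ENGINE=...`. Anchoring on that line start is a deliberate choice here: an unanchored substitution would also rewrite the same text if it happened to appear inside row data.

```bash
# Hypothetical helper (not the real scan-dbs.php code): read a SQL dump on
# stdin and write it to stdout with MyISAM table definitions switched to
# InnoDB. Only lines that start with ") ENGINE=MyISAM" are touched.
rewrite_engine() {
  sed -E 's/^(\) ENGINE=)MyISAM/\1InnoDB/'
}

# Count the table definitions that still declare MyISAM in the given file.
# grep -c exits non-zero when the count is 0, so the status is masked.
count_myisam() {
  grep -c '^) ENGINE=MyISAM' "$1" || true
}

# Usage (file names are placeholders):
#   count_myisam dump.sql
#   rewrite_engine < dump.sql > dump.innodb.sql
```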

**Full design:** `/workspace/cpanel-import-container-spec.md` (also
checked in at `docs/cpanel-import-container-spec.md` when this repo is
mirrored to the panel).

**Panel-side glue:** `/workspace/whp/web-files/libs/CpanelBackupImporter.php`,
`web-files/api/cpanel-import-ajax.php`, and
`web-files/pages/cpanel-import-results.php`.

---

## How the panel invokes it

```bash
docker run \
  --rm \
  --name "whp-cpanel-import-${IMPORT_ID}" \
  --network client-net \
  --user 999:999 \
  --cap-drop=ALL \
  --security-opt=no-new-privileges \
  --read-only \
  --tmpfs /tmp:rw,nosuid,nodev,exec,size=4g \
  --tmpfs /var/lib/clamav:rw,nosuid,nodev,size=512m \
  --volume "/docker/users/${USERNAME}/userfiles/${BACKUP_NAME}:/host/backup/${BACKUP_NAME}:ro" \
  --volume "/docker/users/${USERNAME}/.cpanel-import-quarantine:/host/quarantine:rw" \
  --volume "/docker/users/${USERNAME}/.cpanel-import-sanitized:/host/sanitized:rw" \
  --env "IMPORT_ID=${IMPORT_ID}" \
  --env "IMPORT_USERNAME=${USERNAME}" \
  --env "IMPORT_BACKUP_FILE=/host/backup/${BACKUP_NAME}" \
  --env CLAMAV_REFRESH=true \
  --memory=4g \
  --memory-swap=4g \
  --cpus=2 \
  --pull=missing \
  repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer:2026.05.NNN
```
Container exits with status `0` on success, non-zero on any failure
(missing/unreadable backup, dangerous symlink found, scanner error).
Even on failure, `/host/sanitized/<importid>/report.json` is written
with `"status": "failed"` and the failing stage.
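
A caller could combine the exit status with the `status` field of `report.json` along these lines. This is a rough sketch only: `report_status` and `import_decision` are hypothetical names, the three outcome labels are invented for illustration, and the `grep`-based field lookup stands in for a real JSON parser, which is what production code should use. The missing-report branch is a defensive assumption.

```bash
# Hypothetical helper: print the value of the first "status" key found in
# the report. Text matching is only good enough for a sketch.
report_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Given the container exit code and the path to report.json, print one of:
#   proceed  exit 0 and status "completed"
#   failed   the report says status "failed"
#   unknown  report missing, or exit code and report disagree
import_decision() {
  local exit_code=$1 report=$2 status
  if [ ! -r "$report" ]; then
    echo unknown
    return
  fi
  status=$(report_status "$report")
  if [ "$exit_code" -eq 0 ] && [ "$status" = "completed" ]; then
    echo proceed
  elif [ "$status" = "failed" ]; then
    echo failed
  else
    echo unknown
  fi
}

# Usage (paths are placeholders):
#   docker run ... ; rc=$?
#   import_decision "$rc" "/path/to/sanitized/<importid>/report.json"
```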

---

## Bind-mount catalog

| Host path | Container path | Mode | Purpose |
|---|---|---|---|
| `/docker/users/<user>/userfiles/<tarball>` | `/host/backup/<tarball>` | RO | the cpmove input |
| `/docker/users/<user>/.cpanel-import-quarantine/` | `/host/quarantine/` | RW | files moved here on ClamAV hit |
| `/docker/users/<user>/.cpanel-import-sanitized/` | `/host/sanitized/` | RW | cleaned output the panel reads, written under `<importid>/` |

Host paths not listed here are **not** visible to the container: no host
`/etc`, `/usr`, `/root`, or `/home`, and no `docker.sock`. The worker runs as
UID/GID 999 with `--cap-drop=ALL --read-only`.

---

## `report.json` schema

Written to `/host/sanitized/<importid>/report.json` at the end of every
run, success or failure.
### Success
```json
{
"import_id": "import_abc123",
"status": "completed",
"scan_duration_seconds": 143,
"files_scanned": 28471,
"files_clean": 28432,
"files_cleaned": 0,
"files_quarantined": 39,
"actions": [
{
"path": "cpmove-testuser/homedir/public_html/example.com/ALFA_DATA/index.php",
"signature": "PHP.Webshell.ALFA",
"action": "quarantined",
"cleaner": null,
"backup": "/host/quarantine/import_abc123/cpmove-testuser/homedir/public_html/example.com/ALFA_DATA/index.php"
}
],
"databases": [
{
"dbname": "testuser_wp",
"size_bytes": 5393199573,
"engine_changes": {
"myisam_to_innodb": 17,
"row_format_dynamic_applied": 0,
"fulltext_indexes_dropped": 0
},
"wp_content_scan": {
"is_wordpress": true,
"flags": [
{
"severity": "high",
"code": "siteurl_external_domain",
"details": "wp_options.siteurl = \"http://evil.tld\" — host 'evil.tld' not in allowed domain list (example.com)"
}
]
},
"imported_into_new_server": false,
"flagged_sql_path": "/host/sanitized/import_abc123/mysql/testuser_wp.sql.flagged"
}
],
"summary_for_panel": {
"show_alert": true,
"alert_severity": "warning",
"alert_message": "39 files quarantined + 0 cleaned in place; 1 database(s) refused as compromised. ..."
}
}
```
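The `siteurl_external_domain` flag in the example above compares the host of `wp_options.siteurl` against the customer's allowed domains. A minimal sketch of that comparison follows. It is not the check in `scripts/scan-dbs.php`: `siteurl_host` and `host_allowed` are hypothetical names, treating subdomains of an allowed domain as allowed is an assumption, and IPv6 literal hosts are not handled.

```bash
# Hypothetical helper: print the lowercased host part of a URL.
siteurl_host() {
  local url=$1
  url=${url#*://}        # drop the scheme, if any
  url=${url%%[/?#]*}     # drop path, query, and fragment
  url=${url##*@}         # drop userinfo, if any
  url=${url%%:*}         # drop the port, if any
  printf '%s\n' "$url" | tr '[:upper:]' '[:lower:]'
}

# Succeed if the host equals one of the allowed domains or is a subdomain
# of one (the subdomain rule is an assumption made for this sketch).
host_allowed() {
  local host=$1 domain
  shift
  for domain in "$@"; do
    case "$host" in
      "$domain" | *."$domain") return 0 ;;
    esac
  done
  return 1
}

# Usage:
#   host_allowed "$(siteurl_host 'http://evil.tld')" example.com \
#     || echo "flag: siteurl_external_domain"
```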
### Failure
```json
{
"import_id": "import_abc123",
"status": "failed",
"failed_stage": "extract",
"error": "scan-symlinks.php exited non-zero — tarball contains DANGEROUS symlinks",
"scan_duration_seconds": 4,
"files": null,
"databases": null
}
```
`failed_stage` is one of: `validate_env`, `freshclam`, `extract`,
`scan_files`, `scan_dbs`, `rsync_out`, `write_report`.
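
A consumer that branches on `failed_stage` may want to guard against values outside this list. The helper below is a small sketch with a hypothetical name, `is_known_stage`; it only encodes the seven stage names listed above.

```bash
# Hypothetical helper: succeed only for the documented failed_stage values.
is_known_stage() {
  case "$1" in
    validate_env|freshclam|extract|scan_files|scan_dbs|rsync_out|write_report)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Usage:
#   is_known_stage "$stage" || echo "unexpected failed_stage: $stage" >&2
```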

---

## Local development

```bash
# Build the image
docker build -t cpanel-importer:dev .

# Build the synthetic fixture tarballs
bash tests/build-fixtures.sh

# Run against the clean fixture
mkdir -p /tmp/test-quarantine /tmp/test-sanitized
docker run --rm \
  -e IMPORT_ID=test \
  -e IMPORT_USERNAME=testuser \
  -e IMPORT_BACKUP_FILE=/host/backup/cpmove-clean.tar.gz \
  -e CLAMAV_REFRESH=false \
  -v "$(pwd)/tests/fixtures/cpmove-clean.tar.gz:/host/backup/cpmove-clean.tar.gz:ro" \
  -v /tmp/test-quarantine:/host/quarantine \
  -v /tmp/test-sanitized:/host/sanitized \
  cpanel-importer:dev
cat /tmp/test-sanitized/test/report.json

# Run against the ALFA-symlink fixture. It must exit non-zero with a
# "dangerous symlinks" message, and report.json should have
# status=failed, failed_stage=extract.
docker run --rm \
  -e IMPORT_ID=test-alfa \
  -e IMPORT_USERNAME=testuser \
  -e IMPORT_BACKUP_FILE=/host/backup/cpmove-alfa.tar.gz \
  -e CLAMAV_REFRESH=false \
  -v "$(pwd)/tests/fixtures/cpmove-alfa.tar.gz:/host/backup/cpmove-alfa.tar.gz:ro" \
  -v /tmp/test-quarantine:/host/quarantine \
  -v /tmp/test-sanitized:/host/sanitized \
  cpanel-importer:dev \
  && echo "BUG: should have exited non-zero" \
  || echo "OK: refused dangerous tarball"
cat /tmp/test-sanitized/test-alfa/report.json
```
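When debugging a refused tarball such as the ALFA fixture, it can help to list its symlink entries without extracting anything. The sketch below is not a port of `scan-symlinks.php`: `list_risky_symlinks` is a hypothetical name, and the rule it applies (absolute target, or a `..` path component in the target) is an assumption about what might count as dangerous. It relies on the `name -> target` form that GNU tar and bsdtar print in verbose listings, so a member name that itself contains ` -> ` would confuse it.

```bash
# Hypothetical helper: print the verbose listing line of every symlink in
# the tarball whose target is absolute or contains a ".." component.
# tar only lists the archive here; nothing is written to disk.
list_risky_symlinks() {
  tar -tvf "$1" | awk '
    /^l/ {
      arrow = index($0, " -> ")
      if (arrow == 0) next
      target = substr($0, arrow + 4)
      if (target ~ /^\// || target ~ /(^|\/)\.\.(\/|$)/) print
    }'
}

# Usage (the fixture path comes from tests/build-fixtures.sh):
#   if [ -n "$(list_risky_symlinks tests/fixtures/cpmove-alfa.tar.gz)" ]; then
#     echo "tarball contains risky symlinks" >&2
#   fi
```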
---
## What is in this v1.0 vs. what is stubbed for v1.1+
| Feature | v1.0 | v1.1 |
|---|---|---|
| Pre-extract symlink scan | full port of `scanTarballForDangerousSymlinks` | |
| Hardened tar extract | yes | |
| ClamAV + SaneSecurity Foxhole.PHP rules | yes | |
| File classification | quarantine-on-every-hit | KNOWN_REMOVABLE + REMOVABLE_WITH_BACKUP cleaners |
| MyISAM → InnoDB rewrite | yes | |
| WP identification | yes (wp_options + wp_posts + wp_users + sentinel) | |
| WP content scan | siteurl_external_domain only | post_content script-injection, theme/stylesheet malware patterns, user_pass leaked-hash, Wordfence regex |
| ROW_FORMAT=DYNAMIC, FULLTEXT drop | stubbed (always 0) | yes |
| Sandboxed MariaDB-in-container for SQL transforms | not present (regex transforms only) | yes |

See `CONTRIBUTING.md` for how to add a cleaner pattern or a new WP scan
signature.

---

## References

- Spec: `/workspace/cpanel-import-container-spec.md`
- Panel-side importer: `/workspace/whp/web-files/libs/CpanelBackupImporter.php`
- WHP panel `safety-net.php`: `/workspace/whp/web-files/includes/safety-net.php`
- Existing CI workflow for sibling project: `/workspace/cloud-apache-container/.gitea/workflows/build-push.yaml`