Skeleton for the cpanel-importer Docker container: a one-shot sandbox the WHP panel invokes BEFORE extracting a customer cpmove tarball. See cpanel-import-container-spec.md (in /workspace/) for the full design.

What this ships in v1.0:

- Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 + SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999 (whp-import) via the panel-side --user 999:999 flag.
- scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives extract -> scan-files -> scan-dbs -> rsync -> report.json.
- scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink scan ported standalone from web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3 before tar runs if any DANGEROUS symlink is found.
- scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with an empty cleaner registry, so every hit is QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1.
- scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied), WordPress identification, and ONE WP content scan check (siteurl_external_domain). v1.1 will grow the check set.
- scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much tighter than the panel-side one.
- .gitea/workflows/build-push.yaml: builds + smoke-tests + PHP-syntax-checks + bash-syntax-checks before pushing to repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer.
- tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local end-to-end testing.
- README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog, report.json schema, how to add a cleaner pattern or a WP scan signature.

Local acceptance test results:

- clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, exit 0
- ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball contains dangerous symlinks; aborting" on stderr
- compromised-siteurl fixture -> imported_into_new_server=false, .flagged file written, summary_for_panel.show_alert=true

Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Contributing — cpanel-importer

## How to add an auto-cleaner pattern

Auto-cleaners live in `scripts/scan-files.php`, in the `$cleaners`
registry at the top of the main flow.

A cleaner has three parts: a `class`, a `match` predicate on the
signature name, and a `clean` callback:

```php
$cleaners['short-cleaner-name'] = [
    'class' => 'KNOWN_REMOVABLE',   // or 'REMOVABLE_WITH_BACKUP'
    'match' => fn(string $sig): bool => str_contains($sig, 'PHP.Trojan.EvalB64'),
    'clean' => function (string $path): bool {
        // Read $path, transform, write back; return true on success.
        // The file at $path is the LIVE extracted file: your edit
        // here is what ends up in /host/sanitized/<id>/extracted/.
        // The original has ALREADY been backed up to <path>.original
        // by the orchestrator before this is called.
        return false; // placeholder: false makes the orchestrator fall back to quarantine
    },
];
```

### Safety checklist before merging a new cleaner

1. **Backup is guaranteed.** The orchestrator copies the file to
   `<quarantine>/<relpath>.original` BEFORE calling `clean()`. Verify
   this is still true in `scan-files.php` if you refactor the dispatch.
2. **Cleaner is idempotent.** Running it a second time on its own
   output must leave the file unchanged.
3. **Cleaner is conservative.** If the file does NOT match your
   transform exactly, return `false` (the orchestrator will fall back
   to quarantining). Never "best-effort" a half-clean.
4. **Cleaner has a regression test.** Add a fixture under
   `tests/fixtures/cleaner-<name>/` with input + expected output, and
   exercise it from `tests/run-tests.sh` (or your CI step).
5. **Cleaner classification is correct.**
   - `KNOWN_REMOVABLE` = the whole pattern is known-safe to strip.
   - `REMOVABLE_WITH_BACKUP` = legit file with injected lines; we are
     confident in surgical removal but back up anyway.
   - `QUARANTINE_ONLY` = no clean variant; don't write a `clean()`.
6. **Signature match is tight.** Prefer
   `str_contains($sig, 'specific-sig-name')` over broad regex matches.
   A false-positive cleaner can corrupt customer files.
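
Checklist items 2 and 4 can be checked mechanically. The following is a minimal sketch, not something that exists in the repository: `check_cleaner` is a hypothetical helper, and it assumes you have some command (or shell function) that applies one cleaner in place to the file passed as its last argument.

```bash
#!/usr/bin/env bash
set -euo pipefail

# check_cleaner INPUT EXPECTED CMD...   (hypothetical helper)
# Copies INPUT to a scratch file, applies CMD to it twice, and verifies:
#   1. the first pass produces exactly EXPECTED (regression test), and
#   2. the second pass changes nothing (idempotence).
# CMD must edit, in place, the file given as its last argument.
check_cleaner() {
    local input=$1 expected=$2
    shift 2
    local work
    work=$(mktemp)
    cp "$input" "$work"

    "$@" "$work"
    if ! cmp -s "$work" "$expected"; then
        echo "FAIL: first pass differs from expected output" >&2
        rm -f "$work"
        return 1
    fi

    "$@" "$work"
    if ! cmp -s "$work" "$expected"; then
        echo "FAIL: second pass changed the file (not idempotent)" >&2
        rm -f "$work"
        return 1
    fi

    rm -f "$work"
    echo "OK"
}
```

A fixture directory with an input file and an expected-output file is enough to drive it; the exact file names under `tests/fixtures/cleaner-<name>/` are up to you.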

### Manual test loop

```bash
docker build -t cpanel-importer:dev .
# Place a known-infected synthetic file under tests/fixtures/cleaner-X/in/
# Run scan-files.php directly against it:
docker run --rm \
    --entrypoint /scripts/scan-files.php \
    -v "$PWD/tests/fixtures/cleaner-X/in:/tmp/extract" \
    -v "$PWD/tests/fixtures/cleaner-X/quarantine:/host/quarantine" \
    cpanel-importer:dev \
    --extract /tmp/extract --quarantine /host/quarantine \
    --report /tmp/r.json --import-id test
```

---

## How to add a WordPress content scan signature

Scan checks live in `scripts/scan-dbs.php`, in `wp_content_scan()`.

Each check should append a flag array on hit:

```php
$flags[] = [
    'severity' => 'high',  // 'high' refuses the DB (per default threshold N=1);
                           // 'medium' / 'low' flag in the report but allow import
    'code'     => 'short_machine_readable_code',
    'details'  => 'Human-readable explanation including the matched value(s).',
];
```

### Safety checklist

1. **Severity reflects confidence.** Use `high` only when a false
   positive is acceptable for the customer (they re-import via the
   "import anyway" UI button). A misjudged severity translates
   directly into admin support tickets.
2. **Check is fast.** The whole `.sql` dump is in memory as a string;
   prefer `preg_match` on the raw string or a pre-built map (see
   `extract_wp_options()`) over re-parsing the full dump.
3. **Check is well-tested.** Add a fixture under
   `tests/fixtures/wp-scan-<code>/` with a synthetic dump that
   triggers the flag and one that does not.
4. **Allow-list awareness.** If the check compares a value against
   the customer's domain list, use
   `domain_in_allowlist($host, $allowedDomains)` so subdomain matches
   work consistently with the rest of the scanner.
5. **Don't break engine swap.** `wp_content_scan()` runs AFTER the
   engine swap on the same `$rewritten` string. Both your check and
   the engine swap must be tolerant of each other's output.
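
To make the subdomain point in item 4 concrete, here is a shell sketch of one plausible matching rule. It is an assumption about the semantics, not a port of the repository's `domain_in_allowlist()`: a host is allowed when it equals an allowed domain or ends in `.` plus an allowed domain. `host_in_allowlist` is a hypothetical name used only for this illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# host_in_allowlist HOST DOMAIN...   (hypothetical, illustrative only)
# Returns 0 when HOST equals one of DOMAIN... or is a subdomain of one.
# Comparison is case-insensitive. A bare suffix match such as
# "evilexample.com" against "example.com" is rejected, because the
# subdomain test requires a dot before the allowed domain.
host_in_allowlist() {
    local host=${1,,}
    shift
    local domain
    for domain in "$@"; do
        domain=${domain,,}
        if [[ $host == "$domain" || $host == *".$domain" ]]; then
            return 0
        fi
    done
    return 1
}
```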

---

## How to test locally

### Build the image

```bash
docker build -t cpanel-importer:dev .
```

Confirm the image is under the budget:

```bash
docker images cpanel-importer:dev --format '{{.Size}}'
```

Target: < 1 GB extracted (spec asks < 600 MB compressed for prod, but
local builds typically come in around 700–900 MB extracted including
ClamAV signature DBs).
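
If you want a pass/fail answer instead of reading the size by eye, the comparison can be scripted. This is a sketch under assumptions: `within_budget` and `image_size_bytes` are hypothetical helpers, 1 MB is taken as 1,000,000 bytes, and the size comes from `docker image inspect`, which reports bytes.

```bash
#!/usr/bin/env bash
set -euo pipefail

# within_budget SIZE_BYTES LIMIT_MB   (hypothetical helper)
# Returns 0 when SIZE_BYTES is strictly below LIMIT_MB,
# counting 1 MB as 1000 * 1000 bytes.
within_budget() {
    local size_bytes=$1 limit_mb=$2
    (( size_bytes < limit_mb * 1000 * 1000 ))
}

# image_size_bytes IMAGE   (hypothetical helper)
# Prints the local image size in bytes as reported by Docker.
image_size_bytes() {
    docker image inspect --format '{{.Size}}' "$1"
}

# Example (needs a Docker daemon and a built image):
#   within_budget "$(image_size_bytes cpanel-importer:dev)" 1000 \
#       || echo 'image exceeds the 1 GB extracted target' >&2
```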

### Build the fixtures

```bash
bash tests/build-fixtures.sh
```

Two tarballs land under `tests/fixtures/`:

- `cpmove-clean.tar.gz`: a benign cpmove with a WordPress MyISAM dump.
- `cpmove-alfa.tar.gz`: same shape PLUS an ALFA-style symlink to /etc.

### Run against the clean fixture

```bash
mkdir -p /tmp/test-quarantine /tmp/test-sanitized
docker run --rm \
    -e IMPORT_ID=test-clean \
    -e IMPORT_USERNAME=testuser \
    -e IMPORT_BACKUP_FILE=/host/backup/cpmove-clean.tar.gz \
    -e CLAMAV_REFRESH=false \
    -v "$PWD/tests/fixtures/cpmove-clean.tar.gz:/host/backup/cpmove-clean.tar.gz:ro" \
    -v /tmp/test-quarantine:/host/quarantine \
    -v /tmp/test-sanitized:/host/sanitized \
    cpanel-importer:dev
```

Expect `status=completed`, MyISAM count > 0, no flags, exit 0.

### Run against the ALFA fixture

```bash
docker run --rm \
    -e IMPORT_ID=test-alfa \
    -e IMPORT_USERNAME=testuser \
    -e IMPORT_BACKUP_FILE=/host/backup/cpmove-alfa.tar.gz \
    -e CLAMAV_REFRESH=false \
    -v "$PWD/tests/fixtures/cpmove-alfa.tar.gz:/host/backup/cpmove-alfa.tar.gz:ro" \
    -v /tmp/test-quarantine:/host/quarantine \
    -v /tmp/test-sanitized:/host/sanitized \
    cpanel-importer:dev
```

Expect non-zero exit, `status=failed`, `failed_stage=extract`, and
stderr from inside the container containing
`tarball contains dangerous symlinks; aborting`.
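
The ALFA expectation (non-zero exit plus a specific stderr message) is easy to get wrong when checked by eye. A small wrapper can assert both at once; `expect_failure_with_stderr` below is a hypothetical helper sketched for this purpose, not part of the repository's test suite.

```bash
#!/usr/bin/env bash
set -euo pipefail

# expect_failure_with_stderr NEEDLE CMD...   (hypothetical helper)
# Runs CMD and succeeds only if CMD exits non-zero AND its stderr
# contains the fixed string NEEDLE. CMD's stdout is discarded.
expect_failure_with_stderr() {
    local needle=$1
    shift
    local err status=0
    # "2>&1 >/dev/null" captures stderr only: stderr is first pointed at
    # the capture pipe, then stdout is redirected away.
    err=$("$@" 2>&1 >/dev/null) || status=$?
    if (( status == 0 )); then
        echo "FAIL: command exited 0" >&2
        return 1
    fi
    if ! grep -qF -- "$needle" <<<"$err"; then
        echo "FAIL: stderr does not contain: $needle" >&2
        return 1
    fi
    echo "OK (exit $status)"
}
```

To use it, pass the message `'tarball contains dangerous symlinks; aborting'` as the first argument, followed by the same `docker run` command shown for the ALFA fixture.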

### Iterating on PHP / shell scripts

The `scripts/` directory is `COPY`ed in late in the Dockerfile, so
edits there only re-trigger the last layer of the build; typical
turnaround is ~5 seconds.

---

## Code style

- Bash scripts: `set -euo pipefail`, absolute paths only, every external
  command on its own logical line, comment each non-obvious flag.
- PHP scripts: 4-space indent, single quotes for non-interpolated
  strings, `<?php` opener on line 1, no closing `?>`.
- All scripts must be idempotent: the worker may be re-run against the
  same `IMPORT_ID` on retry; second runs must overwrite the prior
  `report.json` cleanly.
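
One common way to get a clean overwrite on retry is to write the new report to a temporary file in the same directory and rename it over the old one. Whether the repository's scripts do this is not stated here; `write_report` is a hypothetical helper that only illustrates the pattern.

```bash
#!/usr/bin/env bash
set -euo pipefail

# write_report DEST   (hypothetical helper)
# Reads the new report from stdin and replaces DEST in one step.
# The content is written to a temporary file next to DEST, then
# renamed over DEST, so a reader never sees a half-written file and
# a retry simply replaces the earlier report.
# Note: mktemp creates the file with mode 0600; add a chmod before the
# rename if another user must be able to read the report.
write_report() {
    local dest=$1
    local tmp
    tmp=$(mktemp "${dest}.XXXXXX")
    if ! cat >"$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    mv -f "$tmp" "$dest"
}

# Example:
#   printf '%s\n' '{"status":"completed"}' | write_report /tmp/report.json
```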

---

## CI

Pushes to `trunk` build the image and push it to
`repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer:latest` and
`...:<sha>`. Pushes of a `YYYY.MM.NNN` tag additionally tag
`...:YYYY.MM.NNN`. Before pushing, CI runs the smoke test (the image
starts and `echo ok` runs) and `php -l` / `bash -n` syntax checks on
every script.

See `.gitea/workflows/build-push.yaml`.
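
To run the same kind of syntax checks locally before pushing, a loop like the one below is enough. It is a sketch, not a copy of the workflow: `lint_scripts` is a hypothetical helper, and it assumes `php` is on your `PATH` when the directory contains `.php` files.

```bash
#!/usr/bin/env bash
set -euo pipefail

# lint_scripts DIR   (hypothetical helper)
# Runs `bash -n` on every *.sh file and `php -l` on every *.php file
# below DIR. Prints each failing path to stderr and returns non-zero
# if any file fails its syntax check.
lint_scripts() {
    local dir=$1 failed=0 file
    while IFS= read -r -d '' file; do
        case $file in
            *.sh)
                bash -n "$file" 2>/dev/null \
                    || { echo "bash -n failed: $file" >&2; failed=1; }
                ;;
            *.php)
                php -l "$file" >/dev/null 2>&1 \
                    || { echo "php -l failed: $file" >&2; failed=1; }
                ;;
        esac
    done < <(find "$dir" -type f \( -name '*.sh' -o -name '*.php' \) -print0)
    return "$failed"
}

# Example:
#   lint_scripts scripts
```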