Initial bootstrap: cpanel-importer sanitization sandbox
Skeleton for the cpanel-importer Docker container: a one-shot sandbox the
WHP panel invokes BEFORE extracting a customer cpmove tarball. See
cpanel-import-container-spec.md (in /workspace/) for the full design.

What this ships in v1.0:

- Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 +
  SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as
  UID 999 (whp-import) via the panel-side --user 999:999 flag.
- scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives
  extract -> scan-files -> scan-dbs -> rsync -> report.json.
- scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink
  scan ported standalone from web-files/libs/CpanelBackupImporter.php (the
  existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3
  before tar runs if any DANGEROUS symlink is found.
- scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships
  with an empty cleaner registry, so every hit is QUARANTINE_ONLY. Cleaner
  hooks are stubbed for v1.1.
- scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied),
  WordPress identification, and ONE WP content scan check
  (siteurl_external_domain). v1.1 will grow the check set.
- scripts/lib/safety-net.php: container-narrow open_basedir allow-list,
  much tighter than the panel-side one.
- .gitea/workflows/build-push.yaml: builds + smoke-tests +
  PHP-syntax-checks + bash-syntax-checks before pushing to
  repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer.
- tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and
  cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local
  end-to-end testing.
- README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog,
  report.json schema, how to add a cleaner pattern or a WP scan signature.
Local acceptance test results:

- clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, exit 0
- ALFA fixture -> exit 1, status=failed, failed_stage=extract,
  "tarball contains dangerous symlinks; aborting" on stderr
- compromised-siteurl fixture -> imported_into_new_server=false,
  .flagged file written, summary_for_panel.show_alert=true

Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers
extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted
budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
192
CONTRIBUTING.md
Normal file
@@ -0,0 +1,192 @@
# Contributing — cpanel-importer

## How to add an auto-cleaner pattern

Auto-cleaners live in `scripts/scan-files.php`, in the `$cleaners`
registry at the top of the main flow.

A cleaner has three parts:

```php
$cleaners['short-cleaner-name'] = [
    'class' => 'KNOWN_REMOVABLE',  // or 'REMOVABLE_WITH_BACKUP'
    'match' => fn(string $sig): bool => str_contains($sig, 'PHP.Trojan.EvalB64'),
    'clean' => function (string $path): bool {
        // Read $path, transform, write back; return true on success.
        // The file at $path is the LIVE extracted file, so your edit
        // here is what ends up in /host/sanitized/<id>/extracted/.
        // The original has ALREADY been backed up to <path>.original
        // by the orchestrator before this is called.
        return false; // placeholder: falls back to quarantine until the transform exists
    },
];
```

### Safety checklist before merging a new cleaner

1. **Backup is guaranteed.** The orchestrator copies the file to
   `<quarantine>/<relpath>.original` BEFORE calling `clean()`. Verify
   this is still true in `scan-files.php` if you refactor the dispatch.
2. **Cleaner is idempotent.** Running it twice on the same file must
   produce the same output the second time as the first.
3. **Cleaner is conservative.** If the file does NOT match your
   transform exactly, return `false` (the orchestrator will fall back
   to quarantining). Never "best-effort" a half-clean.
4. **Cleaner has a regression test.** Add a fixture under
   `tests/fixtures/cleaner-<name>/` with input + expected output, and
   exercise it from `tests/run-tests.sh` (or your CI step).
5. **Cleaner classification is correct.**
   - `KNOWN_REMOVABLE` = the whole pattern is known-safe to strip.
   - `REMOVABLE_WITH_BACKUP` = legit file with injected lines; we are
     confident in surgical removal but back up anyway.
   - `QUARANTINE_ONLY` = no clean variant; don't write a `clean()`.
6. **Signature match is tight.** Prefer
   `str_contains($sig, 'specific-sig-name')` over broad regex matches.
   A false-positive cleaner can corrupt customer files.

### Manual test loop

```bash
docker build -t cpanel-importer:dev .
# Place a known-infected synthetic file under tests/fixtures/cleaner-X/in/
# Run scan-files.php directly against it:
docker run --rm \
  --entrypoint /scripts/scan-files.php \
  -v "$PWD/tests/fixtures/cleaner-X/in:/tmp/extract" \
  -v "$PWD/tests/fixtures/cleaner-X/quarantine:/host/quarantine" \
  cpanel-importer:dev \
  --extract /tmp/extract --quarantine /host/quarantine \
  --report /tmp/r.json --import-id test
```

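
Checklist items 2 and 4 (idempotency and a regression fixture) can also be checked automatically alongside the manual loop. The following is a minimal sketch, not code from this repository: `check_cleaner_fixture` is a hypothetical helper, and it assumes you have some command that applies a cleaner to a single file path (for example a small wrapper script, which the sources shown here do not ship). It runs that command twice on a scratch copy of the fixture input, then compares the first pass against the expected output and the second pass against the first.

```bash
#!/usr/bin/env bash
set -euo pipefail

# check_cleaner_fixture INPUT EXPECTED CMD [ARGS...]
# Hypothetical helper. Copies INPUT to a scratch file, runs CMD on the copy
# twice, and verifies that (a) the first pass matches EXPECTED and (b) the
# second pass changes nothing. INPUT itself is never modified.
check_cleaner_fixture() {
    local input="$1" expected="$2"
    shift 2
    local work first
    work="$(mktemp)"
    first="$(mktemp)"
    cp -- "$input" "$work"

    "$@" "$work"              # first pass: should produce the expected output
    cp -- "$work" "$first"    # snapshot the result of the first pass
    "$@" "$work"              # second pass: should be a no-op

    local rc=0
    cmp -s "$first" "$expected" || { echo "output differs from expected" >&2; rc=1; }
    cmp -s "$first" "$work"     || { echo "cleaner is not idempotent" >&2; rc=1; }
    rm -f -- "$work" "$first"
    return "$rc"
}
```
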
---

## How to add a WordPress content scan signature

Scan checks live in `scripts/scan-dbs.php`, in `wp_content_scan()`.

Each check should append an associative array to `$flags` on a hit:

```php
$flags[] = [
    'severity' => 'high',   // 'high' refuses the DB (per default threshold N=1)
                            // 'medium' / 'low' flag in the report but allow import
    'code'     => 'short_machine_readable_code',
    'details'  => 'Human-readable explanation including the matched value(s).',
];
```

### Safety checklist

1. **Severity reflects confidence.** Use `high` only when a false
   positive is acceptable for the customer (they can re-import via the
   "import anyway" UI button). Misjudged severity here translates
   directly into admin support tickets.
2. **Check is fast.** The whole `.sql` dump is in memory as a string;
   prefer `preg_match` on the raw string or a pre-built map (see
   `extract_wp_options()`) over re-parsing the full dump.
3. **Check is well-tested.** Add a fixture under
   `tests/fixtures/wp-scan-<code>/` with a synthetic dump that
   triggers the flag and one that does not.
4. **Allow-list awareness.** If the check compares a value against
   the customer's domain list, use
   `domain_in_allowlist($host, $allowedDomains)` so subdomain matches
   work consistently with the rest of the scanner.
5. **Don't break the engine swap.** `wp_content_scan()` runs AFTER the
   engine swap (the MyISAM -> InnoDB rewrite) on the same `$rewritten`
   string. Both your check and the engine swap must be tolerant of
   each other's output.

---

## How to test locally

### Build the image

```bash
docker build -t cpanel-importer:dev .
```

Confirm the image is under the budget:

```bash
docker images cpanel-importer:dev --format '{{.Size}}'
```

Target: < 1 GB extracted (the spec asks for < 600 MB compressed for prod,
but local builds typically come in around 700–900 MB extracted including
ClamAV signature DBs).

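
`docker images` prints a human-readable size, which is awkward to compare in a script. A minimal sketch of a scripted budget check follows, assuming the 1 GB local target above is the limit you want to enforce. `within_budget` is a hypothetical helper, not part of this repository; `docker image inspect --format '{{.Size}}'` reports the image size in bytes.

```bash
#!/usr/bin/env bash
set -euo pipefail

# within_budget SIZE_BYTES LIMIT_MB
# Hypothetical helper. Succeeds when SIZE_BYTES is strictly below LIMIT_MB,
# counting 1 MB as 1000 * 1000 bytes (an assumption; adjust if your budget
# is defined in MiB).
within_budget() {
    local size_bytes="$1" limit_mb="$2"
    (( size_bytes < limit_mb * 1000 * 1000 ))
}

# Usage (needs a local docker daemon, so it is left as a comment here):
#   size="$(docker image inspect --format '{{.Size}}' cpanel-importer:dev)"
#   within_budget "$size" 1000 || echo "image exceeds the 1 GB target" >&2
```
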
### Build the fixtures

```bash
bash tests/build-fixtures.sh
```

Two tarballs land under `tests/fixtures/`:
- `cpmove-clean.tar.gz`: a benign cpmove with a WordPress MyISAM dump.
- `cpmove-alfa.tar.gz`: same shape PLUS an ALFA-style symlink to /etc.

### Run against the clean fixture

```bash
mkdir -p /tmp/test-quarantine /tmp/test-sanitized
docker run --rm \
  -e IMPORT_ID=test-clean \
  -e IMPORT_USERNAME=testuser \
  -e IMPORT_BACKUP_FILE=/host/backup/cpmove-clean.tar.gz \
  -e CLAMAV_REFRESH=false \
  -v "$PWD/tests/fixtures/cpmove-clean.tar.gz:/host/backup/cpmove-clean.tar.gz:ro" \
  -v /tmp/test-quarantine:/host/quarantine \
  -v /tmp/test-sanitized:/host/sanitized \
  cpanel-importer:dev
```

Expect `status=completed`, MyISAM count > 0, no flags, exit 0.

### Run against the ALFA fixture

```bash
docker run --rm \
  -e IMPORT_ID=test-alfa \
  -e IMPORT_USERNAME=testuser \
  -e IMPORT_BACKUP_FILE=/host/backup/cpmove-alfa.tar.gz \
  -e CLAMAV_REFRESH=false \
  -v "$PWD/tests/fixtures/cpmove-alfa.tar.gz:/host/backup/cpmove-alfa.tar.gz:ro" \
  -v /tmp/test-quarantine:/host/quarantine \
  -v /tmp/test-sanitized:/host/sanitized \
  cpanel-importer:dev
```

Expect non-zero exit, `status=failed`, `failed_stage=extract`, and
stderr from inside the container containing
`tarball contains dangerous symlinks; aborting`.

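
The expectations for both fixtures can be asserted from a script rather than checked by eye. This is a minimal sketch, assuming `report.json` stores `status` and `failed_stage` as plain JSON strings. `report_has` is a hypothetical helper, and the report path is left as a placeholder because this guide does not state where the file is written; the README's report.json schema is the reference for both.

```bash
#!/usr/bin/env bash
set -euo pipefail

# report_has REPORT KEY VALUE
# Hypothetical helper. Succeeds when REPORT contains the JSON string member
# "KEY": "VALUE". This is a plain grep, not a JSON parser, so it only suits
# simple string fields and values without regex metacharacters.
report_has() {
    local report="$1" key="$2" value="$3"
    grep -Eq "\"${key}\"[[:space:]]*:[[:space:]]*\"${value}\"" -- "$report"
}

# Usage after the ALFA run (REPORT is a placeholder path, an assumption):
#   report_has "$REPORT" status failed
#   report_has "$REPORT" failed_stage extract
```

Capturing the container's exit code separately (for example with `rc=$?` right after `docker run`) covers the "non-zero exit" half of the expectation.
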
### Iterating on PHP / shell scripts

The `scripts/` directory is `COPY`ed late in the Dockerfile, so
edits there only re-trigger the last layer of the build; typical
turnaround is ~5 seconds.

---

## Code style

- Bash scripts: `set -euo pipefail`, absolute paths only, every external
  command on its own logical line, comment each non-obvious flag.
- PHP scripts: 4-space indent, single quotes for non-interpolated
  strings, `<?php` opener on line 1, no closing `?>`.
- All scripts must be idempotent: the worker may be re-run against the
  same `IMPORT_ID` on retry, and second runs must overwrite the prior
  `report.json` cleanly.

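
One way to make the `report.json` overwrite clean on retry is to write the new report to a temporary file in the same directory and rename it into place, so an interrupted run never leaves a truncated report behind. The sketch below illustrates that idea only; `write_report` is a hypothetical helper, not a function in this repository, and the destination path is whatever the calling script uses.

```bash
#!/usr/bin/env bash
set -euo pipefail

# write_report DEST
# Hypothetical helper. Reads the report body from stdin, writes it to a
# temporary file next to DEST, then renames it over DEST. A rename within
# one filesystem replaces the old file in a single step.
write_report() {
    local dest="$1" tmp
    # mktemp creates the file with mode 0600; chmod afterwards if another
    # user must read the report.
    tmp="$(mktemp "${dest}.XXXXXX")"
    if cat > "$tmp"; then
        mv -f -- "$tmp" "$dest"
    else
        rm -f -- "$tmp"
        return 1
    fi
}

# Usage: printf '%s\n' "$json" | write_report "$report_path"
```
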
---

## CI

Pushes to `trunk` build the image and push it to
`repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer:latest` and
`...:<sha>`. Pushes of a `YYYY.MM.NNN` tag additionally tag
`...:YYYY.MM.NNN`. CI runs the smoke test (image starts and
`echo ok` runs) and `php -l` / `bash -n` syntax checks on every script
before pushing.

See `.gitea/workflows/build-push.yaml`.

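
To catch syntax errors before pushing, the same kind of check can be run locally. The following is a sketch only and does not reproduce the workflow file, which remains the source of truth; `syntax_check_scripts` is a hypothetical helper that walks a directory and runs `bash -n` on `*.sh` files and `php -l` on `*.php` files.

```bash
#!/usr/bin/env bash
set -euo pipefail

# syntax_check_scripts DIR
# Hypothetical helper. Returns non-zero if any *.sh file under DIR fails
# `bash -n` or any *.php file fails `php -l`.
syntax_check_scripts() {
    local dir="$1" rc=0 f
    while IFS= read -r -d '' f; do
        case "$f" in
            *.sh)
                bash -n "$f" || rc=1
                ;;
            *.php)
                # Skip PHP files when no php binary is on PATH.
                if command -v php >/dev/null 2>&1; then
                    php -l "$f" >/dev/null || rc=1
                fi
                ;;
        esac
    done < <(find "$dir" -type f \( -name '*.sh' -o -name '*.php' \) -print0)
    return "$rc"
}

# Usage: syntax_check_scripts scripts
```
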