# Contributing: cpanel-importer

## How to add an auto-cleaner pattern

Auto-cleaners live in `scripts/scan-files.php`, in the `$cleaners` registry at the top of the main flow. A cleaner has three parts:

```php
$cleaners['short-cleaner-name'] = [
    'class' => 'KNOWN_REMOVABLE',  // or 'REMOVABLE_WITH_BACKUP'
    'match' => fn(string $sig): bool => str_contains($sig, 'PHP.Trojan.EvalB64'),
    'clean' => function (string $path): bool {
        // Read $path, transform, write back; return true on success.
        // The file at $path is the LIVE extracted file; your edit
        // here is what ends up in /host/sanitized//extracted/.
        // The original has ALREADY been backed up to .original
        // by the orchestrator before this is called.
        return false; // placeholder: replace with the result of your transform
    },
];
```

### Safety checklist before merging a new cleaner

1. **Backup is guaranteed.** The orchestrator copies the file to `/.original` BEFORE calling `clean()`. Verify this is still true in `scan-files.php` if you refactor the dispatch.
2. **Cleaner is idempotent.** Running it twice on the same file must produce the same output both times.
3. **Cleaner is conservative.** If the file does NOT match your transform exactly, return `false` (the orchestrator will fall back to quarantining). Never "best-effort" a half-clean.
4. **Cleaner has a regression test.** Add a fixture under `tests/fixtures/cleaner-/` with input + expected output, and exercise it from `tests/run-tests.sh` (or your CI step).
5. **Cleaner classification is correct.**
   - `KNOWN_REMOVABLE` = the whole pattern is known-safe to strip.
   - `REMOVABLE_WITH_BACKUP` = legit file with injected lines; we are confident in surgical removal but back up anyway.
   - `QUARANTINE_ONLY` = no clean variant; don't write a `clean()`.
6. **Signature match is tight.** Prefer `str_contains($sig, 'specific-sig-name')` over broad regex matches. A false-positive cleaner can corrupt customer files.

### Manual test loop

```bash
docker build -t cpanel-importer:dev .

# Place a known-infected synthetic file under tests/fixtures/cleaner-X/in/
# Run scan-files.php directly against it:
docker run --rm \
  --entrypoint /scripts/scan-files.php \
  -v "$PWD/tests/fixtures/cleaner-X/in:/tmp/extract" \
  -v "$PWD/tests/fixtures/cleaner-X/quarantine:/host/quarantine" \
  cpanel-importer:dev \
  --extract /tmp/extract --quarantine /host/quarantine \
  --report /tmp/r.json --import-id test
```

---

## How to add a WordPress content scan signature

Scan checks live in `scripts/scan-dbs.php`, in `wp_content_scan()`. Each check should produce a flag dict on hit:

```php
$flags[] = [
    'severity' => 'high',  // 'high' refuses the DB (per default threshold N=1)
                           // 'medium' / 'low' flag in the report but allow import
    'code'     => 'short_machine_readable_code',
    'details'  => 'Human-readable explanation including the matched value(s).',
];
```

### Safety checklist

1. **Severity reflects confidence.** Use `high` only when a false positive is acceptable for the customer (they re-import via the "import anyway" UI button). A misjudged severity translates directly into admin support tickets.
2. **Check is fast.** The whole `.sql` dump is in memory as a string; prefer `preg_match` on the raw string or a pre-built map (see `extract_wp_options()`) over re-parsing the full dump.
3. **Check is well-tested.** Add a fixture under `tests/fixtures/wp-scan-/` with a synthetic dump that triggers the flag and one that does not.
4. **Allow-list awareness.** If the check compares a value against the customer's domain list, use `domain_in_allowlist($host, $allowedDomains)` so subdomain matches work consistently with the rest of the scanner.
5. **Don't break engine swap.** `wp_content_scan()` runs AFTER the engine swap on the same `$rewritten` string. Both your check and the engine swap must be tolerant of each other's output.

---

## How to test locally

### Build the image

```bash
docker build -t cpanel-importer:dev .
```

Confirm the image is under the budget:

```bash
docker images cpanel-importer:dev --format '{{.Size}}'
```

Target: < 1 GB extracted (spec asks < 600 MB compressed for prod, but local builds typically come in around 700–900 MB extracted including ClamAV signature DBs).

### Build the fixtures

```bash
bash tests/build-fixtures.sh
```

Two tarballs land under `tests/fixtures/`:

- `cpmove-clean.tar.gz`: a benign cpmove with a WordPress MyISAM dump.
- `cpmove-alfa.tar.gz`: same shape PLUS an ALFA-style symlink to /etc.

### Run against the clean fixture

```bash
mkdir -p /tmp/test-quarantine /tmp/test-sanitized

docker run --rm \
  -e IMPORT_ID=test-clean \
  -e IMPORT_USERNAME=testuser \
  -e IMPORT_BACKUP_FILE=/host/backup/cpmove-clean.tar.gz \
  -e CLAMAV_REFRESH=false \
  -v "$PWD/tests/fixtures/cpmove-clean.tar.gz:/host/backup/cpmove-clean.tar.gz:ro" \
  -v /tmp/test-quarantine:/host/quarantine \
  -v /tmp/test-sanitized:/host/sanitized \
  cpanel-importer:dev
```

Expect `status=completed`, MyISAM count > 0, no flags, exit 0.

### Run against the ALFA fixture

```bash
docker run --rm \
  -e IMPORT_ID=test-alfa \
  -e IMPORT_USERNAME=testuser \
  -e IMPORT_BACKUP_FILE=/host/backup/cpmove-alfa.tar.gz \
  -e CLAMAV_REFRESH=false \
  -v "$PWD/tests/fixtures/cpmove-alfa.tar.gz:/host/backup/cpmove-alfa.tar.gz:ro" \
  -v /tmp/test-quarantine:/host/quarantine \
  -v /tmp/test-sanitized:/host/sanitized \
  cpanel-importer:dev
```

Expect non-zero exit, `status=failed`, `failed_stage=extract`, and stderr from inside the container containing `tarball contains dangerous symlinks; aborting`.

### Iterating on PHP / shell scripts

The `scripts/` directory is `COPY`ed in late in the Dockerfile, so edits there only re-trigger the last layer of the build; typical turnaround is ~5 seconds.

---

## Code style

- Bash scripts: `set -euo pipefail`, absolute paths only, every external command on its own logical line, comment each non-obvious flag.
- PHP scripts: 4-space indent, single quotes for non-interpolated strings, ``.
- All scripts must be idempotent: the worker may be re-run against the same `IMPORT_ID` on retry, and second runs must overwrite the prior `report.json` cleanly.

---

## CI

Pushes to `trunk` build and push the image to `repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer:latest` and `...:`. Pushes of a `YYYY.MM.NNN` tag additionally tag `...:YYYY.MM.NNN`.

CI runs the smoke test (image starts and `echo ok` runs) and `php -l` / `bash -n` syntax checks on every script before pushing. See `.gitea/workflows/build-push.yaml`.
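
To catch syntax errors before pushing, the two syntax checks named in the CI section (`php -l` and `bash -n`) can be approximated locally. The following is a minimal sketch, not the actual workflow from `.gitea/workflows/build-push.yaml`: the function name `check_syntax`, the assumption that scripts sit directly in one directory, and the `.sh` / `.php` extensions are all illustrative guesses.

```bash
# Hypothetical local pre-push check. The real CI steps live in
# .gitea/workflows/build-push.yaml and may differ from this sketch.

# check_syntax DIR
#   Parses every *.sh in DIR with `bash -n` and every *.php with `php -l`.
#   Returns 0 if all files parse, 1 if any file fails, 2 if DIR is missing.
#   Assumes scripts sit directly in DIR (no recursion into subdirectories).
check_syntax() {
    local dir="$1"
    local failed=0
    local file

    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 2
    fi

    for file in "$dir"/*.sh "$dir"/*.php; do
        # An unmatched glob stays literal; skip anything that is not a file.
        [ -f "$file" ] || continue
        case "$file" in
            *.sh)
                # -n: read and parse the script without executing it.
                bash -n "$file" || failed=1
                ;;
            *.php)
                if command -v php >/dev/null 2>&1; then
                    # -l: lint only (syntax check), do not run the script.
                    php -l "$file" >/dev/null || failed=1
                else
                    echo "skipped (php not installed): $file" >&2
                fi
                ;;
        esac
    done

    return "$failed"
}
```

From the repository root, `check_syntax scripts` would then return non-zero if at least one script fails to parse. When wrapping this in a standalone script, add `set -euo pipefail` at the top as required by the code style section.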