# Contributing — cpanel-importer

## How to add an auto-cleaner pattern

Auto-cleaners live in `scripts/scan-files.php`, in the `$cleaners`
registry at the top of the main flow.

A cleaner has three parts: a `class`, a `match` predicate, and a `clean` callback:

```php
$cleaners['short-cleaner-name'] = [
    'class' => 'KNOWN_REMOVABLE',   // or 'REMOVABLE_WITH_BACKUP'
    'match' => fn(string $sig): bool => str_contains($sig, 'PHP.Trojan.EvalB64'),
    'clean' => function (string $path): bool {
        // Read $path, transform, write back; return true on success.
        // The file at $path is the LIVE extracted file; your edit
        // here is what ends up in /host/sanitized/<id>/extracted/.
        // The original has ALREADY been backed up to <path>.original
        // by the orchestrator before this is called.
        return false; // placeholder: replace with the real transform
    },
];
```

### Safety checklist before merging a new cleaner

1. **Backup is guaranteed.** The orchestrator copies the file to
   `<quarantine>/<relpath>.original` BEFORE calling `clean()`. Verify
   this is still true in `scan-files.php` if you refactor the dispatch.
2. **Cleaner is idempotent.** Running it a second time on the same file
   must leave the output of the first run unchanged.
3. **Cleaner is conservative.** If the file does NOT match your
   transform exactly, return `false` (the orchestrator will fall back
   to quarantining). Never "best-effort" a half-clean.
4. **Cleaner has a regression test.** Add a fixture under
   `tests/fixtures/cleaner-<name>/` with input + expected output, and
   exercise it from `tests/run-tests.sh` (or your CI step).
5. **Cleaner classification is correct.**
   - `KNOWN_REMOVABLE` = the whole pattern is known-safe to strip.
   - `REMOVABLE_WITH_BACKUP` = legit file with injected lines; we are
     confident in surgical removal but back up anyway.
   - `QUARANTINE_ONLY` = no clean variant; don't write a `clean()`.
6. **Signature match is tight.** Prefer
   `str_contains($sig, 'specific-sig-name')` over broad regex matches.
   A false-positive cleaner can corrupt customer files.
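
Items 2 and 4 of the checklist lend themselves to a mechanical check. The following is a minimal sketch, not the project's actual test harness: `check_cleaner_fixture` is a hypothetical helper, and it assumes you have some command that applies your cleaner in place to the path passed as its last argument (how you invoke a single cleaner in isolation is up to you).

```bash
#!/usr/bin/env bash
set -euo pipefail

# check_cleaner_fixture INPUT_FILE EXPECTED_FILE CLEAN_CMD [ARGS...]
# Hypothetical helper. Copies INPUT_FILE to a scratch file, runs
# CLEAN_CMD on the scratch file twice, and checks that the result equals
# EXPECTED_FILE after each pass (the second pass is the idempotency check).
check_cleaner_fixture() {
    local input=$1 expected=$2
    shift 2
    local work pass
    work=$(mktemp)
    cp "$input" "$work"

    for pass in 1 2; do
        # A cleaner that exits non-zero counts as a failure here.
        if ! "$@" "$work"; then
            echo "FAIL: cleaner exited non-zero on pass $pass" >&2
            rm -f "$work"
            return 1
        fi
        if ! cmp -s "$work" "$expected"; then
            echo "FAIL: output differs from expected after pass $pass" >&2
            rm -f "$work"
            return 1
        fi
    done

    rm -f "$work"
    echo "PASS: $input"
}
```

From `tests/run-tests.sh` this could be called as `check_cleaner_fixture "$fixture/input.php" "$fixture/expected.php" my_cleaner_cmd`, where the file names and `my_cleaner_cmd` are placeholders.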

### Manual test loop

```bash
docker build -t cpanel-importer:dev .
# Place a known-infected synthetic file under tests/fixtures/cleaner-X/in/
# Run scan-files.php directly against it:
docker run --rm \
  --entrypoint /scripts/scan-files.php \
  -v "$PWD/tests/fixtures/cleaner-X/in:/tmp/extract" \
  -v "$PWD/tests/fixtures/cleaner-X/quarantine:/host/quarantine" \
  cpanel-importer:dev \
  --extract /tmp/extract --quarantine /host/quarantine \
  --report /tmp/r.json --import-id test
```

---

## How to add a WordPress content scan signature

Scan checks live in `scripts/scan-dbs.php`, in `wp_content_scan()`.

Each check should produce a flag array on hit:

```php
$flags[] = [
    'severity' => 'high',   // 'high' refuses the DB (per default threshold N=1)
                            // 'medium' / 'low' flag in the report but allow import
    'code'     => 'short_machine_readable_code',
    'details'  => 'Human-readable explanation including the matched value(s).',
];
```

### Safety checklist

1. **Severity reflects confidence.** Use `high` only when a false
   positive is acceptable for the customer (they re-import via the
   "import anyway" UI button). Misjudged severity translates directly
   into admin support tickets.
2. **Check is fast.** The whole `.sql` dump is in memory as a string;
   prefer `preg_match` on the raw string or a pre-built map (see
   `extract_wp_options()`) over re-parsing the full dump.
3. **Check is well-tested.** Add a fixture under
   `tests/fixtures/wp-scan-<code>/` with a synthetic dump that
   triggers the flag and one that does not.
4. **Allow-list awareness.** If the check compares a value against
   the customer's domain list, use
   `domain_in_allowlist($host, $allowedDomains)` so subdomain matches
   work consistently with the rest of the scanner.
5. **Don't break the engine swap.** `wp_content_scan()` runs AFTER the
   engine swap on the same `$rewritten` string. Both your check and
   the engine swap must be tolerant of each other's output.
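
Item 3 asks for one dump that triggers the flag and one that does not. Below is a minimal sketch of the assertion side only. It assumes the report is JSON and that each flag's `code` appears in it as a `"code": "<value>"` pair; the report layout is not specified here, so `report_has_flag`, `check_flag_pair`, and the file names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# report_has_flag REPORT_FILE CODE
# Hypothetical helper: succeeds if REPORT_FILE contains a "code" key whose
# value is exactly CODE. Assumes CODE contains only [A-Za-z0-9_], so it is
# safe to embed in the regular expression.
report_has_flag() {
    local report=$1 code=$2
    grep -Eq "\"code\"[[:space:]]*:[[:space:]]*\"${code}\"" "$report"
}

# check_flag_pair HIT_REPORT MISS_REPORT CODE
# The "hit" report must contain the flag and the "miss" report must not.
check_flag_pair() {
    local hit=$1 miss=$2 code=$3
    if ! report_has_flag "$hit" "$code"; then
        echo "FAIL: $code not raised for $hit" >&2
        return 1
    fi
    if report_has_flag "$miss" "$code"; then
        echo "FAIL: $code raised for $miss" >&2
        return 1
    fi
    echo "PASS: $code"
}
```

A text match like this is deliberately crude; if a JSON parser is available in your test environment, querying the parsed report is more robust.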

---

## How to test locally

### Build the image

```bash
docker build -t cpanel-importer:dev .
```

Confirm the image is within the size budget:

```bash
docker images cpanel-importer:dev --format '{{.Size}}'
```

Target: < 1 GB extracted. The spec asks for < 600 MB compressed for prod, but
local builds typically come in around 700–900 MB extracted, including
ClamAV signature DBs.
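
`docker images` prints a human-readable size, which is awkward to compare in a script. A minimal sketch of a byte-accurate check follows, based on `docker image inspect`, which reports the size in bytes. The helper name `size_within_budget` and the decimal reading of 1 GB are assumptions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Local budget from the target above, in bytes (assumes decimal GB).
readonly BUDGET_BYTES=1000000000

# size_within_budget SIZE_BYTES [BUDGET_BYTES]
# Hypothetical helper: succeeds only if SIZE_BYTES is strictly below the budget.
size_within_budget() {
    local size=$1 budget=${2:-$BUDGET_BYTES}
    [ "$size" -lt "$budget" ]
}

# Usage against the local image (requires Docker):
#   size=$(docker image inspect --format '{{.Size}}' cpanel-importer:dev)
#   size_within_budget "$size" || echo "image over budget: $size bytes" >&2
```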

### Build the fixtures

```bash
bash tests/build-fixtures.sh
```

Two tarballs land under `tests/fixtures/`:

- `cpmove-clean.tar.gz`: a benign cpmove with a WordPress MyISAM dump.
- `cpmove-alfa.tar.gz`: same shape PLUS an ALFA-style symlink to `/etc`.

### Run against the clean fixture

```bash
mkdir -p /tmp/test-quarantine /tmp/test-sanitized
docker run --rm \
  -e IMPORT_ID=test-clean \
  -e IMPORT_USERNAME=testuser \
  -e IMPORT_BACKUP_FILE=/host/backup/cpmove-clean.tar.gz \
  -e CLAMAV_REFRESH=false \
  -v "$PWD/tests/fixtures/cpmove-clean.tar.gz:/host/backup/cpmove-clean.tar.gz:ro" \
  -v /tmp/test-quarantine:/host/quarantine \
  -v /tmp/test-sanitized:/host/sanitized \
  cpanel-importer:dev
```

Expect `status=completed`, MyISAM count > 0, no flags, exit 0.

### Run against the ALFA fixture

```bash
docker run --rm \
  -e IMPORT_ID=test-alfa \
  -e IMPORT_USERNAME=testuser \
  -e IMPORT_BACKUP_FILE=/host/backup/cpmove-alfa.tar.gz \
  -e CLAMAV_REFRESH=false \
  -v "$PWD/tests/fixtures/cpmove-alfa.tar.gz:/host/backup/cpmove-alfa.tar.gz:ro" \
  -v /tmp/test-quarantine:/host/quarantine \
  -v /tmp/test-sanitized:/host/sanitized \
  cpanel-importer:dev
```

Expect non-zero exit, `status=failed`, `failed_stage=extract`, and
stderr from inside the container containing
`tarball contains dangerous symlinks; aborting`.
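
The exit-code and stderr expectations can be asserted in a script. Below is a minimal sketch with a hypothetical helper, `expect_failure_with_stderr`, that runs any command, requires a non-zero exit, and looks for a fixed string on stderr. It does not inspect `status` or `failed_stage`, because where those values are reported is not covered here.

```bash
#!/usr/bin/env bash
set -euo pipefail

# expect_failure_with_stderr MESSAGE CMD [ARGS...]
# Hypothetical helper: passes only if CMD exits non-zero AND its stderr
# contains MESSAGE as a literal substring.
expect_failure_with_stderr() {
    local message=$1
    shift
    local stderr_file status=0
    stderr_file=$(mktemp)

    # Capture stderr; "|| status=$?" keeps set -e from aborting on failure.
    "$@" >/dev/null 2>"$stderr_file" || status=$?

    if [ "$status" -eq 0 ]; then
        echo "FAIL: command exited 0" >&2
        rm -f "$stderr_file"
        return 1
    fi
    if ! grep -Fq -- "$message" "$stderr_file"; then
        echo "FAIL: stderr did not contain: $message" >&2
        rm -f "$stderr_file"
        return 1
    fi

    rm -f "$stderr_file"
    echo "PASS: exit $status with expected message"
}
```

To use it, pass the message first and the full `docker run --rm ...` command for the ALFA fixture after it. Without `-t`, `docker run` forwards the container's stderr to its own stderr, so the capture sees the message.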

### Iterating on PHP / shell scripts

The `scripts/` directory is copied in (`COPY`) late in the Dockerfile, so
edits there only re-trigger the last layer of the build; typical
turnaround is ~5 seconds.

---

## Code style

- Bash scripts: `set -euo pipefail`, absolute paths only, every external
  command on its own logical line, comment each non-obvious flag.
- PHP scripts: 4-space indent, single quotes for non-interpolated
  strings, `<?php` opener on line 1, no closing `?>`.
- All scripts must be idempotent: the worker may be re-run against the
  same `IMPORT_ID` on retry; second runs must overwrite the prior
  `report.json` cleanly.
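
One way to meet the overwrite requirement in a Bash script is to write to a temporary file and rename it over the destination, so a retry never leaves a half-written `report.json`. This is a minimal sketch; `write_report` is a hypothetical helper and is not taken from the worker's scripts.

```bash
#!/usr/bin/env bash
set -euo pipefail

# write_report DEST CONTENT
# Hypothetical helper: replaces DEST with CONTENT in a single rename.
write_report() {
    local dest=$1 content=$2
    local tmp
    # Create the temp file next to DEST so the rename stays on one filesystem.
    tmp=$(mktemp "${dest}.XXXXXX")
    printf '%s\n' "$content" > "$tmp"
    # mv -f overwrites any report left by an earlier run.
    mv -f "$tmp" "$dest"
}
```

Note that `mktemp` creates the file with owner-only permissions, and the rename keeps them; add a `chmod` after the `mv` if other users need to read the report.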

---

## CI

Pushes to `trunk` build and push the image to
`repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer:latest` and
`...:<sha>`. Pushes of a `YYYY.MM.NNN` tag additionally tag
`...:YYYY.MM.NNN`. Before pushing, CI runs the smoke test (the image
starts and `echo ok` runs) and `php -l` / `bash -n` syntax checks on
every script.

See `.gitea/workflows/build-push.yaml`.
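
The syntax checks can also be run locally before pushing. This is a minimal sketch that assumes scripts are identified by `.sh` and `.php` extensions; the authoritative steps are whatever the workflow file defines, and `syntax_check_scripts` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -euo pipefail

# syntax_check_scripts DIR
# Hypothetical helper: runs `bash -n` on every *.sh file and `php -l` on
# every *.php file under DIR. Returns non-zero if any file fails, but
# keeps going so that all errors are reported in one run.
syntax_check_scripts() {
    local dir=$1 failed=0 file
    while IFS= read -r -d '' file; do
        case "$file" in
            *.sh)  bash -n "$file" || failed=1 ;;
            *.php) php -l "$file" >/dev/null || failed=1 ;;
        esac
    done < <(find "$dir" -type f \( -name '*.sh' -o -name '*.php' \) -print0)
    return "$failed"
}
```

For example, `syntax_check_scripts "$PWD/scripts"` checks the directory that is copied into the image. The PHP branch needs a local `php` binary.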