main
11 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
08b995a29c |
scan-dbs: stream the SQL file instead of loading 5GB+ into memory
All checks were successful
cpanel-importer Build and Push / Build-and-Push (push) Successful in 1m21s
Surfaced on whp02 alsacorp retry: scan-dbs.php hit PHP fatal at line 86
"Allowed memory size of 134217728 bytes exhausted (tried to allocate
5488440384 bytes)" while loading alsacorp_alsa1.sql via
file_get_contents. The dump is multi-GB (typical for WooCommerce stores
with media metadata); the 128MB-default PHP memory_limit + the 2GB
cgroup on the container both fail well below the actual file size.
Rewrote the per-DB pass as a streaming loop over 4MB chunks:
- engine_swap_chunk: same `\bENGINE=MyISAM\b` regex, mutates a
per-DB counter via reference so the per-chunk counts accumulate
into a single myisam_to_innodb total.
- is_wp_chunk_scan: OR-folds the four WP fingerprint regexes
(CREATE TABLE *_options, *_posts, *_users + the
'siteurl|home|template|stylesheet' sentinel) into a state dict;
any chunk that flips a flag from false to true keeps it true for
the rest of the file. Caller AND-folds at finalization.
- wp_options_chunk_scan: extracts (option_name, option_value)
tuples from INSERT INTO options statements as they pass through.
First occurrence wins so we keep the live value, not later
duplicates.
- wp_content_scan_from_values: extracted the finalization logic
from the legacy wp_content_scan() so the streaming path can
submit a pre-built option-values map instead of re-scanning the
full string.
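The OR-fold / AND-fold described for is_wp_chunk_scan can be sketched roughly as follows. This is an illustrative Python sketch, not the PHP in scan-dbs.php: the four regexes, the `FINGERPRINTS` table and the `is_wordpress_stream` driver are assumptions made up for the example. Because the flags are idempotent, a simple overlapping tail is enough to cover patterns split across chunks.

```python
import re

# Hypothetical fingerprint regexes; the real patterns in scan-dbs.php may differ.
FINGERPRINTS = {
    "options": re.compile(rb"CREATE TABLE `?\w*_options`?"),
    "posts": re.compile(rb"CREATE TABLE `?\w*_posts`?"),
    "users": re.compile(rb"CREATE TABLE `?\w*_users`?"),
    "sentinel": re.compile(rb"'(?:siteurl|home|template|stylesheet)'"),
}


def is_wp_chunk_scan(chunk, state):
    """OR-fold: a flag that has turned True stays True for the rest of the file."""
    for name, pattern in FINGERPRINTS.items():
        state[name] = state[name] or pattern.search(chunk) is not None


def is_wordpress_stream(src, chunk_size=4 * 1024 * 1024, carry=128):
    """Scan a binary stream chunk by chunk; AND-fold the flags at the end."""
    state = dict.fromkeys(FINGERPRINTS, False)
    tail = b""
    while chunk := src.read(chunk_size):
        window = tail + chunk          # overlap so split patterns are still seen
        is_wp_chunk_scan(window, state)
        tail = window[-carry:]         # seeing a pattern twice is harmless here
    return all(state.values())
```

Calling `is_wordpress_stream(open(path, "rb"))` keeps only one chunk plus the tail in memory at a time.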
Per-chunk carry: a 128-byte buffer at the end of each chunk is held
back and prepended to the next chunk so a pattern split across a
chunk boundary (e.g. "ENGINE=" ending at byte 4194303, "MyISAM" starting at
byte 4194304) is still seen by the regex. 128 bytes is generous for our
patterns (longest is "ENGINE = MyISAM" with whitespace flex).
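A rewrite pass with a held-back tail could look something like the sketch below. It is a Python illustration under assumptions, not the actual engine_swap_chunk: the function name `stream_engine_swap`, the file-like `src`/`dst` arguments and the one-byte `left` context are all invented here. Unlike a pure flag scan, a rewrite must not process the same bytes twice, so the tail is withheld from output rather than overlapped.

```python
import re

ENGINE_RE = re.compile(rb"\bENGINE=MyISAM\b")
REPLACEMENT = b"ENGINE=InnoDB"


def stream_engine_swap(src, dst, chunk_size=4 * 1024 * 1024, carry=128):
    """Copy src to dst, rewriting the engine clause; returns the match count.

    `carry` must exceed the longest possible match so a match that starts
    before the cut is always fully visible in the current window.
    """
    total = 0
    pending = b""  # bytes read but not yet written (the held-back tail)
    left = b""     # last byte already written, kept so \b sees left context
    while True:
        chunk = src.read(chunk_size)
        eof = not chunk
        window = left + pending + chunk
        base = len(left)
        cut = len(window) if eof else max(base, len(window) - carry)
        out, pos = [], base
        for match in ENGINE_RE.finditer(window):
            if match.start() < base:
                continue               # defensive: context byte is already written
            if match.start() >= cut:
                break                  # starts in the tail; handle it next round
            out.append(window[pos:match.start()])
            out.append(REPLACEMENT)
            pos = match.end()
            total += 1
        cut = max(cut, pos)            # never cut through an accepted match
        out.append(window[pos:cut])
        written = b"".join(out)
        dst.write(written)
        if written:
            left = written[-1:]
        pending = window[cut:]
        if eof:
            return total
```

Peak memory is roughly one chunk plus the carry, independent of the dump size.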
Output goes to a `<db>.sql.tmp` first, then renamed to
`<db>.sql{,.flagged}` once we know the flag verdict — avoids a
partial file if the scan dies mid-stream.
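The write-to-temp-then-rename step might be sketched like this in Python. The helper name `write_dump` and its arguments are hypothetical; only the `<db>.sql.tmp`, `<db>.sql` and `<db>.sql.flagged` names come from the commit message.

```python
import os


def write_dump(out_dir, db_name, chunks, is_flagged):
    """Stream chunks into <db>.sql.tmp, then rename once the verdict is known.

    `chunks` is any iterable of bytes; `is_flagged` is a callable that is
    evaluated only after the whole stream has been written.
    """
    tmp_path = os.path.join(out_dir, f"{db_name}.sql.tmp")
    with open(tmp_path, "wb") as tmp:
        for chunk in chunks:
            tmp.write(chunk)
    suffix = ".sql.flagged" if is_flagged() else ".sql"
    final_path = os.path.join(out_dir, db_name + suffix)
    # Same-directory rename: readers see either no final file or a complete one.
    os.replace(tmp_path, final_path)
    return final_path
```

If the process dies mid-stream, only the `.tmp` file is left behind, which a later run can safely delete.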
Legacy `engine_swap`, `is_wordpress_dump`, and the unused
`wp_content_scan`+`extract_wp_options` are kept in place for the
small-file path (none of them currently called from the new
streaming loop, but they're public-ish helpers the next dbsanitize
revision could reuse).
Resident memory now bounded to <16 MB per DB regardless of input
file size — should handle the 30 GB+ outliers we'll inevitably see.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
4888f85b54 |
scan-files: silence open_basedir warnings during walk
All checks were successful
cpanel-importer Build and Push / Build-and-Push (push) Successful in 53s
Container run on darkside completed end-to-end successfully (170,753 files
scanned, 78 symlinks skipped via filter, 65 quarantined, walk errors = 0), but
the import log was flooded with PHP Warnings on every cpmove-internal symlink
whose absolute target points outside the container's open_basedir allow-list:
PHP Warning: is_link(): open_basedir restriction in effect.
File(/host/sanitized/.../cybercoveconsulting.com/wp-content/db.php) is not
within the allowed path(s): (/host:/tmp:...)
The actual code path was correct: is_link() still returns true despite the
warning, so the filter callback properly skipped these. But the noise made the
streamed [container] log on the panel side unreadable (hundreds of warning
lines per real signal line).
Root cause: PHP's open_basedir check normalizes via realpath() even for
is_link/is_file. cpmove symlinks like:
db.php -> /home/<user>/<addon>.com/wp-content/db.php
access-logs -> /usr/local/apache/domlogs/<user>
cpanel-styled -> /usr/local/cpanel/base/frontend/.../glass
have absolute targets that don't exist anywhere in the container (no /home, no
/usr/local), so realpath() can't normalize them under any allow-list entry. PHP
fires a Warning, returns the lstat answer anyway, and our filter handles the
skip correctly.
Fix: a scoped set_error_handler around the walk that suppresses ONLY
E_WARNINGs containing 'open_basedir restriction'. Non-open_basedir warnings
still surface. The handler is restored immediately after the file-count loop,
so subsequent stages (clamscan output parsing, quarantine actions) keep the
default error reporting.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
9652a71816 |
extract: skip cpmove-*/homedir/mail tree
All checks were successful
cpanel-importer Build and Push / Build-and-Push (push) Successful in 54s
WHP does not import cPanel mailbox data (mail-import is a panel-side roadmap
item, not a sandbox-mode step). Extracting + ClamAV-scanning the mail tree
wastes time and disk: on real customer accounts the mail dir often dwarfs
everything else (10+ GB of historical maildir/mbox), and clamscan has to walk
every message.
Appended to the existing tar --exclude-from list (where we already strip
DANGEROUS-classified symlinks) so the existing plumbing covers both. tar's
fnmatch globs handle nested mail subdirs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
cda432e808 |
scan-files: skip symlinks during file walk to avoid open_basedir aborts
All checks were successful
cpanel-importer Build and Push / Build-and-Push (push) Successful in 56s
cPanel cpmove tarballs contain symlinks with absolute targets pointing at the
SOURCE server's filesystem (e.g. addon docroots symlinked to
/home/<user>/<addondomain>.com/ on the cPanel host). After extract into the
container, those symlinks dangle: their targets don't exist in the container's
namespace AND are not under any open_basedir-allowed prefix.
PHP's SplFileInfo::isFile() (called from the RecursiveIteratorIterator
file-count loop) follows symlinks. The realpath check against open_basedir
then fires on the symlink TARGET, not the link path, and throws
RuntimeException mid-iteration, aborting the entire scan without writing
report.json. Surfaced on darkside import as:
PHP Fatal error: Uncaught RuntimeException: SplFileInfo::isFile():
open_basedir restriction in effect.
File(/host/sanitized/.../cybercoveconsulting.com/wp-content/db.php) is not
within the allowed path(s): (/host:/tmp:/opt/whp:/scripts:/var/lib/clamav:...)
Fix is two-layered:
1. RecursiveCallbackFilterIterator pre-filters symlinks via is_link() before
they reach hasChildren/isFile. is_link is open_basedir-safe (it stats the
link itself, doesn't resolve). Skipped count is reported on STDERR so
operators see what was skipped.
2. try/catch around the per-entry isFile() as a defense-in-depth layer: if
any other fs op throws mid-walk (race, planted device node, etc.) we count
it as a walk_error and continue, not abort.
Note that clamscan already walks the extract tree on its own pass and its
default symlink posture is "don't follow", the same posture we want here.
Symlink-as-file would also be useless to quarantine (it's a 0-byte fs entry
whose target is the actual artifact). Skipping symlinks therefore doesn't miss
anything.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
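The two-layer walk from this commit (skip symlinks up front, count any other per-entry failure instead of aborting) can be approximated in Python as below. This is an analogous sketch, not the PHP iterator code; the function name `count_regular_files` and the stats keys are assumptions chosen to mirror the counters mentioned in the log.

```python
import os


def count_regular_files(root):
    """Walk `root` without following symlinks; never abort on a single entry."""
    stats = {"files": 0, "symlinks_skipped": 0, "walk_errors": 0}
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as it:
                entries = list(it)
        except OSError:
            stats["walk_errors"] += 1   # unreadable directory: count and move on
            continue
        for entry in entries:
            try:
                # Layer 1: inspect the link itself, never its (dangling) target.
                if entry.is_symlink():
                    stats["symlinks_skipped"] += 1
                elif entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    stats["files"] += 1
            except OSError:
                # Layer 2: defense in depth for races or odd filesystem entries.
                stats["walk_errors"] += 1
    return stats
```

A dangling link such as `db.php -> /home/<user>/...` is counted under `symlinks_skipped` and never resolved.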
||
|
|
a60bf53a4a |
fix: move EXTRACT_DIR + WORK_DIR off tmpfs onto disk-backed bind mount
All checks were successful
cpanel-importer Build and Push / Build-and-Push (push) Successful in 56s
rc=137 OOM kill triaged on whp02 darkside import. dmesg confirmed:
memory: usage 2097100kB, limit 2097152kB, failcnt 132
oom_kill_process ... task=bash uid=999
Root cause: extract.sh untars the cpmove into EXTRACT_DIR which was
/tmp/extract — a tmpfs mount (RAM-backed). The container's
--memory 2g cgroup limit charges tmpfs pages as container memory, so the
3 GB cpmove decompressing into tmpfs hit the limit at ~7s into tar
and the kernel killed the bash process running extract.sh.
Fix is structural, not a memory bump: the disk-backed bind mount
at /host/sanitized (mapped to /var/lib/whp/cpanel-importer-extract
on host) has effectively unlimited capacity and doesn't count against
the cgroup memory limit. Moving the working dirs there sidesteps the
OOM class entirely.
Layout change:
EXTRACT_DIR /tmp/extract -> $SANITIZED_DIR/extract-work
WORK_DIR /tmp/sanitized -> $SANITIZED_DIR/work
Two ripple changes:
- The old rsync_out stage cross-filesystem-copied ~10 GB from tmpfs
to /host/sanitized/<id>/extracted. That's now a same-filesystem
`mv` (constant-time rename) since extract-work IS already inside
/host/sanitized/<id>/. Stage renamed to finalize_layout for
clarity; pre-existing wipe of extracted/ + mysql/ guards against
partial-run residue.
- The stripped-symlinks actions sidecar moved to /tmp explicitly
(entrypoint.sh passes the 4th arg to extract.sh) so finalize's
rename doesn't (a) carry a dotfile into the cleaned tree the
panel imports and (b) move it out from under write_report's read.
Also fixes the unrelated-but-cosmetic freshclam warning by cd'ing to
/var/lib/clamav (the configured DatabaseDirectory, tmpfs writable)
before invoking freshclam in a subshell. The "Can't create
freshclam.dat in /opt/whp" errors were because /opt/whp is the
container WORKDIR which lives on the read-only rootfs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
db78a36935 |
sanitize-dont-refuse: strip dangerous symlinks via tar --exclude
All checks were successful
cpanel-importer Build and Push / Build-and-Push (push) Successful in 1m10s
Shifts the sandbox's symlink handling from "refuse the whole tarball"
to "drop the dangerous entries from extraction and record them as
quarantine actions". This is what sandbox mode is supposed to do —
make malicious cpmoves safe to import rather than gate-keeping them.
Three coordinated changes:
1. scan-symlinks.php — exit 0 even when DANGEROUS findings exist. The
JSON report is the source of truth; the caller decides what to do
with it. Usage/IO errors still exit 2. STDERR still names each
finding (now "STRIP X -> Y" instead of "refusing tarball") so the
streamed [container] log on the panel side surfaces them.
2. extract.sh — reads the scan-symlinks report, builds a
newline-delimited exclude list of DANGEROUS archive_paths, and
passes it to `tar --exclude-from=`. The stripped entries never
reach the filesystem; tar skips them silently. Also writes a small
JSON sidecar at $EXTRACT_DIR/.cpanel-importer-stripped-symlinks.json
describing each strip-action so the merge step can surface them in
report.json without re-parsing scan-symlinks output.
3. entrypoint.sh write_report — reads the sidecar, prepends each
stripped_dangerous_symlink action to the actions[] list, bumps
files_quarantined by the strip-count, and rewrites
summary_for_panel.alert_message to call them out distinctly:
"N dangerous symlink(s) stripped during extract; M files
quarantined; K cleaned in place. Customer site may have been
compromised at the source — recommend review."
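Step 2's exclude-list construction could be sketched as follows. This is a Python illustration only (extract.sh is a shell script), and the report shape is assumed: `archive_path` is named in the commit message, while the `findings` and `classification` keys and the `build_exclude_list` helper are hypothetical.

```python
import json


def build_exclude_list(report_path, exclude_path):
    """Write one DANGEROUS archive path per line for `tar --exclude-from=`."""
    with open(report_path, encoding="utf-8") as fh:
        report = json.load(fh)
    dangerous = [
        finding["archive_path"]
        for finding in report.get("findings", [])     # assumed key
        if finding.get("classification") == "DANGEROUS"  # assumed key
    ]
    with open(exclude_path, "w", encoding="utf-8") as fh:
        for path in dangerous:
            fh.write(path + "\n")
    return dangerous
```

One caveat worth checking against the tar manual: exclude entries are treated as patterns, so an archive path containing glob characters may need escaping before it is written to the list.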
Result on darkside: instead of the import failing on the ALFA
alfasymlink/root entry, that entry is silently skipped during
extract, recorded as `stripped_dangerous_symlink path=... target=/
reason=absolute target is root /`, and the rest of the tarball
extracts normally. Subsequent ClamAV scan + DB sanitization run
to completion; panel sees a verdict-completed import with the
stripped symlinks visible in the Sanitization Sandbox panel on the
results page.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
60a232c54a |
scan-symlinks: tighten DANGEROUS prefix list to actual destruction class
All checks were successful
cpanel-importer Build and Push / Build-and-Push (push) Successful in 1m27s
Previous version of scan-symlinks.php was a verbatim port of the panel's
scanTarballForDangerousSymlinks(), which flagged every symlink whose
target sits under /etc, /usr, /bin, /sbin, /lib, /lib64, /var/lib,
/var/log, /var/cache, or /var/spool. That's the right posture for the
panel's pre-extract scan in DIRECT mode — refuse before extract — but
it makes the container REFUSE every cpmove that comes from a real
cPanel source server, including totally clean ones. Standard cPanel
accounts ship with stock symlinks like:
homedir/access-logs -> /usr/local/apache/domlogs/<user>
homedir/var/cpanel/styled/current_style
-> /usr/local/cpanel/base/frontend/...
homedir/.cpanel/email -> /usr/local/cpanel/...
homedir/etc -> /var/cpanel/userhomes/<user>/etc
Every customer tarball has 5-20 of these. Treating them as DANGEROUS
made the container abort with verdict=refused before extract.sh ever
ran. Surfaced on darkside import to whp02: scan-symlinks found
homedir/access-logs (a textbook cPanel symlink) and the import bombed.
The real destruction class — what ALFA TEaM Shell uses, what we saw
brick whp02 in May — is symlinks whose target is the exact filesystem
root or under one of the genuinely catastrophic system trees that
either escape the customer account or clobber boot/config/proc state:
/ exact root (the classic alfasymlink/root)
/etc config tampering, /etc/shadow exfil
/root root home dir
/boot bootloader / kernel
/proc process info / kernel knobs
/sys sysfs
/dev device nodes
Everything else (notably /usr, /var) becomes UNCERTAIN: reported in
the JSON output but doesn't refuse the tarball. With --cap-drop=ALL
--read-only --network none --user 999, a /usr-targeting symlink in
the container's sandbox can at worst dangle on extract; it can't
touch the host.
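The tightened prefix rule can be expressed as a small classifier. The Python below is a sketch under assumptions, not scan-symlinks.php: the function name is invented, only absolute targets are handled, and the path normalization step is an assumption about how `..` and doubled slashes should be treated.

```python
import posixpath

# The destruction-class trees listed in the commit message (plus exact root).
DANGEROUS_PREFIXES = ("/etc", "/root", "/boot", "/proc", "/sys", "/dev")


def classify_symlink_target(target):
    """Return "DANGEROUS" or "UNCERTAIN" for an absolute symlink target."""
    if not target.startswith("/"):
        raise ValueError("this sketch only covers absolute targets")
    # Collapse "..", "." and repeated slashes before comparing (assumption).
    normalized = posixpath.normpath("/" + target.lstrip("/"))
    if normalized == "/":
        return "DANGEROUS"
    for prefix in DANGEROUS_PREFIXES:
        # Match whole path components so "/etcetera" is not treated as "/etc".
        if normalized == prefix or normalized.startswith(prefix + "/"):
            return "DANGEROUS"
    return "UNCERTAIN"
```

With this rule the stock `homedir/access-logs -> /usr/local/apache/domlogs/<user>` link lands in UNCERTAIN, while `alfasymlink/root -> /` stays DANGEROUS.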
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
5e206edc50 |
ci: lint inside built image at /scripts/ instead of bind-mounting host $PWD
All checks were successful
cpanel-importer Build and Push / Build-and-Push (push) Successful in 1m0s
Two failed attempts before this:
- Run 3703 (orig): docker run -v "$PWD:/src" --entrypoint php ...
Failed because Gitea's act-based runner is itself containerized; $PWD
inside the runner is not a path the host docker daemon can bind mount.
"Could not open input file: /src/scripts/scan-dbs.php".
- Run 3704 (first attempt): php -l "$f" directly on the runner.
Failed because the runner image (catthehacker/ubuntu act) doesn't ship
php-cli by default. "php: command not found" exit 127.
The right fix: the Dockerfile already does
COPY --chown=whp-import:whp-import scripts/ /scripts/
so the scripts exist inside the just-built smoke image at /scripts/.
Linting via
`docker run --entrypoint php cpanel-importer:smoke -l /scripts/foo.php`
reads from the image's own rootfs: no bind mount, no runner-side php
dependency.
The for-loop var $f is still scripts/foo.php (matches host glob), and the
path inside the container becomes /scripts/foo.php after the `-l "/$f"`
prefix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
cff68569cb |
ci: lint scripts directly on runner instead of via docker-in-docker
Some checks failed
cpanel-importer Build and Push / Build-and-Push (push) Failing after 1m17s
The Gitea runner is itself containerized, so the previous
docker run -v "$PWD:/src" --entrypoint php cpanel-importer:smoke -l "/src/$f"
shape couldn't bind mount the checkout: the runner's $PWD is not a path the
host docker daemon can reach. CI run 3703 surfaced this as "Could not open
input file: /src/scripts/scan-dbs.php"; the file existed on the checkout, but
the new container saw an empty /src.
Running php / bash directly on the runner side-steps the entire DinD issue.
ubuntu-latest already ships php-cli and bash, the checkout files live in $PWD
where the runner can see them, no docker-socket gymnastics needed.
Smoke test (echo ok in the built image) and the build-and-push step keep
their docker invocations. Those run against the built image artifact, not the
source tree, so DinD bind mount isn't involved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
b4ecdbc3b5 |
ci: trigger on main branch (renamed from trunk)
Some checks failed
cpanel-importer Build and Push / Build-and-Push (push) Failing after 51s
The Gitea repo's default branch is main; the local development branch
stayed trunk and pushes via `trunk:main` refspec. Workflow needs to
match what the remote sees.
run-name now interpolates ${{ gitea.ref_name }} so it accurately names
the branch on any future renames.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
5487dfc8f1 |
Initial bootstrap: cpanel-importer sanitization sandbox
Skeleton for the cpanel-importer Docker container, a one-shot sandbox the WHP
panel invokes BEFORE extracting a customer cpmove tarball. See
cpanel-import-container-spec.md (in /workspace/) for the full design.
What this ships in v1.0:
- Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 +
SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as UID 999
(whp-import) via the panel-side --user 999:999 flag.
- scripts/entrypoint.sh: validates env, runs (optional) freshclam, drives
extract -> scan-files -> scan-dbs -> rsync -> report.json.
- scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract symlink
scan ported standalone from web-files/libs/CpanelBackupImporter.php (the
existing 2026-05-29 whp02 destruction-vector fix). Aborts with exit 3
before tar runs if any DANGEROUS symlink is found.
- scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0 ships with
an empty cleaner registry: every hit is QUARANTINE_ONLY. Cleaner hooks are
stubbed for v1.1.
- scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always applied),
WordPress identification, and ONE WP content scan check
(siteurl_external_domain). v1.1 will grow the check set.
- scripts/lib/safety-net.php: container-narrow open_basedir allow-list, much
tighter than the panel-side one.
- .gitea/workflows/build-push.yaml: builds + smoke-tests +
PHP-syntax-checks + bash-syntax-checks before pushing to
repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer.
- tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP dump) and
cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc vector) for local
end-to-end testing.
- README.md / CONTRIBUTING.md: docker-run invocation, bind-mount catalog,
report.json schema, how to add a cleaner pattern or a WP scan signature.
Local acceptance test results:
- clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0
- ALFA fixture -> exit 1, status=failed, failed_stage=extract, "tarball
contains dangerous symlinks; aborting" on stderr
- compromised-siteurl fixture -> imported_into_new_server=false, .flagged
file written, summary_for_panel.show_alert=true
Image size: 197 MB compressed (gzipped docker save), ~397 MB unique layers
extracted. Well under the spec's 600 MB compressed / 1.2 GB extracted budget.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |