cpanel-importer/Dockerfile
Claude (bootstrap) 5487dfc8f1 Initial bootstrap: cpanel-importer sanitization sandbox
Skeleton for the cpanel-importer Docker container — a one-shot
sandbox the WHP panel invokes BEFORE extracting a customer cpmove
tarball. See cpanel-import-container-spec.md (in /workspace/) for the
full design.

What this ships in v1.0:

- Dockerfile: almalinux:10-minimal + PHP 8.4 (Remi) + ClamAV 1.4 +
  SaneSecurity Foxhole.PHP rules + tar/mariadb-client/rsync. Runs as
  UID 999 (whp-import) via the panel-side --user 999:999 flag.

- scripts/entrypoint.sh: validates env, runs (optional) freshclam,
  drives extract -> scan-files -> scan-dbs -> rsync -> report.json.

- scripts/extract.sh + scripts/lib/scan-symlinks.php: pre-extract
  symlink scan ported standalone from
  web-files/libs/CpanelBackupImporter.php (the existing 2026-05-29
  whp02 destruction-vector fix). Aborts with exit 3 before tar runs
  if any DANGEROUS symlink is found.

- scripts/scan-files.php: ClamAV walk + classify-and-action. v1.0
  ships with an empty cleaner registry — every hit is
  QUARANTINE_ONLY. Cleaner hooks are stubbed for v1.1.

- scripts/scan-dbs.php: regex MyISAM -> InnoDB rewrite (always
  applied), WordPress identification, and ONE WP content scan check
  (siteurl_external_domain). v1.1 will grow the check set.

- scripts/lib/safety-net.php: container-narrow open_basedir
  allow-list, much tighter than the panel-side one.

- .gitea/workflows/build-push.yaml: builds + smoke-tests +
  PHP-syntax-checks + bash-syntax-checks before pushing to
  repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer.

- tests/build-fixtures.sh: builds cpmove-clean.tar.gz (benign WP
  dump) and cpmove-alfa.tar.gz (the ALFA-shell symlink-to-/etc
  vector) for local end-to-end testing.

- README.md / CONTRIBUTING.md: docker-run invocation, bind-mount
  catalog, report.json schema, how to add a cleaner pattern or a WP
  scan signature.
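The pre-extract symlink scan is ported from PHP and is not reproduced here. The bash sketch below only illustrates the idea. The function `link_is_dangerous` and its rules are assumptions, not the logic in scripts/lib/scan-symlinks.php. It flags a symlink entry whose target is absolute or whose `..` components climb above the extraction root:

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the logic of scripts/lib/scan-symlinks.php:
# a purely lexical check on one symlink entry from a tarball listing.

# link_is_dangerous MEMBER TARGET
#   MEMBER  path of the symlink inside the archive, e.g. "homedir/a/link"
#   TARGET  the link target string stored in the archive
# Returns 0 (dangerous) if TARGET is absolute or climbs above the archive
# root, 1 (safe) otherwise.
link_is_dangerous() {
    local member="$1" target="$2"
    local depth=0 part
    local -a dir_parts=() target_parts=()

    # Absolute targets such as /etc always leave the extraction root.
    if [[ "$target" == /* ]]; then
        return 0
    fi

    # Depth of the directory that holds the link, counted from the root.
    if [[ "$member" == */* ]]; then
        IFS='/' read -ra dir_parts <<< "${member%/*}"
        for part in "${dir_parts[@]}"; do
            if [[ -n "$part" && "$part" != "." ]]; then
                depth=$((depth + 1))
            fi
        done
    fi

    # Walk the target: ".." climbs one level, a normal name descends one.
    IFS='/' read -ra target_parts <<< "$target"
    for part in "${target_parts[@]}"; do
        case "$part" in
            "" | ".") ;;
            "..")
                depth=$((depth - 1))
                if (( depth < 0 )); then
                    return 0
                fi
                ;;
            *) depth=$((depth + 1)) ;;
        esac
    done
    return 1
}
```

The check is lexical only. It does not follow a harmless-looking link that points through another link, and member paths that themselves contain `..` would need a separate check. As described above, any dangerous hit makes extract.sh abort with exit 3 before tar runs.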

Local acceptance test results:
- clean fixture -> status=completed, 3 MyISAM->InnoDB, no flags, 0
- ALFA fixture -> exit 1, status=failed, failed_stage=extract,
  "tarball contains dangerous symlinks; aborting" on stderr
- compromised-siteurl fixture -> imported_into_new_server=false,
  .flagged file written, summary_for_panel.show_alert=true
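scan-dbs.php performs the MyISAM -> InnoDB rewrite with a regex that is not shown here. A minimal bash equivalent could look like the block below. It assumes mysqldump-style `ENGINE=MyISAM` table options and GNU grep/sed, and the function name `rewrite_myisam` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the regex used by scripts/scan-dbs.php.
# Rewrites mysqldump-style "ENGINE=MyISAM" table options to InnoDB in
# place (GNU sed) and prints how many lines matched.

# rewrite_myisam DUMP_FILE
rewrite_myisam() {
    local dump="$1" count
    # ENGINE, optional spaces, "=", optional spaces, MyISAM as a whole
    # word; matched case-insensitively below.
    local re='ENGINE[[:space:]]*=[[:space:]]*MyISAM\b'

    # grep -c counts matching LINES and exits 1 on zero matches;
    # "|| true" keeps going with a count of 0.
    count=$(grep -ciE "$re" "$dump" || true)

    # GNU sed: -E = extended regex, -i = in place, gI = every match on a
    # line, case-insensitive.
    sed -i -E "s/${re}/ENGINE=InnoDB/gI" "$dump"

    echo "$count"
}
```

Like any regex pass over a dump, this also rewrites matching text inside row data, for example a WordPress post that quotes `ENGINE=MyISAM`. It also counts matching lines rather than individual occurrences.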

Image size: 197 MB compressed (gzipped docker save), ~397 MB unique
layers extracted. Well under the spec's 600 MB compressed / 1.2 GB
extracted budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:56:57 -07:00

167 lines
7.3 KiB
Docker

# cpanel-importer — sanitization sandbox for cPanel cpmove tarballs.
#
# See cpanel-import-container-spec.md §1 for the full design.
#
# Build: docker build -t cpanel-importer:dev .
# Run: see README.md for the docker run invocation the WHP panel uses.
FROM almalinux:10-minimal
LABEL org.opencontainers.image.title="cpanel-importer"
LABEL org.opencontainers.image.description="cPanel cpmove sanitization sandbox (ClamAV + SaneSecurity + WP content scan)"
LABEL org.opencontainers.image.source="https://repo.anhonesthost.net/cloud-hosting-platform/cpanel-importer"
LABEL org.opencontainers.image.licenses="MIT"
ARG TARGETARCH=amd64
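# UID/GID of the unprivileged worker. Matches the spec: the panel calls
# `docker run --user 999:999`, so this UID must actually exist inside the
# image. 999 is NOT clear of the package-created system accounts: EPEL's
# clamav packages create system accounts (clamupdate, virusgroup) that are
# allocated from the top of the system ID range downward, so we have to
# claim 999 before they are installed (see the ordering note below).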
ARG WHP_UID=999
ARG WHP_GID=999
ENV LANG=C.UTF-8 \
LC_ALL=C.UTF-8 \
PHP_INI_DIR=/etc/php.d
# All package installs live in this one RUN so the dnf cache cleanup at
# the end lands in the same layer and never bloats the image. (The
# SaneSecurity signature pull is a separate RUN further down.)
#
# Pinning strategy:
# - PHP 8.4: AlmaLinux 10 stock ships PHP 8.3 only; the spec asks for
# 8.4 specifically. We add Remi's modular repo and enable the
# `php:remi-8.4` stream. We DO NOT pin to a specific 8.4.X because
# Remi rolls security patches into the same minor and an exact pin
# would block updates.
# - clamav / clamav-update: track the AL10 EPEL stream. CI builds
# monthly so signature DB age is bounded.
# - SaneSecurity: rsync at build time, then again at container start
# via `freshclam` (with the SaneSecurity third-party DBs configured).
#
# Ordering note: clamav-filesystem's RPM scripts auto-create a
# `virusgroup` system group at the next free GID. If we let dnf install
# clamav first, that lands at GID 999 — which then collides with the
# UID/GID we want for whp-import. We pre-create our user FIRST so
# virusgroup ends up at 998.
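#
# Hypothetical post-build check (assumes the :dev tag from the header;
# uses cat, which coreutils-single below guarantees):
#   docker run --rm --entrypoint cat cpanel-importer:dev /etc/group
# Per the note above, whp-import should show GID 999 and virusgroup 998.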
RUN set -eux; \
# microdnf is what almalinux:10-minimal ships with by default.
microdnf -y install --setopt=install_weak_deps=0 \
epel-release \
dnf \
shadow-utils \
; \
# Add Remi's repo for PHP 8.4 (AL10 stock has 8.3 only).
dnf -y --setopt=install_weak_deps=0 install \
https://rpms.remirepo.net/enterprise/remi-release-10.rpm ; \
dnf -y --setopt=install_weak_deps=0 module reset php ; \
dnf -y --setopt=install_weak_deps=0 module enable php:remi-8.4 ; \
# Pre-create the worker BEFORE installing clamav so virusgroup
# doesn't claim our GID.
groupadd --system --gid ${WHP_GID} whp-import ; \
useradd --system --uid ${WHP_UID} --gid ${WHP_GID} \
--home-dir /opt/whp --no-create-home \
--shell /sbin/nologin whp-import ; \
dnf -y --setopt=install_weak_deps=0 install \
php-cli \
php-json \
php-mbstring \
php-pdo \
php-mysqlnd \
php-xml \
php-zip \
php-process \
clamav \
clamav-update \
tar \
gzip \
bzip2 \
xz \
mariadb \
rsync \
ca-certificates \
coreutils-single \
findutils \
which \
; \
mkdir -p /opt/whp /scripts /host/backup /host/quarantine /host/sanitized \
/var/lib/clamav /var/log/clamav ; \
# /opt/whp + /var/log/clamav owned by worker now. /var/lib/clamav
# ownership is set AFTER the freshclam build-time pull below — root
# has to be able to write there during the build.
chown -R whp-import:whp-import /opt/whp /var/log/clamav ; \
# /host/quarantine and /host/sanitized are the RW mount targets. Docker
# copies an image directory's contents and ownership only into an EMPTY
# named or anonymous volume mounted over it. A bind mount always shows
# the host directory as-is, with host ownership, and `-v` creates a
# missing host path owned by the daemon's user (root on a standard
# install). So this chown only helps when a volume, not a bind mount, is
# attached here. For bind mounts the panel chowns the HOST paths to
# UID 999 before invocation (see README.md) and sets up its own dirs
# with mode 750 owner 999.
chown whp-import:whp-import /host/quarantine /host/sanitized ; \
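# Hypothetical host-side prep for the bind-mount case (the path is a
# placeholder; the panel's real commands are documented in README.md):
#   install -d -m 0750 -o 999 -g 999 /srv/import-job/quarantine /srv/import-job/sanitized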
# Strip dnf cache.
dnf -y clean all ; \
rm -rf /var/cache/dnf /var/cache/yum /var/cache/ldconfig/* \
/usr/share/doc /usr/share/man /usr/share/info
# Pre-seed ClamAV signature databases at build time so the first
# container run isn't dependent on freshclam succeeding before the scan.
#
# We do two passes:
# 1. freshclam (mainline ClamAV signatures: main.cvd, daily.cvd, bytecode.cvd).
# 2. rsync the SaneSecurity Foxhole.PHP DB — PHP-malware-focused, this
# is the high-value addition for our use case. Junkemailfilter rules
# are deliberately skipped (we don't scan email here).
#
# Both pulls fall back to `|| echo "WARN: ..."`, which (like `|| true`)
# always succeeds, so a transient network failure during build does not
# break the image build. The entrypoint can also run `freshclam` on start
# (optional, see scripts/entrypoint.sh) to refresh a stale baseline.
COPY configs/freshclam.conf /etc/freshclam.conf
COPY configs/sanesecurity-mirror.txt /opt/whp/sanesecurity-mirror.txt
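# configs/freshclam.conf is not reproduced here. As an illustration only,
# freshclam can fetch a third-party DB at runtime through a line such as
# the following (MIRROR_HOST and PATH are placeholders, not the
# configured mirror):
#   DatabaseCustomURL http://MIRROR_HOST/PATH/foxhole_generic.cdb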
# Pre-seed signatures as root, then chown the result. We don't ship the
# privilege-switching tools (runuser/su are in util-linux full, ~2MB we
# don't need at runtime). Ownership matters because the worker both reads
# /var/lib/clamav during scans and writes to it during the runtime
# freshclam refresh, which also runs as UID 999.
RUN set -eux; \
chown whp-import:whp-import /etc/freshclam.conf ; \
# Mainline ClamAV DB pull at build time so we have something to scan
# against even if the runtime freshclam refresh fails (e.g., no net).
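# freshclam otherwise drops privileges to its compile-time default
# user (clamupdate, UID 997 here). /var/lib/clamav is not chowned to
# whp-import until after both pulls (see below), so we tell freshclam
# explicitly to stay root for this one-shot pull and fix ownership
# afterwards with a single chown -R.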
freshclam --no-warnings --user=root || \
echo "WARN: freshclam failed during build; runtime refresh will retry" ; \
# SaneSecurity Foxhole.PHP rules. The project rotates mirrors; the
# file we COPYed lists the working rsync mirror used at build time.
SANE_MIRROR="$(cat /opt/whp/sanesecurity-mirror.txt)" ; \
rsync -av --no-motd --contimeout=30 \
--include='foxhole_filename.cdb' \
--include='foxhole_filename.cdb.sig' \
--include='foxhole_generic.cdb' \
--include='foxhole_generic.cdb.sig' \
--include='foxhole_js.cdb' \
--include='foxhole_js.cdb.sig' \
--include='foxhole_js.ndb' \
--include='foxhole_js.ndb.sig' \
--include='foxhole_mail.cdb' \
--include='foxhole_mail.cdb.sig' \
--include='foxhole_all.ndb' \
--include='foxhole_all.ndb.sig' \
--exclude='*' \
"rsync://${SANE_MIRROR}/sanesecurity/" /var/lib/clamav/ \
|| echo "WARN: SaneSecurity rsync failed during build; runtime freshclam will retry" ; \
chown -R whp-import:whp-import /var/lib/clamav ; \
chmod -R u=rwX,g=rX,o= /var/lib/clamav ; \
ls -la /var/lib/clamav/
COPY --chown=whp-import:whp-import scripts/ /scripts/
RUN chmod 0755 /scripts/entrypoint.sh /scripts/extract.sh \
/scripts/scan-files.php /scripts/scan-dbs.php
WORKDIR /opt/whp
USER whp-import
# stdin is closed — the container reads its inputs from env + bind mounts.
ENTRYPOINT ["/scripts/entrypoint.sh"]