fix: move EXTRACT_DIR + WORK_DIR off tmpfs onto disk-backed bind mount
All checks were successful
cpanel-importer Build and Push / Build-and-Push (push) Successful in 56s
rc=137 OOM kill triaged on whp02 darkside import. dmesg confirmed:
memory: usage 2097100kB, limit 2097152kB, failcnt 132
oom_kill_process ... task=bash uid=999
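For triage, rc=137 decodes as 128 + 9. A shell reports a signal-killed child as 128 plus the signal number, and the memcg OOM killer sends SIGKILL (9). The dmesg line also shows how close usage was to the limit. A small bash sketch for decoding both: `describe_rc` and `memcg_headroom_kb` are hypothetical helpers, not part of the importer, and the parser assumes the exact `usage NkB, limit NkB` wording quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical triage helpers (not part of the importer).

# Decode a shell exit status: values above 128 mean "killed by signal (rc - 128)".
describe_rc() {
    local rc=$1
    if (( rc > 128 )); then
        local sig=$(( rc - 128 ))
        # The bash builtin `kill -l N` prints the signal name without the SIG prefix.
        printf 'killed by signal %d (%s)\n' "$sig" "$(kill -l "$sig")"
    else
        printf 'exited with status %d\n' "$rc"
    fi
}

# Parse a memcg OOM line such as
#   memory: usage 2097100kB, limit 2097152kB, failcnt 132
# and print how much headroom was left under the limit.
memcg_headroom_kb() {
    local line=$1
    local re='usage ([0-9]+)kB, limit ([0-9]+)kB'
    [[ $line =~ $re ]] || return 1
    local usage=${BASH_REMATCH[1]} limit=${BASH_REMATCH[2]}
    printf 'usage=%skB limit=%skB headroom=%skB\n' \
        "$usage" "$limit" "$(( limit - usage ))"
}
```

`describe_rc 137` prints `killed by signal 9 (KILL)`, and passing the dmesg line above to `memcg_headroom_kb` reports `headroom=52kB` (2097152 - 2097100).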
Root cause: extract.sh untars the cpmove into EXTRACT_DIR, which was
/tmp/extract, a tmpfs mount (RAM-backed). Pages written to tmpfs are
charged to the container's memory cgroup (as shmem, not RSS), so the
--memory 2g ceiling covers them too. Decompressing the 3 GB cpmove into
tmpfs hit the limit about 7s into tar, and the kernel OOM-killed the
bash process running extract.sh.
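One way to confirm that tmpfs data is being charged to the container is to watch the `shmem` counter in the cgroup's `memory.stat` while the extract runs. A minimal sketch, assuming cgroup v2 mounted at `/sys/fs/cgroup` inside the container, where `memory.stat` lines look like `shmem 123456` (bytes); `cgroup_shmem_bytes` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the shmem bytes (tmpfs, shm segments, shared
# anonymous mappings) charged to a cgroup. Assumes cgroup v2 memory.stat format.
# Exits non-zero if the file has no "shmem" line.
cgroup_shmem_bytes() {
    local stat_file=${1:-/sys/fs/cgroup/memory.stat}
    awk '$1 == "shmem" { print $2; found = 1 } END { exit !found }' "$stat_file"
}
```

Polling it once a second during a tmpfs extract (for example `while sleep 1; do cgroup_shmem_bytes; done`) should show the counter climbing toward the 2 GB limit; with the extract on the bind mount it should stay flat.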
The fix is structural, not a memory bump: the disk-backed bind mount
at /host/sanitized (mapped to /var/lib/whp/cpanel-importer-extract on
the host) is bounded only by host disk space, and files written there
sit in ordinary page cache that the kernel can write back and reclaim
under cgroup pressure, whereas tmpfs pages can only be evicted to swap.
Moving the working dirs there sidesteps this class of OOM entirely.
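The invariant this fix relies on (bulk extraction never lands on a RAM-backed filesystem) could also be checked at startup, so that a future mount change fails loudly instead of OOM-killing mid-tar. A sketch, assuming GNU coreutils `stat`, where `-f -c %T` prints the filesystem type name; `is_ram_backed_fstype` and `require_disk_backed` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical startup guard; not part of the importer.

# Filesystem type names (as printed by GNU `stat -f -c %T`) whose pages live in RAM.
is_ram_backed_fstype() {
    case $1 in
        tmpfs|ramfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Refuse to use DIR for bulk extraction if it sits on a RAM-backed filesystem.
# Returns 1 for RAM-backed, 2 if the filesystem type cannot be read.
require_disk_backed() {
    local dir=$1 fstype
    fstype=$(stat -f -c %T "$dir") || return 2
    if is_ram_backed_fstype "$fstype"; then
        echo "refusing to extract into $dir: $fstype counts against the memory limit" >&2
        return 1
    fi
}
```

It would slot in right after the `mkdir -p "$EXTRACT_DIR" ...` line, e.g. `require_disk_backed "$EXTRACT_DIR" || die "EXTRACT_DIR must be disk-backed"`, reusing the script's existing `die` helper.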
Layout change:
  EXTRACT_DIR  /tmp/extract   -> $SANITIZED_DIR/extract-work
  WORK_DIR     /tmp/sanitized -> $SANITIZED_DIR/work
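Because every working path now hangs off `$SANITIZED_DIR="/host/sanitized/$IMPORT_ID"`, and later stages `rm -rf` paths under it, the layout is only as safe as the import ID. A hypothetical helper that derives the layout and rejects empty or path-like IDs; `import_layout` is an illustrative name, not something the importer defines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the per-import working layout from the output root.
# Rejects empty or path-like IDs so later `rm -rf` calls stay inside one import's dir.
import_layout() {
    local root=$1 id=$2
    if [[ -z $id || $id == */* || $id == . || $id == .. ]]; then
        echo "invalid import id: '$id'" >&2
        return 1
    fi
    local sanitized="$root/$id"
    printf 'SANITIZED_DIR=%s\n' "$sanitized"
    printf 'EXTRACT_DIR=%s\n' "$sanitized/extract-work"
    printf 'WORK_DIR=%s\n' "$sanitized/work"
}
```

With an empty ID, `$SANITIZED_DIR/extracted` would collapse to `/host/sanitized/extracted`, a path shared by every import; failing early avoids that.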
Two ripple changes:
- The old rsync_out stage copied ~10 GB across filesystems, from tmpfs
  to /host/sanitized/<id>/extracted. That is now a same-filesystem `mv`
  (a constant-time rename), since extract-work already lives inside
  /host/sanitized/<id>/. The stage is renamed to finalize_layout for
  clarity. It first wipes any extracted/ and mysql/ left over from a
  partial run, so mv renames the working dirs into place instead of
  nesting them inside stale directories.
- The stripped-symlinks actions sidecar now lives explicitly in /tmp
  (entrypoint.sh passes its path as extract.sh's 4th argument), so
  finalize_layout's rename neither (a) carries an importer dotfile into
  the cleaned tree the panel imports nor (b) moves the file out from
  under write_report, which reads it afterwards.
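Both ripple changes come down to path invariants. The rename is only constant-time when source and destination share a filesystem (across filesystems, `mv` falls back to copy-and-delete), and an existing destination directory makes `mv` move the source *into* it rather than replace it. The sidecar must sit outside the tree being renamed. A sketch of both checks, assuming GNU coreutils (`stat -c %d` prints the device number); `same_filesystem`, `path_is_inside`, and `finalize_dir` are hypothetical names, not the importer's.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the two ripple changes; names are illustrative.

# True if both existing paths report the same device number (GNU `stat -c %d`).
# Approximation: rename(2) also requires both paths to be under the same mount.
same_filesystem() {
    [[ "$(stat -c %d "$1")" == "$(stat -c %d "$2")" ]]
}

# Lexical check: true if FILE lies under DIR. Assumes absolute, normalized
# paths (no "..", no symlinks in between).
path_is_inside() {
    local dir=${1%/} file=$2
    [[ $file == "$dir"/* ]]
}

# Move SRC to DEST as a single rename, replacing any stale DEST from a partial run.
finalize_dir() {
    local src=$1 dest=$2
    if ! same_filesystem "$src" "$(dirname -- "$dest")"; then
        echo "finalize: $src and $dest are on different filesystems" >&2
        return 1
    fi
    # Without this wipe, an existing DEST directory would make mv nest SRC inside it.
    rm -rf -- "$dest"
    mv -- "$src" "$dest"
}
```

In entrypoint.sh terms this would be `finalize_dir "$EXTRACT_DIR" "$SANITIZED_DIR/extracted"`, plus a guard such as `path_is_inside "$EXTRACT_DIR" "$STRIPPED_SYMLINKS_FILE" && die "sidecar must live outside EXTRACT_DIR"`. GNU `mv -T` (`--no-target-directory`) is an alternative way to rule out nesting: it refuses to move into a non-empty destination directory instead of descending into it.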
Also fixes the unrelated, cosmetic freshclam warning: freshclam now runs
in a subshell that first cd's to /var/lib/clamav (the configured
DatabaseDirectory, a writable tmpfs). The "Can't create freshclam.dat
in /opt/whp" errors occurred because /opt/whp, the container's WORKDIR,
is on the read-only rootfs.
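The subshell is what keeps the `cd` from leaking into later stages: a `( ... )` group runs in a child shell, so its working-directory change ends with it. A minimal demonstration; `run_in_dir` is a hypothetical wrapper, not something entrypoint.sh defines.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with DIR as its working directory
# without changing the caller's working directory. If the cd fails, the
# command is not run and the subshell returns cd's failure status.
run_in_dir() {
    local dir=$1
    shift
    ( cd -- "$dir" && "$@" )
}
```

With it, the freshclam call would read `run_in_dir /var/lib/clamav freshclam --no-warnings >/tmp/freshclam.log 2>&1`; the redirections apply to the whole subshell, so the log still lands in /tmp.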
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -85,9 +85,22 @@ SANITIZED_DIR="/host/sanitized/$IMPORT_ID"
 mkdir -p "$QUARANTINE_DIR" "$SANITIZED_DIR" \
     || die "cannot create quarantine/sanitized output dirs (are the bind mounts RW?)"
 
-# Container-internal scratch space (mounted as tmpfs by the panel).
-EXTRACT_DIR="/tmp/extract"
-WORK_DIR="/tmp/sanitized"
+# Working scratch lives inside the disk-backed bind mount, NOT under /tmp.
+# /tmp is mounted as tmpfs (RAM-backed) by the panel for fast small-file
+# scratch (per-stage reports, exclude lists). Putting the multi-GB cpmove
+# extract there blew the container's --memory 2g cgroup ceiling (tmpfs
+# writes count against cgroup RSS), surfaced as rc=137 OOM kills mid-tar.
+#
+# Layout:
+#   EXTRACT_DIR  $SANITIZED_DIR/extract-work — tar untars here. After
+#                scan-files quarantines bad files, this is the cleaned
+#                tree. Renamed to $SANITIZED_DIR/extracted at the end of
+#                the run so the panel can find it at the expected path.
+#   WORK_DIR     $SANITIZED_DIR/work — scan-dbs writes cleaned
+#                SQL dumps here; folded into $SANITIZED_DIR/mysql at the
+#                end of the run.
+EXTRACT_DIR="$SANITIZED_DIR/extract-work"
+WORK_DIR="$SANITIZED_DIR/work"
 mkdir -p "$EXTRACT_DIR" "$WORK_DIR/mysql"
 
 # --- refresh ClamAV signatures --------------------------------------------
@@ -95,9 +108,15 @@ mkdir -p "$EXTRACT_DIR" "$WORK_DIR/mysql"
 STAGE="freshclam"
 if [[ "$CLAMAV_REFRESH" == "true" ]]; then
     log "refreshing ClamAV signatures (freshclam)"
+    # freshclam writes freshclam.dat to its CWD; the container's WORKDIR
+    # is /opt/whp which lives on the read-only rootfs, so freshclam errors
+    # with "Can't create freshclam.dat in /opt/whp" before it ever reaches
+    # the database directory. Subshell + cd to the tmpfs at /var/lib/clamav
+    # (the DatabaseDirectory configured in /etc/freshclam.conf) keeps the
+    # entrypoint's CWD intact for later stages.
     # freshclam is allowed to fail (e.g., container has no outbound net);
     # we proceed with the baseline rules from build time + log a warning.
-    if ! freshclam --no-warnings >/tmp/freshclam.log 2>&1; then
+    if ! ( cd /var/lib/clamav && freshclam --no-warnings >/tmp/freshclam.log 2>&1 ); then
         log "WARN: freshclam failed; proceeding with build-time signature DB"
         tail -20 /tmp/freshclam.log || true
     fi
@@ -109,7 +128,11 @@ fi
 
 STAGE="extract"
 log "stage: extract"
-if ! /scripts/extract.sh "$IMPORT_BACKUP_FILE" "$EXTRACT_DIR" "$IMPORT_USERNAME"; then
+# 4th arg pins the stripped-symlinks actions sidecar to /tmp (not inside
+# $EXTRACT_DIR) so finalize_layout's mv doesn't carry an importer dotfile
+# into the cleaned tree and so write_report can read it after the rename.
+STRIPPED_SYMLINKS_FILE="/tmp/stripped-symlinks.json"
+if ! /scripts/extract.sh "$IMPORT_BACKUP_FILE" "$EXTRACT_DIR" "$IMPORT_USERNAME" "$STRIPPED_SYMLINKS_FILE"; then
     die "extract.sh failed; see stderr above"
 fi
 
@@ -137,29 +160,31 @@ php /scripts/scan-dbs.php \
     --username "$IMPORT_USERNAME" \
     || die "scan-dbs.php failed; see stderr above"
 
-# --- rsync cleaned tree to /host/sanitized --------------------------------
+# --- finalize cleaned tree into /host/sanitized/<id>/ ---------------------
 
-STAGE="rsync_out"
-log "stage: rsync_out"
-# Copy the (now-cleaned) extracted tree to the sanitized output. We exclude
-# files that scan-files.php quarantined — they are NOT present in the
-# extract dir anymore (the scanner moved them), so this is the cleaned
-# tree by construction.
-rsync -a --no-owner --no-group --no-perms --chmod=Du=rwx,Dg=rx,Do=,Fu=rw,Fg=r,Fo= \
-    "$EXTRACT_DIR"/ "$SANITIZED_DIR/extracted/" \
-    || die "rsync to sanitized dir failed"
-
-# Then drop the cleaned .sql files in place too.
-rsync -a --no-owner --no-group --no-perms --chmod=Du=rwx,Dg=rx,Do=,Fu=rw,Fg=r,Fo= \
-    "$WORK_DIR/mysql"/ "$SANITIZED_DIR/mysql/" \
-    || die "rsync of cleaned .sql files failed"
+STAGE="finalize_layout"
+log "stage: finalize_layout"
+# Both EXTRACT_DIR and WORK_DIR already live INSIDE $SANITIZED_DIR (the
+# bind-mounted disk-backed output root), so we don't need to cross-filesystem
+# rsync 10GB+ of cleaned files. A same-filesystem `mv` is constant-time
+# (just a rename) — turns what used to be a multi-minute rsync into a
+# fraction of a second.
+#
+# Cleanup posture: if a previous run partially populated `extracted/` or
+# `mysql/`, we wipe them first so the rename can't fail with EEXIST. The
+# container's --read-only rootfs makes accidentally removing the wrong
+# path impossible — these are under the per-import bind mount only.
+rm -rf "$SANITIZED_DIR/extracted" "$SANITIZED_DIR/mysql"
+mv "$EXTRACT_DIR" "$SANITIZED_DIR/extracted" || die "finalize: rename extract-work failed"
+mv "$WORK_DIR/mysql" "$SANITIZED_DIR/mysql" || die "finalize: rename work/mysql failed"
+# Tidy up the now-empty WORK_DIR shell.
+rmdir "$WORK_DIR" 2>/dev/null || true
 
 # --- merge per-stage reports into the final report.json -------------------
 
 STAGE="write_report"
 log "stage: write_report"
 DURATION=$(( $(date -u +%s) - START_TS ))
-STRIPPED_SYMLINKS_FILE="$EXTRACT_DIR/.cpanel-importer-stripped-symlinks.json"
 php -r '
     $importId = $argv[1];
     $duration = (int) $argv[2];