cloud-node-container/Dockerfile
Josh Knapp b431a66a7b Fix wedged-container outage: TCP healthcheck + tini-managed PID 1
A 10-second PostgreSQL restart took down transcribe.shadowdao.com-01 for
~17h: pm2 gave up after 5 fast retries, the entrypoint's trailing
tail -f kept PID 1 alive, and the healthcheck (wget --spider on nginx
port 80) succeeded on the 301-to-https redirect regardless of whether
Node was alive.

Three coordinated fixes to the cnoc image:

- HEALTHCHECK: replace the redirect-passing wget probe with TCP-level
  checks on 127.0.0.1:3000 (Node) and :80 (nginx). Tenant-agnostic, with
  no /ping dependency; this catches the exact incident scenario (port
  3000 closed when pm2 exits).
- entrypoint.sh: exec pm2 via tini so it becomes PID 1. When pm2
  exhausts max_restarts and exits, the container exits and the
  unless-stopped restart policy brings it back. Logs are tailed in the
  background with -F (logrotate-safe).
- Dockerfile: install tini from EPEL for proper signal forwarding and
  zombie reaping of nginx/crond children that reparent to PID 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:59:52 -07:00
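
The entrypoint fix hinges on how a shell script passes along the exit status of the process it supervises. The sketch below is not the real entrypoint.sh, which is not shown here. It uses `sh -c "exit 7"` as a hypothetical stand-in for pm2 exiting after exhausting its restarts, and `true` as a stand-in for the trailing `tail -f` (which in the incident blocked forever instead of returning). The function names are illustrative.

```shell
#!/usr/bin/env bash
# Illustrative only: contrasts two entrypoint patterns using harmless stand-ins.

# Old pattern: run the supervisor, then run a trailing foreground command.
# The supervisor's exit status (7) is discarded; the script's status comes
# from the trailing command instead.
old_entrypoint() {
    sh -c 'sh -c "exit 7"; true'
}

# New pattern: exec the supervisor so it replaces the shell.
# Its exit status becomes the entrypoint's exit status.
new_entrypoint() {
    sh -c 'exec sh -c "exit 7"'
}

old_status=0
old_entrypoint || old_status=$?
new_status=0
new_entrypoint || new_status=$?

echo "old pattern exit status: ${old_status}"
echo "new pattern exit status: ${new_status}"
```

In a container the same property means that once the supervised process exits, PID 1 exits with it, and a restart policy such as unless-stopped can act on it.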


FROM almalinux/9-base
ARG NODEVER=20
# Install repos, update, install only needed packages, clean up in one layer
RUN dnf install -y \
        https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
    dnf update -y && \
    dnf install -y wget procps cronie iproute nginx openssl git microdnf make gcc gcc-c++ tini && \
    dnf group install -y 'Development Tools' && \
    dnf clean all && \
    rm -rf /var/cache/dnf /usr/share/doc /usr/share/man /usr/share/locale/* \
        /var/cache/yum /tmp/* /var/tmp/*
# Copy scripts into the image and set permissions
COPY ./scripts/ /scripts/
RUN chmod +x /scripts/*
# Generate self-signed cert, create needed dirs, install Node.js, clean up
RUN openssl req -newkey rsa:2048 -nodes \
        -keyout /etc/pki/tls/private/localhost.key \
        -x509 -days 3650 -subj "/CN=localhost" \
        -out /etc/pki/tls/certs/localhost.crt && \
    mkdir -p /var/log/nodejs && \
    "/scripts/install-node${NODEVER}.sh" && \
    rm -rf /tmp/*
# Install PM2 globally for process management with minimal footprint
RUN npm install -g pm2@latest --production && \
    npm cache clean --force && \
    rm -rf /tmp/*
# Copy nginx config
COPY ./configs/nginx.conf /etc/nginx/nginx.conf
# Copy examples directory for default app fallback
COPY ./examples/ /examples/
# Set up cron job for log rotation
RUN echo "15 */12 * * * root /scripts/log-rotate.sh" >> /etc/crontab
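# Health check: require a TCP connection to both Node (3000) and nginx (80);
# bash's /dev/tcp redirection fails when nothing is listening on the port
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
    CMD bash -c ': </dev/tcp/127.0.0.1/3000 && : </dev/tcp/127.0.0.1/80' \
        || exit 1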
ENTRYPOINT [ "/scripts/entrypoint.sh" ]
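
The HEALTHCHECK above relies on bash's /dev/tcp redirection, which attempts a TCP connection and fails if nothing is listening. Below is a minimal standalone sketch of the same probe, for trying it outside the container. `port_open` and `check_ports` are illustrative names and are not part of the image's scripts.

```shell
#!/usr/bin/env bash
# Illustrative sketch of a TCP-level liveness probe using bash's /dev/tcp.

# Return 0 if a TCP connection to host:port succeeds, non-zero otherwise.
# The subshell contains the failed redirection so the caller only sees
# an exit status.
port_open() {
    local host=$1 port=$2
    ( : <"/dev/tcp/${host}/${port}" ) 2>/dev/null
}

# Succeed only if every listed port on the host accepts a connection;
# stop at the first port that refuses.
check_ports() {
    local host=$1 port
    shift
    for port in "$@"; do
        port_open "$host" "$port" || return 1
    done
}
```

Calling `check_ports 127.0.0.1 3000 80` mirrors the two probes in the HEALTHCHECK: it returns non-zero as soon as one port refuses the connection, which is the state port 3000 is in once pm2 has exited.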