A 10s postgresql restart took down transcribe.shadowdao.com-01 for ~17h because pm2 gave up after 5 fast retries, the entrypoint's trailing tail -f kept PID 1 alive, and the healthcheck (wget --spider on nginx port 80) succeeded on the 301-to-https redirect regardless of whether Node was alive. Three coordinated fixes to the cnoc image: - HEALTHCHECK: replace the redirect-passing wget probe with TCP-level checks on 127.0.0.1:3000 (Node) and :80 (nginx). Tenant-agnostic, no /ping dependency — catches the exact incident scenario (port 3000 closed when pm2 exits). - entrypoint.sh: exec pm2 via tini so it becomes PID 1. When pm2 exhausts max_restarts and exits, the container exits and the unless-stopped restart policy brings it back. Logs are tailed in the background with -F (logrotate-safe). - Dockerfile: install tini from EPEL for proper signal forwarding and zombie reaping of nginx/crond children that reparent to PID 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
42 lines
1.6 KiB
Docker
42 lines
1.6 KiB
Docker
FROM almalinux/9-base
|
|
ARG NODEVER=20
|
|
|
|
# Install repos, update, install only needed packages, clean up in one layer
|
|
RUN dnf install -y \
|
|
https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
|
|
dnf update -y && \
|
|
dnf install -y wget procps cronie iproute nginx openssl git microdnf make gcc gcc-c++ tini && \
|
|
dnf group install -y 'Development Tools' && \
|
|
dnf clean all && \
|
|
rm -rf /var/cache/dnf /usr/share/doc /usr/share/man /usr/share/locale/* \
|
|
/var/cache/yum /tmp/* /var/tmp/*
|
|
|
|
# Copy scripts into the image and set permissions
|
|
COPY ./scripts/ /scripts/
|
|
RUN chmod +x /scripts/*
|
|
|
|
# Generate self-signed cert, create needed dirs, install Node.js, clean up
|
|
RUN openssl req -newkey rsa:2048 -nodes -keyout /etc/pki/tls/private/localhost.key -x509 -days 3650 -subj "/CN=localhost" -out /etc/pki/tls/certs/localhost.crt && \
|
|
mkdir -p /var/log/nodejs && \
|
|
/scripts/install-node$NODEVER.sh && \
|
|
rm -rf /tmp/*
|
|
|
|
# Install PM2 globally for process management with minimal footprint
|
|
RUN npm install -g pm2@latest --production && \
|
|
npm cache clean --force && \
|
|
rm -rf /tmp/*
|
|
|
|
# Copy nginx config
|
|
COPY ./configs/nginx.conf /etc/nginx/nginx.conf
|
|
|
|
# Copy examples directory for default app fallback
|
|
COPY ./examples/ /examples/
|
|
|
|
# Set up cron job for log rotation
|
|
RUN echo "15 */12 * * * root /scripts/log-rotate.sh" >> /etc/crontab
|
|
|
|
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
|
|
CMD bash -c ': </dev/tcp/127.0.0.1/3000 && : </dev/tcp/127.0.0.1/80' \
|
|
|| exit 1
|
|
|
|
ENTRYPOINT [ "/scripts/entrypoint.sh" ] |