Fix wedged-container outage: TCP healthcheck + tini-managed PID 1

A 10s PostgreSQL restart took down transcribe.shadowdao.com-01 for ~17h
because pm2 gave up after 5 fast retries, the entrypoint's trailing
tail -f kept PID 1 alive, and the healthcheck (wget --spider on nginx
port 80) succeeded on the 301-to-https redirect regardless of whether
Node was alive.

Three coordinated fixes to the cnoc image:

- HEALTHCHECK: replace the redirect-passing wget probe with TCP-level
  checks on 127.0.0.1:3000 (Node) and :80 (nginx). Tenant-agnostic,
  with no /ping dependency; catches the exact incident scenario (port
  3000 closed when pm2 exits).
- entrypoint.sh: exec pm2 under tini, which takes over as PID 1. When
  pm2 exhausts max_restarts and exits, tini exits with it, the
  container stops, and the unless-stopped restart policy brings it
  back. Logs are tailed in the background with -F (logrotate-safe).
- Dockerfile: install tini from EPEL for proper signal forwarding and
  zombie reaping of nginx/crond children that reparent to PID 1.
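
For illustration, the TCP-level probe described in the HEALTHCHECK item
can be sketched as a small bash helper. This is a minimal sketch, not the
shipped healthcheck: the function names tcp_open and healthcheck are
hypothetical, and it assumes a bash with /dev/tcp redirection support.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the image): succeeds only if a TCP
# connection to host:port can be opened.
tcp_open() {
  local host=$1 port=$2
  # Opening /dev/tcp/<host>/<port> for reading makes bash attempt a
  # connect(); the subshell confines the descriptor and the error output.
  (: < "/dev/tcp/${host}/${port}") 2>/dev/null
}

# Mirrors the probe in the diff: Node (3000) and nginx (80) must both
# accept connections, otherwise the check fails.
healthcheck() {
  tcp_open 127.0.0.1 3000 && tcp_open 127.0.0.1 80
}
```

Calling healthcheck returns non-zero as soon as either port refuses the
connection, which is the state the old wget probe could not see: nginx
still answering with a 301 while nothing listens on port 3000.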

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 06:59:52 -07:00
parent 821db4873e
commit b431a66a7b
2 changed files with 17 additions and 18 deletions


@@ -5,7 +5,7 @@ ARG NODEVER=20
RUN dnf install -y \
https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
dnf update -y && \
-dnf install -y wget procps cronie iproute nginx openssl git microdnf make gcc gcc-c++ && \
+dnf install -y wget procps cronie iproute nginx openssl git microdnf make gcc gcc-c++ tini && \
dnf group install -y 'Development Tools' && \
dnf clean all && \
rm -rf /var/cache/dnf /usr/share/doc /usr/share/man /usr/share/locale/* \
@@ -36,6 +36,7 @@ COPY ./examples/ /examples/
RUN echo "15 */12 * * * root /scripts/log-rotate.sh" >> /etc/crontab
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
-CMD wget --spider -q http://localhost/ || exit 1
+CMD bash -c ': </dev/tcp/127.0.0.1/3000 && : </dev/tcp/127.0.0.1/80' \
+|| exit 1
ENTRYPOINT [ "/scripts/entrypoint.sh" ]