Two related fixes for the issues the AI Monitor surfaced on whp01 on
2026-05-12 (haproxy-manager going "healthy but stalled" after long
uptime, and noise from POST /blocked-ip returning 405):
1. Production WSGI server. The Flask app was running on werkzeug's
built-in dev server (the one that prints "WARNING: This is a
development server" on every startup). werkzeug is single-threaded
and accumulates worker state over long uptimes; after ~24h on whp01
the health endpoint stops responding while the container still
reports "healthy" because Docker's HEALTHCHECK uses an HTTP probe
from inside the same werkzeug process that's stalled.
Replace with gunicorn (gthread worker class, --max-requests=1000
with jitter so workers recycle periodically). Two gunicorn instances,
one per Flask app — port 8000 for the management API, port 8080 for
the default/blocked-ip page server. Both lift their app objects from
the haproxy_manager module so gunicorn can import them.
Required structural change: default_app was created INSIDE the
__name__ == '__main__' block at module bottom, where gunicorn could
never reach it. Moved to module level. The __main__ block now stays
only for `python haproxy_manager.py` local-dev workflow.
Container init (init_db, certbot register, generate_config,
start_haproxy) extracted into a do_initial_setup() function called
from a new scripts/init.py. start-up.sh runs init.py to completion
before either gunicorn binds, which keeps HAProxy startup off the
WSGI workers' fork paths (no race between workers all trying to
start_haproxy() at once).
2. /blocked-ip and / accept ALL methods. HAProxy proxies blocked-IP
traffic to default_app preserving the original verb, so a blocked
POST request used to hit Flask's GET-only route and get a 405 +
the AI Monitor flagged the noise. Adding the full method list lets
the 403 page render regardless of verb.
Gunicorn settings tunable via env (workers, timeout, max-requests).
API gets --timeout 120 because ACME cert issuance can be slow; the
default page server stays on the gunicorn default 30s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Volume-mounted /etc/haproxy can shadow the image-baked
trusted_ips.list/trusted_ips.map, causing HAProxy to fail
config validation with "failed to open pattern file" on
non-WHP deployments. Touch empty files if they don't exist
so the ACLs always parse.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The renewal script was exiting immediately when certbot returned a
non-zero exit code, which happens when ANY cert fails to renew. A
single dead domain (e.g., DNS no longer pointed here) would block
ALL other certificates from being processed and combined for HAProxy.
Now logs the failures but continues to copy/combine successfully
renewed certificates and reload HAProxy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Support wildcard domains (*.domain.tld) in HAProxy config generation
with exact-match ACLs prioritized over wildcard ACLs. Add DNS-01
challenge endpoints that coordinate with certbot via auth/cleanup
hook scripts for wildcard SSL certificate issuance.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Improved certificate renewal and sync scripts to be more resilient:
- Removed 'set -e' to prevent silent failures when individual domains error
- Scripts now continue processing remaining domains even if one fails
- Replaced database queries with direct filesystem scanning of /etc/letsencrypt/live/
- Uses 'find' command to discover all domains with Let's Encrypt certificates
- More reliable as it works even if database is out of sync
Benefits:
- No silent failures - errors are logged but don't stop the entire process
- Works independently of database state
- Simpler and more straightforward
- All domains with certificates get processed regardless of database
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Simplified all certificate renewal scripts to be more straightforward and reliable:
- Scripts now just run certbot renew and copy cert+key files to HAProxy format
- Removed overly complex retry logic and error handling
- Both in-container and host-side scripts work with cron scheduling
Added automatic certbot cleanup when domains are removed:
- When a domain is deleted via API, certbot certificate is also removed
- Prevents renewal errors for domains that no longer exist in HAProxy
- Cleans up both HAProxy combined cert and Let's Encrypt certificate
Script changes:
- renew-certificates.sh: Simplified to 87 lines (from 215)
- sync-certificates.sh: Simplified to 79 lines (from 200+)
- host-renew-certificates.sh: Simplified to 36 lines (from 40)
- All scripts use same pattern: query DB, copy certs, reload HAProxy
Python changes:
- remove_domain() now calls 'certbot delete' to remove certificates
- Prevents orphaned certificates from causing renewal failures
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Write combined certificates to temporary file first
- Verify file is not empty before moving to final location
- Use atomic mv operation to prevent HAProxy from reading partial files
- Add proper cleanup of temporary files on all error paths
- Matches robust patterns from haproxy_manager.py
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
After certbot renews certificates, the separate fullchain.pem and privkey.pem
files must be combined into a single .pem file for HAProxy. The renewal script
was missing this critical step, causing HAProxy to continue using old certificates.
Changes:
- Add update_combined_certificates() function to renew-certificates.sh
- Query database for all SSL-enabled domains
- Combine Let's Encrypt cert + key files using cat (matches haproxy_manager.py pattern)
- Always update combined certs after renewal, even if certbot says no renewal needed
- Add new sync-certificates.sh script for syncing all existing certificates
- Smart update detection in sync script (only updates when source is newer)
This ensures HAProxy always gets properly formatted certificate files after renewal.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fixed crontab permissions (600) and ownership for proper cron execution
- Added PATH environment variable to crontab to prevent command not found issues
- Created dedicated renewal script with comprehensive logging and error handling
- Added retry logic (3 attempts) for HAProxy reload with socket health checks
- Implemented host-side renewal script for external cron scheduling via docker exec
- Added crontab configuration examples for various renewal schedules
- Updated README with detailed certificate renewal documentation
This resolves issues where the cron job would not run or hang during execution.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove semicolons from variable initialization in AWK scripts
- Each variable now on separate line to prevent syntax errors
- Fixes "syntax error at or near ," in monitor-attacks.sh and manage-blocked-ips.sh
- Scripts now properly parse HAProxy 3.0.11 threat intelligence data
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Handle common missing files (favicon.ico, robots.txt) without counting as errors
- Return 404 directly from frontend for these files (bypasses backend counting)
- Add clear-ip.sh script to remove specific IPs from stick-table
- Keep trusted networks whitelist for local/private IPs
This prevents legitimate users from being blocked due to browser
requests for common files that don't exist.
Usage: ./scripts/clear-ip.sh <IP_ADDRESS>
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add blocked_ips database table to store blocked IP addresses
- Implement API endpoints for IP blocking management:
- GET /api/blocked-ips: List all blocked IPs
- POST /api/blocked-ips: Block an IP address
- DELETE /api/blocked-ips: Unblock an IP address
- Update HAProxy configuration generation to include blocked IP ACLs
- Create blocked IP page template for denied access
- Add comprehensive API documentation for WHP integration
- Include test script for IP blocking functionality
- Update .gitignore with Python patterns
- Add CLAUDE.md for codebase documentation
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>