feat: Implement dynamic Gatekeeper proxy and enhance service health monitoring

- **Implemented Dynamic Gatekeeper (Anubis) Proxy:**
  - Introduced Anubis as a Gatekeeper proxy layer for services (`web`, `web-staging`, `feedback`, `health`).
  - Added `docker-gen` setup (`docker-compose_gatekeeper.template.yml`, `gatekeeper-manager`) to dynamically configure Anubis instances based on container labels (`enable_gatekeeper=true`).
  - Updated HAProxy to route traffic through the respective Gatekeeper services.

- **Enhanced Service Health Monitoring & Checks:**
  - Integrated `django-health-check` into the Django application, providing detailed health endpoints (e.g., `/health/`).
  - Replaced the custom health check view with `django-health-check` URLs.
  - Added `psutil` for system metrics in health checks.
  - Made Gatus configuration dynamic using `docker-gen` (`config.template.yaml`), allowing automatic discovery and monitoring of service instances (e.g., web workers).
  - Externalized Gatus SMTP credentials to environment variables.
  - Strengthened `docker-compose_core.yml` with a combined `db-redis-healthcheck` service reporting to Gatus.
  - Added explicit health checks for `db` and `redis` services in `docker-compose.yml`.

- **Improved Docker & Compose Configuration:**
  - Added `depends_on` conditions in `docker-compose.yml` for `web` and `celery` services to wait for the database.
  - Updated `ALLOWED_HOSTS` in `docker-compose_staging.yml` and `docker-compose_web.yml` to include internal container names for Gatekeeper communication.
  - Set `DEBUG=False` for staging services.
  - Removed `.env.production` from `.gitignore` (standardized to `.env`).
  - Streamlined `scripts/entrypoint.sh` by removing the call to the no-longer-present `/deploy.sh`.

- **Dependency Updates:**
  - Added `django-health-check>=3.18.3` and `psutil>=7.0.0` to `pyproject.toml` and `uv.lock`.
  - Updated `settings.py` to include `health_check` apps, configuration, and use `REDIS_URL` consistently.

- **Streamlined deployment script used in GHA:**
  - Updated the workflow to copy new server files and create a new `.env` file in the temporary directory before moving them into place.
  - Consolidated the stopping and removal of old containers into a single step for better clarity and efficiency.
  - Reduce container downtime by rearranging stop/start steps.
This commit is contained in:
badblocks 2025-05-22 19:21:58 -07:00
parent f530790f6c
commit 6aa15d1af9
No known key found for this signature in database
16 changed files with 487 additions and 162 deletions

View file

@ -2,11 +2,6 @@ x-common: &common
restart: always
env_file:
- .env
environment:
- DEBUG=False
- DISABLE_SIGNUPS=True
- PUBLIC_HOST=pkmntrade.club
- ALLOWED_HOSTS=pkmntrade.club,127.0.0.1
services:
web:
@ -15,6 +10,13 @@ services:
entrypoint: ["/ko-app/httpdebug", "--bind", ":8000"]
#image: badbl0cks/pkmntrade-club:stable
#command: ["granian", "--interface", "wsgi", "pkmntrade_club.django_project.wsgi:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "1", "--workers-kill-timeout", "180", "--access-log"]
environment:
- DEBUG=False
- DISABLE_SIGNUPS=True
- PUBLIC_HOST=pkmntrade.club
- ALLOWED_HOSTS=pkmntrade.club,127.0.0.1,pkmntrade-club-web-1,pkmntrade-club-web-2,pkmntrade-club-web-3,pkmntrade-club-web-4
labels:
- "enable_gatekeeper=true"
deploy:
mode: replicated
replicas: 4
@ -24,7 +26,12 @@ services:
# timeout: 10s
# retries: 3
# start_period: 30s
celery:
<<: *common
image: badbl0cks/pkmntrade-club:stable
command: ["celery", "-A", "pkmntrade_club.django_project", "worker", "-l", "INFO", "-B", "-E"]
# celery:
# <<: *common
# image: badbl0cks/pkmntrade-club:stable
# environment:
# - DEBUG=False
# - DISABLE_SIGNUPS=True
# - PUBLIC_HOST=pkmntrade.club
# - ALLOWED_HOSTS=pkmntrade.club,127.0.0.1,pkmntrade-club-celery-1,pkmntrade-club-celery-2
# command: ["celery", "-A", "pkmntrade_club.django_project", "worker", "-l", "INFO", "-B", "-E"]