feat: Implement dynamic Gatekeeper proxy and enhance service health monitoring

- **Implemented Dynamic Gatekeeper (Anubis) Proxy:**
  - Introduced Anubis as a Gatekeeper proxy layer for services (`web`, `web-staging`, `feedback`, `health`).
  - Added `docker-gen` setup (`docker-compose_gatekeeper.template.yml`, `gatekeeper-manager`) to dynamically configure Anubis instances based on container labels (`enable_gatekeeper=true`).
  - Updated HAProxy to route traffic through the respective Gatekeeper services.

- **Enhanced Service Health Monitoring & Checks:**
  - Integrated `django-health-check` into the Django application, providing detailed health endpoints (e.g., `/health/`).
  - Replaced the custom health check view with `django-health-check` URLs.
  - Added `psutil` for system metrics in health checks.
  - Made Gatus configuration dynamic using `docker-gen` (`config.template.yaml`), allowing automatic discovery and monitoring of service instances (e.g., web workers).
  - Externalized Gatus SMTP credentials to environment variables.
  - Added a combined `db-redis-healthcheck` service to `docker-compose_core.yml` that reports to Gatus.
  - Added explicit health checks for `db` and `redis` services in `docker-compose.yml`.

- **Improved Docker & Compose Configuration:**
  - Added `depends_on` conditions in `docker-compose.yml` for `web` and `celery` services to wait for the database.
  - Updated `ALLOWED_HOSTS` in `docker-compose_staging.yml` and `docker-compose_web.yml` to include internal container names for Gatekeeper communication.
  - Set `DEBUG=False` for staging services.
  - Removed `.env.production` from `.gitignore` (standardized to `.env`).
  - Streamlined `scripts/entrypoint.sh` by removing the call to the no-longer-present `/deploy.sh`.

- **Dependency Updates:**
  - Added `django-health-check>=3.18.3` and `psutil>=7.0.0` to `pyproject.toml` and `uv.lock`.
  - Updated `settings.py` to add the `health_check` apps and their configuration, and to use `REDIS_URL` consistently.

- **Streamlined the deployment script used in GitHub Actions (GHA):**
  - Updated the workflow to copy the new server files, a freshly generated `.env` file, and the certificates into a `new/` staging directory on the server before moving them into place.
  - Consolidated stopping and removing the old containers into a single SSH step.
  - Reduced container downtime by stopping the old containers only after the new files have been copied, immediately before they are swapped into place.
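
The staged-release swap that the deploy job performs (copy everything into `new/`, move the current release into `new/backup/`, then promote `new/` to the project directory) can be sketched locally as a POSIX `sh` function. This is a minimal sketch under stated assumptions, not the workflow's code: the helper name `swap_in_new_release` is hypothetical, SSH and Docker are left out, and the old directory is renamed to a sibling path rather than moved through `/tmp` as the job does.

```sh
#!/bin/sh
# Hypothetical helper: promotes a release staged in "$1/new" to "$1".
# Mirrors the backup-then-swap steps of the deploy job, minus SSH and Docker.
swap_in_new_release() {
  project=$1
  if [ ! -d "$project/new" ]; then
    echo "no staged release in $project/new" >&2
    return 1
  fi

  # Back up the current release: move every top-level entry except new/ and
  # backup/ into new/backup/, so the backup travels with the new release.
  mkdir -p "$project/new/backup" || return 1
  find "$project" -mindepth 1 -maxdepth 1 ! -name new ! -name backup \
    -exec mv '{}' "$project/new/backup/" ';'

  # Promote new/ by renaming within the same parent directory, so both
  # moves are cheap renames on one filesystem.
  old="$project.old.$$"
  mv "$project" "$old" || return 1
  mv "$old/new" "$project" || return 1

  # Whatever remains of the old directory (at most a previous backup/) goes.
  rm -rf "$old"
}
```

Keeping the backup inside the promoted directory retains exactly one previous release: on the next run, the top-level `backup/` is skipped by `find`, stays in the old directory, and is deleted along with it.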

badblocks 2025-05-22 19:21:58 -07:00
parent f530790f6c
commit 6aa15d1af9
16 changed files with 487 additions and 162 deletions
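
The job decodes the `ENV_FILE_BASE64` secret locally, writes it on the server with `cat >`, and only then runs `chmod 600`, so for a moment the file exists with whatever permissions the remote umask allows. A minimal sketch of the decode-and-write step with restrictive permissions from creation onward, assuming a hypothetical `write_env_file` helper and a secret produced by base64-encoding a local `.env`:

```sh
#!/bin/sh
# Hypothetical helper: reads base64 text on stdin and writes the decoded bytes
# to "$1" so the file is never readable by group or others.
write_env_file() {
  dest=$1
  # The subshell scopes umask 077 to this write. umask only applies to newly
  # created files, so any existing file is removed first.
  ( umask 077 && rm -f "$dest" && base64 -d > "$dest" )
}

# Producing a single-line secret from a local .env (GNU base64 wraps its
# output, so newlines are stripped):
#   base64 < .env | tr -d '\n'
```

On the server, the same idea amounts to running `umask 077 && cat > ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/new/.env` as the remote command instead of following `cat` with a separate `chmod 600`.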


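Inside the SSH heredoc, the job compares the deployed `docker-compose_core.yml` with the staged `new/docker-compose_core.yml` using `diff -q`, yet both branches run `docker compose -f docker-compose_core.yml down`; the `else` message explains that core is restarted anyway because its volume mounts are tied to the old directory. The comparison therefore only selects a log message. If it is later meant to gate the core restart, it can be isolated as a predicate. A sketch, assuming a hypothetical `core_compose_changed` helper that treats a missing deployed file as changed:

```sh
#!/bin/sh
# Hypothetical predicate: succeeds (exit 0) when the staged core compose file
# differs from the deployed one, or when nothing is deployed yet.
core_compose_changed() {
  current=${1:-docker-compose_core.yml}
  staged=${2:-new/docker-compose_core.yml}
  [ -f "$current" ] || return 0
  # cmp -s exits 0 when identical, 1 when the files differ, and >1 on errors
  # such as a missing staged file; the negation reports all non-zero as changed.
  ! cmp -s "$current" "$staged"
}
```

`cmp -s` prints nothing, which keeps the deploy log clean; like `diff -q`, it is safe under the heredoc's `set -e` when used as an `if` condition.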
@@ -241,36 +241,59 @@ jobs:
echo "🧹 Remove the docker image artifact"
rm "${{ runner.temp }}/${{ steps.meta.outputs.REPO_NAME_ONLY }}-${{ github.ref_name }}_${{ github.sha }}.tar"
echo "🛑 Stop and remove containers before updating compose files"
#ssh deploy "cd ${{ steps.meta.outputs.REPO_PROJECT_PATH}} && docker compose -f docker-compose_core.yml down"
if [ "${PROD}" = true ]; then
ssh deploy "cd ${{ steps.meta.outputs.REPO_PROJECT_PATH}} && docker compose -f docker-compose_web.yml down"
else
ssh deploy "cd ${{ steps.meta.outputs.REPO_PROJECT_PATH}} && docker compose -f docker-compose_staging.yml down"
fi
echo "💾 Copy new files to server"
ssh deploy "mkdir -p ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/new"
scp -pr ./server/* deploy:${{ steps.meta.outputs.REPO_PROJECT_PATH}}/new/
echo "💾 Copy files to server"
ssh deploy "mkdir -p ${{ steps.meta.outputs.REPO_PROJECT_PATH}}"
scp -pr ./server/* deploy:${{ steps.meta.outputs.REPO_PROJECT_PATH}}/
echo "📝 Create new .env file"
printf "%s" "${ENV_FILE_BASE64}" | base64 -d | ssh deploy "cat > ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/new/.env && chmod 600 ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/new/.env"
echo "📝 Create .env file"
printf "%s" "${ENV_FILE_BASE64}" | base64 -d | ssh deploy "cat > ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/.env && chmod 600 ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/.env"
echo "🔑 Set up certs"
ssh deploy "mkdir -p ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/new/certs && chmod 550 ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/new/certs && chown 99:root ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/new/certs"
printf "%s" "$CF_PEM_CERT" | ssh deploy "cat > ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/new/certs/crt.pem && chmod 440 ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/new/certs/crt.pem && chown 99:root ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/new/certs/crt.pem"
printf "%s" "$CF_PEM_CA" | ssh deploy "cat > ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/new/certs/ca.pem && chmod 440 ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/new/certs/ca.pem && chown 99:root ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/new/certs/ca.pem"
echo "🔑 Set up certificates"
ssh deploy "mkdir -p ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/certs && chmod 550 ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/certs && chown 99:root ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/certs"
printf "%s" "$CF_PEM_CERT" | ssh deploy "cat > ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/certs/crt.pem && chmod 440 ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/certs/crt.pem && chown 99:root ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/certs/crt.pem"
printf "%s" "$CF_PEM_CA" | ssh deploy "cat > ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/certs/ca.pem && chmod 440 ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/certs/ca.pem && chown 99:root ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/certs/ca.pem"
ssh -T deploy <<EOF
set -eu -o pipefail
cd ${{ steps.meta.outputs.REPO_PROJECT_PATH}}
if [[ -f "docker-compose_web.yml" || -f "docker-compose_staging.yml" ]]; then
# if we have an existing deployment
echo "🛑 Stop and remove old containers"
docker compose -f docker-compose_web.yml down
if [ "${PROD}" = false ]; then
docker compose -f docker-compose_staging.yml down
fi
if [ -f "docker-compose_core.yml" ] && ! diff -q docker-compose_core.yml new/docker-compose_core.yml; then
echo "⚠️ docker-compose_core.yml has changed, stopping and removing old core containers"
docker compose -f docker-compose_core.yml down
else
echo "⚠️ No changes to docker-compose_core.yml, but reloading due to volume mounts being tied to old directories"
docker compose -f docker-compose_core.yml down
fi
echo "🔄 Backup old files (exclude new and backup directories)"
mkdir -p ${{ steps.meta.outputs.REPO_PROJECT_PATH}}/new/backup
find ${{ steps.meta.outputs.REPO_PROJECT_PATH }} -mindepth 1 -maxdepth 1 -path ${{ steps.meta.outputs.REPO_PROJECT_PATH }}/new -prune -o -path ${{ steps.meta.outputs.REPO_PROJECT_PATH }}/backup -prune -o -exec mv '{}' ${{ steps.meta.outputs.REPO_PROJECT_PATH }}/new/backup/ ';'
fi
EOF
echo "🔄 Move all new files into place"
ssh deploy "cd / && mv ${{ steps.meta.outputs.REPO_PROJECT_PATH}} /tmp/"
ssh deploy "cd / && mv /tmp/${{ steps.meta.outputs.REPO_NAME_ONLY}}/new ${{ steps.meta.outputs.REPO_PROJECT_PATH}}"
echo "🔄 Remove old files/directories if they exist"
ssh deploy "rm -rf /tmp/${{ steps.meta.outputs.REPO_NAME_ONLY}} || true"
echo "🚀 Start the new containers"
if [ "${PROD}" = true ]; then
ssh deploy "cd ${{ steps.meta.outputs.REPO_PROJECT_PATH }} && docker compose -f docker-compose_core.yml -f docker-compose_web.yml up -d --no-build"
else
ssh deploy "cd ${{ steps.meta.outputs.REPO_PROJECT_PATH }} && docker compose -f docker-compose_core.yml -f docker-compose_staging.yml up -d --no-build"
ssh deploy "cd ${{ steps.meta.outputs.REPO_PROJECT_PATH }} && docker compose -f docker-compose_core.yml -f docker-compose_web.yml -f docker-compose_staging.yml up -d --no-build"
fi
# echo "🚀 Start the new containers, zero-downtime"
# if [ "${PROD}" = true ]; then
# ssh deploy <<<END
# ssh deploy <<EOF
# cd ${{ steps.meta.outputs.REPO_PROJECT_PATH}}
# old_container_id=$(docker compose -f docker-compose_web.yml ps -f name=web -q | tail -n1)
# docker compose -f docker-compose_web.yml up -d --no-build --no-recreate
@@ -283,9 +306,9 @@ jobs:
# docker stop $old_container_id
# docker rm $old_container_id
# #docker compose -f docker-compose_core.yml kill -s SIGUSR2 loba
# END
# EOF
# else
# ssh deploy <<<END
# ssh deploy <<EOF
# cd ${{ steps.meta.outputs.REPO_PROJECT_PATH}}
# old_container_id=$(docker compose -f docker-compose_staging.yml ps -f name=web-staging -q | tail -n1)
# docker compose -f docker-compose_staging.yml up -d --no-build --no-recreate
@@ -298,7 +321,7 @@ jobs:
# docker stop $old_container_id
# docker rm $old_container_id
# #docker compose -f docker-compose_core.yml kill -s SIGUSR2 loba
# END
# EOF
# fi
echo "🧹 Prune all unused images"