Zero downtime sounds like magic. It is mostly discipline: reload instead of restart, health checks before trusting the new version, and enough instances that one going down does not matter. Here is the pipeline I use.
PM2 reload vs restart
This is the single most important thing to know. pm2 restart kills and replaces the process. pm2 reload rolls instances one at a time. With a clustered Node app, reload means at least one worker is always accepting requests.
pm2 reload ecosystem.config.js --update-env
The --update-env flag is easy to miss but important. Without it, changes to the environment variables in your ecosystem file do not take effect until the next hard restart, and you will spend 20 minutes debugging why a new env var is not being read.
The ecosystem config
module.exports = {
  apps: [{
    name: 'api',
    script: './dist/main.js',
    instances: 'max',
    exec_mode: 'cluster',
    max_memory_restart: '500M',
    env: { NODE_ENV: 'production' },
  }],
};
instances: 'max' spawns one worker per CPU core. exec_mode: 'cluster' enables load balancing across workers and — critically — enables pm2 reload.
GitHub Actions workflow
name: Deploy

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.DEPLOY_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          key: ${{ secrets.DEPLOY_KEY }}
          script: |
            cd /var/www/api
            git pull origin main
            pnpm install --frozen-lockfile
            pnpm build
            pm2 reload ecosystem.config.js --update-env
Health checks, not hope
PM2 considers an instance healthy as soon as the Node process is up, which is not the same as serving traffic successfully. I added a post-reload curl loop that polls /health for 30 seconds before the workflow reports success. If the new code has a startup bug, the deploy fails loudly instead of reporting a broken release as a success.
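The loop itself is short. A sketch of the check, appended to the end of the SSH deploy script, assuming the app listens on port 3000 and the /health path from earlier (adjust both to your setup):

```shell
# Poll /health for up to 30 seconds after the reload.
# Exits non-zero (failing the deploy step) if the app never comes up healthy.
for i in $(seq 1 30); do
  if curl -fsS --max-time 2 http://localhost:3000/health > /dev/null; then
    echo "healthy after ${i}s"
    exit 0
  fi
  sleep 1
done
echo "app failed health check after 30s" >&2
exit 1
```

curl's -f flag makes non-2xx responses count as failures, so a worker that is up but returning 500s still fails the check.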
The part I still do not love
SSH-based deploys are fragile. If I were starting today, I would containerize, push images to a registry, and have the target pull and restart. The pm2 reload step stays either way.