Incidents


2️⃣ FIRST RESPONSE CHECKLIST (WHEN SOMETHING BREAKS)

Step 1: Is it really down?

  • Check UptimeRobot
  • Check Cloudflare dashboard (DNS / proxy status)
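
The dashboard checks in Step 1 can be backed up with a direct HTTP probe from your own machine. This is a minimal sketch, assuming curl is installed; `classify_status` and `probe` are hypothetical helper names and https://example.org is a placeholder for the real site URL.

```shell
# Hypothetical helper: map an HTTP status code to a rough verdict.
classify_status() {
  case "$1" in
    2??|3??) echo "up" ;;
    502|522) echo "origin-problem" ;;   # see "502 / 522 errors" in Section 3
    000)     echo "unreachable" ;;      # curl reports 000 when it gets no HTTP response
    *)       echo "check-manually" ;;
  esac
}

# Probe a URL (10 s timeout) and print "<code> <verdict>".
probe() {
  code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$1")
  echo "$code $(classify_status "$code")"
}

# Example (placeholder URL):
#   probe https://example.org
```

A verdict of "origin-problem" points at the server itself rather than DNS, so continue with Step 2.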

Step 2: SSH into the server (via Tailscale)

ssh rahim@server-a
# or
ssh rahim@server-b

Step 3: Quick health check

uptime
free -h
sudo monit status

If SSH fails → go to Section 7 (Emergency).
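
The memory part of the Step 3 health check can be reduced to a single number. The sketch below assumes the usual procps `free -m` layout, where "available" is the seventh field of the `Mem:` line; the function names and the 100 MiB default threshold are illustrative assumptions, not tuned values.

```shell
# Print available memory in MiB, reading `free -m` output from stdin.
mem_available_mb() {
  awk '/^Mem:/ { print $7 }'
}

# Succeed (exit 0) when the given MiB value is below a threshold.
# $1 = available MiB, $2 = threshold in MiB (assumed default: 100).
is_memory_low() {
  [ "$1" -lt "${2:-100}" ]
}

# Example:
#   avail=$(free -m | mem_available_mb)
#   is_memory_low "$avail" && echo "LOW: ${avail} MiB available"
```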


3️⃣ COMMON PROBLEMS & FIXES

❌ Website down (WordPress)

sudo systemctl restart nginx
sudo systemctl restart php8.3-fpm

❌ Website down (OJS)

sudo systemctl restart apache2
sudo systemctl restart mariadb
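
To avoid restarting the wrong stack under pressure, the two sets of restart commands above could be wrapped in one helper. This is only a sketch; `restart_cmds` is a hypothetical name, and it prints the commands instead of running them so they can be reviewed first.

```shell
# Print the restart commands this runbook lists for a given stack.
restart_cmds() {
  case "$1" in
    wordpress)
      printf '%s\n' "sudo systemctl restart nginx" "sudo systemctl restart php8.3-fpm" ;;
    ojs)
      printf '%s\n' "sudo systemctl restart apache2" "sudo systemctl restart mariadb" ;;
    *)
      echo "usage: restart_cmds wordpress|ojs" >&2
      return 1 ;;
  esac
}

# Example:
#   restart_cmds ojs         # review the commands
#   restart_cmds ojs | sh    # run them
```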

⚠️ 502 / 522 errors

  • Check memory:
free -h
  • Check whether Monit has already restarted web + PHP:
sudo monit status
  • If a service is still down, restart it manually with the WordPress or OJS commands above.


⚠️ High memory / swap usage

ps aux --sort=-%mem | head

If services are responsive → do nothing. Swap usage is expected on small servers.
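
To put a number on swap usage before deciding whether to act, the `Swap:` line of `free -m` can be turned into a percentage. A minimal sketch, assuming the usual procps layout (total in field 2, used in field 3); `swap_used_pct` is a hypothetical helper name.

```shell
# Print swap usage as a whole-number percentage, reading `free -m` from stdin.
# Prints 0 when no swap is configured (total = 0).
swap_used_pct() {
  awk '/^Swap:/ { if ($2 > 0) printf "%d\n", ($3 * 100) / $2; else print 0 }'
}

# Example:
#   free -m | swap_used_pct
```

The percentage alone is not a reason to intervene; as noted above, what matters is whether the services still respond.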


4️⃣ MONITORING & AUTO-HEALING

Monit

  • Automatically restarts:
      • Nginx / Apache
      • PHP-FPM
      • MariaDB
  • Check status:
sudo monit status

systemd watchdog

  • Reboots server only if systemd/kernel hangs
  • No action needed unless troubleshooting a reboot

Verify watchdog:

# no unit name: shows systemd manager properties (e.g. RuntimeWatchdogUSec)
systemctl show | grep Watchdog