Incidents
2️⃣ FIRST RESPONSE CHECKLIST (WHEN SOMETHING BREAKS)
Step 1: Is it really down?
- Check UptimeRobot
- Check Cloudflare dashboard (DNS / proxy status)
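A quick request from your own machine can complement the two dashboards. The following is a minimal sketch, not part of the original runbook: the function names `classify_status` and `check_site`, the URL `https://example.org`, and the verdict labels are all hypothetical placeholders, and it assumes `curl` is installed.

```shell
#!/usr/bin/env bash
# Map an HTTP status code to a rough verdict.
# The grouping below is a suggestion, not an official classification.
classify_status() {
  case "$1" in
    2??|3??) echo "up" ;;
    52?)     echo "cloudflare-origin-error" ;;  # Cloudflare-specific 52x codes, e.g. 522
    5??)     echo "server-error" ;;             # e.g. 502 Bad Gateway
    000)     echo "no-response" ;;              # curl got no HTTP response at all
    *)       echo "check-manually" ;;
  esac
}

# Fetch only the status code (10 s timeout) and classify it.
check_site() {
  local code
  code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$1")
  classify_status "$code"
}

# Usage (placeholder URL, replace with the real site):
# check_site "https://example.org"
```

A "no-response" or "cloudflare-origin-error" verdict suggests moving on to Step 2; an "up" verdict suggests the alert may be a false positive or a regional issue.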
Step 2: SSH into the server (via Tailscale)
ssh rahim@server-a
# or
ssh rahim@server-b
Step 3: Quick health check
uptime
free -h
sudo monit status
If SSH fails → go to Section 7 (Emergency).
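The Step 3 memory check can be turned into a one-word verdict if reading `free -h` under pressure feels error-prone. This is a rough sketch under stated assumptions: the helper names `mem_available_mib` and `mem_verdict` are hypothetical, the 200 MiB threshold is an arbitrary placeholder rather than a tuned value, and the server is assumed to be Linux with `/proc/meminfo`.

```shell
#!/usr/bin/env bash
# Print MemAvailable in MiB, read from a meminfo-style file
# (defaults to /proc/meminfo, where values are reported in kB).
mem_available_mib() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Echo "low" when available memory is below the threshold (MiB), else "ok".
# The default of 200 is a placeholder; pick a value that suits the server.
mem_verdict() {
  local avail="$1" threshold="${2:-200}"
  if [ "$avail" -lt "$threshold" ]; then
    echo "low"
  else
    echo "ok"
  fi
}

# Usage on the server:
# uptime
# mem_verdict "$(mem_available_mib)"
# sudo monit status
```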
3️⃣ COMMON PROBLEMS & FIXES
❌ Website down (WordPress)
sudo systemctl restart nginx
sudo systemctl restart php8.3-fpm
❌ Website down (OJS)
sudo systemctl restart apache2
sudo systemctl restart mariadb
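⚠️ 502 / 522 errors
- Check memory:
free -h
- Check whether Monit has already restarted web + PHP:
sudo monit status
- If a service is still down, restart it manually with the commands from the matching "Website down" section above.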
⚠️ High memory / swap usage
ps aux --sort=-%mem | head
If services are responsive → do nothing. Swap usage is expected on small servers.
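To put a number on swap usage before deciding that nothing needs doing, something like the following could be used. It is a minimal sketch: the function name `swap_used_percent` is hypothetical, and it assumes a Linux server exposing `SwapTotal` and `SwapFree` in `/proc/meminfo`.

```shell
#!/usr/bin/env bash
# Print the percentage of swap currently in use (integer, rounded down),
# read from a meminfo-style file (defaults to /proc/meminfo).
# Prints 0 when no swap is configured.
swap_used_percent() {
  awk '/^SwapTotal:/ { total = $2 }
       /^SwapFree:/  { free = $2 }
       END {
         if (total > 0) print int((total - free) * 100 / total)
         else print 0
       }' "${1:-/proc/meminfo}"
}

# Usage on the server:
# swap_used_percent
```

The number alone is not a reason to act; as noted above, what matters is whether the services still respond.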
4️⃣ MONITORING & AUTO-HEALING
Monit
- Automatically restarts:
  - Nginx / Apache
  - PHP-FPM
  - MariaDB
- Check status:
sudo monit status
systemd watchdog
- Reboots server only if systemd/kernel hangs
- No action needed unless troubleshooting a reboot
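Verify watchdog:
systemctl show | grep -i watchdog
(Without a unit name, systemctl show prints the service manager's own properties, including RuntimeWatchdogUSec.)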