The Problem with Traditional Alerts
If you run a homelab, you probably have some kind of uptime monitoring. Maybe it's Uptime Kuma, maybe it's Prometheus with Alertmanager, or maybe you just notice when things break. The typical alert looks something like: "Service X is DOWN" - and then you're on your own to figure out why.
I wanted something smarter. When my Jellyfin server goes down, I don't just want to know it's down - I want some immediate ideas about what might be wrong. Is it a Docker issue? Did the disk fill up? Is the host unreachable?
So I built a system that automatically queries a local LLM when services go down and sends the AI's troubleshooting suggestions right to my Slack channel alongside the alert.
The Architecture
The system has four components:
- Uptime Kuma - Monitors services and detects outages
- n8n - Workflow automation that orchestrates everything
- Ollama - Runs local LLMs for the AI analysis
- Slack - Where the alerts and suggestions land
The flow is simple: Uptime Kuma detects a service is down and fires a webhook to n8n. The n8n workflow extracts the service details, builds a prompt, sends it to Ollama, and then posts both the alert and the AI response to Slack.
Setting Up Uptime Kuma
If you don't have Uptime Kuma running yet, it's one of the easiest self-hosted monitoring tools to set up. Here's a basic Docker Compose:
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    container_name: uptime-kuma
    volumes:
      - ./data:/app/data
    ports:
      - "3001:3001"
    restart: unless-stopped
Add monitors for your services - HTTP checks, TCP ports, Docker containers, whatever you need. The important part for this project is setting up a webhook notification.
In Uptime Kuma, go to Settings → Notifications → Add Notification, and create a webhook with:
- Type: Webhook
- URL: https://your-n8n-instance/webhook/uptime-alert
- Method: POST
The n8n Workflow
n8n is where the magic happens. The workflow has four nodes:
1. Webhook Trigger
Create a webhook node that listens for POST requests at /webhook/uptime-alert. Uptime Kuma will send a payload that includes the monitor name, URL, and error message.
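For reference, the payload looks roughly like this. Field names can vary a bit between Uptime Kuma versions, so treat it as an illustration rather than a contract, and note that n8n nests the POST body under json.body:

```json
{
  "heartbeat": {
    "status": 0,
    "msg": "Connection timeout after 30s"
  },
  "monitor": {
    "name": "Jellyfin",
    "url": "http://192.168.1.64:8096"
  },
  "msg": "[Jellyfin] [Down] Connection timeout after 30s"
}
```

Pin a test monitor and trigger a manual check if you want to inspect the exact shape your version sends before wiring up the next node.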
2. Build the Prompt
A Code node extracts the relevant fields and builds a prompt for the LLM:
// Code node: pull out the fields Uptime Kuma sent, then build the Ollama request
const monitor = $input.first().json.body.monitor;
const msg = $input.first().json.body.msg;

return {
  model: 'llama3.2',
  prompt: `A homelab service is down.
Service: ${monitor.name}
URL: ${monitor.url || monitor.hostname}
Error: ${msg}
Provide 3-5 brief troubleshooting steps. Be concise.`,
  // Pass these through so the Slack node can reference them later
  monitor_name: monitor.name,
  error_message: msg
};
3. Query Ollama
An HTTP Request node sends the prompt to Ollama's API:
- URL: http://localhost:11434/api/generate
- Method: POST
- Body: The model and prompt from the previous node
Set a reasonable timeout (60 seconds) since LLM inference takes a moment, and make sure stream is set to false so you get the complete response in one shot.
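The request body the node sends ends up looking something like this (the prompt string is abbreviated here; "stream": false is what makes Ollama return one complete JSON object with a response field instead of a stream of chunks):

```json
{
  "model": "llama3.2",
  "prompt": "A homelab service is down.\nService: Jellyfin\n...",
  "stream": false
}
```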
4. Send to Slack
Finally, a Slack node posts a formatted message with both the alert and the AI analysis. I use Block Kit for nice formatting:
{
  "blocks": [
    {
      "type": "header",
      "text": {
        "type": "plain_text",
        "text": "🤖 AI Troubleshooting: {{ $json.monitor_name }}"
      }
    },
    {
      "type": "section",
      "text": {
        "type": "mrkdwn",
        "text": "*Error:* {{ $json.error_message }}"
      }
    },
    {
      "type": "section",
      "text": {
        "type": "mrkdwn",
        "text": "*AI Analysis:*\n{{ $json.response }}"
      }
    }
  ]
}
What It Looks Like in Practice
When my Jellyfin server went down last week, here's what showed up in Slack:
🤖 AI Troubleshooting: Jellyfin
Error: Connection timeout after 30s
AI Analysis:
- Check if the Docker container is running: docker ps | grep jellyfin
- Verify the host is reachable: ping 192.168.1.64
- Check Docker logs for errors: docker logs jellyfin
- Ensure port 8096 isn't blocked by firewall
- Check disk space on media server: df -h
It turned out to be a Docker issue - the container had crashed due to a corrupted transcoding cache. The AI's suggestion to check Docker logs pointed me right to the problem.
Tips and Gotchas
Model selection matters. I use llama3.2 because it's fast enough to respond within a few seconds and smart enough to give useful suggestions. Larger models give better responses but you'll wait longer for alerts.
Keep prompts focused. The more context you give the LLM about your specific setup, the better the suggestions. You could enhance the prompt with information about your infrastructure pulled from a knowledge base.
Handle Ollama failures gracefully. If Ollama is down (ironic, I know), make sure your workflow still sends the basic alert. I have a fallback that posts "AI analysis unavailable" rather than failing silently.
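In practice this can be a small guard in the Code node that builds the Slack message. A minimal sketch, with names of my own choosing (in the real node you'd read the HTTP Request node's output from $input):

```javascript
// Fall back to a placeholder when the Ollama call failed or returned nothing.
// `ollamaOutput` stands in for whatever the HTTP Request node produced; on an
// error it may be undefined or missing the `response` field entirely.
function analysisText(ollamaOutput) {
  if (
    ollamaOutput &&
    typeof ollamaOutput.response === 'string' &&
    ollamaOutput.response.trim() !== ''
  ) {
    return ollamaOutput.response;
  }
  return '_AI analysis unavailable_';
}
```

Wiring the Ollama branch with "continue on fail" enabled and routing both paths through a guard like this keeps the basic alert flowing no matter what the LLM does.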
Don't alert on recovery. Configure Uptime Kuma to only send webhooks on DOWN events, not on recovery. Otherwise you'll get AI analysis of "the service came back up" which isn't very helpful.
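If you'd rather keep both event types flowing and filter inside n8n instead, an IF or Code node can short-circuit recovery events early. A sketch, assuming the payload carries Uptime Kuma's heartbeat.status field (0 is DOWN, 1 is UP):

```javascript
// Only continue the workflow for DOWN events; drop UP/recovery webhooks.
// Assumes Uptime Kuma's convention of heartbeat.status === 0 meaning DOWN.
function isDownEvent(body) {
  return Boolean(body && body.heartbeat && body.heartbeat.status === 0);
}
```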
What's Next
This basic setup works well, but there's room to expand. Some ideas I'm considering:
- Query a RAG system with my homelab documentation for more specific suggestions
- Add an "attempt auto-repair" option for common issues
- Track which suggestions actually helped to improve prompts over time
- Integrate with Home Assistant to check related sensor data
The point isn't to replace actual debugging skills - it's to get a head start on troubleshooting when you're half asleep and just want to know where to look first.