The Problem with Traditional Alerts
If you run a homelab, you probably have some kind of uptime monitoring. Maybe it's Uptime Kuma, maybe it's Prometheus with Alertmanager, or maybe you just notice when things break. The typical alert looks something like: "Service X is DOWN" - and then you're on your own to figure out why.
I wanted something smarter. When my Jellyfin server goes down, I don't just want to know it's down - I want some immediate ideas about what might be wrong. Is it a Docker issue? Did the disk fill up? Is the host unreachable?
So I built a system that automatically queries a local LLM when services go down and sends the AI's troubleshooting suggestions right to my Slack channel alongside the alert.
The Architecture
The system has four components:
- Uptime Kuma - Monitors services and detects outages
- n8n - Workflow automation that orchestrates everything
- Ollama - Runs local LLMs for the AI analysis
- Slack - Where the alerts and suggestions land
The flow is simple: Uptime Kuma detects a service is down and fires a webhook to n8n. The n8n workflow extracts the service details, builds a prompt, sends it to Ollama, and then posts both the alert and the AI response to Slack.
Setting Up Uptime Kuma
If you don't have Uptime Kuma running yet, it's one of the easiest self-hosted monitoring tools to set up. Here's a basic Docker Compose:
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    container_name: uptime-kuma
    volumes:
      - ./data:/app/data
    ports:
      - "3001:3001"
    restart: unless-stopped
Add monitors for your services - HTTP checks, TCP ports, Docker containers, whatever you need. The important part for this project is setting up a webhook notification.
In Uptime Kuma, go to Settings → Notifications → Add Notification, and create a webhook with:
- Type: Webhook
- URL: https://your-n8n-instance/webhook/uptime-alert
- Method: POST
The n8n Workflow
n8n is where the magic happens. The workflow has four nodes:
1. Webhook Trigger
Create a webhook node that listens for POST requests at /webhook/uptime-alert. Uptime Kuma will send a payload that includes the monitor name, URL, and error message.
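For reference, the payload looks roughly like this. Field names can vary a bit between Uptime Kuma versions, so treat it as an illustration rather than a contract, and note that n8n nests the POST body under json.body:

```json
{
  "heartbeat": {
    "status": 0,
    "msg": "Connection timeout after 30s"
  },
  "monitor": {
    "name": "Jellyfin",
    "url": "http://192.168.1.64:8096"
  },
  "msg": "[Jellyfin] [Down] Connection timeout after 30s"
}
```

Pin a test monitor and trigger a manual check if you want to inspect the exact shape your version sends before wiring up the next node.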
2. Build the Prompt
A Code node extracts the relevant fields and builds a prompt for the LLM:
// Code node: pull out the fields Uptime Kuma sent, then build the Ollama request
const monitor = $input.first().json.body.monitor;
const msg = $input.first().json.body.msg;

return {
  model: 'llama3.2',
  prompt: `A homelab service is down.
Service: ${monitor.name}
URL: ${monitor.url || monitor.hostname}
Error: ${msg}
Provide 3-5 brief troubleshooting steps. Be concise.`,
  // Pass these through so the Slack node can reference them later
  monitor_name: monitor.name,
  error_message: msg
};
3. Query Ollama
An HTTP Request node sends the prompt to Ollama's API:
- URL: http://localhost:11434/api/generate
- Method: POST
- Body: The model and prompt from the previous node
Set a reasonable timeout (60 seconds) since LLM inference takes a moment, and make sure stream is set to false so you get the complete response in one shot.
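The request body the node sends ends up looking something like this (the prompt string is abbreviated here; "stream": false is what makes Ollama return one complete JSON object with a response field instead of a stream of chunks):

```json
{
  "model": "llama3.2",
  "prompt": "A homelab service is down.\nService: Jellyfin\n...",
  "stream": false
}
```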
4. Send to Slack
Finally, a Slack node posts a formatted message with both the alert and the AI analysis. I use Block Kit for nice formatting:
{
  "blocks": [
    {
      "type": "header",
      "text": {
        "type": "plain_text",
        "text": "🤖 AI Troubleshooting: {{ $json.monitor_name }}"
      }
    },
    {
      "type": "section",
      "text": {
        "type": "mrkdwn",
        "text": "*Error:* {{ $json.error_message }}"
      }
    },
    {
      "type": "section",
      "text": {
        "type": "mrkdwn",
        "text": "*AI Analysis:*\n{{ $json.response }}"
      }
    }
  ]
}
What It Looks Like in Practice
When my Jellyfin server went down last week, here's what showed up in Slack:
🤖 AI Troubleshooting: Jellyfin
Error: Connection timeout after 30s
AI Analysis:
- Check if the Docker container is running: docker ps | grep jellyfin
- Verify the host is reachable: ping 192.168.1.64
- Check Docker logs for errors: docker logs jellyfin
- Ensure port 8096 isn't blocked by firewall
- Check disk space on media server: df -h
It turned out to be a Docker issue - the container had crashed due to a corrupted transcoding cache. The AI's suggestion to check Docker logs pointed me right to the problem.
Tips and Gotchas
Model selection matters. I use llama3.2 because it's fast enough to respond within a few seconds and smart enough to give useful suggestions. Larger models give better responses but you'll wait longer for alerts.
Keep prompts focused. The more context you give the LLM about your specific setup, the better the suggestions. You could enhance the prompt with information about your infrastructure pulled from a knowledge base.
Handle Ollama failures gracefully. If Ollama is down (ironic, I know), make sure your workflow still sends the basic alert. I have a fallback that posts "AI analysis unavailable" rather than failing silently.
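In practice this can be a small guard in the Code node that builds the Slack message. A minimal sketch, with names of my own choosing (in the real node you'd read the HTTP Request node's output from $input):

```javascript
// Fall back to a placeholder when the Ollama call failed or returned nothing.
// `ollamaOutput` stands in for whatever the HTTP Request node produced; on an
// error it may be undefined or missing the `response` field entirely.
function analysisText(ollamaOutput) {
  if (
    ollamaOutput &&
    typeof ollamaOutput.response === 'string' &&
    ollamaOutput.response.trim() !== ''
  ) {
    return ollamaOutput.response;
  }
  return '_AI analysis unavailable_';
}
```

Wiring the Ollama branch with "continue on fail" enabled and routing both paths through a guard like this keeps the basic alert flowing no matter what the LLM does.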
Don't alert on recovery. Configure Uptime Kuma to only send webhooks on DOWN events, not on recovery. Otherwise you'll get AI analysis of "the service came back up" which isn't very helpful.
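If you'd rather keep both event types flowing and filter inside n8n instead, an IF or Code node can short-circuit recovery events early. A sketch, assuming the payload carries Uptime Kuma's heartbeat.status field (0 is DOWN, 1 is UP):

```javascript
// Only continue the workflow for DOWN events; drop UP/recovery webhooks.
// Assumes Uptime Kuma's convention of heartbeat.status === 0 meaning DOWN.
function isDownEvent(body) {
  return Boolean(body && body.heartbeat && body.heartbeat.status === 0);
}
```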
What's Next
This basic setup works well, but there's room to expand. Some ideas I'm considering:
- Query a RAG system with my homelab documentation for more specific suggestions
- Add an "attempt auto-repair" option for common issues
- Track which suggestions actually helped to improve prompts over time
- Integrate with Home Assistant to check related sensor data
The point isn't to replace actual debugging skills - it's to get a head start on troubleshooting when you're half asleep and just want to know where to look first.