
📊 Lesson 11: RabbitMQ Monitoring

⏱️ Reading time: 20 minutes | 📚 Difficulty: Advanced

🎯 After this lesson, you will:

1. Key Metrics

Metric | Description | Alert when
Queue depth | Number of messages in the queue | > 10,000 messages
Consumer count | Number of active consumers | = 0 (no consumers!)
Publish rate | Messages per second published | Sudden spike, or = 0
Deliver rate | Messages per second delivered to consumers | Much lower than the publish rate
Unacked messages | Messages delivered but not yet ACKed | > 1,000 (consumer stuck)
Memory usage | RAM used by RabbitMQ | > 80% of the memory high watermark
Disk free | Free disk space remaining | < 2 GB
Connection count | Number of client connections | > 500, or growing quickly
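The static thresholds in the table can be expressed as a small rule list. The following is only a sketch: the snapshot field names (`queueDepth`, `memoryLimitBytes`, and so on) are hypothetical, and the two rate-based rows are left out because detecting a "spike" needs history rather than a single snapshot.

```javascript
// Rules mirror the "Alert when" column; tune the numbers for your workload.
// Snapshot field names are illustrative assumptions, not a RabbitMQ API.
const RULES = [
    { name: 'queue depth high', breached: s => s.queueDepth > 10000 },
    { name: 'no consumers', breached: s => s.consumerCount === 0 },
    { name: 'unacked messages high', breached: s => s.unacked > 1000 },
    // Memory is compared against 80% of the configured high watermark
    { name: 'memory near watermark', breached: s => s.memoryUsedBytes / s.memoryLimitBytes > 0.8 },
    // "2 GB" is interpreted here as 2 GiB
    { name: 'disk space low', breached: s => s.diskFreeBytes < 2 * 1024 ** 3 },
    { name: 'too many connections', breached: s => s.connectionCount > 500 }
];

// Returns the names of all rules breached by one metrics snapshot.
function evaluateSnapshot(snapshot, rules = RULES) {
    return rules.filter(rule => rule.breached(snapshot)).map(rule => rule.name);
}
```

Keeping the thresholds in one list makes it easy to review them next to the table and to change a limit without touching the evaluation logic.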

2. Prometheus + Grafana Setup

# docker-compose-monitoring.yml
version: '3.8'
services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"
      - "15672:15672"
      - "15692:15692"  # Prometheus metrics
    environment:
      RABBITMQ_DEFAULT_USER: admin
      RABBITMQ_DEFAULT_PASS: secret123
    command: >
      bash -c "rabbitmq-plugins enable rabbitmq_prometheus
      && rabbitmq-server"

  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'rabbitmq'
    static_configs:
      - targets: ['rabbitmq:15692']
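    # /metrics aggregates values across all queues; the per-object endpoint
    # adds the queue label (more expensive when there are many queues)
    metrics_path: /metrics/per-object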
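Alert rules that name a specific queue only work when the scraped output carries a `queue` label. A rough sketch for checking this by hand follows; `parseMetric` and `hasPerQueueSeries` are hypothetical helpers, and the parser is deliberately simplified (it assumes label values contain no escaped quotes).

```javascript
// Extract all samples of one metric from Prometheus text exposition output.
function parseMetric(text, metricName) {
    const samples = [];
    for (const line of text.split('\n')) {
        if (line.startsWith('#')) continue; // skip HELP/TYPE comments
        const match = line.match(/^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/);
        if (!match || match[1] !== metricName) continue;
        const labels = {};
        // Simplification: label values are assumed to contain no escaped quotes
        for (const [, key, value] of (match[2] || '').matchAll(/([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g)) {
            labels[key] = value;
        }
        samples.push({ labels, value: Number(match[3]) });
    }
    return samples;
}

// True when at least one rabbitmq_queue_messages sample is labelled per queue.
function hasPerQueueSeries(text) {
    return parseMetric(text, 'rabbitmq_queue_messages').some(s => 'queue' in s.labels);
}

// Usage (Node 18+, assuming the compose setup above is running):
// const text = await (await fetch('http://localhost:15692/metrics/per-object')).text();
// console.log(hasPerQueueSeries(text));
```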

3. Management API Health Check

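// health-check.js
const http = require('http');

// Resolves when the management API reports no active alarms; rejects otherwise.
function checkRabbitMQ() {
    return new Promise((resolve, reject) => {
        const options = {
            hostname: 'localhost',
            port: 15672,
            // /api/healthchecks/node is deprecated; use the /api/health/checks/* endpoints
            path: '/api/health/checks/alarms',
            auth: 'admin:secret123'
        };

        http.get(options, (res) => {
            let data = '';
            res.on('data', chunk => data += chunk);
            res.on('end', () => {
                try {
                    const health = JSON.parse(data);
                    if (health.status === 'ok') {
                        resolve({ healthy: true, details: health });
                    } else {
                        reject(new Error(`Unhealthy: ${health.reason}`));
                    }
                } catch (err) {
                    // Non-JSON body, for example a proxy error page
                    reject(new Error(`Unexpected response (HTTP ${res.statusCode})`));
                }
            });
        }).on('error', reject);
    });
}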

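// Check queue depth (uses the global fetch and btoa, available in Node 18+)
async function checkQueueDepth(queueName, maxDepth = 10000) {
    // %2f is the URL-encoded default vhost "/"; the queue name is encoded as well
    const res = await fetch(
        `http://localhost:15672/api/queues/%2f/${encodeURIComponent(queueName)}`,
        { headers: { Authorization: 'Basic ' + btoa('admin:secret123') } }
    );
    if (!res.ok) {
        // 404 means the queue does not exist, 401 means bad credentials
        throw new Error(`Management API returned HTTP ${res.status} for queue "${queueName}"`);
    }
    const queue = await res.json();

    return {
        name: queue.name,
        messages: queue.messages,
        consumers: queue.consumers,
        alert: queue.messages > maxDepth
    };
}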
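The same queue object also carries rate statistics, which makes it possible to compare publish and deliver rates as suggested in the metrics table. A minimal sketch, assuming the `message_stats` fields reported by the management plugin; `summarizeQueueRates` and its `lagFactor` parameter are hypothetical names chosen for illustration.

```javascript
// Summarize throughput for one queue object from GET /api/queues/{vhost}/{name}.
// lagFactor is an illustrative knob: how many times faster publishing must be
// than delivery before the queue is flagged as falling behind.
function summarizeQueueRates(queue, lagFactor = 2) {
    // message_stats can be missing for queues that have seen no traffic
    const stats = queue.message_stats || {};
    const publishRate = stats.publish_details?.rate ?? 0;
    const deliverRate = stats.deliver_get_details?.rate ?? 0;

    return {
        name: queue.name,
        publishRate,
        deliverRate,
        unacked: queue.messages_unacknowledged ?? 0,
        fallingBehind: publishRate > deliverRate * lagFactor
    };
}
```

A single reading can be noisy, so it is usually safer to flag a queue only after several consecutive polls report `fallingBehind: true`.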

4. Alerting Rules

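# Prometheus alerting rules
# Note: the queue label in the summaries is only present when per-object
# metrics are scraped (for example via /metrics/per-object).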
groups:
  - name: rabbitmq
    rules:
      - alert: RabbitMQQueueDepthHigh
        expr: rabbitmq_queue_messages > 10000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Queue {{ $labels.queue }} has {{ $value }} messages"

      - alert: RabbitMQNoConsumers
        expr: rabbitmq_queue_consumers == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Queue {{ $labels.queue }} has no consumers!"

      - alert: RabbitMQHighMemory
        expr: rabbitmq_process_resident_memory_bytes > 1e9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "RabbitMQ using > 1GB memory"

5. Common Troubleshooting

🔍 Queue depth keeps growing:
• Consumers are too slow → Add workers or raise the prefetch count
• Consumer crashed → Check the logs, restart the service
• Poison message → Check the DLQ, fix the handling logic
🔍 Memory alarm triggered:
• Too many messages in the queue → Add consumers
• Lazy queues: set x-queue-mode: lazy so messages are kept on disk (classic queues in RabbitMQ 3.12+ already behave this way and ignore the argument)
• Connection leak → Verify that the application closes its connections properly
🔍 High unacked message count:
• The consumer receives messages but never sends ACK/NACK
• The consumer takes too long to process → Increase the timeout or optimize the logic
• Bug in the consumer → Review its error handling
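Several of the causes above come down to a consumer that never acknowledges a delivery. One way to rule that out is to wrap the processing function so every message ends in an ACK or a NACK. This is a sketch only: it assumes an amqplib-style channel object exposing `ack(msg)` and `nack(msg, allUpTo, requeue)`, and `makeAckingHandler`, `handleOrder`, and the `orders` queue are hypothetical names.

```javascript
// Wrap a processing function so each delivery is acknowledged or rejected.
// `channel` is assumed to follow the amqplib channel interface.
function makeAckingHandler(channel, processMessage) {
    return async function onMessage(msg) {
        if (msg === null) return; // amqplib passes null when the consumer is cancelled
        try {
            await processMessage(msg);
        } catch (err) {
            // requeue=false avoids a redelivery loop for poison messages; the broker
            // dead-letters the message if a dead-letter exchange is configured
            channel.nack(msg, false, false);
            return;
        }
        channel.ack(msg);
    };
}

// Usage with amqplib (not executed here):
// await channel.prefetch(10); // cap the number of unacked deliveries per consumer
// await channel.consume('orders', makeAckingHandler(channel, handleOrder));
```

Combining this wrapper with a bounded prefetch keeps the unacked count from growing without limit when processing slows down.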

📝 Summary