Health Check

Overview

The health check endpoint reports the operational status of the WuKongIM server and cluster, so you can verify that the system is running normally.

curl -X GET "http://localhost:5001/health"

{
  "status": "ok"
}

Response Fields

Field	Type	Required	Description
status	string	Yes	Health status: ok indicates normal, error indicates abnormal
message	string	No	Error message (only appears when status is error)
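
As a rough sketch of how a client might read these fields, the function below parses a response body and returns the health state together with the optional message. The name parseHealthBody and the shape of the returned object are assumptions made for this example; they are not part of the WuKongIM API.

```javascript
// Hypothetical helper: parses the JSON body returned by GET /health.
// Only the documented `status` and `message` fields are read.
function parseHealthBody(bodyText) {
    let body;
    try {
        body = JSON.parse(bodyText);
    } catch (error) {
        // A body that is not valid JSON is treated as unhealthy.
        return { healthy: false, message: 'Response body is not valid JSON' };
    }

    if (body === null || typeof body !== 'object') {
        return { healthy: false, message: 'Unexpected response body' };
    }

    if (body.status === 'ok') {
        return { healthy: true, message: null };
    }

    // `message` is only expected when status is "error".
    const message = typeof body.message === 'string'
        ? body.message
        : 'No error message provided';
    return { healthy: false, message };
}

console.log(parseHealthBody('{"status":"ok"}'));
```

Treating malformed bodies as unhealthy is a design choice of this sketch; a stricter client might report them separately.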

Status Codes

Status Code	Description
200	Server health status is normal
500	Server or cluster status is abnormal
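
The status codes can also be checked without reading the body. The sketch below maps a code to a state and wraps it in a small probe function. The names classifyHealthStatusCode and probeHealth, the "unknown" state, and the 5-second timeout are assumptions for illustration only.

```javascript
// Hypothetical helper: maps the HTTP status code of GET /health to a state.
// 200 and 500 come from the table above; any other code is reported as
// "unknown" because the documentation does not define it.
function classifyHealthStatusCode(statusCode) {
    switch (statusCode) {
        case 200:
            return 'healthy';
        case 500:
            return 'unhealthy';
        default:
            return 'unknown';
    }
}

// Probes one server. Network errors and timeouts are reported as "unknown".
// Requires a runtime with global fetch and AbortSignal.timeout (e.g. Node 18+).
async function probeHealth(baseUrl) {
    try {
        const response = await fetch(`${baseUrl}/health`, {
            signal: AbortSignal.timeout(5000)
        });
        return classifyHealthStatusCode(response.status);
    } catch (error) {
        return 'unknown';
    }
}

// Usage (not executed here):
// probeHealth('http://localhost:5001').then((state) => console.log(state));
```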

Use Cases

Load Balancer Health Checks

Nginx Configuration:

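upstream wukongim_backend {
    zone wukongim_backend 64k;  # shared memory zone, required by health_check
    server 192.168.1.10:5001;
    server 192.168.1.11:5001;
    server 192.168.1.12:5001;
}

server {
    listen 80;  # example port

    location /health {
        proxy_pass http://wukongim_backend/health;
        proxy_connect_timeout 5s;
        proxy_read_timeout 5s;
    }

    location / {
        proxy_pass http://wukongim_backend;
        # Active health checks: the health_check directive is available in NGINX Plus only.
        # Open source nginx supports passive checks via max_fails/fail_timeout on each server line.
        health_check uri=/health interval=30s fails=3 passes=2;
    }
}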

HAProxy Configuration:

backend wukongim_servers
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    server wk1 192.168.1.10:5001 check inter 30s
    server wk2 192.168.1.11:5001 check inter 30s
    server wk3 192.168.1.12:5001 check inter 30s
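
Load balancers typically change a server's state only after several consecutive results, as with fails=3 passes=2 in the nginx example. The sketch below illustrates that thresholding logic in isolation. HealthTracker is a hypothetical name and is not part of nginx, HAProxy, or WuKongIM.

```javascript
// Illustrative sketch of consecutive-result thresholds such as `fails=3 passes=2`.
class HealthTracker {
    constructor({ fails = 3, passes = 2 } = {}) {
        this.fails = fails;     // consecutive failures before marking unhealthy
        this.passes = passes;   // consecutive successes before marking healthy again
        this.healthy = true;    // servers start out in rotation
        this.streak = 0;        // consecutive results that contradict the current state
    }

    // Records one check result and returns the (possibly updated) state.
    record(checkPassed) {
        if (checkPassed === this.healthy) {
            this.streak = 0;    // result agrees with the current state
            return this.healthy;
        }
        this.streak += 1;
        const threshold = this.healthy ? this.fails : this.passes;
        if (this.streak >= threshold) {
            this.healthy = !this.healthy;
            this.streak = 0;
        }
        return this.healthy;
    }
}

const tracker = new HealthTracker({ fails: 3, passes: 2 });
const states = [false, false, false, true, true].map((result) => tracker.record(result));
console.log(states); // [true, true, false, false, true]
```

The server leaves rotation only on the third consecutive failure and returns on the second consecutive success, which keeps a single slow response from removing a node.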

Container Orchestration

Docker Compose:

version: '3.7'
services:
  wukongim:
    image: registry.cn-shanghai.aliyuncs.com/wukongim/wukongim:v2
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:5001/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    ports:
      - "5001:5001"

Kubernetes Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wukongim
spec:
  replicas: 3
  selector:
    matchLabels:
      app: wukongim
  template:
    metadata:
      labels:
        app: wukongim
    spec:
      containers:
      - name: wukongim
        image: registry.cn-shanghai.aliyuncs.com/wukongim/wukongim:v2
        ports:
        - containerPort: 5001
        livenessProbe:
          httpGet:
            path: /health
            port: 5001
          initialDelaySeconds: 30
          periodSeconds: 30
          timeoutSeconds: 10
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /health
            port: 5001
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
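
The probe settings above determine roughly how long a failing container can go unnoticed. The arithmetic below is a back-of-the-envelope estimate under the assumption that up to one full period passes before each failing probe and that the last failing probe takes the full timeout; it is not an exact Kubernetes guarantee, and approxDetectionSeconds is a hypothetical helper.

```javascript
// Rough upper bound, in seconds, on the time between a failure and the
// moment the probe's failure threshold is reached.
function approxDetectionSeconds({ periodSeconds, timeoutSeconds, failureThreshold }) {
    return failureThreshold * periodSeconds + timeoutSeconds;
}

// Values copied from the Deployment above.
const liveness = { periodSeconds: 30, timeoutSeconds: 10, failureThreshold: 3 };
const readiness = { periodSeconds: 10, timeoutSeconds: 5, failureThreshold: 3 };

console.log(approxDetectionSeconds(liveness));  // 100
console.log(approxDetectionSeconds(readiness)); // 35
```

With these values the readiness probe would stop traffic to a failing pod well before the liveness probe restarts it.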

Monitoring and Alerting

Prometheus Monitoring:

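# prometheus.yml (/health returns JSON, not Prometheus metrics, so it is probed through blackbox_exporter)
scrape_configs:
  - job_name: 'wukongim-health'
    metrics_path: '/probe'
    params: { module: [http_2xx] }
    static_configs:
      - targets: ['http://192.168.1.10:5001/health', 'http://192.168.1.11:5001/health', 'http://192.168.1.12:5001/health']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 'localhost:9115'  # assumed blackbox_exporter address
    scrape_interval: 30s
    scrape_timeout: 10s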

Custom Health Check Script:

#!/bin/bash

SERVERS=("192.168.1.10:5001" "192.168.1.11:5001" "192.168.1.12:5001")
WEBHOOK_URL="https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"

for server in "${SERVERS[@]}"; do
    response=$(curl -s -o /dev/null -w "%{http_code}" "http://$server/health" --max-time 10)
    
    if [ "$response" != "200" ]; then
        # Send alert
        curl -X POST -H 'Content-type: application/json' \
            --data "{\"text\":\"🚨 WuKongIM Health Check Failed: $server returned $response\"}" \
            "$WEBHOOK_URL"
    fi
done
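
The script above builds the alert JSON by string interpolation, which can break if the text ever contains quotes. As a sketch of an alternative, the helper below builds the same payload with JSON.stringify. The names buildAlertPayload and sendAlert are hypothetical, and the webhook URL is a placeholder supplied by the caller.

```javascript
// Hypothetical helper: builds the webhook payload used by the script above.
// JSON.stringify escapes quotes and backslashes in the text.
function buildAlertPayload(server, httpCode) {
    return JSON.stringify({
        text: `🚨 WuKongIM Health Check Failed: ${server} returned ${httpCode}`
    });
}

// Sends the alert and resolves to true when the webhook accepted it.
// Requires a runtime with global fetch (e.g. Node 18+).
async function sendAlert(webhookUrl, server, httpCode) {
    const response = await fetch(webhookUrl, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: buildAlertPayload(server, httpCode)
    });
    return response.ok;
}

console.log(buildAlertPayload('192.168.1.10:5001', '500'));
```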

Application Integration

Service Discovery:

class WuKongIMServiceDiscovery {
    constructor(servers) {
        this.servers = servers;
        this.healthyServers = [];
        this.checkInterval = 30000; // 30 seconds
        this.startHealthChecks();
    }
    
    async checkServerHealth(server) {
        try {
            const response = await fetch(`http://${server}/health`, {
                // fetch has no `timeout` option; abort the request after 5 seconds instead
                signal: AbortSignal.timeout(5000)
            });
            return response.status === 200;
        } catch (error) {
            console.error(`Health check failed for ${server}:`, error);
            return false;
        }
    }
    
    async updateHealthyServers() {
        const healthChecks = this.servers.map(async (server) => {
            const isHealthy = await this.checkServerHealth(server);
            return { server, isHealthy };
        });
        
        const results = await Promise.all(healthChecks);
        this.healthyServers = results
            .filter(result => result.isHealthy)
            .map(result => result.server);
            
        console.log('Healthy servers:', this.healthyServers);
    }
    
    startHealthChecks() {
        this.updateHealthyServers();
        setInterval(() => {
            this.updateHealthyServers();
        }, this.checkInterval);
    }
    
    getHealthyServer() {
        if (this.healthyServers.length === 0) {
            throw new Error('No healthy WuKongIM servers available');
        }
        
        // Random selection among the healthy servers
        const server = this.healthyServers[Math.floor(Math.random() * this.healthyServers.length)];
        return server;
    }
}

// Usage
const discovery = new WuKongIMServiceDiscovery([
    '192.168.1.10:5001',
    '192.168.1.11:5001', 
    '192.168.1.12:5001'
]);
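
The class above picks a server with Math.random. If strict rotation is preferred, a rotating index is one way to get it. The sketch below is a minimal illustration; RoundRobinPicker is a hypothetical name and is not part of any WuKongIM SDK.

```javascript
// Hypothetical helper: rotates through a list of healthy servers in order.
class RoundRobinPicker {
    constructor() {
        this.nextIndex = 0;
    }

    // Returns the next server, wrapping around at the end of the list.
    // The modulo also keeps the index valid if the list has shrunk.
    pick(healthyServers) {
        if (healthyServers.length === 0) {
            throw new Error('No healthy WuKongIM servers available');
        }
        const server = healthyServers[this.nextIndex % healthyServers.length];
        this.nextIndex = (this.nextIndex + 1) % healthyServers.length;
        return server;
    }
}

const picker = new RoundRobinPicker();
const exampleServers = ['192.168.1.10:5001', '192.168.1.11:5001'];
const picks = [picker.pick(exampleServers), picker.pick(exampleServers), picker.pick(exampleServers)];
console.log(picks); // ['192.168.1.10:5001', '192.168.1.11:5001', '192.168.1.10:5001']
```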

Best Practices

Monitoring Frequency: Check health status every 30-60 seconds.
Timeout Settings: Set timeouts long enough to tolerate brief latency spikes but shorter than the check interval; the examples above use 5-10 seconds.
Load Balancing: Point load balancer health checks at /health so that unhealthy nodes are taken out of rotation.
Container Orchestration: Use /health for Docker health checks and for Kubernetes liveness and readiness probes.
Alerting Mechanism: Integrate with monitoring systems for automated alerting.
Graceful Degradation: Implement fallback mechanisms for when health checks fail.
Circuit Breaker: Use the circuit breaker pattern to stop sending requests to unhealthy services until they recover.
Logging: Log health check results for troubleshooting and analysis.
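
The circuit breaker item can be illustrated with a minimal sketch. The class below opens after a number of consecutive failures and allows a trial request again once a reset timeout has passed. CircuitBreaker, its option names, and the default values are assumptions chosen for this example rather than a recommended implementation.

```javascript
// Minimal circuit breaker sketch with an injectable clock.
class CircuitBreaker {
    constructor({ failureThreshold = 3, resetTimeoutMs = 30000, now = Date.now } = {}) {
        this.failureThreshold = failureThreshold;
        this.resetTimeoutMs = resetTimeoutMs;
        this.now = now;          // clock function, replaceable in tests
        this.failures = 0;       // consecutive failures seen so far
        this.openedAt = null;    // timestamp when the circuit opened, or null if closed
    }

    // Returns true when a request (or health check) may be attempted.
    canRequest() {
        if (this.openedAt === null) {
            return true;         // circuit is closed
        }
        // After the reset timeout, allow a trial request (half-open).
        return this.now() - this.openedAt >= this.resetTimeoutMs;
    }

    // A success closes the circuit and clears the failure count.
    recordSuccess() {
        this.failures = 0;
        this.openedAt = null;
    }

    // A failure opens (or re-opens) the circuit once the threshold is reached.
    recordFailure() {
        this.failures += 1;
        if (this.failures >= this.failureThreshold) {
            this.openedAt = this.now();
        }
    }
}
```

A caller would check canRequest() before contacting a server and report the outcome with recordSuccess() or recordFailure(), keeping one breaker per server.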
