Alerting on CPU and memory usage

Last updated: 2022-04-02 03:00:37

[TOC]

> [Reference](https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/alert/prometheus-alert-rule)

## Edit prometheus.yml

```
rule_files:
  - rules/*.rules
```

## Configure rules/hoststats-alert.rules

```
groups:
- name: hostStatsAlert
  rules:
  - alert: hostCpuUsageAlert
    expr: sum(avg without (cpu)(irate(node_cpu_seconds_total{mode!='idle'}[5m]))) by (instance) > 0.85
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage high"
      description: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }})"
  - alert: hostMemUsageAlert
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.85
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} MEM usage high"
      description: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }})"
```

## Start Alertmanager

```
./alertmanager
```

## Test

Run this command to drive CPU usage up and trigger the alert (press Ctrl+C to stop it):

```
cat /dev/zero>/dev/null
```
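
To sanity-check the memory rule by hand, the same ratio, `(MemTotal - MemAvailable) / MemTotal`, can be computed from `/proc/meminfo`-style text. The following is only a minimal sketch: the helper names `mem_usage_ratio` and `should_alert` and the `SAMPLE` values are made up for illustration, and it does not query Prometheus or node_exporter.

```python
def mem_usage_ratio(meminfo_text: str) -> float:
    """Return (MemTotal - MemAvailable) / MemTotal from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        parts = rest.split()
        if parts:
            # The unit (kB) cancels out in the ratio, so it is ignored here.
            fields[key.strip()] = int(parts[0])
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return (total - available) / total


def should_alert(ratio: float, threshold: float = 0.85) -> bool:
    """Mirror the rule's comparison: fire only when usage is strictly above the threshold."""
    return ratio > threshold


# Hypothetical sample; on a Linux host, read the text from /proc/meminfo instead.
SAMPLE = """MemTotal:       16000000 kB
MemFree:          500000 kB
MemAvailable:    1600000 kB
"""

if __name__ == "__main__":
    ratio = mem_usage_ratio(SAMPLE)
    print(f"memory usage ratio: {ratio:.2f}, alert: {should_alert(ratio)}")
```

In the actual rule, the condition must also hold for the whole `for: 1m` window before the alert fires; this sketch checks only a single sample.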