Chapter 21: HPA Autoscaling Based on Custom Prometheus Metrics
Last updated: 2022-04-02 05:07:25
# HPA Autoscaling on Custom Monitoring Metrics
## Architecture diagram
![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/36/3c/363c310ccef6d9fa5b553c21e5efdc81_2297x1144.png)
## Deploy rabbitmq-exporter on CCE with Helm
Install Helm:
```
wget https://get.helm.sh/helm-v3.3.4-linux-amd64.tar.gz
tar -zxvf helm-v3.3.4-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/helm
helm version
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
version.BuildInfo{Version:"v3.3.4", GitCommit:"a61ce5633af99708171414353ed49547cf05013d", GitTreeState:"clean", GoVersion:"go1.14.9"}
```
Deploy rabbitmq-exporter:
```
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prom-rabbit prometheus-community/prometheus-rabbitmq-exporter --set "rabbitmq.url=http://rabbitserver:15672" --set "rabbitmq.user=XXXt" --set "rabbitmq.password=XXXXX" --namespace=monitoring
```
Verify that the deployment succeeded:
```
kubectl get pod,svc -n monitoring
```
![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/3d/d2/3dd24551458057eba7e5bf70daff91ab_2233x708.png)
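Edit the Prometheus ConfigMap and add a scrape job (`job_name`) so that the rabbitmq-exporter metrics are stored in Prometheus. Note that the target is the address of the rabbitmq-exporter Service shown in the screenshot above.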
```
kind: ConfigMap
apiVersion: v1
metadata:
  name: prometheus
  namespace: monitoring
  selfLink: /api/v1/namespaces/monitoring/configmaps/prometheus
  uid: 036a2fbf-3718-4372-a138-672c62898048
  resourceVersion: '3126060'
  creationTimestamp: '2021-08-26T02:58:45Z'
  labels:
    app: prometheus
    chart: prometheus-2.21.11
    component: server
    heritage: Tiller
    release: cceaddon-prometheus
  annotations:
    description: ''
  managedFields:
    - manager: Go-http-client
      operation: Update
      apiVersion: v1
      time: '2021-08-28T02:53:49Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:data':
          .: {}
          'f:prometheus.yml': {}
        'f:metadata':
          'f:annotations':
            .: {}
            'f:description': {}
          'f:labels':
            .: {}
            'f:app': {}
            'f:chart': {}
            'f:component': {}
            'f:heritage': {}
            'f:release': {}
data:
  prometheus.yml: |-
    global:
      evaluation_interval: 1m
      scrape_interval: 15s
      scrape_timeout: 10s
    alerting:
      alertmanagers:
      - scheme: https
        tls_config:
          insecure_skip_verify: true
        static_configs:
        - targets:
          -
      alert_relabel_configs:
      - source_labels: [kubernetes_pod]
        action: replace
        target_label: pod
        regex: (.+)
      - source_labels: [pod_name]
        action: replace
        target_label: pod
        regex: (.+)
    rule_files:
    - /etc/prometheus/rules/*/*.yaml
    scrape_configs:
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: kubernetes-cadvisor
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - replacement: kubernetes.default.svc:443
        target_label: __address__
      - regex: (.+)
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
        source_labels:
        - __meta_kubernetes_node_name
        target_label: __metrics_path__
      - target_label: cluster
        replacement: f4486e8b-00dc-11ec-a6bf-0255ac1000cf
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    - job_name: kubernetes-nodes
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - regex: (.+)
        replacement: $1:9100
        source_labels:
        - __meta_kubernetes_node_name
        target_label: __address__
      - target_label: cluster
        replacement: f4486e8b-00dc-11ec-a6bf-0255ac1000cf
      - target_label: node
        source_labels: [instance]
    - job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - target_label: cluster
        replacement: f4486e8b-00dc-11ec-a6bf-0255ac1000cf
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_service
    - honor_labels: false
      job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - target_label: cluster
        replacement: f4486e8b-00dc-11ec-a6bf-0255ac1000cf
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: drop
        regex: cceaddon-prometheus-node-exporter-(.+)
        source_labels:
        - __meta_kubernetes_pod_name
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: kubernetes_pod
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      metric_relabel_configs:
      - source_labels: [ __name__ ]
        regex: 'kube_node_labels'
        action: drop
      tls_config:
        insecure_skip_verify: true
    - job_name: 'istio-mesh'
      metrics_path: /stats/prometheus
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: http-envoy-prom
      metric_relabel_configs:
      - target_label: cluster
        replacement: f4486e8b-00dc-11ec-a6bf-0255ac1000cf
      - source_labels: [__name__]
        action: keep
        regex: istio.*
    - job_name: 'rabbitmq-exporter'
      static_configs:
      - targets: ['prom-rabbit-exporter-prometheus-rabbitmq-exporter:9419']
      metric_relabel_configs:
      - target_label: namespace
        replacement: default
```
In the Prometheus console, verify that the rabbitmq-exporter target has been picked up:
![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/b3/15/b31532de41c8b56ca4af496d421cdb35_2505x1644.png)
Check the value of the `rabbitmq_queue_messages` queue metric:
![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/ef/1e/ef1e4466a56d6004c200c57f588c05a6_2990x1037.png)
## Build the custom metric
### Edit adapter-config to define custom query rules. Add the field
> externalRules
```
kind: ConfigMap
apiVersion: v1
metadata:
  name: adapter-config
  namespace: monitoring
  selfLink: /api/v1/namespaces/monitoring/configmaps/adapter-config
  uid: 9b559a81-b0f0-483f-ab4a-65df6073efcd
  resourceVersion: '3131912'
  creationTimestamp: '2021-08-26T02:58:45Z'
  labels:
    release: cceaddon-prometheus
  managedFields:
    - manager: Go-http-client
      operation: Update
      apiVersion: v1
      time: '2021-08-26T02:58:45Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:data':
          .: {}
          'f:config.yaml': {}
        'f:metadata':
          'f:labels':
            .: {}
            'f:release': {}
data:
  config.yaml: |-
    rules:
    - seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
      seriesFilters: []
      resources:
        overrides:
          namespace:
            resource: namespace
          pod_name:
            resource: pod
      name:
        matches: ^container_(.*)_seconds_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[1m])) by (<<.GroupBy>>)
    - seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
      seriesFilters:
      - isNot: ^container_.*_seconds_total$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod_name:
            resource: pod
      name:
        matches: ^container_(.*)_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[1m])) by (<<.GroupBy>>)
    - seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
      seriesFilters:
      - isNot: ^container_.*_total$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod_name:
            resource: pod
      name:
        matches: ^container_(.*)$
        as: ""
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters:
      - isNot: .*_total$
      resources:
        template: <<.Resource>>
      name:
        matches: ""
        as: ""
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters:
      - isNot: .*_seconds_total
      resources:
        template: <<.Resource>>
      name:
        matches: ^(.*)_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters: []
      resources:
        template: <<.Resource>>
      name:
        matches: ^(.*)_seconds_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
    - seriesQuery: '{kubernetes_namespace!="",__name__!~"^container_.*"}'
      seriesFilters:
      - isNot: .*_total$
      resources:
        overrides:
          kubernetes_namespace:
            resource: namespace
          kubernetes_pod:
            resource: pod
      name:
        matches: ""
        as: ""
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
    - seriesQuery: '{kubernetes_namespace!="",__name__!~"^container_.*"}'
      seriesFilters:
      - isNot: .*_seconds_total
      resources:
        overrides:
          kubernetes_namespace:
            resource: namespace
          kubernetes_pod:
            resource: pod
      name:
        matches: ^(.*)_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
    - seriesQuery: '{kubernetes_namespace!="",kubernetes_service!=""}'
      seriesFilters:
      - isNot: .*_seconds_total
      resources:
        overrides:
          kubernetes_namespace:
            resource: namespace
          kubernetes_service:
            resource: service
      name:
        matches: ^(.*)_total$
        as: ""
      metricsQuery: (avg(sum(rate(<<.Series>>{}[5m])) by (kubernetes_service, instance)) by (kubernetes_service))
    - seriesQuery: '{kubernetes_namespace!="",__name__!~"^container_.*"}'
      seriesFilters: []
      resources:
        overrides:
          kubernetes_namespace:
            resource: namespace
          kubernetes_pod:
            resource: pod
      name:
        matches: ^(.*)_seconds_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
    resourceRules:
      cpu:
        containerQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
        nodeQuery: sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>, id='/'}[1m])) by (<<.GroupBy>>)
        resources:
          overrides:
            instance:
              resource: node
            namespace:
              resource: namespace
            pod_name:
              resource: pod
        containerLabel: container_name
      memory:
        containerQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
        nodeQuery: sum(container_memory_working_set_bytes{<<.LabelMatchers>>,id='/'}) by (<<.GroupBy>>)
        resources:
          overrides:
            instance:
              resource: node
            namespace:
              resource: namespace
            pod_name:
              resource: pod
        containerLabel: container_name
      window: 1m
    externalRules:
    - seriesQuery: '{__name__=~"^rabbitmq_.*",queue="input.service.run_intelligent_classification"}'
      resources:
        template: <<.Resource>>
      name:
        matches: ""
        as: ""
      metricsQuery: 'max(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```
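As a rough illustration of how the `externalRules` template is expanded, the sketch below substitutes placeholder values into the `metricsQuery` string from the ConfigMap above. The function name `render_query` is hypothetical, and the substituted values (a `namespace="default"` matcher and an empty group-by list) are assumptions about what the adapter supplies for a request in the `default` namespace; the adapter performs this expansion internally, so this is only a way to see which PromQL query to try by hand in the Prometheus console.

```shell
# render_query SERIES LABEL_MATCHERS GROUP_BY
# Hypothetical helper: fills the placeholders of the externalRules
# metricsQuery template with the given strings and prints the result.
render_query() {
  template='max(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
  printf '%s\n' "$template" | sed \
    -e "s|<<\.Series>>|$1|" \
    -e "s|<<\.LabelMatchers>>|$2|" \
    -e "s|<<\.GroupBy>>|$3|"
}

# Assumed inputs: the series name, a namespace matcher, no group-by labels.
render_query rabbitmq_queue_messages 'namespace="default"' ''
# prints: max(rabbitmq_queue_messages{namespace="default"}) by ()
```

The `namespace="default"` matcher only finds series because the `rabbitmq-exporter` scrape job attaches a `namespace: default` label through `metric_relabel_configs`.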
Register custom-metrics-apiserver with the Kubernetes API server as an APIService for the external metrics API. Create `k8s-api.yml`:
```
# Note: apiregistration.k8s.io/v1beta1 was removed in Kubernetes 1.22;
# use apiregistration.k8s.io/v1 on newer clusters.
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver
    namespace: monitoring
  group: external.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
```
Verify that the `rabbitmq_queue_messages` metric can be queried through the external metrics API:
```
# kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/rabbitmq_queue_messages"
{"kind":"ExternalMetricValueList","apiVersion":"external.metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/external.metrics.k8s.io/v1beta1/namespaces/default/rabbitmq_queue_messages"},"items":[{"metricName":"rabbitmq_queue_messages","metricLabels":{},"timestamp":"2021-08-28T10:37:16Z","value":"32847"}]}
```
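To use the value in a script, the `value` field can be pulled out of the JSON response. The sketch below runs against a copy of the response shown above, so it works without a cluster; the helper name `metric_value` is hypothetical, and the `sed` expression assumes a single item in `items`. A JSON-aware tool such as `jq` would be more robust where it is available.

```shell
# Copy of the ExternalMetricValueList response shown above.
sample='{"kind":"ExternalMetricValueList","apiVersion":"external.metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/external.metrics.k8s.io/v1beta1/namespaces/default/rabbitmq_queue_messages"},"items":[{"metricName":"rabbitmq_queue_messages","metricLabels":{},"timestamp":"2021-08-28T10:37:16Z","value":"32847"}]}'

# Hypothetical helper: reads the JSON on stdin and prints the "value" field.
metric_value() {
  sed -n 's/.*"value":"\([^"]*\)".*/\1/p'
}

printf '%s\n' "$sample" | metric_value
# prints: 32847
```

Against a live cluster, the same helper could be fed from `kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/rabbitmq_queue_messages" | metric_value`.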
Edit the `hpa-controller-custom-metrics` ClusterRoleBinding and change the namespace of its subject:
```
kubectl edit clusterrolebinding -oyaml hpa-controller-custom-metrics
```
```
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2021-08-26T02:58:45Z"
  labels:
    release: cceaddon-prometheus
  name: hpa-controller-custom-metrics
  resourceVersion: "3147794"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/hpa-controller-custom-metrics
  uid: d7ab1332-e54f-42ad-bb51-0b0c5db4f386
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-server-resources
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system # changed from monitoring to kube-system
```
Edit the `custom-metrics-server-resources` ClusterRole and add `external.metrics.k8s.io` to `apiGroups`:
```
kubectl edit clusterrole custom-metrics-server-resources
```
```
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2021-08-26T02:58:45Z"
  labels:
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
    release: cceaddon-prometheus
  name: custom-metrics-server-resources
  resourceVersion: "3140104"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/custom-metrics-server-resources
  uid: 5fb1a3d6-6da2-4197-a039-55f7770cca3e
rules:
- apiGroups:
  - custom.metrics.k8s.io
  - external.metrics.k8s.io # newly added external metrics group
  resources:
  - '*'
  verbs:
  - '*'
```
Deploy an nginx Deployment and create an HPA policy for it:
```
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: scale-nginx
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1 # minimum number of replicas
  maxReplicas: 5 # maximum number of replicas
  metrics:
  - type: External
    external:
      metricName: rabbitmq_queue_messages # name of the custom metric
      targetValue: 1000 # nginx starts scaling out once the queue exceeds 1000 messages
```
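To get a feel for how this policy reacts, the sketch below applies the basic formula from the Kubernetes HPA documentation, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, and then clamps the result to `minReplicas` and `maxReplicas`. The function name `desired_replicas` is hypothetical, and the sketch is deliberately simplified: it ignores the controller's tolerance band, pod readiness, and scale-rate limits.

```shell
# desired_replicas CURRENT METRIC TARGET MIN MAX
# Hypothetical helper: simplified HPA replica calculation.
desired_replicas() {
  awk -v c="$1" -v m="$2" -v t="$3" -v lo="$4" -v hi="$5" 'BEGIN {
    raw = c * m / t
    d = int(raw); if (d < raw) d++   # ceil() for non-negative values
    if (d < lo) d = lo               # clamp to minReplicas
    if (d > hi) d = hi               # clamp to maxReplicas
    print d
  }'
}

# 1 replica, 32847 queued messages (the value returned by the API earlier),
# target 1000, bounds 1..5: ceil(32.847) = 33, clamped to 5.
desired_replicas 1 32847 1000 1 5
# prints: 5
```

With the queue depth seen earlier, the policy would therefore sit at `maxReplicas` until the backlog drains well below the target.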
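HPA RBAC permission error. The HPA reports the following when the `horizontal-pod-autoscaler` ServiceAccount is not allowed to read the `external.metrics.k8s.io` API group: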
```
[root@alg-ty-69070 ~]# kubectl get hpa -oyaml scale
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2021-08-28T08:48:11Z","reason":"SucceededGetScale","message":"the
      HPA controller was able to get the target''s current scale"},{"type":"ScalingActive","status":"False","lastTransitionTime":"2021-08-28T08:48:11Z","reason":"FailedGetExternalMetric","message":"the
      HPA was unable to compute the replica count: unable to get external metric default/rabbitmq_queue_messages/nil:
      unable to fetch metrics from external metrics API: rabbitmq_queue_messages.external.metrics.k8s.io
      is forbidden: User \"system:serviceaccount:kube-system:horizontal-pod-autoscaler\"
      cannot list resource \"rabbitmq_queue_messages\" in API group \"external.metrics.k8s.io\"
      in the namespace \"default\""}]'
    autoscaling.alpha.kubernetes.io/metrics: '[{"type":"External","external":{"metricName":"rabbitmq_queue_messages","targetValue":"1k"}}]'
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling/v2beta1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"scale","namespace":"default"},"spec":{"maxReplicas":10,"metrics":[{"external":{"metricName":"rabbitmq_queue_messages","targetValue":1000},"type":"External"}],"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"alg-ty-es"}}}
  creationTimestamp: "2021-08-28T08:47:56Z"
  managedFields:
  - apiVersion: autoscaling/v2beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:spec:
        f:maxReplicas: {}
        f:metrics: {}
        f:minReplicas: {}
        f:scaleTargetRef:
          f:apiVersion: {}
          f:kind: {}
          f:name: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2021-08-28T08:47:56Z"
```