Kubernetes HPA Autoscaling on Custom Metrics: Scale on Business QPS in 3 Steps

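The Pain Point

Your service runs on Kubernetes with an HPA set to scale out at 80% CPU. That looks fine until a traffic spike arrives: CPU sits at 40%, yet API response times have climbed to 5 seconds. The reason is simple: CPU utilization does not always reflect business load. For IO-bound applications that depend on external services, the default CPU/memory-based HPA policy may never trigger when it is needed.

A real-world scenario: during an e-commerce promotion, a gateway service jumps from 500 QPS to 8000 QPS. Pod CPU only reaches 50% because most of the time is spent waiting on backend responses. That is below the 80% target, so the HPA does not move and users see a wave of timeouts. The ops team has to run kubectl scale by hand and misses the critical first 30 seconds.

The fix: let the HPA make scaling decisions directly from business metrics (QPS, latency, queue depth, and so on).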

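Architecture

The core pipeline:

Application exposes Prometheus metrics → Prometheus scrapes them → Prometheus Adapter serves them through the Kubernetes Custom Metrics API → HPA reads them and decides

Key components:
- Prometheus: the existing monitoring system, which scrapes the business metrics
- Prometheus Adapter: maps Prometheus metrics onto the Kubernetes custom.metrics.k8s.io API
- HPA v2: supports custom metrics of type: Pods or type: Object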

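Step by Step

Step 1: Expose business metrics from the application

Taking a Go gateway service as an example, expose an http_requests_total counter. The application only counts requests; the per-second rate (http_requests_per_second) is derived from this counter by the adapter in step 2: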

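import "github.com/prometheus/client_golang/prometheus"

// httpRequestsTotal only ever increases; the per-second rate is derived
// later with PromQL rate(), not computed inside the application.
var httpRequestsTotal = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total HTTP requests",
    },
    []string{"method", "path", "status"},
)

func init() {
    // Register with the default registry so the /metrics handler exports it.
    prometheus.MustRegister(httpRequestsTotal)
}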

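Make sure the Service carries the annotations that let Prometheus discover it automatically. These annotations only work when the Prometheus scrape configuration relabels on them; if your Prometheus is managed by Prometheus Operator, use a ServiceMonitor instead: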

apiVersion: v1
kind: Service
metadata:
  name: gateway-svc
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  selector:
    app: gateway
  ports:
    - port: 8080

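Step 2: Deploy Prometheus Adapter and configure the metric mapping

Install Prometheus Adapter with Helm. Point prometheus.url and prometheus.port at your own Prometheus Service; the values below assume it is reachable as prometheus-server in the monitoring namespace on port 9090. Check the actual Service port first (kubectl get svc -n monitoring), since some installations expose the server on port 80: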

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-server.monitoring.svc \
  --set prometheus.port=9090 \
  -f adapter-values.yaml

The key configuration, adapter-values.yaml:

rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'

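This configuration turns http_requests_total into http_requests_per_second, computed as the per-second rate over a 2-minute window and summed per Pod. The seriesQuery only matches series that carry namespace and pod labels, so the scrape configuration must attach both.

Verify that the Custom Metrics API is serving the metric: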
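To make the rate calculation concrete, here is a rough sketch of what rate(...[2m]) computes for a single series. It is a simplification under stated assumptions: the sample type and the scrape values are made up for illustration, and unlike Prometheus the function does not extrapolate to the window boundaries.

```go
package main

import "fmt"

// sample is one scraped counter value; the type is illustrative only.
type sample struct {
	unixSeconds float64
	value       float64
}

// ratePerSecond approximates PromQL rate(): the total counter increase across
// the window divided by the time covered by the samples. A drop in value is
// treated as a counter reset (for example a process restart), so the new value
// counts as the increase since the reset.
func ratePerSecond(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].value - samples[i-1].value
		if delta < 0 {
			delta = samples[i].value
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].unixSeconds - samples[0].unixSeconds
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Hypothetical scrapes every 30s over a 2-minute window.
	window := []sample{{0, 10000}, {30, 40000}, {60, 70000}, {90, 100000}, {120, 130000}}
	fmt.Printf("%.0f requests/s\n", ratePerSecond(window))
}
```

A shorter window reacts faster to spikes but is noisier; the window should span several scrape intervals so that each evaluation sees enough samples.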

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second" | jq .
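The response is a MetricValueList in which each value is a Kubernetes quantity, so a rate of 1234.5 requests/s typically shows up as "1234500m". The sketch below decodes such a response into per-Pod numbers. The payload and pod names are invented for illustration, not captured from a cluster, and parseQuantity only handles plain decimals and the milli suffix; a real client would more likely use resource.ParseQuantity from k8s.io/apimachinery.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// metricValueList mirrors only the fields this sketch needs from a
// custom.metrics.k8s.io/v1beta1 MetricValueList response.
type metricValueList struct {
	Items []struct {
		DescribedObject struct {
			Name string `json:"name"`
		} `json:"describedObject"`
		MetricName string `json:"metricName"`
		Value      string `json:"value"`
	} `json:"items"`
}

// parseQuantity handles plain decimals ("1500") and the milli suffix
// ("1234500m"). Other suffixes (k, M, Ki, ...) are left out of this sketch.
func parseQuantity(q string) (float64, error) {
	if strings.HasSuffix(q, "m") {
		milli, err := strconv.ParseFloat(strings.TrimSuffix(q, "m"), 64)
		return milli / 1000, err
	}
	return strconv.ParseFloat(q, 64)
}

// perPodValues decodes a response body into a map of pod name to metric value.
func perPodValues(body []byte) (map[string]float64, error) {
	var list metricValueList
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, err
	}
	out := make(map[string]float64, len(list.Items))
	for _, item := range list.Items {
		v, err := parseQuantity(item.Value)
		if err != nil {
			return nil, fmt.Errorf("pod %s: %w", item.DescribedObject.Name, err)
		}
		out[item.DescribedObject.Name] = v
	}
	return out, nil
}

func main() {
	// Illustrative payload with made-up pod names.
	body := []byte(`{"kind":"MetricValueList","items":[
	  {"describedObject":{"kind":"Pod","name":"gateway-a"},"metricName":"http_requests_per_second","value":"1234500m"},
	  {"describedObject":{"kind":"Pod","name":"gateway-b"},"metricName":"http_requests_per_second","value":"980"}]}`)
	values, err := perPodValues(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(values["gateway-a"], values["gateway-b"])
}
```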

Step 3: Configure HPA v2 to use the custom metric

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gateway-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gateway
  minReplicas: 3
  maxReplicas: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"

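What this means: the HPA tries to keep the average QPS per Pod near 1000. It scales out when the average rises above the target and scales in when it falls below, ignoring deviations that stay inside the controller's tolerance (10% by default). The behavior block sets a scale-fast, shrink-slow policy: a 30-second stabilization window for scale-up, and a 5-minute window for scale-down to prevent flapping.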
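The replica count itself comes from the formula in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies it to this HPA's target of 1000 with the default tolerance of 0.1. The function name and the input numbers are illustrative, and the rate limits from the behavior block are not modeled.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the documented HPA formula
//
//	desired = ceil(current * averagePerPod / target)
//
// skips scaling when the ratio is within tolerance of 1.0, and clamps the
// result to the configured replica bounds.
func desiredReplicas(current int, averagePerPod, target, tolerance float64, minReplicas, maxReplicas int) int {
	ratio := averagePerPod / target
	if math.Abs(ratio-1.0) <= tolerance {
		return current
	}
	desired := int(math.Ceil(float64(current) * ratio))
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// 3 Pods averaging 2500 QPS each against a target of 1000.
	fmt.Println(desiredReplicas(3, 2500, 1000, 0.1, 3, 50))
	// 8 Pods at 1050 QPS each: inside the 10% tolerance, so no change.
	fmt.Println(desiredReplicas(8, 1050, 1000, 0.1, 3, 50))
	// 10 Pods at 500 QPS each: half the target, so the count is halved.
	fmt.Println(desiredReplicas(10, 500, 1000, 0.1, 3, 50))
}
```

The result is only a recommendation. The scaleUp and scaleDown policies then limit how quickly the Deployment is allowed to move toward it.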

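Pitfalls

Pitfall 1: Prometheus Adapter cannot find the metric

Symptom: kubectl get --raw returns an empty list or a 404.

Cause: the label filters in seriesQuery do not match the labels actually present in Prometheus.

Fix: first confirm in the Prometheus console that the PromQL query returns data, then adjust the adapter configuration. After applying the change (for example with helm upgrade), restart the adapter Pods so they load the new configuration: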

kubectl rollout restart deployment prometheus-adapter -n monitoring

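Pitfall 2: Overly aggressive scale-down causes flapping

Symptom: after traffic drops, Pods are removed quickly, then the remaining Pods become overloaded and trigger a scale-up, and the cycle repeats.

Fix:
- Set scaleDown.stabilizationWindowSeconds to at least 300 seconds
- Use a Percent policy with value 10 for scale-down, so each period removes at most 10% of the Pods
- Set a sensible minReplicas; do not let it shrink to 1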
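The stabilization window damps flapping because, for scale-down, the HPA acts on the highest replica recommendation computed within the window rather than on the latest one. The following sketch illustrates that rule with an invented recommendation history; the type and function names are assumptions for illustration.

```go
package main

import "fmt"

// recommendation is a replica count computed at a point in time (in seconds).
type recommendation struct {
	atSeconds int
	replicas  int
}

// stabilizedScaleDown returns the highest recommendation seen within the last
// windowSeconds. A brief dip in traffic therefore cannot shrink the workload
// below what was recommended moments earlier.
func stabilizedScaleDown(history []recommendation, nowSeconds, windowSeconds int) int {
	highest := 0
	for _, r := range history {
		inWindow := r.atSeconds > nowSeconds-windowSeconds && r.atSeconds <= nowSeconds
		if inWindow && r.replicas > highest {
			highest = r.replicas
		}
	}
	return highest
}

func main() {
	// Hypothetical recommendations while traffic falls, spikes, and falls again.
	history := []recommendation{{0, 20}, {60, 12}, {120, 6}, {180, 14}, {240, 5}}

	fmt.Println(stabilizedScaleDown(history, 240, 30))  // short window follows the latest dip
	fmt.Println(stabilizedScaleDown(history, 240, 300)) // 5-minute window still sees the earlier peak
}
```

With the 30-second window the workload would drop to 5 replicas right after a spike that needed 14; with the 300-second window it holds at 20 until the older peaks age out.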

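Pitfall 3: Which metric wins when several are configured

Symptom: with both CPU and a custom metric configured, scaling does not behave as expected.

How it works: the HPA computes a recommended replica count for each metric separately, then takes the maximum. If CPU suggests 5 Pods and QPS suggests 10, the Deployment scales to 10.

Recommendation: in multi-metric setups, validate every threshold with load tests rather than picking numbers by gut feeling.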
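The max-of-all-metrics rule can be sketched as follows, reproducing the 5-versus-10 example. The metricReading type and the readings are hypothetical, and the tolerance check is omitted to keep the focus on how the winning metric is chosen.

```go
package main

import (
	"fmt"
	"math"
)

// metricReading is the current per-Pod average and the configured target.
type metricReading struct {
	name    string
	current float64
	target  float64
}

// replicasFor is the per-metric recommendation: ceil(replicas * current / target).
func replicasFor(currentReplicas int, m metricReading) int {
	return int(math.Ceil(float64(currentReplicas) * m.current / m.target))
}

// decide evaluates every metric independently and keeps the largest
// recommendation, returning it with the name of the metric that produced it.
func decide(currentReplicas int, metrics []metricReading) (int, string) {
	best, winner := 0, ""
	for _, m := range metrics {
		if n := replicasFor(currentReplicas, m); n > best {
			best, winner = n, m.name
		}
	}
	return best, winner
}

func main() {
	metrics := []metricReading{
		{name: "cpu_utilization_percent", current: 80, target: 80},      // recommends 5
		{name: "http_requests_per_second", current: 2000, target: 1000}, // recommends 10
	}
	replicas, winner := decide(5, metrics)
	fmt.Println(replicas, winner)
}
```

One consequence is that a metric with a threshold set too low will keep the replica count high even when every other metric says the workload is idle.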

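Summary

Comparison	Default HPA (CPU/Mem)	Custom-metric HPA
Best fit	CPU-bound compute services	IO-bound services, gateways, queue consumers
Responsiveness	Bound to the resource metrics collection interval	Configurable rate window
Accuracy	Coarse	Directly reflects business load
Operational complexity	Works out of the box	Requires deploying and configuring the Adapter

Key recommendations:
1. In production, combine CPU with at least one business metric in the HPA
2. Scale up fast (30s window) and scale down slowly (5min window); this lesson was learned the hard way
3. Before going live, load test with hey or vegeta to confirm that the thresholds and the scale-up speed meet expectations

Custom-metric HPA is not a silver bullet, but it turns autoscaling from guesswork into calculation. It takes about 30 minutes to set up once, and it can save you when it matters.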