feat: close v3 slo gates and lifecycle metrics

phamnazage-jpg
2026-06-08 14:49:06 +08:00
parent dbbb313a36
commit dd6f332b53
14 changed files with 775 additions and 156 deletions


@@ -129,18 +129,20 @@
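- Goal: first wire the existing CRM gateway and the user-key self-service path into an observable source of truth, rather than stopping at "a /metrics endpoint exists, but the critical paths emit no logs or metrics".
- Code wiring in this round: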
- `internal/metrics/metrics.go`: adds `user_key_operations_total` and `user_key_chat_requests_total`; HTTP metrics prefer `r.Pattern` to avoid high cardinality from dynamic paths
- `internal/app/route_resolve_api.go`: resolve / failover wired into route metrics
- `internal/app/key_self_service_svc.go`: success metrics wired for create/reset/pause/resume/delete
- `internal/metrics/metrics.go`: `user_key_operations_total` and `user_key_chat_requests_total` are wired; the HTTP status label is now a numeric string; the HTTP path label prefers `r.Pattern` to avoid high cardinality from dynamic paths
- `internal/app/route_resolve_api.go`: route decision semantics are narrowed to `sticky_hit / bind / fallback / failover`; failover is no longer merged with fallback into a single state
- `internal/app/key_self_service_svc.go`: create/reset/pause/resume/delete no longer record only success; failure-path metrics such as `open_store_error / get_key_error / not_found / rate_limit_store_error / resolve_host_error / resolve_shadow_group_error / ensure_access_error / pause_access_error / resume_access_error / db_tx_error` are now filled in as well
- `internal/app/http_api.go`: `/v1/chat/completions` is wired to the `unauthorized / invalid_api_key / key_paused / key_retired / quota_exhausted / bad_request / db_error / proxy_error / ok` outcome metrics
- `internal/app/public_chat_metrics_test.go`: adds regression tests for quota_exhausted and the route pattern
- `deploy/monitoring/prometheus-rules.yml`: rewritten against the metrics as they actually exist today, with alert rules including `UserKeyChatSuccessRateLow / UserKeyChatP95LatencyHigh / UserKeyCreateFailures / UserKeyResetFailures / UserKeyQuotaExhaustedSpike / UserKeyAuthFailuresSpike / RouteFailoverShareHigh`
- `scripts/test/verify_vnext_slo_release_gate.sh`: adds the V3-2 release gate script, which is now hooked into `scripts/test/verify_quality_gates.sh`
- Gates in this round:
- `go test ./internal/app ./internal/metrics -count=1` → PASS
- `go test ./tests/integration/... -count=1` → PASS
- `go vet ./...` → PASS
- `go test -cover ./internal/...` → PASS (core packages `access/provision/pack` all ≥ 70%)
- `bash ./scripts/test/verify_vnext_slo_release_gate.sh` → PASS (checks metrics wiring / alert rules / live governance artifact / documentation wording)
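- Current conclusion:
- `Partially closed`: the first batch of SLO/observability wiring is complete and has passed the gates; the broader governance/SLO extensions (finer-grained failure paths, alerting / release gates) are still in progress
- `Closed`: the V3-2 failure-path refinement, alert rules, and release gate have all landed; the full set of vNext follow-up extensions has been closed out to a verifiably complete state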
- The portal key management UI has been implemented, deployed, and verified on the live public network:
- Key code: