docs(routing): record resolver failover verification

phamnazage-jpg
2026-05-29 10:06:16 +08:00
parent eb2242ca6f
commit d821162a8b

@@ -298,6 +298,49 @@
- Read back two sticky audit records
  - The latest one has `action=hit`
  - The earlier one has `action=bind`
- 2026-05-29: completed the infrastructure closed-loop follow-up / `route failure threshold failover`
  - Commit: `eb2242ca feat(routing): add resolver failover fallback`
  - Behavior as finalized:
    - `resolve` now reads `route-failures` and `cooldowns` when selecting a route
    - When a higher-priority route has `failure_count >= logical_group.failover_threshold`, it is skipped automatically and the next available route is selected
    - The first fallback sets `route_decision_logs.fallback_used` to `true`
    - It also writes a record to `route_failover_events`
  - Local gate checks passed:
    - `gofmt -l .`
    - `go vet ./...`
    - `go test -cover ./internal/...`
    - `go test ./tests/integration/... -count=1`
  - remote43 upgraded in place to `repo HEAD = eb2242c`
    - `http://127.0.0.1:18173/healthz` returns `ok`
    - Remote instance binary updated to `sha256=cc177700541d9ab85a638f768e6fba045d1e864c347e6dfd895ea9e05f27c571`
  - Live public-network API verification on remote43 passed (`redis` runtime):
    - Created a temporary logical group `logical_group_id=p1t5-failover-1780020305`
    - Created two routes
      - `codex2api-1780020305`, `priority=10`
      - `asxs-1780020305`, `priority=20`
    - `POST /api/routing/sticky/route-failures`
      - Wrote `failure_count=2` for `codex2api-1780020305`
      - `last_error_class=timeout`
      - Returned `backend=redis`
    - First `POST /api/routing/resolve`
      - `request_id=req-p1t5-failover-first-1780020305`
      - Returned `route_id=asxs-1780020305`
      - Returned `sticky_hit=false`
      - Returned `sticky_action=bind`
      - This shows the higher-priority `codex2api-1780020305` was skipped automatically for exceeding the threshold
    - Second `POST /api/routing/resolve` (same subject)
      - `request_id=req-p1t5-failover-second-1780020305`
      - Returned `route_id=asxs-1780020305`
      - Returned `sticky_hit=true`
    - `GET /api/routing/logs/failovers?request_id=req-p1t5-failover-first-1780020305`
      - Read back one failover event:
        - `from_route_id=codex2api-1780020305`
        - `to_route_id=asxs-1780020305`
        - `reason=failure_threshold_exceeded:timeout`
        - `failure_count=2`
    - `GET /api/routing/logs/decisions?sticky_key=lg:p1t5-failover-1780020305:m:gpt-5.4:conv:conv-p1t5-failover-1780020305`
      - The record for the first resolve has `fallback_used=true`, `sticky_hit=false`
      - The record for the second resolve has `fallback_used=false`, `sticky_hit=true`
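
The threshold-based skip described in the 2026-05-29 entry can be sketched roughly as follows. This is a minimal illustration, not the resolver code from `eb2242ca`: the `Route` and `Decision` types and the `pickRoute` function are hypothetical names, the threshold value of 2 in `main` is an assumption (the log only records `failure_count=2`), and cooldowns and sticky bindings are deliberately left out. It does follow the log in treating a lower `priority` number as the preferred route and in formatting the reason as `failure_threshold_exceeded:<error class>`.

```go
package main

import (
	"fmt"
	"sort"
)

// Route is a hypothetical candidate route. A lower Priority value is
// preferred, matching priority=10 being tried before priority=20 in the log.
type Route struct {
	ID       string
	Priority int
}

// Decision is a hypothetical result carrying the fields the log reads back.
type Decision struct {
	RouteID      string
	FallbackUsed bool
	FromRouteID  string // first route skipped for reaching the threshold
	Reason       string
}

// pickRoute walks routes in priority order and skips any route whose
// failure count has reached the threshold. It reports false when every
// route is over the threshold. Cooldowns are not modeled here.
func pickRoute(routes []Route, failures map[string]int, errClass map[string]string, threshold int) (Decision, bool) {
	sorted := append([]Route(nil), routes...) // copy so the caller's slice is untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority < sorted[j].Priority
	})

	var d Decision
	for _, r := range sorted {
		if threshold > 0 && failures[r.ID] >= threshold {
			if d.FromRouteID == "" {
				d.FromRouteID = r.ID
				d.Reason = "failure_threshold_exceeded:" + errClass[r.ID]
			}
			continue
		}
		d.RouteID = r.ID
		d.FallbackUsed = d.FromRouteID != ""
		return d, true
	}
	return Decision{}, false
}

func main() {
	routes := []Route{
		{ID: "asxs-1780020305", Priority: 20},
		{ID: "codex2api-1780020305", Priority: 10},
	}
	failures := map[string]int{"codex2api-1780020305": 2}
	errClass := map[string]string{"codex2api-1780020305": "timeout"}

	// Threshold 2 is an assumed value for illustration only.
	d, ok := pickRoute(routes, failures, errClass, 2)
	fmt.Printf("ok=%v decision=%+v\n", ok, d)
}
```

In the recorded run the second resolve returned `fallback_used=false` with `sticky_hit=true`, which suggests the sticky binding is consulted before any selection of this kind; that lookup is outside the scope of this sketch.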
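- 2026-05-26: added the "end user -> public domain -> OpenClaw" hop to the formal verification scope
  - The public root URL is currently unified as `https://sub.tksea.top`
  - The OpenClaw local `MiniMax` runtime failure was traced to `pi-ai/openai-node` not inheriting the system `HTTP(S)_PROXY`; it is not an allowlist or model-name casing issue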