diff --git a/docs/EXECUTION_BOARD.md b/docs/EXECUTION_BOARD.md
index 7bc0f0ce..8fbc47d0 100644
--- a/docs/EXECUTION_BOARD.md
+++ b/docs/EXECUTION_BOARD.md
@@ -298,6 +298,49 @@
       - Read back two sticky audit records
         - Most recent: `action=hit`
         - Earlier: `action=bind`
+- 2026-05-29: completed the infrastructure closed-loop follow-up / `route failure threshold failover`
+  - Commit: `eb2242ca feat(routing): add resolver failover fallback`
+  - Behavior now in place:
+    - `resolve` now reads `route-failures` and `cooldowns` during route selection
+    - When a higher-priority route has `failure_count >= logical_group.failover_threshold`, it is skipped automatically and the next available route is selected
+    - The first fallback sets `route_decision_logs.fallback_used` to `true`
+    - It also writes a record to `route_failover_events`
+  - Local gates passed:
+    - `gofmt -l .`
+    - `go vet ./...`
+    - `go test -cover ./internal/...`
+    - `go test ./tests/integration/... -count=1`
+  - remote43 upgraded in place to `repo HEAD = eb2242c`
+    - `http://127.0.0.1:18173/healthz` returns `ok`
+    - Remote instance binary updated to `sha256=cc177700541d9ab85a638f768e6fba045d1e864c347e6dfd895ea9e05f27c571`
+  - Real public-network API validation on remote43 passed (`redis` runtime):
+    - Created a temporary logical group `logical_group_id=p1t5-failover-1780020305`
+    - Created two routes:
+      - `codex2api-1780020305`, `priority=10`
+      - `asxs-1780020305`, `priority=20`
+    - `POST /api/routing/sticky/route-failures`
+      - Wrote `failure_count=2` for `codex2api-1780020305`
+      - `last_error_class=timeout`
+      - Returned `backend=redis`
+    - First `POST /api/routing/resolve`
+      - `request_id=req-p1t5-failover-first-1780020305`
+      - Returned `route_id=asxs-1780020305`
+      - Returned `sticky_hit=false`
+      - Returned `sticky_action=bind`
+      - Confirms that the higher-priority `codex2api-1780020305` was skipped automatically for exceeding the threshold
+    - Second `POST /api/routing/resolve` (same subject)
+      - `request_id=req-p1t5-failover-second-1780020305`
+      - Returned `route_id=asxs-1780020305`
+      - Returned `sticky_hit=true`
+    - `GET /api/routing/logs/failovers?request_id=req-p1t5-failover-first-1780020305`
+      - Read back one failover event
+        - `from_route_id=codex2api-1780020305`
+        - `to_route_id=asxs-1780020305`
+        - `reason=failure_threshold_exceeded:timeout`
+        - `failure_count=2`
+    - `GET /api/routing/logs/decisions?sticky_key=lg:p1t5-failover-1780020305:m:gpt-5.4:conv:conv-p1t5-failover-1780020305`
+      - The record for the first resolve has `fallback_used=true`, `sticky_hit=false`
+      - The record for the second resolve has `fallback_used=false`, `sticky_hit=true`
 - 2026-05-26: added the "end user -> public domain -> OpenClaw" hop to the formal validation scope:
   - The public root address is currently unified as `https://sub.tksea.top`
   - The local OpenClaw `MiniMax` runtime failure was traced to `pi-ai/openai-node` not inheriting the system `HTTP(S)_PROXY`; it is not an allowlist or model-name casing issue