fix(protocol-matrix): restore live probe auth header

2026-06-11 21:52:24 +08:00
parent 47ced19c7b
commit bdfbaff2a7
6 changed files with 80 additions and 24 deletions
--- a/docs/2026-06-04-HOST_PROTOCOL_MATRIX.md
+++ b/docs/2026-06-04-HOST_PROTOCOL_MATRIX.md
@@ -31,6 +31,14 @@
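 - Probes from the current local machine against this upstream succeeded on all three endpoints (`models/chat/responses`)
 - This must not be read as: the production host necessarily supports it, user-key calls necessarily return 200, or it can go straight into the default consumption path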

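+2026-06-11 supplementary calibration:
+
+- The live probe in `verify_host_protocol_matrix.sh` had a script defect, now fixed: real requests mistakenly sent the redacted header `Authorization: Bearer ***`
+- Consequently, some historical `401/auth_failed` results cannot be taken as the real state of the upstream or provider
+- Latest evidence after the fix: `artifacts/host-capability/20260611_203027-live-fixcheck/protocol-matrix-summary.json`
+- In that evidence, the current real state of `minimax-m2-7-official` has converged to: `models=200`, `chat=429`, `responses=500`, `error_code=rate_limited`
+- This calibration shows that the remaining failure is an upstream key/quota issue, not a lingering fault in the protocol matrix script's probe path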
+
 ## 3. First-round live probe results

 Evidence files:
@@ -175,6 +183,11 @@
 3. GLM not probed
 4. The matrix script has been strengthened but is still not a production-grade protocol matrix

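+2026-06-11 update:
+
+5. The historical `401/auth_failed` for `minimax-m2-7-official` is confirmed as a false negative from the old script and must no longer be used as a basis for current judgments
+6. The host protocol matrix script is trustworthy again, but the remote43 host probe and the user-key probe must still each produce their own artifacts; the upstream live probe cannot stand in for them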
+
 ## 7. Current actionable conclusions

 Confirmed:
--- a/docs/EXECUTION_BOARD.md
+++ b/docs/EXECUTION_BOARD.md
@@ -24,28 +24,20 @@

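 - Fixes completed and verified locally:
  - `/v1/chat/completions` no longer wraps upstream failures as `200/ok`
-  - `allowed_models` is now enforced at the public chat entry point
-  - `expires_at` is now enforced at the public chat entry point
-  - `last_used_at` is updated after a successful chat
-  - The `pause` handler now reads `reason` from the request body
-  - The same `subject + logical_group` no longer reuses a single host key; each key record now persists its own `managed_identity_selector`, and `create/reset/pause/resume` use the current selector
-  - New migration: `internal/store/migrations/0016_user_keys_managed_identity_selector.sql`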
-- Local verification (current run, 2026-06-08):
-  - `gofmt -w` on target files: passed
-  - `go vet ./...` passed
-  - `go test ./internal/app ./internal/store/sqlite ./tests/integration/... -count=1` passed
-- Current production blockers:
-  - ✅ **Resolved** (2025-06-09): vNext.4 Trusted-Subject security chain implemented
-    - New file: `internal/app/portal_auth.go` - Portal user session authentication module
-    - Changed: `http_api.go`, `bootstrap.go`, `.env.example`, `nginx.sub.tksea.top.conf.example`
-    - Frontend: `index.html` adds CRM session login/logout
-    - Docs: `docs/TRUSTED_SUBJECT_DEPLOY_GUIDE.md` full deployment guide
-    - Local verification: `go test ./internal/app -run TestPortal` all passing
-  - **Pending remote43 deployment**:
-    - Update the nginx config (add a cookie-to-header map)
-    - Update `.env.crm` (set the TRUSTED\_\* environment variables)
-    - Generate and sync a 64-character hex secret
-    - See the deployment guide for details
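+  - User-key `allowed_models` / `expires_at` / `last_used_at` / `pause reason` are now wired into the real call path
+  - Multiple key records under the same `subject + logical_group` now each use an independent `managed_identity_selector`
+  - Portal admin local config keeps only non-sensitive fields by default; stale sensitive data left in localStorage is stripped automatically on read
+  - The managed subscription identity now mixes in a secret salt derived from the host management credential, so it can no longer be predictably rebuilt from `selector + groupID` alone
+  - CI still uses `scripts/test/verify_quality_gates.sh` as the primary gate; the Docker health contract no longer uses the fake `--version || true` check; the portal deploy script must explicitly load `scripts/deploy/.env.deploy`
+- Latest local verification evidence: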
+  - `bash scripts/test/test_tksea_portal_assets.sh` → PASS
+  - `bash scripts/test/verify_quality_gates.sh` → PASS
+  - `go test ./internal/host/sub2api -run 'Test(EnsureSubscriptionAccessManagedProbeWithMock|PauseResumeManagedSubscriptionAccessWithMock|BuildManagedSubscriptionIdentityUsesSecretSalt)' -count=1` → PASS
+  - `go test ./internal/store/sqlite -run 'TestUserKeysRepo(UpdateSecret|RejectsMalformedAllowedModelsJSON)' -count=1` → PASS
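+- Two gaps remain open: live verification of trusted-subject/user-key in production, and the stronger High-6 secret model, which has not landed:
+  - The remote43 trusted-subject production cutover has landed, but live verification of public create/chat/pause/resume/delete is not yet done
+  - The in-repo nginx example and deployment guide now use `$cookie_crm_subject`; the old `crm_session -> subject` approach has been judged incorrect
+  - High-6 has only reached the "host management credential salt" stage; persisted random secrets are not yet in place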

 ## 2026-06-05 vNext.2 / V2-4 real end-to-end loop

@@ -3141,3 +3133,10 @@
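 - New findings this round:
  - Both `kimi-a7m` and `asxs` return `responses=200` at the "local direct-connection protocol layer", so the earlier blocker should no longer be described broadly as "protocol not supported"; it is more likely a production host egress, provider runtime status, or access path issue
  - `deepseek-chat-official` reports `models_has_smoke_model=false`, meaning the set returned by `/v1/models` differs in naming or aliasing from `smoke_test_model=deepseek-chat`; future model pool design must explicitly distinguish the "callable model name" from the "name exposed in the models list"
+- 2026-06-11 live probe fix and re-verification:
+  - Root cause confirmed: `scripts/acceptance/verify_host_protocol_matrix.sh` previously sent the redacted value `Authorization: Bearer ***` directly to the upstream, producing spurious live probe `401/auth_failed` results
+  - After the fix, real requests send `Bearer <api_key>`, while `request_headers.txt` in the artifact stays redacted as `***`
+  - Regression gate strengthened: `scripts/test/test_host_protocol_matrix_script.sh` now fails immediately if the fake curl receives anything other than the expected test key in the Authorization header, including the redacted value
+  - New evidence: `artifacts/host-capability/20260611_203027-live-fixcheck/protocol-matrix-summary.json`
+  - Current real result: `minimax-m2-7-official` has moved from a spurious `401/auth_failed` to the real upstream state `models=200, chat=429, responses=500, error_code=rate_limited`
+  - Conclusion: the host protocol matrix live probe path is trustworthy again; what remains red is the upstream key/quota state, not the script's probe path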
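The salted managed-identity change listed among the local fixes can be illustrated with a minimal Python sketch. It assumes HMAC-SHA256 for both the salt derivation and the identity. The names `derive_salt` and `managed_identity`, the `managed-identity-salt/v1` label, and the 32-character truncation are all hypothetical and are not taken from `internal/host/sub2api`. What the sketch shows is that once the host management credential is mixed in, knowing `selector + groupID` is no longer enough to rebuild the identity.

```python
import hashlib
import hmac

# Hypothetical domain-separation label; the real derivation input is not documented here.
SALT_LABEL = b"managed-identity-salt/v1"


def derive_salt(host_mgmt_credential: str) -> bytes:
    """Derive a fixed secret salt from the host management credential (assumed HMAC-SHA256)."""
    return hmac.new(host_mgmt_credential.encode(), SALT_LABEL, hashlib.sha256).digest()


def managed_identity(selector: str, group_id: int, salt: bytes) -> str:
    """Bind selector and group ID under the secret salt.

    Without the salt, selector + group_id alone cannot reproduce this value.
    """
    message = f"{selector}:{group_id}".encode()
    # Truncation to 32 hex characters is an illustrative choice.
    return hmac.new(salt, message, hashlib.sha256).hexdigest()[:32]


if __name__ == "__main__":
    salt = derive_salt("example-host-admin-credential")
    print(managed_identity("sel-a", 42, salt))
```

In this sketch the salt is derived deterministically from the credential, so anyone who holds that credential can still recompute every identity. That is consistent with the board's note that High-6 stops short of persisted random secrets.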
--- a/docs/PROVIDER_VALIDATION_MATRIX.md
+++ b/docs/PROVIDER_VALIDATION_MATRIX.md
@@ -47,6 +47,19 @@
   - `minimax-53hk`: `batch_status=succeeded`, `provider_status=active`, gateway completion `200`
   - However, in OpenClaw they are exposed under public provider aliases (`tksea-gpt` / `tksea-minimax`), not directly under the `provider_id` from the pack

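+6. **Consumption paths already live-verified on the public / online server include at least**
+   - `tksea-gpt/gpt-5.4`
+   - `tksea-gpt/gpt-5.4-mini`
+   - `tksea-minimax/MiniMax-M2.5-highspeed`
+   - `tksea-minimax/MiniMax-M2.7-highspeed`
+   - `tksea/kimi-k2.6`
+   - These count as "final consumption path PASS"; do not conflate them with "official provider template fully green"
+
+7. **The host protocol matrix live probe is trustworthy again**
+   - On 2026-06-11 the script's spurious `401/auth_failed` was confirmed and fixed: it had been sending the redacted header `Authorization: Bearer ***` directly to the upstream
+   - After the fix, the latest live probe for `minimax-m2-7-official` has converged to the real upstream state: `models=200`, `chat=429`, `responses=500`, `error_code=rate_limited`
+   - This route should therefore still be classified as `BLOCKED_BY_QUOTA`, not as a script mis-probe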
+
 ## Counting criteria

 ### Status definitions
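The reclassification of `minimax-m2-7-official` as `BLOCKED_BY_QUOTA` can be sketched as a small classifier over per-endpoint results. This is a Python sketch under stated assumptions. The summary is treated as a flat mapping with `models`/`chat`/`responses` status codes and an `error_code`, which may not match the real `protocol-matrix-summary.json` schema. The `BLOCKED_BY_AUTH` and `NEEDS_TRIAGE` labels are hypothetical; only `PASS` and `BLOCKED_BY_QUOTA` appear in these docs.

```python
from typing import Mapping

ENDPOINTS = ("models", "chat", "responses")


def classify_route(summary: Mapping[str, object]) -> str:
    """Map one route's live-probe results to a status label.

    Assumes a flat summary such as
    {"models": 200, "chat": 429, "responses": 500, "error_code": "rate_limited"}.
    """
    codes = [summary.get(endpoint) for endpoint in ENDPOINTS]
    error_code = summary.get("error_code")

    if all(code == 200 for code in codes):
        return "PASS"
    # Quota signals win: a 429 or rate_limited points at the upstream quota, not the probe.
    if error_code == "rate_limited" or 429 in codes:
        return "BLOCKED_BY_QUOTA"
    # Hypothetical label; only trustworthy when the probe actually sent the real key.
    if error_code == "auth_failed" or 401 in codes:
        return "BLOCKED_BY_AUTH"
    return "NEEDS_TRIAGE"  # hypothetical catch-all
```

An auth bucket only means something if the probe sent the real key. The historical `401/auth_failed` rows from the old script would have landed there for the wrong reason.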
--- a/docs/SOURCE_OF_TRUTH.md
+++ b/docs/SOURCE_OF_TRUTH.md
@@ -27,6 +27,20 @@
 - "The real host main path for `subscription` on MiniMax 53hk and DeepSeek 2166 is fully unblocked" is true
 - "latest-head `self_service` fresh-host standard acceptance has also passed" is true

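+## 2026-06-11 supplementary calibration
+
+- The spurious live probe `401/auth_failed` in `scripts/acceptance/verify_host_protocol_matrix.sh` is closed: the script previously sent the redacted value `Authorization: Bearer ***` directly to the upstream; real requests now send `Bearer <api_key>`, and artifact redaction is retained
+- New evidence: `artifacts/host-capability/20260611_203027-live-fixcheck/protocol-matrix-summary.json`
+- The current real state of `minimax-m2-7-official` should therefore be read as: `models=200`, `chat=429`, `responses=500`, `error_code=rate_limited`
+- This means the remaining red status is an upstream key/quota state, not a fault in the host protocol matrix script's probe path
+- Consumption paths already live-verified online / on the public internet include at least: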
+  - `tksea-gpt/gpt-5.4`
+  - `tksea-gpt/gpt-5.4-mini`
+  - `tksea-minimax/MiniMax-M2.5-highspeed`
+  - `tksea-minimax/MiniMax-M2.7-highspeed`
+  - `tksea/kimi-k2.6`
+- The online user-key governance chain also has live verification: `artifacts/v3-governance-live/20260608_102323/99-summary.json` shows the create/chat/pause/resume/delete loop passing
+
 ## Current source-of-truth documents (in priority order)

 ### 1. `docs/EXECUTION_BOARD.md`
--- a/scripts/acceptance/verify_host_protocol_matrix.sh
+++ b/scripts/acceptance/verify_host_protocol_matrix.sh
@@ -181,7 +181,7 @@ def run_capture(url: str, api_key: str, method: str, request_headers_path: pathl
        "--retry-delay",
        str(RETRY_DELAY),
        "-H",
-        "Authorization: Bearer ***",
+        f"Authorization: Bearer {api_key}",
        "-H",
        f"X-Hermes-Debug-Request-Headers: {request_headers_path}",
    ]
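The fix separates what goes on the wire from what goes into the artifact. Below is a minimal Python sketch of that split. It uses hypothetical helper names: `build_curl_args`, `redact_header`, and `render_artifact_headers`. The real logic lives in `run_capture`, whose full body is not shown in this diff. The real key goes into the curl argument list, and redaction is applied only when rendering headers for `request_headers.txt`.

```python
from typing import List, Sequence

REDACTED = "***"


def build_curl_args(url: str, api_key: str, retry_delay: int = 2) -> List[str]:
    """Arguments for the real request: the Authorization header carries the actual key."""
    return [
        "curl",
        "-sS",
        "--retry-delay",
        str(retry_delay),
        "-H",
        f"Authorization: Bearer {api_key}",  # the bug was sending REDACTED here
        url,
    ]


def redact_header(header: str) -> str:
    """Replace the credential in an Authorization header, keeping the scheme."""
    name, sep, value = header.partition(":")
    if sep and name.strip().lower() == "authorization":
        scheme = value.strip().split(" ", 1)[0]
        return f"{name}: {scheme} {REDACTED}"
    return header


def render_artifact_headers(headers: Sequence[str]) -> str:
    """Text for request_headers.txt: redaction applies only here, never to the wire."""
    return "".join(redact_header(h) + "\n" for h in headers)
```

This shape has one trade-off: the key appears in curl's argv, so other local users can see it in process listings. If that matters, curl's `-H @file` form (curl 7.55.0 and later) is one way to keep it off the command line.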
--- a/scripts/test/test_host_protocol_matrix_script.sh
+++ b/scripts/test/test_host_protocol_matrix_script.sh
@@ -67,6 +67,7 @@ body_file=""
 url=""
 request_headers_file=""
 request_body=""
+auth_header=""
 prev=""
 log_file="${FAKE_CURL_LOG:-}"
 for arg in "$@"; do
@@ -87,7 +88,9 @@ for arg in "$@"; do
      continue
      ;;
    -H)
-      if [[ "$arg" == X-Hermes-Debug-Request-Headers:* ]]; then
+      if [[ "$arg" == Authorization:* ]]; then
+        auth_header="$arg"
+      elif [[ "$arg" == X-Hermes-Debug-Request-Headers:* ]]; then
        request_headers_file="${arg#X-Hermes-Debug-Request-Headers: }"
      fi
      prev=""
@@ -115,6 +118,20 @@ if [[ -n "$request_headers_file" ]]; then
  printf 'Authorization: Bearer ***\n' > "$request_headers_file"
  printf 'Content-Type: application/json\n' >> "$request_headers_file"
 fi
+case "$url" in
+  https://kimi.example.com/*)
+    [[ "$auth_header" == 'Authorization: Bearer kimi-key' ]] || {
+      echo "unexpected auth header for kimi: $auth_header" >&2
+      exit 1
+    }
+    ;;
+  https://timeout.example.com/*)
+    [[ "$auth_header" == 'Authorization: Bearer timeout-key' ]] || {
+      echo "unexpected auth header for timeout provider: $auth_header" >&2
+      exit 1
+    }
+    ;;
+esac
 case "$url" in
  https://kimi.example.com/v1/models)
    printf 'HTTP/1.1 200 OK\nContent-Type: application/json\n' > "$headers_file"