sub2api-cn-relay-manager/docs/PRODUCTION_CLOSURE_BOARD.md
2026-05-21 13:49:58 +08:00

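# Sub2api-CN-Relay-Manager Production Closure Board

Date: 2026-05-20

Current Gate: BLOCKED. The code gates have passed, and the managed-probe / local `PACK_PATH` fixes in `scripts/import_remote43_provider.sh` have closed the historical `401 Unauthorized` false negative. However, the 2026-05-21 latest-head fresh-host completion smoke still fails: DeepSeek (`artifacts/real-host-acceptance/20260521_064403_remote43_deepseek_key_import`) and MiniMax (`artifacts/real-host-acceptance/20260521_064454_remote43_minimax_key_import`) both reach `subscription_ready` and `/v1/models` = 200, but `/v1/chat/completions` still returns 502. Direct probes against the upstreams then confirmed that the DeepSeek upstream `chat/completions` returns 200, while the MiniMax upstream `chat/completions` returns 403 `insufficient_user_quota`. Claiming "fully accepted / APPROVED" is therefore not permitted at this time.

Goal: reach production-ready code quality, and narrow the remaining risk down to explicit external-environment acceptance items and accepted P2 technical debt.
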
## 2026-05-21 Calibration Notes (Latest Truth)
- The 401 false negative is closed: `09-models.headers.txt` in both `artifacts/real-host-acceptance/20260521_064403_remote43_deepseek_key_import` and `20260521_064454_remote43_minimax_key_import` is back to `HTTP 200`.
- The fresh-host DB-side state is aligned as well: once the script pointed at the correct `sub2api-fresh-deepseek-20260519_115244-{postgres,redis}-1`, `08-subscription-group-state.json` shows the real managed user / subscription / key bindings.
- The new primary blocker is not auth/tooling but the completion smoke: on the host, `/v1/chat/completions` still returns `502 upstream_error` for both providers.
- Direct upstream probes separate the two cases:
  - DeepSeek upstream `/chat/completions` = `HTTP 200`, so the host-side 502 is a genuine compatibility problem.
  - MiniMax upstream `/chat/completions` = `HTTP 403 insufficient_user_quota`, so the key used for verification cannot carry real completion traffic.
- Further narrowing on 2026-05-21:
  - The DeepSeek managed key still gets a consistent `502` when hitting the fresh host directly, but a direct `curl` from the remote43 host with the same upstream key and the same payload returns `HTTP 200` with `Content-Type: text/event-stream`. This narrows the blocker to the host's chat upstream compatibility layer rather than a CRM import failure.
  - The fresh-host app logs show that DeepSeek subscription group `5` currently has 10 active duplicate accounts attached. Host chat keeps failing over across these accounts and returns the gateway `502` only after all of them report `account_upstream_error 500/502`.
  - MiniMax subscription group `6` currently has 6 active duplicate accounts attached, but their `temp_unschedulable_reason` is explicitly set to `insufficient_user_quota` in every case, so the primary blocker on this branch is still key/quota, not the CRM routing path.
- Evidence summary: `artifacts/real-host-acceptance/20260521_064910_completion_smoke_calibration.md`
- Debugging details and lessons learned: `docs/REAL_HOST_ACCEPTANCE_LEARNINGS.md`
- The code / local-runtime gates were independently rerun on 2026-05-21: `gofmt -l .`, `go vet ./...`, `go test ./... -count=1`, `go test -race ./... -count=1`, `go test -cover ./internal/... -count=1`, and `go test ./tests/integration/... -count=1` all passed. In addition, the local CRM (18100) was verified to return `200` for `GET /healthz` / `GET /api/hosts`, and the fresh smoke instance `127.0.0.1:18101` starts and returns `GET /healthz = ok` and `GET /api/hosts = {"hosts":[]}`.
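
The triage reasoning in the notes above (compare the status seen through the host gateway with the status seen when calling the upstream directly with the same key and payload) can be sketched in Go. This is a minimal sketch under assumptions: the `Verdict` type, its labels, and the `Triage` function are hypothetical names chosen for illustration and are not identifiers from this repository, and the status-code buckets are a simplification of the evidence listed above.

```go
package main

import "fmt"

// Verdict names where a failing completion smoke most likely originates.
// The labels are illustrative only; they are not repository identifiers.
type Verdict string

const (
	VerdictOK          Verdict = "ok"
	VerdictHostCompat  Verdict = "host_compat_layer"
	VerdictUpstreamKey Verdict = "upstream_key_or_quota"
	VerdictUnknown     Verdict = "needs_more_evidence"
)

func is2xx(code int) bool { return code >= 200 && code < 300 }

// Triage compares the HTTP status returned by the host gateway with the
// status returned by a direct upstream call using the same key and payload.
func Triage(hostStatus, upstreamStatus int) Verdict {
	switch {
	case is2xx(hostStatus):
		return VerdictOK
	case is2xx(upstreamStatus):
		// Upstream is healthy, so the failure is introduced on the host side.
		return VerdictHostCompat
	case upstreamStatus == 401 || upstreamStatus == 403 || upstreamStatus == 429:
		// Upstream itself rejects the key (auth, quota, or rate limit).
		return VerdictUpstreamKey
	default:
		return VerdictUnknown
	}
}

func main() {
	fmt.Println("DeepSeek:", Triage(502, 200)) // host 502, upstream 200
	fmt.Println("MiniMax:", Triage(502, 403))  // host 502, upstream 403
}
```

With the statuses recorded above, the DeepSeek case maps to the host compatibility layer and the MiniMax case maps to the upstream key/quota, which matches the conclusions in the notes.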
## Current Gate Conclusions
| Dimension | Status | Evidence |
|------|------|------|
| Build & Test | ✅ PASS | `go test -race ./...` |
| Integration | ✅ PASS | `go test ./tests/integration/... -count=1` |
| Static Analysis | ✅ PASS | `go vet ./...` |
| Formatting | ✅ PASS | `gofmt -l .` produces empty output |
| Core Coverage | ✅ PASS | `go test -cover ./internal/...`: access 77.3%, pack 72.7%, provision 74.6% (sqlite 61.3% is informational only) |
| Control-plane API plan gaps | ✅ CLOSED | Added `/api/hosts/{hostID}/probe`, `/api/providers/{providerID}/import-batches`, `/api/import-batches/{batchID}/rollback` |
| State consistency | ✅ CLOSED | rollback-by-batch writes back `rolled_back/failed`; assign-subscriptions syncs `import_batches.access_status` |
| Provider disambiguation | ✅ CLOSED | Providers are resolved exactly at pack scope, so same-name providers no longer match across packs |
| Access semantics | ✅ CLOSED | access preview now decides by `subscription_ready/self_service_ready/fully_ready/broken` |
| OpenAPI | ✅ SYNCED | `docs/openapi.yaml` now covers the current control-plane endpoints |
| Local runtime smoke | ✅ PASS | `go build ./cmd/{server,cli}`, `GET /healthz`, `GET /api/hosts` |
| Local OCI image | ✅ PASS | `docker build -f Dockerfile.local -t sub2api-cn-relay-manager:local .` |
| Real-host acceptance tooling | ✅ READY | `docs/REAL_HOST_ACCEPTANCE_RUNBOOK.md` + `scripts/real_host_acceptance.sh` |
| Harness regression self-check | ✅ PASS | `bash ./scripts/test_real_host_scripts.sh` |
| `self_service` real-host fresh redeploy re-verification | ⚠️ HISTORICAL PASS | `artifacts/real-host-acceptance/20260518_redeploy_matrix`: the historical fresh redeploy host worked end to end; it is no longer treated as the sole source of truth |
| `subscription` real-host latest-head fresh-host re-verification | ✅ PASS | MiniMax: `artifacts/real-host-acceptance/20260521_011544_remote43_minimax_key_import`; DeepSeek: `artifacts/real-host-acceptance/20260521_011717_remote43_deepseek_key_import`; both providers reached `subscription_ready` |
| stale CRM / channel pricing gap | ✅ CLOSED | Host `GET /api/v1/admin/channels/5` and `/channels/4` now return non-empty `model_pricing` + `model_mapping` |
| `self_service`/`subscription` reconcile host-scope re-verification | ⚠️ PARTIAL | `artifacts/real-host-acceptance/20260518_reconcile_hostscope_*` still proves that the host-scope semantics hold; the main verification point of this latest-head rerun was stale-process import/access closure, not a full rerun of reconcile/rollback |
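## Items Closed This Round
1. Closed the API gaps in the implementation plan
   - `POST /api/hosts/{hostID}/probe`
   - `GET /api/providers/{providerID}/import-batches`
   - `POST /api/import-batches/{batchID}/rollback`
2. Fixed production-grade semantic issues
   - rollback/provider and assign/access now locate the provider exactly at pack scope, avoiding operations on the wrong same-name provider
   - `assign-subscriptions` updates `import_batches.access_status` in sync after writing the access closure
   - `access preview` now decides by target mode and no longer misreports any non-broken status as usable
   - host capability support checks now include the `plans` capability
3. Completed verification
   - new app/sqlite regression tests cover the behaviors above
   - the full race/integration/vet/gofmt suite was rerun and passed
   - local HTTP smoke and the `Dockerfile.local` container build were verified
4. Added pre-launch execution tooling
   - added `scripts/build_local_image.sh`, which pins the image build path for local/proxied environments
   - added `docs/REAL_HOST_ACCEPTANCE_RUNBOOK.md`
   - added `scripts/real_host_acceptance.sh`, which turns real-host acceptance into a flow that persists artifacts to disk
5. Latest real-host re-verification facts
   - `artifacts/real-host-acceptance/20260521_011544_remote43_minimax_key_import`: `batch_id=7`, `access_status=subscription_ready`, `gateway.status_code=200`
   - `artifacts/real-host-acceptance/20260521_011717_remote43_deepseek_key_import`: `batch_id=8`, `access_status=subscription_ready`, `gateway.status_code=200`
   - Direct check on the host admin side: MiniMax `/api/v1/admin/channels/5` and DeepSeek `/api/v1/admin/channels/4` both have `billing_model_source=channel_mapped`, `restrict_models=true`, and non-empty `model_pricing` / `model_mapping`
   - This shows that the real remaining gap is no longer "the code did not write model mapping/pricing into the channel" but "the acceptance script's direct probe may still misreport 401"
   - The `self_service` pass condition remains: an ordinary user's key is bound to a standard group, and the user has usable balance
   - The `subscription` pass condition remains: subscription-type group + ordinary-user subscription assignment + key/group binding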
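
The "decide by target mode" rule for access preview can be illustrated with a small Go sketch. Everything here is an assumption for illustration: the type and constant names are hypothetical, and the mapping (in particular that `fully_ready` satisfies both modes) is inferred from the status names listed on this board rather than taken from the repository code.

```go
package main

import "fmt"

// AccessStatus mirrors the four status strings named on this board.
type AccessStatus string

// Mode is the target access mode a preview is evaluated against.
type Mode string

const (
	SubscriptionReady AccessStatus = "subscription_ready"
	SelfServiceReady  AccessStatus = "self_service_ready"
	FullyReady        AccessStatus = "fully_ready"
	Broken            AccessStatus = "broken"

	ModeSubscription Mode = "subscription"
	ModeSelfService  Mode = "self_service"
)

// ReadyFor reports whether status is sufficient for the requested mode.
// A non-broken status is NOT automatically usable: it must match the mode.
// Assumption: fully_ready is treated as satisfying both modes.
func ReadyFor(mode Mode, status AccessStatus) bool {
	switch status {
	case FullyReady:
		return mode == ModeSubscription || mode == ModeSelfService
	case SubscriptionReady:
		return mode == ModeSubscription
	case SelfServiceReady:
		return mode == ModeSelfService
	default:
		// broken and any unknown status are never ready.
		return false
	}
}

func main() {
	fmt.Println(ReadyFor(ModeSubscription, SubscriptionReady))
	fmt.Println(ReadyFor(ModeSelfService, SubscriptionReady))
}
```

The point of the sketch is the second call: a batch that is `subscription_ready` is not reported as usable for `self_service`, which is the misreport the fix above removes.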
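## Remaining Items (P2 / operational prerequisites; these do not block launch within the PRD first-release scope)
### Operational Prerequisites
- Real-host initialization does not create ordinary users automatically; before launch, an ordinary user must be created explicitly and reusable credentials retained
- `self_service` requires an ordinary user's key bound to the target standard group, and usually also usable balance
- `subscription` requires a subscription-type group + ordinary-user subscription assignment + key/group binding
### Accepted P2 Technical Debt
- The access module has not yet been split into `planner.go / subscription_service.go / self_service_checker.go` as the implementation plan specifies
- reconcile is still inlined in `internal/provision/` and has not been moved to `internal/reconcile/*`
- There is no built-in scheduler/jobs; this is currently compensated by manual reconcile + external cron
- The CLI `run*` real-path functions have no systematic mock unit tests
- Under restricted networks, the standard multi-stage `Dockerfile` still depends on in-container network access to pull Go modules; local deployment defaults to `scripts/build_local_image.sh`
- `scripts/import_remote43_provider.sh` still has a direct-probe false report: the CRM records `subscription_ready` for the same batch, yet the artifact's `09-models.headers.txt` / `11-chat.headers.txt` may still show `401 Unauthorized`. In addition, in local CRM mode, unless `PACK_PATH` is explicitly overridden, the script wrongly uses the remote `/home/ubuntu/...` path and triggers `stat pack path ... no such file or directory`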
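## Shortest Launch Loop
1. Follow `docs/REAL_HOST_ACCEPTANCE_RUNBOOK.md` to prepare the real-host ordinary user and reusable credentials
2. Complete the key/group/billing (or subscription) binding for the target mode
3. For latest-head current-code, the DeepSeek / MiniMax subscription closure on the remote43 fresh host has been rerun and passed, so `CONDITIONAL_APPROVED` can be maintained
4. To close out the tooling as well, additionally fix the direct-probe auth in `scripts/import_remote43_provider.sh` and parameterize the local `PACK_PATH`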
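## Conclusions That Must Not Be Drawn
- ❌ Historical failure/success artifacts must not be reused detached from their point in time; currently `20260518_redeploy_matrix` is taken as the latest truth
- ❌ `HTTP 200` does not mean that host initialization automatically prepares ordinary users / subscriptions / balance; these remain explicit operational prerequisites
- ❌ `APPROVED` means "launchable within the PRD first-release scope"; it does not mean the system has become a multi-host autonomous platform
- ❌ Same-name providers across packs no longer mis-match, but only if the caller supplies the correct pack path / pack_id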