Expand the batch auto-import V2 spec and TDD plan with stability requirements, result state persistence, and result page design. Add a dedicated architecture document for run state, APIs, pages, and UI field layout, and sync the execution board to the new V2 scope.
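# sub2api-cn-relay-manager Execution Board

Date: 2026-05-22
Current Gate: APPROVED (the code gates have passed, and on 2026-05-21 the remaining issues with the account probe, the gateway probe authentication semantics, and the latest-head `self_service` fresh-host re-verification were closed out. The latest MiniMax 53hk fresh-host acceptance `artifacts/real-host-acceptance/20260521_191418_remote43_minimax_key_import/21-summary.json`, the DeepSeek 2166 `subscription` fresh-host acceptance `artifacts/real-host-acceptance/20260521_201509_remote43_deepseek_key_import/21-summary.json`, and the latest-head `self_service` standard fresh-host acceptance `artifacts/real-host-acceptance/20260521_210403/05-import.json` / `07-access-status.json` jointly show that both the `subscription` and `self_service` main paths close the loop to ready on a real fresh host, and that the host's `/v1/models` and `/v1/chat/completions` both genuinely return `HTTP 200`. The `reconcile=drifted` status that still appears only reflects leftover historical resources in the shared fresh-host environment and does not block the first PRD release.)
Goal: deliver an independent control plane that is zero-intrusion to the host, can import Chinese domestic models, and provides an operable import / rollback / access closed loop.
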
## 2026-05-22 Current Truth

- The main directory `artifacts/real-host-acceptance/` now keeps only final evidence; historical debugging samples have been moved to `artifacts/real-host-acceptance-archive/`
- The access-ready semantics are now settled: `/v1/models` must list `smoke_test_model`, and a minimal `POST /v1/chat/completions` smoke test must succeed; a models-only false ready can no longer occur
- The `subscription` main path has passed the latest fresh-host re-verification:
  - MiniMax 53hk: `artifacts/real-host-acceptance/20260521_191418_remote43_minimax_key_import/21-summary.json`
  - DeepSeek 2166: `artifacts/real-host-acceptance/20260521_201509_remote43_deepseek_key_import/21-summary.json`
  - Kimi A7M (local host `v0.1.129`): `artifacts/real-host-acceptance/20260522_122706_local_v0129_kimi_a7m_subscription_freshhost/21-summary.json`
- The `self_service` main path has passed the latest-head standard fresh-host re-verification:
  - `artifacts/real-host-acceptance/20260521_210403/05-import.json`
  - `artifacts/real-host-acceptance/20260521_210403/07-access-status.json`
- The official provider validation matrix still carries one non-blocking fact:
  - `artifacts/real-host-acceptance/20260521_222212_remote43_minimax-m2-7-official_key_import/21-summary.json` shows that the official MiniMax template path works end to end, but the validation key currently hits an upstream `429`
- `reconcile=drifted` may still appear on the shared fresh host, but it is currently interpreted as noise from leftover historical resources and does not block the first PRD release
- Debugging details and diagnostic lessons are recorded in:
  - `docs/REAL_HOST_ACCEPTANCE_LEARNINGS.md`
  - `docs/REAL_HOST_ARTIFACT_RETENTION.md`

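The access-ready rule above (model listed and completion smoke succeeds) can be sketched in Go as follows. This is a minimal illustration, not the repository's code: the `AccessProbe` type, its fields, and `IsReady` are hypothetical names, and treating "smoke succeeded" as exactly `HTTP 200` is an assumption.

```go
package main

import "net/http"

// AccessProbe holds the two observations the ready check needs.
// Hypothetical type: the names are illustrative, not taken from the repository.
type AccessProbe struct {
	Models           []string // model IDs returned by GET /v1/models
	CompletionStatus int      // HTTP status of the minimal POST /v1/chat/completions smoke
}

// IsReady reports whether access may be recorded as ready.
// Both conditions must hold: the smoke-test model is listed AND the
// completion smoke returned 200 (assumed success criterion).
func IsReady(p AccessProbe, smokeTestModel string) bool {
	listed := false
	for _, m := range p.Models {
		if m == smokeTestModel {
			listed = true
			break
		}
	}
	return listed && p.CompletionStatus == http.StatusOK
}
```

With this shape, a probe whose `/v1/models` lists the model but whose completion smoke returns `503` is not ready, which is the models-only false ready the board says can no longer occur.
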
## Completed This Round

1. Unified host identity model
    - `auth_type/auth_token` are persisted at host registration
    - The import / reconcile / rollback-provider / access runtime paths now use `host_id` as the primary key
    - provider status / resources / access status / import-batches support querying by `host_id`
2. `managed_resources` consolidated at host scope
    - Added migration `0004_host_identity_and_managed_resources.sql`
    - The `managed_resources` unique key is now `(host_id, resource_type, host_resource_id)`
    - Repository and service queries switched to host-scoped semantics
3. Reconcile run results scoped by batch
    - Added migration `0006_reconcile_runs_batch_scope.sql`
    - `reconcile_runs` gains `batch_id`; batch detail returns only the reconcile records of that batch
4. Capability probe reduced to side-effect-free probing
    - It no longer sends an empty `POST` to real creation endpoints
5. rollback-provider risk reduced
    - Rollback now prefers the recorded batch resources via `RollbackStoredResources()`
    - Dangerous deletion is refused when no recorded resources exist
6. Documentation synced with reality
    - Added `docs/2026-05-18-PRODUCTION_REMEDIATION_TASK_BOARD.md`
    - Scaled back the unimplemented `/metrics` / rate-limiting / monitoring promises in `DEPLOYMENT.md`
7. The current-code remote43 import path now supports tunnel-aware verification
    - `scripts/import_remote43_provider.sh` adds `CRM_HOST_BASE`, which separates the address the operator uses to reach the host from the address the CRM process uses to reach the host
    - Key historical live model-mapping evidence is kept in: `artifacts/real-host-acceptance/20260520_222713_crm18100_live_model_mapping_validation`
8. The root-cause fix for the current-code remote43 access gate has landed
    - Subscription access is now closed on the host side: the CRM no longer depends on an externally supplied regular-user key for the host. Instead, following the `subscription_users` selector, it creates or finds a managed regular user on the host, logs in to create a managed key, writes back allowed_groups / balance, and then performs the subscription assignment
    - Account creation requests now also write `credentials.model_mapping`, fixing the issue where `/v1/models` fell back to the GPT default set when reading the account model whitelist
    - Tests added or updated for: `internal/access`, `internal/provision`, `internal/host/sub2api`
9. The current-code access-ready semantics have been raised to the completion layer
    - `/v1/models` alone no longer decides `subscription_ready/self_service_ready`
    - The control plane records access as ready only when `/v1/models` lists `smoke_test_model` and the `/v1/chat/completions` smoke test succeeds
    - The access closure, import runtime artifact, and reconcile rerun payload all persist `completion_ok/completion_status/completion_type/completion_preview`
10. The current-code remote43 acceptance script now has an upstream API evidence layer
    - `scripts/import_remote43_provider.sh` directly probes the upstream `/models` and `/chat/completions` behind the provider `base_url`
    - Added `21-summary.json`, which automatically classifies completion failures as `host_compatibility_gap` or `upstream_key_quota_issue`
11. Patched CRM external validation completed
    - On the patched CRM instance, both DeepSeek and MiniMax were verified: when the completion smoke test passes, the batch lands as succeeded/active; when it fails, it is not mistakenly recorded as ready
    - `20260521_191418_remote43_minimax_key_import` and `20260521_201509_remote43_deepseek_key_import` together show that the current `subscription` provider path genuinely closes the loop
    - `20260521_210403` shows that the latest-head `self_service` standard fresh-host acceptance also closes the loop to `self_service_ready / fully_ready`
|
||
12. artifact 保留策略已收口
|
||
- 主目录 `artifacts/real-host-acceptance/` 当前只保留最终证据
|
||
- 历史失败/半成功/试错样本已迁到 `artifacts/real-host-acceptance-archive/`
|
||
- 分类规则见:`docs/REAL_HOST_ARTIFACT_RETENTION.md`
|
||
13. relay-manager latest-head 已收口 Kimi A7M 两段竞态
|
||
- account test 首次 `403 Forbidden` 已降级为 advisory warning;只要 `/models` 已命中 `smoke_test_model`,不会再把 batch 误判为 blocking failure
|
||
- access closure 对导入后瞬时 `503 / no available accounts` 增加短暂 completion retry,避免宿主异步 probe / account warm-up 窗口把真实可用链路误记成 `broken`
|
||
- `20260522_122706_local_v0129_kimi_a7m_subscription_freshhost` 已证明:在修复后的 relay-manager + patched host 组合下,`kimi-a7m / kimi-k2.6` 可落到 `batch_status=succeeded`、`provider_status=active`、`latest_access_status=subscription_ready`
|
||
|
||
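The two race fixes in item 13 can be sketched in Go roughly as below. This is an illustrative sketch under stated assumptions, not the relay-manager implementation: the function names, the `"ok"` and `"failed"` status strings, the attempt count, and the delay are all hypothetical. Only the `warning` status and the `403` / `503` triggers come from the board.

```go
package main

import (
	"net/http"
	"time"
)

// classifyAccountProbe maps the first account-test result to a validation status.
// A 403 is only advisory ("warning") when /models already lists the smoke-test model.
// "ok" and "failed" are hypothetical labels used for illustration.
func classifyAccountProbe(status int, modelListed bool) string {
	switch {
	case status == http.StatusOK:
		return "ok"
	case status == http.StatusForbidden && modelListed:
		return "warning"
	default:
		return "failed"
	}
}

// retryCompletion calls smoke up to attempts times, sleeping delay between tries.
// Only a 503 is treated as transient (warm-up window); any other status is
// returned immediately. With attempts <= 0 it returns 0 without calling smoke.
func retryCompletion(smoke func() int, attempts int, delay time.Duration) int {
	status := 0
	for i := 0; i < attempts; i++ {
		status = smoke()
		if status != http.StatusServiceUnavailable {
			return status
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return status
}
```

Keeping the retry bounded matters here: a host that keeps returning `503` after the last attempt is still reported with that status, so a genuinely broken path is not hidden by the retry.
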
## Verified Gates

- `gofmt -l .` ✅ empty output
- `go vet ./...` ✅
- `go test ./...` ✅
- `go test -race ./...` ✅
- `go test -cover ./internal/...` ✅
  - `internal/access`: `80.5%`
  - `internal/host/sub2api`: `78.1%`
  - `internal/pack`: `73.9%`
  - `internal/provision`: `76.3%`
  - `internal/store/sqlite`: `61.4%`
- `go test ./tests/integration/... -count=1` ✅
- `bash ./scripts/test_real_host_scripts.sh` ✅

## Final Evidence Currently Retained

1. `artifacts/real-host-acceptance/20260520_222713_crm18100_live_model_mapping_validation`
    - Shows that the account `credentials.model_mapping` matches the live runtime

2. `artifacts/real-host-acceptance/20260521_142211_crm18100_deepseek_completion_split`
    - Shows that a host completion failure can be separated from an upstream completion success
    - This is the key root-cause evidence behind the completion classification logic

3. `artifacts/real-host-acceptance/20260521_191418_remote43_minimax_key_import`
    - Final successful sample for MiniMax 53hk `subscription`
    - `21-summary.json` reaches `batch_status=succeeded`, `provider_status=active`

4. `artifacts/real-host-acceptance/20260521_201509_remote43_deepseek_key_import`
    - Final successful sample for DeepSeek 2166 `subscription`
    - `21-summary.json` reaches `batch_status=succeeded`, `provider_status=active`

5. `artifacts/real-host-acceptance/20260521_210403`
    - Final successful sample for the latest-head `self_service` standard fresh-host acceptance
    - `05-import.json` = `succeeded/self_service_ready/active`
    - `07-access-status.json` = `latest_access_status=fully_ready`

6. `artifacts/real-host-acceptance/20260521_222212_remote43_minimax-m2-7-official_key_import`
    - Live sample for the official MiniMax template
    - The template path works end to end, but the current validation key hits an upstream `429`

7. `artifacts/real-host-acceptance/20260522_122706_local_v0129_kimi_a7m_subscription_freshhost`
    - Final successful sample for Kimi A7M `subscription` with latest-head relay-manager against the patched host `v0.1.129`
    - `21-summary.json` reaches `batch_status=succeeded`, `provider_status=active`
    - `account_probe_summary` explicitly records `probe_advisory=true` and `validation_status=warning`, showing that relay-manager correctly downgrades the 403 probe race

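The split between host-side and upstream-side completion failures, which `21-summary.json` records as `host_compatibility_gap` or `upstream_key_quota_issue`, might be sketched in Go like this. The decision rules are assumptions inferred from the evidence above (host fails while upstream succeeds; upstream returns `429`); the function name and the empty-string "unclassified" result are hypothetical, and the real script may use different criteria.

```go
package main

import "net/http"

// Category labels as they appear in 21-summary.json.
const (
	HostCompatibilityGap  = "host_compatibility_gap"
	UpstreamKeyQuotaIssue = "upstream_key_quota_issue"
)

// classifyCompletionFailure compares the completion status seen through the
// host with the status seen when probing the upstream directly.
// It returns "" when the host completion succeeded or when neither
// assumed rule applies.
func classifyCompletionFailure(hostStatus, upstreamStatus int) string {
	switch {
	case hostStatus == http.StatusOK:
		return "" // nothing failed, nothing to classify
	case upstreamStatus == http.StatusTooManyRequests:
		// Assumed rule: the upstream itself rejects the key for quota reasons.
		return UpstreamKeyQuotaIssue
	case upstreamStatus == http.StatusOK:
		// Assumed rule: the upstream works directly, so the gap is on the host side.
		return HostCompatibilityGap
	default:
		return ""
	}
}
```

For example, a host `502` alongside an upstream `200` is classified as `host_compatibility_gap`, while a failing host alongside an upstream `429` is classified as `upstream_key_quota_issue`.
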
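## Remaining Items (P2 / operational prerequisites; these do not block launch within the first PRD release scope)

1. Operational prerequisites
    - Real host initialization does not automatically create a regular user; before launch, a regular user must be created explicitly and reusable credentials retained
    - `self_service` requires the regular-user key to be bound to the target standard group, and usually also requires available balance
    - `subscription` requires a subscription-type group + a regular-user subscription assignment + key/group binding

2. Structural debt
    - access / reconcile are still not fully split into independent submodules as the implementation plan calls for
    - There is still no built-in scheduler/jobs

3. Deployment and environment limits
    - The standard multi-stage Dockerfile is still unreliable in restricted network environments
    - Currently recommended: `scripts/build_local_image.sh` + `Dockerfile.local`

4. Official provider validation matrix
    - The current official MiniMax live sample shows that the template path works, but the validation key hits an upstream `429`
    - Whether official providers such as Qwen / GLM / Kimi / Step pass live acceptance still depends on obtaining official keys and quota
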
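## Shortest Next Steps

1. To expand provider coverage, first obtain official keys per `docs/PROVIDER_VALIDATION_MATRIX.md`, then run official live acceptance
2. To improve the signal-to-noise ratio on the shared fresh host, clean up the leftover historical resources once to reduce `reconcile=drifted` noise
3. To continue productization, take the `v2` batch auto-import design through review before starting implementation
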
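## v2 Plan: Batch Auto-Import (URL + Key)

**Current stage**: 🔨 In design (pending review and refinement)

**Document**: `docs/2026-05-21-BATCH_AUTO_IMPORT_SPEC.md` (requirements spec)
**TDD plan**: `docs/2026-05-21-BATCH_AUTO_IMPORT_TDD_PLAN.md` (implementation path; open questions have been confirmed)
**Technical architecture**: `docs/2026-05-22-BATCH_AUTO_IMPORT_V2_ARCHITECTURE.md` (runtime state store, result pages, API, page field layout)

**Design convergence this round**:
- The three most frequent problem classes from real acceptance have been written into the v2 plan:
  - Model-name normalization and correction when adding models
  - Compatibility capability profiles for third-party Chinese domestic models (`/responses`, `/chat/completions`, Anthropic compatible, stream/tools)
  - The asynchronous confirmation window after adding an account (first `403` probe race, first `503 no available accounts` warm-up)
- Two productization capabilities have been added to v2:
  - run / item state persistence, retry trails, and viewing historical results after a control-plane restart
  - A batch list page and a batch detail page, for viewing model correction results, account status, provider warnings, and the final access status
- The v2 goal has been raised from "synchronous import succeeds" to "import + asynchronous confirmation + final closed-loop verification"

**Design still to do**:
- [ ] **Technical design**: API (CLI + HTTP), data model, DB schema changes, error handling
- [ ] **UI design**: CLI output format / HTTP API docs / web console (delivery form to be confirmed)
- [ ] **Review**: review of the design documents by the relevant specialists

**Implementation paused**: coding starts only after the design review passes

---
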
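## Conclusions That Must Not Be Drawn

- ❌ A historical failed artifact ≠ the current latest-head still fails
- ❌ A side-effect-free capability probe ≠ every host version is genuinely compatible
- ❌ rollback-provider switched to the safe path ≠ historical dirty resources disappear automatically
- ❌ `HTTP 200` ≠ host initialization automatically prepares regular users / subscriptions / balance; these remain explicit operational prerequisites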