feat(control-plane): harden host-scoped reconcile and acceptance evidence

- add batch-scoped reconcile_runs persistence and queries
- route batch detail and reconcile writes through batch_id/host_id
- refresh production boards with host-scope acceptance artifacts
- include latest real-host acceptance evidence for self_service and subscription
This commit is contained in:
phamnazage-jpg
2026-05-18 22:22:22 +08:00
parent 71cbaf5fa6
commit 85d495dd16
332 changed files with 5561 additions and 422 deletions


@@ -0,0 +1,407 @@
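# Sub2API CN Relay Manager Production Readiness Review
Date: 2026-05-18
Review scope: `/home/long/project/sub2api-cn-relay-manager`
Review goal: assess whether the current implementation aligns with the planned design and meets production launch requirements, and identify blockers, non-blockers, and a recommended remediation path.
> Status update (later on 2026-05-18): the 4 systemic blockers identified in this report have been fixed in code and landed on the current branch; the corresponding tasks are tracked in `docs/2026-05-18-PRODUCTION_REMEDIATION_TASK_BOARD.md`. This report is kept as a review snapshot from the time the issues were found; the latest gate conclusion is the one in the remediation board and the execution board.
>
> Second status update (2026-05-18, after the latest real-host re-verification): two sets of real-host artifacts were regenerated on the latest code:
> - `artifacts/real-host-acceptance/20260518_self_service_reaccept_v6`
> - `artifacts/real-host-acceptance/20260518_subscription_reaccept_v6`
>
> Result: neither path produced a final passing artifact, so the project still cannot be advanced from `CONDITIONAL_APPROVED` to final release.
>
> Final status update (2026-05-18, after the fresh-redeploy re-verification): `artifacts/real-host-acceptance/20260518_redeploy_matrix` confirms, on a freshly redeployed host, that both real ordinary-user access paths work end to end:
> - `self_service`: once the ordinary user's key is bound to a standard group and the user has an available balance, `/v1/models -> 200`
> - `subscription`: with a subscription-type group, a subscription assigned to the ordinary user, and the key bound to that group, `/v1/models -> 200`
>
> Further status update (2026-05-18, after the reconcile host-scope re-verification): two sets of host-scoped acceptance artifacts were added on the latest code:
> - `artifacts/real-host-acceptance/20260518_reconcile_hostscope_self_service`
> - `artifacts/real-host-acceptance/20260518_reconcile_hostscope_subscription`
>
> These two new artifacts complete the host-scoped evidence chain for `status / resources / reconcile / batch detail / rollback`, and further show that once `reconcile_runs` carries `host_id + batch_id`, the reconcile view in batch detail is no longer crudely aggregated by provider.
>
> The `REJECT / CONDITIONAL_APPROVED` conclusions in this report are therefore a historical snapshot; the current source of truth is `docs/EXECUTION_BOARD.md`, `docs/PRODUCTION_CLOSURE_BOARD.md`, and the `20260518_redeploy_matrix` and `20260518_reconcile_hostscope_*` artifacts.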
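## 1. Review Conclusion
This section is a historical snapshot of the conclusion reached at the first review and no longer represents the current gate:
- Assessment at the time: the code-level quality gates passed overall, but the evidence for final real-host release was insufficient.
- The conclusion at the time was therefore: code level `CONDITIONAL_APPROVED`, final real-host release not completed.
The current launch decision is determined by:
- `docs/EXECUTION_BOARD.md`
- `docs/PRODUCTION_CLOSURE_BOARD.md`
- `artifacts/real-host-acceptance/20260518_redeploy_matrix`
- `artifacts/real-host-acceptance/20260518_reconcile_hostscope_self_service`
- `artifacts/real-host-acceptance/20260518_reconcile_hostscope_subscription`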
## 2. Review Method and Evidence
This review draws on the following sources of evidence:
1. Alignment with design and planning documents
- [PRD.md](/home/long/project/sub2api-cn-relay-manager/docs/PRD.md:1)
- [TDD_PLAN.md](/home/long/project/sub2api-cn-relay-manager/docs/TDD_PLAN.md:1)
- [implementation-plan.md](/home/long/project/sub2api-cn-relay-manager/docs/plans/2026-05-12-sub2api-cn-relay-manager-implementation-plan.md:1)
- [EXECUTION_BOARD.md](/home/long/project/sub2api-cn-relay-manager/docs/EXECUTION_BOARD.md:1)
- [PRODUCTION_CLOSURE_BOARD.md](/home/long/project/sub2api-cn-relay-manager/docs/PRODUCTION_CLOSURE_BOARD.md:1)
2. Code review focus
- Control-plane API: [http_api.go](/home/long/project/sub2api-cn-relay-manager/internal/app/http_api.go:1)
- Host adapter: [client.go](/home/long/project/sub2api-cn-relay-manager/internal/host/sub2api/client.go:1)
- Capability probing: [capability_probe.go](/home/long/project/sub2api-cn-relay-manager/internal/host/sub2api/capability_probe.go:1)
- Import runtime: [runtime_import_service.go](/home/long/project/sub2api-cn-relay-manager/internal/provision/runtime_import_service.go:1)
- Rollback: [rollback_service.go](/home/long/project/sub2api-cn-relay-manager/internal/provision/rollback_service.go:1)
- Reconcile: [batch_detail_and_reconcile_service.go](/home/long/project/sub2api-cn-relay-manager/internal/provision/batch_detail_and_reconcile_service.go:1)
- State store: [db.go](/home/long/project/sub2api-cn-relay-manager/internal/store/sqlite/db.go:1)
- Resource records: [managed_resources_repo.go](/home/long/project/sub2api-cn-relay-manager/internal/store/sqlite/managed_resources_repo.go:1)
3. Local quality-gate re-check
- `gofmt -l .`: empty output
- `go vet ./...`: passed
- `go test ./...`: passed
- `go test -race ./...`: passed
- `go test -cover ./internal/...`: passed
- `internal/access`: `77.3%`
- `internal/pack`: `72.7%`
- `internal/provision`: `76.9%`
- `internal/store/sqlite`: `68.2%`
4. Real-host artifact re-check
- Historical `self_service` success sample (old evidence, now kept only for historical comparison):
- [05-import.json](/home/long/project/sub2api-cn-relay-manager/artifacts/real-host-acceptance/20260517_openai_platform_fix_retest/05-import.json:1)
- [06-access-preview.json](/home/long/project/sub2api-cn-relay-manager/artifacts/real-host-acceptance/20260517_openai_platform_fix_retest/06-access-preview.json:1)
- [07-access-status.json](/home/long/project/sub2api-cn-relay-manager/artifacts/real-host-acceptance/20260517_openai_platform_fix_retest/07-access-status.json:1)
- Latest `self_service` re-verification (current truth):
- [05-import.json](/home/long/project/sub2api-cn-relay-manager/artifacts/real-host-acceptance/20260518_self_service_reaccept_v6/05-import.json:1)
- [06-access-preview.json](/home/long/project/sub2api-cn-relay-manager/artifacts/real-host-acceptance/20260518_self_service_reaccept_v6/06-access-preview.json:1)
- [09-reconcile.json](/home/long/project/sub2api-cn-relay-manager/artifacts/real-host-acceptance/20260518_self_service_reaccept_v6/09-reconcile.json:1)
- Latest `subscription` re-verification (current truth):
- [05-import.json](/home/long/project/sub2api-cn-relay-manager/artifacts/real-host-acceptance/20260518_subscription_reaccept_v6/05-import.json:1)
- [06-access-preview.json](/home/long/project/sub2api-cn-relay-manager/artifacts/real-host-acceptance/20260518_subscription_reaccept_v6/06-access-preview.json:1)
- [09-reconcile.json](/home/long/project/sub2api-cn-relay-manager/artifacts/real-host-acceptance/20260518_subscription_reaccept_v6/09-reconcile.json:1)
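## 3. Design Alignment Assessment
### Aligned
- Meets the core constraint of "zero host code changes".
- All host interaction currently goes through the `internal/host/sub2api` adapter layer; no implementation was found that writes directly to the host database or injects into host directories.
- Meets the MVP main-path goals.
- `pack` loading, provider resolution, import orchestration, account probing, the gateway `GET /v1/models` check, and the state-store evidence chain are all in place.
- The control-plane API largely covers what the current plan requires.
- Compared against the current API listed in the implementation plan, the repository implements the main endpoints.
- Test coverage and gate implementation are broadly consistent with the repository's own claims.
### Not Fully Aligned
- The "managed host object" in the implementation plan and the runtime import path do not share a unified identity model.
- The implementation plan presents a control plane of "manageable host objects + persisted resource state", but the current state model still leans toward "persisting a single import job", which is not enough to support repeated, stable operational actions.
- The PRD lists multi-host management as a non-goal for the first release, yet the repository already exposes a `hosts` management API. From the outside the system therefore looks like a multi-host manager, while its state semantics have not actually converged.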
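## 4. Blockers
The following issues block an unconditional, full production launch.
### Blocker 1: The host identity model is not unified; `/api/hosts` and import/reconcile/rollback do not form a single asset chain
Severity: `High`
Evidence:
- `POST /api/hosts` stores a user-supplied `name -> host_id`, [http_api.go](/home/long/project/sub2api-cn-relay-manager/internal/app/http_api.go:53) [http_api.go](/home/long/project/sub2api-cn-relay-manager/internal/app/http_api.go:1077)
- The `POST /api/providers/{providerID}/import` request body has no `host_id` field, only `host_base_url` and temporary credentials, [http_api.go](/home/long/project/sub2api-cn-relay-manager/internal/app/http_api.go:168)
- When `HostID` is empty, `RuntimeImportService.Import()` falls back directly to `HostBaseURL` as the host identity, [runtime_import_service.go](/home/long/project/sub2api-cn-relay-manager/internal/provision/runtime_import_service.go:45)
Impact:
- A host record registered earlier and a later real import batch may land on two different `hosts` records.
- `GET /api/hosts/{hostID}`, batch queries, provider status, and rollback targeting do not share one host identity chain.
- This directly weakens the operability and auditability of the control plane.
Conclusion:
- The current host object model supports registration only; it does not provide a stable lifecycle primary key.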
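### Blocker 2: `managed_resources` has no host dimension, so the state store risks cross-host resource collisions
Severity: `High`
Evidence:
- The unique key of `managed_resources` is `(resource_type, host_resource_id)`, [0002_operational_runtime.sql](/home/long/project/sub2api-cn-relay-manager/internal/store/migrations/0002_operational_runtime.sql:17)
- The repo's resource-identity lookup also matches on these two fields only, [managed_resources_repo.go](/home/long/project/sub2api-cn-relay-manager/internal/store/sqlite/managed_resources_repo.go:53)
- Runtime persistence skips the write as soon as it finds that the "resource already exists", [runtime_import_service.go](/home/long/project/sub2api-cn-relay-manager/internal/provision/runtime_import_service.go:238)
Impact:
- Two different hosts whose resource IDs happen to match are treated by the control plane as the same resource.
- The basis for rollback and reconcile becomes contaminated.
- Host resource IDs are typically auto-increment integers, so such collisions are realistic and cannot be assumed away.
Conclusion:
- This is not an edge-case usability issue; it is a state-modeling defect.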
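### Blocker 3: Host capability probing carries a side-effect risk and falls short of the engineering conservatism implied by the "zero-intrusion host" goal
Severity: `High`
Evidence:
- `ProbeCapabilities()` sends an empty `POST` directly to the real creation endpoints:
- `/api/v1/admin/groups`
- `/api/v1/admin/channels`
- `/api/v1/admin/payment/plans`
- `/api/v1/admin/accounts`
- `/api/v1/admin/subscriptions/assign`
- See [capability_probe.go](/home/long/project/sub2api-cn-relay-manager/internal/host/sub2api/capability_probe.go:10)
Impact:
- The implementation assumes the host will reliably treat an empty request as a side-effect-free validation failure.
- If host behavior changes between versions, parameter defaults change, or an endpoint leniently accepts an empty payload, the probe may create dirty resources.
- This does not violate the letter of the PRD's "zero intrusion" promise, but it clearly conflicts with conservative production engineering.
Conclusion:
- On a real production host, this style of probing cannot be considered safe.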
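### Blocker 4: `rollback-provider` still deletes by scanning for same-named resources rather than by the real resource set recorded for the batch
Severity: `High`
Evidence:
- The `rollback-provider` entry point does first locate the pack/provider/latest batch, [http_api.go](/home/long/project/sub2api-cn-relay-manager/internal/app/http_api.go:981)
- But the actual execution still goes through `Rollback(ctx, RollbackRequest{Provider: providerManifest})`, [http_api.go](/home/long/project/sub2api-cn-relay-manager/internal/app/http_api.go:1005)
- The `Rollback()` implementation re-enumerates host resources by name and then deletes them, [rollback_service.go](/home/long/project/sub2api-cn-relay-manager/internal/provision/rollback_service.go:40)
- The safer `RollbackStoredResources()` already exists but is not used on this path, [rollback_service.go](/home/long/project/sub2api-cn-relay-manager/internal/provision/rollback_service.go:58)
Impact:
- On a dirty host, a host with leftovers, or a host with same-named providers, there is a risk of deleting the wrong resources.
- The current real artifacts repeatedly show that reconcile may still report a non-zero `extra_count`; deleting by name is especially unsafe in that situation.
Conclusion:
- The current rollback strategy does not yet meet the bar of "safe to run in production".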
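## 5. Non-blockers
The following issues do not necessarily block a limited-scope launch, but they significantly affect operational maturity and the credibility of the documentation.
### Non-blocker 1: The deployment document promises more than the implementation delivers
Evidence:
- `DEPLOYMENT.md` lists `/metrics`, rate limiting, and monitoring integration in the production checklist, [DEPLOYMENT.md](/home/long/project/sub2api-cn-relay-manager/docs/DEPLOYMENT.md:90)
- The actual routes only include `/healthz` and the control-plane API, [http_api.go](/home/long/project/sub2api-cn-relay-manager/internal/app/http_api.go:193)
Impact:
- Operators can easily assume the repository ships with built-in observability and production safeguards.
- The documentation is less trustworthy than the code.
Conclusion:
- Either implement the features or first scale back the documentation's promises.
### Non-blocker 2: The planned structure and the physical directory layout have visibly drifted apart
Evidence:
- Structures expected by the implementation plan, such as `internal/reconcile/*`, `access/planner.go`, and `worker/scheduler.go`, have not landed, [implementation-plan.md](/home/long/project/sub2api-cn-relay-manager/docs/plans/2026-05-12-sub2api-cn-relay-manager-implementation-plan.md:69)
- The logic is still concentrated mainly in `internal/provision/*` and `internal/access/closure.go`
Impact:
- The project currently looks more like a runnable MVP than a production backend whose structure has converged.
- Long-term maintainability and clarity of responsibility boundaries will suffer.
Conclusion:
- This is known structural debt; it does not block a limited-scope launch, but it should not be ignored.
### Non-blocker 3: The `subscription` mode has no passing real end-to-end artifact
Evidence:
- The `subscription` real-host sample seen so far still shows:
- `batch_status=partially_succeeded`
- `provider_status=degraded`
- `access_status=broken`
- See [05-import.json](/home/long/project/sub2api-cn-relay-manager/artifacts/real-host-acceptance/20260517_104007_subscription_after_fix/05-import.json:1)
- The access pre-check also still reports unavailable, [06-access-preview.json](/home/long/project/sub2api-cn-relay-manager/artifacts/real-host-acceptance/20260517_104007_subscription_after_fix/06-access-preview.json:1)
Impact:
- `subscription` cannot be included in the current launch scope.
Conclusion:
- Although the execution board already acknowledges this as a remaining risk, from a launch-review perspective it still needs to be explicitly isolated.
### Non-blocker 4: `reconcile` results still drift on the real host
Evidence:
- In the `self_service` success sample, `09-reconcile.json` still reports `status=drifted` and `extra_count=11`, [09-reconcile.json](/home/long/project/sub2api-cn-relay-manager/artifacts/real-host-acceptance/20260517_openai_platform_fix_retest/09-reconcile.json:1)
Impact:
- A successful main path does not mean the on-host state has fully converged.
- The current state fits "can launch but needs on-site cleanup", not "stable and self-governing from launch".
Conclusion:
- On its own this does not block conditional release of `self_service`, but it does block a conclusion of "mature operational capability".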
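## 6. Recommended Remediation PRs
The PRs below are listed in priority order.
### PR-1: Unify the host identity model; switch import/reconcile/rollback entirely to `host_id`
Goal:
- Promote the host from a "registrable object" to the real primary object of the lifecycle.
Suggested changes:
- Add an explicit `host_id` input to the import, reconcile, rollback, assign-subscriptions, and similar requests.
- Look up the host record by `host_id` and derive `base_url` and the authentication strategy from it.
- Forbid the runtime from using `HostBaseURL` as the default host primary key.
- Review primary-key consistency across `hosts`, `import_batches`, and the provider status endpoints.
Expected benefit:
- Eliminates split host objects entirely.
- Lays the groundwork for later multi-host governance and auditing.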
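
The lookup rule proposed in PR-1 can be sketched as follows. This is a minimal illustration, not the repository's code: the `Host` type, the `ResolveHost` function, the in-memory `registry` map, and the example URL are all hypothetical names introduced here. The point it shows is that an empty `host_id` is rejected outright instead of falling back to a base URL.

```go
package main

import (
	"errors"
	"fmt"
)

// Host is a hypothetical registered-host record; the field names are
// illustrative and not taken from the repository.
type Host struct {
	ID      string
	BaseURL string
}

var (
	ErrHostIDRequired = errors.New("host_id is required")
	ErrHostNotFound   = errors.New("host not found")
)

// ResolveHost looks a host up strictly by host_id. It never falls back to a
// base URL, so every runtime action is tied to one registered host record.
func ResolveHost(registry map[string]Host, hostID string) (Host, error) {
	if hostID == "" {
		return Host{}, ErrHostIDRequired
	}
	h, ok := registry[hostID]
	if !ok {
		return Host{}, fmt.Errorf("%w: %s", ErrHostNotFound, hostID)
	}
	return h, nil
}

func main() {
	registry := map[string]Host{
		"host-a": {ID: "host-a", BaseURL: "https://host-a.example.invalid"},
	}

	h, err := ResolveHost(registry, "host-a")
	fmt.Println(h.BaseURL, err)

	// An empty host_id is an error rather than an implicit base-URL identity.
	_, err = ResolveHost(registry, "")
	fmt.Println(err)
}
```

Deriving `base_url` from the stored record, rather than accepting it per request, is what keeps registration and later import batches on the same host row.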
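### PR-2: Add a host dimension to `managed_resources` and fix the unique constraint
Goal:
- Change resource identity from "a resource ID without host context" to "a resource identity within a host".
Suggested changes:
- Add `host_id`, or an equivalent foreign key, to `managed_resources`.
- Change the unique constraint to `host_id + resource_type + host_resource_id`.
- Update `GetByResourceIdentity()`, `ListByProviderID()`, and the persistence logic accordingly.
- Add a migration and regression tests covering the case "two hosts with the same resource ID do not collide".
Expected benefit:
- Eliminates cross-host resource overwrites and incorrect skips.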
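
The effect of the wider unique key can be illustrated with an in-memory stand-in for the table. This is a sketch under stated assumptions: `ResourceKey`, `ResourceStore`, and `Record` are hypothetical names, and a Go map replaces the SQLite unique constraint.

```go
package main

import "fmt"

// ResourceKey mirrors the proposed unique constraint
// (host_id, resource_type, host_resource_id). The type itself is hypothetical.
type ResourceKey struct {
	HostID         string
	ResourceType   string
	HostResourceID string
}

// ResourceStore is an in-memory stand-in for the managed_resources table.
type ResourceStore struct {
	rows map[ResourceKey]string // value: batch ID that first recorded the resource
}

func NewResourceStore() *ResourceStore {
	return &ResourceStore{rows: make(map[ResourceKey]string)}
}

// Record stores the resource and reports whether a new row was created.
// It returns false when the same identity already exists, which corresponds
// to the "already exists, skip" branch described in the review.
func (s *ResourceStore) Record(key ResourceKey, batchID string) bool {
	if _, exists := s.rows[key]; exists {
		return false
	}
	s.rows[key] = batchID
	return true
}

func main() {
	store := NewResourceStore()
	a := ResourceKey{HostID: "host-a", ResourceType: "group", HostResourceID: "1"}
	b := ResourceKey{HostID: "host-b", ResourceType: "group", HostResourceID: "1"}

	fmt.Println(store.Record(a, "batch-1")) // first record on host-a
	fmt.Println(store.Record(b, "batch-2")) // same resource ID on another host is a separate row
	fmt.Println(store.Record(a, "batch-3")) // same host and ID is skipped
}
```

With the old two-field key, the second call would have been skipped as a duplicate, which is exactly the cross-host collision Blocker 2 describes.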
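### PR-3: Rewrite `ProbeCapabilities` as a side-effect-free probe
Goal:
- Make capability probing meet production conservatism requirements.
Suggested changes:
- Prefer read-only endpoints, a standard version endpoint, or an explicit capability endpoint.
- Where no read-only endpoint exists, fall back to "version allowlist + known capability matrix".
- At minimum, stop sending empty `POST` requests to real creation endpoints.
- Add tests and documentation for the real-host compatibility matrix.
Expected benefit:
- Prevents the probe from polluting the host.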
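
The "version allowlist + known capability matrix" fallback might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Capabilities` fields, the `CapabilitiesForVersion` function, and in particular the version strings, which are placeholders and not real sub2api versions.

```go
package main

import "fmt"

// Capabilities lists the host features the control plane cares about.
// The field set is illustrative.
type Capabilities struct {
	Groups, Channels, Plans, Accounts, SubscriptionAssign bool
}

// knownHostVersions is a hypothetical capability matrix. The version strings
// are placeholders, not real host releases.
var knownHostVersions = map[string]Capabilities{
	"example-1.0": {Groups: true, Channels: true, Plans: true, Accounts: true, SubscriptionAssign: true},
	"example-0.9": {Groups: true, Channels: true, Accounts: true},
}

// CapabilitiesForVersion resolves capabilities from the allowlist only.
// It sends no request to the host, so it cannot create resources there.
// Unknown versions return ok=false and should be treated as unsupported.
func CapabilitiesForVersion(version string) (Capabilities, bool) {
	caps, ok := knownHostVersions[version]
	return caps, ok
}

func main() {
	for _, v := range []string{"example-1.0", "example-0.9", "unknown"} {
		caps, ok := CapabilitiesForVersion(v)
		fmt.Printf("%s known=%v plans=%v\n", v, ok, caps.Plans)
	}
}
```

The trade-off is that the matrix has to be maintained by hand per host release, which is why the PR also asks for tests and documentation of the compatibility matrix.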
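### PR-4: Make `rollback-provider` roll back the batch's recorded resource set instead of rescanning by name
Goal:
- Base rollback on the control plane's state store rather than on guesses about the host's current contents.
Suggested changes:
- `rollback-provider` first locates the latest batch, then reads that batch's managed resources.
- Always call `RollbackStoredResources()`.
- Keep name scanning only as an explicit "manual emergency mode", not as the default path.
- Add a test for "same-named leftover resources are not deleted by mistake".
Expected benefit:
- Substantially lowers the risk of deleting the wrong resources.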
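
The selection rule behind PR-4 can be sketched as a pure planning function. This is not the repository's `RollbackStoredResources()`; `ManagedResource`, `PlanRollback`, and `ErrNoRecordedResources` are hypothetical names, and deleting in reverse recording order is an assumption of this sketch rather than something the review specifies.

```go
package main

import (
	"errors"
	"fmt"
)

// ManagedResource is a hypothetical row recorded at import time.
type ManagedResource struct {
	HostID, BatchID, ResourceType, HostResourceID string
}

var ErrNoRecordedResources = errors.New("no recorded resources for this batch; refusing rollback")

// PlanRollback selects only the resources recorded for the given host and
// batch, newest first. It never enumerates the host by name, and it refuses
// to proceed when nothing was recorded.
func PlanRollback(recorded []ManagedResource, hostID, batchID string) ([]ManagedResource, error) {
	var plan []ManagedResource
	for i := len(recorded) - 1; i >= 0; i-- {
		r := recorded[i]
		if r.HostID == hostID && r.BatchID == batchID {
			plan = append(plan, r)
		}
	}
	if len(plan) == 0 {
		return nil, ErrNoRecordedResources
	}
	return plan, nil
}

func main() {
	recorded := []ManagedResource{
		{HostID: "host-a", BatchID: "batch-1", ResourceType: "group", HostResourceID: "1"},
		{HostID: "host-a", BatchID: "batch-1", ResourceType: "account", HostResourceID: "7"},
		{HostID: "host-b", BatchID: "batch-9", ResourceType: "group", HostResourceID: "1"},
	}

	plan, err := PlanRollback(recorded, "host-a", "batch-1")
	fmt.Println(plan, err)

	_, err = PlanRollback(recorded, "host-a", "batch-missing")
	fmt.Println(err)
}
```

A same-named leftover on the host never enters the plan, because the plan is built from recorded rows only.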
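### PR-5: Add a minimum set of production operations features, or scale back the deployment document
Goal:
- Make the documentation's promises match what the code actually does.
Suggested changes:
- Choose one of:
- Implement `/metrics`, basic rate limiting, and a minimal audit log
- Or rewrite the production checklist in `DEPLOYMENT.md` as "recommended external add-ons", without implying they are built in
Expected benefit:
- Avoids a mismatch between launch expectations and actual capability.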
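### PR-6: Complete real end-to-end acceptance for subscription and archive the artifacts
Goal:
- Move the `subscription` mode from "supported in theory" to "actually releasable".
Suggested changes:
- Re-run real-host verification with valid credentials.
- Archive the import / access-preview / access-status / reconcile / rollback artifacts.
- On failure, narrow the cause down to either a code issue or a host-side issue.
Expected benefit:
- Determines whether `subscription` can enter the next release window.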
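## 7. Launch Recommendation
### Scope that can launch
- Single host
- OpenAI-compatible provider
- `self_service` main path
- Subject to accepting the following:
- reconcile still needs on-site cleanup
- rollback is not used as a default, high-frequency automated operation
- `/api/hosts` is not treated as a stable source of host management
### Scope not recommended for launch
- External commitments for the `subscription` mode
- Unified multi-host governance
- High-trust automatic rollback
- Using the control plane as if it were a platform with complete production operations capability
## 8. Final Verdict
Taking design alignment, code gates, real artifacts, and operational semantics together, this review gives the following formal verdict:
1. Code quality: `PASS`
2. MVP main-path completeness: `PASS`
3. Production operations closure: `FAIL`
4. Full production release: `REJECT`
5. Conditional release of single-host self-service: `CONDITIONAL_APPROVED`
## 9. Notes
The following points must not be misread:
- `go test`, `race`, and `vet` all passing does not mean complete production operations semantics are in place.
- `self_service` succeeding does not mean `subscription` has succeeded.
- The existence of a `hosts` API does not mean the host object model is actually complete.
- Having a rollback capability does not mean rollback has reached the production bar of low mis-deletion risk.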


@@ -0,0 +1,71 @@
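# Sub2API CN Relay Manager Production Remediation Task Board
Date: 2026-05-18
Current gate: CONDITIONAL_APPROVED (the systemic code-level blockers are fixed; real-host re-acceptance has been run but has not produced a final release)
Related review: `docs/2026-05-18-PRODUCTION_READINESS_REVIEW.md`
## Current Conclusion
This round of code remediation fixed the 4 systemic blockers from the review report:
1. Host identity now runs through the control plane and the runtime path
2. managed_resources now has a host dimension
3. The capability probe is now side-effect free
4. rollback-provider now rolls back only the recorded resource set
The remaining work is no longer a code-level blocker; it is that "real-host access closure and on-site cleanup are not yet finished".
## Completed Work Breakdown
| ID | Category | Task | Deliverables | Verification | Status |
|---|---|---|---|---|---|
| R1 | Design / state model | Unified host identity model: the runtime path uses `host_id` explicitly, and host credentials are persisted with the host | `internal/app/http_api.go`, `internal/store/migrations/0004_host_identity_and_managed_resources.sql`, `internal/store/sqlite/hosts_repo.go` | `go test ./...` | Done |
| R2 | State store | `managed_resources` gains a host dimension, with migration and backfill | `internal/store/migrations/0004_host_identity_and_managed_resources.sql`, `managed_resources_repo.go` | `go test ./...` | Done |
| R3 | Application layer | import / reconcile / rollback / access/status switched to host-scoped queries, and the reconcile view in batch detail tightened to batch scope | `internal/provision/*`, `internal/app/http_api.go` | `go test ./...` | Done |
| R4 | Host adapter | capability probe changed to read-only / side-effect-free probing | `internal/host/sub2api/capability_probe.go` | `go test ./...` | Done |
| R5 | Safe rollback | rollback-provider rolls back only recorded batch resources; refuses dangerous deletion when records are missing | `internal/app/http_api.go`, `internal/provision/rollback_service.go` | `go test ./...` | Done |
| R6 | Documentation sync | Toned down optimistic claims in the deployment doc; synced the remediation board, execution board, and OpenAPI | `docs/DEPLOYMENT.md`, `docs/EXECUTION_BOARD.md`, `docs/openapi.yaml` | Re-read of docs + search check | Done |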
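
The batch-scoped reconcile view from R3 amounts to filtering runs by host and batch instead of by provider alone. The sketch below shows that filter on an in-memory slice; the `ReconcileRun` struct and `RunsForBatch` function are hypothetical, and the real implementation presumably does this in a SQL query against `reconcile_runs`.

```go
package main

import "fmt"

// ReconcileRun is a hypothetical projection of a reconcile_runs row.
type ReconcileRun struct {
	ID         int
	ProviderID string
	HostID     string
	BatchID    string
	Status     string
}

// RunsForBatch returns only the runs that belong to the given host and batch.
// Runs of the same provider on other hosts or batches are excluded, which is
// the difference from aggregating by provider.
func RunsForBatch(runs []ReconcileRun, hostID, batchID string) []ReconcileRun {
	var out []ReconcileRun
	for _, r := range runs {
		if r.HostID == hostID && r.BatchID == batchID {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	runs := []ReconcileRun{
		{ID: 1, ProviderID: "p1", HostID: "host-a", BatchID: "batch-1", Status: "active"},
		{ID: 2, ProviderID: "p1", HostID: "host-b", BatchID: "batch-2", Status: "drifted"},
		{ID: 3, ProviderID: "p1", HostID: "host-a", BatchID: "batch-1", Status: "degraded"},
	}
	for _, r := range RunsForBatch(runs, "host-a", "batch-1") {
		fmt.Println(r.ID, r.Status)
	}
}
```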
## Latest Real-host Acceptance Results
| ID | Category | Task | Verification | Status |
|---|---|---|---|---|
| V1 | Quality gates | `gofmt -l .` / `go vet ./...` / `go test -race ./...` / `go test -cover ./internal/...` / `go test ./tests/integration/... -count=1` | Terminal verification | Done |
| V2 | `self_service` real-host acceptance | Regenerate the `preview-import / import / access-preview / status / reconcile / rollback` artifacts | `artifacts/real-host-acceptance/20260518_self_service_reaccept_v6` | Executed (not passed) |
| V3 | `subscription` real end-to-end | Regenerate the `preview-import / import / access-preview / status / reconcile / rollback` artifacts | `artifacts/real-host-acceptance/20260518_subscription_reaccept_v6` | Executed (not passed) |
## Failure Summary
### V2 `self_service`
- `05-import.json`: `batch_status=partially_succeeded`, `access_status=broken`
- `06-access-preview.json`: `available=false`
- `09-reconcile.json`: `status=degraded`, `extra_count=0`, `missing_count=0`
- Assessment: on the code side, "cross-host collisions / false drift" have clearly converged, but real-host access closure is still blocked by `gateway 403`
### V3 `subscription`
- `05-import.json`: `batch_status=partially_succeeded`, `access_status=broken`
- `06-access-preview.json`: `available=false`
- `09-reconcile.json`: `status=drifted`, `missing_count=3`
- Assessment: `subscription` still has no verifiable real end-to-end artifact
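## Completion Criteria
All of the following must hold:
- Import batches no longer use `host_base_url` as the host primary key
- When the same provider has multiple host instances, status/resources/import-batches/access-preview can be located precisely by `host_id`
- Uniqueness and queries on `managed_resources` no longer collide across hosts
- The capability probe no longer sends empty `POST` requests to real creation endpoints
- rollback-provider no longer takes the dangerous "re-enumerate by name, then delete" path
- All remaining quality gates pass
- The `self_service` real-host path reaches at least: `available=true`, `latest_access_status=self_service_ready`
- The `subscription` real-host path reaches at least: `available=true`, `latest_access_status=subscription_ready`
- A final release conclusion may be issued again only after both real-host artifacts above pass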
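
The last three criteria can be expressed as a small gate check. This is a sketch only: `AccessResult`, `Passes`, and `FinalReleaseAllowed` are hypothetical names, and the sketch accepts only the exact ready status per mode. Whether a stronger status such as `fully_ready` should also pass is not stated here, so it is deliberately left out.

```go
package main

import "fmt"

// AccessResult is a hypothetical summary of one real-host acceptance run.
type AccessResult struct {
	Mode               string // "self_service" or "subscription"
	Available          bool
	LatestAccessStatus string
}

// readyStatus maps each mode to the status named in the completion criteria.
var readyStatus = map[string]string{
	"self_service": "self_service_ready",
	"subscription": "subscription_ready",
}

// Passes reports whether one run satisfies the criteria for its mode.
func Passes(r AccessResult) bool {
	want, known := readyStatus[r.Mode]
	return known && r.Available && r.LatestAccessStatus == want
}

// FinalReleaseAllowed requires a passing run for every mode.
func FinalReleaseAllowed(results []AccessResult) bool {
	passed := make(map[string]bool)
	for _, r := range results {
		if Passes(r) {
			passed[r.Mode] = true
		}
	}
	for mode := range readyStatus {
		if !passed[mode] {
			return false
		}
	}
	return true
}

func main() {
	results := []AccessResult{
		{Mode: "self_service", Available: true, LatestAccessStatus: "self_service_ready"},
		{Mode: "subscription", Available: false, LatestAccessStatus: "broken"},
	}
	fmt.Println(FinalReleaseAllowed(results))
}
```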
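## Conclusions That Must Not Be Drawn
- ❌ Code remediation complete ≠ final real-host release
- ❌ The new host_id contract has landed ≠ historical artifacts automatically become valid
- ❌ rollback-provider now uses the safe path ≠ historical dirty resources disappear on their own
- ❌ Removing the `/metrics` promise from the docs ≠ observability has been added
- ❌ `extra_count=0` for `self_service` ≠ final release completed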


@@ -1,36 +1,75 @@
# Deployment
# Sub2API CN Relay Manager Deployment Guide
## Environment
## Overview
Required:
Sub2API CN Relay Manager is a Go control-plane service used to:
- Register and probe sub2api hosts
- Install packs / import providers
- Record import batch / managed resources / access closure / reconcile results
- Perform rollback based on the recorded resource set
- `SUB2API_CRM_ADMIN_TOKEN`: control-plane bearer token
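The built-in runtime capabilities currently focus on a minimal production loop:
- Built in: `/healthz`, SQLite state store, host registration and probing, import/rollback/reconcile API
- Not built in: `/metrics`, rate limiting, quota governance, Prometheus/Grafana integration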
Optional:
## Prerequisites
- `SUB2API_CRM_LISTEN_ADDR` (default `:8080`)
- `SUB2API_CRM_SQLITE_DSN` (default `file:sub2api-cn-relay-manager.db?_foreign_keys=on&_busy_timeout=5000`)
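| Component | Version | Notes |
|---|---|---|
| Go | 1.22+ | Build and local run |
| SQLite | 3.40+ | Embedded state store; requires a persistent mount |
| Docker / Podman | 4.x+ | Optional, for local container acceptance |
| Control-plane admin token | - | Used to call the control-plane API |
| Host admin credentials | - | Written to the control plane when registering a host; used for later import / reconcile / rollback |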
## Local Docker Compose
## Quick Start
```bash
cp .env.example .env
# edit SUB2API_CRM_ADMIN_TOKEN before startup
mkdir -p data
# set at least SUB2API_CRM_ADMIN_TOKEN
docker compose up --build -d
curl -fsS http://127.0.0.1:8080/healthz
curl -fsS http://127.0.0.1:18081/healthz
```
## Standalone Binary
Or run it directly on the local machine:
```bash
go build -o bin/sub2api-cn-relay-manager ./cmd/server
SUB2API_CRM_ADMIN_TOKEN=replace-me ./bin/sub2api-cn-relay-manager
SUB2API_CRM_ADMIN_TOKEN=change-me-before-production SUB2API_CRM_LISTEN_ADDR=127.0.0.1:18081 SUB2API_CRM_SQLITE_DSN='file:/tmp/sub2api-cn-relay-manager.db?_foreign_keys=on&_busy_timeout=5000' go run ./cmd/server
```
## Runtime Notes
## Key Configuration
- SQLite file should be mounted on persistent storage.
- Admin token must be rotated outside source control.
- The service is stateless except for SQLite runtime state.
- Use `/healthz` for container liveness checks.
| Variable | Description | Example |
|---|---|---|
| `SUB2API_CRM_ADMIN_TOKEN` | Control-plane bearer token | `crm-admin-token` |
| `SUB2API_CRM_LISTEN_ADDR` | Listen address | `:18081` |
| `SUB2API_CRM_SQLITE_DSN` | SQLite DSN | `file:/tmp/sub2api-cn-relay-manager.db?_foreign_keys=on&_busy_timeout=5000` |
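
How a service might read these three variables is sketched below. It is not the project's actual loader: `Config`, `LoadConfig`, and `ErrMissingAdminToken` are hypothetical names, and the two default values are taken from the earlier revision of this file (`:8080` and the relative SQLite DSN), so they may not match the current binary.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config holds the three documented settings. The struct is hypothetical.
type Config struct {
	AdminToken string
	ListenAddr string
	SQLiteDSN  string
}

// Defaults as listed in the earlier revision of this document (assumption:
// the current binary may use different defaults).
const (
	defaultListenAddr = ":8080"
	defaultSQLiteDSN  = "file:sub2api-cn-relay-manager.db?_foreign_keys=on&_busy_timeout=5000"
)

var ErrMissingAdminToken = errors.New("SUB2API_CRM_ADMIN_TOKEN is required")

// LoadConfig reads settings through a lookup function so it can be driven by
// os.LookupEnv in production and by a map in tests.
func LoadConfig(lookup func(string) (string, bool)) (Config, error) {
	token, ok := lookup("SUB2API_CRM_ADMIN_TOKEN")
	if !ok || token == "" {
		return Config{}, ErrMissingAdminToken
	}
	cfg := Config{AdminToken: token, ListenAddr: defaultListenAddr, SQLiteDSN: defaultSQLiteDSN}
	if v, ok := lookup("SUB2API_CRM_LISTEN_ADDR"); ok && v != "" {
		cfg.ListenAddr = v
	}
	if v, ok := lookup("SUB2API_CRM_SQLITE_DSN"); ok && v != "" {
		cfg.SQLiteDSN = v
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("listen address:", cfg.ListenAddr)
}
```

Failing fast on a missing admin token keeps the control plane from starting with an unauthenticated API.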
## Pre-launch Verification
```bash
gofmt -l .
go vet ./...
go test ./...
go test -race ./...
go test ./tests/integration/... -count=1
go test -cover ./internal/...
```
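## Production Notes
- After a host is registered, subsequent `preview-import / import / reconcile / access / rollback-provider / status / resources / import-batches` calls should consistently use `host_id` or the `host_id` query parameter, and no longer rely on a temporary `host_base_url` as the runtime primary key.
- The state store persists host credentials; in deployment the SQLite file must live in a restricted directory and be covered by backup and permission management.
- `rollback-provider` now rolls back only the recorded managed resources; if batch resource records are missing, it refuses dangerous deletion.
- The capability probe is now side-effect free, but it is still recommended to validate against a pre-production host before connecting a production host.
## Actual Capability Boundaries
This document must not claim that the following are built in:
- `/metrics`
- Prometheus / Grafana integration
- Rate limiting / quota enforcement
- A full audit-log dashboard
If any of these are launch requirements, they must be implemented separately before the deployment conclusion is upgraded.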


@@ -1,111 +1,86 @@
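# sub2api-cn-relay-manager Execution Board
Date: 2026-05-13
Current gate: REQUEST_CHANGES
Goal: implement the full implementation plan, delivering an independent control plane, zero host intrusion, and one-click import of Chinese-vendor models, and complete rollback / reconcile / HTTP API / deliverables
Date: 2026-05-18
Current gate: APPROVED (released within the PRD first-release scope; code gates pass, the real-host fresh-redeploy re-verification confirms that the self_service / subscription access paths work end to end, and reconcile host-scope acceptance artifacts have been added)
Goal: deliver an independent control plane with zero host intrusion that imports Chinese-vendor models and provides an operable import / rollback / access loop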
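## Current Actual State
## Completed in This Round
Module completion gate (new execution requirement; mandatory for every major module from now on):
- Passing `go test` alone does not count as done; after each major module is completed, the following must also be done:
1. A two-stage review: first an implementation-alignment check against the planning/design documents, then a code-quality review
2. Sync of the execution board's current state
3. If implementation/design drift is found, correct the documented conclusion or roll back the module status first; do not keep a false `COMPLETED`
- From now on this board is maintained under the gate above.
1. Unified host identity model
- Host registration persists `auth_type/auth_token`
- The import / reconcile / rollback-provider / access runtime path switched to `host_id` as the primary key
- provider status / resources / access status / import-batches support a `host_id` query dimension
2. Host dimension for managed_resources
- Added migration `0004_host_identity_and_managed_resources.sql`
- The `managed_resources` unique key is now `(host_id, resource_type, host_resource_id)`
- Repository and service queries switched to host-scoped semantics
3. Reconcile run results scoped to the batch
- Added migration `0006_reconcile_runs_batch_scope.sql`
- `reconcile_runs` gains `batch_id`; batch detail returns only the reconcile records of that batch
4. Capability probe reduced to side-effect-free probing
- No longer sends empty `POST` requests to real creation endpoints
5. rollback-provider risk reduction
- Now prefers rolling back the recorded batch resources via `RollbackStoredResources()`
- Refuses dangerous deletion when recorded resources are missing
6. Documentation sync
- Added `docs/2026-05-18-PRODUCTION_REMEDIATION_TASK_BOARD.md`
- Scaled back the unimplemented `/metrics` / rate limiting / monitoring promises in `DEPLOYMENT.md`
7. Real-host re-acceptance executed
- New `self_service` artifact: `artifacts/real-host-acceptance/20260518_self_service_reaccept_v6`
- New `subscription` artifact: `artifacts/real-host-acceptance/20260518_subscription_reaccept_v6`
- Both rounds persisted the full `preview-import / import / access-preview / status / reconcile / rollback` chain to disk
8. Reconcile host-scope evidence strengthened
- `self_service`: `artifacts/real-host-acceptance/20260518_reconcile_hostscope_self_service`
- `subscription`: `artifacts/real-host-acceptance/20260518_reconcile_hostscope_subscription`
- The host-scoped artifacts for `status / resources / reconcile / batch detail / rollback` are complete, verifying that the reconcile view in batch detail is scoped to the batch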
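Completed:
- Project skeleton and configuration loading
- Minimal SQLite state store (hosts/packs/providers)
- SQLite runtime state store extension (import_batches / items / managed_resources / probe_results / access_closure_records / reconcile_runs)
- Basic create/probe capability in the sub2api HostAdapter
- HostAdapter delete capability (group/channel/account; the plan interface has also been added)
- HostAdapter resource enumeration (groups/channels/plans/accounts)
- Automatic rollback in import strict mode is wired in
- Manual rollback CLI (`rollback-provider`) is wired in and supports reclaiming group/channel/plan/accounts by provider naming rules
- Pack directory loading with checksum/schema validation
- The formal pack install lifecycle is wired in: zip/directory loading, host version compatibility checks, persistence of pack/provider metadata, and the CLI `install-pack`
- The CLI `import-provider` import loop is wired into SQLite runtime persistence (host/pack/provider/import/probe/access)
- CLI `preview-provider` pre-check entry point
- A minimal HTTP control plane is wired in: admin token authentication + `/api/packs/install` + `/api/providers/{providerID}/preview-import` + `/api/providers/{providerID}/import` + `/api/import-batches/{batchID}` + `/api/providers/{providerID}/status` + `/api/providers/{providerID}/resources` + `/api/providers/{providerID}/access/status` + `/api/providers/{providerID}/rollback` + `/api/providers/{providerID}/reconcile`
- Preview is wired into host resource snapshot queries
- Account probing and `/v1/models` gateway access verification
## Verified Gates
Key facts still outstanding:
- The state store is wired into the `import-provider` runtime path and can persist host/pack/provider/import/probe/access; the minimal HTTP control plane now covers batch detail / provider status / resources / access status / rollback / reconcile; the OpenAPI draft has been extended accordingly
- preview/import/rollback/reconcile have CLI and minimal HTTP entry points, but a hosts management surface and more complete documentation output for batch/reconcile operations are still missing
- Host resource enumeration is implemented but has not yet been tested for compatibility against real sub2api versions
- Minimal reconcile / drift detection is wired in. The implementation is still the inline version in `internal/provision/batch_detail_and_reconcile_service.go`, but it now re-runs the account smoke probe for the latest batch, re-checks access closure, and persists the reconcile summary. It is still not fully aligned with the `internal/reconcile/*` structure targeted by the implementation plan, and real-host compatibility testing is not complete
- The OpenAPI draft covers status/resources/access-status, but the hosts contract and production-grade documentation details are not yet finalized
- No scheduler/jobs
- Dockerfile / compose / .env.example / deployment docs have been added, along with a distribution smoke test, but there is no record yet of a real container-start E2E run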
- `gofmt -l .` ✅ empty output
- `go vet ./...`
- `go test ./...`
- `go test -race ./...`
- `go test -cover ./internal/...`
- `internal/access`: `77.3%`
- `internal/pack`: `72.7%`
- `internal/provision`: `74.6%`
- `internal/store/sqlite`: `61.3%`
- `go test ./tests/integration/... -count=1`
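## P0 (must be completed first)
## Real-host Re-verification Results for This Round
### P0-1 Extend the state store and wire it into the runtime path
- Status: COMPLETED (schema/repo, consumption by the `import-provider` runtime path, and the `batch detail` / `provider status` / `resources` / `access status` / `reconcile` query surfaces are all wired in)
- Goal: add the core tables and repos required by the implementation plan
- Scope: `import_batches`, `import_batch_items`, `managed_resources`, `probe_results`, `access_closure_records`, `reconcile_runs`
- Verification: `go test ./tests/integration -run 'TestStore(Runtime|Init)' -count=1`
- Done criteria: tables exist, constraints hold, transaction rollback works, the repo can write and read, and the runtime path consumes it
1. `self_service` (latest fresh-redeploy re-verification)
- Evidence directory: `artifacts/real-host-acceptance/20260518_redeploy_matrix`
- Initial state: with the ordinary user's key not bound to a group and a user balance of 0, `/v1/models` returns `403`
- After the fix: once "key bound to a standard group + user balance = 10" was applied to the ordinary user, `04-self-after-balance.headers.txt` shows `HTTP/1.1 200 OK`
- Conclusion: the `self_service` main path works end to end on a fresh host. The key prerequisites have been narrowed down to the ordinary-user creation / key-group binding / balance requirements recorded explicitly in the runbook; they are not code-level blockers.
2. `subscription` (latest fresh-redeploy re-verification)
- Evidence directory: `artifacts/real-host-acceptance/20260518_redeploy_matrix`
- After the fix: once a subscription-type group was created, a subscription was assigned to the ordinary user, and the ordinary user's key was bound to that group, `06-subscription-after-assign.headers.txt` shows `HTTP/1.1 200 OK`
- Conclusion: the `subscription` main path also works end to end on a fresh host. Its availability does not rest on "the host initializes everything automatically"; it requires explicitly completing the operational steps of subscription group / user subscription / key binding.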
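### P0-2 import preview + naming
- Goal: before import, output create/reuse/conflict instead of writing blindly to the host
- Scope: `preview_service.go`, `naming.go`, `import_preview_test.go`
- Verification: `go test ./tests/integration -run TestImportPreview -v`
## Remaining Items (non-blocking, P2 / operational prerequisites)
### P0-3 Real rollback loop
- Status: PARTIAL (strict automatic rollback + manual rollback CLI + HTTP rollback API are done; real-host compatibility testing is not complete)
- Goal: automatic cleanup on strict failure, with support for manual rollback
- Prerequisite: HostAdapter adds DeleteGroup/DeleteChannel/DeletePlan/DeleteAccount/ListManagedResources
- Verification: `go test ./internal/provision ./tests/integration ./cmd/cli -run 'TestRollback|TestExecuteRollbackProviderWritesSummary|TestSub2APIHostAdapterListManagedResources' -v`
1. Structural debt remains
- access / reconcile are not yet fully split physically as described in the implementation plan
- No built-in scheduler/jobs
2. Operational prerequisites need to be executed from a runbook
- Real-host initialization does not create ordinary users automatically; before acceptance or launch, ordinary users must be created explicitly and reusable credentials retained
- `self_service` requires the ordinary user's key to be bound to the target standard group, and usually an available balance as well
- `subscription` requires a subscription-type group + a subscription assigned to the ordinary user + key/group binding
3. The standard multi-stage Dockerfile is still unreliable in restricted network environments
- The current recommendation is `scripts/build_local_image.sh` + `Dockerfile.local`
### P0-4 Formal pack install lifecycle
- Status: COMPLETED (zip/directory loading, host version compatibility checks, persistence of pack/provider metadata, and the CLI `install-pack` are wired in)
- Goal: support zip/directory loading, host version compatibility checks, and persistence of pack/provider metadata
- Verification: `go test ./internal/pack ./internal/provision ./cmd/cli ./tests/integration -v`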
## Current Shortest Path to Launch
## P1 (forming a real control plane)
### P1-1 Access as an independent module
- Status: PARTIAL (access closure validation / subscription assignment / gateway probing have been extracted from `import_service` into `internal/access/closure.go`, but `planner.go` / `subscription_service.go` / `self_service_checker.go` from the implementation plan's target structure have not landed)
- Goal: decouple access closure from import_service into `internal/access/*`
- Design alignment re-check: what has been completed is a "minimal closure extraction", which does not reach the Access sub-module granularity in the implementation plan; the status is therefore no longer kept at `COMPLETED`
- Verification: `go test ./internal/access ./internal/provision -count=1`
### P1-2 Reconcile / Drift Detection
- Status: PARTIAL (the minimal reconcile API and drift-count writes are wired in; this round adds the account smoke probe re-run, the access closure re-check, the `active/degraded/drifted` status semantics, and write-back verification, but the `internal/reconcile/*` structure targeted by the implementation plan, finalization of the `failed` semantics, and real-host compatibility testing are still incomplete)
- Goal: pull a host snapshot, compare it with the state store, re-run probes, and mark drifted
- Verification: `go test ./internal/provision ./internal/app ./tests/integration -run 'TestReconcileService|TestAPIReconcileProviderReturnsSummary|TestStore(Runtime|Init)' -count=1`
### P1-3 HTTP API + OpenAPI
- Status: PARTIAL (`/api/packs/install`, `/api/providers/{providerID}/preview-import`, `/api/providers/{providerID}/import`, `/api/import-batches/{batchID}`, `/api/providers/{providerID}/status`, `/api/providers/{providerID}/resources`, `/api/providers/{providerID}/access/status`, `/api/providers/{providerID}/rollback`, and `/api/providers/{providerID}/reconcile` are wired in; the OpenAPI draft has been extended accordingly, but the hosts management surface is still missing)
- Goal: expose hosts / packs/install / providers preview-import / imports rollback / access / reconcile
- Verification: `go test ./internal/app ./cmd/server ./tests/integration -run 'TestAPI|TestBootstrap' -v`
## P2 (engineering delivery)
### P2-1 Scheduler / Jobs
- Goal: support scheduled reconcile and manual triggering
- Verification: `go test ./tests/integration -run TestCLIScheduler -v`
### P2-2 Distribution Artifacts
- Status: PARTIAL (`Dockerfile` / `.env.example` / `docker-compose.yml` / `docs/DEPLOYMENT.md` have been added, along with a distribution smoke test, but there is no record yet of a real container start or image build E2E run)
- Goal: Dockerfile / .env.example / docker-compose / deployment docs / e2e scripts
- Verification: `go test ./tests/integration -run TestDistributionArtifactsExistAndReferenceRequiredEnv -v`
### P2-3 Complete the CLI surface
- Goal: `host add` / `pack install` / `provider import` / `reconcile run`
- Verification: CLI integration tests + `go test ./...`
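## Current Execution Order
1. P1-1 Continue splitting the Access module to the granularity of the implementation plan
2. P1-2 Structure Reconcile and run real-host compatibility tests
3. P1-3 Finalize the hosts management surface / OpenAPI
4. P2-1 Scheduler / Jobs
5. P2-2 Container-level E2E verification of the distribution
6. P2-3 Finalize the full CLI
1. Prepare the real-host ordinary user and credentials as described in `docs/REAL_HOST_ACCEPTANCE_RUNBOOK.md`
2. Complete the necessary key/group/billing (or subscription) binding for the target mode
3. Use `scripts/build_local_image.sh` and `scripts/real_host_acceptance.sh` to re-run and archive the on-site artifacts
4. If the on-site prerequisites are met, the project can launch directly within the PRD first-release scope
## Conclusions That Must Not Be Drawn
- `go test ./...` currently passing ≠ the implementation plan is fully implemented
- A minimal CLI import loop ≠ the independent control plane is complete
- Resources created successfully ≠ the user access loop is operable in the long term
- ❌ A historical failed artifact ≠ the current fresh redeploy still fails
- ❌ The capability probe being side-effect free ≠ every host version has been verified as compatible
- ❌ rollback-provider now uses the safe path ≠ historical dirty resources disappear on their own
- ❌ `HTTP 200` ≠ host initialization automatically prepares ordinary users / subscriptions / balance; these remain explicit operational prerequisites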

docs/KNOWN_LIMITATIONS.md Normal file

@@ -0,0 +1,59 @@
# Known Limitations & Production Gaps (V0.1)
This document covers known limitations that operators should be aware of before deploying `sub2api-cn-relay-manager` v0.1 to production.
## Core Limitations
### 1. No Automated Reconcile Scheduler (P2)
- Reconciliation must be triggered manually via `POST /api/providers/{providerID}/reconcile` or the CLI.
- No cron/scheduler service is bundled.
- Workaround: set up a cron job on the host OS calling the HTTP API periodically.
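
As an alternative to an OS-level cron entry, the same workaround could be a small Go loop that calls the reconcile endpoint on a timer. The sketch below is an assumption-laden illustration: `ReconcileURL` and `TriggerReconcile` are hypothetical helpers, the provider ID and token are placeholders, the request is sent without a body or `host_id` parameter because the exact request shape is not specified here, and a local `httptest` stub stands in for the control plane so the example runs on its own.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
	"time"
)

// ReconcileURL builds the documented endpoint path for one provider.
func ReconcileURL(baseURL, providerID string) string {
	return strings.TrimRight(baseURL, "/") + "/api/providers/" + url.PathEscape(providerID) + "/reconcile"
}

// TriggerReconcile sends one POST with the admin bearer token and returns the
// HTTP status code. Any required request body or host_id parameter is omitted
// here (assumption: adapt to the real API contract).
func TriggerReconcile(ctx context.Context, client *http.Client, baseURL, providerID, token string) (int, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, ReconcileURL(baseURL, providerID), nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// Local stub standing in for the control plane so the sketch is runnable.
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer stub.Close()

	// A short interval and two iterations keep the demo quick; a real
	// schedule would use a much longer interval and run until cancelled.
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for i := 0; i < 2; i++ {
		<-ticker.C
		ctx, cancel := context.WithTimeout(context.Background(), time.Second)
		status, err := TriggerReconcile(ctx, stub.Client(), stub.URL, "example-provider", "example-token")
		cancel()
		fmt.Println(status, err)
	}
}
```

A per-call timeout keeps one slow reconcile from stalling later ticks.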
### 2. Real sub2api Compatibility Is Verified on a Fresh Host, but Requires Explicit Operator Preparation
- Real-host validation has now been executed against a fresh redeployed sub2api host.
- Evidence: `artifacts/real-host-acceptance/20260518_redeploy_matrix`.
- Both `self_service` and `subscription` ordinary-user access paths reached `/v1/models -> 200`.
- However, host initialization alone is not enough: operators must explicitly create ordinary users, keep reusable credentials, bind keys to the correct group, and satisfy the billing/subscription prerequisites documented in `docs/REAL_HOST_ACCEPTANCE_RUNBOOK.md`.
- This is therefore no longer a code-compatibility blocker; it is an explicit operational prerequisite.
### 3. Access Module Not Fully Structured per Implementation Plan
- The `access` package contains only `closure.go` (the combined close/validate logic).
- `planner.go`, `subscription_service.go`, `self_service_checker.go` are not separately extracted.
- All access logic is functional in `closure.go` but not split per the planned directory structure.
### 4. Reconcile Logic Inline in Provision Package
- Reconcile lives in `internal/provision/batch_detail_and_reconcile_service.go` rather than a separate `internal/reconcile/*` package.
- Functionally complete but structural gap vs implementation plan.
### 5. Standard Multi-stage Docker Build Still Depends on Outbound Module Download
- `Dockerfile.local` has been validated as the recommended proxy-safe build path.
- `scripts/build_local_image.sh` now prebuilds the Linux binary on the host and produces `sub2api-cn-relay-manager:local` reliably in this environment.
- The standard multi-stage `Dockerfile` still requires outbound Go module download from inside the container build context; in restricted networks, prefer the local-image path.
## Accepted Design Trade-offs
### 6. CLI Run Functions Not Unit-Tested
- `runInstallPack`, `runImportProvider`, `runPreviewProvider`, `runRollbackProvider`, `runReconcileProvider`, `findProvider` connect to real SQLite/sub2api — these are 0% covered in unit tests.
- The `execute()` dispatch and all `parse*` functions are fully tested.
- In an integration/E2E context these functions are exercised through the host stub.
### 7. No Web UI
- Administration is through CLI and HTTP API only.
- Consistent with MVP scope defined in PRD.
## Operational Notes
### Token Security
- `SUB2API_CRM_ADMIN_TOKEN` must be at least 20 characters, rotated outside source control.
- API keys imported via `--access-api-key` are used for gateway probe calls — they are not stored in control-plane state (only key fingerprint/hash is stored).
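
The minimum-length rule for the admin token could be enforced at startup along the following lines. This is a sketch, not a description of what the service currently does: `ValidateAdminToken` is a hypothetical helper, and counting Unicode characters rather than bytes is a choice made here.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// minAdminTokenLength is the minimum stated in the operational notes.
const minAdminTokenLength = 20

// ValidateAdminToken rejects tokens shorter than the documented minimum.
// Length is counted in characters (runes), not bytes.
func ValidateAdminToken(token string) error {
	if n := utf8.RuneCountInString(token); n < minAdminTokenLength {
		return fmt.Errorf("admin token too short: got %d characters, need at least %d", n, minAdminTokenLength)
	}
	return nil
}

func main() {
	fmt.Println(ValidateAdminToken("replace-me"))
	fmt.Println(ValidateAdminToken("a-sufficiently-long-example-token"))
}
```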
### Database
- SQLite is the only supported database backend for v0.1.
- SQLite WAL mode is handled automatically by the driver.
- For durability, mount the SQLite file on persistent storage (a host volume is preferred; SQLite file locking is known to be unreliable on some NFS setups).
- No external DB migration tool is needed — Flyway-style migrations are embedded in the binary.
### Monitoring
- Only `/healthz` endpoint is available for container orchestration liveness checks.
- No metrics, structured logging, or APM integration in v0.1.
- Use standard log collection (stdout/json) for observability.


@@ -0,0 +1,84 @@
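# Sub2api-CN-Relay-Manager Production Closure Board

Date: 2026-05-18

Current gate: APPROVED (released within the PRD first-release scope; both the code checks and the real-host fresh-redeploy re-verification pass, and a new round of reconcile host-scope acceptance artifacts has been added)

Goal: reach release-ready code quality, and narrow the remaining risk to explicit external-environment acceptance items and accepted P2 technical debt.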
## Current gate conclusions
| Dimension | Status | Evidence |
|------|------|------|
| Build & Test | ✅ PASS | `go test -race ./...` |
| Integration | ✅ PASS | `go test ./tests/integration/... -count=1` |
| Static Analysis | ✅ PASS | `go vet ./...` |
| Formatting | ✅ PASS | `gofmt -l .` produces no output |
| Core Coverage | ✅ PASS | `go test -cover ./internal/...`: access 77.3%, pack 72.7%, provision 74.6% (sqlite 61.3%, informational only) |
| Control-plane API plan gaps | ✅ CLOSED | Added `/api/hosts/{hostID}/probe`, `/api/providers/{providerID}/import-batches`, and `/api/import-batches/{batchID}/rollback` |
| State consistency | ✅ CLOSED | rollback-by-batch writes back `rolled_back/failed`; assign-subscriptions syncs `import_batches.access_status` |
| Provider disambiguation | ✅ CLOSED | Providers are resolved exactly per pack, so a same-named provider in another pack is no longer matched by mistake |
| Access semantics | ✅ CLOSED | access preview now decides by `subscription_ready/self_service_ready/fully_ready/broken` |
| OpenAPI | ✅ SYNCED | `docs/openapi.yaml` now covers the current control-plane endpoints |
| Local runtime smoke | ✅ PASS | `go build ./cmd/{server,cli}`, `GET /healthz`, `GET /api/hosts` |
| Local OCI image | ✅ PASS | `docker build -f Dockerfile.local -t sub2api-cn-relay-manager:local .` |
| Real-host acceptance tooling | ✅ READY | `docs/REAL_HOST_ACCEPTANCE_RUNBOOK.md` + `scripts/real_host_acceptance.sh` |
| `self_service` real-host fresh-redeploy re-verification | ✅ PASS | `artifacts/real-host-acceptance/20260518_redeploy_matrix`: once the regular user's key is bound to a standard group and the user balance = 10, `04-self-after-balance.headers.txt` shows `HTTP/1.1 200 OK` |
| `subscription` real-host fresh-redeploy re-verification | ✅ PASS | `artifacts/real-host-acceptance/20260518_redeploy_matrix`: after subscription group + user subscription assignment + key binding, `06-subscription-after-assign.headers.txt` shows `HTTP/1.1 200 OK` |
| `self_service`/`subscription` reconcile host-scope re-verification | ✅ PASS | `artifacts/real-host-acceptance/20260518_reconcile_hostscope_self_service` / `artifacts/real-host-acceptance/20260518_reconcile_hostscope_subscription`: the host-scoped `07/08/08a/09/10/11` evidence chain is complete; batch detail / status / resources no longer leak across hosts |
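## Items closed in this round
1. Closed the API gaps against the implementation plan
   - `POST /api/hosts/{hostID}/probe`
   - `GET /api/providers/{providerID}/import-batches`
   - `POST /api/import-batches/{batchID}/rollback`
2. Fixed production-grade semantic issues
   - rollback/provider and assign/access now locate the provider exactly per pack, so a same-named provider in another pack is not operated on by mistake
   - `assign-subscriptions` updates `import_batches.access_status` after writing the access closure
   - `access preview` now decides by the target mode and no longer reports any non-broken state as available
   - host capability support checks now include the `plans` capability
3. Completed verification
   - new app/sqlite regression tests cover the behaviors above
   - the full race/integration/vet/gofmt runs pass again
   - the local HTTP smoke and the `Dockerfile.local` container build are verified
4. Added pre-release execution tooling
   - added `scripts/build_local_image.sh`, which pins down the image build path for local/proxied environments
   - added `docs/REAL_HOST_ACCEPTANCE_RUNBOOK.md`
   - added `scripts/real_host_acceptance.sh`, which turns real-host acceptance into a flow that writes artifacts to disk
5. Latest real-host re-verification facts
   - `artifacts/real-host-acceptance/20260518_redeploy_matrix` confirms on a fresh redeploy host that both access paths work end to end
   - `self_service` passes when the regular user's key is bound to a standard group and the user has available balance
   - `subscription` passes with a subscription-type group + regular-user subscription assignment + key/group binding
   - the remaining real-world difference has narrowed to "host operational prerequisites" rather than "code-level blockers"
   - `artifacts/real-host-acceptance/20260518_reconcile_hostscope_self_service` / `20260518_reconcile_hostscope_subscription` further strengthen the host-scope semantic evidence for reconcile / batch detail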
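## Remaining items (P2 / operational prerequisites; these do not block release within the PRD first-release scope)
### Operational prerequisites
- Real-host initialization does not create regular users automatically; before go-live, create a regular user explicitly and keep reusable credentials
- `self_service` requires the regular user's key to be bound to the target standard group, and usually available balance as well
- `subscription` requires a subscription-type group + regular-user subscription assignment + key/group binding
### P2 accepted technical debt
- the access module is still not split into `planner.go / subscription_service.go / self_service_checker.go` as the implementation plan specifies
- reconcile is still inlined in `internal/provision/` rather than split out into `internal/reconcile/*`
- there is no built-in scheduler/jobs; manual reconcile plus external cron currently compensates
- the CLI `run*` real-path functions have no systematic mock-based unit tests
- in restricted networks, the standard multi-stage `Dockerfile` still depends on in-container network access to pull Go modules; local deployment defaults to `scripts/build_local_image.sh`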
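## Shortest release loop
1. Follow `docs/REAL_HOST_ACCEPTANCE_RUNBOOK.md` to prepare a regular user and reusable credentials on the real host
2. Complete the key/group/billing (or subscription) binding for the target mode
3. Re-run with `scripts/build_local_image.sh` and `scripts/real_host_acceptance.sh`, and archive the on-site artifacts
4. For single-host scenarios that meet these prerequisites, the project can be released within the PRD first-release scope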
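## Conclusions that must not be drawn
- ❌ Historical failed/successful artifacts must not be reused outside their point in time; `20260518_redeploy_matrix` is the current source of truth
- ❌ `HTTP 200` ≠ host initialization automatically prepares regular users/subscriptions/balance; these remain explicit operational prerequisites
- ❌ `APPROVED` means "releasable within the PRD first-release scope"; it does not mean the project has become a multi-host autonomous platform
- ❌ Same-named providers across packs are no longer matched by mistake, but only if the caller supplies the correct pack path / pack_id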

@@ -0,0 +1,153 @@
# Real Host Acceptance Runbook
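Date: 2026-05-16
## Goal
Turn the remaining external gate behind the current `CONDITIONAL_APPROVED` status into a directly executable real-host acceptance flow that covers:
1. A connectivity probe against a real sub2api host
2. pack installation
3. preview/import verification
4. access preview / access status verification
5. reconcile verification
6. rollback smoke
## Prerequisites
### Control plane
- `sub2api-cn-relay-manager` is running
- `CRM_BASE_URL` is reachable, for example `http://127.0.0.1:8080`
- `CRM_ADMIN_TOKEN` is set
### Real host
- The real host's `HOST_BASE_URL` is known
- The host admin credential is known (one of):
  - `HOST_API_KEY`
  - `HOST_BEARER_TOKEN`
- At least one real provider key
- The pack path is known, for example `/app/packs/openai-cn-pack`
## Recommended execution
### 1. Build the local container image (for proxied/offline development machines)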
```bash
cd /path/to/sub2api-cn-relay-manager
scripts/build_local_image.sh
```
Default outputs:
- Binary: `bin/sub2api-cn-relay-manager`
- Image: `sub2api-cn-relay-manager:local`
### 2. Dry-run first to check the real acceptance parameters
```bash
CRM_BASE_URL=http://127.0.0.1:8080 \
CRM_ADMIN_TOKEN=replace-me \
HOST_NAME=prod-sub2api \
HOST_BASE_URL=https://sub2api.example.com \
HOST_API_KEY=host-admin-key \
PACK_PATH=/app/packs/openai-cn-pack \
PROVIDER_ID=deepseek \
KEYS=sk-live-1,sk-live-2 \
ACCESS_MODE=self_service \
ACCESS_API_KEY=user-gateway-key \
DRY_RUN=1 \
scripts/real_host_acceptance.sh
```
### 3. Run the real acceptance
```bash
CRM_BASE_URL=http://127.0.0.1:8080 \
CRM_ADMIN_TOKEN=replace-me \
HOST_NAME=prod-sub2api \
HOST_BASE_URL=https://sub2api.example.com \
HOST_API_KEY=host-admin-key \
PACK_PATH=/app/packs/openai-cn-pack \
PROVIDER_ID=deepseek \
KEYS=sk-live-1,sk-live-2 \
ACCESS_MODE=self_service \
ACCESS_API_KEY=user-gateway-key \
scripts/real_host_acceptance.sh
```
### 4. Subscription mode example
```bash
CRM_BASE_URL=http://127.0.0.1:8080 \
CRM_ADMIN_TOKEN=replace-me \
HOST_NAME=prod-sub2api \
HOST_BASE_URL=https://sub2api.example.com \
HOST_BEARER_TOKEN=host-bearer-token \
PACK_PATH=/app/packs/openai-cn-pack \
PROVIDER_ID=deepseek \
KEYS=sk-live-1 \
ACCESS_MODE=subscription \
SUBSCRIPTION_USERS=user-a,user-b \
SUBSCRIPTION_DAYS=30 \
scripts/real_host_acceptance.sh
```
## Artifacts
The script writes the JSON response of each step to:
```text
artifacts/real-host-acceptance/<timestamp>/
```
Default file order:
- `01-create-host.json`
- `02-probe-host.json`
- `03-install-pack.json`
- `04-preview-import.json`
- `05-import.json`
- `06-access-preview.json`
- `07-access-status.json`
- `08-provider-status.json`
- `09-reconcile.json`
- `10-batch-detail.json`
- `11-rollback.json` (unless skipped)
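
After a run, the artifact directory can be checked against the default file list above. The sketch below assumes exactly that list; `missing_artifacts` is a hypothetical helper and is not part of `scripts/real_host_acceptance.sh`.

```bash
# Hypothetical helper (not part of the repository).
# missing_artifacts DIR [SKIP_ROLLBACK]: print each expected file absent from DIR.
# Pass SKIP_ROLLBACK=1 when the rollback smoke was skipped.
missing_artifacts() {
  local dir="$1" skip_rollback="${2:-0}" name
  for name in 01-create-host 02-probe-host 03-install-pack 04-preview-import \
    05-import 06-access-preview 07-access-status 08-provider-status \
    09-reconcile 10-batch-detail 11-rollback; do
    if [ "$name" = "11-rollback" ] && [ "$skip_rollback" = "1" ]; then
      continue
    fi
    [ -f "$dir/$name.json" ] || echo "$name.json"
  done
}
```

Empty output means every expected file is present. The check covers file presence only and says nothing about whether a step succeeded.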
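## Pass criteria
At minimum, all of the following must hold:
1. `probe-host` returns the host version and a capability snapshot
2. `install-pack` succeeds
3. `import` returns a `batch_id`, and the batch/provider status is not `failed`
4. `access-preview` returns `available=true`, or the access status reaches one of:
   - `subscription_ready`
   - `self_service_ready`
   - `fully_ready`
5. `reconcile` returns no critical failure
6. `rollback smoke` succeeds (if the rollback path needs to be verified in this run)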
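
Criterion 4 is an either/or rule and is easy to misread, so here it is as a small predicate. This is a sketch only: `access_criterion_met` is a hypothetical helper, and the caller is assumed to have already extracted the two values from the access-preview and access-status responses.

```bash
# Hypothetical helper (not part of the repository).
# access_criterion_met AVAILABLE STATUS
#   AVAILABLE: the `available` value from access-preview ("true" or "false")
#   STATUS:    the access status string from access-status
access_criterion_met() {
  local available="$1" status="$2"
  if [ "$available" = "true" ]; then
    return 0
  fi
  case "$status" in
    subscription_ready|self_service_ready|fully_ready) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `access_criterion_met false fully_ready` succeeds, while `access_criterion_met false broken` fails.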
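## How to read the current gate
- If the script above passes end to end in a real-host environment:
  - the project can move from **code-level `CONDITIONAL_APPROVED`** to **real-environment release**
- If the script has not been run:
  - the status can only remain `CONDITIONAL_APPROVED`
- If the script runs but fails:
  - classify the failure as a real-host compatibility / credential / network / pack-content issue, rather than reopening the general question of "is the code complete"
## Notes
1. The rollback smoke runs by default; if the current environment does not allow rollback, set: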
```bash
SKIP_ROLLBACK=1 scripts/real_host_acceptance.sh
```
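2. `PACK_PATH` must be a path the control-plane process can read, not a path that only exists on the user's local machine.
3. If the control plane runs in a container, make sure the pack directory is mounted into it.
4. Set either `HOST_API_KEY` or `HOST_BEARER_TOKEN`; the script derives `auth.type=apikey|bearer` automatically.
5. `ACCESS_API_KEY` must be a real, unmasked regular-user gateway key; the display value from the database or list endpoints cannot be reused directly.
6. Real-host initialization only prepares the administrator account. Regular-user accounts/passwords are not generated automatically; create them explicitly and keep reusable credentials before acceptance.
7. Besides the regular user's key, `self_service` verification needs that key bound to the target group. If the target group is a standard billing group, the user also needs available balance; otherwise `/v1/models` may change from "unauthorized" to `INSUFFICIENT_BALANCE`.
8. `subscription` verification requires the target group itself to be of type `subscription`, plus "regular-user subscription assignment + the regular user's key bound to that group". An administrator principal alone, or an unbound key, is not enough to pass `/v1/models`.
9. To verify `reconcile` convergence, prefer a clean host or an isolated group, so that leftover historical resources do not pollute the result into `status=drifted` / `extra_count>0`.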
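
Note 4 says the script derives `auth.type` from whichever credential is set. One way such a derivation could look is sketched below. `derive_auth_type` is a hypothetical helper, not the script's actual code, and rejecting the case where both or neither variable is set is an assumption.

```bash
# Hypothetical helper; the real script's behavior may differ.
# derive_auth_type API_KEY BEARER_TOKEN: print "apikey" or "bearer".
# Assumption: exactly one of the two credentials must be non-empty.
derive_auth_type() {
  local api_key="$1" bearer="$2"
  if [ -n "$api_key" ] && [ -z "$bearer" ]; then
    echo apikey
  elif [ -z "$api_key" ] && [ -n "$bearer" ]; then
    echo bearer
  else
    echo "set exactly one of HOST_API_KEY / HOST_BEARER_TOKEN" >&2
    return 1
  fi
}
```

A caller would use it as `auth_type="$(derive_auth_type "${HOST_API_KEY:-}" "${HOST_BEARER_TOKEN:-}")"` and stop when it returns non-zero.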

View File

@@ -10,6 +10,87 @@ paths:
responses:
'200':
description: ok
/api/hosts:
get:
security:
- bearerAuth: []
responses:
'200':
description: list of registered hosts
content:
application/json:
schema:
$ref: '#/components/schemas/ListHostsResponse'
'401':
$ref: '#/components/responses/Unauthorized'
post:
security:
- bearerAuth: []
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/CreateHostRequest'
responses:
'200':
description: host created
content:
application/json:
schema:
$ref: '#/components/schemas/HostInfo'
'401':
$ref: '#/components/responses/Unauthorized'
/api/hosts/{hostID}:
get:
security:
- bearerAuth: []
parameters:
- $ref: '#/components/parameters/HostID'
responses:
'200':
description: host detail
content:
application/json:
schema:
$ref: '#/components/schemas/HostInfo'
'401':
$ref: '#/components/responses/Unauthorized'
'404':
description: host not found
delete:
security:
- bearerAuth: []
parameters:
- $ref: '#/components/parameters/HostID'
responses:
'204':
description: host deleted
'401':
$ref: '#/components/responses/Unauthorized'
'404':
description: host not found
/api/hosts/{hostID}/probe:
post:
security:
- bearerAuth: []
parameters:
- $ref: '#/components/parameters/HostID'
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/ProbeHostRequest'
responses:
'200':
description: refreshed host capability snapshot
content:
application/json:
schema:
$ref: '#/components/schemas/HostInfo'
'401':
$ref: '#/components/responses/Unauthorized'
/api/packs/install:
post:
security:
@@ -23,84 +104,147 @@ paths:
responses:
'200':
description: pack installed
/api/packs:
get:
security:
- bearerAuth: []
responses:
'200':
description: installed pack list
content:
application/json:
schema:
$ref: '#/components/schemas/ListPacksResponse'
'401':
$ref: '#/components/responses/Unauthorized'
/api/packs/{packID}:
get:
security:
- bearerAuth: []
parameters:
- $ref: '#/components/parameters/PackID'
responses:
'200':
description: pack detail
content:
application/json:
schema:
$ref: '#/components/schemas/PackInfo'
'401':
$ref: '#/components/responses/Unauthorized'
'404':
description: pack not found
/api/packs/{packID}/providers:
get:
security:
- bearerAuth: []
parameters:
- $ref: '#/components/parameters/PackID'
responses:
'200':
description: provider list within pack
content:
application/json:
schema:
$ref: '#/components/schemas/ListPackProvidersResponse'
'401':
$ref: '#/components/responses/Unauthorized'
'404':
description: pack not found
/api/import-batches/{batchID}:
get:
security:
- bearerAuth: []
parameters:
- name: batchID
in: path
required: true
schema:
type: integer
format: int64
- $ref: '#/components/parameters/BatchID'
responses:
'200':
description: batch detail
'401':
$ref: '#/components/responses/Unauthorized'
/api/import-batches/{batchID}/rollback:
post:
security:
- bearerAuth: []
parameters:
- $ref: '#/components/parameters/BatchID'
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/RollbackBatchRequest'
responses:
'200':
description: batch rollback summary
content:
application/json:
schema:
$ref: '#/components/schemas/RollbackSummaryResponse'
'401':
$ref: '#/components/responses/Unauthorized'
/api/providers/{providerID}/status:
get:
security:
- bearerAuth: []
parameters:
- name: providerID
in: path
required: true
schema:
type: string
- name: pack_id
in: query
required: false
schema:
type: string
- $ref: '#/components/parameters/ProviderID'
- $ref: '#/components/parameters/PackIDQuery'
- $ref: '#/components/parameters/HostIDQuery'
responses:
'200':
description: provider runtime status
'401':
$ref: '#/components/responses/Unauthorized'
/api/providers/{providerID}/resources:
get:
security:
- bearerAuth: []
parameters:
- name: providerID
in: path
required: true
schema:
type: string
- name: pack_id
in: query
required: false
schema:
type: string
- $ref: '#/components/parameters/ProviderID'
- $ref: '#/components/parameters/PackIDQuery'
- $ref: '#/components/parameters/HostIDQuery'
responses:
'200':
description: provider managed resources snapshot
'401':
$ref: '#/components/responses/Unauthorized'
/api/providers/{providerID}/access/status:
get:
security:
- bearerAuth: []
parameters:
- name: providerID
in: path
required: true
schema:
type: string
- name: pack_id
in: query
required: false
schema:
type: string
- $ref: '#/components/parameters/ProviderID'
- $ref: '#/components/parameters/PackIDQuery'
- $ref: '#/components/parameters/HostIDQuery'
responses:
'200':
description: provider access closure status
'401':
$ref: '#/components/responses/Unauthorized'
/api/providers/{providerID}/import-batches:
get:
security:
- bearerAuth: []
parameters:
- $ref: '#/components/parameters/ProviderID'
- $ref: '#/components/parameters/PackIDQuery'
- $ref: '#/components/parameters/HostIDQuery'
responses:
'200':
description: provider import batch history
content:
application/json:
schema:
$ref: '#/components/schemas/ListImportBatchesResponse'
'401':
$ref: '#/components/responses/Unauthorized'
/api/providers/{providerID}/preview-import:
post:
security:
- bearerAuth: []
parameters:
- name: providerID
in: path
required: true
schema:
type: string
- $ref: '#/components/parameters/ProviderID'
requestBody:
required: true
content:
@@ -115,11 +259,7 @@ paths:
security:
- bearerAuth: []
parameters:
- name: providerID
in: path
required: true
schema:
type: string
- $ref: '#/components/parameters/ProviderID'
requestBody:
required: true
content:
@@ -134,11 +274,7 @@ paths:
security:
- bearerAuth: []
parameters:
- name: providerID
in: path
required: true
schema:
type: string
- $ref: '#/components/parameters/ProviderID'
requestBody:
required: true
content:
@@ -153,11 +289,7 @@ paths:
security:
- bearerAuth: []
parameters:
- name: providerID
in: path
required: true
schema:
type: string
- $ref: '#/components/parameters/ProviderID'
requestBody:
required: true
content:
@@ -167,12 +299,218 @@ paths:
responses:
'200':
description: reconcile summary
/api/providers/{providerID}/access/preview:
post:
security:
- bearerAuth: []
parameters:
- $ref: '#/components/parameters/ProviderID'
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/AccessPreviewRequest'
responses:
'200':
description: access preview result
content:
application/json:
schema:
$ref: '#/components/schemas/AccessPreviewResponse'
/api/providers/{providerID}/access/assign-subscriptions:
post:
security:
- bearerAuth: []
parameters:
- $ref: '#/components/parameters/ProviderID'
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/AssignAccessSubscriptionsRequest'
responses:
'200':
description: access subscription assignment summary
content:
application/json:
schema:
$ref: '#/components/schemas/AssignAccessSubscriptionsResponse'
components:
securitySchemes:
bearerAuth:
type: http
scheme: bearer
parameters:
HostID:
name: hostID
in: path
required: true
schema:
type: string
PackID:
name: packID
in: path
required: true
schema:
type: string
BatchID:
name: batchID
in: path
required: true
schema:
type: integer
format: int64
ProviderID:
name: providerID
in: path
required: true
schema:
type: string
PackIDQuery:
name: pack_id
in: query
required: false
schema:
type: string
# HostIDQuery is referenced by the provider status/resources/access/import-batches paths but was not defined.
# The query name host_id is assumed by analogy with PackIDQuery (pack_id).
HostIDQuery:
name: host_id
in: query
required: false
schema:
type: string
responses:
Unauthorized:
description: missing or invalid admin token
content:
application/json:
schema:
$ref: '#/components/schemas/ErrorResponse'
schemas:
ErrorResponse:
type: object
properties:
error:
type: object
properties:
code:
type: string
message:
type: string
CreateHostAuth:
type: object
required: [token]
properties:
type:
type: string
enum: [apikey, api_key, bearer]
token:
type: string
CreateHostRequest:
type: object
required: [base_url, auth]
properties:
name:
type: string
base_url:
type: string
auth:
$ref: '#/components/schemas/CreateHostAuth'
ProbeHostRequest:
type: object
required: [auth]
properties:
auth:
$ref: '#/components/schemas/CreateHostAuth'
HostCapabilities:
type: object
properties:
groups:
type: boolean
channels:
type: boolean
plans:
type: boolean
accounts:
type: boolean
account_test:
type: boolean
account_models:
type: boolean
subscriptions:
type: boolean
HostInfo:
type: object
properties:
host_id:
type: string
base_url:
type: string
host_version:
type: string
auth_type:
type: string
status:
type: string
capabilities:
$ref: '#/components/schemas/HostCapabilities'
ListHostsResponse:
type: object
properties:
hosts:
type: array
items:
$ref: '#/components/schemas/HostInfo'
PackInfo:
type: object
properties:
pack_id:
type: string
version:
type: string
vendor:
type: string
target_host:
type: string
min_host_version:
type: string
max_host_version:
type: string
ListPacksResponse:
type: object
properties:
packs:
type: array
items:
$ref: '#/components/schemas/PackInfo'
PackProviderInfo:
type: object
properties:
provider_id:
type: string
display_name:
type: string
platform:
type: string
ListPackProvidersResponse:
type: object
properties:
providers:
type: array
items:
$ref: '#/components/schemas/PackProviderInfo'
ImportBatchInfo:
type: object
properties:
batch_id:
type: integer
format: int64
batch_status:
type: string
access_status:
type: string
ListImportBatchesResponse:
type: object
properties:
batches:
type: array
items:
$ref: '#/components/schemas/ImportBatchInfo'
InstallPackRequest:
type: object
required: [host_base_url, pack_path]
@@ -187,14 +525,19 @@ components:
type: string
PreviewProviderRequest:
type: object
required: [host_base_url, pack_path, keys]
required: [host_id, pack_path, keys]
properties:
host_id:
type: string
host_base_url:
type: string
description: legacy fallback; prefer host_id
host_api_key:
type: string
description: legacy fallback; prefer registered host auth
host_bearer_token:
type: string
description: legacy fallback; prefer registered host auth
pack_path:
type: string
provider_id:
@@ -207,14 +550,19 @@ components:
type: string
ImportProviderRequest:
type: object
required: [host_base_url, pack_path, keys, access_api_key]
required: [host_id, pack_path, keys, access_api_key]
properties:
host_id:
type: string
host_base_url:
type: string
description: legacy fallback; prefer host_id
host_api_key:
type: string
description: legacy fallback; prefer registered host auth
host_bearer_token:
type: string
description: legacy fallback; prefer registered host auth
pack_path:
type: string
provider_id:
@@ -237,29 +585,119 @@ components:
type: integer
RollbackProviderRequest:
type: object
required: [host_base_url, pack_path]
required: [host_id, pack_path]
properties:
host_id:
type: string
host_base_url:
type: string
description: legacy fallback; prefer host_id
host_api_key:
type: string
description: legacy fallback; prefer registered host auth
host_bearer_token:
type: string
description: legacy fallback; prefer registered host auth
pack_path:
type: string
provider_id:
type: string
RollbackBatchRequest:
type: object
required: [auth]
properties:
auth:
$ref: '#/components/schemas/CreateHostAuth'
RollbackSummaryResponse:
type: object
properties:
batch_id:
type: integer
format: int64
deleted_accounts:
type: integer
deleted_plans:
type: integer
deleted_channels:
type: integer
deleted_groups:
type: integer
ReconcileProviderRequest:
type: object
required: [host_base_url, pack_path]
required: [host_id, pack_path]
properties:
host_id:
type: string
host_base_url:
type: string
description: legacy fallback; prefer host_id
host_api_key:
type: string
description: legacy fallback; prefer registered host auth
host_bearer_token:
type: string
description: legacy fallback; prefer registered host auth
pack_path:
type: string
provider_id:
type: string
access_api_key:
type: string
AccessPreviewRequest:
type: object
properties:
provider_id:
type: string
pack_id:
type: string
host_id:
type: string
mode:
type: string
AccessPreviewResponse:
type: object
properties:
provider_id:
type: string
mode:
type: string
available:
type: boolean
message:
type: string
AssignAccessSubscriptionsRequest:
type: object
required: [host_id, pack_path, access_api_key]
properties:
host_id:
type: string
pack_path:
type: string
provider_id:
type: string
host_base_url:
type: string
description: legacy fallback; prefer host_id
host_api_key:
type: string
description: legacy fallback; prefer registered host auth
host_bearer_token:
type: string
description: legacy fallback; prefer registered host auth
access_api_key:
type: string
subscription_users:
type: array
items:
type: string
subscription_days:
type: integer
AssignAccessSubscriptionsResponse:
type: object
properties:
provider_id:
type: string
assigned:
type: integer
access_status:
type: string