2026-05-15 19:26:25 +08:00

# sub2api-cn-relay-manager Execution Board

2026-05-20 22:09:40 +08:00
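
Date: 2026-05-20

Current Gate: BLOCKED (code gates still pass, but the 2026-05-20 real-host re-verification of current-code CRM (18092) against the remote43 fresh host (18097) failed: DeepSeek batch=22 and MiniMax batch=23 both reached only `partially_succeeded/access_status=broken`; the host's ordinary-user `/v1/models` still exposes the GPT-family default models, so gateway closure did not pass and the release cannot be declared ready)
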
2026-05-18 22:22:22 +08:00
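
Goal: deliver an independent control plane that is zero-intrusion to the host, can import Chinese domestic models, and provides an operable import / rollback / access closed loop.
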
## Completed this round

1. Unified host identity model
- `auth_type/auth_token` is persisted at host registration
- the import / reconcile / rollback-provider / access runtime paths now use `host_id` as the primary key
- provider status / resources / access status / import-batches support `host_id` as a query dimension

2. managed_resources scoped to the host dimension
- added migration `0004_host_identity_and_managed_resources.sql`
- the `managed_resources` unique key is raised to `(host_id, resource_type, host_resource_id)`
- repository and service queries switched to host-scoped semantics

3. reconcile run results scoped per batch
- added migration `0006_reconcile_runs_batch_scope.sql`
- `reconcile_runs` gains `batch_id`; batch detail returns only the reconcile records of that batch

4. capability probe reduced to a side-effect-free probe
- it no longer sends an empty `POST` to real creation endpoints

5. rollback-provider risk reduced
- rollback now prefers the recorded batch resources via `RollbackStoredResources()`
- dangerous deletion is refused when no recorded resources exist

6. Documentation brought in line with reality
- added `docs/2026-05-18-PRODUCTION_REMEDIATION_TASK_BOARD.md`
- scaled back the unimplemented `/metrics` / rate-limiting / monitoring promises in `DEPLOYMENT.md`

7. Real-host re-acceptance executed
- new `self_service` artifact: `artifacts/real-host-acceptance/20260518_self_service_reaccept_v6`
- new `subscription` artifact: `artifacts/real-host-acceptance/20260518_subscription_reaccept_v6`
- both rounds persisted the full `preview-import / import / access-preview / status / reconcile / rollback` chain to disk

8. reconcile host-scope evidence strengthened
- `self_service`: `artifacts/real-host-acceptance/20260518_reconcile_hostscope_self_service`
- `subscription`: `artifacts/real-host-acceptance/20260518_reconcile_hostscope_subscription`
- host-scoped artifacts for `status / resources / reconcile / batch detail / rollback` are now complete and verify that the reconcile view in batch detail is scoped to the batch

2026-05-19 20:21:21 +08:00

9. The current-code remote43 import path now supports tunnel-aware verification
- `scripts/import_remote43_provider.sh` adds `CRM_HOST_BASE`, which separates the host address the operator uses from the host address the CRM process uses
- latest artifact: `/home/long/artifacts/real-host-acceptance/20260519_195827_remote43_deepseek_key_import`
- Conclusion: import / batch detail / managed resources were actually persisted; the previous round traced the failure to channel creation missing model_mapping / restrict_models / billing_model_source, and the implementation and tests have since been completed

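10. The root-cause fix for the current-code remote43 access gate has landed
- subscription access is now a host-side closed loop: CRM no longer depends on an externally supplied host ordinary-user key; using the `subscription_users` selector, it creates or looks up a managed ordinary user on the host, logs in to create a managed key, writes back allowed_groups / balance, and then performs the subscription assignment
- account creation requests now also write `credentials.model_mapping`, fixing the fallback to the GPT default set when `/v1/models` reads the account model whitelist
- tests added/updated for: `internal/access`, `internal/provision`, `internal/host/sub2api`
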
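The host-scoped unique key from item 2 can be illustrated with a minimal in-memory sketch. It is not the repository's store code: `ResourceKey`, `Registry`, and the `int64` type for `host_id` are assumptions made for illustration, and a Go map stands in for the `UNIQUE (host_id, resource_type, host_resource_id)` constraint.

```go
package main

import "fmt"

// ResourceKey mirrors the composite unique key
// (host_id, resource_type, host_resource_id). Field types are assumed.
type ResourceKey struct {
	HostID         int64
	ResourceType   string
	HostResourceID string
}

// Registry is a hypothetical in-memory stand-in for managed_resources.
type Registry struct {
	rows map[ResourceKey]int64 // value: batch ID that recorded the resource
}

func NewRegistry() *Registry {
	return &Registry{rows: make(map[ResourceKey]int64)}
}

// Insert rejects a duplicate composite key, as a UNIQUE constraint would.
func (r *Registry) Insert(k ResourceKey, batchID int64) error {
	if _, exists := r.rows[k]; exists {
		return fmt.Errorf("duplicate managed resource: host=%d type=%s id=%s",
			k.HostID, k.ResourceType, k.HostResourceID)
	}
	r.rows[k] = batchID
	return nil
}

// CountForHost counts only the rows that belong to one host.
func (r *Registry) CountForHost(hostID int64) int {
	n := 0
	for k := range r.rows {
		if k.HostID == hostID {
			n++
		}
	}
	return n
}

func main() {
	reg := NewRegistry()
	// The same host_resource_id on two hosts does not collide.
	fmt.Println(reg.Insert(ResourceKey{HostID: 1, ResourceType: "channel", HostResourceID: "4"}, 22))
	fmt.Println(reg.Insert(ResourceKey{HostID: 2, ResourceType: "channel", HostResourceID: "4"}, 23))
	// Repeating the key on the same host is rejected.
	fmt.Println(reg.Insert(ResourceKey{HostID: 1, ResourceType: "channel", HostResourceID: "4"}, 24))
	fmt.Println(reg.CountForHost(1))
}
```

Including `host_id` in the key is what lets two hosts reuse the same resource ID without one host's rows shadowing the other's.
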
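The rollback rule from item 5 (roll back only recorded batch resources, refuse when none are recorded) can be sketched as a small planning function. This is a sketch under assumptions, not the body of `RollbackStoredResources()`: the `StoredResource` type, the `planRollback` name, and the reverse-creation deletion order are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// StoredResource is a hypothetical record of one resource created by a batch.
type StoredResource struct {
	Type           string
	HostResourceID string
}

// ErrNoStoredResources signals that the rollback was refused.
var ErrNoStoredResources = errors.New("rollback refused: batch has no recorded resources")

// planRollback returns the deletions to perform. It never guesses targets:
// with nothing recorded it refuses instead of deleting by name or pattern.
// Deleting in reverse creation order is an assumption of this sketch.
func planRollback(stored []StoredResource) ([]StoredResource, error) {
	if len(stored) == 0 {
		return nil, ErrNoStoredResources
	}
	plan := make([]StoredResource, 0, len(stored))
	for i := len(stored) - 1; i >= 0; i-- {
		plan = append(plan, stored[i])
	}
	return plan, nil
}

func main() {
	recorded := []StoredResource{
		{Type: "group", HostResourceID: "1"},
		{Type: "channel", HostResourceID: "4"},
		{Type: "account", HostResourceID: "7"},
	}
	plan, err := planRollback(recorded)
	fmt.Println(plan, err)

	_, err = planRollback(nil)
	fmt.Println(err)
}
```
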
## Verified gates

- `gofmt -l .` ✅ empty output
- `go vet ./...` ✅
- `go test ./...` ✅
- `go test -race ./...` ✅
- `go test -cover ./internal/...` ✅
- `internal/access`: `77.3%`
- `internal/pack`: `72.7%`
- `internal/provision`: `74.6%`
- `internal/store/sqlite`: `61.3%`
- `go test ./tests/integration/... -count=1` ✅

## Real-host re-verification results this round

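1. `self_service` (latest fresh redeploy re-verification)
- Evidence directory: `artifacts/real-host-acceptance/20260518_redeploy_matrix`
- Initial state: with the ordinary-user key not bound to a group and the user balance at 0, `/v1/models` returns `403`
- After the fix: once the ordinary user has "key bound to a standard group + user balance = 10", `04-self-after-balance.headers.txt` shows `HTTP/1.1 200 OK`
- Conclusion: the `self_service` main path works end to end on a fresh host; the key preconditions have been reduced to the ordinary-user creation / key-group binding / balance requirements recorded in the runbook, and are not a code-level blocker.
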
2. `subscription` (latest fresh redeploy re-verification)
- Evidence directory: `artifacts/real-host-acceptance/20260518_redeploy_matrix`
- After the fix: after creating a subscription-type group, assigning the subscription to the ordinary user, and binding the ordinary-user key to that group, `06-subscription-after-assign.headers.txt` shows `HTTP/1.1 200 OK`
- Conclusion: the `subscription` main path also works end to end on a fresh host; it does not rely on "the host initializes everything automatically", but on explicitly completing the operational steps of subscription group / user subscription / key binding.

## 2026-05-19 current-code remote43 acceptance: supplementary conclusions

1. Acceptance entry point
- Evidence directory: `/home/long/artifacts/real-host-acceptance/20260519_195827_remote43_deepseek_key_import`
- The local CRM reaches the remote host through a tunnel; `CRM_HOST_BASE` points to the host address reachable from the CRM side

2. Import path conclusions
- `import` succeeded and returned `batch_id=19`
- `managed_resources` contains `group/channel/plan/account`
- `provider_status=partially_succeeded`, which shows the run entered the real business path and is no longer blocked by host registration / pack path / tunnel prerequisites

3. access gate failure conclusions
- `latest_access_status=broken`
- `access preview available=false`
- `reconcile status=drifted`, with `probe_failures=1`

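4. Current fixes
- In the old artifact, `09-models.headers.txt` / `10-models.body.json` expose GPT-family models. The root cause was reclassified as: CRM wrote the channel model_mapping but did not sync the account `credentials.model_mapping`, so the host `/v1/models` fell back from the account view to the default model set
- In addition, the old script/call path used the external `subscription_users` / `access_api_key` directly as the host user and host key, which could not form a real closed loop of "create or look up host ordinary user + key + subscription assignment"; this has been changed to a host-managed closed loop
- The code-side blockers were fixed; the only remaining step is re-verification with real DeepSeek / MiniMax keys
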
## Remaining items (including current external gates)

1. The current-code real-host access gate fails and must be fixed before any release discussion
- DeepSeek: artifact `artifacts/real-host-acceptance/20260520_123726_remote43_deepseek_key_import/03-import.body.json` shows `batch_id=22`, `batch_status=partially_succeeded`, `access_status=broken`
- MiniMax: manual re-verification of current-code CRM (18092) against the remote43 fresh host (18097) yielded `batch_id=23`, `batch_status=partially_succeeded`, `access_status=broken`
- For both paths, `probe_summary_json` / the gateway probe show that the host ordinary-user `/v1/models` returns the GPT-5.x / GPT Image default set and does not expose the DeepSeek / MiniMax target models
- 2026-05-20 review note: on the fresh host, `groups/channels/account_groups` were persisted as expected, and the channel already has `model_mapping + restrict_models + billing_model_source=channel_mapped`; however, `accounts.credentials` actually persists only `api_key/base_url`, `GET /api/v1/admin/accounts/{id}/models` still returns the GPT default model set, and `POST /api/v1/admin/accounts/{id}/test` probes with `gpt-5.4` by default and reports `model_not_found`. The root cause is now reclassified as "current-code is still not aligned with the host's account model-exposure contract"; the problem can no longer be reduced to missing `channel` parameters or "only `credentials.model_mapping` sync is missing".
- pack contract drift was found and fixed: `packs/openai-cn-pack/providers/deepseek.json` previously had `default_models/smoke_test_model` inconsistent with `channel_template.model_mapping`; `internal/pack` now validates that `smoke_test_model` appears in `channel_template.model_mapping` and that `default_models` is fully covered by `channel_template.model_mapping`, so similar drift cannot slip into real-host acceptance again.
- 2026-05-20 21:50 note: fixed the loss of `model_pricing` when current-code creates or corrects a `channel`. Re-running `POST /api/providers/deepseek/import` from CRM `http://127.0.0.1:18100` against `remote43-fresh18097-deepseek-1779280533` returned `batch_id=4` and `access_status=subscription_ready`; the host's `GET /api/v1/admin/channels/4` now shows `model_pricing=[{platform:"openai", models:["deepseek-v4-pro","deepseek-v4-flash"], billing_mode:"token", intervals:[]}]`, which confirms that "an existing channel can be corrected via PUT" works. The remaining gate is no longer missing channel pricing but a higher-level provider/account behavior issue.

2. The real-host script has an environment-binding defect
- `scripts/import_remote43_provider.sh` still hardcodes the Postgres/Redis container names as `sub2api-relaymgr-pg` / `sub2api-relaymgr-redis`
- When the target switches to the fresh host (18097), the script sends the subscription user/key prep to the old relaymgr host, so the user id belongs to the wrong host and `assign subscription for 10 ... 500` appears

3. Structural debt remains
- access / reconcile are not yet fully physically split as the implementation plan requires
- no built-in scheduler/jobs

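4. Operational prerequisites need to be executed from a runbook
- Real-host initialization does not create ordinary users automatically; the current CRM subscription closed loop claims to manage host ordinary users/keys automatically by selector, but this round's remote43 real-host re-verification did not pass, so that capability must not be treated as an accepted fact
- `self_service` requires the ordinary-user key to be bound to the target standard group, and usually also an available balance
- `subscription` requires a subscription-type group + ordinary-user subscription assignment + key/group binding
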
5. The standard multi-stage Dockerfile is still unreliable in restricted network environments
- currently recommended: `scripts/build_local_image.sh` + `Dockerfile.local`

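6. Real-host acceptance tooling needs host-level parameterization
- `AFTER_IMPORT_HOOK_COMMAND` in `scripts/real_host_acceptance.sh` is still useful, but the remote43/fresh-host variant lacks explicit parameters for "target Postgres/Redis container names, target host env file, target forward port"
- otherwise artifacts get mixed with old-host state and mislead the gate decision
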
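The pack contract rule described under remaining item 1 can be sketched as follows. This is a minimal illustration, not the actual `internal/pack` validator: the `ProviderPack` struct, the `validatePack` name, and the assumption that model names are matched against the keys of `channel_template.model_mapping` are all hypothetical.

```go
package main

import "fmt"

// ProviderPack holds only the fields relevant to the drift check.
// The shape is assumed; the real pack schema may differ.
type ProviderPack struct {
	DefaultModels  []string
	SmokeTestModel string
	ModelMapping   map[string]string // channel_template.model_mapping
}

// validatePack enforces the two rules from the task board:
// smoke_test_model must appear in the mapping, and every entry of
// default_models must be covered by the mapping.
func validatePack(p ProviderPack) error {
	if _, ok := p.ModelMapping[p.SmokeTestModel]; !ok {
		return fmt.Errorf("smoke_test_model %q is not in channel_template.model_mapping", p.SmokeTestModel)
	}
	for _, m := range p.DefaultModels {
		if _, ok := p.ModelMapping[m]; !ok {
			return fmt.Errorf("default model %q is not in channel_template.model_mapping", m)
		}
	}
	return nil
}

func main() {
	consistent := ProviderPack{
		DefaultModels:  []string{"deepseek-v4-pro", "deepseek-v4-flash"},
		SmokeTestModel: "deepseek-v4-flash",
		ModelMapping: map[string]string{
			"deepseek-v4-pro":   "deepseek-v4-pro",
			"deepseek-v4-flash": "deepseek-v4-flash",
		},
	}
	fmt.Println(validatePack(consistent))

	// Drifted pack: a default model is missing from the mapping.
	drifted := consistent
	drifted.ModelMapping = map[string]string{"deepseek-v4-flash": "deepseek-v4-flash"}
	fmt.Println(validatePack(drifted))
}
```

Running a check like this at pack load time fails fast on a drifted pack, before any import reaches a real host.
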
## Current shortest path to release

1. First fix the two current-code blockers on the real host:
- find out and fix why the host `accounts.credentials` does not persist `model_mapping`
- add target-host-level parameterization to the remote43 acceptance script so Postgres/Redis/host env do not point at the old relaymgr

2. Re-run the DeepSeek / MiniMax subscription acceptance on a fresh host, requiring `/v1/models` to expose the target models and `/v1/chat/completions` to return 200
3. Re-run `provider status` / `access status` / `access preview` / `batch detail` and confirm `batch_status=succeeded` and `access_status=ready`
4. If on-site prerequisites are met, re-evaluate whether to restore CONDITIONAL_APPROVED / APPROVED

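The model-exposure requirement in step 2 could be checked mechanically against the model IDs returned by `/v1/models`. The sketch below works on plain string slices; `checkModelExposure` is a hypothetical helper, and treating any non-target model (such as a GPT default) as a failure is a strictness assumption of this sketch, not a rule stated by the board.

```go
package main

import (
	"fmt"
	"sort"
)

// checkModelExposure compares the model IDs a host exposes with the
// target models of the imported provider. It returns the targets that are
// missing and the exposed models that are not targets, both sorted.
func checkModelExposure(exposed, targets []string) (missing, unexpected []string) {
	exposedSet := make(map[string]bool, len(exposed))
	for _, m := range exposed {
		exposedSet[m] = true
	}
	targetSet := make(map[string]bool, len(targets))
	for _, t := range targets {
		targetSet[t] = true
		if !exposedSet[t] {
			missing = append(missing, t)
		}
	}
	for m := range exposedSet {
		if !targetSet[m] {
			unexpected = append(unexpected, m)
		}
	}
	sort.Strings(missing)
	sort.Strings(unexpected)
	return missing, unexpected
}

func main() {
	targets := []string{"deepseek-v4-pro", "deepseek-v4-flash"}

	// Failing case seen in this round: only a GPT default is exposed.
	missing, unexpected := checkModelExposure([]string{"gpt-5.4"}, targets)
	fmt.Println("missing:", missing, "unexpected:", unexpected)

	// Passing case: exactly the target models are exposed.
	missing, unexpected = checkModelExposure(targets, targets)
	fmt.Println("gate ok:", len(missing) == 0 && len(unexpected) == 0)
}
```
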
## Prohibited incorrect conclusions

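- ❌ A historical failed artifact ≠ the current fresh redeploy still failing
- ❌ A side-effect-free capability probe ≠ every host version being genuinely compatible
- ❌ rollback-provider switched to the safe path ≠ historical dirty resources disappearing automatically
- ❌ `HTTP 200` ≠ host initialization automatically preparing ordinary users / subscriptions / balance; these remain explicit operational prerequisites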