sub2api-cn-relay-manager/docs/2026-06-04-MODEL_POOL_DESIGN.md
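# Model Pool Design (vNext.1 minimal closed loop, draft pending review)
> Status note: the `internal/provision/model_pool.go` / `model_pool_test.go` files that correspond to this document were written ahead of review. For now they can only be regarded as an experimental skeleton that has not been approved, and must not be treated as the settled release plan. Whether they are kept, modified, or rolled back depends on the follow-up review conclusions in `docs/2026-06-04-vnext-planning-design-review.md` and `docs/2026-06-04-vnext-release-scope.md`.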
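## Goal
Without modifying the host backend source, upgrade the old mental model of "one logical model = one provider route" to "one logical model = one route pool (multiple candidate routes)".
This design covers only the minimal closed loop that can actually be landed:
1. Build a model pool view from the existing provider/probe/capability facts
2. Make the separation between advertised model and callable model explicit
3. Reuse the existing `logical_group_routes` / `logical_group_route_models` / `route_resolve` runtime without rewriting the router
4. Provide a unified data model for later host import orchestration, portal display, and real pooling acceptance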
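## Current real constraints
1. The existing runtime already supports:
   - multiple route candidates per `public_model`
   - priority
   - sticky
   - failure threshold / cooldown / failover
2. The current gap is not "cannot route" but "there is no unified pool abstraction that folds provider / capability / model aliases into an orchestratable view".
3. The live probe of `deepseek-chat-official` shows:
   - `chat=200`
   - `responses=200`
   - `models_has_smoke_model=false`

   This means the name exposed by `/v1/models` and the model that is actually callable in the smoke test can differ.
4. Therefore a single `model_id` string can no longer serve simultaneously as:
   - the external display name
   - the logical model name
   - the name that is actually callable upstream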
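The probe finding in item 3 can be expressed as a small check. The following is a minimal sketch only: `ProbeResult`, its fields, and both methods are hypothetical names introduced for illustration and are not taken from the existing `probe` package.

```go
package main

// ProbeResult is a hypothetical, simplified view of one live probe run.
// The field names are illustrative and not taken from the probe package.
type ProbeResult struct {
	ChatStatus      int      // HTTP status of the chat smoke call
	ResponsesStatus int      // HTTP status of the responses smoke call
	ListedModels    []string // model names returned by /v1/models
	SmokeModel      string   // model name the smoke calls actually used
}

// ModelsHasSmokeModel reports whether the smoke model appears in the
// /v1/models listing.
func (p ProbeResult) ModelsHasSmokeModel() bool {
	for _, m := range p.ListedModels {
		if m == p.SmokeModel {
			return true
		}
	}
	return false
}

// CallableButUnlisted reports the case described above: at least one smoke
// call succeeded, yet the model name used is absent from the listing.
func (p ProbeResult) CallableButUnlisted() bool {
	callable := p.ChatStatus == 200 || p.ResponsesStatus == 200
	return callable && !p.ModelsHasSmokeModel()
}
```

When `CallableButUnlisted` is true, the listed name and the callable name have to be stored separately, which is what motivates splitting the single `model_id` string.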
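## Three-layer model identifiers
### 1. canonical family
The logical family name, used to aggregate across providers, for example:
- `gpt-5.4`
- `deepseek-chat`
- `MiniMax-M3`
- `kimi-k2.6`
### 2. advertised model
The model name shown to users or observed from `/v1/models`.
It may be identical to the callable model, or it may be only an alias.
### 3. callable model
The model name actually sent upstream in chat/responses requests.
Rules:
- pool selection uses `canonical family` / `public model` as the entry point
- a route mapping must store the `callable model`
- if the name listed by `/v1/models` differs from the callable model, the `advertised model` should be recorded in addition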
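As an illustration of the three layers and the last rule, here is a hedged sketch. `ModelIdentity`, `NewModelIdentity`, and `DisplayName` are hypothetical helpers that do not appear in the design; the sketch assumes an empty `AdvertisedModel` means "same as callable".

```go
package main

// ModelIdentity is a hypothetical holder for the three identifier layers.
type ModelIdentity struct {
	CanonicalFamily string // logical family, e.g. "deepseek-chat"
	AdvertisedModel string // set only when it differs from CallableModel
	CallableModel   string // name actually sent upstream
}

// NewModelIdentity always stores the callable name and records the listed
// name only when it is non-empty and differs from the callable name.
func NewModelIdentity(family, listed, callable string) ModelIdentity {
	id := ModelIdentity{CanonicalFamily: family, CallableModel: callable}
	if listed != "" && listed != callable {
		id.AdvertisedModel = listed
	}
	return id
}

// DisplayName returns the advertised name when one was recorded and falls
// back to the callable name otherwise.
func (m ModelIdentity) DisplayName() string {
	if m.AdvertisedModel != "" {
		return m.AdvertisedModel
	}
	return m.CallableModel
}
```

For example, `NewModelIdentity("deepseek-chat", "alias-name", "real-name")` keeps both names, while passing the same string twice leaves `AdvertisedModel` empty. The two model names here are placeholders, not observed values.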
## Minimal data structure
The suggestion is to add `internal/provision/model_pool.go` and start with an in-memory abstraction only, without changing the DB schema yet.
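```go
// ModelPool groups every candidate route behind one public model.
type ModelPool struct {
	PublicModel          string
	CanonicalModelFamily string // logical family used to aggregate across providers
	Routes               []PoolRoute
}

// PoolRoute is one candidate route inside a pool.
type PoolRoute struct {
	RouteID           string
	ProviderID        string
	DisplayName       string
	BaseURL           string
	PublicModel       string
	AdvertisedModel   string // name shown to users or observed from /v1/models
	CallableModel     string // name actually sent upstream in chat/responses requests
	Priority          int
	Schedulable       bool
	SupportLevel      string   // e.g. "supported-direct", "upstream-unhealthy"
	SupportedModels   []string // models this route can currently serve
	SupportsChat      bool
	SupportsResponses bool
	CooldownUntil     string
	DisableReason     string
	KnownAdvisories   []string
}
```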
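To show how the structure holds several routes with different callable names behind one public model, here is a small sketch. The types are trimmed copies of the structs above; `CallableModelFor`, `ExamplePool`, and every route, provider, and model value are placeholders invented for illustration.

```go
package main

// PoolRoute and ModelPool are trimmed copies of the structs above.
type PoolRoute struct {
	RouteID         string
	ProviderID      string
	AdvertisedModel string
	CallableModel   string
	Priority        int
}

type ModelPool struct {
	PublicModel          string
	CanonicalModelFamily string
	Routes               []PoolRoute
}

// CallableModelFor returns the upstream model name stored for one route in
// the pool. The boolean is false when the route ID is not in the pool.
func CallableModelFor(pool ModelPool, routeID string) (string, bool) {
	for _, r := range pool.Routes {
		if r.RouteID == routeID {
			return r.CallableModel, true
		}
	}
	return "", false
}

// ExamplePool builds a pool with placeholder values: one public model served
// by two routes whose callable names differ.
func ExamplePool() ModelPool {
	return ModelPool{
		PublicModel:          "deepseek-chat",
		CanonicalModelFamily: "deepseek-chat",
		Routes: []PoolRoute{
			{RouteID: "route-a", ProviderID: "provider-a", CallableModel: "callable-a", Priority: 1},
			{RouteID: "route-b", ProviderID: "provider-b", AdvertisedModel: "alias-b", CallableModel: "callable-b", Priority: 2},
		},
	}
}
```

The caller keeps addressing the pool by `PublicModel`; only the selected route decides which callable name goes upstream.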
## Mapping to the existing runtime
### Input fact layer
Sources:
- `pack.ProviderManifest`
- `probe.CapabilityProfile`
- `host/sub2api.CapabilityInventory`
- the existing logical group / route / route model configuration
### Output runtime layer
Maps to:
- `logical_group_models.public_model`
- `logical_group_routes.{route_id,priority,status,upstream_base_url_hint,cooldown_until}`
- `logical_group_route_models.{public_model,shadow_model,status}`
Conclusions:
- The Phase 2 minimal implementation only needs to add a normalization/folding layer
- The route resolve logic does not need to be redone
- route resolve keeps consuming `public_model -> route candidates`
- the model pool decides which route candidates go in, and which callable model each route maps to
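The output mapping can be sketched as a pure function from a pool route to the two runtime rows. This is an assumption-laden illustration: `RouteRow` and `RouteModelRow` only mirror the column names listed above, the `"active"` / `"disabled"` status strings are invented, and storing the callable name in `shadow_model` is an interpretation of this design rather than a confirmed behavior of the runtime.

```go
package main

// PoolRoute is a trimmed copy of the struct from the data-structure section.
type PoolRoute struct {
	RouteID       string
	BaseURL       string
	PublicModel   string
	CallableModel string
	Priority      int
	Schedulable   bool
	CooldownUntil string
}

// RouteRow mirrors the logical_group_routes columns listed above.
type RouteRow struct {
	RouteID             string
	Priority            int
	Status              string
	UpstreamBaseURLHint string
	CooldownUntil       string
}

// RouteModelRow mirrors the logical_group_route_models columns listed above.
type RouteModelRow struct {
	PublicModel string
	ShadowModel string
	Status      string
}

// statusFor maps schedulability to a status string. Both values are
// assumptions; the real status vocabulary is not given in this document.
func statusFor(schedulable bool) string {
	if schedulable {
		return "active"
	}
	return "disabled"
}

// ToRuntimeRows folds one pool route into the two runtime rows. The public
// model stays as the lookup key and the callable name goes to ShadowModel.
func ToRuntimeRows(r PoolRoute) (RouteRow, RouteModelRow) {
	status := statusFor(r.Schedulable)
	route := RouteRow{
		RouteID:             r.RouteID,
		Priority:            r.Priority,
		Status:              status,
		UpstreamBaseURLHint: r.BaseURL,
		CooldownUntil:       r.CooldownUntil,
	}
	model := RouteModelRow{
		PublicModel: r.PublicModel,
		ShadowModel: r.CallableModel,
		Status:      status,
	}
	return route, model
}
```

Because the function only reshapes data, route resolve can keep consuming `public_model -> route candidates` unchanged.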
## Minimal orchestration rules
1. One `public_model` can map to multiple routes
2. A route candidate must include at least:
   - provider_id
   - route_id
   - callable_model
   - priority
   - schedulable
   - supported models (the set of models this route can currently serve)
3. The support level is one of:
   - `supported-direct`
   - `supported-with-plugin-adapter`
   - `unsupported-by-host`
   - `upstream-unhealthy`
4. Only the following candidates may enter the default pool:
   - `supported-direct`
   - or an explicitly allowed `supported-with-plugin-adapter`
5. `unsupported-by-host` / `upstream-unhealthy` should not enter the active pool
6. When a probe finds an advertised/callable difference:
   - `public_model` stays stable
   - `shadow_model` / the runtime callable model follows the name that is actually callable
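Rules 3 to 5 can be sketched as a filter followed by a sort. This is a minimal sketch, not the contents of `model_pool.go`: `BuildActivePool`, `admissible`, and the `allowAdapter` flag are hypothetical, and two assumptions are made that the design does not state, namely that a lower `Priority` value is tried first and that unschedulable routes are also excluded.

```go
package main

import "sort"

// Support levels as listed in rule 3.
const (
	SupportedDirect            = "supported-direct"
	SupportedWithPluginAdapter = "supported-with-plugin-adapter"
	UnsupportedByHost          = "unsupported-by-host"
	UpstreamUnhealthy          = "upstream-unhealthy"
)

// PoolRoute is a trimmed copy of the struct from the data-structure section.
type PoolRoute struct {
	RouteID      string
	Priority     int
	Schedulable  bool
	SupportLevel string
}

// admissible applies rules 4 and 5: direct support always qualifies, adapter
// support qualifies only when explicitly allowed, everything else is rejected.
func admissible(r PoolRoute, allowAdapter bool) bool {
	switch r.SupportLevel {
	case SupportedDirect:
		return true
	case SupportedWithPluginAdapter:
		return allowAdapter
	default:
		return false
	}
}

// BuildActivePool keeps schedulable, admissible candidates and orders them by
// ascending Priority (assumed to mean "lower value is tried first").
func BuildActivePool(candidates []PoolRoute, allowAdapter bool) []PoolRoute {
	active := make([]PoolRoute, 0, len(candidates))
	for _, r := range candidates {
		if r.Schedulable && admissible(r, allowAdapter) {
			active = append(active, r)
		}
	}
	// A stable sort keeps the input order for routes with equal priority.
	sort.SliceStable(active, func(i, j int) bool {
		return active[i].Priority < active[j].Priority
	})
	return active
}
```

Keeping the adapter decision behind an explicit flag matches the "explicitly allowed" wording in rule 4; a per-route allowlist would be an equally valid choice.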
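## Minimal acceptance targets
The first round does not aim to get real-host dual-provider import working end to end. It first completes:
1. Unit level:
   - can build a pool from multiple provider/capability inputs
   - can filter out unhealthy / unsupported candidates
   - can sort by priority
   - can preserve the advertised/callable difference
2. Integration level:
   - can map pool routes onto the existing route resolve runtime
3. Documentation level:
   - EXECUTION_BOARD states that Phase 2 has entered the model pool abstraction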
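## Out of scope for this round
1. No new host DB schema
2. No changes to the stock sub2api backend
3. No direct implementation of the portal UI
4. No claim in this round that "real-host dual-provider pooling is fully usable"; that belongs to the later acceptance-script closed loop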
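## Next implementation steps, in order
1. First write failing tests in `internal/provision/model_pool_test.go`
2. Then implement `internal/provision/model_pool.go`
3. First verify the in-memory pool normalization logic
4. Then decide whether to wire runtime import / reconcile to this abstraction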