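# Model Pool Design (vNext.1 minimal closed loop, draft pending review)

> Status: the `internal/provision/model_pool.go` / `model_pool_test.go` files that correspond to this document were written before approval. For now they can only be treated as an experimental skeleton that has not passed review, not as a settled part of the release plan. Whether they are kept, modified, or reverted depends on the follow-up review conclusions in `docs/2026-06-04-vnext-planning-design-review.md` and `docs/2026-06-04-vnext-release-scope.md`.
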
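## Goal

Without changing the host backend's source code, upgrade the old mental model of "one logical model = one provider route" to "one logical model = one route pool with multiple candidate routes".

This design covers only the smallest closed loop that can actually land:

1. Build a model pool view from existing provider/probe/capability facts
2. Make the separation between the advertised model and the callable model explicit
3. Reuse the existing `logical_group_routes` / `logical_group_route_models` / `route_resolve` runtime surface instead of rewriting the router
4. Provide a unified data model for later host import orchestration, portal display, and real pooling acceptance tests
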
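## Current constraints

1. The existing runtime surface already supports:
   - multiple route candidates for the same `public_model`
   - priority
   - sticky
   - failure threshold / cooldown / failover
2. The current gap is not an inability to route; it is the lack of a unified pool abstraction that folds providers, capabilities, and model aliases into a view that can be orchestrated.
3. The `deepseek-chat-official` live probe showed:
   - `chat=200`
   - `responses=200`
   - but `models_has_smoke_model=false`

   In other words, the name exposed by `/v1/models` can differ from the model that the smoke test actually calls successfully.
4. Therefore a single `model_id` string can no longer serve at the same time as:
   - the externally displayed name
   - the logical model name
   - the name that is actually callable upstream
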
## Three-layer model identifiers

### 1. canonical family

The logical family name, used to aggregate across providers, for example:

- `gpt-5.4`
- `deepseek-chat`
- `MiniMax-M3`
- `kimi-k2.6`

### 2. advertised model

The model name shown to users, or observed in `/v1/models`.

It may be identical to the callable model, or it may be only an alias.

### 3. callable model

The model name actually sent to the upstream in chat/responses requests.

Rules:

- Pool selection uses the `canonical family` / `public model` as its entry point
- The route mapping must store the `callable model`
- If the name listed by `/v1/models` turns out to differ from the callable model, additionally record the `advertised model`

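These rules can be sketched as one constructor. This is a minimal Go illustration, assuming a hypothetical `ModelIdentity` type and `newModelIdentity` helper that are not part of the proposed `model_pool.go`: the callable name is always stored, and the advertised name is kept only when it differs.

```go
package main

// ModelIdentity (hypothetical) keeps the three identifier layers apart.
type ModelIdentity struct {
	CanonicalFamily string // logical family used for cross-provider aggregation
	PublicModel     string // pool entry point, stays stable
	CallableModel   string // name actually sent upstream; always stored
	AdvertisedModel string // set only when it differs from CallableModel
}

// newModelIdentity applies the three rules: the callable model is always
// recorded, and the advertised name only when it is non-empty and different.
func newModelIdentity(family, public, advertised, callable string) ModelIdentity {
	id := ModelIdentity{CanonicalFamily: family, PublicModel: public, CallableModel: callable}
	if advertised != "" && advertised != callable {
		id.AdvertisedModel = advertised
	}
	return id
}
```

Leaving `AdvertisedModel` empty when the names match keeps "there is an alias" as an explicit signal rather than a duplicated string.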
## Minimal data structures

Proposal: add `internal/provision/model_pool.go` and start with an in-memory abstraction only, without changing the DB schema yet.

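```go
package provision

// ModelPool groups every candidate route that serves one public model.
type ModelPool struct {
	PublicModel          string      // stable name exposed to clients
	CanonicalModelFamily string      // logical family for cross-provider aggregation, e.g. "deepseek-chat"
	Routes               []PoolRoute // candidate routes for PublicModel
}

// PoolRoute is one candidate upstream route inside a ModelPool.
type PoolRoute struct {
	RouteID           string
	ProviderID        string
	DisplayName       string
	BaseURL           string
	PublicModel       string   // pool entry point, kept stable
	AdvertisedModel   string   // name seen in /v1/models, recorded when it differs from CallableModel
	CallableModel     string   // name actually sent in upstream chat/responses requests
	Priority          int
	Schedulable       bool
	SupportLevel      string   // supported-direct, supported-with-plugin-adapter, unsupported-by-host or upstream-unhealthy
	SupportedModels   []string // models this route can currently serve
	SupportsChat      bool
	SupportsResponses bool
	CooldownUntil     string   // cooldown expiry, kept as a string in this draft
	DisableReason     string
	KnownAdvisories   []string
}
```
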
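To illustrate how flat route facts fold into `ModelPool` values, here is a minimal in-memory sketch. It assumes a hypothetical `groupByPublicModel` helper and trims `PoolRoute` to the fields it needs; this document does not say how `CanonicalModelFamily` is derived, so the sketch takes it from a caller-supplied map.

```go
package main

// PoolRoute is trimmed to the fields this sketch needs.
type PoolRoute struct {
	RouteID       string
	PublicModel   string
	CallableModel string
	Priority      int
}

// ModelPool mirrors the struct above.
type ModelPool struct {
	PublicModel          string
	CanonicalModelFamily string
	Routes               []PoolRoute
}

// groupByPublicModel (hypothetical) folds flat routes into one pool per
// public model, preserving input order within each pool. family maps a
// public model to its canonical family; missing entries stay empty.
func groupByPublicModel(routes []PoolRoute, family map[string]string) map[string]*ModelPool {
	pools := make(map[string]*ModelPool)
	for _, r := range routes {
		p, ok := pools[r.PublicModel]
		if !ok {
			p = &ModelPool{PublicModel: r.PublicModel, CanonicalModelFamily: family[r.PublicModel]}
			pools[r.PublicModel] = p
		}
		p.Routes = append(p.Routes, r)
	}
	return pools
}
```

Grouping is the only step here; filtering and ordering are left to separate functions so each can be tested on its own.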
## Mapping to the existing runtime surface

### Input fact layer

Sources:

- `pack.ProviderManifest`
- `probe.CapabilityProfile`
- `host/sub2api.CapabilityInventory`
- the existing logical group / route / route model configuration

### Output runtime layer

Maps to:

- `logical_group_models.public_model`
- `logical_group_routes.{route_id,priority,status,upstream_base_url_hint,cooldown_until}`
- `logical_group_route_models.{public_model,shadow_model,status}`

Conclusions:

- The minimal Phase 2 implementation only needs to add a normalization/folding layer
- The route resolve logic does not need to be redone
- Route resolve keeps consuming `public_model -> route candidates`
- The model pool decides which route candidates go into that list, and which callable model each route maps to

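The conclusions above can be sketched as a pure mapping from one `PoolRoute` to the two existing runtime rows. This is a minimal sketch under stated assumptions: the `routeRow` / `routeModelRow` structs, the `toRuntimeRows` helper, the use of `BaseURL` as the `upstream_base_url_hint`, and the `"active"` / `"disabled"` status strings are all hypothetical; only the column names come from the list above. In line with orchestration rule 6, `shadow_model` is filled from the callable model while `public_model` stays unchanged.

```go
package main

// PoolRoute is trimmed to the fields the runtime rows need.
type PoolRoute struct {
	RouteID       string
	BaseURL       string
	PublicModel   string
	CallableModel string
	Priority      int
	Schedulable   bool
	CooldownUntil string
}

// routeRow mirrors logical_group_routes.{route_id,priority,status,upstream_base_url_hint,cooldown_until}.
type routeRow struct {
	RouteID             string
	Priority            int
	Status              string
	UpstreamBaseURLHint string
	CooldownUntil       string
}

// routeModelRow mirrors logical_group_route_models.{public_model,shadow_model,status}.
type routeModelRow struct {
	PublicModel string
	ShadowModel string
	Status      string
}

// toRuntimeRows (hypothetical) maps one pool route onto the existing tables,
// so route resolve can keep consuming public_model -> route candidates.
func toRuntimeRows(r PoolRoute) (routeRow, routeModelRow) {
	status := "disabled" // assumed status vocabulary, not taken from the schema
	if r.Schedulable {
		status = "active"
	}
	route := routeRow{
		RouteID:             r.RouteID,
		Priority:            r.Priority,
		Status:              status,
		UpstreamBaseURLHint: r.BaseURL,
		CooldownUntil:       r.CooldownUntil,
	}
	model := routeModelRow{
		PublicModel: r.PublicModel,   // stays stable for clients
		ShadowModel: r.CallableModel, // the name actually sent upstream
		Status:      status,
	}
	return route, model
}
```

Keeping the mapping free of I/O means the folding layer can be unit-tested without touching the host database.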
## Minimal orchestration rules

1. One `public_model` can map to multiple routes
2. A route candidate must include at least:
   - provider_id
   - route_id
   - callable_model
   - priority
   - schedulable
   - supported models (the set of models this route can currently serve)
3. The support level is one of:
   - `supported-direct`
   - `supported-with-plugin-adapter`
   - `unsupported-by-host`
   - `upstream-unhealthy`
4. Only the following candidates may enter the default pool:
   - `supported-direct`
   - or `supported-with-plugin-adapter`, when explicitly allowed
5. `unsupported-by-host` / `upstream-unhealthy` candidates should not enter the active pool
6. When a probe finds an advertised/callable mismatch:
   - `public_model` stays stable
   - `shadow_model` (the runtime callable model) follows the real callable name

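Rules 3 to 5 amount to a small admission predicate. This is a minimal Go sketch, assuming the support levels are carried as plain strings (as in `PoolRoute.SupportLevel`) and that the "explicitly allowed" decision for `supported-with-plugin-adapter` arrives as a hypothetical boolean flag; how that allowance is configured is not decided in this document.

```go
package main

// Support levels from rule 3.
const (
	SupportedDirect            = "supported-direct"
	SupportedWithPluginAdapter = "supported-with-plugin-adapter"
	UnsupportedByHost          = "unsupported-by-host"
	UpstreamUnhealthy          = "upstream-unhealthy"
)

// admitToDefaultPool applies rules 4 and 5. allowPluginAdapter is a
// hypothetical flag standing in for the "explicitly allowed" decision.
func admitToDefaultPool(level string, allowPluginAdapter bool) bool {
	switch level {
	case SupportedDirect:
		return true
	case SupportedWithPluginAdapter:
		return allowPluginAdapter
	default:
		// unsupported-by-host, upstream-unhealthy and any unknown value stay out.
		return false
	}
}
```

Rejecting unknown values by default keeps a misspelled or newly introduced level out of the active pool until someone classifies it.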
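## Minimal acceptance targets

The first round does not try to get real-host dual-provider import working end to end. It first completes:

1. Unit level:
   - build a pool from multiple provider/capability inputs
   - filter out unhealthy / unsupported candidates
   - sort by priority
   - preserve advertised/callable differences
2. Integration level:
   - map pool routes onto the existing route resolve runtime surface
3. Documentation level:
   - EXECUTION_BOARD states explicitly that Phase 2 has moved into the model pool abstraction
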
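The unit-level items can be exercised against a build step like the one in this minimal Go sketch. It assumes a hypothetical `buildPool` helper and a trimmed `PoolRoute`; the admission check is passed in as a function so it can be tested separately. It also assumes a lower `Priority` value means higher preference, since the ordering direction is not fixed in this document.

```go
package main

import "sort"

// PoolRoute is trimmed to the fields the unit-level checks look at.
type PoolRoute struct {
	RouteID         string
	AdvertisedModel string
	CallableModel   string
	Priority        int
	SupportLevel    string
}

// buildPool (hypothetical) keeps the admitted candidates and orders them by
// ascending Priority; equal priorities keep their input order.
func buildPool(candidates []PoolRoute, admit func(PoolRoute) bool) []PoolRoute {
	pool := make([]PoolRoute, 0, len(candidates))
	for _, c := range candidates {
		if admit(c) {
			pool = append(pool, c) // copied as-is, so advertised/callable stay paired
		}
	}
	sort.SliceStable(pool, func(i, j int) bool { return pool[i].Priority < pool[j].Priority })
	return pool
}
```

Sorting a fresh slice leaves the caller's input untouched, and `sort.SliceStable` makes the order of equal-priority routes deterministic, which keeps test expectations stable.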
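## Out of scope for this round

1. No new host DB schema
2. No changes to the stock sub2api backend
3. No direct implementation of the portal UI
4. No claim in this round that "real-host dual-provider pooling is fully usable"; that belongs to the later acceptance-script closed loop

## Next implementation steps

1. First write the failing tests in `internal/provision/model_pool_test.go`
2. Then implement `internal/provision/model_pool.go`
3. Verify the in-memory pool normalization logic first
4. Then decide whether to wire runtime import / reconcile onto this abstraction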