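# Plugin-Layer Logical Groups and Sticky Routing Design

Date: 2026-05-28

## Goals

**Without modifying the host source code**, the relay-manager plugin layer provides a "logical group + multi-route scheduling" capability so that:

- The frontend sees only one logical group
- The plugin automatically routes requests to the multiple real routes behind it
- Requests in the same conversation stay on the same route where possible, improving cache hit rates on the host and upstream
- The host continues to carry single-route groups only and does no multi-route aggregation

The real deployment topology this design corresponds to:

```text
User
  -> relay-manager plugin router
    -> sub2api host (shadow group A / B / C)
      -> upstream route A / B / C
```
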
## Core Concepts

### 1. Logical group `logical_group`

A logical group is the "product group" that the plugin layer exposes to users, for example:

- `gpt-shared`
- `deepseek-shared`

It is not a real `group` in the host; it is an aggregate object owned by the plugin.

Responsibilities:

- Is what the frontend displays
- Binds a set of public models
- Binds a set of routes
- Carries the route policy, sticky policy, and fallback policy

### 2. Route `route`

A route is one concrete outbound line under a logical group, for example:

- `asxs`
- `codex2api`
- `official`

Responsibilities:

- Points to one real host shadow group
- Declares which public models it supports
- Provides priority, weight, health status, and circuit-breaker status

### 3. Host shadow group `shadow_group`

A shadow group is a real group in the host, for example:

- `gpt-shared__asxs`
- `gpt-shared__codex2api`

Responsibilities:

- Holds the account pool for a single route
- Continues to use the host's existing in-group sticky/account scheduling
- Is no longer exposed directly to users

## Data Structures

### A. Logical group definition

Suggested table persisted by the plugin layer: `logical_groups`

```json
{
  "logical_group_id": "gpt-shared",
  "display_name": "GPT Shared",
  "status": "active",
  "description": "GPT multi-route logical group",
  "public_models": [
    "gpt-5.4",
    "gpt-5.4-mini"
  ],
  "default_route_policy": "priority",
  "sticky_policy": {
    "mode": "conversation_preferred",
    "conversation_ttl_seconds": 7200,
    "user_model_ttl_seconds": 1800
  },
  "failover_policy": {
    "consecutive_retryable_failures": 2,
    "cooldown_seconds": 600
  },
  "created_at": "2026-05-28T00:00:00Z",
  "updated_at": "2026-05-28T00:00:00Z"
}
```

Field notes:

- `public_models`: the public models exposed to users
- `default_route_policy`: the first version should support only `priority`
- `sticky_policy`: route-level stickiness settings
- `failover_policy`: circuit-breaking and switchover thresholds

### B. Route definition

Suggested table persisted by the plugin layer: `logical_group_routes`

```json
{
  "route_id": "gpt-shared.asxs",
  "logical_group_id": "gpt-shared",
  "name": "asxs",
  "status": "active",
  "priority": 10,
  "weight": 100,
  "shadow_group_id": "gpt-shared__asxs",
  "shadow_group_host_id": "remote43-route-lab-18169",
  "supported_public_models": [
    "gpt-5.4",
    "gpt-5.4-mini"
  ],
  "upstream_base_url_hint": "https://api.asxs.top/v1",
  "retryable_error_classes": [
    "upstream_5xx",
    "upstream_429",
    "gateway_timeout"
  ],
  "non_retryable_error_classes": [
    "model_unsupported",
    "invalid_api_key",
    "group_disabled"
  ],
  "cooldown_until": null,
  "metadata": {}
}
```

Field notes:

- `priority`: a smaller value means higher priority
- `shadow_group_id`: the real host group this route maps to
- `shadow_group_host_id`: the host instance that owns that group
- `cooldown_until`: the time at which the circuit-breaker cooldown ends

### C. Model coverage definition

The first version does not need a separate table; it can use `route.supported_public_models` directly.
If per-route model differences need to be supported later, this can be extended into:

- `logical_group_route_models`

For example:

```json
{
  "route_id": "gpt-shared.codex2api",
  "public_model": "gpt-5.4",
  "shadow_model": "gpt-5.4",
  "enabled": true
}
```

The first version should not add this complexity up front. Assume instead that:

- `public_model == shadow upstream model`
- or the shadow model is already resolved by the host's provider/account mapping

## Runtime Context

When the plugin layer handles a request, it should build a single unified context:

```json
{
  "logical_group_id": "gpt-shared",
  "public_model": "gpt-5.4",
  "user_id": "u_123",
  "api_key_id": "k_456",
  "conversation_id": "conv_789",
  "session_id": "sess_789",
  "request_id": "req_abc",
  "sticky_key": "computed-later",
  "selected_route_id": "gpt-shared.asxs",
  "selected_shadow_group_id": "gpt-shared__asxs"
}
```

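## Routing Flow

### Phase 1: first support "same logical group, multiple routes, with either different public model names or the same model name and an explicit priority"

The minimal flow for one request in the plugin layer should be:

```text
1. Parse the request
2. Determine logical_group_id
3. Determine public_model
4. Generate the sticky key
5. Look up the sticky route first
6. If the sticky route is usable, keep using it
7. Otherwise select a candidate route according to the route policy
8. Write the sticky binding
9. Forward to the corresponding shadow group
10. Record the result; update the failure count / cooldown state if needed
```
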
### Detailed steps

#### 1. Parse the user request

Extract from the request:

- The logical group
- The public model name
- The user identifier
- The conversation/session identifier

Recommended precedence:

- `conversation_id`
- `session_id`
- A stable user identifier in the request body metadata
- API key / user id

#### 2. Load the logical group configuration

Read:

- `logical_groups`
- `logical_group_routes`

Keep only entries where:

- `logical_group.status == active`
- `route.status == active`
- The route is not cooling down (the current time is past `cooldown_until`, or it is unset)
- `public_model` is in `supported_public_models`

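The four filters above can be sketched as a single function. This is a minimal illustration, not the plugin's actual code: the function name `eligible_routes` and the use of plain dicts are assumptions, and timestamps are assumed to be ISO 8601 strings with a trailing `Z`, as in the JSON samples.

```python
from datetime import datetime, timezone


def eligible_routes(group, routes, public_model, now=None):
    """Return the routes that pass all filters, sorted by ascending priority."""
    now = now or datetime.now(timezone.utc)
    # Filter 1: the logical group itself must be active.
    if group.get("status") != "active":
        return []
    kept = []
    for route in routes:
        # Filter 2: the route must be active.
        if route.get("status") != "active":
            continue
        # Filter 3: skip routes that are still cooling down.
        cooldown_until = route.get("cooldown_until")
        if cooldown_until is not None:
            until = datetime.fromisoformat(cooldown_until.replace("Z", "+00:00"))
            if now < until:
                continue
        # Filter 4: the route must support the requested public model.
        if public_model not in route.get("supported_public_models", []):
            continue
        kept.append(route)
    # A smaller priority value means higher priority.
    return sorted(kept, key=lambda r: r["priority"])
```

Returning the list already sorted by `priority` lets the later route-selection step simply take the first usable entry.
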
#### 3. Generate the sticky key

Following the "strongest stickiness first" principle:

1. Conversation level
2. Session level
3. User+model level
4. Stable-hash fallback

### Recommended key generation rules

#### 3.1 Conversation-level sticky

When `conversation_id` is present:

```text
lg:{logical_group_id}:m:{public_model}:conv:{conversation_id}
```

#### 3.2 Session-level sticky

When `session_id` is present:

```text
lg:{logical_group_id}:m:{public_model}:sess:{session_id}
```

#### 3.3 User-model-level sticky

When only a user identifier is available:

```text
lg:{logical_group_id}:m:{public_model}:user:{user_or_api_key_id}
```

#### 3.4 Hash-bucket fallback

When there is no conversation or session identifier at all:

```text
lg:{logical_group_id}:m:{public_model}:bucket:{stable_hash(user_or_ip_or_api_key)%128}
```

This is not strong session stickiness; it only avoids random jitter on every request.

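The key rules above can be sketched as one function that walks the tiers in order. This is an illustrative sketch under assumptions: `build_sticky_key` and `stable_bucket` are hypothetical names, `client_ip` is a hypothetical context field used only for the bucket fallback, and SHA-256 is one possible choice for the stable hash (the document does not name one).

```python
import hashlib

BUCKETS = 128  # bucket count taken from the "% 128" rule above


def stable_bucket(identity: str, buckets: int = BUCKETS) -> int:
    """Map an identity to a bucket that is stable across processes."""
    # hashlib is used instead of built-in hash(), which is salted per process for str.
    digest = hashlib.sha256(identity.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def build_sticky_key(ctx: dict) -> tuple:
    """Return (key_type, sticky_key), trying the strongest tier first."""
    prefix = f"lg:{ctx['logical_group_id']}:m:{ctx['public_model']}"
    if ctx.get("conversation_id"):
        return "conversation", f"{prefix}:conv:{ctx['conversation_id']}"
    if ctx.get("session_id"):
        return "session", f"{prefix}:sess:{ctx['session_id']}"
    user = ctx.get("user_id") or ctx.get("api_key_id")
    if user:
        return "user_model", f"{prefix}:user:{user}"
    # Fallback: no conversation, session, or user identifier is available.
    identity = ctx.get("client_ip") or "anonymous"  # hypothetical field
    return "bucket", f"{prefix}:bucket:{stable_bucket(identity)}"
```

Returning the key type alongside the key makes it easy to pick the matching TTL and to log `sticky_key_type` later.
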
## Redis Key Design

### A. Sticky route key

Suggested value (stored under the sticky key generated by the rules above):

```json
{
  "route_id": "gpt-shared.asxs",
  "shadow_group_id": "gpt-shared__asxs",
  "public_model": "gpt-5.4",
  "bound_at": "2026-05-28T12:00:00Z",
  "expires_at": "2026-05-28T14:00:00Z",
  "last_ok_at": "2026-05-28T12:34:00Z",
  "fail_count": 0
}
```

TTL:

- Conversation/session level: default `7200s`
- User-model level: default `1800s`
- Bucket level: default `600s`

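As a rough sketch, the sticky value and its TTL could be produced together so that `expires_at` always agrees with the key's expiry. The function name `make_sticky_value`, the key-type labels, and the choice to leave `last_ok_at` empty at bind time are all assumptions made for illustration.

```python
import json
from datetime import datetime, timedelta, timezone

# Default TTLs from the list above, keyed by a hypothetical key-type label.
DEFAULT_TTL_SECONDS = {
    "conversation": 7200,
    "session": 7200,
    "user_model": 1800,
    "bucket": 600,
}


def iso(ts: datetime) -> str:
    """Format a UTC datetime like the timestamps in the JSON samples."""
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


def make_sticky_value(route: dict, public_model: str, key_type: str, now: datetime):
    """Return (json_payload, ttl_seconds) for a new sticky binding."""
    ttl = DEFAULT_TTL_SECONDS[key_type]
    value = {
        "route_id": route["route_id"],
        "shadow_group_id": route["shadow_group_id"],
        "public_model": public_model,
        "bound_at": iso(now),
        "expires_at": iso(now + timedelta(seconds=ttl)),
        "last_ok_at": None,  # assumption: no successful request recorded yet
        "fail_count": 0,
    }
    return json.dumps(value), ttl
```

With a Redis client such as redis-py, the pair would typically be written with something like `client.set(sticky_key, payload, ex=ttl)`, so the key expires on its own.
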
### B. Route failure counter

key:

```text
routefail:{route_id}
```

value:

```json
{
  "consecutive_retryable_failures": 1,
  "last_failure_at": "2026-05-28T12:35:00Z",
  "last_error_class": "upstream_5xx"
}
```

Purpose:

- Supports route circuit breaking
- Stays decoupled from the sticky key

### C. Route cooldown key

key:

```text
routecool:{route_id}
```

value:

```json
{
  "cooldown_until": "2026-05-28T12:45:00Z",
  "reason": "consecutive_retryable_failures"
}
```

Purpose:

- Once a route enters its cooldown period, it is excluded from new sticky selection
- Existing sticky bindings that point at the route must also be re-evaluated on their next request

### D. Optional: preferred-route cache per logical group and model

key:

```text
routepref:{logical_group_id}:{public_model}
```

value:

```json
{
  "route_id": "gpt-shared.asxs",
  "updated_at": "2026-05-28T12:00:00Z"
}
```

The first version can read the DB directly and does not necessarily need Redis here.
This cache layer is only worth adding once route configuration is hot-updated frequently and request volume has clearly grown.

## Route Selection Algorithm

### Recommended for the first version: `priority + sticky + failover`

Do not start with complex weights, real-time load, or cost optimization.
The most robust algorithm for the first version:

1. Look up the sticky route first
2. If the sticky route is currently healthy and supports the model, keep using it
3. Otherwise pick the first healthy route in ascending `priority` order
4. After success, write the new sticky binding
5. On failure, decide whether to switch based on the error class

### Pseudocode

```text
resolveRoute(ctx):
  candidates = active routes for logical_group + public_model
  sticky = redis.get(sticky_key)

  if sticky exists:
    route = find route in candidates by sticky.route_id
    if route exists and is healthy and not cooling down:
      return route

  sort candidates by priority asc
  for route in candidates:
    if route healthy and not cooling down:
      redis.set(sticky_key, route, ttl)
      return route

  return no_route_available
```

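The pseudocode can be turned into a small runnable sketch. Everything here is illustrative: `MemoryStore` is an in-memory stand-in for Redis so the example runs without a server, the `healthy` field is a hypothetical flag, and `cooldown_until` is simplified to epoch seconds rather than the ISO timestamps used in the JSON samples.

```python
import time


class MemoryStore:
    """In-memory stand-in for Redis GET/SET with expiry (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key, now=None):
        now = time.time() if now is None else now
        item = self._data.get(key)
        if item is None or item[1] <= now:
            self._data.pop(key, None)  # drop expired entries lazily
            return None
        return item[0]

    def set(self, key, value, ttl, now=None):
        now = time.time() if now is None else now
        self._data[key] = (value, now + ttl)


def is_usable(route, now):
    """A route is usable if it is healthy and not inside its cooldown window."""
    cooldown_until = route.get("cooldown_until")  # epoch seconds in this sketch
    return route.get("healthy", True) and (cooldown_until is None or cooldown_until <= now)


def resolve_route(candidates, store, sticky_key, ttl, now=None):
    """Return (route, sticky_hit); route is None when nothing is usable."""
    now = time.time() if now is None else now
    by_id = {r["route_id"]: r for r in candidates}
    sticky_route_id = store.get(sticky_key, now)
    # Reuse the sticky route only if it is still a candidate and still usable.
    if sticky_route_id in by_id and is_usable(by_id[sticky_route_id], now):
        return by_id[sticky_route_id], True
    # Otherwise take the first usable route in ascending priority order.
    for route in sorted(candidates, key=lambda r: r["priority"]):
        if is_usable(route, now):
            store.set(sticky_key, route["route_id"], ttl, now)
            return route, False
    return None, False
```

Looking the sticky route up among `candidates` means a binding to a route that no longer supports the model, or that was disabled, is ignored rather than reused.
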
## Error Classification and Failover

### A. Retryable failures `retryable`

Suggested to include:

- Host `502/503/504`
- upstream `5xx`
- upstream timeout
- upstream `429`
- route host temporarily unavailable

Handling:

- A single failure first increments `fail_count`
- Reaching the threshold triggers route fallback
- The old route enters `cooldown`

### B. Non-retryable failures `non-retryable`

Suggested to include:

- invalid api key
- model unsupported
- group disabled
- account disabled
- provider misconfigured

Handling:

- Immediately mark the current route as unusable for that model
- Try the next route directly
- If there is no other route, return an explicit error

### C. Fallback rules

Suggested defaults:

- Switch route when `consecutive_retryable_failures >= 2`
- `cooldown = 600s`

These values are conservative and simple, and sufficient for the first version.

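A minimal sketch of how the two error classes and the fallback thresholds could drive a per-route state update. The function name `record_result` and the returned action labels are hypothetical; resetting the counter on success and after a cooldown is triggered is an assumption implied by "consecutive", not something the design states. Times are epoch seconds for simplicity.

```python
# Error class names taken from the route definition sample.
RETRYABLE = {"upstream_5xx", "upstream_429", "gateway_timeout"}
NON_RETRYABLE = {"model_unsupported", "invalid_api_key", "group_disabled"}


def record_result(state, error_class, now, threshold=2, cooldown_seconds=600):
    """Update per-route failure state in place and return the action to take."""
    if error_class is None:
        # Success: the failure streak is broken (assumption).
        state["consecutive_retryable_failures"] = 0
        return "ok"
    if error_class in NON_RETRYABLE:
        # Switch immediately; no counting is needed.
        return "switch_now"
    if error_class in RETRYABLE:
        count = state.get("consecutive_retryable_failures", 0) + 1
        state["consecutive_retryable_failures"] = count
        state["last_failure_at"] = now
        state["last_error_class"] = error_class
        if count >= threshold:
            # Threshold reached: put the route into cooldown and fall back.
            state["cooldown_until"] = now + cooldown_seconds
            state["consecutive_retryable_failures"] = 0
            return "switch_and_cooldown"
        return "stay"
    return "unclassified"
```

In a Redis-backed version, the counter and cooldown would live under the `routefail:{route_id}` and `routecool:{route_id}` keys described earlier.
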
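## Sticky Policy

### 1. Why the plugin layer must implement its own sticky

The host's current sticky is keyed by the real `group_id`, not by logical group.
If the plugin layer does not first send a conversation consistently to the same shadow group:

- The host re-selects accounts across different shadow groups
- Upstream cache hit rates drop
- The conversation experience becomes unstable

So the correct order is:

1. The plugin layer does route sticky first
2. The host layer then does account sticky within the shadow group

### 2. TTL recommendations

Suggested defaults:

- conversation sticky: `2h`
- session sticky: `2h`
- user-model sticky: `30m`
- bucket sticky: `10m`

Principles:

- Long conversations favor stability
- Stateless requests should minimize the risk of staying pinned to one route for a long time

### 3. When to refresh the sticky TTL

Refresh only when:

- The current request succeeded
- The current route has not entered cooldown

Do not blindly refresh the TTL on failed requests; otherwise a bad route stays pinned.
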
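The refresh rule can be written as a small predicate. This sketch assumes both conditions must hold together, and uses a hypothetical function name with epoch-second timestamps.

```python
from typing import Optional


def should_refresh_sticky(request_ok: bool, cooldown_until: Optional[float], now: float) -> bool:
    """Return True only if the request succeeded and the route is not cooling down."""
    in_cooldown = cooldown_until is not None and now < cooldown_until
    return request_ok and not in_cooldown
```

A caller would extend the sticky key's expiry only when this returns `True`, leaving failed bindings to age out on their original TTL.
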
## Forwarding Behavior

After selecting a route, the plugin should forward the request to:

- The specified host instance
- The user entry point/API key of the specified shadow group

The first version should stay simple:

- One route maps to one host shadow group
- One shadow group maps to its own set of host API key / group token / user key

What the plugin needs to do:

- Keep the original request body
- Preserve the conversation/session identifiers
- Optionally attach a few internal debug headers, for example:
  - `X-Relay-Logical-Group`
  - `X-Relay-Route-ID`
  - `X-Relay-Shadow-Group`

These headers are for internal auditing only and are not exposed to end users.

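One way to handle the debug headers is to add them on the way to the host and strip anything with the same prefix before a response reaches the user. The helper names below are hypothetical, and the context fields follow the runtime context sample.

```python
INTERNAL_HEADER_PREFIX = "x-relay-"


def add_debug_headers(headers: dict, ctx: dict) -> dict:
    """Return a copy of the outbound headers with the internal debug headers attached."""
    out = dict(headers)
    out["X-Relay-Logical-Group"] = ctx["logical_group_id"]
    out["X-Relay-Route-ID"] = ctx["selected_route_id"]
    out["X-Relay-Shadow-Group"] = ctx["selected_shadow_group_id"]
    return out


def strip_internal_headers(headers: dict) -> dict:
    """Drop internal headers so they are never exposed to end users."""
    # Header names are case-insensitive, so compare in lowercase.
    return {
        name: value
        for name, value in headers.items()
        if not name.lower().startswith(INTERNAL_HEADER_PREFIX)
    }
```

Both helpers return new dicts instead of mutating their input, which keeps the original request headers intact.
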
## Minimum Observability Fields

The plugin layer should record at least:

- `request_id`
- `logical_group_id`
- `public_model`
- `selected_route_id`
- `selected_shadow_group_id`
- `sticky_key_type`
- `sticky_hit`
- `fallback_used`
- `error_class`
- `upstream_status`
- `latency_ms`

Only then is there evidence to answer questions like "why did this request go to codex2api instead of asxs".

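The field list maps naturally onto a structured log record. The sketch below is one possible shape: the class name `RouteDecisionLog` and the field types are assumptions, while the field names come from the list above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RouteDecisionLog:
    """One log record per routed request (field types are assumptions)."""

    request_id: str
    logical_group_id: str
    public_model: str
    selected_route_id: str
    selected_shadow_group_id: str
    sticky_key_type: str
    sticky_hit: bool
    fallback_used: bool
    error_class: Optional[str]
    upstream_status: Optional[int]
    latency_ms: float

    def to_json(self) -> str:
        # sort_keys keeps the output stable, which helps when diffing log lines.
        return json.dumps(asdict(self), sort_keys=True)
```

Emitting one JSON line per request keeps the record easy to filter by `selected_route_id` or `sticky_hit` when investigating a routing decision.
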
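## Recommended Scope for the First Version

The first version should implement only the following, to avoid over-engineering:

- Logical groups
- Route lists
- Priority-based route selection
- Redis sticky
- Redis cooldown
- retryable / non-retryable classification
- Forwarding to the shadow group

The first version should not implement:

- Dynamic weight learning
- Real-time cost optimization
- Automatic A/B testing
- Token-level cache sharing across routes
- Complex multi-armed bandits
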
## Example: asxs + codex2api

### Logical group

```json
{
  "logical_group_id": "gpt-shared",
  "display_name": "GPT Shared",
  "public_models": ["gpt-5.4", "gpt-5.4-mini"]
}
```

### routes

```json
[
  {
    "route_id": "gpt-shared.asxs",
    "priority": 10,
    "shadow_group_id": "gpt-shared__asxs",
    "supported_public_models": ["gpt-5.4", "gpt-5.4-mini"]
  },
  {
    "route_id": "gpt-shared.codex2api",
    "priority": 20,
    "shadow_group_id": "gpt-shared__codex2api",
    "supported_public_models": ["gpt-5.4", "gpt-5.4-mini"]
  }
]
```

Effect:

- `asxs` is preferred by default
- When `asxs` fails, traffic switches to `codex2api`
- The same conversation keeps hitting the route that was first selected successfully, where possible

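## One-Sentence Conclusion

The essence of this design:

- Expose one `logical_group` externally
- Maintain multiple `route -> shadow_group` mappings internally
- The plugin layer does **route sticky** first
- The host then does **account sticky** within the shadow group

Only this way can the design, without modifying the host source code, get as close as possible to the target of "one group, multiple URLs, strong stickiness, high cache hit rate".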