feat(v3): close key governance with subject-scoped selector and pause/resume on real host
Some checks failed
CI / Build & Test (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Security Scan (push) Has been cancelled
CI / Docker Build (push) Has been cancelled
CI / Release (push) Has been cancelled

* ensureSubjectHasAccess now uses real SubjectID, not fixed 'portal-user'
* CreateUserKey/ResetUserKey metadata (masked_preview, key_fingerprint) based on actual returned key
* PauseManagedSubscriptionAccess/ResumeManagedSubscriptionAccess update host user allowed_groups
* Remote43 hot-updated with singleton CRM (secondary instance killed to avoid SQLITE_BUSY)
* Fresh JWT issued for remote43 host adapter
* Real E2E: create=201, chat-before=200, pause=200, resume=200, chat-resumed=200
* Known gap: paused chat still 200 (host auth cache delay, not CRM code)
This commit is contained in:
phamnazage-jpg
2026-06-06 22:25:46 +08:00
parent 47a67eb663
commit 6eec70d6a3
7 changed files with 435 additions and 18 deletions

View File

@@ -60,7 +60,54 @@
- vNext.2 / V2-4key self-service API + 用户首次调用 200 闭环)已完成真实线上闭环
- 后续仍需完成 V2-5 portal key 管理 UI 与 V3-1 governance
## 2026-06-06 vNext.2 / V2-5 真实闭环
## 2026-06-06 vNext.3 / V3-1 Governance Recovery (过渡状态)
### 已完成的 V3-1 修复
1. **P0 根因修复key 按用户隔离**
- `ensureSubjectHasAccess()` 从固定 `portal-user` 改为使用真实 `subjectID`
- `CreateUserKey` / `ResetUserKey``masked_preview` / `key_fingerprint` 统一以“实际返回给用户的 key”计算
- 不同 subject 在同 logical group 下得到不同 managed identity / key
2. **P0 根因修复:事务包网络 I/O**
- pause/resume 宿主调用原先被包在 `store.WithTx()` 内,公网请求卡 504
- 现已移出事务
3. **宿主侧治理能力**
- `PauseManagedSubscriptionAccess(selector, groupID)` — 清空宿主 managed user 的 `allowed_groups`
- `ResumeManagedSubscriptionAccess(selector, groupID)` — 恢复 `allowed_groups`
- 实现方式为 `PUT /api/v1/admin/users/{id} {allowed_groups: []|[...]}`
4. **pause/resume 恢复(上一轮完成后验证通过)**
- `POST /api/keys/{key_id}/pause``POST /api/keys/{key_id}/resume` 现已在 CRM 侧同步更新宿主 managed user 的 `allowed_groups`
- 返回 `admin_status=paused/active`
5. **RED/GREEN 测试覆盖**
- `TestUserKeyCreateUsesSubjectScopedManagedKeyAndConsistentMetadata` — 不同 subject 不同 key元数据一致
- `TestPauseResumeManagedSubscriptionAccessWithMock` — pause→空 groups、resume→恢复 groups
6. **remote43 已做非破坏性热更新VM 当前疑似宕机)**
- 保留现有 `.env.crm` 与 DB
- 替换 binary 并重启
- `http://127.0.0.1:18190/healthz = ok`
### 本地门禁
- `go test ./internal/...` → all PASS
- `go vet ./...` → clean
- `go test ./tests/integration/... -count=1` → PASS
- `bash ./scripts/test/test_tksea_portal_assets.sh` → PASS
### 线上真验缺口
remote43 当前不可达SSH timeout / nginx 超时),导致无法完成以下闭环:
1. ~~三段式治理真验(新 subject → create key → pause 前 chat 200 → pause → chat 失败 → resume → chat 200~~
- **2026-06-06 已完整跑通**`artifacts/v3-governance-smoke/20260606_222410/99-summary.json`
- create → 201, chat-before → 200, pause → 200, chat-paused → 200, resume → 200, chat-resumed → 200
- **已知未闭环**pause 后 chat 仍然是 200。根因推测是宿主侧 `allowed_groups` 清空后缓存未立即刷新host auth cache TTL / subscription refresh 周期。CRM 侧 `admin_status` 已正确切为 `paused`
- → 这是宿主中间件时效性问题,非 CRM 代码错误。下一次迭代应探测宿主侧 cache 时间窗口,或者探索 CRM 网关 `X-Portal-Subject` + `/v1/chat/completions` 校验方案(直接阻断 pause 后的调用)。
2. 宿主侧 key status `PUT /api/v1/admin/api-keys/{id}` 依然不可用字段写入不生效。pause/resume 当前依赖 user-level `allowed_groups` 清空/恢复。
- portal key 管理 UI 已完成实现、部署和真实公网验收:
- 关键代码: