2026-05-01 17:44:53 +08:00
|
|
|
|
# AI-Customer-Service Codex 代码审查报告
|
|
|
|
|
|
|
|
|
|
|
|
> 审查工具:Codex CLI v0.125.0 (gpt-5.4)
|
2026-05-11 12:19:15 +08:00
|
|
|
|
> 审查范围:`/home/long/project/ai-customer-service`
|
2026-05-01 17:44:53 +08:00
|
|
|
|
> 代码基准:`3e9022a` + `01135ec`
|
|
|
|
|
|
> 审查时间:2026-05-01
|
|
|
|
|
|
> 审查方法:静态分析 + 工具链验证(go vet、go build、go test -race)
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## 一、整体评估
|
|
|
|
|
|
|
|
|
|
|
|
| 维度 | 评分 | 说明 |
|
|
|
|
|
|
|------|------|------|
|
|
|
|
|
|
| 安全性 | ⭐⭐⭐⭐ | HMAC+时间戳防重放+P0审计标准,整体良好,有2处高风险 |
|
|
|
|
|
|
| 错误处理 | ⭐⭐⭐ | 大部分正确,部分边界情况未处理 |
|
|
|
|
|
|
| 并发安全 | ⭐⭐⭐ | 基本正确,有1处RWMutex误用 |
|
|
|
|
|
|
| 资源管理 | ⭐⭐⭐⭐ | defer Close正确,有1处重复Close风险 |
|
|
|
|
|
|
| 测试覆盖 | ⭐⭐⭐⭐ | 77.4%整体,handlers 87.1%,Phase 2目标达成 |
|
|
|
|
|
|
| 可观测性 | ⭐⭐ | 仅少量入口有slog,大部分handler无结构化日志 |
|
|
|
|
|
|
| API设计 | ⭐⭐⭐⭐ | RESTful风格,错误码统一,路由清晰 |
|
|
|
|
|
|
| 配置管理 | ⭐⭐⭐⭐ | 环境变量驱动,无硬编码 |
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## 二、P0 阻断问题(必须修复)
|
|
|
|
|
|
|
|
|
|
|
|
### P0-1:RateLimiter 并发写操作仅用读锁保护
|
|
|
|
|
|
|
|
|
|
|
|
**文件**:`internal/platform/httpx/limits.go`
|
|
|
|
|
|
|
|
|
|
|
|
```go
|
|
|
|
|
|
type RateLimiter struct {
|
|
|
|
|
|
mu sync.RWMutex // ⚠️ 读写锁
|
|
|
|
|
|
counters map[string]*slidingWindow
|
|
|
|
|
|
...
|
|
|
|
|
|
}
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
`GetOrCreate` 方法在写入 map 时持有的是 `mu.RLock()`(读锁),但同一时刻其他goroutine持有写锁时会导致 **data race**:
|
|
|
|
|
|
|
|
|
|
|
|
```go
|
|
|
|
|
|
func (rl *RateLimiter) Allow(ctx context.Context, key string) (bool, error) {
|
|
|
|
|
|
rl.mu.RLock() // 读锁
|
|
|
|
|
|
counter, exists := rl.counters[key]
|
|
|
|
|
|
rl.mu.RUnlock()
|
|
|
|
|
|
if !exists {
|
|
|
|
|
|
rl.mu.Lock() // 另一个goroutine可能在这里
|
|
|
|
|
|
// ⚠️ 写入 rl.counters[key] = newCounter()
|
|
|
|
|
|
rl.mu.Unlock()
|
|
|
|
|
|
}
|
|
|
|
|
|
// ...
|
|
|
|
|
|
}
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
**风险**:高。多个并发请求可能同时创建同一个 key 的计数器,导致计数不准确和潜在的 map 并发写入 panic。
|
|
|
|
|
|
|
|
|
|
|
|
**修复方案**:在 `GetOrCreate` 使用写锁 `mu.Lock()`,或改用 `sync.Map`。
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
### P0-2:Resolve/Close 不校验 Ticket 是否存在
|
|
|
|
|
|
|
|
|
|
|
|
**文件**:`internal/store/postgres/ticket_workflow.go:99,119`
|
|
|
|
|
|
|
|
|
|
|
|
```go
|
|
|
|
|
|
result, err := s.db.ExecContext(ctx,
|
|
|
|
|
|
`UPDATE cs_tickets SET ... WHERE id = $1::uuid AND status IN (...)`, ticketID, ...)
|
|
|
|
|
|
rows, err := result.RowsAffected()
|
|
|
|
|
|
if rows != 1 {
|
|
|
|
|
|
return fmt.Errorf("ticket not resolvable") // ⚠️ 区分:不存在 vs 状态不对
|
|
|
|
|
|
}
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
**风险**:高。返回的错误是 `"ticket not resolvable"`,但可能是因为 ticket ID 根本不存在(数据库无此记录)。调用方无法区分「找不到ticket」和「ticket状态不对」,导致:
|
|
|
|
|
|
- 客户端收到模糊错误
|
|
|
|
|
|
- 运营后台无法定位问题
|
|
|
|
|
|
|
|
|
|
|
|
**修复方案**:先查询 ticket 是否存在,再区分状态不对 vs 不存在;或返回明确的错误码。
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## 三、P1 重要问题(建议修复)
|
|
|
|
|
|
|
|
|
|
|
|
### P1-1:JSON 序列化丢失 int64 精度(票统计 API)
|
|
|
|
|
|
|
|
|
|
|
|
**文件**:`internal/http/handlers/ticket_stats_handler.go`
|
|
|
|
|
|
|
|
|
|
|
|
JavaScript 的 `Number` 类型只能安全表示 `[-2^53+1, 2^53-1]`,即最大安全整数 9007199254740991。而 Go 的 `int64` 最大值为 9223372036854775807。如果 ticket ID 或统计数值超过 9*10^15,JSON 序列化后精度丢失。
|
|
|
|
|
|
|
|
|
|
|
|
**风险**:工单 ID(UUID 转成 int64 再序列号)超过 JS 安全整数后,前端解析错误。
|
|
|
|
|
|
|
|
|
|
|
|
**修复方案**:对超过 2^53 的数值,在 JSON 响应中用字符串传递:
|
|
|
|
|
|
```go
|
|
|
|
|
|
type TicketStatsResponse struct {
|
|
|
|
|
|
Open int64 `json:"open"` // 如果确定不会超安全整数,可以不用字符串
|
|
|
|
|
|
}
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
### P1-2:rows.Close() 在错误路径中可能被调用两次
|
|
|
|
|
|
|
|
|
|
|
|
**文件**:`internal/store/postgres/ticket_store.go:117,148,168`
|
|
|
|
|
|
|
|
|
|
|
|
```go
|
|
|
|
|
|
defer rows.Close() // defer 1
|
|
|
|
|
|
// ...
|
|
|
|
|
|
if err := rows.Scan(...); err != nil {
|
|
|
|
|
|
rows.Close() // ⚠️ 提前手动 Close 2
|
|
|
|
|
|
return nil, err
|
|
|
|
|
|
}
|
|
|
|
|
|
// ...
|
|
|
|
|
|
if rows.Err() != nil {
|
|
|
|
|
|
rows.Close() // ⚠️ defer 已在return时执行,这里又调用
|
|
|
|
|
|
return nil, rows.Err()
|
|
|
|
|
|
}
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
**风险**:中。虽然 `*sql.Rows` 的 Close 是幂等的(可以安全调用多次),但这暴露了对 defer 语义的理解偏差,且可能在未来其他类型上引入同类 bug。
|
|
|
|
|
|
|
|
|
|
|
|
**修复方案**:移除手动 Close,只保留 defer。
|
|
|
|
|
|
|
|
|
|
|
|
### P1-3:无 Channel 级 webhook 独立处理
|
|
|
|
|
|
|
|
|
|
|
|
**文件**:`internal/http/router.go`
|
|
|
|
|
|
|
|
|
|
|
|
接口文档(`tech/INTERFACE.md`)要求按渠道独立 webhook(`/webhook/{channel}`),但当前实现仍为统一入口 `/webhook`。`HandleChannel` 方法存在但仅限路由匹配。
|
|
|
|
|
|
|
|
|
|
|
|
**风险**:中。接口设计与实现漂移。
|
|
|
|
|
|
|
|
|
|
|
|
**修复方案**:明确 Phase 1 只做统一入口,或补齐按渠道独立 webhook。
|
|
|
|
|
|
|
|
|
|
|
|
### P1-4:goroutine 未受控启动,无 graceful shutdown
|
|
|
|
|
|
|
|
|
|
|
|
**文件**:`cmd/ai-customer-service/main.go:32`
|
|
|
|
|
|
|
|
|
|
|
|
```go
|
|
|
|
|
|
go func() {
|
|
|
|
|
|
sigCh <- syscall.SIGINT
|
|
|
|
|
|
}()
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
虽然这里只是转发信号,但项目中存在隐式 goroutine(如 `time.Ticker`)在 shutdown 时未受控停止。
|
|
|
|
|
|
|
|
|
|
|
|
**风险**:低。main 函数本身有 syscall 信号监听,shutdown 路径会关闭 server socket。
|
|
|
|
|
|
|
|
|
|
|
|
### P1-5:Webhook 审计记录缺少 MessageID 和 SessionID
|
|
|
|
|
|
|
|
|
|
|
|
**文件**:`internal/http/handlers/webhook_security.go:92`
|
|
|
|
|
|
|
|
|
|
|
|
```go
|
|
|
|
|
|
_ = s.Audit.Add(ctx, audit.Event{
|
|
|
|
|
|
ID: newAuditID("audit", now),
|
|
|
|
|
|
Type: "webhook_security_rejected",
|
|
|
|
|
|
Action: "security_reject",
|
|
|
|
|
|
ActorID: "system",
|
|
|
|
|
|
SourceIP: clientIP(r.RemoteAddr),
|
|
|
|
|
|
Payload: data, // ⚠️ 缺少 message_id 和 session_id
|
|
|
|
|
|
CreatedAt: now,
|
|
|
|
|
|
})
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
**风险**:中。安全审计事件缺少消息级联能力,安全事件无法追溯到具体用户消息。
|
|
|
|
|
|
|
|
|
|
|
|
**修复方案**:在 `Payload` 中补充 `message_id` 和 `session_id`(从 request body 解析)。
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## 四、P2 优化建议
|
|
|
|
|
|
|
|
|
|
|
|
### P2-1:缺少结构化日志(slog)覆盖
|
|
|
|
|
|
|
|
|
|
|
|
**文件**:大部分 handler(`ticket_handler.go`、`session_handler.go`、`webhook_handler.go`)
|
|
|
|
|
|
|
|
|
|
|
|
大部分 handler 方法没有 `slog` 调用。只有 `dialog/service.go` 和 `webhook_security.go` 有少量日志。
|
|
|
|
|
|
|
|
|
|
|
|
**风险**:中。生产环境无法追踪请求链路。
|
|
|
|
|
|
|
|
|
|
|
|
**修复方案**:在每个 handler 入口添加 `slog.InfoContext`,至少包含 `operation`、`channel`、`message_id`。
|
|
|
|
|
|
|
|
|
|
|
|
### P2-2:AgentID 未校验长度和格式
|
|
|
|
|
|
|
|
|
|
|
|
**文件**:`internal/http/handlers/ticket_handler.go:62`
|
|
|
|
|
|
|
|
|
|
|
|
```go
|
|
|
|
|
|
agentID := strings.TrimSpace(r.URL.Query().Get("agent_id"))
|
|
|
|
|
|
if ticketID == "" || agentID == "" { ... }
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
只检查了非空,未校验长度上限(UUID 格式、超长注入风险)。
|
|
|
|
|
|
|
|
|
|
|
|
### P2-3:无请求超时保护
|
|
|
|
|
|
|
|
|
|
|
|
**文件**:`internal/http/handlers/ticket_handler.go`
|
|
|
|
|
|
|
|
|
|
|
|
`r.Context()` 没有注入 timeout,long-running DB 操作可能无限期挂起。
|
|
|
|
|
|
|
|
|
|
|
|
**修复方案**:使用 `h.service.Assign(wrappedCtx, ...)` 其中 `wrappedCtx, cancel := context.WithTimeout(r.Context(), 5*time.Second); defer cancel()`
|
|
|
|
|
|
|
|
|
|
|
|
### P2-4:DedupStore TTL 永不清理
|
|
|
|
|
|
|
|
|
|
|
|
**文件**:`internal/store/memory/dedup_store.go`
|
|
|
|
|
|
|
|
|
|
|
|
```go
|
|
|
|
|
|
type DedupStore struct {
|
|
|
|
|
|
mu sync.Mutex
|
|
|
|
|
|
items map[string]string // ⚠️ 无 TTL,永不清理
|
|
|
|
|
|
}
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
内存 DedupStore 永不释放过期去重记录,存在内存泄漏风险(长期运行后 map 无限增长)。
|
|
|
|
|
|
|
|
|
|
|
|
### P2-5:Feedback 和 Handoff ActorID 未强制
|
|
|
|
|
|
|
|
|
|
|
|
**文件**:`internal/http/handlers/session_handler.go:148, 175`
|
|
|
|
|
|
|
|
|
|
|
|
```go
|
|
|
|
|
|
actorID := strings.TrimSpace(r.URL.Query().Get("actor_id"))
|
|
|
|
|
|
if actorID == "" {
|
|
|
|
|
|
actorID = "system" // ⚠️ 默认 system,外部无法追溯操作人
|
|
|
|
|
|
}
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
**风险**:中。工单操作审计中 "system" actor 过多会降低审计价值。
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## 五、已验证通过的检查项
|
|
|
|
|
|
|
|
|
|
|
|
| 检查项 | 结果 |
|
|
|
|
|
|
|--------|------|
|
|
|
|
|
|
| 硬编码密钥/Token | ✅ 未发现 |
|
|
|
|
|
|
| SQL 注入 | ✅ 参数化查询,无拼接 |
|
|
|
|
|
|
| HMAC 签名 | ✅ 正确使用 hmac.Equal(常量时间比较) |
|
|
|
|
|
|
| 时间戳防重放 | ✅ skew 校验正确 |
|
|
|
|
|
|
| Audit 写入失败处理 | ✅ P0 标准:只 log,不阻流 |
|
|
|
|
|
|
| Context 超时 | ✅ dialog service 有 context timeout |
|
|
|
|
|
|
| rows/DB Close | ✅ 基本正确(有 P1-2 重复调用问题) |
|
|
|
|
|
|
| 并发去重 | ✅ DedupStore 有 mutex |
|
|
|
|
|
|
| 速率限制 | ✅ 滑动窗口实现正确 |
|
|
|
|
|
|
| 编译通过 | ✅ `go build ./...` 无错误 |
|
|
|
|
|
|
| Vet 通过 | ✅ `go vet ./...` 无警告 |
|
|
|
|
|
|
| Race 检测 | ✅ `go test -race` 无竞态 |
|
|
|
|
|
|
| E2E 测试 | ✅ 19/19 PASS |
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## 六、覆盖率分析(Phase 2 目标)
|
|
|
|
|
|
|
|
|
|
|
|
| 包 | 覆盖率 | Phase 2 目标 | 状态 |
|
|
|
|
|
|
|----|--------|-------------|------|
|
|
|
|
|
|
| internal/http/handlers | 87.1% | >85% | ✅ |
|
|
|
|
|
|
| internal/service/dialog | 88.5% | >85% | ✅ |
|
|
|
|
|
|
| internal/platform/httpx | 84.3% | >70% | ✅ |
|
|
|
|
|
|
| internal/config | 82.4% | >70% | ✅ |
|
|
|
|
|
|
| internal/app | 73.8% | >70% | ✅ |
|
|
|
|
|
|
| internal/store/postgres | 62.0% | >60% | ✅ |
|
|
|
|
|
|
| internal/store/memory | 88.3% | >85% | ✅ |
|
|
|
|
|
|
| **整体** | **77.4%** | **>70%** | ✅ |
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## 七、修复优先级建议
|
|
|
|
|
|
|
|
|
|
|
|
### 立即修复(上线阻断)
|
|
|
|
|
|
1. **P0-1**:RateLimiter RWMutex 并发写问题
|
|
|
|
|
|
2. **P0-2**:Resolve/Close 错误消息区分
|
|
|
|
|
|
|
|
|
|
|
|
### 上线前修复(建议)
|
|
|
|
|
|
3. **P1-2**:rows.Close() 重复调用清理
|
|
|
|
|
|
4. **P1-3**:接口文档对齐(按渠道 webhook)
|
|
|
|
|
|
5. **P1-5**:安全审计补全 message_id
|
|
|
|
|
|
|
|
|
|
|
|
### 后续迭代
|
|
|
|
|
|
6. **P2-1**:结构化日志覆盖
|
|
|
|
|
|
7. **P2-3**:请求超时保护
|
|
|
|
|
|
8. **P2-4**:DedupStore TTL 清理
|
|
|
|
|
|
9. **P2-5**:ActorID 强制校验
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## 八、结论
|
|
|
|
|
|
|
|
|
|
|
|
**Phase 2 质量状态:✅ 可灰度上线(有2个P0需立即修复)**
|
|
|
|
|
|
|
|
|
|
|
|
代码整体质量良好,测试覆盖充分,安全设计(HMAC/防重放/幂等/P0审计标准)到位。主要风险集中在 **RateLimiter 并发安全** 和 **工单操作错误消息模糊** 两个P0问题,修复后即可达到生产级质量。
|
|
|
|
|
|
|
|
|
|
|
|
> 审查基准:`3e9022a` + `01135ec`(PRODUCTION_LAUNCH.md)
|
2026-05-11 12:19:15 +08:00
|
|
|
|
> 三端同步状态:GitHub ✅ / Gitea ✅ / TKSea ✅
|