refactor: clean up project structure

- Remove old review reports (keep latest only)
- Move docs/ to deploy/docs-backup/
- Move performance-testing/ to deploy/
- Clean up test output files
- Organize root directory
2026-04-06 23:36:03 +08:00
parent 4d71566c0d
commit 349d783fd1
697 changed files with 24114 additions and 163282 deletions
--- a/deploy/performance-testing/config/database-optimization.md
+++ b/deploy/performance-testing/config/database-optimization.md
@@ -0,0 +1,192 @@
+# Sub2API Database Connection Pool Optimization
+
+## 📊 Current Configuration Analysis
+
+Based on `backend/internal/repository/db_pool.go` and `backend/internal/config/config.go`, the database connection pool currently supports the following parameters:
+
+```go
+type dbPoolSettings struct {
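+    MaxOpenConns    int           // maximum number of open connections
+    MaxIdleConns    int           // maximum number of idle connections
+    ConnMaxLifetime time.Duration // maximum lifetime of a connection
+    ConnMaxIdleTime time.Duration // maximum time a connection may stay idle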
+}
+```
+
+## 🎯 Recommended Settings
+
+### Small Deployments (< 100 QPS)
+
+```yaml
+# config.yaml
+database:
+  max_open_conns: 25
+  max_idle_conns: 10
+  conn_max_lifetime_minutes: 30
+  conn_max_idle_time_minutes: 5
+```
+
+### Medium Deployments (100-500 QPS)
+
+```yaml
+# config.yaml
+database:
+  max_open_conns: 50
+  max_idle_conns: 20
+  conn_max_lifetime_minutes: 30
+  conn_max_idle_time_minutes: 5
+```
+
+### Large Deployments (500-2000 QPS)
+
+```yaml
+# config.yaml
+database:
+  max_open_conns: 100
+  max_idle_conns: 30
+  conn_max_lifetime_minutes: 15
+  conn_max_idle_time_minutes: 3
+```
+
+### Very Large Deployments (> 2000 QPS)
+
+```yaml
+# config.yaml
+database:
+  max_open_conns: 200
+  max_idle_conns: 50
+  conn_max_lifetime_minutes: 10
+  conn_max_idle_time_minutes: 2
+```
+
+## 🔧 Parameter Reference
+
+### MaxOpenConns
+
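+| Scenario | Recommended | Notes |
+|------|--------|------|
+| Small | 25-50 | Avoids tying up resources with too many connections |
+| Medium | 50-100 | Balances concurrency against resource usage |
+| Large | 100-200 | Requires horizontal scaling of the application |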
+
+**Formula** (Little's law, per application instance):
+```
+MaxOpenConns ≈ expected QPS × average time a request holds a connection (s) × (1 + peak factor)
+```
+
+### MaxIdleConns
+
+| Scenario | Recommended | Notes |
+|------|--------|------|
+| Small | 5-10 | Keeps a baseline of warm connections |
+| Medium | 15-25 | Covers normal concurrency |
+| Large | 30-50 | Reduces connection setup overhead |
+
+**Rule of thumb**: `MaxIdleConns <= MaxOpenConns * 0.5`
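A quick check of that rule against the four tiers recommended above, as a sketch (`poolTier` is a hypothetical helper type). Note that `database/sql` itself only enforces a weaker bound: if `MaxIdleConns` exceeds a positive `MaxOpenConns`, it is silently lowered to match.

```go
package main

import "fmt"

// poolTier holds the MaxOpenConns / MaxIdleConns pair of one recommended tier.
type poolTier struct {
	name             string
	maxOpen, maxIdle int
}

// idleWithinHalf reports whether MaxIdleConns <= MaxOpenConns * 0.5.
// The integer form 2*idle <= open avoids floating-point rounding.
func idleWithinHalf(t poolTier) bool {
	return 2*t.maxIdle <= t.maxOpen
}

func main() {
	tiers := []poolTier{
		{"small", 25, 10},
		{"medium", 50, 20},
		{"large", 100, 30},
		{"very large", 200, 50},
	}
	for _, t := range tiers {
		// All four tiers print ok=true.
		fmt.Printf("%-10s open=%3d idle=%3d ok=%v\n", t.name, t.maxOpen, t.maxIdle, idleWithinHalf(t))
	}
}
```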
+
+### ConnMaxLifetime
+
+| Scenario | Recommended | Notes |
+|------|--------|------|
+| Development/testing | 1 hour | Fewer reconnects |
+| Production | 15-30 min | Balances reconnect overhead against resource usage |
+
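+**Rule of thumb**: keep it below any timeout that closes connections from the other end, such as PostgreSQL's `idle_session_timeout` (PostgreSQL 14+) or idle timeouts on load balancers, proxies and firewalls, so the pool retires connections before they are cut. (`idle_in_transaction_session_timeout` only ends sessions left idle inside an open transaction, so it does not bound pooled connections.)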
+
+### ConnMaxIdleTime
+
+| Scenario | Recommended | Notes |
+|------|--------|------|
+| High traffic | 1-3 min | Reclaims idle connections quickly |
+| Normal | 5-10 min | Balances connection reuse against resource usage |
+
+## 📈 Tuning Procedure
+
+### 1. Benchmark
+
+```bash
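+# Benchmark with pgbench (the default script needs its own tables:
+# run `pgbench -i` once first, ideally against a scratch database rather than production)
+pgbench -h localhost -U postgres -d sub2api -c 10 -j 4 -T 60
+
+# Repeat for different pool sizes
+for conn in 10 25 50 100; do
+  echo "Testing MaxOpenConns=$conn"
+  # adjust the pool configuration, then rerun the test
+done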
+```
+
+### 2. Monitor Key Metrics
+
+Prometheus queries:
+
+```promql
+# Database connection utilization (%)
+sub2api_db_connections{state="active"} / ignoring(state) sub2api_db_connections{state="max"} * 100
+
+# Number of requests waiting for a connection
+pg_stat_activity_waiting
+
+# Connection wait time
+pg_stat_activity.max_wait_time
+```
+
+### 3. Tuning Recommendations
+
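+**Problem: long connection waits**
+- Increase `MaxOpenConns`
+- Check for slow queries
+- Optimize indexes
+
+**Problem: frequent reconnects**
+- Increase `MaxIdleConns`
+- Increase `ConnMaxLifetime`
+
+**Problem: memory keeps growing**
+- Decrease `MaxIdleConns`
+- Decrease `ConnMaxIdleTime`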
+
+## 🚀 PostgreSQL Server-Side Tuning
+
+Beyond the application-side settings, tune the PostgreSQL server as well:
+
+```conf
+# postgresql.conf tuning
+
+# Connections (changing max_connections requires a server restart)
+max_connections = 200
+
+# Memory
+shared_buffers = 256MB
+effective_cache_size = 1GB
+work_mem = 16MB
+maintenance_work_mem = 128MB
+
+# Query planning
+random_page_cost = 1.1
+effective_io_concurrency = 200
+
+# Writes / WAL
+wal_buffers = 16MB
+checkpoint_completion_target = 0.9
+
+# TCP keepalives
+tcp_keepalives_idle = 60
+tcp_keepalives_interval = 10
+tcp_keepalives_count = 6
+```
+
+## 📊 Performance Baseline Reference
+
+| Pool config (MaxOpen/MaxIdle) | 10 VU | 50 VU | 100 VU | 200 VU |
+|-----------|-------|-------|--------|--------|
+| 25/10     | 200ms | 500ms | 1000ms | 2000ms |
+| 50/20     | 150ms | 300ms | 600ms  | 1200ms |
+| 100/30    | 100ms | 200ms | 400ms  | 800ms  |
+| 200/50    | 80ms  | 150ms | 300ms  | 600ms  |
+
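+## ⚠️ Caveats
+
+1. **Don't grow the pool blindly**: 100-200 connections is the usual recommendation for a single PostgreSQL instance
+2. **Watch actual usage**: inspect connections with `pg_stat_activity`
+3. **Consider PgBouncer**: a connection-pooling proxy is recommended for high-concurrency workloads
+4. **Test peak load**: make sure the pool does not become the bottleneck under peak traffic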
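Note 1 matters most when the application is scaled horizontally: every instance keeps its own pool, so the sum of all pools, plus migrations, admin tools and monitoring, must fit under `max_connections` minus `superuser_reserved_connections` (PostgreSQL default: 3). A sketch of that arithmetic with hypothetical instance counts; when the sum does not fit, a proxy such as PgBouncer lets many client connections share fewer server connections.

```go
package main

import "fmt"

// connectionBudgetOK reports whether all application instances can open their
// full pools at once without exceeding max_connections, after subtracting the
// superuser-reserved slots and connections used by other clients
// (migrations, admin tools, monitoring).
func connectionBudgetOK(instances, maxOpenPerInstance, maxConnections, reserved, otherClients int) bool {
	return instances*maxOpenPerInstance+otherClients <= maxConnections-reserved
}

func main() {
	// max_connections = 200 as in the postgresql.conf example, the default
	// superuser_reserved_connections = 3, and an assumed 10 other clients.
	fmt.Println(connectionBudgetOK(1, 100, 200, 3, 10)) // true:  110 <= 197
	fmt.Println(connectionBudgetOK(2, 100, 200, 3, 10)) // false: 210 > 197
	fmt.Println(connectionBudgetOK(4, 25, 200, 3, 10))  // true:  110 <= 197
}
```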
--- a/deploy/performance-testing/config/redis-optimization.md
+++ b/deploy/performance-testing/config/redis-optimization.md
@@ -0,0 +1,243 @@
+# Sub2API Redis Connection Pool Optimization
+
+## 📊 Current Configuration Analysis
+
+Based on `backend/internal/repository/redis.go`, the Redis client currently supports:
+
+```go
+opts := &redis.Options{
+    Addr:         cfg.Redis.Address(),
+    Password:     cfg.Redis.Password,
+    DB:           cfg.Redis.DB,
+    DialTimeout:  time.Duration(cfg.Redis.DialTimeoutSeconds) * time.Second,
+    ReadTimeout:  time.Duration(cfg.Redis.ReadTimeoutSeconds) * time.Second,
+    WriteTimeout: time.Duration(cfg.Redis.WriteTimeoutSeconds) * time.Second,
+    PoolSize:     cfg.Redis.PoolSize,
+    MinIdleConns: cfg.Redis.MinIdleConns,
+}
+```
+
+## 🎯 Recommended Settings
+
+### Small Deployments (< 1000 QPS)
+
+```yaml
+# config.yaml
+redis:
+  dial_timeout_seconds: 5
+  read_timeout_seconds: 3
+  write_timeout_seconds: 3
+  pool_size: 50
+  min_idle_conns: 10
+```
+
+### Medium Deployments (1000-5000 QPS)
+
+```yaml
+# config.yaml
+redis:
+  dial_timeout_seconds: 3
+  read_timeout_seconds: 2
+  write_timeout_seconds: 2
+  pool_size: 100
+  min_idle_conns: 20
+```
+
+### Large Deployments (5000-20000 QPS)
+
+```yaml
+# config.yaml
+redis:
+  dial_timeout_seconds: 2
+  read_timeout_seconds: 1
+  write_timeout_seconds: 1
+  pool_size: 200
+  min_idle_conns: 50
+```
+
+### Very Large Deployments (> 20000 QPS)
+
+```yaml
+# config.yaml
+redis:
+  dial_timeout_seconds: 1
+  read_timeout_seconds: 1
+  write_timeout_seconds: 1
+  pool_size: 500
+  min_idle_conns: 100
+```
+
+## 🔧 Parameter Reference
+
+### PoolSize
+
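+| Scenario | Recommended | Notes |
+|------|--------|------|
+| Small | 50-100 | A single instance is sufficient |
+| Medium | 100-200 | Covers normal concurrency |
+| Large | 200-500 | Requires horizontal scaling of the application |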
+
+**Formula**:
+```
+PoolSize = expected peak concurrent requests × 1.2 (headroom)
+```
+
+### MinIdleConns
+
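+| Scenario | Recommended | Notes |
+|------|--------|------|
+| Small | 10-20 | Reduces cold-start latency |
+| Medium | 20-50 | Keeps connections warm |
+| Large | 50-100 | Pre-warmed connections for high availability |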
+
+**Rule of thumb**: `MinIdleConns = PoolSize * (0.2 to 0.3)`
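A small sketch of both rules of thumb: `PoolSize` from an expected peak concurrency (which could itself be estimated as QPS × average command latency), and the 20%-30% band for `MinIdleConns`. The helper names are hypothetical; the loop checks that the recommended tiers above sit inside the band.

```go
package main

import "fmt"

// redisPoolSize computes ceil(maxConcurrent × 1.2) with integer arithmetic,
// so no floating-point rounding is involved.
func redisPoolSize(maxConcurrent int) int {
	return (maxConcurrent*12 + 9) / 10
}

// minIdleRange returns the 20%-30% band suggested for MinIdleConns
// (both bounds rounded down).
func minIdleRange(poolSize int) (lo, hi int) {
	return poolSize * 20 / 100, poolSize * 30 / 100
}

func main() {
	pool := redisPoolSize(42) // 42 × 1.2 = 50.4, rounded up to 51
	lo, hi := minIdleRange(pool)
	fmt.Println(pool, lo, hi) // 51 10 15

	// PoolSize/MinIdleConns pairs from the recommended tiers; each prints true.
	for _, t := range [][2]int{{50, 10}, {100, 20}, {200, 50}, {500, 100}} {
		lo, hi := minIdleRange(t[0])
		fmt.Println(t[0], t[1], lo <= t[1] && t[1] <= hi)
	}
}
```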
+
+### Timeouts
+
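+| Parameter | Recommended | Notes |
+|------|--------|------|
+| DialTimeout | 2-5 s | Connection setup timeout; too long a value blocks callers |
+| ReadTimeout | 1-3 s | Read timeout; should be shorter than the request timeout |
+| WriteTimeout | 1-3 s | Write timeout; should be shorter than the request timeout |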
+
+**Important**: Redis timeouts should be shorter than the upstream API timeout to avoid cascading timeouts
+
+## 📈 Tuning Procedure
+
+### 1. Benchmark
+
+```bash
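+# Using redis-benchmark (hget is not one of its built-in tests, so only get/set/hset are run)
+redis-benchmark -h localhost -p 6379 -n 100000 -c 50 -t get,set,hset -d 100
+
+# Pipelining test (-P 10: 10 commands per pipeline)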
+redis-benchmark -h localhost -p 6379 -n 10000 -P 10 -t get,set
+```
+
+### 2. Monitor Key Metrics
+
+```promql
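+# Redis connection utilization (%)
+sub2api_redis_connections{state="total"} / <pool_size> * 100
+
+# Average request latency (HTTP-level, used here as a proxy for Redis latency)
+rate(sub2api_http_request_duration_seconds_sum{path="/metrics"}[5m]) / rate(sub2api_http_request_duration_seconds_count{path="/metrics"}[5m])
+
+# Cache hit ratio over the last 5 minutes
+sum(rate(sub2api_cache_operations_total{result="hit"}[5m]))
+/
+sum(rate(sub2api_cache_operations_total{result=~"hit|miss"}[5m]))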
+```
+
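+### 3. Troubleshooting Common Issues
+
+**Problem: high Redis latency**
+- Check network latency: `redis-cli --latency-history`
+- Check the slow log: `redis-cli SLOWLOG GET 10`
+- Improve key design and avoid large values
+
+**Problem: pool exhaustion**
+- Increase `PoolSize`
+- Check for connection leaks
+- Consider Redis Cluster
+
+**Problem: low cache hit ratio**
+- Analyze cache key access patterns
+- Adjust TTLs
+- Review the cache invalidation strategy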
+
+## 🚀 Redis Server-Side Tuning
+
+### Standalone Redis Configuration
+
+```conf
+# redis.conf
+
+# Network
+tcp-backlog 511
+timeout 300
+tcp-keepalive 300
+
+# Memory
+maxmemory 2gb
+maxmemory-policy allkeys-lru
+
+# Persistence (choose according to the workload)
+rdbcompression yes
+rdbchecksum yes
+save 900 1
+save 300 10
+save 60 10000
+
+# AOF
+appendonly yes
+appendfsync everysec
+auto-aof-rewrite-percentage 100
+auto-aof-rewrite-min-size 64mb
+
+# Client output buffers
+client-output-buffer-limit normal 256mb 64mb 60
+client-output-buffer-limit replica 256mb 64mb 60
+client-output-buffer-limit pubsub 32mb 8mb 60
+```
+
+### Redis Cluster Configuration (large scale)
+
+```conf
+# redis.conf (on every cluster node)
+cluster-enabled yes
+cluster-config-file nodes.conf
+cluster-node-timeout 15000
+cluster-replica-validity-factor 10
+cluster-migration-barrier 1
+cluster-require-full-coverage yes
+```
+
+## 📊 Sub2API Caching Strategy
+
+### Current Cache Layers
+
+1. **L1 cache**: go-cache (in-memory)
+   - `userGroupRateCache`: 30 s TTL
+   - `modelsListCache`: 15 s TTL
+
+2. **L2 cache**: Redis
+   - API key authentication cache
+   - User group rate-limit cache
+   - Scheduler snapshot cache
+
+### Cache Key Design Suggestions
+
+```
+# Recommended format
+{sub2api}:{module}:{entity}:{id}:{field}
+{sub2api}:auth:apikey:{key_hash}
+{sub2api}:ratelimit:user:{user_id}:{window}
+{sub2api}:gateway:scheduler:{account_id}
+```
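One thing to check before using this scheme with Redis Cluster: `{...}` inside a key name is a hash tag, and only the text inside the first non-empty tag is hashed. If the `{sub2api}` prefix is written literally, every key maps to the same slot and therefore to a single node, which defeats sharding; if the braces are only placeholders, this does not apply. The sketch below implements the key-to-slot mapping described in the Redis Cluster specification (CRC16 XMODEM modulo 16384, plus the hash-tag rule) so a key scheme can be checked offline.

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 computes CRC16-CCITT in the XMODEM variant (polynomial 0x1021,
// initial value 0), the checksum the Redis Cluster specification uses for keys.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// hashSlot maps a key to one of the 16384 cluster slots. If the key contains
// a non-empty hash tag (text between the first '{' and the next '}'),
// only the tag is hashed, so keys sharing a tag share a slot.
func hashSlot(key string) int {
	if start := strings.IndexByte(key, '{'); start >= 0 {
		if end := strings.IndexByte(key[start+1:], '}'); end > 0 {
			key = key[start+1 : start+1+end]
		}
	}
	return int(crc16([]byte(key)) % 16384)
}

func main() {
	a := hashSlot("{sub2api}:auth:apikey:abc123")
	b := hashSlot("{sub2api}:ratelimit:user:42:60")
	fmt.Println(a == b)                             // true: only "sub2api" is hashed
	fmt.Printf("%#x\n", crc16([]byte("123456789"))) // 0x31c3, the CRC16/XMODEM check value
}
```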
+
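+### Cache TTL Suggestions
+
+| Cache type | Recommended TTL | Notes |
+|----------|----------|------|
+| API key authentication | 5-15 min | Balances consistency and performance |
+| User group rate | 30 s-1 min | Needs to be close to real time |
+| Scheduler snapshot | 1-5 min | Some staleness is acceptable |
+| Model list | 15-30 s | Changes relatively often |
+| Billing data | 5-15 min | Batch-processing delay is acceptable |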
+
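+## ⚠️ Caveats
+
+1. **Don't oversize PoolSize**: each connection uses roughly 10 KB of memory
+2. **Watch for connection leaks**: make sure connections are released properly
+3. **Consider read/write splitting**: heavy read traffic can be spread across replicas
+4. **Use pipelining**: batch operations to cut round trips (RTT)
+5. **Avoid large values**: keep each value under 1 MB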
+
+## 📊 Performance Baseline Reference
+
+| Redis config (PoolSize/MinIdleConns) | 1K QPS | 5K QPS | 10K QPS | 20K QPS |
+|-----------|--------|--------|---------|---------|
+| 50/10     | 5ms    | 15ms   | 40ms    | 100ms   |
+| 100/20    | 3ms    | 8ms    | 20ms    | 50ms    |
+| 200/50    | 2ms    | 5ms    | 12ms    | 30ms    |
+| 500/100   | 1ms    | 3ms    | 8ms     | 20ms    |