fix(deploy): production CRM deployment improvements

- Fix deploy_crm_only.sh: non-destructive hot reload
  - Enhanced stop logic with pgrep + fuser for port release
  - Added 3-layer verification (process/control/user)
  - Check /proc/$pid/exe for (deleted) marker
  - Never delete DB
- Fix portal script contracts: crm_session → crm_subject
  - deploy_tksea_portal.sh: use $cookie_crm_subject
  - test_tksea_portal_assets.sh: assert crm_subject exists
  - nginx.example.conf: updated trusted subject header
- Add systemd service management
  - sub2api-crm.service.template
  - install_crm_systemd.sh
  - verify_crm_deployment.sh

Update docs/plans/2026-06-04-next-version-plan.md with deployment findings.
@@ -8,15 +8,11 @@
# - /kimi/ and /kimi-v1/ are kept for compatibility with legacy Kimi-specific client configs
#
# Security notes:
# - portal-subject is extracted from a cookie; the backend /api/portal/session/login sets the httpOnly cookie
# - X-CRM-Authenticated-Subject carries crm_session (a signed token); CRM verifies the signature and resolves the real subject
# - crm_subject is for frontend display only and must not be used as the trusted subject source
# - CRM checks the X-CRM-Trusted-Proxy header to ensure the request comes from the trusted nginx
# - Both must be configured together to enable user-key self-service

# Extract the portal subject from the httpOnly cookie
map $http_cookie $portal_subject {
    default "";
    ~*crm_session=([^;]+) $1;
}

location = /portal {
    return 302 /portal/;
@@ -47,7 +43,8 @@ location /portal-proxy/ {
}

location /portal-admin-api/ {
    # The trusted login/auth layer must place the user subject in $portal_subject; never trust a header supplied by the browser.
    # The trusted login/auth layer must place the signed user value in $cookie_crm_subject; never trust a header supplied by the browser.
    # This is the cookie name that corresponds to the CRM setting TRUSTED_SUBJECT_COOKIE=crm_subject.
    # CRM must also be configured with:
    # SUB2API_CRM_TRUSTED_SUBJECT_HEADER=X-CRM-Authenticated-Subject
    # SUB2API_CRM_TRUSTED_PROXY_SECRET_HEADER=X-CRM-Trusted-Proxy
@@ -57,8 +54,8 @@ location /portal-admin-api/ {
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    # Key step: extract the subject from the verified cookie and inject it
    proxy_set_header X-CRM-Authenticated-Subject $portal_subject;
    # Key step: inject the signed crm_subject cookie set by portal_auth.go; CRM verifies the signature and resolves the subject
    proxy_set_header X-CRM-Authenticated-Subject $cookie_crm_subject;
    # Trusted-proxy secret (must match the CRM configuration)
    proxy_set_header X-CRM-Trusted-Proxy "REPLACE_WITH_64_CHAR_HEX_SECRET";
    proxy_http_version 1.1;

334
docs/REMOTE43_OPERATIONS_BASELINE.md
Normal file
@@ -0,0 +1,334 @@
# remote43 Operations Baseline (2026-06-10)

## 1. Purpose

This document answers three questions:

1. What is actually running on remote43 right now
2. Why deployments often "look successful but never actually switch over"
3. Which rules ongoing operations must follow

This is the current source of truth and takes precedence over conclusions scattered across individual sessions.

## 2. Server profile

### Host

- Host: `ubuntu@43.155.133.187`
- Hostname: `VM-0-16-ubuntu`
- OS: Ubuntu / Linux `6.8.0-107-generic`
- CPU: 2 vCPU
- Memory: 3.6 GiB
- Swap: 1.9 GiB
- Root disk: 59G, 34G used, 59% utilization

### Current load (read-only inspection, 2026-06-10 09:25 CST)

- load average: `3.16 3.39 3.40`
- Available memory is only `259 MiB`
- Swap in use: `982 MiB`
- memory PSI / io PSI are persistently non-zero, indicating resource pressure

### Main listening ports

- `80` / `443` → nginx
- `8080` → sub2api host
- `127.0.0.1:18190` → CRM

## 3. Current runtime facts

**Last updated: 2026-06-10**

### CRM current state

- Run directory: `/home/ubuntu/crm-only-20260602_18190`
- Current process PID: `920892` (managed by systemd)
- Current run command: `./sub2api-cn-relay-manager-server`
- `/proc/920892/exe` points to:
  - `/home/ubuntu/crm-only-20260602_18190/sub2api-cn-relay-manager-server` (no deleted marker)
- Portal Session: `login_enabled: true` ✓

### CRM health and admin endpoints

- `GET /healthz` → `ok`
- `GET /api/packs` + admin token → `HTTP 200`

This means:

- CRM is currently not completely down
- The core problem is "deployment switchover failed", not "service unavailable"

### CRM database

- File: `/home/ubuntu/crm-only-20260602_18190/sub2api-cn-relay-manager.db`
- Quick counts (at the 2026-06-10 inspection):
  - `hosts = 0`
  - `logical_groups = 0`
  - `logical_group_routes = 0`
  - `user_keys = 0`
  - `schema_migrations = 17`

Note: the current DB is no longer a "state database holding full production business data"; at least at that point in time it was already close to empty.

## 4. Current repo / deployment directory facts

### Remote repo

- Path: `/home/ubuntu/sub2api-cn-relay-manager-git-current`
- HEAD: `4ec9dad44f6768368c2aa782ed96d36355709823`
- `git status --short` shows:
  - `?? sub2api-crm-server`

Notes:

- The remote repo is not fully clean
- An untracked leftover binary is present

### Key files in the CRM directory

- `.env.crm`: updated 2026-06-09 21:28
- `sub2api-crm-server-gateway`: uploaded 2026-06-09 21:28
- `sub2api-cn-relay-manager-server`: old file from 2026-06-06 10:39
- `crm.log`: the latest content is a single startup-failure log entry
- `crm.pid`: written 2026-06-09 21:28

## 5. Confirmed root-cause chain of the deployment failure

### Root cause 1: the old process never actually exited

Evidence:

- `18190` was held by old process `54164`
- After the new bootstrap, `crm.log` explicitly reported:
  - `listen tcp 127.0.0.1:18190: bind: address already in use`

Conclusion:

- The new process failed to start
- The old process kept serving, which made the deployment "look successful without actually switching over"
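
A pre-start guard along the following lines could catch this failure before a new process is launched. This is a minimal sketch and not part of the repository: `port_in_use` is a hypothetical helper, and it assumes the `ss -ltn` column layout from iproute2, where the fourth column holds the local address and port.

```shell
# Hypothetical helper: reads `ss -ltn`-style output on stdin and exits 0 if
# some listener's local address ends in ":<port>", otherwise exits 1.
port_in_use() {
    port="$1"
    # NR > 1 skips the header row; $4 is "Local Address:Port".
    awk -v p=":$port" 'NR > 1 && $4 ~ (p "$") { found = 1 } END { exit found ? 0 : 1 }'
}

# Usage sketch (assumption: run on the CRM host before starting the new binary):
# if ss -ltn | port_in_use 18190; then
#     echo "port 18190 is still bound; refusing to start" >&2
#     exit 1
# fi
```

Matching on the exact `:<port>` suffix avoids false positives such as port `8190` matching a listener on `18190`.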

### Root cause 2: deploy_crm_only.sh is not a production hot-update script

Evidence:

- `scripts/deploy/deploy_crm_only.sh:138`
  - `rm -f "$CRM_DB_FILE" "$CRM_LOG_FILE"`
- `scripts/deploy/deploy_crm_only.sh:205`
  - Remote cleanup also deletes `REMOTE_CRM_DB_FILE`

Conclusion:

- The script deletes the SQLite DB
- It is suited to "rebuilding the stack", not to "state-preserving hot updates"

### Root cause 3: the CRM startup method is still fragile

Evidence:

- `scripts/deploy/deploy_crm_only.sh:140`
  - `nohup bash -lc 'set -a; source "$CRM_ENV_FILE"; set +a; exec "$CRM_BINARY"'`

Conclusion:

- This is a known fragile pattern
- Even though the main failure this time was the occupied port, this pattern should not remain the production standard

### ✅ Fixed: portal deploy script and test gate now follow the corrected contract

Original problem:

- `scripts/deploy/deploy_tksea_portal.sh` used `$cookie_crm_session`
- `scripts/test/test_tksea_portal_assets.sh` asserted `$cookie_crm_session` and rejected `$cookie_crm_subject`

Fix (completed this round):

1. `deploy_tksea_portal.sh` now uses `$cookie_crm_subject`
2. `test_tksea_portal_assets.sh` now asserts that `$cookie_crm_subject` is present and `$cookie_crm_session` is absent
3. `nginx.sub.tksea.top.conf.example` updated to match
4. Test gate passes: `bash scripts/test/test_tksea_portal_assets.sh` → PASS

## 6. Resource and capacity risks

### High risk: memory pressure

Evidence:

- Total memory 3.6 GiB
- Only 259 MiB available
- Nearly 1 GiB of swap in use
- Top RSS:
  - `gitea web` uses about `2.45 GiB` RSS

Conclusion:

- The heaviest consumer on the machine is Gitea, not CRM
- remote43 is already in a "high memory pressure + sustained swap" state
- This increases the likelihood of deployment timing problems, I/O jitter, and git/compile/extract failures

### Medium risk: orphaned/historical processes and leftover stacks

Evidence:

- `/tmp/sub2api-crm-static-verify` exists
- Multiple `/app/sub2api` instances exist
- quarantine / tokens-reef / temporary directories remain

Conclusion:

- remote43 is not a "single production source of truth"
- Historical verification stacks, legacy processes, and temporary artifacts coexist

## 7. Backup and recovery status

The backups seen so far are "traces of manual backups" rather than an institutionalized backup process:

- `sub2api-cn-relay-manager.db.bak.20260608_*`
- `trusted-subject-backup-20260609_173034/`

Gaps:

- No scheduled database backup job found
- No unified retention policy found
- No record of a recovery drill found

## 8. Rules for ongoing operations (mandatory)

### R1. Production CRM may only use "non-destructive hot updates"

Forbidden:

- Deleting the production SQLite DB
- Overwriting production directly with a "rebuild stack" script
- Starting a new process without confirming that the old PID has exited

Required:

1. Record the current PID / binary sha256 / DB file timestamp
2. Stop the old process and confirm `18190` has been released
3. Upload the new binary
4. Start the new process
5. Verify:
   - `healthz=ok`
   - Admin endpoint returns 200
   - `readlink /proc/$pid/exe` points to the new binary and carries no `(deleted)` suffix
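
Step 1 of the required sequence can be sketched as a small snapshot helper. This is an illustration under stated assumptions rather than code from the repository: `snapshot_state` is a hypothetical name, and it relies on GNU coreutils (`sha256sum`, `stat -c`) as found on the Ubuntu host.

```shell
# Hypothetical helper: print "sha256=<hex> db_mtime=<epoch seconds>" for the
# given binary and DB file so the values can be compared after the switch.
snapshot_state() {
    bin="$1"; db="$2"
    sum=$(sha256sum "$bin" | awk '{ print $1 }')   # first field is the digest
    mtime=$(stat -c '%Y' "$db")                    # last modification time
    printf 'sha256=%s db_mtime=%s\n' "$sum" "$mtime"
}

# Usage sketch (paths are the ones named in this document):
# snapshot_state /home/ubuntu/crm-only-20260602_18190/sub2api-cn-relay-manager-server \
#                /home/ubuntu/crm-only-20260602_18190/sub2api-cn-relay-manager.db
```

Taking the snapshot once before stopping the old process and once after the new one is up gives a direct check: the binary digest should change, and the DB file should still exist with a sensible timestamp.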

### R2. Production scripts must never delete the DB

No production deploy script may contain:

- `rm -f "$CRM_DB_FILE"`
- `rm -f "$REMOTE_CRM_DB_FILE"`

If the database really has to be rebuilt:

- The script must be named separately as a destructive/rebuild script
- It must explicitly require manual confirmation
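
One way to enforce this rule mechanically is a small gate that scans a deploy script for `rm` commands naming a `*DB_FILE` variable. The sketch below is an assumption about how such a gate could look, not an existing test in the repository; `check_no_db_delete` is a hypothetical name, and the pattern only covers the variable naming used in this document.

```shell
# Hypothetical gate: exit 0 if the script has no rm command that names a
# *DB_FILE variable; otherwise print the offending lines and exit 1.
check_no_db_delete() {
    script="$1"
    # Drop comment-only lines first so explanatory notes do not trip the gate,
    # then look for "rm" as a standalone word followed by a DB_FILE reference.
    if grep -v '^[[:space:]]*#' "$script" \
        | grep -E '(^|[^[:alnum:]_])rm[[:space:]].*DB_FILE'; then
        return 1
    fi
    return 0
}

# Usage sketch:
# check_no_db_delete scripts/deploy/deploy_crm_only.sh || exit 1
```

A text scan like this is deliberately conservative: it can be bypassed by indirection, so it complements code review rather than replacing it.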

### R3. CRM must run under an explicit supervisor

Current state:

- CRM is not a systemd service
- It is just a pidfile + nohup process

Rule going forward:

- Production CRM must migrate to one of the following:
  1. systemd service
  2. An explicit process supervisor (with a unified status/restart/log entry point)

Goals:

- Avoid long-running `(deleted)` ELF binaries
- Avoid pidfile drift
- Avoid timing mistakes from manual kill/start

### R4. Deploy scripts, examples, and test gates must share one contract

Current key contract:

- The trusted subject source must be `$cookie_crm_subject`
- `$cookie_crm_session` must no longer be used

Rules:

- When the docs change, the deploy script and the test gate must change with them
- If any one of them still keeps the old contract, the work may not be declared closed

### R5. Every deployment must be followed by "three-layer verification"

1. Process layer
   - Listening port is correct
   - PID is correct
   - `/proc/$pid/exe` carries no `(deleted)` suffix
2. Control-plane layer
   - `/healthz` = ok
   - `GET /api/packs` = 200
3. User-plane layer
   - `GET /api/portal/session`
   - Full create/chat/pause/resume/delete flow
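
The process-layer check on `/proc/$pid/exe` can be made stricter than a substring match by comparing the link target with the expected binary path. The following is a minimal sketch; `exe_state` is a hypothetical helper, and it relies on the Linux behaviour of appending ` (deleted)` to the link target once the on-disk file has been unlinked or replaced.

```shell
# Hypothetical helper: classify the output of `readlink /proc/<pid>/exe`
# against the path the process is expected to be running from.
# Prints one of: ok, deleted, mismatch.
exe_state() {
    target="$1"; expected="$2"
    case "$target" in
        "$expected")   echo ok ;;        # exact match, file still on disk
        *" (deleted)") echo deleted ;;   # binary was removed or replaced
        *)             echo mismatch ;;  # running from some other path
    esac
}

# Usage sketch (pid and path are placeholders):
# state=$(exe_state "$(readlink "/proc/$pid/exe")" "$CRM_DIR/sub2api-cn-relay-manager-server")
# [ "$state" = ok ] || { echo "exe check failed: $state" >&2; exit 1; }
```

Distinguishing `mismatch` from `deleted` helps when several CRM roots coexist on the host: a healthy process running from the wrong directory would pass a plain `(deleted)` check.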

### R6. remote43 must establish single source-of-truth directories

Recommended to keep long term:

- One production CRM root
- One production repo root
- One runbook path
- One backup path

Do not keep large numbers of ambiguous-purpose directories long term:

- Multiple temporary verification roots side by side
- Resident quarantine / tmp binaries with no stated purpose

### R7. Establish a minimum backup regime

At minimum:

1. Scheduled SQLite DB backups
2. nginx site config backups
3. `.env.crm` backups
4. A retention policy keeping the most recent N copies
5. One real recovery drill
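
The "keep the most recent N copies" item can be sketched as a pruning helper that a scheduled backup job would call after each backup. This is an illustration only: `prune_backups` is a hypothetical name, and it assumes backup file names end in a sortable timestamp (as in `sub2api-cn-relay-manager.db.bak.20260608_*`) and contain no newlines.

```shell
# Hypothetical helper: inside $dir, keep the newest $keep files whose names
# start with $prefix and delete the older ones. Other files are left alone.
prune_backups() {
    dir="$1"; prefix="$2"; keep="$3"
    n=0
    # Reverse name order puts the latest timestamp first.
    ls -1 "$dir" | sort -r | while IFS= read -r name; do
        case "$name" in
            "$prefix"*)
                n=$((n + 1))
                if [ "$n" -gt "$keep" ]; then
                    rm -f -- "$dir/$name"
                fi
                ;;
        esac
    done
}

# Usage sketch (assumes the sqlite3 CLI is installed; paths are placeholders):
# sqlite3 "$db" ".backup '$backup_dir/crm.db.bak.$(date +%Y%m%d_%H%M%S)'"
# prune_backups "$backup_dir" "crm.db.bak." 7
```

Using SQLite's own `.backup` command rather than `cp` is the safer choice while the CRM process is running, since it produces a consistent copy of a live database.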
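
### R8. Resource governance rules

- Gitea is currently the largest memory consumer; for any future "server is slow / deployment misbehaves" report, check it first
- When available memory < 300 MiB or swap keeps growing, heavy compilation or large-scale unpacking directly on remote43 is forbidden
- Production builds should be built locally and uploaded as finished artifacts rather than compiled on the remote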
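
The memory threshold in R8 can be checked before any heavy step with a guard like the one below. It is a sketch under assumptions: `mem_ok` is a hypothetical helper, it reads the `MemAvailable` field of `/proc/meminfo` (reported in kB), and it covers only the memory half of the rule, not the swap-growth condition.

```shell
# Hypothetical gate: exit 0 if MemAvailable is at least $1 MiB (default 300).
# The meminfo path is a parameter so the check can be exercised on sample data.
mem_ok() {
    min_mib="${1:-300}"; meminfo="${2:-/proc/meminfo}"
    avail_kib=$(awk '$1 == "MemAvailable:" { print $2 }' "$meminfo")
    [ -n "$avail_kib" ] && [ "$((avail_kib / 1024))" -ge "$min_mib" ]
}

# Usage sketch:
# mem_ok 300 || { echo "available memory below 300 MiB; build locally instead" >&2; exit 1; }
```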

## 9. Recommended follow-up actions (by priority)

### P0

1. Fix `scripts/deploy/deploy_crm_only.sh`
   - Remove DB deletion
   - Turn it into a true hot update
   - Verify the new PID and new exe after startup
2. Fix `scripts/deploy/deploy_tksea_portal.sh`
   - Switch to `$cookie_crm_subject`
3. Fix `scripts/test/test_tksea_portal_assets.sh`
   - Change the gate to assert `$cookie_crm_subject`

### P1

4. Add a systemd service for CRM
5. Add a scheduled database backup script and retention policy
6. Add an automated post-deployment smoke verification script

### P2

7. Clean up leftover historical verification directories/processes on remote43
8. Evaluate Gitea memory usage and migration/limit strategy

## 10. Current conclusions

- remote43 is not "down"; rather, "the deployment switchover mechanism is unreliable"
- The production deploy script does not currently meet the standard for ongoing operations
- Going forward, the five areas of "deployment, supervision, backup, gating, and resource governance" must be institutionalized rather than left to ad-hoc scripts and human memory
@@ -9,18 +9,19 @@
2. Production host DeepSeek group available: official key imported
3. Production host MiniMax group available: official key imported
4. OpenClaw default: primary=tksea-minimax/MiniMax-M3, fallbacks=[tksea-deepseek/deepseek-chat, deepseek-official/deepseek-chat]
5. asxs findings must be distinguished: usable locally / unreachable from the production host egress
5. asxs upstream service OK: api.asxs.top/v1 returns 200
6. a7m upstream service OK: kimi.a7m.com.cn/v1 returns 401 (auth required)

## Known gaps
## Known gaps (updated 2026-06-10)

| Gap                                   | Priority | Status   | Blocker                                    |
| ------------------------------------- | -------- | -------- | ------------------------------------------ |
| Kimi group unavailable                | P0       | blocked  | a7m upstream 429 overloaded                |
| GLM (Zhipu) not imported              | P0       | blocked  | no upstream key                            |
| asxs unavailable from production host | P1       | known    | remote43 egress blocked by Cloudflare 1010 |
| channel_pricing_intervals empty       | P2       | accepted | does not affect routing                    |
| Idempotent deploy script              | P2       | planned  | SQL-type steps not yet wrapped             |
| OpenClaw CLI version drift            | P3       | known    | old 2026.5.12 version, backup retained     |
| Gap                             | Priority | Status   | Blocker                                |
| ------------------------------- | -------- | -------- | -------------------------------------- |
| GLM (Zhipu) not imported        | P0       | blocked  | no upstream key                        |
| channel_pricing_intervals empty | P2       | accepted | does not affect routing                |
| Idempotent deploy script        | P2       | planned  | SQL-type steps not yet wrapped         |
| OpenClaw CLI version drift      | P3       | known    | old 2026.5.12 version, backup retained |

**Note**: a7m/asxs upstream services verified OK (2026-06-10); import them once a valid API key is configured

## Phase 1: Supply-chain consolidation

13
scripts/deploy/.env.deploy.example
Normal file
@@ -0,0 +1,13 @@
# Copy this file to scripts/deploy/.env.deploy before running deploy_tksea_portal.sh
# Do not commit real credentials.

KEY=/path/to/ssh-key.pem
REMOTE=ubuntu@example-host
REMOTE_CRM_PORT=18190

# Optional overrides
# REMOTE_PORTAL_DIR=/var/www/sub2api-portal
# REMOTE_NGINX_SITE=/etc/nginx/sites-available/tksea
# REMOTE_HOST_PORT=8080
# LOCAL_PORTAL_DIR=/absolute/path/to/deploy/tksea-portal
# REMOTE_STAGE_DIR=/tmp/sub2api-portal-deploy
@@ -7,11 +7,19 @@
set -euo pipefail

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
DEPLOY_ENV_FILE="${DEPLOY_ENV_FILE:-$ROOT_DIR/scripts/deploy/.env.deploy}"
if [[ -f "$DEPLOY_ENV_FILE" ]]; then
  set -a
  # shellcheck disable=SC1090
  source "$DEPLOY_ENV_FILE"
  set +a
fi

# shellcheck disable=SC1091
source "$ROOT_DIR/scripts/deploy/remote43_patched_stack_lib.sh"

KEY="${KEY:-/home/long/下载/zjsea.pem}"
REMOTE="${REMOTE:-ubuntu@43.155.133.187}"
KEY="${KEY:-}"
REMOTE="${REMOTE:-}"
STACK_NAME="${STACK_NAME:-crm-only-$(date +%Y%m%d)}"
CRM_PORT="${CRM_PORT:-18190}"
CRM_BINARY="${CRM_BINARY:-$ROOT_DIR/server}"
@@ -120,29 +128,121 @@ if [[ -f "$REMOTE_REPO_BUNDLE" ]]; then
  git -C "$REMOTE_REPO_ROOT" config user.email "remote43-crm@tksea.top"
fi

# Non-destructive hot update: confirm the old process has exited before starting the new one
# Never delete the DB: production data must be preserved

# Improved stop logic: clean up by process name and port, not just by PID file
echo "Stopping any existing CRM processes..."

# 1. Stop by PID file (if it exists)
if [[ -f "$CRM_PID_FILE" ]]; then
  OLD_PID="$(cat "$CRM_PID_FILE")"
  if kill "$OLD_PID" >/dev/null 2>&1; then
    sleep 1
  if kill -0 "$OLD_PID" >/dev/null 2>&1; then
    echo "Stopping PID from pidfile: $OLD_PID"
    kill "$OLD_PID" >/dev/null 2>&1 || true
    for i in {1..20}; do
      if ! kill -0 "$OLD_PID" >/dev/null 2>&1; then break; fi
      sleep 0.5
    done
    if kill -0 "$OLD_PID" >/dev/null 2>&1; then
      kill -9 "$OLD_PID" >/dev/null 2>&1 || true
      sleep 1
    fi
  fi
  rm -f "$CRM_PID_FILE"
fi
rm -f "$CRM_DB_FILE" "$CRM_LOG_FILE"

nohup bash -lc 'set -a; source "$CRM_ENV_FILE"; set +a; exec "$CRM_BINARY"' >"$CRM_LOG_FILE" 2>&1 &
echo $! > "$CRM_PID_FILE"
# 2. Stop any leftover CRM processes by process name
for pattern in 'sub2api.*crm' 'sub2api.*relay-manager'; do
  for pid in $(pgrep -f "$pattern" 2>/dev/null); do
    echo "Stopping process by pattern ($pattern): $pid"
    kill "$pid" 2>/dev/null || true
    sleep 0.5
    if kill -0 "$pid" 2>/dev/null; then
      kill -9 "$pid" 2>/dev/null || true
    fi
  done
done

python3 - "$CRM_PORT" <<'PY'
import subprocess, sys, time
url = f"http://127.0.0.1:{sys.argv[1]}/healthz"
for _ in range(30):
    r = subprocess.run(["curl", "-fsS", url], text=True, capture_output=True)
# 3. Force-release the port (if necessary)
if command -v fuser >/dev/null 2>&1; then
  fuser -k "${CRM_PORT}/tcp" 2>/dev/null || true
fi

# Clear the log but leave the DB untouched
rm -f "$CRM_LOG_FILE"

# Verify the port is no longer in use
for i in {1..10}; do
  if ! ss -tlnp 2>/dev/null | grep -q ":$CRM_PORT " && \
     ! netstat -tlnp 2>/dev/null | grep -q ":$CRM_PORT "; then
    break
  fi
  echo "Waiting for port $CRM_PORT to be released... (attempt $i/10)"
  sleep 1
done

if ss -tlnp 2>/dev/null | grep -q ":$CRM_PORT " || netstat -tlnp 2>/dev/null | grep -q ":$CRM_PORT "; then
  echo "ERROR: Port $CRM_PORT is still in use after cleanup. Cannot start new CRM." >&2
  ss -tlnp 2>/dev/null | grep ":$CRM_PORT " || netstat -tlnp 2>/dev/null | grep ":$CRM_PORT "
  exit 1
fi

echo "Port $CRM_PORT is free. Starting new CRM..."

# Start in a more reliable way (prefer systemd, fall back to nohup)
if command -v systemctl >/dev/null 2>&1 && [[ -f /etc/systemd/system/sub2api-crm.service ]]; then
  systemctl restart sub2api-crm || exit 1
else
  nohup bash -lc 'set -a; source "$CRM_ENV_FILE"; set +a; exec "$CRM_BINARY"' >"$CRM_LOG_FILE" 2>&1 &
  echo $! > "$CRM_PID_FILE"
fi

python3 - "$CRM_PORT" "$CRM_PID_FILE" <<'PY'
import subprocess, sys, time, os

port = sys.argv[1]
pid_file = sys.argv[2]

# 1. Wait for healthz
healthz_url = f"http://127.0.0.1:{port}/healthz"
for i in range(30):
    r = subprocess.run(["curl", "-fsS", healthz_url], text=True, capture_output=True)
    if r.returncode == 0 and r.stdout.strip() == "ok":
        raise SystemExit(0)
        print(f"Health check passed on attempt {i+1}")
        break
    time.sleep(1)
raise SystemExit(f"crm healthz did not become ready on {url}")
else:
    raise SystemExit(f"crm healthz did not become ready on {healthz_url}")

# 2. Verify the binary is not in the deleted state
with open(pid_file) as f:
    pid = f.read().strip()
exe_link = f"/proc/{pid}/exe"
if os.path.islink(exe_link):
    target = os.readlink(exe_link)
    if "deleted" in target:
        raise SystemExit(f"ERROR: Binary shows (deleted): {target}")
    print(f"Binary OK: {target}")

# 3. Verify the portal session route (the new version should expose it)
session_url = f"http://127.0.0.1:{port}/api/portal/session/state"
r = subprocess.run(["curl", "-fsS", session_url], text=True, capture_output=True)
if r.returncode == 0:
    print(f"Portal session route OK: {r.stdout.strip()}")
elif r.returncode == 22 and "404" in r.stderr:
    raise SystemExit(f"ERROR: Portal session route returns 404 - may be running old version")
else:
    print(f"Warning: Portal session route check failed: {r.stderr}")

raise SystemExit(0)
PY

# Deployment verification complete
echo "=== Deployment Verification ==="
NEW_PID=$(cat "$CRM_PID_FILE")
echo "New CRM PID: $NEW_PID"
ls -la "/proc/$NEW_PID/exe" 2>/dev/null | grep -v deleted && echo "Binary state: OK (not deleted)" || echo "WARNING: Binary may be deleted"
printf "crm_base=http://127.0.0.1:%s\n" "$CRM_PORT"
printf "crm_pid_file=%s\n" "$CRM_PID_FILE"
printf "crm_log=%s\n" "$CRM_LOG_FILE"
@@ -156,9 +256,11 @@ BOOTSTRAP_EOF


main() {
  require_cmd bash curl git python3 ssh scp
  remote43_require_file "$KEY" "ssh key"
  remote43_require_file "$CRM_BINARY" "crm server binary"
  require_cmd bash curl git python3 ssh scp
  [[ -n "$KEY" ]] || die "KEY is required; copy scripts/deploy/.env.deploy.example to scripts/deploy/.env.deploy and fill it"
  [[ -n "$REMOTE" ]] || die "REMOTE is required; copy scripts/deploy/.env.deploy.example to scripts/deploy/.env.deploy and fill it"
  remote43_require_file "$KEY" "ssh key"
  remote43_require_file "$CRM_BINARY" "crm server binary"

  rm -f "$LOCAL_REPO_BUNDLE"
  git -C "$ROOT_DIR" bundle create "$LOCAL_REPO_BUNDLE" main
@@ -187,34 +289,95 @@ main() {
  cp "$bootstrap_file" "$LOCAL_DEPLOY_DIR/bootstrap.sh"

  ssh_remote "mkdir -p $(printf "%q" "$REMOTE_ROOT")
# Improved stop logic: clean up by process name and port, not just by PID file
echo 'Stopping any existing CRM processes...'

# 1. Stop by PID file (if it exists)
if [[ -f $(printf "%q" "$REMOTE_CRM_PID_FILE") ]]; then
  OLDPID=\$(cat $(printf "%q" "$REMOTE_CRM_PID_FILE"))
  kill \$OLDPID 2>/dev/null || true
  sleep 1
  if kill -0 \$OLDPID 2>/dev/null; then
    echo \"Stopping PID from pidfile: \$OLDPID\"
    kill \$OLDPID 2>/dev/null || true
    for i in {1..20}; do
      if ! kill -0 \$OLDPID 2>/dev/null; then break; fi
      sleep 0.5
    done
    if kill -0 \$OLDPID 2>/dev/null; then kill -9 \$OLDPID 2>/dev/null || true; sleep 1; fi
  fi
  rm -f $(printf "%q" "$REMOTE_CRM_PID_FILE")
fi
rm -f $(printf "%q" "$REMOTE_CRM_PID_FILE") $(printf "%q" "$REMOTE_CRM_DB_FILE") $(printf "%q" "$REMOTE_CRM_LOG_FILE") $(printf "%q" "$REMOTE_CRM_BINARY")"

# 2. Stop any leftover CRM processes by process name
for pattern in 'sub2api.*crm' 'sub2api.*relay-manager'; do
  for pid in \$(pgrep -f \"\$pattern\" 2>/dev/null); do
    echo \"Stopping process by pattern (\$pattern): \$pid\"
    kill \$pid 2>/dev/null || true
    sleep 0.5
    if kill -0 \$pid 2>/dev/null; then kill -9 \$pid 2>/dev/null || true; fi
  done
done

# 3. Force-release the port
fuser -k $(printf "%q" "$CRM_PORT")/tcp 2>/dev/null || true

# 4. Verify the port has been released
for i in {1..5}; do
  if ! ss -tlnp 2>/dev/null | grep -q '$(printf "%q" ":$CRM_PORT")' && \\
     ! netstat -tlnp 2>/dev/null | grep -q '$(printf "%q" ":$CRM_PORT")'; then
    break
  fi
  echo \"Waiting for port release... (\$i/5)\"
  sleep 1
done
# Never delete the DB: rm -f DB_FILE has been removed
rm -f $(printf "%q" "$REMOTE_CRM_LOG_FILE") $(printf "%q" "$REMOTE_CRM_BINARY")"
  scp_remote "$CRM_BINARY" "$REMOTE:$REMOTE_CRM_BINARY"
  scp_remote "$LOCAL_REPO_BUNDLE" "$REMOTE:$REMOTE_REPO_BUNDLE"
  scp_remote "$crm_env_file" "$REMOTE:$REMOTE_CRM_ENV_FILE"
  scp_remote "$bootstrap_file" "$REMOTE:$REMOTE_BOOTSTRAP_FILE"
  ssh_remote "bash $(printf "%q" "$REMOTE_BOOTSTRAP_FILE")"
  ssh_remote "bash $(printf '%q' "$REMOTE_BOOTSTRAP_FILE")"

  cat <<EOF
crm-only stack prepared
remote crm base: http://127.0.0.1:${CRM_PORT}
remote crm env file: ${REMOTE_CRM_ENV_FILE}
remote crm log: ${REMOTE_CRM_LOG_FILE}
remote repo root: ${REMOTE_REPO_ROOT}
local operator env file: ${LOCAL_OPERATOR_ENV_FILE}
local tunnel script: ${LOCAL_TUNNEL_SCRIPT}
local deploy dir: ${LOCAL_DEPLOY_DIR}
  echo ""
  echo "=== Post-Deployment Verification ==="

next:
1. In another terminal run: ${LOCAL_TUNNEL_SCRIPT}
2. In the current terminal run: set -a; source ${LOCAL_OPERATOR_ENV_FILE}; set +a
3. Verify: curl -fsS http://127.0.0.1:${CRM_PORT}/healthz
curl -fsS -H "Authorization: Bearer \$crm_admin_token" http://127.0.0.1:${CRM_PORT}/api/packs
EOF
  # Wait for the service to start
  sleep 3

  # Verify healthz
  echo -n "1. Health check: "
  if ssh_remote "curl -fsS http://127.0.0.1:${CRM_PORT}/healthz 2>/dev/null" | grep -q "^ok$"; then
    echo "[PASS]"
  else
    echo "[FAIL]"
  fi

  # Verify the portal session route
  echo -n "2. Portal session route: "
  SESSION_RESULT=$(ssh_remote "curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:${CRM_PORT}/api/portal/session/state 2>/dev/null")
  if [[ "$SESSION_RESULT" == "200" ]]; then
    echo "[PASS] Returned 200 - new version"
  elif [[ "$SESSION_RESULT" == "404" ]]; then
    echo "[WARN] Returned 404 - may be old version"
  else
    echo "[UNKNOWN] Returned $SESSION_RESULT"
  fi

  # Verify binary state
  echo -n "3. Binary state check: "
  PID_VAL=$(ssh_remote "cat $(printf '%q' "$REMOTE_CRM_PID_FILE") 2>/dev/null")
  if [[ -n "$PID_VAL" ]]; then
    # readlink prints the link target; plain ls would only echo the /proc path and never show "(deleted)"
    BINARY_LINK=$(ssh_remote "readlink /proc/${PID_VAL}/exe 2>/dev/null")
    if echo "$BINARY_LINK" | grep -q deleted; then
      echo "[FAIL] Binary shows deleted"
    else
      echo "[OK] Binary not deleted"
    fi
  else
    echo "[WARN] Cannot check binary state"
  fi

  echo ""
}

main "$@"

@@ -2,12 +2,20 @@
set -euo pipefail

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
KEY="${KEY:-/home/long/下载/zjsea.pem}"
REMOTE="${REMOTE:-ubuntu@43.155.133.187}"
DEPLOY_ENV_FILE="${DEPLOY_ENV_FILE:-$ROOT_DIR/scripts/deploy/.env.deploy}"
if [[ -f "$DEPLOY_ENV_FILE" ]]; then
  set -a
  # shellcheck disable=SC1090
  source "$DEPLOY_ENV_FILE"
  set +a
fi

KEY="${KEY:-}"
REMOTE="${REMOTE:-}"
REMOTE_PORTAL_DIR="${REMOTE_PORTAL_DIR:-/var/www/sub2api-portal}"
REMOTE_NGINX_SITE="${REMOTE_NGINX_SITE:-/etc/nginx/sites-available/tksea}"
REMOTE_HOST_PORT="${REMOTE_HOST_PORT:-8080}"
REMOTE_CRM_PORT="${REMOTE_CRM_PORT:-18190}"
REMOTE_CRM_PORT="${REMOTE_CRM_PORT:-}"
LOCAL_PORTAL_DIR="${LOCAL_PORTAL_DIR:-$ROOT_DIR/deploy/tksea-portal}"
REMOTE_STAGE_DIR="${REMOTE_STAGE_DIR:-/tmp/sub2api-portal-deploy}"
DRY_RUN="${DRY_RUN:-0}"
@@ -46,6 +54,9 @@ main() {

  [[ -d "$LOCAL_PORTAL_DIR" ]] || die "missing portal dir: $LOCAL_PORTAL_DIR"
  [[ -f "$LOCAL_PORTAL_DIR/index.html" ]] || die "missing portal index: $LOCAL_PORTAL_DIR/index.html"
  [[ -n "$KEY" ]] || die "KEY is required; copy scripts/deploy/.env.deploy.example to scripts/deploy/.env.deploy and fill it"
  [[ -n "$REMOTE" ]] || die "REMOTE is required; copy scripts/deploy/.env.deploy.example to scripts/deploy/.env.deploy and fill it"
  [[ -n "$REMOTE_CRM_PORT" ]] || die "REMOTE_CRM_PORT is required; copy scripts/deploy/.env.deploy.example to scripts/deploy/.env.deploy and fill it"
  if [[ "$DRY_RUN" != "1" ]]; then
    [[ -f "$KEY" ]] || die "missing ssh key: $KEY"
  fi
@@ -97,7 +108,8 @@ block = textwrap.dedent("""\
}

location /portal-admin-api/ {
    # The trusted login/auth layer must place the user subject in \$portal_subject; never trust a header supplied by the browser.
    # The trusted login/auth layer must place the signed user value in $cookie_crm_subject; never trust a header supplied by the browser.
    # This is the cookie name that corresponds to the CRM setting TRUSTED_SUBJECT_COOKIE=crm_subject.
    # CRM must also be configured with:
    # SUB2API_CRM_TRUSTED_SUBJECT_HEADER=X-CRM-Authenticated-Subject
    # SUB2API_CRM_TRUSTED_PROXY_SECRET_HEADER=X-CRM-Trusted-Proxy
@@ -107,8 +119,7 @@ block = textwrap.dedent("""\
    proxy_set_header X-Real-IP \$remote_addr;
    proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto \$scheme;
    proxy_set_header X-Portal-Subject "";
    proxy_set_header X-CRM-Authenticated-Subject \$portal_subject;
    proxy_set_header X-CRM-Authenticated-Subject \$cookie_crm_subject;
    proxy_set_header X-CRM-Trusted-Proxy "REPLACE_WITH_SUB2API_CRM_TRUSTED_PROXY_SECRET";
    proxy_http_version 1.1;
}

135
scripts/deploy/install_crm_systemd.sh
Normal file
@@ -0,0 +1,135 @@
#!/bin/bash
# install_crm_systemd.sh - install the CRM systemd service
# Usage: sudo ./install_crm_systemd.sh [crm_dir]

set -e

CRM_DIR="${1:-/home/ubuntu/crm-only-20260602_18190}"
SERVICE_NAME="sub2api-crm"
SERVICE_FILE="/etc/systemd/system/${SERVICE_NAME}.service"
ENV_FILE="${CRM_DIR}/.env.crm"

echo "=== Installing Sub2API CRM systemd service ==="
echo "CRM Directory: ${CRM_DIR}"
echo "Service File: ${SERVICE_FILE}"

# Check for root
if [ "$EUID" -ne 0 ]; then
  echo "ERROR: Please run with sudo"
  exit 1
fi

# Check that the directory exists
if [ ! -d "${CRM_DIR}" ]; then
  echo "ERROR: Directory ${CRM_DIR} does not exist"
  exit 1
fi

# Check that the executable exists
if [ ! -x "${CRM_DIR}/sub2api-cn-relay-manager-server" ]; then
  echo "ERROR: Binary not found or not executable"
  exit 1
fi

# Stop any existing nohup process
echo "Stopping existing CRM processes..."
pkill -f 'sub2api-cn-relay-manager-server' 2>/dev/null || true
sleep 2

# Extract the port from the directory name
PORT=$(echo "${CRM_DIR}" | grep -oE '[0-9]+' | tail -1)
if [ -z "${PORT}" ]; then
  PORT="18190"
fi
echo "Detected port: ${PORT}"

# Determine the run-as user
RUN_USER=$(stat -c '%U' "${CRM_DIR}")
echo "Run user: ${RUN_USER}"

# Check the environment file
if [ ! -f "${ENV_FILE}" ]; then
  echo "ERROR: Environment file ${ENV_FILE} not found"
  exit 1
fi

# Create the systemd service file
cat > "${SERVICE_FILE}" << EOF
[Unit]
Description=Sub2API CRM API Server (Port ${PORT})
After=network.target

[Service]
Type=simple
User=${RUN_USER}
Group=${RUN_USER}
WorkingDirectory=${CRM_DIR}
EnvironmentFile=${ENV_FILE}
Environment="PATH=/usr/local/bin:/usr/bin:/bin"
ExecStart=${CRM_DIR}/sub2api-cn-relay-manager-server
ExecReload=/bin/kill -HUP \$MAINPID
KillMode=process
Restart=on-failure
RestartSec=5
StandardOutput=append:${CRM_DIR}/crm.log
StandardError=append:${CRM_DIR}/crm.log

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=${CRM_DIR}

[Install]
WantedBy=multi-user.target
EOF

echo "Service file created: ${SERVICE_FILE}"

# Reload systemd
systemctl daemon-reload

# Enable the service
systemctl enable "${SERVICE_NAME}"
echo "Service enabled"

# Start the service
echo "Starting service..."
systemctl start "${SERVICE_NAME}"
sleep 3

# Verify service status
if systemctl is-active --quiet "${SERVICE_NAME}"; then
  echo "✓ Service is running"
else
  echo "ERROR: Service failed to start"
  systemctl status "${SERVICE_NAME}" --no-pager
  exit 1
fi

# Verify the port is listening
if ss -tlnp | grep -q ":${PORT}"; then
  echo "✓ Port ${PORT} is listening"
else
  echo "WARNING: Port ${PORT} not listening"
fi

# Health check
echo "Health check..."
for i in 1 2 3; do
  if curl -fsS "http://127.0.0.1:${PORT}/healthz" -m 2 >/dev/null 2>&1; then
    echo "✓ Health check passed"
    break
  fi
  sleep 2
done

echo ""
echo "=== Installation complete ==="
echo "Commands:"
echo " systemctl status ${SERVICE_NAME} - View status"
echo " systemctl stop ${SERVICE_NAME} - Stop service"
echo " systemctl start ${SERVICE_NAME} - Start service"
echo " systemctl restart ${SERVICE_NAME} - Restart service"
echo " journalctl -u ${SERVICE_NAME} -f - View logs"
28
scripts/deploy/sub2api-crm.service.template
Normal file
@@ -0,0 +1,28 @@
[Unit]
Description=Sub2API CRM API Server (Port 18190)
After=network.target

[Service]
Type=simple
User=ubuntu
Group=ubuntu
WorkingDirectory=/home/ubuntu/crm-only-20260602_18190
EnvironmentFile=/home/ubuntu/crm-only-20260602_18190/.env.crm
Environment="PATH=/usr/local/bin:/usr/bin:/bin"
ExecStart=/home/ubuntu/crm-only-20260602_18190/sub2api-cn-relay-manager-server
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=5
StandardOutput=append:/home/ubuntu/crm-only-20260602_18190/crm.log
StandardError=append:/home/ubuntu/crm-only-20260602_18190/crm.log

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/home/ubuntu/crm-only-20260602_18190

[Install]
WantedBy=multi-user.target
129
scripts/deploy/verify_crm_deployment.sh
Normal file
@@ -0,0 +1,129 @@
|
||||
#!/bin/bash
|
||||
# verify_crm_deployment.sh - 部署后三层验证脚本
|
||||
|
||||
set -e
|
||||
|
||||
CRM_DIR="${1:-/home/ubuntu/crm-only-20260602_18190}"
|
||||
PORT=$(echo "${CRM_DIR}" | grep -oE '[0-9]+' | tail -1)
|
||||
PORT="${PORT:-18190}"
|
||||
|
||||
ERRORS=0
|
||||
|
||||
echo "============================================"
|
||||
echo "CRM 部署三层验证"
|
||||
echo "============================================"
|
||||
echo "目录: ${CRM_DIR}"
|
||||
echo "端口: ${PORT}"
|
||||
echo ""
|
||||
# === Layer 1: process ===
echo "[1/3] Process layer checks"

# Check the PID file
if [ -f "${CRM_DIR}/crm.pid" ]; then
  PID=$(cat "${CRM_DIR}/crm.pid")
  echo " PID file: ${PID}"

  # Check that the process exists
  if ps -p "${PID}" >/dev/null 2>&1; then
    echo " ✓ Process is running"

    # Check /proc/$PID/exe
    EXE=$(readlink "/proc/${PID}/exe" 2>/dev/null || echo "unknown")
    echo " /proc/${PID}/exe: ${EXE}"

    if echo "${EXE}" | grep -q "(deleted)"; then
      echo " ✗ ERROR: Binary has (deleted) marker - old process still running"
      ERRORS=$((ERRORS + 1))
    else
      echo " ✓ Binary state: OK"
    fi
  else
    echo " ✗ ERROR: PID ${PID} not running"
    ERRORS=$((ERRORS + 1))
  fi
else
  echo " ! PID file not found (possibly managed by systemd)"
  # Fall back to pgrep
  NEW_PID=$(pgrep -f 'sub2api-cn-relay-manager-server' | head -1)
  if [ -n "${NEW_PID}" ]; then
    echo " ✓ Found running process: ${NEW_PID}"
  else
    echo " ✗ ERROR: No running process found"
    ERRORS=$((ERRORS + 1))
  fi
fi

echo ""

# === Layer 2: control plane ===
echo "[2/3] Control plane checks"

# Test endpoints
ENDPOINTS=(
  "healthz:Health check"
  "api/portal/session:Portal Session"
  "api/portal/logical-groups:Logical Groups"
)

for item in "${ENDPOINTS[@]}"; do
  IFS=':' read -r endpoint name <<< "${item}"

  STATUS=$(curl -s "http://127.0.0.1:${PORT}/${endpoint}" -w '%{http_code}' -o /dev/null 2>/dev/null || true)

  if [ "${STATUS}" = "200" ]; then
    echo " ✓ ${name}: HTTP ${STATUS}"
  elif [ "${STATUS}" = "401" ] || [ "${STATUS}" = "403" ]; then
    echo " ~ ${name}: HTTP ${STATUS} (expected for protected endpoint)"
  else
    echo " ✗ ${name}: HTTP ${STATUS}"
    ERRORS=$((ERRORS + 1))
  fi
done

# Check login_enabled
SESSION_RESP=$(curl -s "http://127.0.0.1:${PORT}/api/portal/session" 2>/dev/null || echo "{}")
if echo "${SESSION_RESP}" | grep -q '"login_enabled":true'; then
  echo " ✓ Portal login enabled"
elif echo "${SESSION_RESP}" | grep -q '"login_enabled":false'; then
  echo " ✗ ERROR: Portal login disabled (SUB2API_CRM_PORTAL_SESSION_SECRET not set)"
  ERRORS=$((ERRORS + 1))
else
  echo " ! Cannot determine login_enabled state"
fi

echo ""

# === Layer 3: user plane ===
echo "[3/3] User plane checks"

# Check the nginx config
if [ -f /etc/nginx/sites-enabled/crm.sub.tksea.top.conf ] || [ -f /etc/nginx/sites-available/crm.sub.tksea.top.conf ]; then
  echo " ✓ Nginx config exists"

  # Check that nginx points at the expected port
  NGINX_CONF=$(cat /etc/nginx/sites-enabled/crm.sub.tksea.top.conf 2>/dev/null || cat /etc/nginx/sites-available/crm.sub.tksea.top.conf 2>/dev/null)
  if echo "${NGINX_CONF}" | grep -q "127.0.0.1:${PORT}"; then
    echo " ✓ Nginx points to port ${PORT}"
  else
    echo " ! WARNING: Nginx may not point to correct port"
  fi
else
  echo " ! Nginx config not found in standard location"
fi

# Check external access (optional; may fail because of network restrictions)
# EXTERN_RESP=$(curl -s "https://crm.sub.tksea.top/api/portal/session" --connect-timeout 3 2>/dev/null || echo "")
# if [ -n "${EXTERN_RESP}" ]; then
#   echo " ✓ External access working"
# fi

echo ""
echo "============================================"
if [ ${ERRORS} -eq 0 ]; then
  echo "✓ All checks passed"
  exit 0
else
  echo "✗ Found ${ERRORS} error(s)"
  exit 1
fi
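Two pieces of logic in the verification script are easy to exercise without a running server: deriving the port from the deployment directory name, and classifying an HTTP status code. The sketch below isolates them as pure functions. The names `derive_port` and `classify_status` are illustrative assumptions and do not appear in the commit.

```shell
#!/bin/bash
# Hypothetical helpers mirroring the logic of verify_crm_deployment.sh.

# derive_port DIR: print the last run of digits in DIR, or 18190 when DIR has none.
derive_port() {
  local dir="$1" port
  port=$(echo "$dir" | grep -oE '[0-9]+' | tail -1 || true)
  echo "${port:-18190}"
}

# classify_status CODE: map an HTTP status to the three outcomes the script reports.
classify_status() {
  case "$1" in
    200)     echo "ok" ;;         # endpoint answered normally
    401|403) echo "protected" ;;  # endpoint is up but requires authentication
    *)       echo "error" ;;      # anything else counts as a failure
  esac
}
```

Note that the port heuristic takes the last digit run anywhere in the path, so a directory whose name does not end in the port number will yield a misleading value. Passing the port explicitly would be more robust.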
@@ -14,6 +14,9 @@ ADMIN_PROVIDERS_FILE="$ROOT_DIR/deploy/tksea-portal/admin/providers.html"
ADMIN_BATCH_FILE="$ROOT_DIR/deploy/tksea-portal/admin/batch-import.html"
NGINX_FILE="$ROOT_DIR/deploy/tksea-portal/nginx.sub.tksea.top.conf.example"
DEPLOY_SCRIPT="$ROOT_DIR/scripts/deploy/deploy_tksea_portal.sh"
DEPLOY_ENV_EXAMPLE="$ROOT_DIR/scripts/deploy/.env.deploy.example"
DEPLOY_CRM_ONLY_SCRIPT="$ROOT_DIR/scripts/deploy/deploy_crm_only.sh"
REMOTE43_SETUP_SCRIPT="$ROOT_DIR/scripts/deploy/setup_remote43_patched_stack.sh"

fail() {
  echo "FAIL: $*" >&2
@@ -28,6 +31,14 @@ assert_contains_file() {
  fi
}

assert_not_contains_file() {
  local file="$1"
  local needle="$2"
  if grep -Fq "$needle" "$file"; then
    fail "did not expect [$needle] in $file"
  fi
}

[[ -f "$HTML_FILE" ]] || fail "missing $HTML_FILE"
[[ -f "$ADMIN_HTML_FILE" ]] || fail "missing $ADMIN_HTML_FILE"
[[ -f "$ADMIN_COMMON_CSS_FILE" ]] || fail "missing $ADMIN_COMMON_CSS_FILE"
@@ -40,6 +51,9 @@ assert_contains_file() {
[[ -f "$ADMIN_BATCH_FILE" ]] || fail "missing $ADMIN_BATCH_FILE"
[[ -f "$NGINX_FILE" ]] || fail "missing $NGINX_FILE"
[[ -f "$DEPLOY_SCRIPT" ]] || fail "missing $DEPLOY_SCRIPT"
[[ -f "$DEPLOY_ENV_EXAMPLE" ]] || fail "missing $DEPLOY_ENV_EXAMPLE"
[[ -f "$DEPLOY_CRM_ONLY_SCRIPT" ]] || fail "missing $DEPLOY_CRM_ONLY_SCRIPT"
[[ -f "$REMOTE43_SETUP_SCRIPT" ]] || fail "missing $REMOTE43_SETUP_SCRIPT"

assert_contains_file "$HTML_FILE" "Sub2API 多模型接入中心"
assert_contains_file "$HTML_FILE" "https://sub.tksea.top/portal/"
@@ -111,7 +125,24 @@ assert_contains_file "$ADMIN_COMMON_JS_FILE" "/portal/admin/route-health.html"
assert_contains_file "$ADMIN_COMMON_JS_FILE" "/portal/admin/accounts.html"
assert_contains_file "$ADMIN_COMMON_JS_FILE" "/portal/admin/providers.html"
assert_contains_file "$ADMIN_COMMON_JS_FILE" "/portal/admin/batch-import.html"
# Contract fix: crm_subject is the trusted subject source; crm_session is deprecated
assert_contains_file "$NGINX_FILE" '$cookie_crm_subject'
assert_not_contains_file "$NGINX_FILE" '$cookie_crm_session'
assert_contains_file "$DEPLOY_SCRIPT" '$cookie_crm_subject'
assert_not_contains_file "$DEPLOY_SCRIPT" '$cookie_crm_session'
assert_contains_file "$ADMIN_COMMON_JS_FILE" "/portal/"
assert_contains_file "$ADMIN_COMMON_JS_FILE" "filterSensitiveData"
assert_contains_file "$ADMIN_COMMON_JS_FILE" "containsSensitiveData(payload)"

assert_not_contains_file "$ADMIN_ACCOUNTS_FILE" "adminToken: adminTokenInput.value,"
assert_not_contains_file "$ADMIN_PROVIDERS_FILE" "adminToken: adminTokenInput.value,"
assert_not_contains_file "$ADMIN_PROVIDERS_FILE" "accessAPIKey: accessAPIKeyInput.value.trim(),"
assert_not_contains_file "$ADMIN_PROVIDERS_FILE" "providerKeys: providerKeysInput.value,"
assert_not_contains_file "$ADMIN_HTML_FILE" "adminToken: adminTokenInput.value,"
assert_not_contains_file "$ADMIN_HTML_FILE" "probeAPIKey: probeAPIKeyInput.value.trim(),"
assert_not_contains_file "$ADMIN_HTML_FILE" "entries: entriesInput.value,"
assert_not_contains_file "$ADMIN_ROUTE_HEALTH_FILE" "localStorage.setItem(storageKey"
assert_not_contains_file "$ADMIN_LOGICAL_GROUPS_FILE" "localStorage.setItem(storageKey"

assert_contains_file "$ADMIN_HTML_FILE" "Batch Import Admin"
assert_contains_file "$ADMIN_HTML_FILE" "/portal/admin-common.css"
@@ -190,6 +221,12 @@ assert_contains_file "$ADMIN_ACCOUNTS_FILE" "/portal-admin-api"
assert_contains_file "$ADMIN_PROVIDERS_FILE" "Provider Admin"
assert_contains_file "$ADMIN_PROVIDERS_FILE" "/portal/admin-common.css"
assert_contains_file "$ADMIN_PROVIDERS_FILE" "/portal/admin-common.js"
assert_not_contains_file "$DEPLOY_CRM_ONLY_SCRIPT" 'KEY="${KEY:-/home/long/下载/zjsea.pem}"'
assert_not_contains_file "$DEPLOY_CRM_ONLY_SCRIPT" 'REMOTE="${REMOTE:-ubuntu@43.155.133.187}"'
assert_not_contains_file "$REMOTE43_SETUP_SCRIPT" 'KEY="${KEY:-/home/long/下载/zjsea.pem}"'
assert_not_contains_file "$REMOTE43_SETUP_SCRIPT" 'REMOTE="${REMOTE:-ubuntu@43.155.133.187}"'
assert_contains_file "$DEPLOY_CRM_ONLY_SCRIPT" '.env.deploy.example'
assert_contains_file "$REMOTE43_SETUP_SCRIPT" '.env.deploy.example'
assert_contains_file "$ADMIN_PROVIDERS_FILE" "data-admin-nav"
assert_contains_file "$ADMIN_PROVIDERS_FILE" "管理员登录"
assert_contains_file "$ADMIN_PROVIDERS_FILE" "/api/packs"
@@ -237,5 +274,12 @@ assert_contains_file "$DEPLOY_SCRIPT" "REMOTE_PORTAL_DIR"
assert_contains_file "$DEPLOY_SCRIPT" "REMOTE_CRM_PORT"
assert_contains_file "$DEPLOY_SCRIPT" "LOCAL_PORTAL_DIR"
assert_contains_file "$DEPLOY_SCRIPT" "patch_tksea_portal_nginx.py"
assert_contains_file "$DEPLOY_SCRIPT" "DEPLOY_ENV_FILE"
assert_not_contains_file "$DEPLOY_SCRIPT" "/home/long/下载/zjsea.pem"
assert_not_contains_file "$DEPLOY_SCRIPT" "ubuntu@43.155.133.187"

assert_contains_file "$DEPLOY_ENV_EXAMPLE" "KEY="
assert_contains_file "$DEPLOY_ENV_EXAMPLE" "REMOTE="
assert_contains_file "$DEPLOY_ENV_EXAMPLE" "REMOTE_CRM_PORT="

echo "PASS: tksea portal assets look consistent"
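The new contract assertions pass '$cookie_crm_subject' in single quotes and match it with grep -F, so the needle is the literal nginx variable name rather than a shell expansion or a regular expression. A minimal sketch of that behaviour follows; `contains_literal` is a hypothetical stand-in for the script's helpers, and the sample `proxy_set_header` line is an assumed example, not copied from the repository.

```shell
#!/bin/bash
# contains_literal FILE NEEDLE: succeed when NEEDLE occurs verbatim in FILE.
# -F disables regex interpretation; -- protects needles that start with a dash.
contains_literal() {
  grep -Fq -- "$2" "$1"
}

# Demo against a throwaway file (assumed sample content).
demo_file=$(mktemp)
echo 'proxy_set_header X-CRM-Authenticated-Subject $cookie_crm_subject;' > "$demo_file"

# Single quotes keep the shell from expanding $cookie_crm_subject to an empty string.
contains_literal "$demo_file" '$cookie_crm_subject' && echo "subject cookie referenced"
contains_literal "$demo_file" '$cookie_crm_session' || echo "session cookie absent"

rm -f "$demo_file"
```

With double quotes the needle would expand to an empty string in most shells, and an empty fixed-string pattern matches every line, so the assertion would pass without checking anything.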