feat(intraday): add discovery and verification watch pipeline
@@ -0,0 +1,420 @@
# Intraday Discovery + Verification Implementation Plan
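
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Without polluting the semantics of the official daily report, add a "search engine + LLM candidate discovery layer" and an "official-source verification layer" to the existing intraday pipeline, so that same-day LLM pricing news, version releases, and promotion windows enter the candidate pool earlier, and only verified facts flow into the existing `daily_signal_snapshot` / daily-report semantic chain.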
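
**Architecture:** Keep the existing `scripts/run_intraday_price_watch.sh` as the entry point for refreshing structured price facts, and leave its boundary unchanged: it only refreshes prices and signals and never generates the official daily report. Add an independent discovery pipeline, `run_intraday_discovery_watch.sh`, which first uses a search engine and an LLM to generate candidate events, then verifies them a second time against official pages, pricing pages, docs, and announcement pages. Candidates and verification results are stored in separate new tables. Only `official_confirmed` events may be mapped into `signalModelEvent` in `materialize_daily_signals.go`, where the existing `generate_daily_report.go` continues to consume them; no second daily-report fact system is created. The discovery and verification layers must be implemented through provider adapters that run inside the repository and must not depend on tools specific to the current session; the implementation uses a "command or HTTP provider adapter layer + fixture tests" approach so that it runs under local cron and in CI. Verified discovery events must be deduplicated when they join the existing event stream: if the same `provider + model + event_type + date` has already been supplied by an importer or native loader, the native fact wins, and the discovery event only fills gaps and never overwrites.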

**Tech Stack:** Go 1.22, PostgreSQL, Bash, configurable search/LLM provider adapters, JSONB

---

### Task 1: Define the persistence schema for the candidate discovery and verification pipeline

**Files:**
- Create: `db/migrations/017_intraday_news_candidates.sql`
- Modify: `docs/CONFIGURATION.md`
- Modify: `DEPLOYMENT.md`

**Step 1: Add the migration for the candidate and verification tables**

Create two tables:
- `intraday_news_candidate`
- `intraday_news_verification`

The candidate table contains at least:
- `candidate_date`
- `event_type`
- `provider_name`
- `model_name`
- `provider_country`
- `title`
- `summary`
- `candidate_urls JSONB`
- `discovery_source`
- `discovery_query`
- `discovery_evidence JSONB`
- `normalized_key`
- `status`
- `verification_confidence`
- `verification_notes`

The verification table contains at least:
- `candidate_id`
- `verifier_source`
- `verifier_url`
- `verifier_status`
- `extracted_facts JSONB`
- `notes`

Constraints:
- `intraday_news_candidate.normalized_key` must be unique, to prevent the same event from being discovered twice on the same day
- `status` supports at least: `candidate` / `verifying` / `verified` / `rejected` / `stale`
- `verification_confidence` supports at least: `candidate` / `secondary_confirmed` / `official_confirmed`
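
The value sets in the constraints above could be mirrored on the Go side so that invalid values are rejected before they reach the database. The following is only a sketch: the helper `validateCandidate` and the two map names are hypothetical and do not come from the plan; only the string values are taken from it.

```go
package main

import "fmt"

// Allowed values for intraday_news_candidate.status, as listed in the plan.
var candidateStatuses = map[string]bool{
	"candidate": true, "verifying": true, "verified": true, "rejected": true, "stale": true,
}

// Allowed values for intraday_news_candidate.verification_confidence.
var confidenceLevels = map[string]bool{
	"candidate": true, "secondary_confirmed": true, "official_confirmed": true,
}

// validateCandidate is a hypothetical helper that rejects values outside
// the two sets above before a row is written.
func validateCandidate(status, confidence string) error {
	if !candidateStatuses[status] {
		return fmt.Errorf("unknown status %q", status)
	}
	if !confidenceLevels[confidence] {
		return fmt.Errorf("unknown verification_confidence %q", confidence)
	}
	return nil
}

func main() {
	fmt.Println(validateCandidate("verified", "official_confirmed"))
	fmt.Println(validateCandidate("verified", "rumor"))
}
```

A database-level `CHECK` constraint in the migration would enforce the same sets independently of the Go code; the two approaches are complementary.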

**Step 2: Document the boundary with the official fact layer**

State in `docs/CONFIGURATION.md` and `DEPLOYMENT.md` that:
- The candidate discovery layer never writes `daily_report` directly
- The candidate discovery layer never overwrites `latest_report`
- `daily_signal_snapshot` consumes only verified facts, never `candidate_only`
- `leak_or_rumor` stays in the candidate layer by default and does not enter the official daily-report facts

**Step 3: Run the migration to verify**

Run:
- `bash scripts/apply_migration.sh`

Expected:
- The new tables are created successfully
- Re-running the migration does not fail

**Step 4: Commit**

```bash
git add db/migrations/017_intraday_news_candidates.sql docs/CONFIGURATION.md DEPLOYMENT.md
git commit -m "feat(intraday): add candidate and verification persistence"
```

---

### Task 2: Implement a minimal end-to-end loop for the candidate discovery layer

**Files:**
- Create: `scripts/discover_intraday_news_candidates.go`
- Create: `scripts/discover_intraday_news_candidates_test.go`
- Create: `scripts/testdata/intraday_discovery_search_sample.json`
- Create: `scripts/testdata/intraday_discovery_llm_sample.json`
- Modify: `docs/CONFIGURATION.md`
- Create: `scripts/intraday_discovery_provider.go`
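
**Step 1: Write the failing tests first**

Add 4 groups of tests:
- Search result parsing test: verify that title / summary / url / provider hints can be extracted from the sample results
- LLM output parsing test: verify that the LLM's JSON output can be converted into candidate events
- Candidate normalization test: verify that the same event still yields the same `normalized_key` after its title has been reworded
- URL filter test: verify that candidates without a URL are dropped, so the LLM cannot fabricate leads out of thin air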
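
The normalization and URL filter tests above could target helpers along these lines. This is a sketch under assumptions: the plan does not say how `normalized_key` is composed, so the key here is derived from the `provider + model + event_type + date` tuple used by the deduplication rule, and the title is left out so that rewording it does not change the key. The `candidate` struct, `normalizedKey`, and `dropWithoutURL` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// candidate is a hypothetical in-memory shape for one discovered event.
type candidate struct {
	Date, EventType, Provider, Model, Title string
	URLs                                    []string
}

// normalizedKey hashes the lowercased, whitespace-collapsed identity fields.
// The title is excluded on purpose (assumption), so rewording does not matter.
func normalizedKey(c candidate) string {
	parts := []string{c.Date, c.EventType, c.Provider, c.Model}
	for i, p := range parts {
		parts[i] = strings.ToLower(strings.Join(strings.Fields(p), " "))
	}
	sum := sha256.Sum256([]byte(strings.Join(parts, "|")))
	return hex.EncodeToString(sum[:])
}

// dropWithoutURL keeps only candidates that carry at least one non-empty URL.
func dropWithoutURL(in []candidate) []candidate {
	out := make([]candidate, 0, len(in))
	for _, c := range in {
		for _, u := range c.URLs {
			if strings.TrimSpace(u) != "" {
				out = append(out, c)
				break
			}
		}
	}
	return out
}

func main() {
	a := candidate{Date: "2026-05-25", EventType: "price_cut", Provider: "ExampleAI",
		Model: "example-model", Title: "ExampleAI cuts prices",
		URLs: []string{"https://example.com/pricing"}}
	b := candidate{Date: "2026-05-25", EventType: "PRICE_CUT", Provider: " exampleai ",
		Model: "Example-Model", Title: "Big price drop at ExampleAI"}

	fmt.Println(normalizedKey(a) == normalizedKey(b)) // same event, different title
	fmt.Println(len(dropWithoutURL([]candidate{a, b}))) // b has no URL and is dropped
}
```

Hashing keeps the key at a fixed length for the unique index; a plain concatenated string would work as well and is easier to inspect by hand.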

**Step 2: Run the failing tests**

Run:
- `go test -count=1 -tags llm_script ./scripts/discover_intraday_news_candidates.go ./scripts/discover_intraday_news_candidates_test.go`

Expected:
- The new tests fail
- They fail because the parsing, normalization, or deduplication logic is missing
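
**Step 3: Implement the minimal candidate discoverer**

In `discover_intraday_news_candidates.go`, implement:
- A fixed set of provider query templates (Chinese and English)
- A search-result fetching adapter layer
- An LLM candidate-summary adapter layer
- Deduplication and normalization logic
- Writes to `intraday_news_candidate`
- A provider adapter abstraction (both search and LLM can be plugged in through a command or an HTTP provider; the default implementation must not depend on tools specific to the current session)

Restrictions:
- The LLM may only output candidates; it must not mark anything as `verified` directly
- Candidates without a URL are dropped immediately
- When the search / LLM provider is not configured, exit with a precondition error; do not disguise this as "no news today"
- The default event types include at least:
  - `price_cut`
  - `price_increase`
  - `official_release`
  - `promo_campaign`
  - `leak_or_rumor`
  - `unknown`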

**Step 4: Re-run the tests**

Run:
- `go test -count=1 -tags llm_script ./scripts/discover_intraday_news_candidates.go ./scripts/discover_intraday_news_candidates_test.go`

Expected:
- The candidate parsing and normalization tests pass

**Step 5: Run one dry-run check**

Run:
- `go run -tags llm_script ./scripts/discover_intraday_news_candidates.go --date=2026-05-25 --dry-run`

Expected:
- Prints `candidate_total` / `provider_hit_count` / `event_type_counts`
- The dry-run does not write `daily_report`
- The dry-run does not modify `latest_report`

**Step 6: Commit**

```bash
git add scripts/discover_intraday_news_candidates.go scripts/discover_intraday_news_candidates_test.go scripts/intraday_discovery_provider.go scripts/testdata/intraday_discovery_search_sample.json scripts/testdata/intraday_discovery_llm_sample.json docs/CONFIGURATION.md
git commit -m "feat(intraday): add news candidate discovery pipeline"
```

---

### Task 3: Implement the candidate verification layer and lock in the "trust only official facts" rule

**Files:**
- Create: `scripts/verify_intraday_news_candidates.go`
- Create: `scripts/verify_intraday_news_candidates_test.go`
- Create: `scripts/testdata/intraday_verification_official_release.html`
- Create: `scripts/testdata/intraday_verification_pricing_page.html`
- Create: `scripts/testdata/intraday_verification_secondary_media.html`
- Modify: `docs/CONFIGURATION.md`
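
**Step 1: Write the failing tests first**

Add 5 groups of tests:
- Official release page test: when the model name and release time are matched, produce `official_confirmed`
- Official pricing page test: `price_cut` / `price_increase` may be produced only when a real price change is obtained
- Promotion page test: an official promotion page can be mapped to `promo_campaign`
- Secondary media downgrade test: secondary media gets at most `secondary_confirmed` and cannot enter the official fact layer directly
- Rumor isolation test: `leak_or_rumor` is never promoted to an official daily-report fact, even when there is external discussion

**Step 2: Run the failing tests**

Run:
- `go test -count=1 -tags llm_script ./scripts/verify_intraday_news_candidates.go ./scripts/verify_intraday_news_candidates_test.go`

Expected:
- The new tests fail
- They fail because the source classification and verification status mapping logic is missing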

**Step 3: Implement the verifier**

In `verify_intraday_news_candidates.go`, implement:
- Read candidates in `candidate` / `verifying` status
- Fetch `candidate_urls`
- Classify each source by domain and page content as:
  - `official_page`
  - `pricing_page`
  - `official_docs`
  - `official_blog`
  - `secondary_media`
- Write the verification trail to `intraday_news_verification`
- Update `intraday_news_candidate.status` and `verification_confidence`
- After a successful verification, update only the candidate-layer state and do not write `daily_signal_snapshot` directly; official facts are still aggregated solely by the materializer

Rules:
- Only official pages / pricing pages / docs / announcement pages can produce `official_confirmed`
- Price news for which no real price fact can be obtained stays at candidate or secondary confirmation; a price change event must never be fabricated
- `leak_or_rumor` is not promoted to an official fact by default
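
The three rules above can be read as a small decision function from (event type, source kind, price fact present) to a confidence level. The sketch below is one possible reading, not the plan's implementation: `decideConfidence` is a hypothetical name, and two choices are assumptions, namely that `leak_or_rumor` always stays at `candidate` and that a price event on an official page without an extracted price fact also stays at `candidate`.

```go
package main

import "fmt"

// Source kinds that the plan allows to produce official_confirmed.
var officialKinds = map[string]bool{
	"official_page": true, "pricing_page": true, "official_docs": true, "official_blog": true,
}

func isPriceEvent(eventType string) bool {
	return eventType == "price_cut" || eventType == "price_increase"
}

// decideConfidence maps a verification observation to a confidence level.
// hasPriceFact reports whether a real price change was extracted from the page.
func decideConfidence(eventType, sourceKind string, hasPriceFact bool) string {
	switch {
	case eventType == "leak_or_rumor":
		return "candidate" // assumption: rumors are never upgraded
	case officialKinds[sourceKind]:
		if isPriceEvent(eventType) && !hasPriceFact {
			return "candidate" // assumption: no price fact, no upgrade
		}
		return "official_confirmed"
	case sourceKind == "secondary_media":
		return "secondary_confirmed"
	default:
		return "candidate"
	}
}

func main() {
	fmt.Println(decideConfidence("official_release", "official_blog", false))
	fmt.Println(decideConfidence("price_cut", "pricing_page", false))
	fmt.Println(decideConfidence("price_cut", "secondary_media", true))
	fmt.Println(decideConfidence("leak_or_rumor", "official_page", false))
}
```

Keeping this mapping in one pure function makes the five test groups from Step 1 straightforward table-driven cases, with page fetching and parsing tested separately against the HTML fixtures.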

**Step 4: Re-run the tests**

Run:
- `go test -count=1 -tags llm_script ./scripts/verify_intraday_news_candidates.go ./scripts/verify_intraday_news_candidates_test.go`

Expected:
- The verification rule tests pass

**Step 5: Run one dry-run check**

Run:
- `go run -tags llm_script ./scripts/verify_intraday_news_candidates.go --date=2026-05-25 --dry-run`

Expected:
- Prints `verified_total` / `official_confirmed_total` / `secondary_confirmed_total`
- The dry-run only prints a summary and does not write `daily_report`

**Step 6: Commit**

```bash
git add scripts/verify_intraday_news_candidates.go scripts/verify_intraday_news_candidates_test.go scripts/testdata/intraday_verification_official_release.html scripts/testdata/intraday_verification_pricing_page.html scripts/testdata/intraday_verification_secondary_media.html docs/CONFIGURATION.md
git commit -m "feat(intraday): add candidate verification pipeline"
```

---

### Task 4: Wire verified events into the existing `materialize_daily_signals.go`

**Files:**
- Modify: `scripts/materialize_daily_signals.go`
- Create or Modify: `scripts/materialize_daily_signals_test.go`
- Modify: `docs/plans/2026-05-27-intraday-price-watch-plan.md`
- Modify: `README.md`
- Modify: `docs/PRODUCTION_CHECKLIST.md`

**Step 1: Write the failing tests first**

Add 4 groups of tests:
- A verified official release event enters `daily_signal_snapshot.top_events`
- A verified promotion event enters `daily_signal_snapshot.top_events`
- `candidate_only` and `leak_or_rumor` do not enter the official snapshot
- "Price news" without real price change data is not wrongly mapped to `price_cut` / `price_increase`

**Step 2: Run the failing tests**

Run:
- `go test -count=1 -tags llm_script ./scripts/materialize_daily_signals.go ./scripts/materialize_daily_signals_test.go`

Expected:
- The new tests fail
- They fail because the current materializer cannot yet read verified candidate events

**Step 3: Minimal implementation of the verified event loader**

Add to `materialize_daily_signals.go`:
- `loadVerifiedIntradayNewsEvents(db, date string)`
- Map the following `official_confirmed` events to the existing `signalModelEvent`:
  - `official_release`
  - `promo_campaign`
  - `price_cut` / `price_increase` with a confirmed real price change
- Deduplicate and merge with the results of the existing `loadSignalModelEvents`; if the same model, event type, and date has already been supplied by an importer / native loader, the discovery event only fills in a missing `SourceURL` / evidence and does not take priority

Constraints:
- Do not create a second snapshot table
- Do not change the official-fact semantics of `daily_signal_snapshot`
- `secondary_confirmed` does not enter the official snapshot by default
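
The "native fact wins, discovery only fills gaps" merge described above could look roughly like this. The sketch uses a stand-in struct named `event` because the real fields of `signalModelEvent` are not shown in the plan; only `SourceURL` is mentioned there, and the other field names and `mergeEvents` are assumptions.

```go
package main

import "fmt"

// event is a stand-in for signalModelEvent; its field set is assumed.
type event struct {
	Provider, Model, EventType, Date, SourceURL string
}

// key follows the plan's dedup tuple: provider + model + event_type + date.
func (e event) key() string {
	return e.Provider + "|" + e.Model + "|" + e.EventType + "|" + e.Date
}

// mergeEvents keeps every native event as-is, lets a matching discovery
// event fill only an empty SourceURL, and appends unmatched discovery events.
func mergeEvents(native, discovery []event) []event {
	merged := append([]event(nil), native...) // copy so the input is not mutated
	index := make(map[string]int, len(merged))
	for i, e := range merged {
		index[e.key()] = i
	}
	for _, d := range discovery {
		if i, ok := index[d.key()]; ok {
			if merged[i].SourceURL == "" {
				merged[i].SourceURL = d.SourceURL // fill the gap, never overwrite
			}
			continue
		}
		index[d.key()] = len(merged)
		merged = append(merged, d)
	}
	return merged
}

func main() {
	native := []event{{Provider: "ExampleAI", Model: "example-model", EventType: "price_cut", Date: "2026-05-25"}}
	discovery := []event{
		{Provider: "ExampleAI", Model: "example-model", EventType: "price_cut", Date: "2026-05-25", SourceURL: "https://example.com/pricing"},
		{Provider: "ExampleAI", Model: "example-model", EventType: "promo_campaign", Date: "2026-05-25", SourceURL: "https://example.com/promo"},
	}
	for _, e := range mergeEvents(native, discovery) {
		fmt.Println(e.EventType, e.SourceURL)
	}
}
```

Filtering to `official_confirmed` would happen before this merge, inside the loader, so that `secondary_confirmed` events never reach it.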

**Step 4: Re-run the tests**

Run:
- `go test -count=1 -tags llm_script ./scripts/materialize_daily_signals.go ./scripts/materialize_daily_signals_test.go`

Expected:
- The verified event tests pass

**Step 5: Jointly verify the intraday boundary**

Run:
- `REPORT_TRIGGER_SOURCE=intraday_discovery go run -tags llm_script ./scripts/materialize_daily_signals.go --date=2026-05-25 --dry-run`

Expected:
- Output includes `page_mode` / `event_count`
- Does not write `daily_report`
- Does not overwrite `latest_report`

**Step 6: Commit**

```bash
git add scripts/materialize_daily_signals.go scripts/materialize_daily_signals_test.go README.md docs/PRODUCTION_CHECKLIST.md docs/plans/2026-05-27-intraday-price-watch-plan.md
git commit -m "feat(intraday): materialize verified discovery events"
```

---

### Task 5: Assemble the new intraday discovery entry point and add deployment notes

**Files:**
- Create: `scripts/run_intraday_discovery_watch.sh`
- Modify: `README.md`
- Modify: `docs/CONFIGURATION.md`
- Modify: `DEPLOYMENT.md`
- Modify: `docs/PRODUCTION_CHECKLIST.md`

**Step 1: Implement the standalone entry script**

The script runs in this fixed order:
1. `discover_intraday_news_candidates.go`
2. `verify_intraday_news_candidates.go`
3. `materialize_daily_signals.go` (consumes verified events only)

Requirements:
- Explicitly require `DATABASE_URL`
- When a key required by search / LLM is missing, print a precondition error rather than disguising it as a code failure
- Do not run `generate_daily_report.go`
- Do not write `daily_report`
- Do not overwrite `latest_report`

**Step 2: Update the scheduling docs**

The docs define two cron jobs:
- Structured price refresh: `run_intraday_price_watch.sh`
- News discovery and verification: `run_intraday_discovery_watch.sh`

Recommended starting frequency:
- `run_intraday_discovery_watch.sh`: every 2 hours
- `run_intraday_price_watch.sh`: every 4 hours

**Step 3: Run a script-level dry-run**

Run:
- `bash scripts/run_intraday_discovery_watch.sh --dry-run`

Expected:
- Prints the candidate discovery summary + verification summary + signal materialization summary
- Produces no official daily-report artifacts

**Step 4: Commit**

```bash
git add scripts/run_intraday_discovery_watch.sh README.md docs/CONFIGURATION.md DEPLOYMENT.md docs/PRODUCTION_CHECKLIST.md
git commit -m "feat(intraday): add discovery watch runner"
```

---

### Task 6: Run the final joint acceptance checks and prepare the local commit

**Files:**
- Modify: `README.md` (only if the final notes are missing)
- Modify: `docs/CONFIGURATION.md` (only if the final notes are missing)
- Modify: `DEPLOYMENT.md` (only if the final notes are missing)

**Step 1: Run the focused Go tests**

Run:
- `go test -count=1 -tags llm_script ./scripts/discover_intraday_news_candidates.go ./scripts/discover_intraday_news_candidates_test.go`
- `go test -count=1 -tags llm_script ./scripts/verify_intraday_news_candidates.go ./scripts/verify_intraday_news_candidates_test.go`
- `go test -count=1 -tags llm_script ./scripts/materialize_daily_signals.go ./scripts/materialize_daily_signals_test.go`

Expected:
- Focused tests for the discovery, verification, and signal materialization layers all pass

**Step 2: Run the existing daily-report / frontend regression boundary**

Run:
- `go test -count=1 -tags llm_script ./scripts/generate_daily_report.go ./scripts/generate_daily_report_test.go ./scripts/official_import_signature_audit_query_lib.go`
- `bash scripts/secret_gate_test.sh`
- `bash scripts/test_importers.sh`
- `cd frontend && npm test -- --run`
- `cd frontend && npm run build`

Expected:
- The existing daily-report and frontend pipelines do not regress
- The new discovery capability does not pollute the official daily-report boundary

**Step 3: Run the script-level joint dry-run**

Run:
- `bash scripts/run_intraday_discovery_watch.sh --dry-run`
- `REPORT_TRIGGER_SOURCE=intraday go run -tags llm_script ./scripts/materialize_daily_signals.go --date=2026-05-25 --dry-run`

Expected:
- Does not write `daily_report`
- Does not overwrite `latest_report`
- Reliably prints the candidate count, verification count, event count, page_mode, and source_audit

**Step 4: Local commit**

```bash
git add db/migrations/017_intraday_news_candidates.sql scripts/discover_intraday_news_candidates.go scripts/discover_intraday_news_candidates_test.go scripts/verify_intraday_news_candidates.go scripts/verify_intraday_news_candidates_test.go scripts/materialize_daily_signals.go scripts/materialize_daily_signals_test.go scripts/run_intraday_discovery_watch.sh README.md docs/CONFIGURATION.md DEPLOYMENT.md docs/PRODUCTION_CHECKLIST.md docs/plans/2026-05-25-intraday-discovery-verification-implementation-plan.md docs/plans/2026-05-27-intraday-price-watch-plan.md
git commit -m "feat(intraday): add discovery and verification watch pipeline"
```

---
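
## Acceptance Criteria

Once implementation is complete, all of the following must hold:
- Search + LLM can only produce candidate events and cannot write official daily-report facts directly
- Only `official_confirmed` events can enter the official `daily_signal_snapshot` semantic chain
- `leak_or_rumor` does not enter the official daily-report fact layer
- `run_intraday_discovery_watch.sh` and `run_intraday_price_watch.sh` have separate responsibilities
- The official daily report is still produced only by `run_daily.sh`
- The new pipeline never writes `daily_report` and never overwrites `latest_report`
- The discovery provider adapter reports an explicit precondition error when unconfigured; fixture / dry-run modes allow local verification
- The new focused tests, the existing daily-report tests, and the frontend build all pass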
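
## Non-Goals

This plan deliberately does not:
- Add a second official daily-report system
- Let the LLM directly replace the price importer or the official release importer
- Map secondary media news directly to `price_cut` / `price_increase`
- Introduce complex interactions for a new frontend "candidate intelligence panel" in phase one; if needed later, that gets its own plan
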
@@ -55,6 +55,6 @@
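
## Suggested Next Steps

-1. Add a "last price tracking time" hint to the frontend query page
-2. Document `trigger_source=intraday` for `materialize_daily_signals.go`
-3. If intraday events are still not sensitive enough, consider introducing a separate `intraday_signal_snapshot` table
+1. Add a production-grade provider adapter and scheduling notes for `run_intraday_discovery_watch.sh`
+2. Add a "last price tracking time / last discovery verification time" hint to the frontend query page
+3. If intraday events are still not sensitive enough, consider introducing a separate `intraday_signal_snapshot` or a candidate intelligence panel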