
fix(sgsc-completeness-review): fix supplementary validation not being executed on the short path

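- Supplementary validation (_detect_and_supplement) was previously called only on the long path, for content longer than 150 lines
- The short path (a direct return from _classify_single_chunk) never triggered supplementary validation
- As a result, in sections with few lines, such as Documents and Regulations (文件制度), categories such as Road and Bridge Group (路桥集团) and Bridge Company (桥梁公司) were always missed
- Fix: run the same supplementary validation before the short path returns, and merge the results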
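
The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `Result`, `classify_single_chunk`, `detect_and_supplement`, and `classify` are hypothetical stand-ins, the "LLM" and "keyword scan" are replaced by simple prefix checks, and the value 150 for `MAX_LINES_PER_CHUNK` is assumed from the commit message.

```python
import asyncio
from dataclasses import dataclass, field, replace
from typing import List, Optional

MAX_LINES_PER_CHUNK = 150  # assumed from the ">150 lines" wording in the commit message


@dataclass
class Result:
    # Hypothetical stand-in for ClassificationResult
    classified: List[str] = field(default_factory=list)
    error: Optional[str] = None


async def classify_single_chunk(lines: List[str]) -> Result:
    # Stand-in for the first-pass classifier: only recognises "basis:" lines
    return Result(classified=[ln for ln in lines if ln.startswith("basis:")])


async def detect_and_supplement(lines: List[str], found: List[str]) -> List[str]:
    # Stand-in for the supplementary pass: picks up "group:" lines missed above
    return [ln for ln in lines if ln.startswith("group:") and ln not in found]


async def classify(lines: List[str]) -> Result:
    if len(lines) <= MAX_LINES_PER_CHUNK:
        result = await classify_single_chunk(lines)
        # Short path: run the supplementary pass before returning
        if not result.error:
            extra = await detect_and_supplement(lines, result.classified)
            if extra:
                return replace(result, classified=result.classified + extra)
        return result
    raise NotImplementedError("long path omitted in this sketch")


if __name__ == "__main__":
    print(asyncio.run(classify(["basis: a", "group: b"])).classified)
```

The supplementary pass is skipped when the first pass reports an error, so a failed result is returned unchanged rather than being padded with partial matches.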
WangXuMing 3 weeks ago
Parent
Commit
be083610d0
1 file changed, with 20 insertions and 1 deletion

+ 20 - 1
core/construction_review/component/reviewers/utils/llm_content_classifier_v2.py

@@ -666,7 +666,26 @@ class ContentClassifierClient:
 
         if total_lines <= MAX_LINES_PER_CHUNK:
             # Content is short; process it directly
-            return await self._classify_single_chunk(section, start_time)
+            result = await self._classify_single_chunk(section, start_time)
+            # Supplementary validation: keyword scan + LLM second-pass confirmation, to fill in missed categories
+            if not result.error and result.classified_contents is not None:
+                supplement = await self._detect_and_supplement(section, result.classified_contents)
+                if supplement:
+                    merged = self._merge_classified_contents(result.classified_contents + supplement, section)
+                    total_l, classified_l, coverage_r = self._calculate_coverage_rate(section, merged)
+                    return ClassificationResult(
+                        model=result.model,
+                        section_key=result.section_key,
+                        section_name=result.section_name,
+                        classified_contents=merged,
+                        latency=result.latency,
+                        raw_response=result.raw_response,
+                        error=result.error,
+                        total_lines=total_l,
+                        classified_lines=classified_l,
+                        coverage_rate=coverage_r
+                    )
+            return result
 
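         # Content is too long; process it in non-overlapping chunks
         # No overlap: with overlap, each boundary line is seen once by both chunks, which makes it more likely that neither chunk claims it,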