feat: extract expert opinions from Markdown into Excel

Meric, 2 weeks ago
parent
commit
5595e1c65e
6 changed files with 906 additions and 5 deletions
  1. 1 0
      .gitignore
  2. 111 0
      CLAUDE.md
  3. 3 0
      pyproject.toml
  4. 1 1
      src/app/scripts/ceshi/03-施工方案筛选.py
  5. 682 0
      src/app/scripts/md2excel_extractor.py
  6. 108 4
      uv.lock

+ 1 - 0
.gitignore

@@ -58,3 +58,4 @@ docs/_build/
 # PyBuilder
 target/
 
+src/app/scripts/ceshi/temp/评审筛选进度缓存.json

+ 111 - 0
CLAUDE.md

@@ -0,0 +1,111 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
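+## Project Overview
+
+Data-governance and knowledge-base ingestion scripts for road and bridge engineering. They vectorize documents such as standards and construction plans and store them in Milvus/MinIO/MySQL.
+
+## Common Commands
+
+### Environment Setup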
+
+```bash
+# Create the virtual environment
+uv venv
+
+# Install dependencies (using the Tsinghua mirror)
+uv sync --index-url https://pypi.tuna.tsinghua.edu.cn/simple
+```
+
+### Running Scripts
+
+```bash
+# Run any script (examples)
+uv run -m src.app.scripts.statu_to_milvus
+uv run -m src.app.scripts.base_info_json_generation
+uv run -m src.app.scripts.base_in_collection
+```
+
+## Project Architecture
+
+### Directory Layout
+
+```
+src/app/
+├── config/           # Configuration and client initialization
+│   ├── setting.py    # Loads .env settings (MinIO/Milvus/MySQL/Embedding)
+│   ├── minio_client.py   # MinIO client (singleton)
+│   ├── milvus_client.py  # Milvus client (LRU cache)
+│   ├── embeddings.py     # OpenAI Embeddings client
+│   └── database.py       # SQLAlchemy async engine
+├── models/           # Database models
+│   └── standard_base_info.py  # Table model for construction standards
+└── scripts/          # Data-processing scripts
+    ├── base_*        # Compilation-basis pipeline
+    └── plan_*        # Construction-plan pipeline
+```
+
+### Configuration Management
+
+All service configuration is managed through the [.env](.env) file and loaded by the `Settings` class in `setting.py`:
+
+| Service | Key settings |
+|------|------------|
+| MinIO | `MINIO_ENDPOINT`, `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY` |
+| Milvus | `MILVUS_HOST`, `MILVUS_PORT`, `MILVUS_DB` |
+| MySQL | `DATABASE_URL` (async: `mysql+aiomysql://...`) |
+| Embedding | `EMBEDDING_BASE_URL`, `EMBEDDING_MODEL` |
+
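+### Data-Processing Flow
+
+**Compilation basis (base_* series)**:
+1. `base_count.py` → check that Excel IDs match the directories
+2. `base_check.py` → validate directory structure and naming consistency
+3. `base_info_json_generation.py` → generate JSON (including the Markdown parent/children split)
+4. `base_in_minio.py` → upload files to MinIO
+5. `base_info_in_database.py` → write to MySQL
+6. `base_create_collection.py` → create the Milvus Collection (with the BM25 function)
+7. `base_in_collection.py` → vectorize and write to Milvus
+
+**Construction plans (plan_* series)**: same flow as above, with a different Collection name.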
+
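+### Markdown Splitting Logic
+
+Document splitting is implemented in `base_info_json_generation.py`:
+
+- **parent**: split on level-1 `#` headings; overlong sections (>6000 characters) are sliced further, and the slices share one `parent_id`
+- **children**: split on blank lines within a parent, recording the `hierarchy` (heading path)
+- `parent_id`: SHA1 of `doc_name|parent_seq|h1_title`, which keeps it stable across runs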
+
+### Milvus Collection Schema
+
+The Collection contains the following fields:
+
+| Field | Type | Description |
+|------|------|------|
+| `text` | VARCHAR(65535) | Text content, enable_analyzer=True |
+| `dense` | FLOAT_VECTOR | Dense vector (semantic retrieval) |
+| `sparse` | SPARSE_FLOAT_VECTOR | Sparse vector (generated by the BM25 function) |
+| `document_id` | VARCHAR(256) | Document ID |
+| `parent_id` | VARCHAR(256) | Parent node ID |
+| `metadata` | JSON | Metadata (chinese_name, standard_number, etc.) |
+
+BM25 function configuration:
+```python
+schema.add_function(
+    Function(
+        name="bm25_fn",
+        input_field_names=["text"],
+        output_field_names=["sparse"],
+        function_type=FunctionType.BM25,
+    )
+)
+```
+
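+## Development Notes
+
+- **Script path configuration**: most scripts define path constants such as `EXCEL_FILE` and `ROOT_FOLDER` at the top; change them to local paths before running
+- **base vs. plan**: `base_*` and `plan_*` scripts handle different data types and use different Collection names; do not mix them
+- **Milvus dynamic fields**: the current schema does not enable dynamic fields, so written fields must match the collection definition exactly
+- **Vector dimension**: 4096 by default, determined by `EMBEDDING_MODEL` (Qwen3-Embedding-8B)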
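
Reviewer sketch (not part of the commit): the parent/children split described in CLAUDE.md could look roughly like the following. The real implementation lives in `base_info_json_generation.py` and is not shown in this diff, so the function names, the returned dict shape, and the fixed-width slicing of overlong sections are assumptions made for illustration; `hierarchy` tracking is left out.

```python
import hashlib
import re

MAX_PARENT_CHARS = 6000  # threshold quoted in CLAUDE.md


def make_parent_id(doc_name: str, parent_seq: int, h1_title: str) -> str:
    """SHA1 over 'doc_name|parent_seq|h1_title', so reruns give the same ID."""
    key = f"{doc_name}|{parent_seq}|{h1_title}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def split_parents(doc_name: str, markdown: str) -> list[dict]:
    """Split on level-1 headings; slices of an oversized section share one parent_id."""
    # Zero-width split before every line that starts with '# ' (level-1 heading).
    sections = re.split(r"(?m)^(?=# )", markdown)
    parents = []
    seq = 0
    for section in sections:
        if not section.strip():
            continue
        first_line = section.splitlines()[0]
        title = first_line[2:].strip() if first_line.startswith("# ") else ""
        parent_id = make_parent_id(doc_name, seq, title)
        # Hypothetical slicing strategy: plain fixed-width character windows.
        for start in range(0, len(section), MAX_PARENT_CHARS):
            parents.append({
                "parent_id": parent_id,
                "h1_title": title,
                "text": section[start:start + MAX_PARENT_CHARS],
            })
        seq += 1
    return parents


def split_children(parent_text: str) -> list[str]:
    """Split a parent chunk into children on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", parent_text) if p.strip()]
```

Because the ID depends only on the document name, the section's sequence number, and its heading text, re-running the pipeline on an unchanged document yields the same `parent_id` values, which is presumably what "stable" refers to.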

+ 3 - 0
pyproject.toml

@@ -12,7 +12,10 @@ dependencies = [
     "minio>=7.2.20",
     "openai>=2.15.0",
     "openpyxl>=3.1.5",
+    "pandas>=3.0.0",
     "pymilvus>=2.6.6",
+    "pypdf2>=3.0.1",
+    "python-docx>=1.2.0",
     "python-dotenv>=1.2.1",
     "sqlalchemy>=2.0.46",
     "tiktoken>=0.12.0",

+ 1 - 1
src/app/scripts/ceshi/03-施工方案筛选.py

@@ -58,7 +58,7 @@ warnings.filterwarnings('ignore', category=Warning)
 # Rules:
 # 1) An absolute path (e.g. E:/data/raw/670) is used as-is (on Windows, prefer / or \\)
 # 2) A relative path (e.g. ../../raw/670) is resolved against the current script's directory
-SOURCE_DIR = r"F:\提供的原始文件\原始文件\100份"
+SOURCE_DIR = r"E:\提供的原始文件\原始文件\全部的原始文档\未提取"
 EXPERT_OUTPUT_DIR = r"F:\提供的原始文件\原始文件\PDF分类结果_服务器MinerU版\专家评审意见_记录"
 COMPANY_OUTPUT_DIR = r"F:\提供的原始文件\原始文件\PDF分类结果_服务器MinerU版\公司集团评审意见说明"
 TEMP_DIR = "temp"
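
Reviewer sketch (not part of the commit): the two path rules in the comments above can be expressed as a small helper. The script's own resolution code is outside this hunk, so `resolve_source_dir` is a hypothetical name and the purely lexical `..` normalization is an assumption.

```python
import os
from pathlib import Path


def resolve_source_dir(raw: str, script_dir: Path) -> Path:
    """Absolute paths are used as-is; relative ones resolve against script_dir."""
    path = Path(raw)
    if path.is_absolute():
        return path
    # normpath collapses '..' segments lexically, without touching the filesystem.
    return Path(os.path.normpath(script_dir / path))
```

In a script, `script_dir` would typically be `Path(__file__).resolve().parent`. With this change `SOURCE_DIR` points at the `E:` drive while both output directories remain on `F:`, so it may be worth confirming that both drives are mounted on the machine that runs the script.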

+ 682 - 0
src/app/scripts/md2excel_extractor.py

@@ -0,0 +1,682 @@
+#!/usr/bin/env python3
+"""
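+md2excel: batch extraction tool for Markdown expert-opinion documents
+
+Description:
+    Walks the Markdown documents in a folder, uses an LLM's semantic understanding to extract
+    the project name, plan name, and expert opinions, and writes them to an Excel summary sheet.
+
+Usage:
+    python md2excel_extractor.py <source folder> <output Excel path>
+
+Example:
+    python md2excel_extractor.py D:/专家意见/temp D:/汇总表.xlsx
+
+Required directory layout:
+    <source folder>/
+    ├── <subfolder 1>/
+    │   └── auto/
+    │       └── xxx.md
+    ├── <subfolder 2>/
+    │   └── auto/
+    │       └── yyy.md
+    └── ...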
+"""
+
+import os
+import sys
+import json
+import time
+import re
+import requests
+from pathlib import Path
+from typing import List, Dict, Optional, Any
+from dataclasses import dataclass
+
+from openpyxl import Workbook, load_workbook
+from openpyxl.styles import Font, Alignment, Border, Side, PatternFill
+
+
+# ==================== Configuration ====================
+
+# Excel column headers
+EXCEL_HEADERS = ["文件名称", "项目名称", "方案名称", "专项方案专家评审意见回复表"]
+
+# Column widths
+COLUMN_WIDTHS = {
+    'A': 45,  # file name
+    'B': 50,  # project name
+    'C': 55,  # plan name
+    'D': 120, # expert-opinion reply table
+}
+
+# Height of data rows
+ROW_HEIGHT = 180
+
+# Delay between API requests (seconds)
+API_DELAY = 0.5
+
+# Maximum characters read per file (limits token usage)
+MAX_CONTENT_LENGTH = 12000
+
+# ==================== LLM API configuration ====================
+# API settings for the locally deployed LLM
+LLM_API_URL = "http://localhost:25423/v1/chat/completions"
+LLM_API_KEY = os.environ.get("LLM_API_KEY", "")  # assumed env var name; keeps the key out of the repository
+LLM_MODEL = "/model/Qwen3.5-122B-A10B"
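+LLM_TEMPERATURE = 0.0      # 0 is recommended for extraction tasks so results are stable and reproducible
+LLM_MAX_TOKENS = 8192      # expert-opinion replies can be long; use 8192 or more (the earlier 512 may be too small)
+LLM_TIMEOUT = 120          # API request timeout (seconds)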
+
+
+# ==================== Data model ====================
+
+@dataclass
+class ExtractedInfo:
+    """Structure holding the extracted information"""
+    file_name: str
+    project_name: str
+    plan_name: str
+    expert_opinion: str
+
+
+# ==================== LLM call implementation ====================
+
+def call_llm_api(prompt: str) -> str:
+    """
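+    Call the locally deployed LLM API for text understanding and information extraction.
+    
+    API endpoint: http://localhost:25423/v1/chat/completions
+    Model: /model/Qwen3.5-122B-A10B
+    
+    Args:
+        prompt: the full prompt text (already containing the document to analyze)
+    
+    Returns:
+        The text returned by the model (expected to be a JSON string)
+    
+    Raises:
+        ConnectionError, TimeoutError: the service is unreachable or the request timed out
+        RuntimeError: the API returned an HTTP error status
+        KeyError, ValueError: the response is malformed or its content is empty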
+    """
+    headers = {
+        "Content-Type": "application/json",
+        "Authorization": f"Bearer {LLM_API_KEY}"
+    }
+    
+    payload = {
+        "model": LLM_MODEL,
+        "messages": [{"role": "user", "content": prompt}],
+        "temperature": LLM_TEMPERATURE,
+        "max_tokens": LLM_MAX_TOKENS
+    }
+    
+    try:
+        response = requests.post(
+            LLM_API_URL,
+            headers=headers,
+            json=payload,
+            timeout=LLM_TIMEOUT
+        )
+        response.raise_for_status()
+        
+        result = response.json()
+        
+        # Parse the OpenAI-compatible response
+        # Format: {"choices": [{"message": {"content": "..."}}]}
+        if "choices" not in result or not result["choices"]:
+            raise KeyError(f"响应中未找到 'choices' 字段: {result.keys()}")
+        
+        message = result["choices"][0].get("message", {})
+        content = message.get("content", "").strip()
+        
+        if not content:
+            raise ValueError("模型返回内容为空")
+        
+        return content
+    
+    except requests.exceptions.ConnectionError as e:
+        raise ConnectionError(
+            f"无法连接到本地 LLM 服务 ({LLM_API_URL}),请确认服务已启动。\n"
+            f"原始错误: {e}"
+        )
+    except requests.exceptions.Timeout:
+        raise TimeoutError(
+            f"请求本地 LLM 服务超时 (>{LLM_TIMEOUT}秒),请检查模型是否过载或增大 LLM_TIMEOUT 配置。"
+        )
+    except requests.exceptions.HTTPError as e:
+        raise RuntimeError(
+            f"LLM API 返回 HTTP 错误: {e.response.status_code}\n"
+            f"响应内容: {e.response.text[:500]}"
+        )
+
+
+# ==================== Prompt template ====================
+
+def build_extraction_prompt(content: str) -> str:
+    """
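+    Build the detailed prompt used for LLM information extraction.
+    
+    The prompt contains:
+    - Role: tells the model it acts as a document-analysis expert
+    - Task: names the three core fields to extract
+    - Extraction rules: detailed rules for locating and inferring each field
+    - Output format: a strict JSON format requirement
+    - Fault tolerance: how to mark missing information
+    - Example: helps the model understand the expected output shape
+    
+    Args:
+        content: raw content of the Markdown document
+    
+    Returns:
+        The complete prompt text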
+    """
+    
+    # Truncate the content to stay within the model's context length
+    truncated_content = content[:MAX_CONTENT_LENGTH]
+    if len(content) > MAX_CONTENT_LENGTH:
+        truncated_content += "\n\n... [文档内容已截断,剩余部分省略]"
+    
+    prompt = f"""你是一位资深的工程文档分析专家,擅长从施工方案评审意见文档中提取结构化信息。
+
+## 任务说明
+
+请仔细阅读以下 Markdown 格式的施工方案专家评审意见文档,从中提取三个关键字段的信息。
+
+## 提取字段及规则
+
+### 1. 项目名称
+**定义**: 该施工方案所对应的工程项目名称。
+
+**提取规则**(按优先级排序):
+- 优先从文档中的表格字段提取,查找包含以下关键词的单元格:
+  * "项目名称"
+  * "工程名称"
+  * "工程全称"
+  * "建设项目名称"
+  * "标段名称"
+  
+- 如果表格中没有明确字段,从文档标题、页眉或正文开头部分语义推断。
+  通常项目名称会出现在文档的显著位置,格式如:
+  * "XX高速公路XX标段"
+  * "XX大桥工程"
+  * "XX隧道工程"
+  * "XX合同段"
+
+- 如果确实无法确定,标注为"未明确"。
+
+### 2. 方案名称
+**定义**: 该文档所涉及的专项施工方案名称。
+
+**提取规则**(按优先级排序):
+- 优先从文档中的表格字段提取,查找包含以下关键词的单元格:
+  * "方案名称"
+  * "专项方案名称"
+  * "危险性较大分项工程名称"
+  * "分部分项工程名称"
+  * "施工方案名称"
+  
+- 如果表格中没有明确字段,从文档标题中推断。
+  方案名称通常包含以下关键词:
+  * "施工方案"
+  * "专项方案"
+  * "施工组织设计"
+  * "安全专项方案"
+  * "技术方案"
+
+- 注意区分"项目名称"和"方案名称":
+  * 项目名称:宏观的工程名称(如"XX高速公路")
+  * 方案名称:具体的施工方案(如"XX大桥桩基施工方案")
+
+- 如果确实无法确定,标注为"未明确"。
+
+### 3. 专项方案专家评审意见回复表
+**定义**: 整合后的专家评审意见及修改回复内容。
+
+**提取规则**(按优先级排序):
+- 从以下命名的章节或表格中提取:
+  * "专项方案审查意见修改回复表"
+  * "专项施工方案专家论证意见修改回复表"
+  * "专家评审意见回复"
+  * "专家意见及回复"
+  * "审查意见及修改回复"
+  * "论证意见及修改情况"
+  * "意见与建议"
+  * "专家审查意见"
+
+- 内容整合要求:
+  * 将"专家意见/审查意见/论证意见"与"修改回复/修改情况/回复说明"进行配对整合
+  * 保留原始的专家意见原文
+  * 保留对应的修改回复或整改措施
+  * 如果有多位专家的意见,按顺序列出
+  * 如果专家意见与回复分散在文档不同位置,需要将它们关联起来
+
+- 格式要求:
+  * 使用清晰的编号列出每条专家意见及其回复
+  * 保留关键的专业术语和数据
+  * 如果原文有表格形式,转换为文本描述
+  * 每条意见格式建议:"意见X: [专家原文意见] -> 回复: [施工单位回复内容]"
+
+- 如果确实无法提取到专家意见内容,标注为"未明确"。
+
+## 输出格式要求
+
+必须以严格的 JSON 格式返回,不要包含任何其他解释文字:
+
+```json
+{{
+  "项目名称": "提取到的项目名称或'未明确'",
+  "方案名称": "提取到的方案名称或'未明确'",
+  "专项方案专家评审意见回复表": "整合后的专家意见与回复内容,或'未明确'"
+}}
+```
+
+## 注意事项
+
+1. **语义理解优先**: 不要依赖固定的正则表达式,而是通过理解文档内容的语义来提取信息。
+2. **容错处理**: 即使文档格式不标准、表格缺失或字段名称不同,也要尝试从上下文中推断。
+3. **信息整合**: 对于分散在文档各处的专家意见和回复,需要整合成完整的记录。
+4. **不要编造**: 如果某项信息确实无法从文档中确定,必须标注为"未明确",严禁编造或猜测。
+5. **保持简洁**: 专家意见回复表的内容可以适当精简,但要保留核心观点和关键数据。
+
+## 待分析文档
+
+```markdown
+{truncated_content}
+```
+
+请直接返回 JSON 格式的提取结果:"""
+    
+    return prompt
+
+
+# ==================== File handling ====================
+
+def read_md_files(root_dir: str) -> List[Dict[str, str]]:
+    """
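+    Walk the folder and read the first md file found in each subfolder's auto directory.
+    
+    Required directory layout:
+        root_dir/
+        ├── folder_1/
+        │   └── auto/
+        │       └── xxx.md
+        ├── folder_2/
+        │   └── auto/
+        │       └── yyy.md
+        └── ...
+    
+    Args:
+        root_dir: root path of the source folder
+    
+    Returns:
+        A list of dicts with file information; each dict contains:
+        - file_name: name of the subfolder
+        - content, file_path: content and path of the md file that was read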
+    """
+    md_contents = []
+    root_path = Path(root_dir)
+    
+    if not root_path.exists():
+        raise FileNotFoundError(f"源文件夹不存在: {root_dir}")
+    
+    # Iterate over the immediate subfolders
+    for folder_path in sorted(root_path.iterdir()):
+        if not folder_path.is_dir():
+            continue
+        
+        # Look for the auto subdirectory
+        auto_dir = folder_path / "auto"
+        if not auto_dir.exists() or not auto_dir.is_dir():
+            print(f"  [跳过] 未找到 auto 目录: {folder_path.name}")
+            continue
+        
+        # Find md files (sorted so the pick below is deterministic)
+        md_files = sorted(auto_dir.glob("*.md"))
+        if not md_files:
+            print(f"  [跳过] auto 目录中无 md 文件: {folder_path.name}")
+            continue
+        
+        # Read the first md file
+        md_file = md_files[0]
+        try:
+            content = md_file.read_text(encoding="utf-8")
+            md_contents.append({
+                "file_name": folder_path.name,
+                "content": content,
+                "file_path": str(md_file)
+            })
+            print(f"  [已读取] {folder_path.name} -> {md_file.name}")
+        except Exception as e:
+            print(f"  [错误] 读取文件失败 {md_file}: {e}")
+            continue
+    
+    return md_contents
+
+
+def parse_llm_response(response_text: str) -> Dict[str, str]:
+    """
+    Parse the JSON response returned by the LLM.
+    
+    Args:
+        response_text: raw text returned by the LLM
+    
+    Returns:
+        The parsed dict containing the extracted fields
+    """
+    try:
+        # Try to parse the text directly as JSON
+        return json.loads(response_text)
+    except json.JSONDecodeError:
+        pass
+    
+    # Try to extract a JSON block from the text
+    # Matches the ```json ... ``` form
+    json_pattern = r'```json\s*(.*?)\s*```'
+    match = re.search(json_pattern, response_text, re.DOTALL)
+    if match:
+        try:
+            return json.loads(match.group(1))
+        except json.JSONDecodeError:
+            pass
+    
+    # Try to match a JSON object containing "项目名称" (non-greedy: the match ends at the first closing brace)
+    json_pattern2 = r'\{[\s\S]*?"项目名称"[\s\S]*?\}'
+    match2 = re.search(json_pattern2, response_text)
+    if match2:
+        try:
+            return json.loads(match2.group())
+        except json.JSONDecodeError:
+            pass
+    
+    # If nothing parses, return the first 500 characters of the raw text as the expert opinion
+    print(f"  [警告] 无法解析 JSON 响应,使用原始文本")
+    return {
+        "项目名称": "解析失败",
+        "方案名称": "解析失败",
+        "专项方案专家评审意见回复表": response_text[:500]
+    }
+
+
+def extract_info_with_llm(content: str) -> Dict[str, str]:
+    """
+    Extract information from a document using the LLM.
+    
+    Args:
+        content: Markdown document content
+    
+    Returns:
+        A dict containing the extracted fields
+    """
+    prompt = build_extraction_prompt(content)
+    
+    try:
+        response_text = call_llm_api(prompt)
+        extracted = parse_llm_response(response_text)
+        
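+        # Make sure every required field exists; coerce to str in case the model returns a non-string value
+        return {
+            "项目名称": str(extracted.get("项目名称") or "未明确").strip(),
+            "方案名称": str(extracted.get("方案名称") or "未明确").strip(),
+            "专项方案专家评审意见回复表": str(extracted.get("专项方案专家评审意见回复表") or "未明确").strip()
+        }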
+    
+    except Exception as e:
+        print(f"  [错误] LLM 提取失败: {e}")
+        return {
+            "项目名称": f"提取失败: {str(e)[:50]}",
+            "方案名称": f"提取失败: {str(e)[:50]}",
+            "专项方案专家评审意见回复表": f"提取失败: {str(e)}"
+        }
+
+
+# ==================== Excel generation ====================
+
+def _init_excel_styles(ws):
+    """Initialize the Excel header row and column-width styles"""
+    # Header style
+    header_fill = PatternFill(
+        start_color="4472C4",
+        end_color="4472C4",
+        fill_type="solid"
+    )
+    header_font = Font(color="FFFFFF", bold=True, size=12)
+    header_align = Alignment(horizontal="center", vertical="center", wrap_text=True)
+    
+    for col_num, header in enumerate(EXCEL_HEADERS, 1):
+        cell = ws.cell(row=1, column=col_num)
+        cell.value = header
+        cell.fill = header_fill
+        cell.font = header_font
+        cell.alignment = header_align
+    
+    # Set column widths
+    for col, width in COLUMN_WIDTHS.items():
+        ws.column_dimensions[col].width = width
+    
+    # Freeze the header row
+    ws.freeze_panes = 'A2'
+
+
+def _apply_row_style(ws, row_num: int):
+    """Apply data styling (border, alignment, row height) to the given row"""
+    thin_border = Border(
+        left=Side(style='thin'),
+        right=Side(style='thin'),
+        top=Side(style='thin'),
+        bottom=Side(style='thin')
+    )
+    
+    for col in range(1, len(EXCEL_HEADERS) + 1):
+        cell = ws.cell(row=row_num, column=col)
+        cell.border = thin_border
+        cell.alignment = Alignment(vertical="top", wrap_text=True)
+    
+    ws.row_dimensions[row_num].height = ROW_HEIGHT
+
+
+def append_to_excel(row_data: Dict[str, str], output_file: str):
+    """
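+    Append a single record to the Excel file.
+    
+    Creates a new file (with header) if none exists; otherwise appends at the end.
+    Saves after every append so an interruption does not lose processed data.
+    
+    Args:
+        row_data: dict holding one record
+        output_file: output Excel file path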
+    """
+    output_path = Path(output_file)
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    
+    if output_path.exists():
+        # File exists: load it and append
+        wb = load_workbook(output_file)
+        ws = wb.active
+        next_row = ws.max_row + 1
+    else:
+        # File does not exist: create it
+        wb = Workbook()
+        ws = wb.active
+        ws.title = "专家意见汇总"
+        _init_excel_styles(ws)
+        next_row = 2
+    
+    # Write the data
+    ws.append([
+        row_data.get("文件名称", ""),
+        row_data.get("项目名称", ""),
+        row_data.get("方案名称", ""),
+        row_data.get("专项方案专家评审意见回复表", "")
+    ])
+    
+    # Style the new row
+    _apply_row_style(ws, next_row)
+    
+    # Save immediately
+    wb.save(output_file)
+
+
+def create_excel(data_rows: List[Dict[str, str]], output_file: str):
+    """
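+    Create a formatted Excel file (full write, for a final summary).
+    
+    Styles applied:
+    - Blue header background with white bold text
+    - Thin borders on all cells
+    - Wrapped text
+    - Frozen first row
+    - Fixed column widths and row height
+    
+    Args:
+        data_rows: list of data rows, each a dict
+        output_file: output Excel file path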
+    """
+    output_path = Path(output_file)
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    
+    wb = Workbook()
+    ws = wb.active
+    ws.title = "专家意见汇总"
+    
+    # Initialize styles
+    _init_excel_styles(ws)
+    
+    # Add data rows
+    for row_data in data_rows:
+        ws.append([
+            row_data.get("文件名称", ""),
+            row_data.get("项目名称", ""),
+            row_data.get("方案名称", ""),
+            row_data.get("专项方案专家评审意见回复表", "")
+        ])
+    
+    # Style all data rows
+    for row_num in range(2, ws.max_row + 1):
+        _apply_row_style(ws, row_num)
+    
+    # Save the file
+    wb.save(output_file)
+    
+    print(f"\n✅ 已成功保存到: {output_file}")
+    print(f"📊 共写入 {len(data_rows)} 条记录")
+
+
+# ==================== Main flow ====================
+
+def main():
+    """Entry point"""
+    # Parse command-line arguments
+    if len(sys.argv) < 3:
+        print("用法: python md2excel_extractor.py <源文件夹路径> <输出Excel路径>")
+        print("示例: python md2excel_extractor.py D:/专家意见/temp D:/汇总表.xlsx")
+        sys.exit(1)
+    
+    root_dir = sys.argv[1]
+    output_file = sys.argv[2]
+    
+    # Validate the source directory
+    if not os.path.isdir(root_dir):
+        print(f"错误: 源文件夹不存在: {root_dir}")
+        sys.exit(1)
+    
+    print("=" * 70)
+    print("Markdown 专家意见文档批量提取工具")
+    print("=" * 70)
+    print(f"\n📁 源文件夹: {root_dir}")
+    print(f"📄 输出文件: {output_file}")
+    
+    # Read the md files
+    print(f"\n【步骤 1/3】扫描并读取 Markdown 文件...")
+    try:
+        md_contents = read_md_files(root_dir)
+    except Exception as e:
+        print(f"错误: 读取文件失败: {e}")
+        sys.exit(1)
+    
+    if not md_contents:
+        print("未找到任何有效的 md 文件,请检查目录结构")
+        sys.exit(1)
+    
+    print(f"\n✅ 共找到 {len(md_contents)} 个有效文档")
+    
+    # Extract information with the LLM
+    print(f"\n【步骤 2/3】使用大模型提取信息...")
+    print(f"  LLM 端点: {LLM_API_URL}")
+    print(f"  模型: {LLM_MODEL}")
+    print(f"  Temperature: {LLM_TEMPERATURE} | Max tokens: {LLM_MAX_TOKENS}")
+    print(f"  💡 每处理完一个文件会立即追加写入 Excel,支持断点续传\n")
+    
+    # Check for existing progress (the Excel file already exists)
+    output_path = Path(output_file)
+    processed_files = set()
+    if output_path.exists():
+        try:
+            wb = load_workbook(output_file)
+            ws = wb.active
+            # Read the names of processed files (column 1, from row 2 onward)
+            for row in ws.iter_rows(min_row=2, values_only=True):
+                if row and row[0]:
+                    processed_files.add(row[0])
+            print(f"  📋 检测到已有进度,已处理 {len(processed_files)} 个文件,将跳过这些文件")
+        except Exception:
+            pass
+    
+    data_rows = []
+    total = len(md_contents)
+    processed_count = 0
+    
+    for i, item in enumerate(md_contents, 1):
+        file_name = item['file_name']
+        
+        # Skip files that were already processed
+        if file_name in processed_files:
+            print(f"[{i}/{total}] ⏭️  跳过已处理: {file_name}")
+            continue
+        
+        print(f"[{i}/{total}] 正在处理: {file_name}")
+        
+        try:
+            extracted = extract_info_with_llm(item['content'])
+            row_data = {
+                "文件名称": file_name,
+                "项目名称": extracted["项目名称"],
+                "方案名称": extracted["方案名称"],
+                "专项方案专家评审意见回复表": extracted["专项方案专家评审意见回复表"]
+            }
+            data_rows.append(row_data)
+            
+            # Append to Excel immediately
+            append_to_excel(row_data, output_file)
+            processed_count += 1
+            print(f"  ✅ 提取完成并已写入 Excel")
+            
+        except Exception as e:
+            print(f"  ❌ 处理失败: {e}")
+            row_data = {
+                "文件名称": file_name,
+                "项目名称": "处理异常",
+                "方案名称": "处理异常",
+                "专项方案专家评审意见回复表": f"处理异常: {str(e)}"
+            }
+            data_rows.append(row_data)
+            append_to_excel(row_data, output_file)
+            processed_count += 1
+        
+        # Pause between API calls to avoid sending requests too fast
+        if i < total:
+            time.sleep(API_DELAY)
+    
+    # Final summary (rows were already written incrementally; this step only prints totals)
+    print(f"\n【步骤 3/3】生成 Excel 汇总表...")
+    print(f"  本次新处理: {processed_count} 个文件")
+    print(f"  总计写入: {len(processed_files) + processed_count} 个文件")
+    
+    print("\n" + "=" * 70)
+    print("🎉 处理完成!")
+    print("=" * 70)
+
+
+if __name__ == "__main__":
+    main()
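
Reviewer sketch (not part of the commit): the last fallback in `parse_llm_response` uses a non-greedy regex that ends at the first `}`, so a reply whose opinion text itself contains braces would not parse. One possible alternative, assuming only the standard library, is to let `json.JSONDecoder.raw_decode` find where each candidate object ends. `extract_first_json_object` is a hypothetical helper name, not something defined in this file.

```python
import json
from typing import Optional


def extract_first_json_object(text: str, required_key: str) -> Optional[dict]:
    """Return the first JSON object embedded in `text` that contains `required_key`."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and required_key in obj:
            return obj
        # Not a usable object at this brace; try the next one.
        start = text.find("{", start + 1)
    return None
```

A caller could try this before falling back to the "解析失败" placeholder row, for example `extract_first_json_object(response_text, "项目名称")`.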

+ 108 - 4
uv.lock

@@ -246,7 +246,6 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/f8/0a/a3871375c7b9727edaeeea994bfff7c63ff7804c9829c19309ba2e058807/greenlet-3.3.0-cp312-cp312-macosx_11_0_universal2.whl", hash = "sha256:b01548f6e0b9e9784a2c99c5651e5dc89ffcbe870bc5fb2e5ef864e9cc6b5dcb", size = 276379, upload-time = "2025-12-04T14:23:30.498Z" },
     { url = "https://files.pythonhosted.org/packages/43/ab/7ebfe34dce8b87be0d11dae91acbf76f7b8246bf9d6b319c741f99fa59c6/greenlet-3.3.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:349345b770dc88f81506c6861d22a6ccd422207829d2c854ae2af8025af303e3", size = 597294, upload-time = "2025-12-04T14:50:06.847Z" },
     { url = "https://files.pythonhosted.org/packages/a4/39/f1c8da50024feecd0793dbd5e08f526809b8ab5609224a2da40aad3a7641/greenlet-3.3.0-cp312-cp312-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:e8e18ed6995e9e2c0b4ed264d2cf89260ab3ac7e13555b8032b25a74c6d18655", size = 607742, upload-time = "2025-12-04T14:57:42.349Z" },
-    { url = "https://files.pythonhosted.org/packages/77/cb/43692bcd5f7a0da6ec0ec6d58ee7cddb606d055ce94a62ac9b1aa481e969/greenlet-3.3.0-cp312-cp312-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:c024b1e5696626890038e34f76140ed1daf858e37496d33f2af57f06189e70d7", size = 622297, upload-time = "2025-12-04T15:07:13.552Z" },
     { url = "https://files.pythonhosted.org/packages/75/b0/6bde0b1011a60782108c01de5913c588cf51a839174538d266de15e4bf4d/greenlet-3.3.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:047ab3df20ede6a57c35c14bf5200fcf04039d50f908270d3f9a7a82064f543b", size = 609885, upload-time = "2025-12-04T14:26:02.368Z" },
     { url = "https://files.pythonhosted.org/packages/49/0e/49b46ac39f931f59f987b7cd9f34bfec8ef81d2a1e6e00682f55be5de9f4/greenlet-3.3.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:2d9ad37fc657b1102ec880e637cccf20191581f75c64087a549e66c57e1ceb53", size = 1567424, upload-time = "2025-12-04T15:04:23.757Z" },
     { url = "https://files.pythonhosted.org/packages/05/f5/49a9ac2dff7f10091935def9165c90236d8f175afb27cbed38fb1d61ab6b/greenlet-3.3.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:83cd0e36932e0e7f36a64b732a6f60c2fc2df28c351bae79fbaf4f8092fe7614", size = 1636017, upload-time = "2025-12-04T14:27:29.688Z" },
@@ -254,7 +253,6 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/02/2f/28592176381b9ab2cafa12829ba7b472d177f3acc35d8fbcf3673d966fff/greenlet-3.3.0-cp313-cp313-macosx_11_0_universal2.whl", hash = "sha256:a1e41a81c7e2825822f4e068c48cb2196002362619e2d70b148f20a831c00739", size = 275140, upload-time = "2025-12-04T14:23:01.282Z" },
     { url = "https://files.pythonhosted.org/packages/2c/80/fbe937bf81e9fca98c981fe499e59a3f45df2a04da0baa5c2be0dca0d329/greenlet-3.3.0-cp313-cp313-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:9f515a47d02da4d30caaa85b69474cec77b7929b2e936ff7fb853d42f4bf8808", size = 599219, upload-time = "2025-12-04T14:50:08.309Z" },
     { url = "https://files.pythonhosted.org/packages/c2/ff/7c985128f0514271b8268476af89aee6866df5eec04ac17dcfbc676213df/greenlet-3.3.0-cp313-cp313-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:7d2d9fd66bfadf230b385fdc90426fcd6eb64db54b40c495b72ac0feb5766c54", size = 610211, upload-time = "2025-12-04T14:57:43.968Z" },
-    { url = "https://files.pythonhosted.org/packages/79/07/c47a82d881319ec18a4510bb30463ed6891f2ad2c1901ed5ec23d3de351f/greenlet-3.3.0-cp313-cp313-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:30a6e28487a790417d036088b3bcb3f3ac7d8babaa7d0139edbaddebf3af9492", size = 624311, upload-time = "2025-12-04T15:07:14.697Z" },
     { url = "https://files.pythonhosted.org/packages/fd/8e/424b8c6e78bd9837d14ff7df01a9829fc883ba2ab4ea787d4f848435f23f/greenlet-3.3.0-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:087ea5e004437321508a8d6f20efc4cfec5e3c30118e1417ea96ed1d93950527", size = 612833, upload-time = "2025-12-04T14:26:03.669Z" },
     { url = "https://files.pythonhosted.org/packages/b5/ba/56699ff9b7c76ca12f1cdc27a886d0f81f2189c3455ff9f65246780f713d/greenlet-3.3.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:ab97cf74045343f6c60a39913fa59710e4bd26a536ce7ab2397adf8b27e67c39", size = 1567256, upload-time = "2025-12-04T15:04:25.276Z" },
     { url = "https://files.pythonhosted.org/packages/1e/37/f31136132967982d698c71a281a8901daf1a8fbab935dce7c0cf15f942cc/greenlet-3.3.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:5375d2e23184629112ca1ea89a53389dddbffcf417dad40125713d88eb5f96e8", size = 1636483, upload-time = "2025-12-04T14:27:30.804Z" },
@@ -262,7 +260,6 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/d7/7c/f0a6d0ede2c7bf092d00bc83ad5bafb7e6ec9b4aab2fbdfa6f134dc73327/greenlet-3.3.0-cp314-cp314-macosx_11_0_universal2.whl", hash = "sha256:60c2ef0f578afb3c8d92ea07ad327f9a062547137afe91f38408f08aacab667f", size = 275671, upload-time = "2025-12-04T14:23:05.267Z" },
     { url = "https://files.pythonhosted.org/packages/44/06/dac639ae1a50f5969d82d2e3dd9767d30d6dbdbab0e1a54010c8fe90263c/greenlet-3.3.0-cp314-cp314-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0a5d554d0712ba1de0a6c94c640f7aeba3f85b3a6e1f2899c11c2c0428da9365", size = 646360, upload-time = "2025-12-04T14:50:10.026Z" },
     { url = "https://files.pythonhosted.org/packages/e0/94/0fb76fe6c5369fba9bf98529ada6f4c3a1adf19e406a47332245ef0eb357/greenlet-3.3.0-cp314-cp314-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:3a898b1e9c5f7307ebbde4102908e6cbfcb9ea16284a3abe15cab996bee8b9b3", size = 658160, upload-time = "2025-12-04T14:57:45.41Z" },
-    { url = "https://files.pythonhosted.org/packages/93/79/d2c70cae6e823fac36c3bbc9077962105052b7ef81db2f01ec3b9bf17e2b/greenlet-3.3.0-cp314-cp314-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:dcd2bdbd444ff340e8d6bdf54d2f206ccddbb3ccfdcd3c25bf4afaa7b8f0cf45", size = 671388, upload-time = "2025-12-04T15:07:15.789Z" },
     { url = "https://files.pythonhosted.org/packages/b8/14/bab308fc2c1b5228c3224ec2bf928ce2e4d21d8046c161e44a2012b5203e/greenlet-3.3.0-cp314-cp314-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5773edda4dc00e173820722711d043799d3adb4f01731f40619e07ea2750b955", size = 660166, upload-time = "2025-12-04T14:26:05.099Z" },
     { url = "https://files.pythonhosted.org/packages/4b/d2/91465d39164eaa0085177f61983d80ffe746c5a1860f009811d498e7259c/greenlet-3.3.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:ac0549373982b36d5fd5d30beb8a7a33ee541ff98d2b502714a09f1169f31b55", size = 1615193, upload-time = "2025-12-04T15:04:27.041Z" },
     { url = "https://files.pythonhosted.org/packages/42/1b/83d110a37044b92423084d52d5d5a3b3a73cafb51b547e6d7366ff62eff1/greenlet-3.3.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:d198d2d977460358c3b3a4dc844f875d1adb33817f0613f663a656f463764ccc", size = 1683653, upload-time = "2025-12-04T14:27:32.366Z" },
@@ -270,7 +267,6 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/a0/66/bd6317bc5932accf351fc19f177ffba53712a202f9df10587da8df257c7e/greenlet-3.3.0-cp314-cp314t-macosx_11_0_universal2.whl", hash = "sha256:d6ed6f85fae6cdfdb9ce04c9bf7a08d666cfcfb914e7d006f44f840b46741931", size = 282638, upload-time = "2025-12-04T14:25:20.941Z" },
     { url = "https://files.pythonhosted.org/packages/30/cf/cc81cb030b40e738d6e69502ccbd0dd1bced0588e958f9e757945de24404/greenlet-3.3.0-cp314-cp314t-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d9125050fcf24554e69c4cacb086b87b3b55dc395a8b3ebe6487b045b2614388", size = 651145, upload-time = "2025-12-04T14:50:11.039Z" },
     { url = "https://files.pythonhosted.org/packages/9c/ea/1020037b5ecfe95ca7df8d8549959baceb8186031da83d5ecceff8b08cd2/greenlet-3.3.0-cp314-cp314t-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:87e63ccfa13c0a0f6234ed0add552af24cc67dd886731f2261e46e241608bee3", size = 654236, upload-time = "2025-12-04T14:57:47.007Z" },
-    { url = "https://files.pythonhosted.org/packages/69/cc/1e4bae2e45ca2fa55299f4e85854606a78ecc37fead20d69322f96000504/greenlet-3.3.0-cp314-cp314t-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:2662433acbca297c9153a4023fe2161c8dcfdcc91f10433171cf7e7d94ba2221", size = 662506, upload-time = "2025-12-04T15:07:16.906Z" },
     { url = "https://files.pythonhosted.org/packages/57/b9/f8025d71a6085c441a7eaff0fd928bbb275a6633773667023d19179fe815/greenlet-3.3.0-cp314-cp314t-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:3c6e9b9c1527a78520357de498b0e709fb9e2f49c3a513afd5a249007261911b", size = 653783, upload-time = "2025-12-04T14:26:06.225Z" },
     { url = "https://files.pythonhosted.org/packages/f6/c7/876a8c7a7485d5d6b5c6821201d542ef28be645aa024cfe1145b35c120c1/greenlet-3.3.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:286d093f95ec98fdd92fcb955003b8a3d054b4e2cab3e2707a5039e7b50520fd", size = 1614857, upload-time = "2025-12-04T15:04:28.484Z" },
     { url = "https://files.pythonhosted.org/packages/4f/dc/041be1dff9f23dac5f48a43323cd0789cb798342011c19a248d9c9335536/greenlet-3.3.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:6c10513330af5b8ae16f023e8ddbfb486ab355d04467c4679c5cfe4659975dd9", size = 1676034, upload-time = "2025-12-04T14:27:33.531Z" },
@@ -599,7 +595,10 @@ dependencies = [
     { name = "minio" },
     { name = "openai" },
     { name = "openpyxl" },
+    { name = "pandas" },
     { name = "pymilvus" },
+    { name = "pypdf2" },
+    { name = "python-docx" },
     { name = "python-dotenv" },
     { name = "sqlalchemy" },
     { name = "tiktoken" },
@@ -614,12 +613,95 @@ requires-dist = [
     { name = "minio", specifier = ">=7.2.20" },
     { name = "openai", specifier = ">=2.15.0" },
     { name = "openpyxl", specifier = ">=3.1.5" },
+    { name = "pandas", specifier = ">=3.0.0" },
     { name = "pymilvus", specifier = ">=2.6.6" },
+    { name = "pypdf2", specifier = ">=3.0.1" },
+    { name = "python-docx", specifier = ">=1.2.0" },
     { name = "python-dotenv", specifier = ">=1.2.1" },
     { name = "sqlalchemy", specifier = ">=2.0.46" },
     { name = "tiktoken", specifier = ">=0.12.0" },
 ]
 
+[[package]]
+name = "lxml"
+version = "6.1.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/28/30/9abc9e34c657c33834eaf6cd02124c61bdf5944d802aa48e69be8da3585d/lxml-6.1.0.tar.gz", hash = "sha256:bfd57d8008c4965709a919c3e9a98f76c2c7cb319086b3d26858250620023b13", size = 4197006, upload-time = "2026-04-18T04:32:51.613Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/d2/d4/9326838b59dc36dfae42eec9656b97520f9997eee1de47b8316aaeed169c/lxml-6.1.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:d2f17a16cd8751e8eb233a7e41aecdf8e511712e00088bf9be455f604cd0d28d", size = 8570663, upload-time = "2026-04-18T04:27:48.253Z" },
+    { url = "https://files.pythonhosted.org/packages/d8/a4/053745ce1f8303ccbb788b86c0db3a91b973675cefc42566a188637b7c40/lxml-6.1.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:f0cea5b1d3e6e77d71bd2b9972eb2446221a69dc52bb0b9c3c6f6e5700592d93", size = 4624024, upload-time = "2026-04-18T04:27:52.594Z" },
+    { url = "https://files.pythonhosted.org/packages/90/97/a517944b20f8fd0932ad2109482bee4e29fe721416387a363306667941f6/lxml-6.1.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:fc46da94826188ed45cb53bd8e3fc076ae22675aea2087843d4735627f867c6d", size = 4930895, upload-time = "2026-04-18T04:32:56.29Z" },
+    { url = "https://files.pythonhosted.org/packages/94/7c/e08a970727d556caa040a44773c7b7e3ad0f0d73dedc863543e9a8b931f2/lxml-6.1.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:9147d8e386ec3b82c3b15d88927f734f565b0aaadef7def562b853adca45784a", size = 5093820, upload-time = "2026-04-18T04:32:58.94Z" },
+    { url = "https://files.pythonhosted.org/packages/88/ee/2a5c2aa2c32016a226ca25d3e1056a8102ea6e1fe308bf50213586635400/lxml-6.1.0-cp312-cp312-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5715e0e28736a070f3f34a7ccc09e2fdcba0e3060abbcf61a1a5718ff6d6b105", size = 5005790, upload-time = "2026-04-18T04:33:01.272Z" },
+    { url = "https://files.pythonhosted.org/packages/e3/38/a0db9be8f38ad6043ab9429487c128dd1d30f07956ef43040402f8da49e8/lxml-6.1.0-cp312-cp312-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:4937460dc5df0cdd2f06a86c285c28afda06aefa3af949f9477d3e8df430c485", size = 5630827, upload-time = "2026-04-18T04:33:04.036Z" },
+    { url = "https://files.pythonhosted.org/packages/31/ba/3c13d3fc24b7cacf675f808a3a1baabf43a30d0cd24c98f94548e9aa58eb/lxml-6.1.0-cp312-cp312-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bc783ee3147e60a25aa0445ea82b3e8aabb83b240f2b95d32cb75587ff781814", size = 5240445, upload-time = "2026-04-18T04:33:06.87Z" },
+    { url = "https://files.pythonhosted.org/packages/55/ba/eeef4ccba09b2212fe239f46c1692a98db1878e0872ae320756488878a94/lxml-6.1.0-cp312-cp312-manylinux_2_28_i686.whl", hash = "sha256:40d9189f80075f2e1f88db21ef815a2b17b28adf8e50aaf5c789bfe737027f32", size = 5350121, upload-time = "2026-04-18T04:33:09.365Z" },
+    { url = "https://files.pythonhosted.org/packages/7e/01/1da87c7b587c38d0cbe77a01aae3b9c1c49ed47d76918ef3db8fc151b1ca/lxml-6.1.0-cp312-cp312-manylinux_2_31_armv7l.whl", hash = "sha256:05b9b8787e35bec69e68daf4952b2e6dfcfb0db7ecf1a06f8cdfbbac4eb71aad", size = 4694949, upload-time = "2026-04-18T04:33:11.628Z" },
+    { url = "https://files.pythonhosted.org/packages/a1/88/7db0fe66d5aaf128443ee1623dec3db1576f3e4c17751ec0ef5866468590/lxml-6.1.0-cp312-cp312-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:0f0f08beb0182e3e9a86fae124b3c47a7b41b7b69b225e1377db983802404e54", size = 5243901, upload-time = "2026-04-18T04:33:13.95Z" },
+    { url = "https://files.pythonhosted.org/packages/00/a8/1346726af7d1f6fca1f11223ba34001462b0a3660416986d37641708d57c/lxml-6.1.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:73becf6d8c81d4c76b1014dbd3584cb26d904492dcf73ca85dc8bff08dcd6d2d", size = 5048054, upload-time = "2026-04-18T04:33:16.965Z" },
+    { url = "https://files.pythonhosted.org/packages/2e/b7/85057012f035d1a0c87e02f8c723ca3c3e6e0728bcf4cb62080b21b1c1e3/lxml-6.1.0-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:1ae225f66e5938f4fa29d37e009a3bb3b13032ac57eb4eb42afa44f6e4054e69", size = 4777324, upload-time = "2026-04-18T04:33:19.832Z" },
+    { url = "https://files.pythonhosted.org/packages/75/6c/ad2f94a91073ef570f33718040e8e160d5fb93331cf1ab3ca1323f939e2d/lxml-6.1.0-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:690022c7fae793b0489aa68a658822cea83e0d5933781811cabbf5ea3bcfe73d", size = 5645702, upload-time = "2026-04-18T04:33:22.436Z" },
+    { url = "https://files.pythonhosted.org/packages/3b/89/0bb6c0bd549c19004c60eea9dc554dd78fd647b72314ef25d460e0d208c6/lxml-6.1.0-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:63aeafc26aac0be8aff14af7871249e87ea1319be92090bfd632ec68e03b16a5", size = 5232901, upload-time = "2026-04-18T04:33:26.21Z" },
+    { url = "https://files.pythonhosted.org/packages/a1/d9/d609a11fb567da9399f525193e2b49847b5a409cdebe737f06a8b7126bdc/lxml-6.1.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:264c605ab9c0e4aa1a679636f4582c4d3313700009fac3ec9c3412ed0d8f3e1d", size = 5261333, upload-time = "2026-04-18T04:33:28.984Z" },
+    { url = "https://files.pythonhosted.org/packages/a6/3a/ac3f99ec8ac93089e7dd556f279e0d14c24de0a74a507e143a2e4b496e7c/lxml-6.1.0-cp312-cp312-win32.whl", hash = "sha256:56971379bc5ee8037c5a0f09fa88f66cdb7d37c3e38af3e45cf539f41131ac1f", size = 3596289, upload-time = "2026-04-18T04:27:42.819Z" },
+    { url = "https://files.pythonhosted.org/packages/f2/a7/0a915557538593cb1bbeedcd40e13c7a261822c26fecbbdb71dad0c2f540/lxml-6.1.0-cp312-cp312-win_amd64.whl", hash = "sha256:bba078de0031c219e5dd06cf3e6bf8fb8e6e64a77819b358f53bb132e3e03366", size = 3997059, upload-time = "2026-04-18T04:27:46.764Z" },
+    { url = "https://files.pythonhosted.org/packages/92/96/a5dc078cf0126fbfbc35611d77ecd5da80054b5893e28fb213a5613b9e1d/lxml-6.1.0-cp312-cp312-win_arm64.whl", hash = "sha256:c3592631e652afa34999a088f98ba7dfc7d6aff0d535c410bea77a71743f3819", size = 3659552, upload-time = "2026-04-18T04:27:51.133Z" },
+    { url = "https://files.pythonhosted.org/packages/08/03/69347590f1cf4a6d5a4944bb6099e6d37f334784f16062234e1f892fdb1d/lxml-6.1.0-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:a0092f2b107b69601adf562a57c956fbb596e05e3e6651cabd3054113b007e45", size = 8559689, upload-time = "2026-04-18T04:31:57.785Z" },
+    { url = "https://files.pythonhosted.org/packages/3f/58/25e00bb40b185c974cfe156c110474d9a8a8390d5f7c92a4e328189bb60e/lxml-6.1.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:fc7140d7a7386e6b545d41b7358f4d02b656d4053f5fa6859f92f4b9c2572c4d", size = 4617892, upload-time = "2026-04-18T04:32:01.78Z" },
+    { url = "https://files.pythonhosted.org/packages/f5/54/92ad98a94ac318dc4f97aaac22ff8d1b94212b2ae8af5b6e9b354bf825f7/lxml-6.1.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:419c58fc92cc3a2c3fa5f78c63dbf5da70c1fa9c1b25f25727ecee89a96c7de2", size = 4923489, upload-time = "2026-04-18T04:33:31.401Z" },
+    { url = "https://files.pythonhosted.org/packages/15/3b/a20aecfab42bdf4f9b390590d345857ad3ffd7c51988d1c89c53a0c73faf/lxml-6.1.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:37fabd1452852636cf38ecdcc9dd5ca4bba7a35d6c53fa09725deeb894a87491", size = 5082162, upload-time = "2026-04-18T04:33:34.262Z" },
+    { url = "https://files.pythonhosted.org/packages/45/26/2cdb3d281ac1bd175603e290cbe4bad6eff127c0f8de90bafd6f8548f0fd/lxml-6.1.0-cp313-cp313-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a2853c8b2170cc6cd54a6b4d50d2c1a8a7aeca201f23804b4898525c7a152cfc", size = 4993247, upload-time = "2026-04-18T04:33:36.674Z" },
+    { url = "https://files.pythonhosted.org/packages/f6/05/d735aef963740022a08185c84821f689fc903acb3d50326e6b1e9886cc22/lxml-6.1.0-cp313-cp313-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:8e369cbd690e788c8d15e56222d91a09c6a417f49cbc543040cba0fe2e25a79e", size = 5613042, upload-time = "2026-04-18T04:33:39.205Z" },
+    { url = "https://files.pythonhosted.org/packages/ee/b8/ead7c10efff731738c72e59ed6eb5791854879fbed7ae98781a12006263a/lxml-6.1.0-cp313-cp313-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:e69aa6805905807186eb00e66c6d97a935c928275182eb02ee40ba00da9623b2", size = 5228304, upload-time = "2026-04-18T04:33:41.647Z" },
+    { url = "https://files.pythonhosted.org/packages/6b/10/e9842d2ec322ea65f0a7270aa0315a53abed06058b88ef1b027f620e7a5f/lxml-6.1.0-cp313-cp313-manylinux_2_28_i686.whl", hash = "sha256:4bd1bdb8a9e0e2dd229de19b5f8aebac80e916921b4b2c6ef8a52bc131d0c1f9", size = 5341578, upload-time = "2026-04-18T04:33:44.596Z" },
+    { url = "https://files.pythonhosted.org/packages/89/54/40d9403d7c2775fa7301d3ddd3464689bfe9ba71acc17dfff777071b4fdc/lxml-6.1.0-cp313-cp313-manylinux_2_31_armv7l.whl", hash = "sha256:cbd7b79cdcb4986ad78a2662625882747f09db5e4cd7b2ae178a88c9c51b3dfe", size = 4700209, upload-time = "2026-04-18T04:33:47.552Z" },
+    { url = "https://files.pythonhosted.org/packages/85/b2/bbdcc2cf45dfc7dfffef4fd97e5c47b15919b6a365247d95d6f684ef5e82/lxml-6.1.0-cp313-cp313-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:43e4d297f11080ec9d64a4b1ad7ac02b4484c9f0e2179d9c4ef78e886e747b88", size = 5232365, upload-time = "2026-04-18T04:33:50.249Z" },
+    { url = "https://files.pythonhosted.org/packages/48/5a/b06875665e53aaba7127611a7bed3b7b9658e20b22bc2dd217a0b7ab0091/lxml-6.1.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:cc16682cc987a3da00aa56a3aa3075b08edb10d9b1e476938cfdbee8f3b67181", size = 5043654, upload-time = "2026-04-18T04:33:52.71Z" },
+    { url = "https://files.pythonhosted.org/packages/e9/9c/e71a069d09641c1a7abeb30e693f828c7c90a41cbe3d650b2d734d876f85/lxml-6.1.0-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:d6d8efe71429635f0559579092bb5e60560d7b9115ee38c4adbea35632e7fa24", size = 4769326, upload-time = "2026-04-18T04:33:55.244Z" },
+    { url = "https://files.pythonhosted.org/packages/cc/06/7a9cd84b3d4ed79adf35f874750abb697dec0b4a81a836037b36e47c091a/lxml-6.1.0-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:7e39ab3a28af7784e206d8606ec0e4bcad0190f63a492bca95e94e5a4aef7f6e", size = 5635879, upload-time = "2026-04-18T04:33:58.509Z" },
+    { url = "https://files.pythonhosted.org/packages/cc/f0/9d57916befc1e54c451712c7ee48e9e74e80ae4d03bdce49914e0aee42cd/lxml-6.1.0-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:9eb667bf50856c4a58145f8ca2d5e5be160191e79eb9e30855a476191b3c3495", size = 5224048, upload-time = "2026-04-18T04:34:00.943Z" },
+    { url = "https://files.pythonhosted.org/packages/99/75/90c4eefda0c08c92221fe0753db2d6699a4c628f76ff4465ec20dea84cc1/lxml-6.1.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:7f4a77d6f7edf9230cee3e1f7f6764722a41604ee5681844f18db9a81ea0ec33", size = 5250241, upload-time = "2026-04-18T04:34:03.365Z" },
+    { url = "https://files.pythonhosted.org/packages/5e/73/16596f7e4e38fa33084b9ccbccc22a15f82a290a055126f2c1541236d2ff/lxml-6.1.0-cp313-cp313-win32.whl", hash = "sha256:28902146ffbe5222df411c5d19e5352490122e14447e98cd118907ee3fd6ee62", size = 3596938, upload-time = "2026-04-18T04:31:56.206Z" },
+    { url = "https://files.pythonhosted.org/packages/8e/63/981401c5680c1eb30893f00a19641ac80db5d1e7086c62cb4b13ed813038/lxml-6.1.0-cp313-cp313-win_amd64.whl", hash = "sha256:4a1503c56e4e2b38dc76f2f2da7bae69670c0f1933e27cfa34b2fa5876410b16", size = 3995728, upload-time = "2026-04-18T04:31:58.763Z" },
+    { url = "https://files.pythonhosted.org/packages/e7/e8/c358a38ac3e541d16a1b527e4e9cb78c0419b0506a070ace11777e5e8404/lxml-6.1.0-cp313-cp313-win_arm64.whl", hash = "sha256:e0af85773850417d994d019741239b901b22c6680206f46a34766926e466141d", size = 3658372, upload-time = "2026-04-18T04:32:03.629Z" },
+    { url = "https://files.pythonhosted.org/packages/eb/45/cee4cf203ef0bab5c52afc118da61d6b460c928f2893d40023cfa27e0b80/lxml-6.1.0-cp314-cp314-macosx_10_15_universal2.whl", hash = "sha256:ab863fd37458fed6456525f297d21239d987800c46e67da5ef04fc6b3dd93ac8", size = 8576713, upload-time = "2026-04-18T04:32:06.831Z" },
+    { url = "https://files.pythonhosted.org/packages/8a/a7/eda05babeb7e046839204eaf254cd4d7c9130ce2bbf0d9e90ea41af5654d/lxml-6.1.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:6fd8b1df8254ff4fd93fd31da1fc15770bde23ac045be9bb1f87425702f61cc9", size = 4623874, upload-time = "2026-04-18T04:32:10.755Z" },
+    { url = "https://files.pythonhosted.org/packages/e7/e9/db5846de9b436b91890a62f29d80cd849ea17948a49bf532d5278ee69a9e/lxml-6.1.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:47024feaae386a92a146af0d2aeed65229bf6fff738e6a11dda6b0015fb8fd03", size = 4949535, upload-time = "2026-04-18T04:34:06.657Z" },
+    { url = "https://files.pythonhosted.org/packages/5a/ba/0d3593373dcae1d68f40dc3c41a5a92f2544e68115eb2f62319a4c2a6500/lxml-6.1.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:3f00972f84450204cd5d93a5395965e348956aaceaadec693a22ec743f8ae3eb", size = 5086881, upload-time = "2026-04-18T04:34:09.556Z" },
+    { url = "https://files.pythonhosted.org/packages/43/76/759a7484539ad1af0d125a9afe9c3fb5f82a8779fd1f5f56319d9e4ea2fd/lxml-6.1.0-cp314-cp314-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:97faa0860e13b05b15a51fb4986421ef7a30f0b3334061c416e0981e9450ca4c", size = 5031305, upload-time = "2026-04-18T04:34:12.336Z" },
+    { url = "https://files.pythonhosted.org/packages/dc/b9/c1f0daf981a11e47636126901fd4ab82429e18c57aeb0fc3ad2940b42d8b/lxml-6.1.0-cp314-cp314-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:972a6451204798675407beaad97b868d0c733d9a74dafefc63120b81b8c2de28", size = 5647522, upload-time = "2026-04-18T04:34:14.89Z" },
+    { url = "https://files.pythonhosted.org/packages/31/e6/1f533dcd205275363d9ba3511bcec52fa2df86abf8abe6a5f2c599f0dc31/lxml-6.1.0-cp314-cp314-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:fe022f20bc4569ec66b63b3fb275a3d628d9d32da6326b2982584104db6d3086", size = 5239310, upload-time = "2026-04-18T04:34:17.652Z" },
+    { url = "https://files.pythonhosted.org/packages/c3/8c/4175fb709c78a6e315ed814ed33be3defd8b8721067e70419a6cf6f971da/lxml-6.1.0-cp314-cp314-manylinux_2_28_i686.whl", hash = "sha256:75c4c7c619a744f972f4451bf5adf6d0fb00992a1ffc9fd78e13b0bc817cc99f", size = 5350799, upload-time = "2026-04-18T04:34:20.529Z" },
+    { url = "https://files.pythonhosted.org/packages/fd/77/6ffdebc5994975f0dde4acb59761902bd9d9bb84422b9a0bd239a7da9ca8/lxml-6.1.0-cp314-cp314-manylinux_2_31_armv7l.whl", hash = "sha256:3648f20d25102a22b6061c688beb3a805099ea4beb0a01ce62975d926944d292", size = 4697693, upload-time = "2026-04-18T04:34:23.541Z" },
+    { url = "https://files.pythonhosted.org/packages/f8/f1/565f36bd5c73294602d48e04d23f81ff4c8736be6ba5e1d1ec670ac9be80/lxml-6.1.0-cp314-cp314-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:77b9f99b17cbf14026d1e618035077060fc7195dd940d025149f3e2e830fbfcb", size = 5250708, upload-time = "2026-04-18T04:34:26.001Z" },
+    { url = "https://files.pythonhosted.org/packages/5a/11/a68ab9dd18c5c499404deb4005f4bc4e0e88e5b72cd755ad96efec81d18d/lxml-6.1.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:32662519149fd7a9db354175aa5e417d83485a8039b8aaa62f873ceee7ea4cad", size = 5084737, upload-time = "2026-04-18T04:34:28.32Z" },
+    { url = "https://files.pythonhosted.org/packages/ab/78/e8f41e2c74f4af564e6a0348aea69fb6daaefa64bc071ef469823d22cc18/lxml-6.1.0-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:73d658216fc173cf2c939e90e07b941c5e12736b0bf6a99e7af95459cfe8eabb", size = 4737817, upload-time = "2026-04-18T04:34:30.784Z" },
+    { url = "https://files.pythonhosted.org/packages/06/2d/aa4e117aa2ce2f3b35d9ff246be74a2f8e853baba5d2a92c64744474603a/lxml-6.1.0-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:ac4db068889f8772a4a698c5980ec302771bb545e10c4b095d4c8be26749616f", size = 5670753, upload-time = "2026-04-18T04:34:33.675Z" },
+    { url = "https://files.pythonhosted.org/packages/08/f5/dd745d50c0409031dbfcc4881740542a01e54d6f0110bd420fa7782110b8/lxml-6.1.0-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:45e9dfbd1b661eb64ba0d4dbe762bd210c42d86dd1e5bd2bdf89d634231beb43", size = 5238071, upload-time = "2026-04-18T04:34:36.12Z" },
+    { url = "https://files.pythonhosted.org/packages/3e/74/ad424f36d0340a904665867dab310a3f1f4c96ff4039698de83b77f44c1f/lxml-6.1.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:89e8d73d09ac696a5ba42ec69787913d53284f12092f651506779314f10ba585", size = 5264319, upload-time = "2026-04-18T04:34:39.035Z" },
+    { url = "https://files.pythonhosted.org/packages/53/36/a15d8b3514ec889bfd6aa3609107fcb6c9189f8dc347f1c0b81eded8d87c/lxml-6.1.0-cp314-cp314-win32.whl", hash = "sha256:ebe33f4ec1b2de38ceb225a1749a2965855bffeef435ba93cd2d5d540783bf2f", size = 3657139, upload-time = "2026-04-18T04:32:20.006Z" },
+    { url = "https://files.pythonhosted.org/packages/1a/a4/263ebb0710851a3c6c937180a9a86df1206fdfe53cc43005aa2237fd7736/lxml-6.1.0-cp314-cp314-win_amd64.whl", hash = "sha256:398443df51c538bd578529aa7e5f7afc6c292644174b47961f3bf87fe5741120", size = 4064195, upload-time = "2026-04-18T04:32:23.876Z" },
+    { url = "https://files.pythonhosted.org/packages/80/68/2000f29d323b6c286de077ad20b429fc52272e44eae6d295467043e56012/lxml-6.1.0-cp314-cp314-win_arm64.whl", hash = "sha256:8c8984e1d8c4b3949e419158fda14d921ff703a9ed8a47236c6eb7a2b6cb4946", size = 3741870, upload-time = "2026-04-18T04:32:27.922Z" },
+    { url = "https://files.pythonhosted.org/packages/30/e9/21383c7c8d43799f0da90224c0d7c921870d476ec9b3e01e1b2c0b8237c5/lxml-6.1.0-cp314-cp314t-macosx_10_15_universal2.whl", hash = "sha256:1081dd10bc6fa437db2500e13993abf7cc30716d0a2f40e65abb935f02ec559c", size = 8827548, upload-time = "2026-04-18T04:32:15.094Z" },
+    { url = "https://files.pythonhosted.org/packages/a5/01/c6bc11cd587030dd4f719f65c5657960649fe3e19196c844c75bf32cd0d6/lxml-6.1.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:dabecc48db5f42ba348d1f5d5afdc54c6c4cc758e676926c7cd327045749517d", size = 4735866, upload-time = "2026-04-18T04:32:18.924Z" },
+    { url = "https://files.pythonhosted.org/packages/f3/01/757132fff5f4acf25463b5298f1a46099f3a94480b806547b29ce5e385de/lxml-6.1.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:e3dd5fe19c9e0ac818a9c7f132a5e43c1339ec1cbbfecb1a938bd3a47875b7c9", size = 4969476, upload-time = "2026-04-18T04:34:41.889Z" },
+    { url = "https://files.pythonhosted.org/packages/fd/fb/1bc8b9d27ed64be7c8903db6c89e74dc8c2cd9ec630a7462e4654316dc5b/lxml-6.1.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:9e7b0a4ca6dcc007a4cef00a761bba2dea959de4bd2df98f926b33c92ca5dfb9", size = 5103719, upload-time = "2026-04-18T04:34:44.797Z" },
+    { url = "https://files.pythonhosted.org/packages/d5/e7/5bf82fa28133536a54601aae633b14988e89ed61d4c1eb6b899b023233aa/lxml-6.1.0-cp314-cp314t-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5d27bbe326c6b539c64b42638b18bc6003a8d88f76213a97ac9ed4f885efeab7", size = 5027890, upload-time = "2026-04-18T04:34:47.634Z" },
+    { url = "https://files.pythonhosted.org/packages/2d/20/e048db5d4b4ea0366648aa595f26bb764b2670903fc585b87436d0a5032c/lxml-6.1.0-cp314-cp314t-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:c4e425db0c5445ef0ad56b0eec54f89b88b2d884656e536a90b2f52aecb4ca86", size = 5596008, upload-time = "2026-04-18T04:34:51.503Z" },
+    { url = "https://files.pythonhosted.org/packages/9a/c2/d10807bc8da4824b39e5bd01b5d05c077b6fd01bd91584167edf6b269d22/lxml-6.1.0-cp314-cp314t-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4b89b098105b8599dc57adac95d1813409ac476d3c948a498775d3d0c6124bfb", size = 5224451, upload-time = "2026-04-18T04:34:54.263Z" },
+    { url = "https://files.pythonhosted.org/packages/3c/15/2ebea45bea427e7f0057e9ce7b2d62c5aba20c6b001cca89ed0aadb3ad41/lxml-6.1.0-cp314-cp314t-manylinux_2_28_i686.whl", hash = "sha256:c4a699432846df86cc3de502ee85f445ebad748a1c6021d445f3e514d2cd4b1c", size = 5312135, upload-time = "2026-04-18T04:34:56.818Z" },
+    { url = "https://files.pythonhosted.org/packages/31/e2/87eeae151b0be2a308d49a7ec444ff3eb192b14251e62addb29d0bf3778f/lxml-6.1.0-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:30e7b2ed63b6c8e97cca8af048589a788ab5c9c905f36d9cf1c2bb549f450d2f", size = 4639126, upload-time = "2026-04-18T04:34:59.704Z" },
+    { url = "https://files.pythonhosted.org/packages/a3/51/8a3f6a20902ad604dd746ec7b4000311b240d389dac5e9d95adefd349e0c/lxml-6.1.0-cp314-cp314t-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:022981127642fe19866d2907d76241bb07ed21749601f727d5d5dd1ce5d1b773", size = 5232579, upload-time = "2026-04-18T04:35:02.658Z" },
+    { url = "https://files.pythonhosted.org/packages/6d/d2/650d619bdbe048d2c3f2c31edb00e35670a5e2d65b4fe3b61bce37b19121/lxml-6.1.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:23cad0cc86046d4222f7f418910e46b89971c5a45d3c8abfad0f64b7b05e4a9b", size = 5084206, upload-time = "2026-04-18T04:35:05.175Z" },
+    { url = "https://files.pythonhosted.org/packages/dd/8a/672ca1a3cbeabd1f511ca275a916c0514b747f4b85bdaae103b8fa92f307/lxml-6.1.0-cp314-cp314t-musllinux_1_2_armv7l.whl", hash = "sha256:21c3302068f50d1e8728c67c87ba92aa87043abee517aa2576cca1855326b405", size = 4758906, upload-time = "2026-04-18T04:35:08.098Z" },
+    { url = "https://files.pythonhosted.org/packages/be/f1/ef4b691da85c916cb2feb1eec7414f678162798ac85e042fa164419ac05c/lxml-6.1.0-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:be10838781cb3be19251e276910cd508fe127e27c3242e50521521a0f3781690", size = 5620553, upload-time = "2026-04-18T04:35:11.23Z" },
+    { url = "https://files.pythonhosted.org/packages/59/17/94e81def74107809755ac2782fdad4404420f1c92ca83433d117a6d5acf0/lxml-6.1.0-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:2173a7bffe97667bbf0767f8a99e587740a8c56fdf3befac4b09cb29a80276fd", size = 5229458, upload-time = "2026-04-18T04:35:14.254Z" },
+    { url = "https://files.pythonhosted.org/packages/21/55/c4be91b0f830a871fc1b0d730943d56013b683d4671d5198260e2eae722b/lxml-6.1.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:c6854e9cf99c84beb004eecd7d3a3868ef1109bf2b1df92d7bc11e96a36c2180", size = 5247861, upload-time = "2026-04-18T04:35:17.006Z" },
+    { url = "https://files.pythonhosted.org/packages/c2/ca/77123e4d77df3cb1e968ade7b1f808f5d3a5c1c96b18a33895397de292c1/lxml-6.1.0-cp314-cp314t-win32.whl", hash = "sha256:00750d63ef0031a05331b9223463b1c7c02b9004cef2346a5b2877f0f9494dd2", size = 3897377, upload-time = "2026-04-18T04:32:07.656Z" },
+    { url = "https://files.pythonhosted.org/packages/64/ce/3554833989d074267c063209bae8b09815e5656456a2d332b947806b05ff/lxml-6.1.0-cp314-cp314t-win_amd64.whl", hash = "sha256:80410c3a7e3c617af04de17caa9f9f20adaa817093293d69eae7d7d0522836f5", size = 4392701, upload-time = "2026-04-18T04:32:12.113Z" },
+    { url = "https://files.pythonhosted.org/packages/2b/a0/9b916c68c0e57752c07f8f64b30138d9d4059dbeb27b90274dedbea128ff/lxml-6.1.0-cp314-cp314t-win_arm64.whl", hash = "sha256:26dd9f57ee3bd41e7d35b4c98a2ffd89ed11591649f421f0ec19f67d50ec67ac", size = 3817120, upload-time = "2026-04-18T04:32:15.803Z" },
+]
+
 [[package]]
 name = "minio"
 version = "7.2.20"
@@ -1047,6 +1129,15 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/7c/4c/ad33b92b9864cbde84f259d5df035a6447f91891f5be77788e2a3892bce3/pymysql-1.1.2-py3-none-any.whl", hash = "sha256:e6b1d89711dd51f8f74b1631fe08f039e7d76cf67a42a323d3178f0f25762ed9", size = 45300, upload-time = "2025-08-24T12:55:53.394Z" },
 ]
 
+[[package]]
+name = "pypdf2"
+version = "3.0.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/9f/bb/18dc3062d37db6c491392007dfd1a7f524bb95886eb956569ac38a23a784/PyPDF2-3.0.1.tar.gz", hash = "sha256:a74408f69ba6271f71b9352ef4ed03dc53a31aa404d29b5d31f53bfecfee1440", size = 227419, upload-time = "2022-12-31T10:36:13.13Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/8e/5e/c86a5643653825d3c913719e788e41386bee415c2b87b4f955432f2de6b2/pypdf2-3.0.1-py3-none-any.whl", hash = "sha256:d16e4205cfee272fbdc0568b68d82be796540b1537508cef59388f839c191928", size = 232572, upload-time = "2022-12-31T10:36:10.327Z" },
+]
+
 [[package]]
 name = "python-dateutil"
 version = "2.9.0.post0"
@@ -1059,6 +1150,19 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/ec/57/56b9bcc3c9c6a792fcbaf139543cee77261f3651ca9da0c93f5c1221264b/python_dateutil-2.9.0.post0-py2.py3-none-any.whl", hash = "sha256:a8b2bc7bffae282281c8140a97d3aa9c14da0b136dfe83f850eea9a5f7470427", size = 229892, upload-time = "2024-03-01T18:36:18.57Z" },
 ]
 
+[[package]]
+name = "python-docx"
+version = "1.2.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "lxml" },
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/a9/f7/eddfe33871520adab45aaa1a71f0402a2252050c14c7e3009446c8f4701c/python_docx-1.2.0.tar.gz", hash = "sha256:7bc9d7b7d8a69c9c02ca09216118c86552704edc23bac179283f2e38f86220ce", size = 5723256, upload-time = "2025-06-16T20:46:27.921Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/d0/00/1e03a4989fa5795da308cd774f05b704ace555a70f9bf9d3be057b680bcf/python_docx-1.2.0-py3-none-any.whl", hash = "sha256:3fd478f3250fbbbfd3b94fe1e985955737c145627498896a8a6bf81f4baf66c7", size = 252987, upload-time = "2025-06-16T20:46:22.506Z" },
+]
+
 [[package]]
 name = "python-dotenv"
 version = "1.2.1"