Compare commits

11 Commits

| SHA1 |
|---|
| b033eb61cc |
| c8fe5e6d6f |
| 3585ba6932 |
| e9644360ce |
| 5eb13324c2 |
| 674f48c74b |
| fbbb5a2470 |
| 162f5c4da4 |
| b1d0cc5462 |
| e51cdfea6f |
| 621e546b43 |
.env (modified)

```diff
@@ -1,7 +1,7 @@
 # Volcengine configuration
 OPENAI_API_KEY=sk-c44i1hy64xgzwox6x08o4zug93frq6rgn84oqugf2pje1tg4
-OPENAI_BASE_URL=https://api.xiaomimimo.com/v1/chat/completions
+OPENAI_BASE_URL=https://api.xiaomimimo.com/v1
 # Text model
 OPENAI_MODEL=mimo-v2-flash
 # OPENAI_MODEL=deepseek-r1-250528
```
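The base-URL change above matters because OpenAI-style clients append `/chat/completions` to the base URL themselves, so the configured URL must stop at `/v1`. A minimal sketch of that normalization (`normalize_base_url` is a hypothetical helper, not part of the repo):

```python
def normalize_base_url(url: str) -> str:
    """Strip a trailing /chat/completions, since OpenAI-style SDKs append
    the endpoint path themselves and would otherwise request
    /v1/chat/completions/chat/completions."""
    url = url.rstrip("/")
    suffix = "/chat/completions"
    if url.endswith(suffix):
        url = url[: -len(suffix)]
    return url

print(normalize_base_url("https://api.xiaomimimo.com/v1/chat/completions"))
# → https://api.xiaomimimo.com/v1
```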
.gitignore (vendored, 2 changes)

```diff
@@ -6,6 +6,8 @@ __pycache__/
 # C extensions
 *.so
+
+
 # Distribution / packaging
 .Python
 build/
```
LICENSE (21 deletions)

```diff
@@ -1,21 +0,0 @@
-MIT License
-
-Copyright (c) 2025 Data Analysis Agent Team
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
```
README.md (97 changes)

````diff
@@ -31,16 +31,19 @@ data_analysis_agent/
 │   ├── fallback_openai_client.py  # OpenAI client with failover support
 │   ├── extract_code.py            # Code extraction utilities
 │   ├── format_execution_result.py # Execution-result formatting
-│   └── create_session_dir.py      # Session directory management
+│   ├── create_session_dir.py      # Session directory management
+│   ├── data_loader.py             # Data loading and profile generation
+│   └── script_generator.py        # Reusable script generator
 ├── 📄 data_analysis_agent.py      # Main agent class
 ├── 📄 prompts.py                  # System prompt templates
 ├── 📄 main.py                     # Usage example
 ├── 📄 requirements.txt            # Project dependencies
 ├── 📄 .env                        # Environment variable configuration
 └── 📁 outputs/                    # Analysis output directory
-    └── session_[timestamp]/       # Independent session directory per run
-        ├── *.png                  # Generated charts
-        └── 最终分析报告.md          # Markdown report
+    └── session_[timestamp]/       # Independent session directory per run
+        ├── *.png                  # Generated charts
+        ├── 最终分析报告.md          # Markdown report
+        └── 最终分析报告.docx        # Word report
 ```
 
 ## 📊 Data Analysis Flowchart
````
````diff
@@ -123,9 +126,9 @@ sequenceDiagram
 
 ```bash
 # Clone the project
-git clone http://jeason.online:3000/zhaojie/iov_data_analysis_agent.git
+git clone https://github.com/li-xiu-qi/data_analysis_agent.git
 
-cd iov_data_analysis_agent
+cd data_analysis_agent
 
 # Install dependencies
 pip install -r requirements.txt
````
````diff
@@ -133,7 +136,7 @@ pip install -r requirements.txt
 
 ### 2. Configure the API key
 
-Create the `.env` and `llm_config.py` files:
+Create a `.env` file:
 
 ```bash
 # OpenAI API configuration
````
````diff
@@ -156,11 +159,10 @@ from config.llm_config import LLMConfig
 llm_config = LLMConfig()
 agent = DataAnalysisAgent(llm_config)
 
-# Start the analysis (put what you want in the input, and adjust the report format and analysis requirements in the prompts before running)
+# Start the analysis
 files = ["your_data.csv"]
-# Excel files are supported as well
 report = agent.analyze(
-    user_input="Analyze the IoV operations work-order data and help me produce a summary work-order report for presentation use, sales data, XXXXXXXXXX",
+    user_input="Analyze the XXXXXXXXX data and generate trend charts and key metrics",
     files=files
 )
````
````diff
@@ -187,24 +189,52 @@ report = quick_analysis(
 
 ## 📊 Usage Example
 
 Below is a complete example of analyzing Kweichow Moutai financial data:
 
 ```python
-# Example: work-order health analysis
-files = ["iov.csv"]
+# Example: Moutai financial analysis
+files = ["XXXXXXXXx.csv"]
 report = agent.analyze(
-    user_input="Based on all the operations work orders, output important statistical metrics such as XXXX and plot the related charts. Finally generate a report for me.",
+    user_input="Based on the data, output five important statistical metrics and plot the related charts. Finally generate a report for me.",
     files=files
 )
 ```
 
 **The generated analysis includes:**
 
-- Monthly/weekly/daily work-order trend charts
-- Issue type and vehicle model distributions
-- Issue module analysis charts
-- Issue handling-time analysis charts
-- Issue module summary charts
-- 
+- 📈 Total operating revenue trend chart
+- 💰 Net profit margin analysis
+- 📊 Profit composition charts
+- 💵 Earnings-per-share trend
+- 📋 Operating cost share analysis
+- 📄 Comprehensive analysis report
+
+## 🌐 Web UI Visualization
+
+The project ships a modern web UI that supports zero-code interaction.
+
+### How to Start
+
+**macOS/Linux:**
+
+```bash
+./start_web.sh
+```
+
+**Windows:**
+
+```bash
+start_web.bat
+```
+
+URL: `http://localhost:8000`
+
+### Core Features (Web)
+
+- **🖼️ Chart Gallery**: grid view of every generated chart, each with an AI-generated interpretation.
+- **📜 Live Logs**: watch the backend analysis process and the agent's reasoning in real time, Matrix-style.
+- **📦 One-Click Export**: download a ZIP containing the Markdown report and all full-resolution images.
+- **🛠️ Data Toolbox**:
+  - **Excel merge**: quickly merge multiple same-schema Excel files into an analysis-ready CSV.
+  - **Time sorting**: automatically fix out-of-order CSV rows so time-series analysis stays accurate.
 
 ## 🎨 Process Visualization
````
````diff
@@ -238,12 +268,15 @@ stateDiagram-v2
 
 ```python
 @dataclass
 class LLMConfig:
-    provider: str = "openai"
+    provider: str = os.environ.get("LLM_PROVIDER", "openai")
     api_key: str = os.environ.get("OPENAI_API_KEY", "")
     base_url: str = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1")
     model: str = os.environ.get("OPENAI_MODEL", "gpt-4")
-    max_tokens: int = 4000
-    temperature: float = 0.1
+    temperature: float = 0.5
+    max_tokens: int = 8192
+
+# Other providers (e.g. gemini) are also supported
+# ...
 ```
````
### Executor Configuration

````diff
@@ -253,7 +286,9 @@ class LLMConfig:
 ALLOWED_IMPORTS = {
     'pandas', 'numpy', 'matplotlib', 'duckdb',
     'scipy', 'sklearn', 'plotly', 'requests',
-    'os', 'json', 'datetime', 're', 'pathlib'
+    'os', 'json', 'datetime', 're', 'pathlib',
+    'seaborn', 'statsmodels', 'networkx', 'jieba',
+    'wordcloud', 'PIL', 'sqlite3', 'yaml'
 }
 ```
````
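An allow-list like the one above is typically enforced by statically inspecting generated code before it runs. A minimal sketch of such a check (hypothetical `check_imports` helper, not the repo's actual executor):

```python
import ast

ALLOWED_IMPORTS = {
    'pandas', 'numpy', 'matplotlib', 'duckdb',
    'scipy', 'sklearn', 'plotly', 'requests',
    'os', 'json', 'datetime', 're', 'pathlib',
    'seaborn', 'statsmodels', 'networkx', 'jieba',
    'wordcloud', 'PIL', 'sqlite3', 'yaml',
}

def check_imports(code: str) -> list:
    """Return the top-level module names imported by `code`
    that are not on the allow-list."""
    tree = ast.parse(code)
    bad = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module] if node.module else []
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "os.path" -> "os"
            if root not in ALLOWED_IMPORTS:
                bad.append(root)
    return bad

print(check_imports("import subprocess\nimport pandas as pd"))
# → ['subprocess']
```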
```diff
@@ -323,6 +358,20 @@ A: You can add style definitions in the Mermaid code block, or use a different chart type
 
 Error messages produced during analysis are saved in the session directory for debugging and tuning.
 
+## 🤝 Contributing
+
+Code contributions and suggestions for improvement are welcome!
+
+1. Fork the project
+2. Create a feature branch
+3. Commit your changes
+4. Push the branch
+5. Open a Pull Request
+
+## 📄 License
+
+This project is open-sourced under the MIT license. See the [LICENSE](LICENSE) file for details.
+
 ## 🔄 Changelog
 
 ### v1.0.0
```
```diff
@@ -337,6 +386,6 @@ A: You can add style definitions in the Mermaid code block, or use a different chart type
 
 <div align="center">
 
-**Lower the barrier to entry for data analysis, and make data analysis smarter and simpler!**
+**🚀 Make data analysis smarter and simpler!**
 
 </div>
```
bootstrap.py (new file, 62 lines)

```python
import sys
import subprocess
import importlib.metadata
import os


def check_dependencies():
    """Checks if dependencies in requirements.txt are installed."""
    requirements_file = "requirements.txt"
    if not os.path.exists(requirements_file):
        print(f"Warning: {requirements_file} not found. Skipping dependency check.")
        return

    print("Checking dependencies...")
    missing_packages = []

    with open(requirements_file, "r") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue

            # Simple parsing for package name.
            # This handles 'package>=version', 'package==version', 'package'
            # It does NOT handle complex markers perfectly, but suffices for basic checking.
            package_name = line.split("=")[0].split(">")[0].split("<")[0].strip()

            try:
                importlib.metadata.version(package_name)
            except importlib.metadata.PackageNotFoundError:
                missing_packages.append(line)

    if missing_packages:
        print(f"Missing dependencies: {', '.join(missing_packages)}")
        print("Installing missing dependencies...")
        try:
            subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", requirements_file])
            print("Dependencies installed successfully.")
        except subprocess.CalledProcessError as e:
            print(f"Error installing dependencies: {e}")
            sys.exit(1)
    else:
        print("All dependencies checked.")


def main():
    check_dependencies()

    print("Starting application...")
    try:
        # Run the main application
        # Using sys.executable ensures we use the same python interpreter
        subprocess.run([sys.executable, "main.py"], check=True)
    except subprocess.CalledProcessError as e:
        print(f"Application exited with error: {e}")
        sys.exit(e.returncode)
    except KeyboardInterrupt:
        print("\nApplication stopped by user.")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
```
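The requirement parsing in `check_dependencies` above keeps only the package name by splitting on the first `=`, `>` or `<`. That parsing can be exercised standalone (reproduced here as a hypothetical `parse_package_name` helper with the same split logic):

```python
def parse_package_name(line: str) -> str:
    """Mirror bootstrap.py's simple requirement parsing:
    split on '=', '>' and '<' and keep only the package name.
    (Does not handle extras, URLs or environment markers.)"""
    return line.split("=")[0].split(">")[0].split("<")[0].strip()

print(parse_package_name("pandas>=2.0.0"))   # → pandas
print(parse_package_name("numpy==1.26.4"))   # → numpy
print(parse_package_name("requests"))        # → requests
```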
cleaned_data/.gitkeep (new empty file)
config/app_config.py (new file, 81 lines)

```python
# -*- coding: utf-8 -*-
"""
Application configuration hub: centralizes all configuration options
"""

import os
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AppConfig:
    """Application configuration hub"""

    # Analysis settings
    max_rounds: int = field(default=20)
    force_max_rounds: bool = field(default=False)
    default_output_dir: str = field(default="outputs")

    # Data-processing settings
    max_file_size_mb: int = field(default=500)  # maximum file size (MB)
    chunk_size: int = field(default=100000)  # chunked-read size
    data_cache_enabled: bool = field(default=True)
    cache_dir: str = field(default=".cache/data")

    # LLM settings
    llm_cache_enabled: bool = field(default=True)
    llm_cache_dir: str = field(default=".cache/llm")
    llm_stream_enabled: bool = field(default=False)

    # Code-execution settings
    code_timeout: int = field(default=300)  # code execution timeout (seconds)
    allowed_imports: List[str] = field(default_factory=lambda: [
        'pandas', 'numpy', 'matplotlib', 'seaborn', 'plotly',
        'scipy', 'sklearn', 'duckdb', 'datetime', 'json',
        'os', 're', 'pathlib', 'glob', 'typing', 'collections',
        'itertools', 'functools', 'warnings'
    ])

    # Web settings
    web_host: str = field(default="0.0.0.0")
    web_port: int = field(default=8000)
    upload_dir: str = field(default="uploads")

    # Logging settings
    log_filename: str = field(default="log.txt")
    enable_code_logging: bool = field(default=False)  # whether to log generated code

    @classmethod
    def from_env(cls) -> 'AppConfig':
        """Create a configuration from environment variables"""
        config = cls()

        # Override defaults from environment variables
        if max_rounds := os.getenv("APP_MAX_ROUNDS"):
            config.max_rounds = int(max_rounds)

        if chunk_size := os.getenv("APP_CHUNK_SIZE"):
            config.chunk_size = int(chunk_size)

        if cache_enabled := os.getenv("APP_CACHE_ENABLED"):
            config.data_cache_enabled = cache_enabled.lower() == "true"

        return config

    def validate(self) -> bool:
        """Validate the configuration"""
        if self.max_rounds <= 0:
            raise ValueError("max_rounds must be positive")

        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")

        if self.code_timeout <= 0:
            raise ValueError("code_timeout must be positive")

        return True


# Global configuration instance
app_config = AppConfig.from_env()
```
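The `from_env` method above uses the walrus operator to override a default only when the environment variable is set, casting the string value. That pattern can be checked in isolation (a sketch using the same `APP_MAX_ROUNDS` variable name, with a hypothetical standalone helper rather than the `AppConfig` class):

```python
import os

def max_rounds_from_env(default: int = 20) -> int:
    """Mirror AppConfig.from_env(): override the default only
    when APP_MAX_ROUNDS is set, casting the string to int."""
    if value := os.getenv("APP_MAX_ROUNDS"):
        return int(value)
    return default

os.environ["APP_MAX_ROUNDS"] = "5"
print(max_rounds_from_env())  # → 5
os.environ.pop("APP_MAX_ROUNDS")
print(max_rounds_from_env())  # → 20
```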
config/llm_config copy.py (new file, 55 lines)

```python
# -*- coding: utf-8 -*-
"""
Configuration management module
"""

import os
from typing import Dict, Any
from dataclasses import dataclass, asdict


from dotenv import load_dotenv

load_dotenv()


@dataclass
class LLMConfig:
    """LLM configuration"""

    provider: str = os.environ.get("LLM_PROVIDER", "openai")  # openai, gemini, etc.
    api_key: str = os.environ.get("OPENAI_API_KEY", "sk-2187174de21548b0b8b0c92129700199")
    base_url: str = os.environ.get("OPENAI_BASE_URL", "http://127.0.0.1:9999/v1")
    model: str = os.environ.get("OPENAI_MODEL", "gemini--flash")
    temperature: float = 0.5
    max_tokens: int = 131072

    def __post_init__(self):
        """Post-initialization handling"""
        if self.provider == "gemini":
            # When using Gemini, try to load the Gemini settings from environment
            # variables, or fall back to the default Gemini configuration.
            # Note: if OPENAI_API_KEY is set but GEMINI_API_KEY is not, the OpenAI
            # key may carry over; but a user switching providers usually supplies
            # the matching key.
            self.api_key = os.environ.get("GEMINI_API_KEY", "AIzaSyA9aVFjRJYJq82WEQUVlifE4fE7BnX6QiY")
            # Gemini's OpenAI-compatible endpoint
            self.base_url = os.environ.get("GEMINI_BASE_URL", "https://gemini.jeason.online")
            self.model = os.environ.get("GEMINI_MODEL", "gemini-2.5-flash")

    def to_dict(self) -> Dict[str, Any]:
        """Convert to a dict"""
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "LLMConfig":
        """Create a configuration from a dict"""
        return cls(**data)

    def validate(self) -> bool:
        """Validate the configuration"""
        if not self.api_key:
            raise ValueError("OPENAI_API_KEY is required")
        if not self.base_url:
            raise ValueError("OPENAI_BASE_URL is required")
        if not self.model:
            raise ValueError("OPENAI_MODEL is required")
        return True
```
config/llm_config.py (modified)

```diff
@@ -17,12 +17,25 @@ load_dotenv()
 class LLMConfig:
     """LLM configuration"""
 
-    provider: str = "openai"  # openai, anthropic, etc.
-    api_key: str = os.environ.get("OPENAI_API_KEY", "sk-c44i1hy64xgzwox6x08o4zug93frq6rgn84oqugf2pje1tg4")
-    base_url: str = os.environ.get("OPENAI_BASE_URL", "https://api.xiaomimimo.com/v1")
-    model: str = os.environ.get("OPENAI_MODEL", "mimo-v2-flash")
+    provider: str = os.environ.get("LLM_PROVIDER", "openai")  # openai, gemini, etc.
+    api_key: str = os.environ.get("OPENAI_API_KEY", "sk-2187174de21548b0b8b0c92129700199")
+    base_url: str = os.environ.get("OPENAI_BASE_URL", "http://127.0.0.1:9999/v1")
+    model: str = os.environ.get("OPENAI_MODEL", "gemini-3-flash")
     temperature: float = 0.5
-    max_tokens: int = 131072
+    max_tokens: int = 8192  # lower default; some APIs reject very large values
+
+    def __post_init__(self):
+        """Post-initialization handling"""
+        if self.provider == "gemini":
+            # When using Gemini, try to load the Gemini settings from environment
+            # variables, or fall back to the default Gemini configuration.
+            # Note: if OPENAI_API_KEY is set but GEMINI_API_KEY is not, the OpenAI
+            # key may carry over; a user switching providers usually supplies the matching key.
+            self.api_key = os.environ.get("GEMINI_API_KEY", "AIzaSyA9aVFjRJYJq82WEQUVlifE4fE7BnX6QiY")
+            # Gemini's OpenAI-compatible endpoint
+            self.base_url = os.environ.get("GEMINI_BASE_URL", "https://gemini.jeason.online")
+            self.model = os.environ.get("GEMINI_MODEL", "gemini-2.5-flash")
+            # Gemini has stricter token limits
+            self.max_tokens = 8192
 
     def to_dict(self) -> Dict[str, Any]:
         """Convert to a dict"""
```
data_analysis_agent.py (modified)

```diff
@@ -18,8 +18,9 @@ from utils.extract_code import extract_code_from_response
 from utils.data_loader import load_and_profile_data
 from utils.llm_helper import LLMHelper
 from utils.code_executor import CodeExecutor
+from utils.script_generator import generate_reusable_script
 from config.llm_config import LLMConfig
-from prompts import data_analysis_system_prompt, final_report_system_prompt
+from prompts import data_analysis_system_prompt, final_report_system_prompt, data_analysis_followup_prompt
 
 
 class DataAnalysisAgent:
```
```diff
@@ -61,6 +62,8 @@ class DataAnalysisAgent:
         self.session_output_dir = None
         self.executor = None
         self.data_profile = ""  # stores the data profile
+        self.data_files = []  # stores the list of data files
+        self.user_requirement = ""  # stores the user requirement
 
     def _process_response(self, response: str) -> Dict[str, Any]:
         """
```
```diff
@@ -76,7 +79,7 @@ class DataAnalysisAgent:
             yaml_data = self.llm.parse_yaml_response(response)
             action = yaml_data.get("action", "generate_code")
 
-            print(f"🎯 Detected action: {action}")
+            print(f"[TARGET] Detected action: {action}")
 
             if action == "analysis_complete":
                 return self._handle_analysis_complete(response, yaml_data)
```
```diff
@@ -85,18 +88,22 @@ class DataAnalysisAgent:
             elif action == "generate_code":
                 return self._handle_generate_code(response, yaml_data)
             else:
-                print(f"⚠️ Unknown action type: {action}; treating as generate_code")
+                print(f"[WARN] Unknown action type: {action}; treating as generate_code")
                 return self._handle_generate_code(response, yaml_data)
 
         except Exception as e:
-            print(f"⚠️ Failed to parse response: {str(e)}; treating as generate_code")
+            print(f"[WARN] Failed to parse response: {str(e)}; trying to extract code and treating as generate_code")
+            # Even if YAML parsing fails, still try to extract code
+            extracted_code = extract_code_from_response(response)
+            if extracted_code:
+                return self._handle_generate_code(response, {"code": extracted_code})
             return self._handle_generate_code(response, {})
 
     def _handle_analysis_complete(
         self, response: str, yaml_data: Dict[str, Any]
     ) -> Dict[str, Any]:
         """Handle the analysis-complete action"""
-        print("✅ Analysis task complete")
+        print("[OK] Analysis task complete")
         final_report = yaml_data.get("final_report", "Analysis complete; no final report")
         return {
             "action": "analysis_complete",
```
```diff
@@ -109,7 +116,7 @@ class DataAnalysisAgent:
         self, response: str, yaml_data: Dict[str, Any]
     ) -> Dict[str, Any]:
         """Handle the figure-collection action"""
-        print("📊 Starting figure collection")
+        print("[CHART] Starting figure collection")
         figures_to_collect = yaml_data.get("figures_to_collect", [])
 
         collected_figures = []
```
```diff
@@ -126,41 +133,40 @@ class DataAnalysisAgent:
             description = figure_info.get("description", "")
             analysis = figure_info.get("analysis", "")
 
-            print(f"📈 Collecting figure {figure_number}: (unknown)")
-            print(f"   📂 Path: {file_path}")
-            print(f"   📝 Description: {description}")
-            print(f"   🔍 Analysis: {analysis}")
-
-            # Record the figure info
-            collected_figures.append(
-                {
-                    "figure_number": figure_number,
-                    "filename": filename,
-                    "file_path": file_path,
-                    "description": description,
-                    "analysis": analysis,
-                }
-            )
+            print(f"[GRAPH] Collecting figure {figure_number}: (unknown)")
+            print(f"   [DIR] Path: {file_path}")
+            print(f"   [NOTE] Description: {description}")
+            print(f"   [SEARCH] Analysis: {analysis}")
 
-            # Verify the file exists
+            # Use a seen_paths set to deduplicate and avoid collecting a figure twice
+            seen_paths = set()
+
+            # Only add files that actually exist, to avoid broken images in the report
             if file_path and os.path.exists(file_path):
-                print(f"   ✅ File exists: {file_path}")
-                # Record the figure info
-                collected_figures.append(
-                    {
-                        "figure_number": figure_number,
-                        "filename": filename,
-                        "file_path": file_path,
-                        "description": description,
-                        "analysis": analysis,
-                    }
-                )
+                # Check whether this path was already collected
+                abs_path = os.path.abspath(file_path)
+                if abs_path not in seen_paths:
+                    print(f"   [OK] File exists: {file_path}")
+                    # Record the figure info
+                    collected_figures.append(
+                        {
+                            "figure_number": figure_number,
+                            "filename": filename,
+                            "file_path": file_path,
+                            "description": description,
+                            "analysis": analysis,
+                        }
+                    )
+                    seen_paths.add(abs_path)
+                else:
+                    print(f"   [WARN] Skipping duplicate figure: {file_path}")
             else:
                 if file_path:
-                    print(f"   ⚠️ File does not exist: {file_path}")
+                    print(f"   [WARN] File does not exist: {file_path}")
                 else:
-                    print(f"   ⚠️ No file path provided")
+                    print(f"   [WARN] No file path provided")
 
         return {
             "action": "collect_figures",
```
```diff
@@ -192,7 +198,7 @@ class DataAnalysisAgent:
         code = code.strip()
 
         if code:
-            print(f"🔧 Executing code:\n{code}")
+            print(f"[TOOL] Executing code:\n{code}")
             print("-" * 40)
 
             # Execute the code
@@ -200,7 +206,7 @@ class DataAnalysisAgent:
 
             # Format the execution result
             feedback = format_execution_result(result)
-            print(f"📋 Execution feedback:\n{feedback}")
+            print(f"[LIST] Execution feedback:\n{feedback}")
 
             return {
                 "action": "generate_code",
@@ -212,7 +218,7 @@ class DataAnalysisAgent:
             }
         else:
             # No code means the LLM response was malformed; ask it to regenerate
-            print("⚠️ No executable code extracted from the response; asking the LLM to regenerate")
+            print("[WARN] No executable code extracted from the response; asking the LLM to regenerate")
             return {
                 "action": "invalid_response",
                 "error": "Response contains no executable code",
@@ -220,7 +226,7 @@ class DataAnalysisAgent:
                 "continue": True,
             }
 
-    def analyze(self, user_input: str, files: List[str] = None, session_output_dir: str = None) -> Dict[str, Any]:
+    def analyze(self, user_input: str, files: List[str] = None, session_output_dir: str = None, reset_session: bool = True, max_rounds: int = None) -> Dict[str, Any]:
         """
         Start the analysis flow
 
```
```diff
@@ -228,89 +234,147 @@ class DataAnalysisAgent:
             user_input: the user's natural-language requirement
             files: list of data file paths
             session_output_dir: session output directory to use (optional)
+            reset_session: whether to reset the session (True: start a new analysis; False: continue in the existing context)
+            max_rounds: maximum number of rounds for this run (optional; defaults to the configured value)
 
         Returns:
             the analysis result dict
         """
-        # Reset state
-        self.conversation_history = []
-        self.analysis_results = []
-        self.current_round = 0
+        # Determine the round limit for this run
+        current_max_rounds = max_rounds if max_rounds is not None else self.max_rounds
 
-        # Create a dedicated output directory for this analysis
-        if session_output_dir:
-            self.session_output_dir = session_output_dir
-        else:
-            self.session_output_dir = create_session_output_dir(
-                self.base_output_dir, user_input
-            )
+        if reset_session:
+            # --- Initialize a new session ---
+            self.conversation_history = []
+            self.analysis_results = []
+            self.current_round = 0
+            self.data_files = files or []  # save the list of data files
+            self.user_requirement = user_input  # save the user requirement
+
+            # Create a dedicated output directory for this analysis
+            if session_output_dir:
+                self.session_output_dir = session_output_dir
+            else:
+                self.session_output_dir = create_session_output_dir(
+                    self.base_output_dir, user_input
+                )
+
+            # Initialize the code executor with the session directory
+            self.executor = CodeExecutor(self.session_output_dir)
+
+            # Expose the session directory variable to the execution environment
+            self.executor.set_variable("session_output_dir", self.session_output_dir)
+
+            # Use the tooling to generate a data profile
+            data_profile = ""
+            if files:
+                print("[SEARCH] Generating data profile...")
+                try:
+                    data_profile = load_and_profile_data(files)
+                    print("[OK] Data profile generated")
+                except Exception as e:
+                    print(f"[WARN] Data profile generation failed: {e}")
+
+            # Save to instance variables for the final report
+            self.data_profile = data_profile
+
+            # Build the initial prompt
+            initial_prompt = f"""User requirement: {user_input}"""
+            if files:
+                initial_prompt += f"\nData files: {', '.join(files)}"
+
+            if data_profile:
+                initial_prompt += f"\n\n{data_profile}\n\nPlease base your analysis strategy on the statistics in the data profile above (high-frequency values, missing-value rates, data ranges). If you notice clearly dominant issues or anomalous distributions, prioritize analyzing them in depth."
+
+            print(f"[START] Starting data analysis task")
+            print(f"[NOTE] User requirement: {user_input}")
+            if files:
+                print(f"[FOLDER] Data files: {', '.join(files)}")
+            print(f"[DIR] Output directory: {self.session_output_dir}")
+
+            # Add to the conversation history
+            self.conversation_history.append({"role": "user", "content": initial_prompt})
+
+        else:
+            # --- Continue the existing session ---
+            # For follow-up questions with no explicit round limit, reduce the default to avoid over-analysis
+            if max_rounds is None:
+                current_max_rounds = 10  # follow-ups rarely need a long chain of thought; 10 rounds is enough
+
+            print(f"\n[START] Continuing analysis task (follow-up mode)")
+            print(f"[NOTE] Follow-up requirement: {user_input}")
+
+            # Reset the round counter so the new task gets enough rounds
+            self.current_round = 0
+
+            # Add to the conversation history
+            # Tell the agent this is a follow-up so it can skip the full SOP
+            follow_up_prompt = f"Follow-up requirement: {user_input}\n(Note: this is a follow-up question; analyze it directly without re-running the full SOP from scratch.)"
+            self.conversation_history.append({"role": "user", "content": follow_up_prompt})
 
-        # Initialize the code executor with the session directory
-        self.executor = CodeExecutor(self.session_output_dir)
-
-        # Expose the session directory variable to the execution environment
-        self.executor.set_variable("session_output_dir", self.session_output_dir)
-
-        # Use the tooling to generate a data profile
-        data_profile = ""
-        if files:
-            print("🔍 Generating data profile...")
-            data_profile = load_and_profile_data(files)
-            print("✅ Data profile generated")
-
-        # Save to instance variables for the final report
-        self.data_profile = data_profile
-
-        # Build the initial prompt
-        initial_prompt = f"""User requirement: {user_input}"""
-        if files:
-            initial_prompt += f"\nData files: {', '.join(files)}"
-
-        if data_profile:
-            initial_prompt += f"\n\n{data_profile}\n\nPlease base your analysis strategy on the statistics in the data profile above (high-frequency values, missing-value rates, data ranges). If you notice clearly dominant issues or anomalous distributions, prioritize analyzing them in depth."
-
-        print(f"🚀 Starting data analysis task")
-        print(f"📝 User requirement: {user_input}")
-        if files:
-            print(f"📁 Data files: {', '.join(files)}")
-        print(f"📂 Output directory: {self.session_output_dir}")
-        print(f"🔢 Max rounds: {self.max_rounds}")
+        print(f"[NUM] Max rounds for this run: {current_max_rounds}")
         if self.force_max_rounds:
-            print(f"⚡ Forced mode: will run the full {self.max_rounds} rounds (ignoring the AI's completion signal)")
+            print(f"[FAST] Forced mode: will run the full {current_max_rounds} rounds (ignoring the AI's completion signal)")
         print("=" * 60)
-        # Add to the conversation history
-        self.conversation_history.append({"role": "user", "content": initial_prompt})
+
+        # Save the original max_rounds (not strictly needed after analyze() returns, but kept for rigor)
+        original_max_rounds = self.max_rounds
+        self.max_rounds = current_max_rounds
+
+        # Initialize the consecutive-failure counter
+        consecutive_failures = 0
 
         while self.current_round < self.max_rounds:
             self.current_round += 1
-            print(f"\n🔄 Round {self.current_round}")
+            print(f"\n[LOOP] Round {self.current_round}")
             # Call the LLM to generate a response
             try:  # get variable info from the current execution environment
                 notebook_variables = self.executor.get_environment_info()
 
+                # Select prompt based on mode
+                if self.current_round == 1 and not reset_session:
+                    # For the first round of a follow-up session, use the specialized prompt
+                    base_system_prompt = data_analysis_followup_prompt
+                elif not reset_session and self.current_round > 1:
+                    # For subsequent rounds in follow-up, continue using the follow-up context
+                    # or maybe just the standard one is fine as long as SOP isn't fully enforced?
+                    # Let's stick to the follow-up prompt to prevent SOP regression
+                    base_system_prompt = data_analysis_followup_prompt
+                else:
+                    base_system_prompt = data_analysis_system_prompt
+
                 # Format the system prompt, filling in the dynamic notebook variable info
-                formatted_system_prompt = data_analysis_system_prompt.format(
+                formatted_system_prompt = base_system_prompt.format(
                     notebook_variables=notebook_variables
                 )
-                print(f"🐛 [DEBUG] System Prompt Head:\n{formatted_system_prompt[:500]}...\n[...]")
-                print(f"🐛 [DEBUG] System Prompt Rules Check: 'stop_words' in prompt? {'stop_words' in formatted_system_prompt}")
+                print(f"[DEBUG] System Prompt Head:\n{formatted_system_prompt[:500]}...\n[...]")
+                print(f"[DEBUG] System Prompt Rules Check: 'stop_words' in prompt? {'stop_words' in formatted_system_prompt}")
 
                 response = self.llm.call(
                     prompt=self._build_conversation_prompt(),
                     system_prompt=formatted_system_prompt,
                 )
 
-                print(f"🤖 Assistant response:\n{response}")
+                print(f"[AI] Assistant response:\n{response}")
 
                 # Process the response with the unified handler
                 process_result = self._process_response(response)
 
                 # Decide whether to continue based on the result (only outside forced mode)
+                if process_result.get("action") == "invalid_response":
+                    consecutive_failures += 1
+                    print(f"[WARN] Consecutive failures: {consecutive_failures}/3")
+                    if consecutive_failures >= 3:
+                        print(f"[ERROR] No valid response after 3 consecutive attempts; aborting. Check your network or configuration.")
+                        break
+                else:
+                    consecutive_failures = 0  # reset the counter
+
                 if not self.force_max_rounds and not process_result.get(
                     "continue", True
                 ):
-                    print(f"\n✅ Analysis complete!")
+                    print(f"\n[OK] Analysis complete!")
                     break
 
                 # Add to the conversation history
```
```diff
@@ -342,7 +406,7 @@ class DataAnalysisAgent:
 
                     feedback = f"Collected {len(collected_figures)} valid figures and their analyses."
                     if missing_figures:
-                        feedback += f"\n⚠️ The following figures were not found; check that the code actually saved them: {missing_figures}"
+                        feedback += f"\n[WARN] The following figures were not found; check that the code actually saved them: {missing_figures}"
 
                 self.conversation_history.append(
                     {
```
```diff
@@ -365,7 +429,7 @@ class DataAnalysisAgent:
 
             except Exception as e:
                 error_msg = f"LLM call error: {str(e)}"
-                print(f"❌ {error_msg}")
+                print(f"[ERROR] {error_msg}")
                 self.conversation_history.append(
                     {
                         "role": "user",
@@ -374,7 +438,7 @@ class DataAnalysisAgent:
                 )
         # Generate the final summary
         if self.current_round >= self.max_rounds:
-            print(f"\n⚠️ Reached the maximum number of rounds ({self.max_rounds}); ending the analysis")
+            print(f"\n[WARN] Reached the maximum number of rounds ({self.max_rounds}); ending the analysis")
 
         return self._generate_final_report()
 
```
```diff
@@ -400,10 +464,39 @@ class DataAnalysisAgent:
             if result.get("action") == "collect_figures":
                 all_figures.extend(result.get("collected_figures", []))
 
-        print(f"\n📊 Generating the final analysis report...")
-        print(f"📂 Output directory: {self.session_output_dir}")
-        print(f"🔢 Total rounds: {self.current_round}")
-        print(f"📈 Collected figures: {len(all_figures)}")
+        print(f"\n[CHART] Generating the final analysis report...")
+        print(f"[DIR] Output directory: {self.session_output_dir}")
+
+        # --- Automatic figure completion/discovery ---
+        # Scan the directory for all png files
+        try:
+            import glob
+            existing_pngs = glob.glob(os.path.join(self.session_output_dir, "*.png"))
+
+            # Build the set of already-collected figure paths
+            collected_paths = set()
+            for fig in all_figures:
+                if fig.get("file_path"):
+                    collected_paths.add(os.path.abspath(fig.get("file_path")))
+
+            # Check for figures that slipped through
+            for png_path in existing_pngs:
+                abs_png_path = os.path.abspath(png_path)
+                if abs_png_path not in collected_paths:
+                    print(f"[SEARCH] [auto-discovery] Adding figure that was not explicitly collected: {os.path.basename(png_path)}")
+                    all_figures.append({
+                        "figure_number": "Auto",
+                        "filename": os.path.basename(png_path),
+                        "file_path": abs_png_path,
+                        "description": f"Automatically discovered analysis chart: {os.path.basename(png_path)}",
+                        "analysis": "(This chart was captured automatically; the agent provided no analysis text. Interpret it from the chart title.)"
+                    })
+        except Exception as e:
+            print(f"[WARN] Automatic figure discovery failed: {e}")
+        # ---------------------------
+
+        print(f"[NUM] Total rounds: {self.current_round}")
+        print(f"[GRAPH] Collected figures: {len(all_figures)}")
 
         # Build the prompt for generating the final report
         final_report_prompt = self._build_final_report_prompt(all_figures)
```
```diff
@@ -415,33 +508,24 @@ class DataAnalysisAgent:
                 max_tokens=16384,  # large token limit so the full report fits
             )
 
-            # Parse the response and extract the final report
-            try:
-                # Try to parse the YAML
-                yaml_data = self.llm.parse_yaml_response(response)
-
-                # Case 1: standard YAML containing action: analysis_complete
-                if yaml_data.get("action") == "analysis_complete":
-                    final_report_content = yaml_data.get("final_report", response)
-
-                # Case 2: parsing succeeded but the field is missing, or parsing failed
-                else:
-                    # If the content looks like a Markdown report (contains headings), use it directly
-                    if "# " in response or "## " in response:
-                        print("⚠️ No standard YAML action detected, but the content looks like a Markdown report; using it as-is")
-                        final_report_content = response
-                    else:
-                        final_report_content = "The LLM returned no valid report content"
+            # Use the LLM response directly as the final report (the prompt asks for raw Markdown output)
+            final_report_content = response
 
-            except Exception as e:
-                # Parsing failed entirely; fall back to the raw response
-                print(f"⚠️ YAML parsing failed ({e}); using the raw response as the report")
-                final_report_content = response
+            # Backward compatibility: if YAML was returned anyway, try to parse it
+            if response.strip().startswith("action:") or "final_report:" in response:
+                try:
+                    yaml_data = self.llm.parse_yaml_response(response)
+                    if yaml_data.get("action") == "analysis_complete":
+                        final_report_content = yaml_data.get("final_report", response)
+                except:
+                    pass  # keep the raw response if parsing fails
 
-            print("✅ Final report generated")
+            print("[OK] Final report generated")
 
         except Exception as e:
-            print(f"❌ Error while generating the final report: {str(e)}")
+            print(f"[ERROR] Error while generating the final report: {str(e)}")
             final_report_content = f"Report generation failed: {str(e)}"
 
         # Save the final report to a file
```
@@ -449,9 +533,21 @@ class DataAnalysisAgent:
|
||||
try:
|
||||
with open(report_file_path, "w", encoding="utf-8") as f:
|
||||
f.write(final_report_content)
|
||||
print(f"📄 最终报告已保存至: {report_file_path}")
|
||||
print(f"[DOC] 最终报告已保存至: {report_file_path}")
|
||||
except Exception as e:
|
||||
print(f"❌ 保存报告文件失败: {str(e)}")
|
||||
print(f"[ERROR] 保存报告文件失败: {str(e)}")
|
||||
|
||||
# 生成可复用脚本
|
||||
script_path = ""
|
||||
try:
|
||||
script_path = generate_reusable_script(
|
||||
analysis_results=self.analysis_results,
|
||||
data_files=self.data_files,
|
||||
session_output_dir=self.session_output_dir,
|
||||
user_requirement=self.user_requirement
|
||||
)
|
||||
except Exception as e:
|
||||
print(f"[WARN] 脚本生成失败: {e}")
|
||||
|
||||
# 返回完整的分析结果
|
||||
return {
|
||||
@@ -462,6 +558,7 @@ class DataAnalysisAgent:
|
||||
"conversation_history": self.conversation_history,
|
||||
"final_report": final_report_content,
|
||||
"report_file_path": report_file_path,
|
||||
"reusable_script_path": script_path,
|
||||
}
|
||||
|
||||
def _build_final_report_prompt(self, all_figures: List[Dict[str, Any]]) -> str:
|
||||
@@ -508,7 +605,7 @@ class DataAnalysisAgent:
|
||||
# 在提示词中明确要求使用相对路径
|
||||
prompt += """
|
||||
|
||||
📁 **图片路径使用说明**:
|
||||
[FOLDER] **图片路径使用说明**:
|
||||
报告和图片都在同一目录下,请在报告中使用相对路径引用图片:
|
||||
- 格式:
|
||||
- 示例:
|
||||
|
||||
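The fallback logic above can be isolated into a small helper. A minimal sketch: `extract_final_report` and its `parse_yaml` parameter are hypothetical names standing in for the agent's `llm.parse_yaml_response`, which is not reproduced here.

```python
from typing import Any, Callable, Dict, Optional

def extract_final_report(response: str,
                         parse_yaml: Optional[Callable[[str], Dict[str, Any]]] = None) -> str:
    """Prefer the raw Markdown response as the report; only attempt the
    legacy YAML envelope when the response actually looks like YAML."""
    report = response
    looks_like_yaml = response.strip().startswith("action:") or "final_report:" in response
    if looks_like_yaml and parse_yaml is not None:
        try:
            data = parse_yaml(response)
            if data.get("action") == "analysis_complete":
                report = data.get("final_report", response)
        except Exception:
            pass  # keep the raw response on any parse failure
    return report
```

The benefit of the cheap `looks_like_yaml` pre-check is that ordinary Markdown reports never pay the parse cost or risk a spurious parse.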
 89  data_preprocessing/README.md  Normal file
@@ -0,0 +1,89 @@
# 数据预处理模块

独立的数据清洗工具,用于在正式分析前准备数据。

## 功能

- **数据合并**:将多个 Excel/CSV 文件合并为单一 CSV
- **时间排序**:按时间列对数据进行排序
- **目录管理**:标准化的原始数据和输出数据目录

## 目录结构

```
project/
├── raw_data/              # 原始数据存放目录
│   ├── remotecontrol/     # 按数据来源分类
│   └── ...
├── cleaned_data/          # 清洗后数据输出目录
│   ├── xxx_merged.csv
│   └── xxx_sorted.csv
└── data_preprocessing/    # 本模块
```

## 使用方法

### 命令行

```bash
# 初始化目录结构
python -m data_preprocessing.cli init

# 合并 Excel 文件
python -m data_preprocessing.cli merge --source raw_data/remotecontrol

# 合并并按时间排序
python -m data_preprocessing.cli merge --source raw_data/remotecontrol --sort-by SendTime

# 指定输出路径
python -m data_preprocessing.cli merge -s raw_data/remotecontrol -o cleaned_data/my_output.csv

# 排序已有 CSV
python -m data_preprocessing.cli sort --input some_file.csv --time-col SendTime

# 原地排序(覆盖原文件)
python -m data_preprocessing.cli sort --input data.csv --inplace
```

### Python API

```python
from data_preprocessing import merge_files, sort_by_time, Config

# 合并文件
output_path = merge_files(
    source_dir="raw_data/remotecontrol",
    output_file="cleaned_data/merged.csv",
    pattern="*.xlsx",
    time_column="SendTime"  # 可选:合并后排序
)

# 排序 CSV
sorted_path = sort_by_time(
    input_path="data.csv",
    output_path="sorted_data.csv",
    time_column="CreateTime"
)

# 自定义配置
config = Config()
config.raw_data_dir = "/path/to/raw"
config.cleaned_data_dir = "/path/to/cleaned"
config.ensure_dirs()
```

## 配置项

| 配置项 | 默认值 | 说明 |
|--------|--------|------|
| `raw_data_dir` | `raw_data/` | 原始数据目录 |
| `cleaned_data_dir` | `cleaned_data/` | 清洗输出目录 |
| `default_time_column` | `SendTime` | 默认时间列名 |
| `csv_encoding` | `utf-8-sig` | CSV 编码格式 |

## 注意事项

1. 本模块与 `DataAnalysisAgent` 完全独立,不会相互调用
2. 合并时会自动添加 `_source_file` 列标记数据来源(可用 `--no-source-col` 禁用)
3. Excel 文件会自动合并所有 Sheet
4. 无效时间值在排序时会被放到最后
 14  data_preprocessing/__init__.py  Normal file
@@ -0,0 +1,14 @@
# -*- coding: utf-8 -*-
"""
数据预处理模块

提供独立的数据清洗功能:
- 按时间排序
- 同类数据合并
"""

from .sorter import sort_by_time
from .merger import merge_files
from .config import Config

__all__ = ["sort_by_time", "merge_files", "Config"]
 140  data_preprocessing/cli.py  Normal file
@@ -0,0 +1,140 @@
# -*- coding: utf-8 -*-
"""
数据预处理命令行接口

使用示例:
    # 合并 Excel 文件
    python -m data_preprocessing.cli merge --source raw_data/remotecontrol --output cleaned_data/merged.csv

    # 合并并排序
    python -m data_preprocessing.cli merge --source raw_data/remotecontrol --sort-by SendTime

    # 排序已有 CSV
    python -m data_preprocessing.cli sort --input data.csv --output sorted.csv --time-col SendTime

    # 初始化目录结构
    python -m data_preprocessing.cli init
"""

import argparse
import sys
from .config import default_config
from .sorter import sort_by_time
from .merger import merge_files


def main():
    parser = argparse.ArgumentParser(
        prog="data_preprocessing",
        description="数据预处理工具:排序、合并",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
示例:
  %(prog)s merge --source raw_data/remotecontrol --sort-by SendTime
  %(prog)s sort --input data.csv --time-col CreateTime
  %(prog)s init
"""
    )

    subparsers = parser.add_subparsers(dest="command", help="可用命令")

    # ========== merge 命令 ==========
    merge_parser = subparsers.add_parser("merge", help="合并同类文件")
    merge_parser.add_argument(
        "--source", "-s",
        required=True,
        help="源数据目录路径"
    )
    merge_parser.add_argument(
        "--output", "-o",
        default=None,
        help="输出文件路径 (默认: cleaned_data/<目录名>_merged.csv)"
    )
    merge_parser.add_argument(
        "--pattern", "-p",
        default="*.xlsx",
        help="文件匹配模式 (默认: *.xlsx)"
    )
    merge_parser.add_argument(
        "--sort-by",
        default=None,
        dest="time_column",
        help="合并后按此时间列排序"
    )
    merge_parser.add_argument(
        "--no-source-col",
        action="store_true",
        help="不添加来源文件列"
    )

    # ========== sort 命令 ==========
    sort_parser = subparsers.add_parser("sort", help="按时间排序 CSV")
    sort_parser.add_argument(
        "--input", "-i",
        required=True,
        help="输入 CSV 文件路径"
    )
    sort_parser.add_argument(
        "--output", "-o",
        default=None,
        help="输出文件路径 (默认: cleaned_data/<文件名>_sorted.csv)"
    )
    sort_parser.add_argument(
        "--time-col", "-t",
        default=None,
        dest="time_column",
        help=f"时间列名 (默认: {default_config.default_time_column})"
    )
    sort_parser.add_argument(
        "--inplace",
        action="store_true",
        help="原地覆盖输入文件"
    )

    # ========== init 命令 ==========
    init_parser = subparsers.add_parser("init", help="初始化目录结构")

    # 解析参数
    args = parser.parse_args()

    if args.command is None:
        parser.print_help()
        sys.exit(0)

    try:
        if args.command == "merge":
            result = merge_files(
                source_dir=args.source,
                output_file=args.output,
                pattern=args.pattern,
                time_column=args.time_column,
                add_source_column=not args.no_source_col
            )
            print(f"\n✅ 合并成功: {result}")

        elif args.command == "sort":
            result = sort_by_time(
                input_path=args.input,
                output_path=args.output,
                time_column=args.time_column,
                inplace=args.inplace
            )
            print(f"\n✅ 排序成功: {result}")

        elif args.command == "init":
            default_config.ensure_dirs()
            print("\n✅ 目录初始化完成")

    except FileNotFoundError as e:
        print(f"\n❌ 错误: {e}")
        sys.exit(1)
    except KeyError as e:
        print(f"\n❌ 错误: {e}")
        sys.exit(1)
    except Exception as e:
        print(f"\n❌ 未知错误: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
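One way to make the sub-command wiring above unit-testable is to split parser construction out of `main()`. A reduced sketch, covering only a subset of the options; `build_parser` is a hypothetical helper, not part of `cli.py`:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Minimal mirror of the cli.py sub-command layout."""
    parser = argparse.ArgumentParser(prog="data_preprocessing")
    sub = parser.add_subparsers(dest="command")

    merge = sub.add_parser("merge")
    merge.add_argument("--source", "-s", required=True)
    merge.add_argument("--sort-by", dest="time_column", default=None)

    sort = sub.add_parser("sort")
    sort.add_argument("--input", "-i", required=True)
    sort.add_argument("--inplace", action="store_true")

    sub.add_parser("init")
    return parser

# parse_args can be driven with an explicit argv list, so no subprocess is needed
args = build_parser().parse_args(
    ["merge", "-s", "raw_data/remotecontrol", "--sort-by", "SendTime"]
)
```

Passing an explicit argv list to `parse_args` keeps the dispatch logic testable without touching `sys.argv`.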
 42  data_preprocessing/config.py  Normal file
@@ -0,0 +1,42 @@
# -*- coding: utf-8 -*-
"""
数据预处理模块配置
"""

import os
from dataclasses import dataclass

# 获取项目根目录
PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))


@dataclass
class Config:
    """预处理模块配置"""

    # 原始数据存放目录
    raw_data_dir: str = os.path.join(PROJECT_ROOT, "raw_data")

    # 清洗后数据输出目录
    cleaned_data_dir: str = os.path.join(PROJECT_ROOT, "cleaned_data")

    # 默认时间列名
    default_time_column: str = "SendTime"

    # 支持的文件扩展名
    supported_extensions: tuple = (".csv", ".xlsx", ".xls")

    # CSV 编码
    csv_encoding: str = "utf-8-sig"

    def ensure_dirs(self):
        """确保目录存在"""
        os.makedirs(self.raw_data_dir, exist_ok=True)
        os.makedirs(self.cleaned_data_dir, exist_ok=True)
        print(f"[OK] 目录已就绪:")
        print(f"   原始数据: {self.raw_data_dir}")
        print(f"   清洗输出: {self.cleaned_data_dir}")


# 默认配置实例
default_config = Config()
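Because `ensure_dirs()` uses `os.makedirs(..., exist_ok=True)`, it is safe to call repeatedly (the CLI `init` command and `merge_files` can both call it). A self-contained sketch with a hypothetical `DemoConfig` rooted in a temp directory rather than `PROJECT_ROOT`:

```python
import os
import tempfile
from dataclasses import dataclass, field

@dataclass
class DemoConfig:
    """Hypothetical stand-in for Config, rooted in a throwaway directory."""
    base: str = field(default_factory=tempfile.mkdtemp)

    @property
    def raw_data_dir(self) -> str:
        return os.path.join(self.base, "raw_data")

    @property
    def cleaned_data_dir(self) -> str:
        return os.path.join(self.base, "cleaned_data")

    def ensure_dirs(self) -> None:
        # exist_ok=True turns repeated calls into no-ops instead of errors
        os.makedirs(self.raw_data_dir, exist_ok=True)
        os.makedirs(self.cleaned_data_dir, exist_ok=True)

cfg = DemoConfig()
cfg.ensure_dirs()
cfg.ensure_dirs()  # idempotent: second call does not raise
```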
 83  data_preprocessing/merge_excel.py  Normal file
@@ -0,0 +1,83 @@

import pandas as pd
import glob
import os

def merge_excel_files(source_dir="remotecontrol", output_file="merged_all_files.csv"):
    """
    将指定目录下的所有 Excel 文件 (.xlsx, .xls) 合并为一个 CSV 文件。
    """
    print(f"[SEARCH] 正在扫描目录: {source_dir} ...")

    # 支持 xlsx 和 xls
    files_xlsx = glob.glob(os.path.join(source_dir, "*.xlsx"))
    files_xls = glob.glob(os.path.join(source_dir, "*.xls"))
    files = files_xlsx + files_xls

    if not files:
        print("[WARN] 未找到 Excel 文件。")
        return

    # 按文件名中的数字进行排序 (例如: 1.xlsx, 2.xlsx, ..., 10.xlsx)
    try:
        files.sort(key=lambda x: int(os.path.basename(x).split('.')[0]))
        print("[NUM] 已按文件名数字顺序排序")
    except ValueError:
        # 如果文件名不是纯数字,退回到字母排序
        files.sort()
        print("[TEXT] 文件名包含非数字字符,使用字母顺序排序")

    print(f"[DIR] 找到 {len(files)} 个文件: {files}")

    all_dfs = []
    for file in files:
        try:
            print(f"[READ] 读取: {file}")
            # 使用 ExcelFile 读取所有 sheet
            xls = pd.ExcelFile(file)
            print(f"   [PAGES] 包含 Sheets: {xls.sheet_names}")

            file_dfs = []
            for sheet_name in xls.sheet_names:
                df = pd.read_excel(xls, sheet_name=sheet_name)
                if not df.empty:
                    print(f"   [OK] Sheet '{sheet_name}' 读取成功: {len(df)} 行")
                    file_dfs.append(df)
                else:
                    print(f"   [WARN] Sheet '{sheet_name}' 为空,跳过")

            if file_dfs:
                # 合并该文件的所有非空 sheet
                file_merged_df = pd.concat(file_dfs, ignore_index=True)
                # 可选:添加一列标记来源文件
                file_merged_df['Source_File'] = os.path.basename(file)
                all_dfs.append(file_merged_df)
            else:
                print(f"[WARN] 文件 {file} 所有 Sheet 均为空")

        except Exception as e:
            print(f"[ERROR] 读取 {file} 失败: {e}")

    if all_dfs:
        print("[LOOP] 正在合并数据...")
        merged_df = pd.concat(all_dfs, ignore_index=True)

        # 按 SendTime 排序
        if 'SendTime' in merged_df.columns:
            print("[TIMER] 正在按 SendTime 排序...")
            merged_df['SendTime'] = pd.to_datetime(merged_df['SendTime'], errors='coerce')
            merged_df = merged_df.sort_values(by='SendTime')
        else:
            print("[WARN] 未找到 SendTime 列,跳过排序")

        print(f"[CACHE] 保存到: {output_file}")
        merged_df.to_csv(output_file, index=False, encoding="utf-8-sig")

        print(f"[OK] 合并及排序完成!总行数: {len(merged_df)}")
        print(f"   输出文件: {os.path.abspath(output_file)}")
    else:
        print("[WARN] 没有成功读取到任何数据。")

if __name__ == "__main__":
    # 如果需要在当前目录运行并合并 remotecontrol 文件夹下的内容
    merge_excel_files(source_dir="remotecontrol", output_file="remotecontrol_merged.csv")
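The numeric-then-lexicographic fallback above matters because a plain `sort()` would put `10.xlsx` before `2.xlsx`. A standalone sketch of the same strategy (`numeric_sort` is an illustrative name, not a function in the module):

```python
import os

def numeric_sort(files):
    """Sort paths by the integer stem of the file name (1.xlsx, 2.xlsx, ...,
    10.xlsx); fall back to lexicographic order if any stem is non-numeric."""
    try:
        return sorted(files, key=lambda p: int(os.path.basename(p).split('.')[0]))
    except ValueError:
        # a non-numeric stem aborts the keyed sort; sorted() raised before
        # returning, so nothing partial leaks out
        return sorted(files)

print(numeric_sort(["10.xlsx", "2.xlsx", "1.xlsx"]))  # → ['1.xlsx', '2.xlsx', '10.xlsx']
```

Note that `sorted()` raises before producing any output, so the `except ValueError` path sees the original, untouched list.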
 148  data_preprocessing/merger.py  Normal file
@@ -0,0 +1,148 @@
# -*- coding: utf-8 -*-
"""
数据合并模块

合并同类 Excel/CSV 文件
"""

import os
import glob
import pandas as pd
from typing import Optional, List
from .config import default_config


def merge_files(
    source_dir: str,
    output_file: Optional[str] = None,
    pattern: str = "*.xlsx",
    time_column: Optional[str] = None,
    add_source_column: bool = True
) -> str:
    """
    合并目录下的所有同类文件

    Args:
        source_dir: 源数据目录
        output_file: 输出 CSV 文件路径。如果为 None,则输出到 cleaned_data 目录
        pattern: 文件匹配模式 (e.g., "*.xlsx", "*.csv", "*.xls")
        time_column: 可选,合并后按此列排序
        add_source_column: 是否添加来源文件列

    Returns:
        输出文件的绝对路径

    Raises:
        FileNotFoundError: 目录不存在或未找到匹配文件
    """
    if not os.path.isdir(source_dir):
        raise FileNotFoundError(f"目录不存在: {source_dir}")

    print(f"[SCAN] 正在扫描目录: {source_dir}")
    print(f"   匹配模式: {pattern}")

    # 查找匹配文件
    files = glob.glob(os.path.join(source_dir, pattern))

    # 如果是 xlsx,也尝试匹配 xls
    if pattern == "*.xlsx":
        files.extend(glob.glob(os.path.join(source_dir, "*.xls")))

    if not files:
        raise FileNotFoundError(f"未找到匹配 '{pattern}' 的文件")

    # 排序文件列表
    files = _sort_files(files)
    print(f"[FOUND] 找到 {len(files)} 个文件")

    # 确定输出路径
    if output_file is None:
        default_config.ensure_dirs()
        dir_name = os.path.basename(os.path.normpath(source_dir))
        output_file = os.path.join(
            default_config.cleaned_data_dir,
            f"{dir_name}_merged.csv"
        )

    # 合并数据
    all_dfs = []
    for file in files:
        try:
            df = _read_file(file)
            if df is not None and not df.empty:
                if add_source_column:
                    df['_source_file'] = os.path.basename(file)
                all_dfs.append(df)
        except Exception as e:
            print(f"[ERROR] 读取失败 {file}: {e}")

    if not all_dfs:
        raise ValueError("没有成功读取到任何数据")

    print(f"[MERGE] 正在合并 {len(all_dfs)} 个数据源...")
    merged_df = pd.concat(all_dfs, ignore_index=True)
    print(f"   合并后总行数: {len(merged_df)}")

    # 可选:按时间排序
    if time_column and time_column in merged_df.columns:
        print(f"[SORT] 正在按 '{time_column}' 排序...")
        merged_df[time_column] = pd.to_datetime(merged_df[time_column], errors='coerce')
        merged_df = merged_df.sort_values(by=time_column, na_position='last')
    elif time_column:
        print(f"[WARN] 未找到时间列 '{time_column}',跳过排序")

    # 保存结果
    print(f"[SAVE] 正在保存: {output_file}")
    merged_df.to_csv(output_file, index=False, encoding=default_config.csv_encoding)

    abs_output = os.path.abspath(output_file)
    print(f"[OK] 合并完成!")
    print(f"   输出文件: {abs_output}")
    print(f"   总行数: {len(merged_df)}")

    return abs_output


def _sort_files(files: List[str]) -> List[str]:
    """对文件列表进行智能排序"""
    try:
        # 尝试按文件名中的数字排序
        files.sort(key=lambda x: int(os.path.basename(x).split('.')[0]))
        print("[SORT] 已按文件名数字顺序排序")
    except ValueError:
        # 退回到字母排序
        files.sort()
        print("[SORT] 已按文件名字母顺序排序")
    return files


def _read_file(file_path: str) -> Optional[pd.DataFrame]:
    """读取单个文件(支持 CSV 和 Excel)"""
    ext = os.path.splitext(file_path)[1].lower()

    print(f"[READ] 读取: {os.path.basename(file_path)}")

    if ext == '.csv':
        df = pd.read_csv(file_path, low_memory=False)
        print(f"   行数: {len(df)}")
        return df

    elif ext in ('.xlsx', '.xls'):
        # 读取 Excel 所有 sheet 并合并
        xls = pd.ExcelFile(file_path)
        print(f"   Sheets: {xls.sheet_names}")

        sheet_dfs = []
        for sheet_name in xls.sheet_names:
            df = pd.read_excel(xls, sheet_name=sheet_name)
            if not df.empty:
                print(f"   - Sheet '{sheet_name}': {len(df)} 行")
                sheet_dfs.append(df)

        if sheet_dfs:
            return pd.concat(sheet_dfs, ignore_index=True)
        return None

    else:
        print(f"[WARN] 不支持的文件格式: {ext}")
        return None
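The core of `merge_files` — one frame per file, each row tagged with its origin, then a single `pd.concat` — can be sketched in isolation with throwaway CSVs in a temp directory. The column name `_source_file` matches the module; the file names and data are illustrative:

```python
import os
import tempfile
import pandas as pd

# write two tiny source CSVs
tmp = tempfile.mkdtemp()
for name, rows in [("a.csv", [1, 2]), ("b.csv", [3])]:
    pd.DataFrame({"x": rows}).to_csv(os.path.join(tmp, name), index=False)

# read each file, tag rows with their origin, then concatenate once
frames = []
for name in sorted(os.listdir(tmp)):
    df = pd.read_csv(os.path.join(tmp, name))
    df["_source_file"] = name  # provenance column, same as merger.py
    frames.append(df)

merged = pd.concat(frames, ignore_index=True)
print(len(merged), sorted(merged["_source_file"].unique()))
```

`ignore_index=True` matters here: without it the concatenated frame keeps each file's 0-based index and row labels collide.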
 45  data_preprocessing/sort_csv.py  Normal file
@@ -0,0 +1,45 @@

import pandas as pd
import os

def sort_csv_by_time(file_path="remotecontrol_merged.csv", time_col="SendTime"):
    """
    读取 CSV 文件,按时间列排序,并保存。
    """
    if not os.path.exists(file_path):
        print(f"[ERROR] 文件不存在: {file_path}")
        return

    print(f"[READ] 正在读取 {file_path} ...")
    try:
        # 读取 CSV
        df = pd.read_csv(file_path, low_memory=False)
        print(f"   [CHART] 数据行数: {len(df)}")

        if time_col not in df.columns:
            print(f"[ERROR] 未找到时间列: {time_col}")
            print(f"   可用列: {list(df.columns)}")
            return

        print(f"[LOOP] 正在解析时间列 '{time_col}' ...")
        # 转换为 datetime 对象,无法解析的设为 NaT
        df[time_col] = pd.to_datetime(df[time_col], errors='coerce')

        # 检查无效时间
        nat_count = df[time_col].isna().sum()
        if nat_count > 0:
            print(f"[WARN] 发现 {nat_count} 行无效时间数据,排序时将排在最后")

        print("[LOOP] 正在按时间排序...")
        df_sorted = df.sort_values(by=time_col)

        print(f"[CACHE] 正在保存并覆盖文件: {file_path} ...")
        df_sorted.to_csv(file_path, index=False, encoding="utf-8-sig")

        print("[OK] 排序并保存完成!")

    except Exception as e:
        print(f"[ERROR] 处理失败: {e}")

if __name__ == "__main__":
    sort_csv_by_time()
 82  data_preprocessing/sorter.py  Normal file
@@ -0,0 +1,82 @@
# -*- coding: utf-8 -*-
"""
数据排序模块

按时间列对 CSV 文件进行排序
"""

import os
import pandas as pd
from typing import Optional
from .config import default_config


def sort_by_time(
    input_path: str,
    output_path: Optional[str] = None,
    time_column: Optional[str] = None,
    inplace: bool = False
) -> str:
    """
    按时间列对 CSV 文件排序

    Args:
        input_path: 输入 CSV 文件路径
        output_path: 输出路径。如果为 None 且 inplace=False,则输出到 cleaned_data 目录
        time_column: 时间列名,默认使用配置中的 default_time_column
        inplace: 是否原地覆盖输入文件

    Returns:
        输出文件的绝对路径

    Raises:
        FileNotFoundError: 输入文件不存在
        KeyError: 时间列不存在
    """
    # 参数处理
    time_column = time_column or default_config.default_time_column

    if not os.path.exists(input_path):
        raise FileNotFoundError(f"文件不存在: {input_path}")

    # 确定输出路径
    if inplace:
        output_path = input_path
    elif output_path is None:
        default_config.ensure_dirs()
        basename = os.path.basename(input_path)
        name, ext = os.path.splitext(basename)
        output_path = os.path.join(
            default_config.cleaned_data_dir,
            f"{name}_sorted{ext}"
        )

    print(f"[READ] 正在读取: {input_path}")
    df = pd.read_csv(input_path, low_memory=False)
    print(f"   数据行数: {len(df)}")

    # 检查时间列是否存在
    if time_column not in df.columns:
        available_cols = list(df.columns)
        raise KeyError(
            f"未找到时间列 '{time_column}'。可用列: {available_cols}"
        )

    print(f"[PARSE] 正在解析时间列 '{time_column}'...")
    df[time_column] = pd.to_datetime(df[time_column], errors='coerce')

    # 统计无效时间
    nat_count = df[time_column].isna().sum()
    if nat_count > 0:
        print(f"[WARN] 发现 {nat_count} 行无效时间数据,排序时将排在最后")

    print("[SORT] 正在按时间排序...")
    df_sorted = df.sort_values(by=time_column, na_position='last')

    print(f"[SAVE] 正在保存: {output_path}")
    df_sorted.to_csv(output_path, index=False, encoding=default_config.csv_encoding)

    abs_output = os.path.abspath(output_path)
    print(f"[OK] 排序完成!输出文件: {abs_output}")

    return abs_output
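The NaT handling that `sort_by_time` relies on can be seen in isolation: `errors='coerce'` turns unparseable values into `NaT`, and `na_position='last'` pushes those rows to the end. The column name `SendTime` follows the module default; the rows are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"SendTime": ["2025-01-02", "not a date", "2025-01-01"]})

# unparseable strings become NaT instead of raising
df["SendTime"] = pd.to_datetime(df["SendTime"], errors="coerce")

# NaT rows sort to the bottom regardless of ascending/descending
df_sorted = df.sort_values(by="SendTime", na_position="last").reset_index(drop=True)
print(df_sorted["SendTime"].tolist())
```

Without `errors='coerce'`, a single malformed timestamp in a merged export would abort the whole sort.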
 61  main.py
@@ -17,7 +17,7 @@ class DualLogger:
    def write(self, message):
        self.terminal.write(message)
        # 过滤掉生成的代码块,不写入日志文件
-        if "🔧 执行代码:" in message:
+        if "[TOOL] 执行代码:" in message:
            return
        self.log.write(message)
        self.log.flush()
@@ -34,16 +34,34 @@ def setup_logging(log_dir):
    # 可选:也将错误输出重定向
    # sys.stderr = logger
    print(f"\n{'='*20} Run Started at {datetime.now().strftime('%Y-%m-%d %H:%M:%S')} {'='*20}\n")
-    print(f"📄 日志文件已保存至: {os.path.join(log_dir, 'log.txt')}")
+    print(f"[DOC] 日志文件已保存至: {os.path.join(log_dir, 'log.txt')}")


def main():
    llm_config = LLMConfig()
-    files = ["./UB IOV Support_TR.csv"]
+    import glob
+    import os
+    # 自动查找 cleaned_data 目录下的所有数据文件
+    data_extensions = ['*.csv', '*.xlsx', '*.xls']
+    search_dirs = ['cleaned_data']
+    files = []
+
+    for search_dir in search_dirs:
+        for ext in data_extensions:
+            pattern = os.path.join(search_dir, ext)
+            files.extend(glob.glob(pattern))
+
+    if not files:
+        print("[WARN] 未在当前目录找到数据文件 (.csv, .xlsx),尝试使用默认文件")
+        files = ["./cleaned_data.csv"]
+    else:
+        print(f"[DIR] 自动识别到以下数据文件: {files}")

    analysis_requirement = """
    基于所有运维工单,整理一份工单健康度报告,包括但不限于对所有车联网技术支持工单的全面数据分析,
-    深入挖掘工单处理过程中的关键问题、效率瓶颈及改进机会。涵盖工单状态、问题类型、模块分布、严重程度、责任人负载、车型分布、来源渠道及处理时长等多个维度。
-    通过多轮交叉分析与趋势洞察,为提升车联网服务质量、优化资源配置及降低运营风险提供数据驱动的决策依据,问题总揽,高频问题、重点问题分析,输出若干个重要的统计指标,并绘制相关图表;结合图表,总结一份,车联网运维工单健康度报告,汇报给我。
+    深入挖掘工单处理过程中的关键问题、效率瓶颈及改进机会。请从车型,模块,功能角度,分别展示工单数据、问题类型、模块分布、严重程度、责任人负载、车型分布、来源渠道及处理时长等多个维度。
+    通过多轮交叉分析与趋势洞察,为提升车联网服务质量、优化资源配置及降低运营风险提供数据驱动的决策依据,问题总揽,高频问题、重点问题分析,输出若干个重要的统计指标,并绘制相关图表;
+    结合图表,总结一份,车联网运维工单健康度报告,汇报给我。
    """

    # 在主函数中先创建会话目录,以便存放日志
@@ -57,12 +75,33 @@ def main():
    # 如果希望强制运行到最大轮数,设置 force_max_rounds=True
    agent = DataAnalysisAgent(llm_config, force_max_rounds=False)

-    report = agent.analyze(
-        user_input=analysis_requirement,
-        files=files,
-        session_output_dir=session_output_dir
-    )
-    print(report)
+    # --- 交互式分析循环 ---
+    while True:
+        # 执行分析
+        # 首次运行时 reset_session=True (默认)
+        # 后续运行时 reset_session=False
+        is_first_run = (agent.current_round == 0 and not agent.conversation_history)
+
+        report = agent.analyze(
+            user_input=analysis_requirement,
+            files=files if is_first_run else None,  # 后续轮次不需要重复传文件路径,agent已有上下文
+            session_output_dir=session_output_dir,
+            reset_session=is_first_run,
+            max_rounds=None if is_first_run else 10  # 追问时限制为10轮
+        )
+        print("\n" + "="*30 + " 当前阶段分析完成 " + "="*30)
+
+        # 询问用户是否继续
+        print("\n[TIP] 你可以继续对数据提出分析需求,或者输入 'exit'/'quit' 结束程序。")
+        user_response = input("[>] 请输入后续分析需求 (直接回车退出): ").strip()
+
+        if not user_response or user_response.lower() in ['exit', 'quit', 'n', 'no']:
+            print("[BYE] 分析结束,再见!")
+            break
+
+        # 更新需求,进入下一轮循环
+        analysis_requirement = user_response
+        print(f"\n[LOOP] 收到新需求,正在继续分析...")


if __name__ == "__main__":
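The `DualLogger` filtering behaviour above can be exercised with in-memory streams instead of a real terminal and log file. `TeeLogger` is a hypothetical stand-in for the class in `main.py`, reduced to the tee-and-filter logic:

```python
import io

class TeeLogger:
    """Write every message to the terminal, but keep messages containing
    the filter marker (generated code blocks) out of the log file."""
    def __init__(self, terminal, log, marker="[TOOL] 执行代码:"):
        self.terminal = terminal
        self.log = log
        self.marker = marker

    def write(self, message):
        self.terminal.write(message)
        if self.marker in message:
            return  # code blocks stay visible on screen but not in the log
        self.log.write(message)
        self.log.flush()

    def flush(self):
        self.terminal.flush()

term, log = io.StringIO(), io.StringIO()
logger = TeeLogger(term, log)
logger.write("hello\n")
logger.write("[TOOL] 执行代码: print(1)\n")
```

Implementing `flush()` as well as `write()` matters when the object is assigned to `sys.stdout`, since `print` may call either.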
 716  prompts.py
@@ -1,347 +1,369 @@
data_analysis_system_prompt = """你是一个专业的数据分析助手,运行在Jupyter Notebook环境中,能够根据用户需求生成和执行Python数据分析代码。

🎯 **重要指导原则**:
- 当需要执行Python代码(数据加载、分析、可视化)时,使用 `generate_code` 动作
- 当需要收集和分析已生成的图表时,使用 `collect_figures` 动作
- 当所有分析工作完成,需要输出最终报告时,使用 `analysis_complete` 动作
- 每次响应只能选择一种动作类型,不要混合使用
- 强制文本清洗与短语提取,必须使用 N-gram (2-gram, 3-gram) 技术提取短语(如 "remote control", "login failed")
- 严禁仅仅统计单词频率,以免破坏专有名词。
- 必须构建`stop_words`列表,剔除年份(2025)、通用动词(work, fix)、介词等无意义高频词。
- 主动高级分析:不仅是画图,必须根据数据特征主动选择算法(时间序列->预测;分类数据->特征重要性;多维数据->聚类)。

目前jupyter notebook环境下有以下变量:
{notebook_variables}
✨ 核心能力:
1. 接收用户的自然语言分析需求
2. 按步骤生成安全的Python分析代码
3. 基于代码执行结果继续优化分析

🔧 Notebook环境特性:
- 你运行在IPython Notebook环境中,变量会在各个代码块之间保持
- 第一次执行后,pandas、numpy、matplotlib等库已经导入,无需重复导入
- 数据框(DataFrame)等变量在执行后会保留,可以直接使用
- 因此,除非是第一次使用某个库,否则不需要重复import语句

🚨 重要约束:
1. 仅使用以下数据分析库:pandas, numpy, matplotlib, duckdb, os, json, datetime, re, pathlib
2. 图片必须保存到指定的会话目录中,输出绝对路径,禁止使用plt.show(),饼图的标签全部放在图例里面,用颜色区分。
3. 表格输出控制:超过15行只显示前5行和后5行
4. 所有生成的图片必须保存,保存路径格式:os.path.join(session_output_dir, '图片名称.png')
5. 中文字体设置:生成的绘图代码涉及中文字体时,必须保证生成的图片不出现乱码(macOS推荐:Hiragino Sans GB, Songti SC等)
6. 输出格式严格使用YAML

📁 输出目录管理:
- 本次分析使用UUID生成的专用目录(16进制格式),确保每次分析的输出文件隔离
- 会话目录格式:session_[32位16进制UUID],如 session_a1b2c3d4e5f6789012345678901234ab
- 图片保存路径格式:os.path.join(session_output_dir, '图片名称.png')
- 使用有意义的中文文件名:如'营业收入趋势.png', '利润分析对比.png'
- 所有生成的图片必须执行图片收集动作并保存,保存路径格式:os.path.join(session_output_dir, '图片名称.png')
- 输出绝对路径:使用os.path.abspath()获取图片的完整路径

📊 数据分析工作流程(必须严格按顺序执行):

**阶段1:数据探索(使用 generate_code 动作)**
- 首次数据加载时尝试多种编码:['utf-8', 'gbk', 'gb18030', 'gb2312', 'latin1']
- 特殊处理:如果读取失败,尝试指定分隔符 `sep=','` 和错误处理 `on_bad_lines='skip'` (pandas 2.0+标准)
- 使用df.head()查看前几行数据,检查数据是否正确读取
- 使用df.info()了解数据类型和缺失值情况
- 重点检查:如果数值列显示为NaN但应该有值,说明读取或解析有问题
- 使用df.dtypes查看每列的数据类型,确保日期列不是float64
- 打印所有列名:df.columns.tolist()
- 绝对不要假设列名,必须先查看实际的列名

**阶段2:数据清洗和检查(使用 generate_code 动作)**
- 日期列识别:查找包含'date', 'time', 'Date', 'Time'关键词的列
- 日期解析:尝试多种格式 ['%d/%m/%Y', '%Y-%m-%d', '%m/%d/%Y', '%Y/%m/%d', '%d-%m-%Y']
- 类型转换:使用pd.to_datetime()转换日期列,指定format参数和errors='coerce'
- 空值处理:检查哪些列应该有值但显示NaN,可能是数据读取问题
- 检查数据的时间范围和排序
- 数据质量检查:确认数值列是否正确,字符串列是否被错误识别

**阶段3:数据分析和可视化(使用 generate_code 动作)**
- 基于实际的列名进行计算
- 生成有意义的图表
- 图片保存到会话专用目录中
- 每生成一个图表后,必须打印绝对路径
- 不要试图一次性生成所有图表。你应该将任务拆分为多个小的代码块,分批次执行。
- 每一轮只专注于生成 1-2 个复杂的图表或 2-3 个简单的图表,确保代码正确且图片保存成功。
- 只有在前一轮代码成功执行并保存图片后,再进行下一轮。
- 必做分析
    1. **超长工单问题类型分布**(从处理时长分布中筛选)
    2. **车型-问题热力图**(发现特定车型的高频故障)
    3. **车型分布**(整体工单在不同车型的占比)
    5. **处理时长箱线图**(按问题类型或责任人分组,识别异常点)
    6. **高频关键词词云**(基于Text Cleaning和N-gram结果)
    8. **工单状态分布**
    9. **模块分布**
    11. **问题类型分布**
    12. **严重程度分布**
    14. **月度工单趋势**
    15. **月度关闭率趋势**
- 图片保存必须使用 `plt.savefig(path, bbox_inches='tight')`。保存后必须显式打印绝对路径。严禁使用 `plt.show()`。

**阶段4:深度挖掘与高级分析(使用 generate_code 动作)**
- **主动评估数据特征**:在执行前,先分析数据适合哪种高级挖掘:
    - 时间序列数据:必须进行趋势预测(使用sklearn/ARIMA/Prophet-like逻辑)和季节性分解。
    - 多维数值数据:必须进行聚类分析(K-Means/DBSCAN)以发现用户/产品分层。
    - 分类/目标数据:必须计算特征重要性(使用随机森林/相关性矩阵)以识别关键驱动因素。
    - 异常检测:使用Isolation Forest或统计方法识别高价值或高风险的离群点。
- 拒绝平庸:不要为了做而做。如果数据量太小(<50行)或特征单一,请明确说明无法进行特定分析,并尝试挖掘其他角度(如分布偏度、帕累托分析)。
- 业务导向:每个模型结果必须翻译成业务语言(例如:“聚类结果显示,A类用户是高价值且对价格不敏感的群体”)。

**阶段5:高级分析结果可视化(使用 generate_code 动作)**
- 专业图表:为高级分析匹配专用图表:
    - 聚类 -> 降维散点图 (PCA/t-SNE) 或 平行坐标图
    - 相关性 -> 热力图 (Heatmap)
    - 预测 -> 带有置信区间的趋势图
    - 特征重要性 -> 排序条形图
- 保存与输出:保存模型结果图表,并准备好在报告中解释。

**阶段6:图片收集和分析(使用 collect_figures 动作)**
- 当已生成2-3个高级分析图表后,使用 collect_figures 动作
- 收集所有已生成的图片路径和信息
- 对每个图片进行详细的分析和解读

**阶段7:最终报告(使用 analysis_complete 动作)**
- 当所有分析工作完成后,生成最终的分析报告
- 包含对所有图片、模型和分析结果的综合总结
- 提供业务建议和预测洞察

🔧 代码生成规则:
1. 每次只专注一个阶段,不要试图一次性完成所有任务,生成图片代码时,可以多轮次执行,不要一次生成所有图片的代码
2. 基于实际的数据结构而不是假设来编写代码
3. Notebook环境中变量会保持,避免重复导入和重复加载相同数据
4. 处理错误时,分析具体的错误信息并针对性修复,重新进行该阶段步骤,中途不要跳步骤
5. 严禁使用 `exit()`、`quit()` 或 `sys.exit()`,这会导致整个Agent进程终止。
6. 严禁使用 `open()` 写入文件(除保存图片/JSON外),所有中间数据应优先保存在DataFrame变量中。
7. 图片保存使用会话目录变量:session_output_dir
8. 图表标题和标签使用中文,使用系统配置的中文字体显示
9. 必须打印绝对路径:每次图片生成后,必须执行图片收集动作保存图片,并使用os.path.abspath()打印完整的绝对路径
10. 图片文件名:使用中文描述业务含义(如“核心问题词云.png”),**严禁**在文件名或标题中出现 "2-gram", "dataframe", "plot" 等技术术语。
11. 图表类型强制规则:如果类别数量 > 5,严禁使用饼图,必须使用水平条形图,并按数值降序排列。
12. 饼图仅限极少类别:只有当类别数量 ≤ 5 时才允许使用饼图。必须设置 `plt.legend(bbox_to_anchor=(1, 1))` 将图例放在图外,防止标签重叠。
13. 美学标准:所有图表必须去除非数据墨水(无边框、无网格线或极淡网格),配色使用 Seaborn 默认色板或科研配色。

高级分析技术指南(主动探索模式):
- **智能选择算法**:
    - 遇到时间字段 -> `pd.to_datetime` -> 重采样 -> 移动平均/指数平滑/回归预测
    - 遇到多数值特征 -> `StandardScaler` -> `KMeans` (使用Elbow法则选k) -> `PCA`降维可视化
    - 遇到目标变量 -> `Correlation Matrix` -> `RandomForest` (feature_importances_)
- **文本挖掘**:
    - **使用 N-gram**:使用 `sklearn.feature_extraction.text.CountVectorizer(ngram_range=(2, 3))` 来捕获 "remote control" 这样的专有名词。
    - **专用停用词表** (Stop Words):
        - 年份/数字:2023, 2024, 2025, 1月, 2月...
        - 通用动词:work, fix, support, issue, problem, check, test...
        - 通用介词/代词:the, is, at, which, on, for, this, that...
    - **结果验证**:提取出的 Top 关键词**必须**大部分是具有业务含义的短语,而不是单个单词。
- **异常值挖掘**:总是检查是否存在显著偏离均值的异常点,并标记出来进行个案分析。
- **可视化增强**:不要只画折线图。使用 `seaborn` 的 `pairplot`, `heatmap`, `lmplot` 等高级图表。
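The N-gram guidance above can be sketched without scikit-learn; the `STOP_WORDS` set and sample tickets below are illustrative assumptions, and `sklearn.feature_extraction.text.CountVectorizer(ngram_range=(2, 3))` is the heavier-weight equivalent the prompt itself names.

```python
import re
from collections import Counter

# illustrative stop-word list, per the guidance: years, generic verbs, function words
STOP_WORDS = {"the", "is", "at", "on", "for", "this", "that", "work", "fix", "2025"}

def top_bigrams(texts, n=5):
    """Count 2-grams over lowercased, stop-word-filtered tokens so that
    phrases like 'remote control' survive intact."""
    counts = Counter()
    for text in texts:
        tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower())
                  if t not in STOP_WORDS]
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts.most_common(n)

tickets = ["Remote control failed", "remote control timeout", "login failed on TSP"]
print(top_bigrams(tickets))
```

This is exactly why the prompt forbids single-word frequency counts: a unigram counter would report "remote" and "control" separately and lose the domain phrase.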
|
||||
📝 动作选择指南:
|
||||
- **需要执行Python代码** → 使用 "generate_code"
|
||||
- **已生成多个图表,需要收集分析** → 使用 "collect_figures"
|
||||
- **所有分析完成,输出最终报告** → 使用 "analysis_complete"
|
||||
- **遇到错误需要修复代码** → 使用 "generate_code"
|
||||
|
||||
📊 图片收集要求:
|
||||
- 在适当的时候(通常是生成了多个图表后),主动使用 `collect_figures` 动作
|
||||
- 收集时必须包含具体的图片绝对路径(file_path字段)
|
||||
- 提供详细的图片描述和深入的分析
|
||||
- 确保图片路径与之前打印的路径一致
|
||||
|
||||
报告生成要求:
|
||||
- 生成的报告要符合报告的文言需要,不要出现有争议的文字
|
||||
- 在适当的时候(通常是生成了多个图表后),进行图像的对比分析
|
||||
- 涉及的文言,不能出现我,你,他,等主观用于,采用报告式的文言论述
|
||||
- 提供详细的图片描述和深入的分析
|
||||
- 报告中的英文单词,初专有名词(TSP,TBOX等),其余的全部翻译成中文,例如remote control(远控),don't exist in TSP (数据不在TSP上);
|
||||
|
||||
|
||||
📋 三种动作类型及使用时机:
|
||||
|
||||
**1. 代码生成动作 (generate_code)**
|
||||
适用于:数据加载、探索、清洗、计算、可视化等需要执行Python代码的情况
|
||||
|
||||
**2. 图片收集动作 (collect_figures)**
|
||||
适用于:已生成多个图表后,需要对图片进行汇总和深入分析的情况
|
||||
|
||||
**3. 分析完成动作 (analysis_complete)**
|
||||
适用于:所有分析工作完成,需要输出最终报告的情况
|
||||
|
||||
📋 响应格式(严格遵守):
|
||||
|
||||
🔧 **当需要执行代码时,使用此格式:**
|
||||
```yaml
|
||||
action: "generate_code"
|
||||
reasoning: "详细说明当前步骤的目的和方法,为什么要这样做"
|
||||
code: |
|
||||
# 实际的Python代码
|
||||
import pandas as pd
|
||||
# 具体分析代码...
|
||||
|
||||
# 图片保存示例(如果生成图表)
|
||||
plt.figure(figsize=(10, 6))
|
||||
# 绘图代码...
|
||||
plt.title('图表标题')
|
||||
file_path = os.path.join(session_output_dir, '图表名称.png')
|
||||
plt.savefig(file_path, dpi=150, bbox_inches='tight')
|
||||
plt.close()
|
||||
# 必须打印绝对路径
|
||||
absolute_path = os.path.abspath(file_path)
|
||||
print(f"图片已保存至: {{absolute_path}}")
|
||||
print(f"图片文件名: {{os.path.basename(absolute_path)}}")
|
||||
|
||||
next_steps: ["下一步计划1", "下一步计划2"]
|
||||
```
|
||||
|
||||
📊 **当需要收集分析图片时,使用此格式:**
|
||||
```yaml
|
||||
action: "collect_figures"
|
||||
reasoning: "说明为什么现在要收集图片,例如:已生成3个图表,现在收集并分析这些图表的内容"
|
||||
figures_to_collect:
|
||||
- figure_number: 1
|
||||
filename: "营业收入趋势分析.png"
|
||||
file_path: "实际的完整绝对路径"
|
||||
description: "图片概述:展示了什么内容"
|
||||
analysis: "细节分析:从图中可以看出的具体信息和洞察"
|
||||
next_steps: ["后续计划"]
|
||||
```
|
||||
|
||||
✅ **当所有分析完成时,使用此格式:**
```yaml
action: "analysis_complete"
final_report: "完整的最终分析报告内容"
```

⚠️ 特别注意:
- 数据读取问题:如果看到大量NaN值,检查编码和分隔符
- 日期列问题:如果日期列显示为float64,说明解析失败
- 编码错误:逐个尝试 ['utf-8', 'gbk', 'gb18030', 'gb2312', 'latin1']
- 列类型错误:检查是否有列被错误识别为数值型但实际是文本
- matplotlib错误时,确保使用Agg后端和正确的字体设置
- 每次执行后根据反馈调整代码,不要重复相同的错误
"""
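上文"特别注意"中逐个尝试编码的回退策略,可写成如下草稿(示意代码,`read_csv_with_fallback` 为假设的函数名,并非项目既有接口):

```python
# -*- coding: utf-8 -*-
# 示意:按"特别注意"中给出的顺序逐个尝试编码读取CSV。
# read_csv_with_fallback 为假设命名,非本项目已有接口。
import pandas as pd


def read_csv_with_fallback(file_path: str) -> pd.DataFrame:
    """逐个尝试常见中文编码,全部失败则抛出异常。"""
    last_error = None
    for encoding in ['utf-8', 'gbk', 'gb18030', 'gb2312', 'latin1']:
        try:
            return pd.read_csv(file_path, encoding=encoding)
        except (UnicodeDecodeError, ValueError) as e:
            last_error = e
    raise RuntimeError(f"所有编码均读取失败: {last_error}")
```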
# 最终报告生成提示词
final_report_system_prompt = """你是一位**资深数据分析专家 (Senior Data Analyst)**。你的任务是基于详细的数据分析过程,撰写一份**专业级、可落地的业务分析报告**。

### 输入上下文
- **数据全景 (Data Profile)**:
{data_profile}

- **分析过程与代码发现**:
{code_results_summary}

- **可视化证据链 (Visual Evidence)**:
{figures_summary}
> **警告**:你必须仔细检查上述列表。如果在 `figures_summary` 中列出了图表,你的报告中就必须引用它。**严禁遗漏任何已生成的图表**。引用格式必须为 ``。

### 报告核心要求
1. **角色定位**:
   - 你不仅是数据图表的生产者,更是业务问题的诊断者。
   - 你的报告需要回答"发生了什么"、"为什么发生"以及"怎么解决"。
2. **文风规范 (Strict Tone of Voice)**:
   - **禁止**:使用第一人称(我、我们)、使用模糊推测词(大概、可能)。
   - **强制**:客观陈述事实,使用专业术语(同比、环比、占比、TOPN),结论要有数据支撑。
3. **结构化输出**:必须严格遵守下方的 5 章节结构,确保逻辑严密。

### 报告结构模板使用说明 (Template Instructions)
- **固定格式 (Format)**:所有的 Markdown 标题 (`#`, `##`)、列表项前缀 (`- **...**`)、表格表头是必须保留的**骨架**。
- **写作指引 (Prompts)**:方括号 `[...]` 内的文字是给你的**写作提示**,请根据实际分析将其**替换**为具体内容,**不要**在最终报告中保留方括号。
---

### 报告结构模板 (Markdown)

```markdown
# [项目/产品名称] 深度业务洞察与策略分析报告

## 1. 摘要

- **整体健康度评分**:[0-100分] - [简短解释评分依据,如:较上月±X分]
- **核心结论**:[用一句话概括本次分析最关键的发现与商业影响]
- **最紧迫机会与风险**:
  - **机会**:Top 1-2个可立即行动的增长或优化机会
  - **风险**:Top 1-2个需立即关注的高风险问题
- **关键建议预览**:下一阶段应优先执行的1项核心行动

## 2. 分析背景
- **分析背景与目标**:[阐明本次分析要解决的核心业务问题或验证的假设]
- **数据范围与来源**:
  - **时间窗口**:[起止日期],选择依据(如:覆盖完整产品周期/关键活动期)
  - **数据量级**:[样本/记录数],[用户/事件覆盖率]
  - **数据源**:列出核心数据表或日志来源
- **数据质量评估与处理**:
  - **完整性**:关键字段缺失率<X%,已通过[方法]处理
  - **一致性**:跨源数据校验结果,如存在/不存在冲突
  - **异常处理**:已识别并处理[X类]异常值,采用[方法]
- **分析框架与维度**:
  - **核心指标**:[例如:故障率、用户满意度、会话时长]
  - **切片维度**:按[用户群、时间、功能模块、地理位置、设备类型等]交叉分析
  - **归因方法**:[如:根本原因分析(RCA)、相关性分析、趋势分解]
## 3. 重点问题回顾
> **核心原则**:以故事线组织,将数据转化为叙事。每个主题应包含"现象-证据-归因-影响"完整逻辑链。

### 3.1 [业务主题一:例如"远程控制稳定性阶段性恶化归因"]
- **核心发现**:[一句话总结,带有明确观点。例如:非网络侧因素是近期控车失败率上升的主因。]
- **现象与数据表现**:
  - 在[时间范围]内,[指标]从[值A]上升至[值B],幅度达[X%],超出正常波动范围。
  - 该问题主要影响[特定用户群/时间段/功能],占比达[Y%]。
- **证据链与深度归因**:
  > **图表组合分析**:将趋势图与分布图、词云等进行关联解读。
  > 
  > 自[TBOX固件v2.1]于[日期]灰度发布后,**连接失败率在24小时内上升了15个百分点**,且故障集中在[具体车型]。
  >
  > 
  > 对比故障上升前后词云,"升级"、"无响应"、"卡顿"提及量增长超过300%,而"网络慢"提及无显著变化,**初步排除运营商网络普遍性问题**。
- **问题回溯与当前影响**:
  - **直接原因**:[结合多维数据锁定原因,如:固件v2.1在特定车载芯片上的握手协议存在兼容性问题。]
  - **用户与业务影响**:已导致[估算的]用户投诉上升、[功能]使用率下降、潜在[NPS下降分值]。
  - **当前缓解状态**:[如:已暂停该版本推送,影响面控制在X%。]

### 3.2 [业务主题二:例如"高价值用户的核心使用场景与流失预警"]
- **核心发现**:[例如:功能A是留存关键,但其失败率在核心用户中最高。]
- **现象与数据表现**:[同上结构]
- **证据链与深度归因**:
  > 
  > **每周使用功能A超过3次的用户,其90天留存率是低频用户的2.5倍**,该功能是用户粘性的关键驱动力。
  >
  > 
  > 然而,正是这批高价值用户,遭遇功能A失败的概率比新用户高40%,**体验瓶颈出现在用户最依赖的环节**。
- **问题回溯与当前影响**:[同上结构]
## 4. 风险评估
> 采用**概率-影响矩阵**进行评估,为优先级排序提供依据。

| 风险项 | 描述 | 发生可能性 (高/中/低) | 潜在业务影响 (高/中/低) | 风险等级 | 预警信号 |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **[风险1:技术债]** | [如:老旧架构导致故障定位平均耗时超4小时] | 中 | 高 | **高** | 故障MTTR持续上升 |
| **[风险2:体验一致性]** | [如:Android用户关键路径失败率为iOS的2倍] | 高 | 中 | **中高** | 应用商店差评中OS提及率上升 |
| **[风险3:合规性]** | [描述] | 低 | 高 | **中** | [相关法规更新节点] |

## 5. 改进建议与方案探讨 (Suggestions & Solutions for Review)
> **重要提示**:以下内容仅基于数据分析结果提出初步探讨方向。**具体实施方案、责任分配及落地时间必须由人工专家(PM/研发/运营)结合实际业务资源与约束最终确认**。

| 建议方向 (Direction) | 关联问题 (Issue) | 初步方案思路 (Draft Proposal) | 需人工评估点 (Points for Human Review) |
| :--- | :--- | :--- | :--- |
| **[方向1:如 固件版本回退]** | [3.1主题:连接失败率高] | 建议评估对受影响版本v2.1进行回滚或停止推送的可行性,以快速止损。 | 1. 回滚操作对用户数据的潜在风险<br>2. 是否有依赖该版本的其他关键功能 |
| **[方向2:如 体验优化专项]** | [3.2主题:核心功能体验差] | 建议组建专项小组,针对Top 3失败日志进行集中排查,通过技术优化提升成功率。 | 1. 当前研发资源的排期冲突<br>2. 优化后的预期收益是否匹配投入成本 |
| **[方向3:如 架构治理]** | [风险1:故障定位慢] | 建议将技术债治理纳入下季度规划,建立定期的模块健康度评估机制。 | 1. 业务需求与技术治理的优先级平衡<br>2. 具体的重构范围与风险控制 |

---

### **附录:分析局限性与后续计划**
- **本次分析局限性**:[如:数据仅涵盖国内用户、部分埋点缺失导致路径分析不全。]
- **待澄清问题**:[需要额外数据或实验验证的假设。]
- **推荐后续深度分析方向**:[建议的下一阶段分析主题。]
```
"""
data_analysis_system_prompt = """你是一个专业的数据分析助手,运行在Jupyter Notebook环境中,能够根据用户需求生成和执行Python数据分析代码。

**核心使命**:
- 接收自然语言需求,分阶段生成高效、安全的数据分析代码。
- 深度挖掘数据,不仅仅是绘图,更要发现数据背后的业务洞察。
- 输出高质量、可落地的业务分析报告。

**核心能力**:
1. **代码执行**:自动编写并执行Pandas/Matplotlib代码。
2. **多模态分析**:支持时序预测、文本挖掘(N-gram)、多维交叉分析。
3. **智能纠错**:遇到报错自动分析原因并修复代码。

jupyter notebook环境当前变量:
{notebook_variables}

---

**关键红线 (Critical Rules)**:
1. **进程保护**:严禁使用 `exit()`、`quit()` 或 `sys.exit()`,这会导致Agent崩溃。
2. **数据安全**:严禁使用 `pd.DataFrame({{...}})` 伪造数据。严禁使用 `open()` 写入非结果文件(只能写图片/JSON)。
3. **文件验证**:所有文件操作前必须 `os.path.exists()`。Excel读取失败必须尝试 `openpyxl` 引擎或 `read_csv`。
4. **绝对路径**:图片保存、文件读取必须使用绝对路径。图片必须保存到 `session_output_dir`。
5. **图片保存**:禁止 `plt.show()`。每次绘图后必须紧接 `plt.savefig(path)` 和 `plt.close()`。
---

**代码生成规则 (Code Generation Rules)**:

**1. 执行策略**:
- **分步执行**:每次只专注一个分析阶段(如"清洗"或"可视化"),不要试图一次性写完所有代码。
- **环境持久化**:Notebook环境中变量(如 `df`)会保留,不要重复导入库或重复加载数据。
- **错误处理**:捕获错误并尝试修复,严禁在分析中途放弃。

**2. 可视化规范 (Visual Standards)**:
- **中文字体**:必须配置字体以解决乱码:
```python
import matplotlib.pyplot as plt
import platform
system_name = platform.system()
if system_name == 'Darwin': plt.rcParams['font.sans-serif'] = ['Arial Unicode MS', 'PingFang SC', 'sans-serif']
elif system_name == 'Windows': plt.rcParams['font.sans-serif'] = ['SimHei', 'Microsoft YaHei', 'sans-serif']
else: plt.rcParams['font.sans-serif'] = ['WenQuanYi Micro Hei', 'sans-serif']
plt.rcParams['axes.unicode_minus'] = False
```
- **图表类型**:
  - 类别 > 5:**强制**使用水平条形图 (`plt.barh`),并降序排列。
  - 类别 ≤ 5:才允许使用饼图,且图例必须外置 (`bbox_to_anchor=(1, 1)`)。
- **美学要求**:去除非数据墨水(无边框、无网格),使用 Seaborn 默认色板,标题和标签必须为中文。
- **文件命名**:使用中文描述业务含义(如 `核心问题词云.png`),**严禁**出现 `plot`, `dataframe`, `2-gram` 等技术术语。
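上述图表类型规范的一个最小示意(数据为虚构示例,演示类别数>5时的降序水平条形图写法):

```python
# 示意:类别数>5时使用水平条形图,按数量降序展示(counts 为虚构示例数据)
import matplotlib
matplotlib.use('Agg')  # 无界面后端,禁止 plt.show()
import matplotlib.pyplot as plt
import os
import tempfile

counts = [('车型A', 120), ('车型B', 95), ('车型C', 80), ('车型D', 64), ('车型E', 41), ('车型F', 23)]
counts_sorted = sorted(counts, key=lambda kv: kv[1])  # barh 自下而上绘制,升序排列后顶端即为最大值
labels = [kv[0] for kv in counts_sorted]
values = [kv[1] for kv in counts_sorted]

plt.figure(figsize=(10, 6))
plt.barh(labels, values)
plt.title('车型分布')
file_path = os.path.join(tempfile.gettempdir(), '车型分布.png')
plt.savefig(file_path, dpi=150, bbox_inches='tight')
plt.close()
print('图片已保存至: ' + os.path.abspath(file_path))
```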
**3. 文本挖掘专用规则**:
- **N-gram提取**:必须使用 `CountVectorizer(ngram_range=(2, 3))` 提取短语(如 "remote control")。
- **停用词过滤**:必须构建 `stop_words` 列表,剔除年份(2025)、通用动词(fix, check)、通用介词(the, for)等。
---

**标准化分析SOP (Standard Operating Procedure)**:

**阶段1:数据探索与智能加载**
- 检查文件扩展名与实际格式是否一致(CSV vs Excel)。
- 打印 `df.info()`, `df.head()`, 检查缺失值和列名。
- 关键字段对齐('Model'->'车型', 'Module'->'模块')。

**阶段2:基础分布分析**
- 生成 `车型分布.png` (水平条形图)
- 生成 `模块Top10分布.png` (水平条形图)
- 生成 `问题类型Top10分布.png` (水平条形图)

**阶段3:时序与来源分析**
- 生成 `工单来源分布.png` (饼图或条形图)
- 生成 `月度工单趋势.png` (折线图)

**阶段4:深度交叉分析**
- 生成 `车型_问题类型热力图.png` (Heatmap)
- 生成 `模块_严重程度堆叠图.png` (Stacked Bar)

**阶段5:效率分析**
- 生成 `处理时长分布.png` (直方图)
- 生成 `责任人效率分析.png` (散点图: 工单量 vs 平均时长)

**阶段6:高级挖掘 (Active Exploration)**
- **必做**:
  - **文本分析**:对'问题描述'列提取Top 20高频短语(N-gram),生成词云或条形图。
  - **异常检测**:使用Isolation Forest或3-Sigma原则发现异常工单。
  - **相关性分析**:生成相关性矩阵热力图(如有数值特征)。
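其中3-Sigma异常检测可参考如下最小示意(数据为虚构示例):

```python
# 示意:用3-Sigma原则检测处理时长异常(durations 为虚构示例数据,最后一条为异常工单)
import numpy as np

durations = np.array([4.9, 5.1] * 15 + [60.0])
mean = durations.mean()
std = durations.std()
z_scores = (durations - mean) / std
outliers = durations[np.abs(z_scores) > 3]
print('异常工单时长:', outliers)
```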
---

**动作选择指南 (Action Selection)**:

1. **generate_code**
   - 场景:需要执行代码(加载、分析、绘图)。
   - 格式:
   ```yaml
   action: "generate_code"
   reasoning: "正在执行[阶段X]分析,目的是..."
   code: |
     # Python Code
     # ...
     # 每次生成图片后必须打印绝对路径
     print(f"图片已保存至: {{os.path.abspath(file_path)}}")
   next_steps: ["下一步计划"]
   ```

2. **collect_figures**
   - 场景:**每完成一个主要阶段(生成了2-3张图)后主动调用**。
   - 作用:总结当前图表发现,防止单次响应过长。
   - 格式:
   ```yaml
   action: "collect_figures"
   reasoning: "已生成基础分布图表,现在进行汇总分析"
   figures_to_collect:
     - figure_number: 1
       filename: "车型分布.png"
       file_path: "/abs/path/to/车型分布.png"
       description: "展示了各车型的工单量差异..."
       analysis: "从图中可见,X车型工单量占比最高,达到Y%..."
   ```

3. **analysis_complete**
   - 场景:所有SOP步骤执行完毕,且已通过 `collect_figures` 收集了足够素材。
   - 格式:
   ```yaml
   action: "analysis_complete"
   final_report: "(此处留空,系统会根据上下文自动生成报告)"
   ```
---

**特别提示**:
- **翻译要求**:报告中的英文专有名词(除了TSP, TBOX, HU等标准缩写)必须翻译成中文(Remote Control -> 远控)。
- **客观陈述**:不要使用"data shows", "plot indicates"等技术语言,直接陈述业务事实("X车型在Y模块故障率最高")。
- **鲁棒性**:如果代码报错,请深呼吸,分析错误日志,修改代码重试。不要重复无效代码。
"""

# 最终报告生成提示词
final_report_system_prompt = """你是一位**资深数据分析专家 (Senior Data Analyst)**。你的任务是基于详细的数据分析过程,撰写一份**专业级、可落地的业务分析报告**。

### 输入上下文
- **数据全景 (Data Profile)**:
{data_profile}

- **分析过程与代码发现**:
{code_results_summary}

- **可视化证据链 (Visual Evidence)**:
{figures_summary}
> **警告**:你必须仔细检查上述列表。如果在 `figures_summary` 中列出了图表,你的报告中就必须引用它。**严禁遗漏任何已生成的图表**。引用格式必须为 ``。

### 报告核心要求
1. **角色定位**:
   - 你不仅是数据图表的生产者,更是业务问题的诊断者。
   - 你的报告需要回答"发生了什么"、"为什么发生"以及"怎么解决"。
2. **文风规范 (Strict Tone of Voice)**:
   - **禁止**:使用第一人称(我、我们)、使用模糊推测词(大概、可能)。
   - **强制**:客观陈述事实,使用专业术语(同比、环比、占比、TOPN),结论要有数据支撑。
3. **结构化输出**:必须严格遵守下方的章节结构,确保逻辑严密。

### 报告结构模板使用说明 (Template Instructions)
- **固定格式 (Format)**:所有的 Markdown 标题 (`#`, `##`)、列表项前缀 (`- **...**`)、表格表头是必须保留的**骨架**。
- **写作指引 (Prompts)**:方括号 `[...]` 内的文字是给你的**写作提示**,请根据实际分析将其**替换**为具体内容,**不要**在最终报告中保留方括号。
- **直接输出Markdown**:不要使用JSON或YAML包裹,直接输出Markdown内容。

---

### 报告结构模板 (Markdown)
```markdown
# 《XX品牌车联网运维分析报告》

## 1. 整体问题分布与效率分析

### 1.1 工单类型分布与趋势

{{总工单数}}单。
其中:

- TSP问题:{{数量}}单 ({{占比}}%)
- APP问题:{{数量}}单 ({{占比}}%)
- DK问题:{{数量}}单 ({{占比}}%)
- 咨询类:{{数量}}单 ({{占比}}%)

> (可增加环比变化趋势)

---

### 1.2 问题解决效率分析

> (后续可增加环比变化趋势,如工单总流转时间、环比增长趋势图)

| 工单类型 | 总数量 | 一线处理数量 | 反馈二线数量 | 平均时长(h) | 中位数(h) | 一次解决率(%) | TSP处理次数 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| TSP问题 | {{数值}} | | | {{数值}} | {{数值}} | {{数值}} | {{数值}} |
| APP问题 | {{数值}} | | | {{数值}} | {{数值}} | {{数值}} | {{数值}} |
| DK问题 | {{数值}} | | | {{数值}} | {{数值}} | {{数值}} | {{数值}} |
| 咨询类 | {{数值}} | | | {{数值}} | {{数值}} | {{数值}} | {{数值}} |
| 合计 | | | | | | | |

---

### 1.3 问题车型分布

---
## 2. 各类问题专题分析

### 2.1 TSP问题专题

当月总体情况概述:

| 工单类型 | 总数量 | 海外一线处理数量 | 国内二线数量 | 平均时长(h) | 中位数(h) |
| --- | --- | --- | --- | --- | --- |
| TSP问题 | {{数值}} | | | {{数值}} | {{数值}} |

#### 2.1.1 TSP问题二级分类+三级分布

#### 2.1.2 TOP问题

| 高频问题简述 | 关键词示例 | 原因 | 处理方式 | 占比约 |
| --- | --- | --- | --- | --- |
| 网络超时/偶发延迟 | ack超时、请求超时、一直转圈 | | | {{数值}} |
| 车辆唤醒失败 | 唤醒失败、深度睡眠、TBOX未唤醒 | | | {{数值}} |
| 控制器反馈失败 | 控制器反馈状态失败、轻微故障 | | | {{数值}} |
| TBOX不在线 | 卡不在线、注册异常 | | | {{数值}} |

> 聚类分析文件(需要输出):[4-1TSP问题聚类.xlsx]

---

### 2.2 APP问题专题

当月总体情况概述:

| 工单类型 | 总数量 | 一线处理数量 | 反馈二线数量 | 一线平均处理时长(h) | 二线平均处理时长(h) | 平均时长(h) | 中位数(h) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| APP问题 | {{数值}} | | | {{数值}} | {{数值}} | {{数值}} | {{数值}} |

#### 2.2.1 APP问题二级分类分布

#### 2.2.2 TOP问题

| 高频问题简述 | 关键词示例 | 原因 | 处理方式 | 数量 | 占比约 |
| --- | --- | --- | --- | --- | --- |
| 问题1 | 关键词1、2、3 | | | {{数值}} | {{数值}} |
| 问题2 | 关键词1、2、3 | | | {{数值}} | {{数值}} |
| 问题3 | 关键词1、2、3 | | | {{数值}} | {{数值}} |
| 问题4 | 关键词1、2、3 | | | {{数值}} | {{数值}} |

> 聚类分析文件(需要输出):[4-2APP问题聚类.xlsx]

---
### 2.3 TBOX问题专题

> 总流转时间和环比增长趋势(可参考柱状+折线组合图)

#### 2.3.1 TBOX问题二级分类分布

#### 2.3.2 TOP问题

| 高频问题简述 | 关键词示例 | 原因 | 处理方式 | 占比约 |
| --- | --- | --- | --- | --- |
| 问题1 | 关键词1、2、3 | | | {{数值}} |
| 问题2 | 关键词1、2、3 | | | {{数值}} |
| 问题3 | 关键词1、2、3 | | | {{数值}} |
| 问题4 | 关键词1、2、3 | | | {{数值}} |
| 问题5 | 关键词1、2、3 | | | {{数值}} |

> 聚类分析文件:[4-3TBOX问题聚类.xlsx]

---

### 2.4 DMC专题

> 总流转时间和环比增长趋势(可参考柱状+折线组合图)

#### 2.4.1 DMC类二级分类分布与解决时长

#### 2.4.2 TOP问题

| 高频问题简述 | 关键词示例 | 原因 | 处理方式 | 占比约 |
| --- | --- | --- | --- | --- |
| 问题1 | 关键词1、2、3 | | | {{数值}} |
| 问题2 | 关键词1、2、3 | | | {{数值}} |

> 聚类分析文件(需要输出):[4-4DMC问题处理.xlsx]

---

### 2.5 咨询类专题

> 总流转时间和环比增长趋势(可参考柱状+折线组合图)

#### 2.5.1 咨询类二级分类分布与解决时长

#### 2.5.2 TOP咨询

| 高频问题简述 | 关键词示例 | 原因 | 处理方式 | 占比约 |
| --- | --- | --- | --- | --- |
| 问题1 | 关键词1、2、3 | | | {{数值}} |
| 问题2 | 关键词1、2、3 | | | {{数值}} |

> 聚类分析文件(需要输出):[4-5咨询类问题处理.xlsx]

---

## 3. 建议与附件

- 工单客诉详情见附件:
```
"""
# 追问模式提示词(去除SOP,保留核心规则)
data_analysis_followup_prompt = """你是一个专业的数据分析助手,运行在Jupyter Notebook环境中。
当前处于**追问模式 (Follow-up Mode)**。用户基于之前的分析结果提出了新的需求。

**核心使命**:
- 直接针对用户的后续需求进行解答,**无需**重新执行完整SOP。
- 只有当用户明确要求重新进行全流程分析时,才执行SOP。

**核心能力**:
1. **代码执行**:自动编写并执行Pandas/Matplotlib代码。
2. **多模态分析**:支持时序预测、文本挖掘(N-gram)、多维交叉分析。
3. **智能纠错**:遇到报错自动分析原因并修复代码。

jupyter notebook环境当前变量(已包含之前分析的数据df):
{notebook_variables}
---

**关键红线 (Critical Rules)**:
1. **进程保护**:严禁使用 `exit()`、`quit()` 或 `sys.exit()`。
2. **数据安全**:严禁伪造数据。严禁写入非结果文件。
3. **文件验证**:所有文件操作前必须 `os.path.exists()`。
4. **绝对路径**:图片保存必须使用 `session_output_dir` 和 `os.path.abspath`。
5. **图片保存**:禁止 `plt.show()`。必须使用 `plt.savefig()`。

---

**代码生成规则 (Reuse)**:
- **环境持久化**:直接使用已加载的 `df`,不要重复加载数据。
- **可视化规范**:中文字体配置、类别>5使用水平条形图、美学要求同上。
- **文本挖掘**:如需挖掘,继续遵守N-gram和停用词规则。

---

**动作选择指南**:
1. **generate_code**
   - 场景:执行针对追问的代码。
   - 格式:同标准模式。

2. **collect_figures**
   - 场景:如果生成了新的图表,必须收集。
   - 格式:同标准模式。

3. **analysis_complete**
   - 场景:追问回答完毕。
   - 格式:同标准模式。
"""
0
raw_data/.gitkeep
Normal file
@@ -50,3 +50,8 @@ flake8>=6.0.0

# 字体支持(用于matplotlib中文显示)
fonttools>=4.38.0

# Web Interface dependencies
fastapi>=0.109.0
uvicorn>=0.27.0
python-multipart>=0.0.9
4
start.bat
Normal file
@@ -0,0 +1,4 @@
@echo off
echo Starting IOV Data Analysis Agent...
python bootstrap.py
pause
3
start.sh
Executable file
@@ -0,0 +1,3 @@
#!/bin/bash
echo "Starting IOV Data Analysis Agent..."
python3 bootstrap.py
5
start_web.bat
Normal file
@@ -0,0 +1,5 @@
@echo off
echo Starting IOV Data Analysis Agent Web Interface...
echo Please open http://localhost:8000 in your browser.
python -m uvicorn web.main:app --reload --host 0.0.0.0 --port 8000
pause
4
start_web.sh
Executable file
@@ -0,0 +1,4 @@
#!/bin/bash
echo "Starting IOV Data Analysis Agent Web Interface..."
echo "Please open http://localhost:8000 in your browser."
python3 -m uvicorn web.main:app --reload --host 0.0.0.0 --port 8000
13
test.py
Normal file
@@ -0,0 +1,13 @@
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:9999/v1",
    api_key="sk-2187174de21548b0b8b0c92129700199"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello"}]
)

print(response.choices[0].message.content)
289
utils/analysis_templates.py
Normal file
@@ -0,0 +1,289 @@
# -*- coding: utf-8 -*-
"""
分析模板系统 - 提供预定义的分析场景
"""

from abc import ABC, abstractmethod
from typing import List, Dict, Any
from dataclasses import dataclass


@dataclass
class AnalysisStep:
    """分析步骤"""
    name: str
    description: str
    analysis_type: str  # explore, visualize, calculate, report
    prompt: str


class AnalysisTemplate(ABC):
    """分析模板基类"""

    def __init__(self, name: str, description: str):
        self.name = name
        self.description = description
        self.steps: List[AnalysisStep] = []

    @abstractmethod
    def build_steps(self, **kwargs) -> List[AnalysisStep]:
        """构建分析步骤"""
        pass

    def get_full_prompt(self, **kwargs) -> str:
        """获取完整的分析提示词"""
        steps = self.build_steps(**kwargs)

        prompt = f"# {self.name}\n\n{self.description}\n\n"
        prompt += "## 分析步骤:\n\n"

        for i, step in enumerate(steps, 1):
            prompt += f"### {i}. {step.name}\n"
            prompt += f"{step.description}\n\n"
            prompt += f"```\n{step.prompt}\n```\n\n"

        return prompt
class HealthReportTemplate(AnalysisTemplate):
    """健康度报告模板 - 专门用于车联网工单健康度分析"""

    def __init__(self):
        super().__init__(
            name="车联网工单健康度报告",
            description="全面分析车联网技术支持工单的健康状况,从多个维度评估工单处理效率和质量"
        )

    def build_steps(self, **kwargs) -> List[AnalysisStep]:
        """构建健康度报告的分析步骤"""
        return [
            AnalysisStep(
                name="数据概览与质量检查",
                description="检查数据完整性、缺失值、异常值等",
                analysis_type="explore",
                prompt="加载数据并进行质量检查,输出数据概况和潜在问题"
            ),
            AnalysisStep(
                name="工单总量分析",
                description="统计总工单数、时间分布、趋势变化",
                analysis_type="calculate",
                prompt="计算总工单数,按时间维度统计工单量,绘制时间序列趋势图"
            ),
            AnalysisStep(
                name="车型维度分析",
                description="分析不同车型的工单分布和问题特征",
                analysis_type="visualize",
                prompt="统计各车型工单数量,绘制车型分布饼图和柱状图,识别高风险车型"
            ),
            AnalysisStep(
                name="模块维度分析",
                description="分析工单涉及的技术模块分布",
                analysis_type="visualize",
                prompt="统计各技术模块的工单量,绘制模块分布图,识别高频问题模块"
            ),
            AnalysisStep(
                name="功能维度分析",
                description="分析具体功能点的问题分布",
                analysis_type="visualize",
                prompt="统计各功能的工单量,绘制TOP功能问题排行,分析功能稳定性"
            ),
            AnalysisStep(
                name="问题严重程度分析",
                description="分析工单的严重程度分布",
                analysis_type="visualize",
                prompt="统计不同严重程度的工单比例,绘制严重程度分布图"
            ),
            AnalysisStep(
                name="处理时长分析",
                description="分析工单处理时效性",
                analysis_type="calculate",
                prompt="计算平均处理时长、SLA达成率,识别超时工单,绘制时长分布图"
            ),
            AnalysisStep(
                name="责任人工作负载分析",
                description="分析各责任人的工单负载和处理效率",
                analysis_type="visualize",
                prompt="统计各责任人的工单数和处理效率,绘制负载分布图,识别超负荷人员"
            ),
            AnalysisStep(
                name="来源渠道分析",
                description="分析工单来源渠道分布",
                analysis_type="visualize",
                prompt="统计各来源渠道的工单量,绘制渠道分布图"
            ),
            AnalysisStep(
                name="高频问题深度分析",
                description="识别并深入分析高频问题",
                analysis_type="explore",
                prompt="提取TOP10高频问题,分析问题原因、影响范围和解决方案"
            ),
            AnalysisStep(
                name="综合健康度评分",
                description="基于多个维度计算综合健康度评分",
                analysis_type="calculate",
                prompt="综合考虑工单量、处理时长、问题严重度等指标,计算健康度评分"
            ),
            AnalysisStep(
                name="生成最终报告",
                description="整合所有分析结果,生成完整报告",
                analysis_type="report",
                prompt="整合所有图表和分析结论,生成一份完整的车联网工单健康度报告"
            )
        ]
class TrendAnalysisTemplate(AnalysisTemplate):
    """趋势分析模板"""

    def __init__(self):
        super().__init__(
            name="时间序列趋势分析",
            description="分析数据的时间趋势、季节性和周期性特征"
        )

    def build_steps(self, time_column: str = "日期", value_column: str = "数值", **kwargs) -> List[AnalysisStep]:
        return [
            AnalysisStep(
                name="时间序列数据准备",
                description="将数据转换为时间序列格式",
                analysis_type="explore",
                prompt=f"将 '{time_column}' 列转换为日期格式,按时间排序数据"
            ),
            AnalysisStep(
                name="趋势可视化",
                description="绘制时间序列图",
                analysis_type="visualize",
                prompt=f"绘制 '{value_column}' 随 '{time_column}' 的变化趋势图,添加移动平均线"
            ),
            AnalysisStep(
                name="趋势分析",
                description="识别上升、下降或平稳趋势",
                analysis_type="calculate",
                prompt="计算趋势线斜率,判断整体趋势方向和变化速率"
            ),
            AnalysisStep(
                name="季节性分析",
                description="检测季节性模式",
                analysis_type="visualize",
                prompt="分析月度、季度等周期性模式,绘制季节性分解图"
            ),
            AnalysisStep(
                name="异常点检测",
                description="识别时间序列中的异常点",
                analysis_type="calculate",
                prompt="使用统计方法检测时间序列中的异常值,标注在图表上"
            )
        ]
class AnomalyDetectionTemplate(AnalysisTemplate):
    """异常检测模板"""

    def __init__(self):
        super().__init__(
            name="异常值检测分析",
            description="识别数据中的异常值和离群点"
        )

    def build_steps(self, **kwargs) -> List[AnalysisStep]:
        return [
            AnalysisStep(
                name="数值列统计分析",
                description="计算数值列的统计特征",
                analysis_type="calculate",
                prompt="计算所有数值列的均值、标准差、四分位数等统计量"
            ),
            AnalysisStep(
                name="箱线图可视化",
                description="使用箱线图识别异常值",
                analysis_type="visualize",
                prompt="为每个数值列绘制箱线图,直观展示异常值分布"
            ),
            AnalysisStep(
                name="Z-Score异常检测",
                description="使用Z-Score方法检测异常值",
                analysis_type="calculate",
                prompt="计算每个数值的Z-Score,标记|Z|>3的异常值"
            ),
            AnalysisStep(
                name="IQR异常检测",
                description="使用四分位距方法检测异常值",
                analysis_type="calculate",
                prompt="使用IQR方法(Q1-1.5*IQR, Q3+1.5*IQR)检测异常值"
            ),
            AnalysisStep(
                name="异常值汇总报告",
                description="整理所有检测到的异常值",
                analysis_type="report",
                prompt="汇总所有异常值,分析其特征和可能原因,提供处理建议"
            )
        ]
class ComparisonAnalysisTemplate(AnalysisTemplate):
    """对比分析模板"""

    def __init__(self):
        super().__init__(
            name="分组对比分析",
            description="对比不同分组之间的差异和特征"
        )

    def build_steps(self, group_column: str = "分组", value_column: str = "数值", **kwargs) -> List[AnalysisStep]:
        return [
            AnalysisStep(
                name="分组统计",
                description="计算各组的统计指标",
                analysis_type="calculate",
                prompt=f"按 '{group_column}' 分组,计算 '{value_column}' 的均值、中位数、标准差"
            ),
            AnalysisStep(
                name="分组可视化对比",
                description="绘制对比图表",
                analysis_type="visualize",
                prompt="绘制各组的柱状图和箱线图,直观对比差异"
            ),
            AnalysisStep(
                name="差异显著性检验",
                description="统计检验组间差异",
                analysis_type="calculate",
                prompt="进行t检验或方差分析,判断组间差异是否显著"
            ),
            AnalysisStep(
                name="对比结论",
                description="总结对比结果",
                analysis_type="report",
                prompt="总结各组特征、主要差异和业务洞察"
            )
        ]
# 模板注册表
TEMPLATE_REGISTRY = {
    "health_report": HealthReportTemplate,
    "trend_analysis": TrendAnalysisTemplate,
    "anomaly_detection": AnomalyDetectionTemplate,
    "comparison": ComparisonAnalysisTemplate
}


def get_template(template_name: str) -> AnalysisTemplate:
    """获取分析模板"""
    template_class = TEMPLATE_REGISTRY.get(template_name)
    if template_class:
        return template_class()
    else:
        raise ValueError(f"未找到模板: {template_name}。可用模板: {list(TEMPLATE_REGISTRY.keys())}")


def list_templates() -> List[Dict[str, str]]:
    """列出所有可用模板"""
    templates = []
    for name, template_class in TEMPLATE_REGISTRY.items():
        template = template_class()
        templates.append({
            "name": name,
            "display_name": template.name,
            "description": template.description
        })
    return templates
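作为示意,下面用一个独立的最小示例演示上述模板系统中 `get_full_prompt` 的拼接思路(类与步骤为简化复刻,并非导入 `utils.analysis_templates` 本体):

```python
# 独立示意:简化复刻模板系统的提示词拼接逻辑(非导入 utils.analysis_templates)
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str
    description: str
    prompt: str


class MiniTemplate:
    def __init__(self):
        self.name = "趋势分析(示意)"
        self.description = "演示 get_full_prompt 的拼接逻辑"

    def build_steps(self) -> List[Step]:
        return [Step("数据准备", "转换日期列", "将日期列转为 datetime")]

    def get_full_prompt(self) -> str:
        # 与上文 AnalysisTemplate.get_full_prompt 相同的拼接顺序
        prompt = f"# {self.name}\n\n{self.description}\n\n## 分析步骤:\n\n"
        for i, step in enumerate(self.build_steps(), 1):
            prompt += f"### {i}. {step.name}\n{step.description}\n\n```\n{step.prompt}\n```\n\n"
        return prompt


print(MiniTemplate().get_full_prompt())
```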
103
utils/cache_manager.py
Normal file
@@ -0,0 +1,103 @@
# -*- coding: utf-8 -*-
"""
缓存管理器 - 支持数据和LLM响应缓存
"""

import os
import json
import hashlib
import pickle
from pathlib import Path
from typing import Any, Optional, Callable
from functools import wraps


class CacheManager:
    """缓存管理器"""

    def __init__(self, cache_dir: str = ".cache", enabled: bool = True):
        self.cache_dir = Path(cache_dir)
        self.enabled = enabled

        if self.enabled:
            self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _get_cache_key(self, *args, **kwargs) -> str:
        """生成缓存键"""
        key_data = f"{args}_{kwargs}"
        return hashlib.md5(key_data.encode()).hexdigest()

    def _get_cache_path(self, key: str) -> Path:
        """获取缓存文件路径"""
        return self.cache_dir / f"{key}.pkl"

    def get(self, key: str) -> Optional[Any]:
        """获取缓存"""
        if not self.enabled:
            return None

        cache_path = self._get_cache_path(key)
        if cache_path.exists():
            try:
                with open(cache_path, 'rb') as f:
                    return pickle.load(f)
            except Exception as e:
                print(f"[WARN] 读取缓存失败: {e}")
                return None
        return None

    def set(self, key: str, value: Any) -> None:
        """设置缓存"""
        if not self.enabled:
            return

        cache_path = self._get_cache_path(key)
        try:
            with open(cache_path, 'wb') as f:
                pickle.dump(value, f)
        except Exception as e:
            print(f"[WARN] 写入缓存失败: {e}")

    def clear(self) -> None:
        """清空所有缓存"""
        if self.cache_dir.exists():
            for cache_file in self.cache_dir.glob("*.pkl"):
                cache_file.unlink()
        print("[OK] 缓存已清空")

    def cached(self, key_func: Optional[Callable] = None):
        """缓存装饰器"""
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                if not self.enabled:
                    return func(*args, **kwargs)

                # 生成缓存键
                if key_func:
                    cache_key = key_func(*args, **kwargs)
                else:
                    cache_key = self._get_cache_key(*args, **kwargs)

                # 尝试从缓存获取
                cached_value = self.get(cache_key)
                if cached_value is not None:
                    print(f"[CACHE] 使用缓存: {cache_key[:8]}...")
                    return cached_value

                # 执行函数并缓存结果
                result = func(*args, **kwargs)
                self.set(cache_key, result)
                return result

            return wrapper
        return decorator


class LLMCacheManager(CacheManager):
    """LLM响应缓存管理器"""

    def get_cache_key_from_messages(self, messages: list, model: str = "") -> str:
        """从消息列表生成缓存键"""
        key_data = json.dumps(messages, sort_keys=True) + model
        return hashlib.md5(key_data.encode()).hexdigest()
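上述 `CacheManager` 的键生成与命中逻辑,可用如下独立草稿演示(简化复刻,非导入本模块):

```python
# 独立示意:演示基于 md5 键 + pickle 文件的缓存命中流程(简化复刻,非 CacheManager 本体)
import hashlib
import pickle
import tempfile
from pathlib import Path

cache_dir = Path(tempfile.mkdtemp())


def cache_key(*args, **kwargs) -> str:
    # 与 CacheManager._get_cache_key 相同的思路:参数串 -> md5
    return hashlib.md5(f"{args}_{kwargs}".encode()).hexdigest()


def cached_square(x: int) -> int:
    key = cache_key(x)
    path = cache_dir / f"{key}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())  # 命中缓存
    result = x * x
    path.write_bytes(pickle.dumps(result))  # 写入缓存
    return result


print(cached_square(7), cached_square(7))  # 第二次调用走缓存
```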
@@ -26,7 +26,9 @@ class CodeExecutor:
        "pandas",
        "pd",
        "numpy",
        "glob",
        "np",
        "subprocess",
        "matplotlib",
        "matplotlib.pyplot",
        "plt",
@@ -35,6 +37,15 @@
        "duckdb",
        "scipy",
        "sklearn",
        "sklearn.feature_extraction.text",
        "sklearn.preprocessing",
        "sklearn.model_selection",
        "sklearn.metrics",
        "sklearn.ensemble",
        "sklearn.linear_model",
        "sklearn.cluster",
        "sklearn.decomposition",
        "sklearn.manifold",
        "statsmodels",
        "plotly",
        "dash",
@@ -203,6 +214,7 @@ import matplotlib.pyplot as plt
import duckdb
import os
import json
import glob
from IPython.display import display
"""
        try:
@@ -229,12 +241,16 @@ from IPython.display import display
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    if alias.name not in self.ALLOWED_IMPORTS:
                        # 获取根包名 (e.g. sklearn.preprocessing -> sklearn)
                        root_package = alias.name.split('.')[0]
                        if root_package not in self.ALLOWED_IMPORTS and alias.name not in self.ALLOWED_IMPORTS:
                            return False, f"不允许的导入: {alias.name}"

            elif isinstance(node, ast.ImportFrom):
                if node.module not in self.ALLOWED_IMPORTS:
                    return False, f"不允许的导入: {node.module}"
                if node.module:
                    root_package = node.module.split('.')[0]
                    if root_package not in self.ALLOWED_IMPORTS and node.module not in self.ALLOWED_IMPORTS:
                        return False, f"不允许的导入: {node.module}"

            # 检查属性访问(防止通过os.system等方式绕过)
            elif isinstance(node, ast.Attribute):
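上述根包名白名单校验逻辑,可用如下独立草稿验证(`ALLOWED` 为示例白名单,简化复刻,非 `CodeExecutor` 本体):

```python
# 独立示意:复刻"根包名白名单"导入校验(简化版,非 CodeExecutor 本体)
import ast

ALLOWED = {"pandas", "numpy", "sklearn", "os"}


def check_imports(code: str):
    """遍历AST,拒绝根包不在白名单中的 import / from-import。"""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                root = alias.name.split('.')[0]
                if root not in ALLOWED and alias.name not in ALLOWED:
                    return False, f"不允许的导入: {alias.name}"
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                root = node.module.split('.')[0]
                if root not in ALLOWED and node.module not in ALLOWED:
                    return False, f"不允许的导入: {node.module}"
    return True, ""
```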
@@ -380,6 +396,33 @@
        except:
            pass

        # --- 自动保存机制 start ---
        # 检查是否有未关闭的图片,如果有,自动保存
        try:
            open_fig_nums = plt.get_fignums()
            if open_fig_nums:
                for fig_num in open_fig_nums:
                    fig = plt.figure(fig_num)
                    # 生成自动保存的文件名
                    auto_filename = f"autosave_fig_{self.image_counter}_{fig_num}.png"
                    auto_filepath = os.path.join(self.output_dir, auto_filename)

                    try:
                        # 尝试保存
                        fig.savefig(auto_filepath, bbox_inches='tight')
                        print(f"[CACHE] [Auto-Save] 检测到未闭合图表,已安全保存至: {auto_filepath}")

                        # 添加到输出中,告知Agent
                        output += f"\n[Auto-Save] [WARN] 检测到Figure {fig_num}未关闭,系统已自动保存为: {auto_filename}"
                        self.image_counter += 1
                    except Exception as e:
                        print(f"[WARN] [Auto-Save] 保存失败: {e}")
                    finally:
                        plt.close(fig_num)
        except Exception as e:
            print(f"[WARN] [Auto-Save Global] 异常: {e}")
        # --- 自动保存机制 end ---

        return {
            "success": True,
            "output": output,
@@ -2,6 +2,17 @@
import os
import pandas as pd
import io
import hashlib
from pathlib import Path
from typing import Optional, Iterator
from config.app_config import app_config
from utils.cache_manager import CacheManager

# Initialize the cache manager
data_cache = CacheManager(
    cache_dir=app_config.cache_dir,
    enabled=app_config.data_cache_enabled
)

def load_and_profile_data(file_paths: list) -> str:
    """
@@ -23,7 +34,7 @@ def load_and_profile_data(file_paths: list) -> str:
        profile_summary += f"## File: {file_name}\n\n"

        if not os.path.exists(file_path):
            profile_summary += f"⚠️ File not found: {file_path}\n\n"
            profile_summary += f"[WARN] File not found: {file_path}\n\n"
            continue

        try:
@@ -41,7 +52,7 @@ def load_and_profile_data(file_paths: list) -> str:
            elif ext in ['.xlsx', '.xls']:
                df = pd.read_excel(file_path)
            else:
                profile_summary += f"⚠️ Unsupported file format: {ext}\n\n"
                profile_summary += f"[WARN] Unsupported file format: {ext}\n\n"
                continue

            # Basic information
@@ -59,7 +70,7 @@ def load_and_profile_data(file_paths: list) -> str:

            profile_summary += f"#### {col} ({dtype})\n"
            if null_count > 0:
                profile_summary += f"- ⚠️ Nulls: {null_count} ({null_ratio:.1f}%)\n"
                profile_summary += f"- [WARN] Nulls: {null_count} ({null_ratio:.1f}%)\n"

            # Numeric column analysis
            if pd.api.types.is_numeric_dtype(dtype):
@@ -85,6 +96,122 @@ def load_and_profile_data(file_paths: list) -> str:
            profile_summary += "\n"

        except Exception as e:
            profile_summary += f"❌ Failed to read or analyze file: {str(e)}\n\n"
            profile_summary += f"[ERROR] Failed to read or analyze file: {str(e)}\n\n"

    return profile_summary


def get_file_hash(file_path: str) -> str:
    """Compute a file hash used as the cache key"""
    hasher = hashlib.md5()
    hasher.update(file_path.encode())

    # Mix in the file's modification time
    if os.path.exists(file_path):
        mtime = os.path.getmtime(file_path)
        hasher.update(str(mtime).encode())

    return hasher.hexdigest()
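A minimal sketch of why the path-plus-mtime key works for invalidation (the `cache_key` helper below is a hypothetical standalone restatement of the same hashing scheme): editing the file bumps its mtime, which changes the key, so stale cache entries are never hit.

```python
import hashlib

def cache_key(path: str, mtime: float) -> str:
    # Hypothetical helper mirroring get_file_hash: hash path, then mtime
    h = hashlib.md5()
    h.update(path.encode())
    h.update(str(mtime).encode())
    return h.hexdigest()

k_old = cache_key("data.csv", 1700000000.0)
k_new = cache_key("data.csv", 1700000500.0)  # same path, newer mtime
```

The key is deterministic for an unchanged file and differs as soon as the mtime moves, so the cache self-invalidates without explicit cleanup.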


def load_data_chunked(file_path: str, chunksize: Optional[int] = None) -> Iterator[pd.DataFrame]:
    """
    Stream a large file, yielding DataFrame chunks.

    Args:
        file_path: path to the file
        chunksize: rows per chunk; defaults to the configured value

    Yields:
        DataFrame chunks
    """
    if chunksize is None:
        chunksize = app_config.chunk_size

    ext = os.path.splitext(file_path)[1].lower()

    if ext == '.csv':
        # Try several encodings
        for encoding in ['utf-8', 'gbk', 'latin1']:
            try:
                chunks = pd.read_csv(file_path, encoding=encoding, chunksize=chunksize)
                for chunk in chunks:
                    yield chunk
                break
            except UnicodeDecodeError:
                continue
            except Exception as e:
                print(f"[ERROR] Failed to read CSV file: {e}")
                break
    elif ext in ['.xlsx', '.xls']:
        # Excel files do not support chunksize; read whole, then slice
        try:
            df = pd.read_excel(file_path)
            # Split into chunks manually
            for i in range(0, len(df), chunksize):
                yield df.iloc[i:i+chunksize]
        except Exception as e:
            print(f"[ERROR] Failed to read Excel file: {e}")
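A chunked reader like this is typically consumed with a streaming aggregate rather than concatenating everything back into one frame; a minimal sketch, using an in-memory CSV as a stand-in for a large file path:

```python
import io
import pandas as pd

# Stand-in for a large on-disk CSV: one "value" column, rows 0..9
buf = io.StringIO("value\n" + "\n".join(str(i) for i in range(10)))

total, count = 0.0, 0
for chunk in pd.read_csv(buf, chunksize=4):  # yields chunks of 4, 4, 2 rows
    total += chunk["value"].sum()
    count += len(chunk)

mean = total / count  # mean of 0..9 is 4.5
```

Memory stays bounded by `chunksize` regardless of the file size, which is the point of the streaming path for files over `max_file_size_mb`.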


def load_data_with_cache(file_path: str, force_reload: bool = False) -> Optional[pd.DataFrame]:
    """
    Load data with caching.

    Args:
        file_path: path to the file
        force_reload: force a reload, bypassing the cache

    Returns:
        A DataFrame, or None on failure
    """
    if not os.path.exists(file_path):
        print(f"[WARN] File not found: {file_path}")
        return None

    # Check the file size
    file_size_mb = os.path.getsize(file_path) / (1024 * 1024)

    # For large files, recommend streaming instead
    if file_size_mb > app_config.max_file_size_mb:
        print(f"[WARN] File too large ({file_size_mb:.1f}MB); consider streaming with load_data_chunked()")

    # Build the cache key
    cache_key = get_file_hash(file_path)

    # Try the cache first
    if not force_reload and app_config.data_cache_enabled:
        cached_data = data_cache.get(cache_key)
        if cached_data is not None:
            print(f"[CACHE] Loaded data from cache: {os.path.basename(file_path)}")
            return cached_data

    # Load the data
    ext = os.path.splitext(file_path)[1].lower()
    df = None

    try:
        if ext == '.csv':
            # Try several encodings
            for encoding in ['utf-8', 'gbk', 'latin1']:
                try:
                    df = pd.read_csv(file_path, encoding=encoding)
                    break
                except UnicodeDecodeError:
                    continue
        elif ext in ['.xlsx', '.xls']:
            df = pd.read_excel(file_path)
        else:
            print(f"[WARN] Unsupported file format: {ext}")
            return None

        # Cache the data
        if df is not None and app_config.data_cache_enabled:
            data_cache.set(cache_key, df)
            print(f"[OK] Data cached: {os.path.basename(file_path)}")

        return df

    except Exception as e:
        print(f"[ERROR] Failed to load data: {e}")
        return None

224 utils/data_quality.py Normal file
@@ -0,0 +1,224 @@
# -*- coding: utf-8 -*-
"""
Data quality checking module - automatically assesses data quality and suggests improvements
"""

import pandas as pd
import numpy as np
from typing import Dict, List, Tuple, Any
from dataclasses import dataclass


@dataclass
class QualityIssue:
    """A data quality issue"""
    column: str
    issue_type: str  # missing, duplicate, outlier, type_mismatch, etc.
    severity: str  # high, medium, low
    description: str
    suggestion: str


class DataQualityChecker:
    """Data quality checker"""

    def __init__(self, df: pd.DataFrame):
        self.df = df
        self.issues: List[QualityIssue] = []
        self.quality_score: float = 100.0

    def check_all(self) -> Dict[str, Any]:
        """Run all quality checks"""
        self.check_missing_values()
        self.check_duplicates()
        self.check_data_types()
        self.check_outliers()
        self.check_consistency()

        return self.generate_report()

    def check_missing_values(self) -> None:
        """Check for missing values"""
        for col in self.df.columns:
            missing_count = self.df[col].isnull().sum()
            missing_ratio = (missing_count / len(self.df)) * 100

            if missing_ratio > 50:
                severity = "high"
                self.quality_score -= 10
            elif missing_ratio > 20:
                severity = "medium"
                self.quality_score -= 5
            elif missing_ratio > 0:
                severity = "low"
                self.quality_score -= 2
            else:
                continue

            issue = QualityIssue(
                column=col,
                issue_type="missing",
                severity=severity,
                description=f"Column '{col}' has {missing_count} missing values ({missing_ratio:.1f}%)",
                suggestion=self._suggest_missing_handling(col, missing_ratio)
            )
            self.issues.append(issue)
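The threshold-to-penalty mapping used above can be restated as a small pure function (a sketch; thresholds copied from `check_missing_values`, the name `missing_severity` is made up for illustration):

```python
def missing_severity(missing_ratio: float):
    """Map a missing-value percentage to (severity, score_penalty)."""
    if missing_ratio > 50:
        return "high", 10
    elif missing_ratio > 20:
        return "medium", 5
    elif missing_ratio > 0:
        return "low", 2
    return None, 0  # no missing values, no penalty

# e.g. a column that is 60% empty is a high-severity issue costing 10 points
severity, penalty = missing_severity(60.0)
```

Because the checker starts from a score of 100.0, a handful of half-empty columns is enough to drag a dataset into the "poor" band.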

    def check_duplicates(self) -> None:
        """Check for duplicate rows"""
        duplicate_count = self.df.duplicated().sum()
        if duplicate_count > 0:
            duplicate_ratio = (duplicate_count / len(self.df)) * 100

            severity = "high" if duplicate_ratio > 10 else "medium"
            self.quality_score -= 5 if severity == "high" else 3

            issue = QualityIssue(
                column="whole table",
                issue_type="duplicate",
                severity=severity,
                description=f"Found {duplicate_count} duplicate rows ({duplicate_ratio:.1f}%)",
                suggestion="Consider df.drop_duplicates() to remove duplicate rows, or check whether the duplicates are legitimate records"
            )
            self.issues.append(issue)
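The underlying pandas call is easy to see on a toy frame: `df.duplicated()` marks every repeat of an earlier row, so its sum is the duplicate count the check reports.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})
dup_count = int(df.duplicated().sum())   # row 1 fully repeats row 0
dup_ratio = dup_count / len(df) * 100    # one of three rows
deduped = df.drop_duplicates()           # keeps rows 0 and 2
```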

    def check_data_types(self) -> None:
        """Check data type consistency"""
        for col in self.df.columns:
            # Look for numeric columns mis-detected as object
            if self.df[col].dtype == 'object':
                try:
                    # Try converting to numeric
                    pd.to_numeric(self.df[col].dropna(), errors='raise')

                    issue = QualityIssue(
                        column=col,
                        issue_type="type_mismatch",
                        severity="medium",
                        description=f"Column '{col}' is currently text-typed but can be converted to numeric",
                        suggestion=f"Consider df['{col}'] = pd.to_numeric(df['{col}']) to convert the type"
                    )
                    self.issues.append(issue)
                    self.quality_score -= 3
                except (ValueError, TypeError):
                    # Conversion failed: the column really is textual
                    pass

    def check_outliers(self) -> None:
        """Check numeric columns for outliers"""
        numeric_cols = self.df.select_dtypes(include=[np.number]).columns

        for col in numeric_cols:
            q1 = self.df[col].quantile(0.25)
            q3 = self.df[col].quantile(0.75)
            iqr = q3 - q1

            lower_bound = q1 - 3 * iqr
            upper_bound = q3 + 3 * iqr

            outliers = self.df[(self.df[col] < lower_bound) | (self.df[col] > upper_bound)]
            outlier_count = len(outliers)

            if outlier_count > 0:
                outlier_ratio = (outlier_count / len(self.df)) * 100

                if outlier_ratio > 5:
                    severity = "medium"
                    self.quality_score -= 3
                else:
                    severity = "low"
                    self.quality_score -= 1

                issue = QualityIssue(
                    column=col,
                    issue_type="outlier",
                    severity=severity,
                    description=f"Column '{col}' has {outlier_count} outliers ({outlier_ratio:.1f}%)",
                    suggestion=f"Check whether values below {lower_bound:.2f} or above {upper_bound:.2f} are reasonable"
                )
                self.issues.append(issue)
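Note that the fence above uses 3×IQR, deliberately wider than the textbook 1.5×IQR rule, so only extreme points are flagged. On a toy series:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 100])
q1, q3 = s.quantile(0.25), s.quantile(0.75)   # 2.0 and 4.0 (linear interpolation)
iqr = q3 - q1                                  # 2.0
lower, upper = q1 - 3 * iqr, q3 + 3 * iqr      # -4.0 and 10.0
outliers = s[(s < lower) | (s > upper)]        # only 100 falls outside the fence
```

With the 1.5×IQR fence the bounds would be -1.0 and 7.0, which still flags only 100 here, but on noisier real data the 3× multiplier keeps mild stragglers out of the report.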

    def check_consistency(self) -> None:
        """Check data consistency"""
        # Check that datetime columns are in chronological order
        datetime_cols = self.df.select_dtypes(include=['datetime64']).columns

        for col in datetime_cols:
            if not self.df[col].is_monotonic_increasing:
                issue = QualityIssue(
                    column=col,
                    issue_type="consistency",
                    severity="medium",
                    description=f"Datetime column '{col}' is not monotonically increasing; rows may be out of order",
                    suggestion=f"Consider sorting with df.sort_values('{col}')"
                )
                self.issues.append(issue)
                self.quality_score -= 3

    def _suggest_missing_handling(self, col: str, missing_ratio: float) -> str:
        """Suggest how to handle missing values"""
        if missing_ratio > 70:
            return f"Missing ratio is too high; consider dropping column '{col}'"
        elif missing_ratio > 30:
            return "Consider imputing or dropping: fill with the median/mode, or drop rows containing missing values"
        else:
            if pd.api.types.is_numeric_dtype(self.df[col]):
                return f"Consider mean/median imputation: df['{col}'].fillna(df['{col}'].median())"
            else:
                return f"Consider mode imputation: df['{col}'].fillna(df['{col}'].mode()[0])"

    def generate_report(self) -> Dict[str, Any]:
        """Generate the quality report"""
        # Clamp the quality score to 0-100
        self.quality_score = max(0, min(100, self.quality_score))

        # Group by severity
        high_issues = [i for i in self.issues if i.severity == "high"]
        medium_issues = [i for i in self.issues if i.severity == "medium"]
        low_issues = [i for i in self.issues if i.severity == "low"]

        return {
            "quality_score": round(self.quality_score, 2),
            "total_issues": len(self.issues),
            "high_severity": len(high_issues),
            "medium_severity": len(medium_issues),
            "low_severity": len(low_issues),
            "issues": self.issues,
            "summary": self._generate_summary()
        }

    def _generate_summary(self) -> str:
        """Generate a readable summary"""
        summary = "## Data Quality Report\n\n"
        summary += f"**Quality score**: {self.quality_score:.1f}/100\n\n"

        if self.quality_score >= 90:
            summary += "[OK] **Rating**: Excellent - data quality is very good\n\n"
        elif self.quality_score >= 75:
            summary += "[WARN] **Rating**: Good - a few minor issues\n\n"
        elif self.quality_score >= 60:
            summary += "[WARN] **Rating**: Fair - several issues need attention\n\n"
        else:
            summary += "[ERROR] **Rating**: Poor - serious data quality problems\n\n"

        summary += f"**Issue summary**: {len(self.issues)} quality issues in total\n"
        summary += f"- [RED] High severity: {len([i for i in self.issues if i.severity == 'high'])}\n"
        summary += f"- [YELLOW] Medium severity: {len([i for i in self.issues if i.severity == 'medium'])}\n"
        summary += f"- [GREEN] Low severity: {len([i for i in self.issues if i.severity == 'low'])}\n\n"

        if self.issues:
            summary += "### Main issues:\n\n"
            # Only show high- and medium-severity issues
            for issue in self.issues:
                if issue.severity in ["high", "medium"]:
                    emoji = "[RED]" if issue.severity == "high" else "[YELLOW]"
                    summary += f"{emoji} **{issue.column}** - {issue.description}\n"
                    summary += f"  [TIP] {issue.suggestion}\n\n"

        return summary


def quick_quality_check(df: pd.DataFrame) -> str:
    """Quick data quality check"""
    checker = DataQualityChecker(df)
    report = checker.check_all()
    return report['summary']
@@ -29,6 +29,22 @@ def extract_code_from_response(response: str) -> Optional[str]:
        end = response.find('```', start)
        if end != -1:
            return response[start:end].strip()

    # Try to extract a "code: |" style block (for responses whose YAML is broken but whose structure is clear)
    import re
    # Match everything after "code: |" up to the next key (next_key:) or the end
    # Assumes the code block is indented by at least 2 spaces
    pattern = r'code:\s*\|\s*\n((?: {2,}.*\n?)+)'
    match = re.search(pattern, response)
    if match:
        code_block = match.group(1)
        # Try to strip the common indentation
        try:
            import textwrap
            return textwrap.dedent(code_block).strip()
        except Exception:
            return code_block.strip()
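The fallback regex can be exercised on a malformed-YAML-ish response (the `response` string below is made up for illustration): the indented lines after `code: |` are captured, and the first unindented key ends the match.

```python
import re
import textwrap

response = "thought: fix it\ncode: |\n  x = 1\n  print(x)\nnext_step: done\n"
pattern = r'code:\s*\|\s*\n((?: {2,}.*\n?)+)'

match = re.search(pattern, response)
code = textwrap.dedent(match.group(1)).strip()
# code == "x = 1\nprint(x)"; "next_step: done" is not captured
```

`textwrap.dedent` removes only the indentation common to every captured line, so nested indentation inside the extracted code survives intact.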

    elif '```' in response:
        start = response.find('```') + 3
        end = response.find('```', start)

@@ -57,7 +57,7 @@ class AsyncFallbackOpenAIClient:
            self.fallback_client = AsyncOpenAI(api_key=fallback_api_key, base_url=fallback_base_url, **_fallback_args)
            self.fallback_model_name = fallback_model_name
        else:
            print("⚠️ Warning: the fallback API client is not fully configured. If the primary API fails, no fallback will be possible.")
            print("[WARN] Warning: the fallback API client is not fully configured. If the primary API fails, no fallback will be possible.")

        self.content_filter_error_code = content_filter_error_code
        self.content_filter_error_field = content_filter_error_field
@@ -90,35 +90,60 @@ class AsyncFallbackOpenAIClient:
                return completion
            except (APIConnectionError, APITimeoutError) as e:  # network errors that are usually retryable
                last_exception = e
                print(f"⚠️ {api_name} API hit a retryable error ({type(e).__name__}): {e}. Attempt {attempt + 1}/{max_retries + 1}")
                print(f"[WARN] {api_name} API hit a retryable error ({type(e).__name__}): {e}. Attempt {attempt + 1}/{max_retries + 1}")
                if attempt < max_retries:
                    await asyncio.sleep(self.retry_delay_seconds * (attempt + 1))  # back off with an increasing delay
                else:
                    print(f"❌ {api_name} API still failed after the maximum number of retries.")
                    print(f"[ERROR] {api_name} API still failed after the maximum number of retries.")
            except APIStatusError as e:  # specific status-code errors returned by the API
                is_content_filter_error = False
                if e.status_code == 400:
                    try:
                        error_json = e.response.json()
                        error_details = error_json.get("error", {})
                        if (error_details.get("code") == self.content_filter_error_code and
                                self.content_filter_error_field in error_json):
                            is_content_filter_error = True
                    except Exception:
                        pass  # failed to parse the error response; not treated as a content-filter error
                retry_after = None

                # Try to parse the error details for more information (e.g. Google RPC RetryInfo)
                try:
                    error_json = e.response.json()
                    error_details = error_json.get("error", {})

                    # Check for a content-filter error (provider-specific)
                    if (error_details.get("code") == self.content_filter_error_code and
                            self.content_filter_error_field in error_json):
                        is_content_filter_error = True

                    # Check for Google RPC RetryInfo
                    # Example: {'error': {'details': [{'@type': 'type.googleapis.com/google.rpc.RetryInfo', 'retryDelay': '38s'}]}}
                    if "details" in error_details:
                        for detail in error_details["details"]:
                            if detail.get("@type") == "type.googleapis.com/google.rpc.RetryInfo":
                                delay_str = detail.get("retryDelay", "")
                                if delay_str.endswith("s"):
                                    try:
                                        retry_after = float(delay_str[:-1])
                                        print(f"[TIMER] Server returned RetryInfo; wait time: {retry_after}s")
                                    except ValueError:
                                        pass
                except Exception:
                    pass  # failed to parse the error response; ignore

                if is_content_filter_error and api_name == "primary":  # raise content-filter errors from the primary API so the caller can fall back
                    raise e

                last_exception = e
                print(f"⚠️ {api_name} API raised APIStatusError ({e.status_code}): {e}. Attempt {attempt + 1}/{max_retries + 1}")
                print(f"[WARN] {api_name} API raised APIStatusError ({e.status_code}): {e}. Attempt {attempt + 1}/{max_retries + 1}")

                if attempt < max_retries:
                    await asyncio.sleep(self.retry_delay_seconds * (attempt + 1))
                    # Use the explicit retry_after when available; otherwise fall back to the default backoff
                    wait_time = retry_after if retry_after is not None else (self.retry_delay_seconds * (attempt + 1))
                    # For 429 Too Many Requests without a parsed retry_after, wait longer
                    if e.status_code == 429 and retry_after is None:
                        wait_time = max(wait_time, 5.0 * (attempt + 1))  # wait at least 5s per attempt on 429

                    print(f"[WAIT] Waiting {wait_time:.2f}s before retrying...")
                    await asyncio.sleep(wait_time)
                else:
                    print(f"❌ {api_name} API still failed after the maximum number of retries (APIStatusError).")
                    print(f"[ERROR] {api_name} API still failed after the maximum number of retries (APIStatusError).")
            except APIError as e:  # other OpenAI errors that cannot easily be retried
                last_exception = e
                print(f"❌ {api_name} API hit a non-retryable error ({type(e).__name__}): {e}")
                print(f"[ERROR] {api_name} API hit a non-retryable error ({type(e).__name__}): {e}")
                break  # do not retry these errors
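The wait-time rule in the hunk above — a server-supplied RetryInfo delay wins outright, 429 without guidance gets a 5-seconds-per-attempt floor, everything else backs off linearly — can be isolated as a pure function (a sketch; `compute_wait` is a hypothetical name, not part of the client):

```python
from typing import Optional

def compute_wait(attempt: int, retry_delay_seconds: float,
                 status_code: int, retry_after: Optional[float] = None) -> float:
    # A server-provided RetryInfo delay takes priority over local backoff
    wait = retry_after if retry_after is not None else retry_delay_seconds * (attempt + 1)
    # 429 without explicit guidance: wait at least 5s per attempt
    if status_code == 429 and retry_after is None:
        wait = max(wait, 5.0 * (attempt + 1))
    return wait
```

So with `retry_delay_seconds=2.0`: a 500 on the first attempt waits 2s, a 429 waits 5s, and a 429 carrying `retryDelay: '38s'` waits exactly 38s.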

        if last_exception:
@@ -171,7 +196,7 @@ class AsyncFallbackOpenAIClient:
                    pass

                if is_content_filter_error and self.fallback_client and self.fallback_model_name:
                    print(f"ℹ️ Primary API content-filter error ({e_primary.status_code}). Switching to the fallback API ({self.fallback_client.base_url})...")
                    print(f"[INFO] Primary API content-filter error ({e_primary.status_code}). Switching to the fallback API ({self.fallback_client.base_url})...")
                    try:
                        fallback_completion = await self._attempt_api_call(
                            client=self.fallback_client,
@@ -181,20 +206,20 @@ class AsyncFallbackOpenAIClient:
                            api_name="fallback",
                            **kwargs.copy()
                        )
                        print(f"✅ Fallback API call succeeded.")
                        print(f"[OK] Fallback API call succeeded.")
                        return fallback_completion
                    except APIError as e_fallback:
                        print(f"❌ Fallback API call ultimately failed: {type(e_fallback).__name__} - {e_fallback}")
                        print(f"[ERROR] Fallback API call ultimately failed: {type(e_fallback).__name__} - {e_fallback}")
                        raise e_fallback
                else:
                    if not (self.fallback_client and self.fallback_model_name and is_content_filter_error):
                        # Not a content-filter error, or no usable fallback API: log the primary API's original error
                        print(f"ℹ️ Primary API error ({type(e_primary).__name__}: {e_primary}); fallback conditions not met or fallback API not configured.")
                        print(f"[INFO] Primary API error ({type(e_primary).__name__}: {e_primary}); fallback conditions not met or fallback API not configured.")
                    raise e_primary
        except APIError as e_primary_other:
            print(f"❌ Primary API call ultimately failed (not content filtering; error type: {type(e_primary_other).__name__}): {e_primary_other}")
            print(f"[ERROR] Primary API call ultimately failed (not content filtering; error type: {type(e_primary_other).__name__}): {e_primary_other}")
            if self.fallback_client and self.fallback_model_name:
                print(f"ℹ️ Primary API failed; switching to the fallback API ({self.fallback_client.base_url})...")
                print(f"[INFO] Primary API failed; switching to the fallback API ({self.fallback_client.base_url})...")
                try:
                    fallback_completion = await self._attempt_api_call(
                        client=self.fallback_client,
@@ -204,10 +229,10 @@ class AsyncFallbackOpenAIClient:
                        api_name="fallback",
                        **kwargs.copy()
                    )
                    print(f"✅ Fallback API call succeeded.")
                    print(f"[OK] Fallback API call succeeded.")
                    return fallback_completion
                except APIError as e_fallback_after_primary_fail:
                    print(f"❌ Fallback API also failed after the primary API failed: {type(e_fallback_after_primary_fail).__name__} - {e_fallback_after_primary_fail}")
                    print(f"[ERROR] Fallback API also failed after the primary API failed: {type(e_fallback_after_primary_fail).__name__} - {e_fallback_after_primary_fail}")
                    raise e_fallback_after_primary_fail
            else:
                raise e_primary_other


@@ -7,17 +7,17 @@ def format_execution_result(result: Dict[str, Any]) -> str:
    feedback = []

    if result['success']:
        feedback.append("✅ Code executed successfully")
        feedback.append("[OK] Code executed successfully")

        if result['output']:
            feedback.append(f"📊 Output:\n{result['output']}")
            feedback.append(f"[CHART] Output:\n{result['output']}")

        if result.get('variables'):
            feedback.append("📋 Newly created variables:")
            feedback.append("[LIST] Newly created variables:")
            for var_name, var_info in result['variables'].items():
                feedback.append(f"  - {var_name}: {var_info}")
    else:
        feedback.append("❌ Code execution failed")
        feedback.append("[ERROR] Code execution failed")
        feedback.append(f"Error message: {result['error']}")
        if result['output']:
            feedback.append(f"Partial output: {result['output']}")

@@ -5,8 +5,17 @@ LLM call helper module

import asyncio
import yaml
from typing import Optional, Callable, AsyncIterator
from config.llm_config import LLMConfig
from config.app_config import app_config
from utils.fallback_openai_client import AsyncFallbackOpenAIClient
from utils.cache_manager import LLMCacheManager

# Initialize the LLM cache manager
llm_cache = LLMCacheManager(
    cache_dir=app_config.llm_cache_dir,
    enabled=app_config.llm_cache_enabled
)

class LLMHelper:
    """LLM call helper class supporting synchronous and asynchronous calls"""
@@ -75,12 +84,111 @@ class LLMHelper:
            else:
                yaml_content = response.strip()

            return yaml.safe_load(yaml_content)
            parsed = yaml.safe_load(yaml_content)
            return parsed if parsed is not None else {}
        except Exception as e:
            print(f"YAML parsing failed: {e}")
            print(f"Raw response: {response}")
            return {}


    async def close(self):
        """Close the client"""
        await self.client.close()

    async def async_call_with_cache(
        self,
        prompt: str,
        system_prompt: str = None,
        max_tokens: int = None,
        temperature: float = None,
        use_cache: bool = True
    ) -> str:
        """Cached asynchronous LLM call"""
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})

        # Build the cache key
        cache_key = llm_cache.get_cache_key_from_messages(messages, self.config.model)

        # Try the cache first
        if use_cache and app_config.llm_cache_enabled:
            cached_response = llm_cache.get(cache_key)
            if cached_response:
                print("[CACHE] Using cached LLM response")
                return cached_response

        # Call the LLM
        response = await self.async_call(prompt, system_prompt, max_tokens, temperature)

        # Cache the response
        if use_cache and app_config.llm_cache_enabled and response:
            llm_cache.set(cache_key, response)

        return response

    def call_with_cache(
        self,
        prompt: str,
        system_prompt: str = None,
        max_tokens: int = None,
        temperature: float = None,
        use_cache: bool = True
    ) -> str:
        """Cached synchronous LLM call"""
        try:
            loop = asyncio.get_event_loop()
        except RuntimeError:
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)

        import nest_asyncio
        nest_asyncio.apply()

        return loop.run_until_complete(
            self.async_call_with_cache(prompt, system_prompt, max_tokens, temperature, use_cache)
        )

    async def async_call_stream(
        self,
        prompt: str,
        system_prompt: str = None,
        max_tokens: int = None,
        temperature: float = None,
        callback: Optional[Callable[[str], None]] = None
    ) -> AsyncIterator[str]:
        """Streaming asynchronous LLM call"""
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})

        kwargs = {
            'stream': True,
            'max_tokens': max_tokens or self.config.max_tokens,
            'temperature': temperature or self.config.temperature
        }

        try:
            response = await self.client.chat_completions_create(
                messages=messages,
                **kwargs
            )

            full_response = ""
            async for chunk in response:
                if chunk.choices[0].delta.content:
                    content = chunk.choices[0].delta.content
                    full_response += content

                    # Invoke the callback
                    if callback:
                        callback(content)

                    yield content

        except Exception as e:
            print(f"Streaming LLM call failed: {e}")
            yield ""
215 utils/script_generator.py Normal file
@@ -0,0 +1,215 @@

# -*- coding: utf-8 -*-
"""
Reusable script generator

Extracts successfully executed code from an analysis session's execution
history, merges and deduplicates it, and produces a standalone runnable
.py script file.
"""

import os
import re
from datetime import datetime
from typing import List, Dict, Any, Set


def extract_imports(code: str) -> Set[str]:
    """Extract all import statements from the code"""
    imports = set()
    lines = code.split('\n')
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('import ') or stripped.startswith('from '):
            # Normalize the import statement
            imports.add(stripped)
    return imports


def remove_imports(code: str) -> str:
    """Remove all import statements from the code"""
    lines = code.split('\n')
    result_lines = []
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith('import ') and not stripped.startswith('from '):
            result_lines.append(line)
    return '\n'.join(result_lines)
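Together the two helpers split a snippet into its import set and its body; a round trip on a minimal standalone restatement of their logic (copies for illustration, not the module functions themselves):

```python
def extract_imports(code: str) -> set:
    return {line.strip() for line in code.split('\n')
            if line.strip().startswith(('import ', 'from '))}

def remove_imports(code: str) -> str:
    return '\n'.join(line for line in code.split('\n')
                     if not line.strip().startswith(('import ', 'from ')))

code = "import os\nfrom pathlib import Path\nprint(Path(os.getcwd()).name)"
imports = extract_imports(code)   # {"import os", "from pathlib import Path"}
body = remove_imports(code)       # "print(Path(os.getcwd()).name)"
```

One caveat of the prefix-based check: because lines are stripped before matching, imports nested inside function bodies are also hoisted to module level in the generated script, which is usually harmless but worth knowing.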


def clean_code_block(code: str) -> str:
    """Clean a code block, dropping lines that should not be repeated"""
    # Configuration lines that the script template handles once, centrally
    patterns_to_skip = [
        r"plt\.rcParams\['font\.sans-serif'\]",  # font configuration is handled in the template
        r"plt\.rcParams\['axes\.unicode_minus'\]",
    ]

    lines = code.split('\n')
    result_lines = []
    skip_until_empty = False

    for line in lines:
        stripped = line.strip()

        # Handle runs of consecutive blank lines
        if not stripped:
            if skip_until_empty:
                skip_until_empty = False
                continue
            result_lines.append(line)
            continue

        # Check the skip patterns
        should_skip = False
        for pattern in patterns_to_skip:
            if re.search(pattern, stripped):
                should_skip = True
                break

        if not should_skip:
            result_lines.append(line)

    return '\n'.join(result_lines)


def generate_reusable_script(
    analysis_results: List[Dict[str, Any]],
    data_files: List[str],
    session_output_dir: str,
    user_requirement: str = ""
) -> str:
    """
    Generate a reusable Python script from the analysis results.

    Args:
        analysis_results: results recorded during the analysis; each element contains 'code', 'result', etc.
        data_files: paths of the original data files
        session_output_dir: the session's output directory
        user_requirement: the user's original requirement description

    Returns:
        Path of the generated script file
    """
    # Collect all successfully executed code
    all_imports = set()
    code_blocks = []

    for result in analysis_results:
        # Only process generate_code-type results
        if result.get("action") == "collect_figures":
            continue

        code = result.get("code", "")
        exec_result = result.get("result", {})

        # Only collect code that executed successfully
        if code and exec_result.get("success", False):
            # Extract imports
            imports = extract_imports(code)
            all_imports.update(imports)

            # Clean the code block
            cleaned_code = remove_imports(code)
            cleaned_code = clean_code_block(cleaned_code)

            # Only keep non-empty blocks
            if cleaned_code.strip():
                code_blocks.append({
                    "round": result.get("round", 0),
                    "code": cleaned_code.strip()
                })

    if not code_blocks:
        print("[WARN] No successfully executed code blocks; skipping script generation")
        return ""

    # Build the script content
    now = datetime.now()
    timestamp = now.strftime("%Y%m%d_%H%M%S")

    # Script header
    script_header = f'''#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Data analysis script - automatically generated
=====================================
Original data files: {', '.join(data_files)}
Generated at: {now.strftime("%Y-%m-%d %H:%M:%S")}
Original requirement: {user_requirement[:200] + '...' if len(user_requirement) > 200 else user_requirement}
=====================================

Usage:
1. Edit the DATA_FILES list below to point at your files
2. Edit OUTPUT_DIR to choose an output directory
3. Run: python analysis_script_{timestamp}.py
"""

import os
'''

    # Add the collected imports (deduplicated and sorted)
    standard_imports = sorted([imp for imp in all_imports if imp.startswith('import ')])
    from_imports = sorted([imp for imp in all_imports if imp.startswith('from ')])

    imports_section = '\n'.join(standard_imports + from_imports)

    # Configuration section
    config_section = f'''
# ========== Configuration (editable) ==========

# Data file paths - change these to analyze different data
DATA_FILES = {repr(data_files)}

# Output directory - figures and reports are saved here
OUTPUT_DIR = "./analysis_output"

# Create the output directory
os.makedirs(OUTPUT_DIR, exist_ok=True)

# ========== Font configuration (CJK display) ==========
import platform
import matplotlib.pyplot as plt

system_name = platform.system()
if system_name == 'Darwin':
    plt.rcParams['font.sans-serif'] = ['Arial Unicode MS', 'PingFang SC', 'sans-serif']
elif system_name == 'Windows':
    plt.rcParams['font.sans-serif'] = ['SimHei', 'Microsoft YaHei', 'sans-serif']
else:
    plt.rcParams['font.sans-serif'] = ['WenQuanYi Micro Hei', 'sans-serif']
plt.rcParams['axes.unicode_minus'] = False

# Provide session_output_dir for compatibility with the original code
session_output_dir = OUTPUT_DIR
'''

    # Merge the code blocks
    code_section = "\n# ========== Analysis code ==========\n\n"

    for block in code_blocks:
        code_section += f"# --- Analysis round {block['round']} ---\n"
        code_section += block['code'] + "\n\n"

    # Script footer
    script_footer = '''
# ========== Done ==========
print("\\n" + "=" * 50)
print("[OK] Analysis complete!")
print(f"[OUTPUT] Output directory: {os.path.abspath(OUTPUT_DIR)}")
print("=" * 50)
'''

    # Assemble the full script
    full_script = script_header + imports_section + config_section + code_section + script_footer

    # Save the script file
    script_filename = f"analysis_script_{timestamp}.py"
    script_path = os.path.join(session_output_dir, script_filename)

    try:
        with open(script_path, 'w', encoding='utf-8') as f:
            f.write(full_script)
        print(f"[OK] Reusable script generated: {script_path}")
        return script_path
    except Exception as e:
        print(f"[ERROR] Failed to save script: {e}")
        return ""
641 web/main.py Normal file
@@ -0,0 +1,641 @@
|
||||
|
||||
import sys
|
||||
import os
|
||||
import threading
|
||||
import glob
|
||||
import uuid
|
||||
import json
|
||||
from datetime import datetime
|
||||
from typing import Optional, Dict, List
|
||||
from fastapi import FastAPI, UploadFile, File, BackgroundTasks, HTTPException, Query
|
||||
from fastapi.middleware.cors import CORSMiddleware
|
||||
from fastapi.staticfiles import StaticFiles
|
||||
from fastapi.responses import FileResponse, JSONResponse
|
||||
from pydantic import BaseModel
|
||||
|
||||
# Add parent directory to path to import agent modules
|
||||
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||
|
||||
from data_analysis_agent import DataAnalysisAgent
|
||||
from config.llm_config import LLMConfig
|
||||
from utils.create_session_dir import create_session_output_dir
|
||||
from config.llm_config import LLMConfig
|
||||
from utils.create_session_dir import create_session_output_dir
|
||||
|
||||
app = FastAPI(title="IOV Data Analysis Agent")
|
||||
|
||||
# CORS
|
||||
app.add_middleware(
|
||||
CORSMiddleware,
|
||||
allow_origins=["*"],
|
||||
allow_credentials=True,
|
||||
allow_methods=["*"],
|
||||
allow_headers=["*"],
|
||||
)
|
||||
|
||||
# --- Session Management ---
|
||||
|
||||
class SessionData:
|
||||
def __init__(self, session_id: str):
|
||||
self.session_id = session_id
|
||||
self.is_running = False
|
||||
self.output_dir: Optional[str] = None
|
||||
self.generated_report: Optional[str] = None
|
||||
self.log_file: Optional[str] = None
|
||||
self.analysis_results: List[Dict] = [] # Store analysis results for gallery
|
||||
self.agent: Optional[DataAnalysisAgent] = None # Store the agent instance for follow-up
|
||||
|
||||
# 新增:进度跟踪
|
||||
self.current_round: int = 0
|
||||
self.max_rounds: int = 20
|
||||
self.progress_percentage: float = 0.0
|
||||
self.status_message: str = "等待开始"
|
||||
|
||||
# 新增:历史记录
|
||||
self.created_at: str = ""
|
||||
self.last_updated: str = ""
|
||||
self.user_requirement: str = ""
|
||||
self.file_list: List[str] = []
|
||||
self.reusable_script: Optional[str] = None # 新增:可复用脚本路径
|
||||
|
||||
|
||||
class SessionManager:
    def __init__(self):
        self.sessions: Dict[str, SessionData] = {}
        self.lock = threading.Lock()

    def create_session(self) -> str:
        with self.lock:
            session_id = str(uuid.uuid4())
            self.sessions[session_id] = SessionData(session_id)
            return session_id

    def get_session(self, session_id: str) -> Optional[SessionData]:
        if session_id in self.sessions:
            return self.sessions[session_id]

        # Fallback: try to reconstruct from disk for history sessions
        output_dir = os.path.join("outputs", f"session_{session_id}")
        if os.path.exists(output_dir) and os.path.isdir(output_dir):
            return self._reconstruct_session(session_id, output_dir)

        return None

    def _reconstruct_session(self, session_id: str, output_dir: str) -> SessionData:
        """Rebuild a session object from a directory on disk."""
        session = SessionData(session_id)
        session.output_dir = output_dir
        session.is_running = False
        session.current_round = session.max_rounds
        session.progress_percentage = 100.0
        session.status_message = "已完成 (历史记录)"

        # Recover log
        log_path = os.path.join(output_dir, "process.log")
        if os.path.exists(log_path):
            session.log_file = log_path

        # Recover report.
        # Lenient lookup: scan all .md files, preferring one whose name contains "report" (or "报告").
        md_files = glob.glob(os.path.join(output_dir, "*.md"))
        if md_files:
            # Default to the first one
            chosen = md_files[0]
            # Try to find a better match
            for md in md_files:
                fname = os.path.basename(md).lower()
                if "report" in fname or "报告" in fname:
                    chosen = md
                    break
            session.generated_report = chosen

        # Recover script (look for likely script filenames)
        possible_scripts = ["data_analysis_script.py", "script.py", "analysis_script.py"]
        for s in possible_scripts:
            p = os.path.join(output_dir, s)
            if os.path.exists(p):
                session.reusable_script = p
                break

        # Recover results (images etc.)
        results_json = os.path.join(output_dir, "results.json")
        if os.path.exists(results_json):
            try:
                with open(results_json, "r") as f:
                    session.analysis_results = json.load(f)
            except Exception:
                pass

        # Recover metadata
        try:
            stat = os.stat(output_dir)
            dt = datetime.fromtimestamp(stat.st_ctime)
            session.created_at = dt.strftime("%Y-%m-%d %H:%M:%S")
        except Exception:
            pass

        # Cache it
        with self.lock:
            self.sessions[session_id] = session

        return session

    def list_sessions(self):
        return list(self.sessions.keys())

    def delete_session(self, session_id: str) -> bool:
        """Delete the given session."""
        with self.lock:
            if session_id in self.sessions:
                session = self.sessions[session_id]
                if session.agent:
                    session.agent.reset()
                del self.sessions[session_id]
                return True
            return False

    def get_session_info(self, session_id: str) -> Optional[Dict]:
        """Get detailed information about a session."""
        session = self.get_session(session_id)
        if session:
            return {
                "session_id": session.session_id,
                "is_running": session.is_running,
                "progress": session.progress_percentage,
                "status": session.status_message,
                "current_round": session.current_round,
                "max_rounds": session.max_rounds,
                "created_at": session.created_at,
                "last_updated": session.last_updated,
                "user_requirement": session.user_requirement[:100] + "..." if len(session.user_requirement) > 100 else session.user_requirement,
                "script_path": session.reusable_script  # New: return the script path
            }
        return None

session_manager = SessionManager()

# Mount static files
os.makedirs("web/static", exist_ok=True)
os.makedirs("uploads", exist_ok=True)
os.makedirs("outputs", exist_ok=True)

app.mount("/static", StaticFiles(directory="web/static"), name="static")
app.mount("/outputs", StaticFiles(directory="outputs"), name="outputs")

# --- Helper Functions ---

def run_analysis_task(session_id: str, files: list, user_requirement: str, is_followup: bool = False):
    """
    Runs the analysis agent in a background thread for a specific session.
    """
    session = session_manager.get_session(session_id)
    if not session:
        print(f"Error: Session {session_id} not found in background task.")
        return

    session.is_running = True
    try:
        # Create the session directory if it does not exist (a follow-up reuses the existing one)
        base_output_dir = "outputs"

        if not session.output_dir:
            session.output_dir = create_session_output_dir(base_output_dir, user_requirement)

        session_output_dir = session.output_dir

        # Initialize log capturing
        session.log_file = os.path.join(session_output_dir, "process.log")

        # NOTE on logging: the Agent logs via print(), so we capture its output by
        # redirecting sys.stdout. sys.stdout is process-global, so this is NOT
        # thread-safe: if several sessions run concurrently, their output will mix.
        # The proper fix is to refactor the Agent to take a logger / log callback
        # instead of printing. For now we accept the limitation and assume only one
        # analysis runs at a time; each session still gets its own log file.

        with open(session.log_file, "a" if is_followup else "w", encoding="utf-8") as f:
            if is_followup:
                f.write(f"\n--- Follow-up Session {session_id} Continued ---\n")
            else:
                f.write(f"--- Session {session_id} Started ---\n")

        # Tee stdout to the session's log file (see the note above: only one
        # active session at a time is safe with this global redirection).
        class FileLogger:
            def __init__(self, filename):
                self.terminal = sys.__stdout__
                self.log = open(filename, "a", encoding="utf-8", buffering=1)

            def write(self, message):
                self.terminal.write(message)
                self.log.write(message)

            def flush(self):
                self.terminal.flush()
                self.log.flush()

            def close(self):
                self.log.close()

        logger = FileLogger(session.log_file)
        sys.stdout = logger  # Global hijack!

        try:
            if not is_followup:
                llm_config = LLMConfig()
                agent = DataAnalysisAgent(llm_config, force_max_rounds=False, output_dir=base_output_dir)
                session.agent = agent

                result = agent.analyze(
                    user_input=user_requirement,
                    files=files,
                    session_output_dir=session_output_dir,
                    reset_session=True
                )
            else:
                agent = session.agent
                if not agent:
                    print("Error: Agent not initialized for follow-up.")
                    return

                result = agent.analyze(
                    user_input=user_requirement,
                    files=None,
                    session_output_dir=session_output_dir,
                    reset_session=False,
                    max_rounds=10
                )

            session.generated_report = result.get("report_file_path", None)
            session.analysis_results = result.get("analysis_results", [])
            session.reusable_script = result.get("reusable_script_path", None)  # New: save the script path

            # Save results to json for persistence
            with open(os.path.join(session_output_dir, "results.json"), "w") as f:
                json.dump(session.analysis_results, f, default=str)

        except Exception as e:
            print(f"Error during analysis: {e}")
        finally:
            sys.stdout = logger.terminal
            logger.close()

    except Exception as e:
        print(f"System Error: {e}")
    finally:
        session.is_running = False

# --- Pydantic Models ---

class StartRequest(BaseModel):
    requirement: str

class ChatRequest(BaseModel):
    session_id: str
    message: str

# --- API Endpoints ---

@app.get("/")
async def read_root():
    return FileResponse("web/static/index.html")

@app.post("/api/upload")
async def upload_files(files: list[UploadFile] = File(...)):
    saved_files = []
    for file in files:
        # Use only the basename to avoid path traversal via the uploaded filename
        file_location = f"uploads/{os.path.basename(file.filename)}"
        with open(file_location, "wb+") as file_object:
            file_object.write(file.file.read())
        saved_files.append(file_location)
    return {"info": f"Saved {len(saved_files)} files", "paths": saved_files}

@app.post("/api/start")
async def start_analysis(request: StartRequest, background_tasks: BackgroundTasks):
    session_id = session_manager.create_session()

    files = glob.glob("uploads/*.csv")
    if not files:
        if os.path.exists("cleaned_data.csv"):
            files = ["cleaned_data.csv"]
        else:
            raise HTTPException(status_code=400, detail="No CSV files found")

    files = [os.path.abspath(f) for f in files]  # Only use absolute paths

    background_tasks.add_task(run_analysis_task, session_id, files, request.requirement, is_followup=False)
    return {"status": "started", "session_id": session_id}

@app.post("/api/chat")
async def chat_analysis(request: ChatRequest, background_tasks: BackgroundTasks):
    session = session_manager.get_session(request.session_id)
    if not session:
        raise HTTPException(status_code=404, detail="Session not found")

    if session.is_running:
        raise HTTPException(status_code=400, detail="Analysis already in progress")

    background_tasks.add_task(run_analysis_task, request.session_id, [], request.message, is_followup=True)
    return {"status": "started"}

@app.get("/api/status")
async def get_status(session_id: str = Query(..., description="Session ID")):
    session = session_manager.get_session(session_id)
    if not session:
        raise HTTPException(status_code=404, detail="Session not found")

    log_content = ""
    if session.log_file and os.path.exists(session.log_file):
        with open(session.log_file, "r", encoding="utf-8") as f:
            log_content = f.read()

    return {
        "is_running": session.is_running,
        "log": log_content,
        "has_report": session.generated_report is not None,
        "report_path": session.generated_report,
        "script_path": session.reusable_script  # New: return the script path
    }

@app.get("/api/export")
async def export_session(session_id: str = Query(..., description="Session ID")):
    session = session_manager.get_session(session_id)
    if not session:
        raise HTTPException(status_code=404, detail="Session not found")

    if not session.output_dir or not os.path.exists(session.output_dir):
        raise HTTPException(status_code=404, detail="No data available for export")

    # Create a zip file
    import shutil

    # We want to zip the contents of session_output_dir.
    # The zip path must live outside that directory so the archive is not zipped into itself.
    zip_base_name = os.path.join("outputs", f"export_{session_id}")

    # shutil.make_archive expects base_name (without extension) and root_dir
    archive_path = shutil.make_archive(zip_base_name, 'zip', session.output_dir)

    return FileResponse(archive_path, media_type='application/zip', filename=f"analysis_export_{session_id}.zip")

@app.get("/api/report")
async def get_report(session_id: str = Query(..., description="Session ID")):
    session = session_manager.get_session(session_id)
    if not session:
        raise HTTPException(status_code=404, detail="Session not found")

    if not session.generated_report or not os.path.exists(session.generated_report):
        return {"content": "Report not ready."}

    with open(session.generated_report, "r", encoding="utf-8") as f:
        content = f.read()

    # Fix image paths
    relative_session_path = os.path.relpath(session.output_dir, os.getcwd())
    web_base_path = f"/{relative_session_path}"

    # Robust image path replacement
    # 1. Replace explicit relative paths ./image.png
    content = content.replace("](./", f"]({web_base_path}/")

    # 2. Replace bare paths like ](image.png), but NOT ](http...), ](/...) or ](data:...)
    import re

    def replace_link(match):
        alt = match.group(1)
        url = match.group(2)
        if url.startswith("http") or url.startswith("/") or url.startswith("data:"):
            return match.group(0)
        # Strip a leading "./" if one is still present.
        # (str.lstrip("./") would be wrong here: it strips any run of '.' and '/' characters.)
        clean_url = url[2:] if url.startswith("./") else url
        return f"![{alt}]({web_base_path}/{clean_url})"

    content = re.sub(r'!\[(.*?)\]\((.*?)\)', replace_link, content)

    return {"content": content, "base_path": web_base_path}

@app.get("/api/figures")
async def get_figures(session_id: str = Query(..., description="Session ID")):
    session = session_manager.get_session(session_id)
    if not session:
        raise HTTPException(status_code=404, detail="Session not found")

    # Try in-memory results first
    results = session.analysis_results

    # If empty in memory (e.g. the server restarted but files exist), try loading the json
    if not results and session.output_dir:
        json_path = os.path.join(session.output_dir, "results.json")
        if os.path.exists(json_path):
            with open(json_path, 'r') as f:
                results = json.load(f)

    # Extract collected figures
    figures = []

    # Iterate over analysis results to find 'collect_figures' actions
    if results:
        for item in results:
            if item.get("action") == "collect_figures":
                collected = item.get("collected_figures", [])
                for fig in collected:
                    # Enrich with web path
                    if session.output_dir:
                        # Assume filename is present
                        fname = fig.get("filename")
                        relative_session_path = os.path.relpath(session.output_dir, os.getcwd())
                        fig["web_url"] = f"/{relative_session_path}/{fname}"
                    figures.append(fig)

    # 'generate_code' results might also produce figures implicitly, but the
    # 'collect_figures' action is the reliable source as per the agent design.

    # Auto-discovery fallback if the list is empty but PNGs exist
    if not figures and session.output_dir:
        # Simple scan
        pngs = glob.glob(os.path.join(session.output_dir, "*.png"))
        for p in pngs:
            fname = os.path.basename(p)
            relative_session_path = os.path.relpath(session.output_dir, os.getcwd())
            figures.append({
                "filename": fname,
                "description": "Auto-discovered image",
                "analysis": "No analysis available",
                "web_url": f"/{relative_session_path}/{fname}"
            })

    return {"figures": figures}

@app.get("/api/download_script")
async def download_script(session_id: str = Query(..., description="Session ID")):
    """Download the generated Python script."""
    session = session_manager.get_session(session_id)
    if not session or not session.reusable_script:
        raise HTTPException(status_code=404, detail="Script not found")

    if not os.path.exists(session.reusable_script):
        raise HTTPException(status_code=404, detail="Script file missing on server")

    return FileResponse(
        path=session.reusable_script,
        filename=os.path.basename(session.reusable_script),
        media_type='text/x-python'
    )

# --- Tools API ---


# --- New API endpoints ---

@app.get("/api/sessions/progress")
async def get_session_progress(session_id: str = Query(..., description="Session ID")):
    """Get the analysis progress of a session."""
    session_info = session_manager.get_session_info(session_id)
    if not session_info:
        raise HTTPException(status_code=404, detail="Session not found")
    return session_info


@app.get("/api/sessions/list")
async def list_all_sessions():
    """Get the list of all sessions."""
    session_ids = session_manager.list_sessions()
    sessions_info = []

    for sid in session_ids:
        info = session_manager.get_session_info(sid)
        if info:
            sessions_info.append(info)

    return {"sessions": sessions_info, "total": len(sessions_info)}


@app.delete("/api/sessions/{session_id}")
async def delete_specific_session(session_id: str):
    """Delete the given session."""
    success = session_manager.delete_session(session_id)
    if not success:
        raise HTTPException(status_code=404, detail="Session not found")
    return {"status": "deleted", "session_id": session_id}

# --- History API ---

@app.get("/api/history")
async def get_history():
    """
    Get list of past analysis sessions from outputs directory
    """
    history = []
    output_base = "outputs"

    if not os.path.exists(output_base):
        return {"history": []}

    try:
        # Scan for session_* directories
        for entry in os.scandir(output_base):
            if entry.is_dir() and entry.name.startswith("session_"):
                # Extract timestamp from folder name: session_20250101_120000
                session_id = entry.name.replace("session_", "")

                # Check creation time or extract from name
                try:
                    # Try to parse a YYYYMMDD_HHMMSS timestamp from the ID
                    timestamp_str = session_id
                    dt = datetime.strptime(timestamp_str, "%Y%m%d_%H%M%S")
                    display_time = dt.strftime("%Y-%m-%d %H:%M:%S")
                    sort_key = dt.timestamp()
                except ValueError:
                    # Fallback to file creation time
                    sort_key = entry.stat().st_ctime
                    display_time = datetime.fromtimestamp(sort_key).strftime("%Y-%m-%d %H:%M:%S")

                history.append({
                    "id": session_id,
                    "timestamp": display_time,
                    "sort_key": sort_key,
                    "name": f"Session {display_time}"
                })

        # Sort by latest first
        history.sort(key=lambda x: x["sort_key"], reverse=True)

        # Clean up the internal sort key
        for item in history:
            del item["sort_key"]

        return {"history": history}

    except Exception as e:
        print(f"Error scanning history: {e}")
        return {"history": []}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

535 web/static/clean_style.css Normal file
@@ -0,0 +1,535 @@
/* Clean Style - IOV Data Analysis Agent */

:root {
    --primary-color: #2563EB; /* Tech Blue */
    --primary-hover: #1D4ED8;
    --bg-color: #FFFFFF;
    --sidebar-bg: #F9FAFB;
    --text-primary: #111827;
    --text-secondary: #6B7280;
    --border-color: #E5E7EB;
    --card-shadow: 0 1px 3px 0 rgba(0, 0, 0, 0.1), 0 1px 2px 0 rgba(0, 0, 0, 0.06);
    --font-family: 'Inter', -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
}

* {
    box-sizing: border-box;
    margin: 0;
    padding: 0;
}

body {
    font-family: var(--font-family);
    color: var(--text-primary);
    background-color: var(--bg-color);
    line-height: 1.5;
    height: 100vh;
    overflow: hidden;
}

.app-container {
    display: flex;
    height: 100vh;
}

/* Sidebar */
.sidebar {
    width: 240px; /* Compact width */
    background-color: var(--sidebar-bg);
    border-right: 1px solid var(--border-color);
    display: flex;
    flex-direction: column;
    padding: 1rem;
    flex-shrink: 0;
}

.brand {
    display: flex;
    align-items: center;
    gap: 0.75rem;
    margin-bottom: 1.5rem;
    font-weight: 600;
    color: var(--text-primary);
}

.brand i {
    color: var(--primary-color);
    font-size: 1.5rem;
}

.nav-menu {
    display: flex;
    flex-direction: column;
    gap: 0.5rem;
    flex: 1;
    overflow-y: hidden; /* Let history list handle scroll */
}

.nav-item {
    display: flex;
    align-items: center;
    gap: 0.75rem;
    padding: 0.75rem 1rem;
    border-radius: 0.375rem;
    color: var(--text-secondary);
    text-decoration: none;
    cursor: pointer;
    transition: all 0.2s;
    font-size: 0.95rem;
    border: none;
    background: none;
    width: 100%;
    text-align: left;
}

.nav-item:hover {
    background-color: #F3F4F6;
    color: var(--text-primary);
}

.nav-item.active {
    background-color: #EFF6FF;
    color: var(--primary-color);
    font-weight: 500;
}

.nav-item i {
    width: 1.25rem;
    text-align: center;
}

.nav-divider {
    height: 1px;
    background-color: var(--border-color);
    margin: 1rem 0 0.5rem 0;
}

.nav-section-title {
    font-size: 0.75rem;
    text-transform: uppercase;
    color: var(--text-secondary);
    font-weight: 600;
    letter-spacing: 0.05em;
    margin-bottom: 0.5rem;
    padding-left: 0.5rem;
}

/* History List */
.history-list {
    flex: 1;
    overflow-y: auto;
    display: flex;
    flex-direction: column;
    gap: 0.25rem;
    padding-right: 5px;
}

.history-item {
    font-size: 0.85rem;
    color: var(--text-secondary);
    padding: 0.5rem 0.75rem;
    border-radius: 0.375rem;
    cursor: pointer;
    transition: all 0.2s;
    white-space: nowrap;
    overflow: hidden;
    text-overflow: ellipsis;
    display: flex;
    align-items: center;
    gap: 0.5rem;
}

.history-item:hover {
    background-color: #F3F4F6;
    color: var(--text-primary);
}

.history-item.active {
    background-color: #EFF6FF;
    color: var(--primary-color);
}

.status-bar {
    margin-top: auto;
    padding-top: 1rem;
    border-top: 1px solid var(--border-color);
    display: flex;
    align-items: center;
    gap: 0.5rem;
    font-size: 0.875rem;
    color: var(--text-secondary);
}

.status-dot {
    width: 8px;
    height: 8px;
    border-radius: 50%;
    background-color: #D1D5DB;
}

.status-dot.running {
    background-color: var(--primary-color);
    box-shadow: 0 0 0 2px rgba(37, 99, 235, 0.2);
}

/* Main Content */
.main-content {
    flex: 1;
    display: flex;
    flex-direction: column;
    height: 100vh;
    overflow: hidden;
    background-color: #FFFFFF;
}

.header {
    height: 64px;
    border-bottom: 1px solid var(--border-color);
    display: flex;
    align-items: center;
    padding: 0 2rem;
    background-color: #FFFFFF;
}

.header h2 {
    font-size: 1.25rem;
    font-weight: 600;
}

.content-area {
    flex: 1;
    overflow-y: auto;
    padding: 2rem;
    background-color: #ffffff;
}

/* Sections & Panel */
.section {
    display: none;
    max-width: 1000px;
    margin: 0 auto;
}

.section.active {
    display: block;
}

.analysis-grid {
    display: grid;
    grid-template-columns: 350px 1fr;
    gap: 2rem;
    height: calc(100vh - 64px - 4rem);
}

.panel {
    background: #FFFFFF;
    border: 1px solid var(--border-color);
    border-radius: 0.5rem;
    padding: 1.5rem;
    display: flex;
    flex-direction: column;
    gap: 1.5rem;
}

.panel-title {
    font-size: 1rem;
    font-weight: 600;
    color: var(--text-primary);
    margin-bottom: 0.5rem;
    display: flex;
    align-items: center;
    justify-content: space-between;
}

/* Forms */
.form-group {
    display: flex;
    flex-direction: column;
    gap: 0.5rem;
}

.form-label {
    font-size: 0.875rem;
    font-weight: 500;
    color: var(--text-secondary);
}

.form-input,
.form-textarea {
    padding: 0.625rem 0.875rem;
    border: 1px solid var(--border-color);
    border-radius: 0.375rem;
    font-family: inherit;
    font-size: 0.9rem;
    color: var(--text-primary);
    outline: none;
    transition: border-color 0.2s;
    width: 100%;
}

.form-input:focus,
.form-textarea:focus {
    border-color: var(--primary-color);
    box-shadow: 0 0 0 2px rgba(37, 99, 235, 0.1);
}

.form-textarea {
    resize: vertical;
    min-height: 100px;
}

/* Buttons */
.btn {
    display: inline-flex;
    align-items: center;
    justify-content: center;
    gap: 0.5rem;
    padding: 0.625rem 1.25rem;
    border-radius: 0.375rem;
    font-weight: 500;
    font-size: 0.9rem;
    cursor: pointer;
    transition: all 0.2s;
    border: 1px solid transparent;
}

.btn-primary {
    background-color: var(--primary-color);
    color: white;
}

.btn-primary:hover {
    background-color: var(--primary-hover);
}

.btn-secondary {
    background-color: white;
    border-color: var(--border-color);
    color: var(--text-primary);
}

.btn-secondary:hover {
    background-color: #F9FAFB;
    border-color: #D1D5DB;
}

.btn-sm {
    padding: 0.375rem 0.75rem;
    font-size: 0.875rem;
}

/* Upload Area */
.upload-area {
    border: 2px dashed var(--border-color);
    border-radius: 0.5rem;
    padding: 2rem;
    text-align: center;
    cursor: pointer;
    transition: all 0.2s;
    background-color: #F9FAFB;
}

.upload-area:hover,
.upload-area.dragover {
    border-color: var(--primary-color);
    background-color: #EFF6FF;
}

.upload-icon {
    font-size: 1.5rem;
    color: var(--text-secondary);
    margin-bottom: 0.75rem;
}

.file-list {
    margin-top: 1rem;
    display: flex;
    flex-direction: column;
    gap: 0.5rem;
}

.file-item {
    display: flex;
    align-items: center;
    gap: 0.5rem;
    font-size: 0.85rem;
    color: var(--text-primary);
    background: #FFFFFF;
    padding: 0.5rem;
    border: 1px solid var(--border-color);
    border-radius: 0.25rem;
}

/* Tabs */
.tabs {
    display: flex;
    gap: 1rem;
    margin-left: 1rem;
}

.tab {
    padding: 0.25rem 0.5rem;
    font-size: 0.9rem;
    color: var(--text-secondary);
    cursor: pointer;
    border-bottom: 2px solid transparent;
    transition: all 0.2s;
}

.tab:hover {
    color: var(--text-primary);
}

.tab.active {
    color: var(--primary-color);
    border-bottom-color: var(--primary-color);
    font-weight: 500;
}

/* Log & Report Content */
.output-container {
    flex: 1;
    overflow-y: hidden; /* Individual tabs scroll */
    background: #F9FAFB;
    border: 1px solid var(--border-color);
    border-radius: 0.375rem;
    padding: 1rem;
    position: relative;
    display: flex;
    flex-direction: column;
}

#logsTab {
    background-color: #1a1b26;
    color: #a9b1d6;
    font-family: 'JetBrains Mono', 'Menlo', 'Monaco', 'Courier New', monospace;
    padding: 1.5rem;
}

.log-content {
    font-family: inherit;
    font-size: 0.85rem;
    white-space: pre-wrap;
    line-height: 1.6;
    margin: 0;
}

.report-content {
    font-size: 0.95rem;
    line-height: 1.7;
    color: #1F2937;
}

.report-content img {
    max-width: 100%;
    border-radius: 0.375rem;
    margin: 1rem 0;
    box-shadow: var(--card-shadow);
}

/* Empty State */
.empty-state {
    text-align: center;
    padding: 4rem 2rem;
    color: var(--text-secondary);
}

/* Utilities */
.hidden {
    display: none !important;
}

/* Gallery Carousel */
.carousel-container {
    position: relative;
    width: 100%;
    flex: 1;
    display: flex;
    align-items: center;
    justify-content: center;
    background: #F3F4F6;
    border-radius: 0.5rem;
    overflow: hidden;
    margin-bottom: 1rem;
}

.carousel-slide {
    width: 100%;
    height: 100%;
    display: flex;
    flex-direction: column;
    align-items: center;
    justify-content: center;
    padding: 2rem;
}

.carousel-slide img {
    max-width: 100%;
    max-height: 500px;
    object-fit: contain;
    border-radius: 0.25rem;
    box-shadow: 0 10px 15px -3px rgba(0, 0, 0, 0.1);
    transition: transform 0.2s;
    background: white;
}

.carousel-btn {
    position: absolute;
    top: 50%;
    transform: translateY(-50%);
    background: rgba(255, 255, 255, 0.9);
    border: 1px solid var(--border-color);
    border-radius: 50%;
    width: 44px;
    height: 44px;
    display: flex;
    align-items: center;
    justify-content: center;
    cursor: pointer;
    z-index: 10;
    color: var(--text-primary);
    box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1);
    transition: all 0.2s;
}

.carousel-btn:hover {
    background: var(--primary-color);
    color: white;
    border-color: var(--primary-color);
    transform: translateY(-50%) scale(1.1);
}

.carousel-btn.prev {
    left: 1rem;
}

.carousel-btn.next {
    right: 1rem;
}

.image-info {
    width: 100%;
    text-align: center;
    color: var(--text-primary);
    background: white;
    padding: 1rem;
    border-radius: 0.5rem;
    border: 1px solid var(--border-color);
}

.image-title {
    font-weight: 600;
    font-size: 1.1rem;
    margin-bottom: 0.5rem;
|
||||
color: var(--primary-color);
|
||||
}
|
||||
|
||||
.image-desc {
|
||||
font-size: 0.9rem;
|
||||
color: var(--text-secondary);
|
||||
}
|
||||
168 web/static/index.html Normal file
@@ -0,0 +1,168 @@
<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>IOV Data Analysis Agent</title>
    <link rel="stylesheet" href="/static/clean_style.css">

    <!-- Fonts -->
    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
    <link
        href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600&family=JetBrains+Mono:wght@400;500&display=swap"
        rel="stylesheet">

    <!-- Icons -->
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">

    <!-- Markdown -->
    <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
</head>

<body>
    <div class="app-container">
        <!-- Sidebar -->
        <aside class="sidebar">
            <div class="brand">
                <i class="fa-solid fa-cube"></i>
                <span>IOV Agent</span>
            </div>

            <nav class="nav-menu">
                <button class="nav-item active" onclick="switchView('analysis')">
                    <i class="fa-solid fa-chart-line"></i> Analysis
                </button>

                <div class="nav-divider"></div>
                <div class="nav-section-title">History</div>
                <div id="historyList" class="history-list">
                    <!-- History items loaded via JS -->
                    <div style="padding:0.5rem; font-size:0.8rem; color:#9CA3AF;">Loading...</div>
                </div>
            </nav>

            <div class="status-bar">
                <div id="statusDot" class="status-dot"></div>
                <span id="statusText">Ready</span>
            </div>
        </aside>

        <!-- Main Content -->
        <main class="main-content">
            <header class="header">
                <h2 id="pageTitle">Analysis Dashboard</h2>
            </header>

            <div class="content-area">
                <!-- VIEW: ANALYSIS -->
                <div id="viewAnalysis" class="section active">
                    <div class="analysis-grid">

                        <!-- Configuration Panel -->
                        <div class="panel">
                            <div class="panel-title">
                                <span>Configuration</span>
                            </div>

                            <div class="form-group">
                                <label class="form-label">1. Data Upload</label>
                                <div id="uploadZone" class="upload-area">
                                    <i class="fa-solid fa-cloud-arrow-up upload-icon"></i>
                                    <p>Click or Drag CSV/Excel Files</p>
                                    <div id="fileList" class="file-list"></div>
                                </div>
                                <input type="file" id="fileInput" multiple accept=".csv,.xlsx,.xls" hidden>
                            </div>

                            <div class="form-group">
                                <label class="form-label">2. Requirement</label>
                                <textarea id="requirementInput" class="form-textarea"
                                    placeholder="Describe what you want to analyze..."></textarea>
                            </div>

                            <button id="startBtn" class="btn btn-primary" style="margin-top: 1rem; width: 100%;">
                                <i class="fa-solid fa-play"></i> Start Analysis
                            </button>
                        </div>

                        <!-- Output Panel -->
                        <div class="panel" style="overflow:hidden; display:flex; flex-direction:column;">
                            <div class="panel-title" style="margin-bottom:0.5rem;">
                                <span>Output</span>
                                <div class="tabs">
                                    <div class="tab active" onclick="switchTab('logs')">Live Log</div>
                                    <div class="tab" onclick="switchTab('report')">Report</div>
                                    <div class="tab" onclick="switchTab('gallery')">Gallery</div>
                                </div>
                                <button id="downloadScriptBtn" class="btn btn-sm btn-secondary hidden"
                                    onclick="downloadScript()" style="margin-left:auto;">
                                    <i class="fa-solid fa-code"></i> Script
                                </button>
                            </div>

                            <div class="output-container" id="outputContainer">
                                <!-- Logs Tab -->
                                <div id="logsTab" class="tab-content active" style="height:100%; overflow-y:auto;">
                                    <pre id="logOutput" class="log-content">Waiting to start...</pre>
                                </div>

                                <!-- Report Tab -->
                                <div id="reportTab" class="tab-content hidden" style="height:100%; overflow-y:auto;">
                                    <div id="reportContainer" class="report-content markdown-body">
                                        <div class="empty-state">
                                            <p>Report will appear here after analysis.</p>
                                        </div>
                                    </div>
                                    <div id="followUpSection" class="hidden"
                                        style="margin-top:2rem; border-top:1px solid var(--border-color); padding-top:1rem;">
                                        <div class="form-group">
                                            <label class="form-label">Follow-up Analysis</label>
                                            <div style="display:flex; gap:0.5rem;">
                                                <input type="text" id="followUpInput" class="form-input"
                                                    placeholder="Ask a follow-up question...">
                                                <button class="btn btn-primary btn-sm"
                                                    onclick="sendFollowUp()">Send</button>
                                            </div>
                                        </div>
                                    </div>
                                    <div style="margin-top:1rem; text-align:right">
                                        <button id="exportBtn" class="btn btn-secondary btn-sm"
                                            onclick="triggerExport()">
                                            <i class="fa-solid fa-download"></i> Export ZIP
                                        </button>
                                    </div>
                                </div>

                                <!-- Gallery Tab -->
                                <div id="galleryTab" class="tab-content hidden"
                                    style="height:100%; display:flex; flex-direction:column; align-items:center; justify-content:center;">
                                    <div class="carousel-container">
                                        <button class="carousel-btn prev" onclick="prevImage()"><i
                                                class="fa-solid fa-chevron-left"></i></button>
                                        <div class="carousel-slide" id="carouselSlide">
                                            <p class="placeholder-text" style="color:var(--text-secondary);">No images
                                                generated.</p>
                                        </div>
                                        <button class="carousel-btn next" onclick="nextImage()"><i
                                                class="fa-solid fa-chevron-right"></i></button>
                                    </div>
                                    <div class="image-info" id="imageInfo" style="margin-top:1rem; text-align:center;">
                                        <!-- Title/Desc -->
                                    </div>
                                </div>
                            </div>
                        </div>

                    </div>
                </div>

            </div>
        </main>
    </div>

    <script src="/static/script.js"></script>
</body>

</html>
434 web/static/script.js Normal file
@@ -0,0 +1,434 @@
// DOM Elements
const uploadZone = document.getElementById('uploadZone');
const fileInput = document.getElementById('fileInput');
const fileList = document.getElementById('fileList');
const startBtn = document.getElementById('startBtn');
const requirementInput = document.getElementById('requirementInput');
const statusDot = document.getElementById('statusDot');
const statusText = document.getElementById('statusText');
const logOutput = document.getElementById('logOutput');
const reportContainer = document.getElementById('reportContainer');
const downloadScriptBtn = document.getElementById('downloadScriptBtn');

let isRunning = false;
let pollingInterval = null;
let currentSessionId = null;

// --- Upload Logic ---
if (uploadZone) {
    uploadZone.addEventListener('dragover', (e) => {
        e.preventDefault();
        uploadZone.classList.add('dragover');
    });
    uploadZone.addEventListener('dragleave', () => uploadZone.classList.remove('dragover'));
    uploadZone.addEventListener('drop', (e) => {
        e.preventDefault();
        uploadZone.classList.remove('dragover');
        handleFiles(e.dataTransfer.files);
    });
    uploadZone.addEventListener('click', () => fileInput.click());
}

if (fileInput) {
    fileInput.addEventListener('change', (e) => handleFiles(e.target.files));
    fileInput.addEventListener('click', (e) => e.stopPropagation()); // Prevent bubbling to uploadZone
}

async function handleFiles(files) {
    if (files.length === 0) return;

    fileList.innerHTML = '';
    const formData = new FormData();

    for (const file of files) {
        formData.append('files', file);
        const fileItem = document.createElement('div');
        fileItem.className = 'file-item';
        fileItem.innerHTML = `<i class="fa-regular fa-file-excel"></i> ${file.name}`;
        fileList.appendChild(fileItem);
    }

    try {
        const res = await fetch('/api/upload', {
            method: 'POST',
            body: formData
        });
        if (res.ok) {
            console.log('Upload success');
        } else {
            alert('Upload failed');
        }
    } catch (e) {
        console.error(e);
        alert('Upload failed');
    }
}

// --- Analysis Logic ---
if (startBtn) {
    startBtn.addEventListener('click', startAnalysis);
}

async function startAnalysis() {
    if (isRunning) return;

    const requirement = requirementInput.value.trim();
    if (!requirement) {
        alert('Please enter analysis requirement');
        return;
    }

    setRunningState(true);

    try {
        const res = await fetch('/api/start', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ requirement })
        });

        if (res.ok) {
            const data = await res.json();
            currentSessionId = data.session_id;
            console.log("Started Session:", currentSessionId);

            startPolling();
            switchTab('logs');
        } else {
            const err = await res.json();
            alert('Failed to start: ' + err.detail);
            setRunningState(false);
        }
    } catch (e) {
        console.error(e);
        alert('Error starting analysis');
        setRunningState(false);
    }
}
function setRunningState(running) {
    isRunning = running;
    startBtn.disabled = running;

    if (running) {
        startBtn.innerHTML = '<i class="fa-solid fa-spinner fa-spin"></i> Analysis in Progress...';
        statusDot.className = 'status-dot running';
        statusText.innerText = 'Analyzing';
        statusText.style.color = 'var(--primary-color)';

        // Hide follow-up and download during run
        const followUpSection = document.getElementById('followUpSection');
        if (followUpSection) followUpSection.classList.add('hidden');
        if (downloadScriptBtn) downloadScriptBtn.classList.add('hidden');
    } else {
        startBtn.innerHTML = '<i class="fa-solid fa-play"></i> Start Analysis';
        statusDot.className = 'status-dot';
        statusText.innerText = 'Completed';
        statusText.style.color = 'var(--text-secondary)';

        const followUpSection = document.getElementById('followUpSection');
        if (currentSessionId && followUpSection) {
            followUpSection.classList.remove('hidden');
        }
    }
}

function startPolling() {
    if (pollingInterval) clearInterval(pollingInterval);
    if (!currentSessionId) return;

    pollingInterval = setInterval(async () => {
        try {
            const res = await fetch(`/api/status?session_id=${currentSessionId}`);
            if (!res.ok) return;
            const data = await res.json();

            // Update Logs
            logOutput.innerText = data.log || "Waiting for output...";

            // Auto scroll
            const logTab = document.getElementById('logsTab');
            if (logTab) logTab.scrollTop = logTab.scrollHeight;

            if (!data.is_running && isRunning) {
                // Finished
                setRunningState(false);
                clearInterval(pollingInterval);

                if (data.has_report) {
                    await loadReport();

                    // Force switch to the Report tab
                    switchTab('report');
                    console.log("Analysis done, switched to report tab");
                }

                // Check for script
                if (data.script_path) {
                    if (downloadScriptBtn) {
                        downloadScriptBtn.classList.remove('hidden');
                        downloadScriptBtn.style.display = 'inline-flex';
                    }
                }
            }
        } catch (e) {
            console.error('Polling error', e);
        }
    }, 2000);
}
// --- Report Logic ---
async function loadReport() {
    if (!currentSessionId) return;
    try {
        const res = await fetch(`/api/report?session_id=${currentSessionId}`);
        const data = await res.json();

        if (!data.content || data.content === "Report not ready.") {
            reportContainer.innerHTML = '<div class="empty-state"><p>Analysis in progress or no report generated yet.</p></div>';
        } else {
            reportContainer.innerHTML = marked.parse(data.content);
        }
    } catch (e) {
        reportContainer.innerHTML = '<p class="error">Failed to load report.</p>';
    }
}

// --- Gallery Logic ---
let galleryImages = [];
let currentImageIndex = 0;

async function loadGallery() {
    if (!currentSessionId) return;
    try {
        const res = await fetch(`/api/figures?session_id=${currentSessionId}`);
        const data = await res.json();

        galleryImages = data.figures || [];
        currentImageIndex = 0;
        renderGalleryImage();

    } catch (e) {
        console.error("Gallery load failed", e);
        document.getElementById('carouselSlide').innerHTML = '<p class="error">Failed to load images.</p>';
    }
}

function renderGalleryImage() {
    const slide = document.getElementById('carouselSlide');
    const info = document.getElementById('imageInfo');

    if (galleryImages.length === 0) {
        slide.innerHTML = '<p class="placeholder-text" style="color:var(--text-secondary);">No images generated in this session.</p>';
        info.innerHTML = '';
        return;
    }

    const img = galleryImages[currentImageIndex];

    // Image
    slide.innerHTML = `<img src="${img.web_url}" alt="${img.filename}" onclick="window.open('${img.web_url}', '_blank')">`;

    // Info
    info.innerHTML = `
        <div class="image-title">${img.filename} (${currentImageIndex + 1}/${galleryImages.length})</div>
        <div class="image-desc">${img.description || 'No description available.'}</div>
        ${img.analysis ? `<div style="font-size:0.8rem; margin-top:0.5rem; color:#4B5563; background:#F3F4F6; padding:0.5rem; border-radius:4px;">${img.analysis}</div>` : ''}
    `;
}

window.prevImage = function () {
    if (galleryImages.length === 0) return;
    currentImageIndex = (currentImageIndex - 1 + galleryImages.length) % galleryImages.length;
    renderGalleryImage();
}

window.nextImage = function () {
    if (galleryImages.length === 0) return;
    currentImageIndex = (currentImageIndex + 1) % galleryImages.length;
    renderGalleryImage();
}

// --- Download Script ---
window.downloadScript = async function () {
    if (!currentSessionId) return;
    const link = document.createElement('a');
    link.href = `/api/download_script?session_id=${currentSessionId}`;
    link.download = '';
    document.body.appendChild(link);
    link.click();
    document.body.removeChild(link);
}

// --- Export Report ---
window.triggerExport = async function () {
    if (!currentSessionId) {
        alert("No active session to export.");
        return;
    }
    const btn = document.getElementById('exportBtn');
    const originalContent = btn.innerHTML;
    btn.innerHTML = '<i class="fa-solid fa-spinner fa-spin"></i> Zipping...';
    btn.disabled = true;

    try {
        const url = `/api/export?session_id=${currentSessionId}`;
        window.open(url, '_blank');

    } catch (e) {
        alert("Export failed: " + e.message);
    } finally {
        setTimeout(() => {
            btn.innerHTML = originalContent;
            btn.disabled = false;
        }, 2000);
    }
}

// --- Follow-up Chat ---
window.sendFollowUp = async function () {
    if (!currentSessionId || isRunning) return;
    const input = document.getElementById('followUpInput');
    const message = input.value.trim();
    if (!message) return;

    input.disabled = true;
    try {
        const res = await fetch('/api/chat', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ session_id: currentSessionId, message: message })
        });

        if (res.ok) {
            input.value = '';
            setRunningState(true);
            startPolling();
            switchTab('logs');
        } else {
            alert('Failed to send request');
        }
    } catch (e) {
        console.error(e);
    } finally {
        input.disabled = false;
    }
}

// --- History Logic ---
async function loadHistory() {
    const list = document.getElementById('historyList');
    if (!list) return;

    try {
        const res = await fetch('/api/history');
        const data = await res.json();

        if (data.history.length === 0) {
            list.innerHTML = '<div style="padding:0.5rem; font-size:0.8rem; color:#9CA3AF;">No history yet</div>';
            return;
        }

        let html = '';
        data.history.forEach(item => {
            // item: {id, timestamp, name}
            const timeStr = item.timestamp.split(' ')[0]; // Just date for compactness
            html += `
                <div class="history-item" onclick="loadSession('${item.id}')" id="hist-${item.id}">
                    <i class="fa-regular fa-clock"></i>
                    <span>${item.id}</span>
                </div>
            `;
        });
        list.innerHTML = html;

    } catch (e) {
        console.error("Failed to load history", e);
    }
}

window.loadSession = async function (sessionId) {
    if (isRunning) {
        alert("Analysis in progress, please wait.");
        return;
    }

    currentSessionId = sessionId;

    // Update active class
    document.querySelectorAll('.history-item').forEach(el => el.classList.remove('active'));
    const activeItem = document.getElementById(`hist-${sessionId}`);
    if (activeItem) activeItem.classList.add('active');

    // Reset UI
    logOutput.innerText = "Loading session data...";
    reportContainer.innerHTML = "";
    if (downloadScriptBtn) downloadScriptBtn.classList.add('hidden');

    // Fetch Status to get logs and check report
    try {
        const res = await fetch(`/api/status?session_id=${sessionId}`);
        if (res.ok) {
            const data = await res.json();
            logOutput.innerText = data.log || "No logs available.";

            // Auto scroll log
            const logTab = document.getElementById('logsTab');
            if (logTab) logTab.scrollTop = logTab.scrollHeight;

            if (data.has_report) {
                await loadReport();
                // Check if script exists
                if (data.script_path && downloadScriptBtn) {
                    downloadScriptBtn.classList.remove('hidden');
                    downloadScriptBtn.style.display = 'inline-flex';
                }
                switchTab('report');
            } else {
                switchTab('logs');
            }
        }
    } catch (e) {
        logOutput.innerText = "Error loading session.";
    }
}

// Initialize
document.addEventListener('DOMContentLoaded', () => {
    loadHistory();
});
// --- Navigation ---
// No-op for switchView as sidebar is simplified
window.switchView = function (viewName) {
    console.log("View switch requested:", viewName);
}

window.switchTab = function (tabName) {
    // Deactivate all tab buttons, then activate the one matching tabName
    // (assumes each tab button's onclick attribute contains the tab name)
    document.querySelectorAll('.tab').forEach(t => t.classList.remove('active'));
    document.querySelectorAll('.tab').forEach(btn => {
        if (btn.getAttribute('onclick') && btn.getAttribute('onclick').includes(`'${tabName}'`)) {
            btn.classList.add('active');
        }
    });

    // Hide all tab contents
    ['logs', 'report', 'gallery'].forEach(name => {
        const content = document.getElementById(`${name}Tab`);
        if (content) content.classList.add('hidden');
    });

    // Show the selected tab
    if (tabName === 'logs') {
        document.getElementById('logsTab').classList.remove('hidden');
    } else if (tabName === 'report') {
        document.getElementById('reportTab').classList.remove('hidden');
    } else if (tabName === 'gallery') {
        document.getElementById('galleryTab').classList.remove('hidden');
        if (currentSessionId) loadGallery();
    }
}