# Output Parsing

Have the LLM emit structured data (JSON, YAML, and so on) so that programs can process it directly.

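## Why Output Parsing Is Needed

By default an LLM returns free-form text, which is hard for a program to process:

LLM output:
"I think the problem is caused by a server misconfiguration.
I suggest checking the worker_processes setting in nginx.conf,
and also taking a look at the error log."

Program: ??? How do I extract the key information?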

With structured output:

```json
{
  "root_cause": "server misconfiguration",
  "suggestions": [
    "Check worker_processes in nginx.conf",
    "Review the error log"
  ],
  "severity": "medium"
}
```

A program can parse this and use it directly.
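
As a minimal sketch of what "parse and use directly" can look like, the snippet below loads a reply shaped like the structured example above with Python's standard `json` module. The variable names `llm_output`, `report`, and `needs_escalation` are assumptions made for illustration.

```python
import json

# Hypothetical LLM reply, mirroring the structured example above
llm_output = """
{
  "root_cause": "server misconfiguration",
  "suggestions": [
    "Check worker_processes in nginx.conf",
    "Review the error log"
  ],
  "severity": "medium"
}
"""

report = json.loads(llm_output)

# The fields are now ordinary Python values
print(report["root_cause"])
for step in report["suggestions"]:
    print("-", step)

# Downstream logic can branch on a field instead of guessing from prose
needs_escalation = report["severity"] == "high"
```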

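## Method 1: Ask for Formatted Output in the Prompt

````python
# Prompt template; fill the {code} placeholder with prompt.format(code=...)
prompt = """
Analyze the following code and return the result as JSON.

Requirements:
- Use strict JSON (do not wrap it in a Markdown code block)
- Include the fields: has_bug (bool), description (string), suggestion (string)

Code:
```java
{code}
```

Output only the JSON, nothing else.
"""
````

**Problem**: the LLM does not always follow the format strictly and may add explanatory text around the JSON.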
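
One way to cope with the problem noted above is a tolerant extraction step before parsing. The sketch below is an illustration, not a library API: `extract_json` is a hypothetical helper that drops a surrounding Markdown fence if present and then looks for the first decodable JSON object in the reply.

```python
import json
import re

def extract_json(reply: str) -> dict:
    """Best-effort extraction of the first JSON object in an LLM reply."""
    text = reply.strip()
    # If the reply contains a Markdown code fence, keep only its contents
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    if fenced:
        text = fenced.group(1).strip()
    # Try every "{" as a possible start, so leading prose is skipped
    decoder = json.JSONDecoder()
    for start in (m.start() for m in re.finditer(r"\{", text)):
        try:
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in reply")

# Hypothetical reply with extra prose before and after the JSON
reply = (
    'Here is my analysis: {"has_bug": true, "description": "off-by-one", '
    '"suggestion": "use < instead of <="} Let me know if you need more.'
)
parsed = extract_json(reply)
print(parsed["has_bug"])
```

This only rescues replies that contain a complete JSON object; truncated or otherwise invalid JSON still raises an error and needs a retry.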

## Method 2: Use an Output Parser (Recommended)

### LangChain - StructuredOutputParser

```python
from langchain.chains import LLMChain
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate

# Define the output format
response_schemas = [
    ResponseSchema(name="root_cause", description="Root cause of the problem"),
    ResponseSchema(name="suggestions", description="List of suggestions (array)"),
    ResponseSchema(name="severity", description="Severity: high/medium/low")
]

parser = StructuredOutputParser.from_response_schemas(response_schemas)
format_instructions = parser.get_format_instructions()

# Add the format instructions to the prompt
prompt = PromptTemplate(
    template="Analyze the following code:\n{code}\n\n{format_instructions}",
    input_variables=["code"],
    partial_variables={"format_instructions": format_instructions}
)

# Run (assumes `llm` is an initialized LangChain model and `java_code` holds the source to analyze)
chain = LLMChain(llm=llm, prompt=prompt)
output = chain.run(code=java_code)

# Parse: extracts the JSON from the reply and raises OutputParserException if it is malformed
result = parser.parse(output)
print(result["root_cause"])
```

### LangChain - PydanticOutputParser (Preferred)

```python
from pydantic import BaseModel, Field
from langchain.output_parsers import PydanticOutputParser

# Define the output format with Pydantic
class CodeReview(BaseModel):
    has_bug: bool = Field(description="Whether the code contains a bug")
    description: str = Field(description="Description of the problem")
    suggestion: str = Field(description="Suggested fix")
    severity: str = Field(description="Severity: high/medium/low")

parser = PydanticOutputParser(pydantic_object=CodeReview)
format_instructions = parser.get_format_instructions()

# Assumes `llm` is an initialized LangChain model and `code` holds the source to analyze
prompt = f"""
Analyze the following code and follow the format requirements.

Format requirements:
{format_instructions}

Code:
{code}
"""

output = llm(prompt)  # legacy call style; newer LangChain releases use llm.invoke(prompt)
result = parser.parse(output)  # returns a CodeReview object
print(result.has_bug)
print(result.suggestion)
```

## Method 3: Function Calling (Most Reliable)

Function calling (tool use) in the OpenAI and Anthropic APIs ties the reply to a declared JSON schema, which makes the output format far more reliable than prompt instructions alone:

```python
import json
from openai import OpenAI

client = OpenAI()

# Define the function (its parameters describe the output format)
functions = [
    {
        "name": "output_code_review",
        "description": "Output the code review result",
        "parameters": {
            "type": "object",
            "properties": {
                "has_bug": {"type": "boolean"},
                "description": {"type": "string"},
                "suggestion": {"type": "string"},
                "severity": {"type": "string", "enum": ["high", "medium", "low"]}
            },
            "required": ["has_bug", "description", "suggestion", "severity"]
        }
    }
]

# Assumes `code` holds the source to review
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"Review this code:\n{code}"}],
    tools=[{"type": "function", "function": f} for f in functions],
    tool_choice={"type": "function", "function": {"name": "output_code_review"}}
)

# Parse the tool-call arguments (a JSON string that follows the schema above;
# it can still occasionally be malformed, so handle json.JSONDecodeError in production)
tool_call = response.choices[0].message.tool_calls[0]
result = json.loads(tool_call.function.arguments)
```
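
Even with function calling, it can be worth checking the arguments before using them. The sketch below is an illustration and not part of the OpenAI SDK: `parse_tool_arguments` is a hypothetical helper, `REVIEW_SCHEMA` copies the shape of the `parameters` object above, and `sample_arguments` stands in for `tool_call.function.arguments`.

```python
import json

# Same shape as the "parameters" object in the function definition above
REVIEW_SCHEMA = {
    "type": "object",
    "properties": {
        "has_bug": {"type": "boolean"},
        "description": {"type": "string"},
        "suggestion": {"type": "string"},
        "severity": {"type": "string", "enum": ["high", "medium", "low"]},
    },
    "required": ["has_bug", "description", "suggestion", "severity"],
}

def parse_tool_arguments(arguments: str, schema: dict) -> dict:
    """Hypothetical helper: decode tool-call arguments and run basic schema checks."""
    try:
        data = json.loads(arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("arguments must be a JSON object")
    # Every field listed under "required" must be present
    missing = [name for name in schema.get("required", []) if name not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Fields that declare an "enum" must use one of the listed values
    for name, spec in schema.get("properties", {}).items():
        if "enum" in spec and name in data and data[name] not in spec["enum"]:
            raise ValueError(f"{name} must be one of {spec['enum']}")
    return data

# Stand-in for tool_call.function.arguments
sample_arguments = (
    '{"has_bug": true, "description": "Possible null dereference", '
    '"suggestion": "Add a null check", "severity": "high"}'
)
review = parse_tool_arguments(sample_arguments, REVIEW_SCHEMA)
print(review["severity"])
```

The checks cover only `required` and `enum`; a full JSON Schema validator would be the more thorough choice when the schema grows.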

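## Handling Format Errors (Retry)

LLMs occasionally produce malformed output, so the parsing step needs a retry or repair path:

```python
from langchain.output_parsers import OutputFixingParser

# Wrap the original parser so format errors are repaired automatically
fixing_parser = OutputFixingParser.from_llm(
    parser=parser,
    llm=llm
)

# OutputFixingParser tries the wrapped parser first; only if that fails
# does it ask the LLM to repair the output and parse again
result = fixing_parser.parse(output)
```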
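
Outside LangChain, the same retry idea can be sketched in plain Python. This is a minimal illustration under stated assumptions: `parse_with_retry` is a hypothetical helper, `call_llm` is any function that takes a prompt and returns the reply text, and `fake_llm` is a stand-in model that fails once and then succeeds.

```python
import json
from typing import Callable

def parse_with_retry(call_llm: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call the model, parse its reply as JSON, and re-prompt with the error on failure."""
    current_prompt = prompt
    last_error = None
    for _ in range(max_attempts):
        reply = call_llm(current_prompt)
        try:
            return json.loads(reply)
        except json.JSONDecodeError as exc:
            last_error = exc
            # Feed the bad reply and the parser error back to the model
            current_prompt = (
                f"{prompt}\n\nYour previous reply could not be parsed as JSON.\n"
                f"Reply: {reply}\nError: {exc}\nReturn only valid JSON."
            )
    raise ValueError(f"no valid JSON after {max_attempts} attempts: {last_error}")

# Fake model for demonstration: the first reply is broken, the second is valid
replies = iter(['Sure! {"has_bug": tru', '{"has_bug": true}'])

def fake_llm(prompt: str) -> str:
    return next(replies)

result = parse_with_retry(fake_llm, "Review this code and answer in JSON.")
print(result)
```

Capping the number of attempts keeps a persistently failing model from looping forever and bounds the extra API cost.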

## Common Output Formats

### JSON (Most Common)

```python
# Ask for JSON output
format_instructions = """
Output JSON in the following format:
{
  "field1": "value1",
  "field2": ["item1", "item2"]
}

Output only the JSON; do not wrap it in ```json ``` fences.
"""
```

### Markdown

```python
format_instructions = """
Output Markdown:
- Use level-2 headings (##) for sections
- Wrap code in ```language fences
- Use - for list items
"""
```

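Markdown output still has to be split up before a program can use it. The sketch below assumes the reply follows the instructions above and uses level-2 headings for sections; `split_markdown_sections` and the `sample` reply are hypothetical names for illustration.

```python
import re

def split_markdown_sections(markdown: str) -> dict[str, str]:
    """Map each level-2 heading to the text that follows it."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        heading = re.match(r"^##\s+(.*)$", line)
        if heading:
            current = heading.group(1).strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {title: body.strip() for title, body in sections.items()}

# Hypothetical LLM reply that follows the Markdown instructions
sample = """## Summary
The loop never terminates.

## Fix
- Increment the counter
- Add a bound check
"""
sections = split_markdown_sections(sample)
print(sections["Fix"])
```

This simple splitter does not skip fenced code, so a line starting with `## ` inside a code block would be treated as a heading.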
### CSV

```python
format_instructions = """
Output CSV, with the header in the first row:
name,age,city
Alice,25,Beijing
Bob,30,Shanghai
"""
```

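CSV replies can be read with Python's standard `csv` module. The snippet below is a small sketch that assumes the model returned exactly the header and rows shown in the CSV instructions; `llm_output` is a hypothetical reply.

```python
import csv
import io

# Hypothetical LLM reply in the CSV format requested above
llm_output = """name,age,city
Alice,25,Beijing
Bob,30,Shanghai
"""

# DictReader uses the first row as the field names
reader = csv.DictReader(io.StringIO(llm_output.strip()))
rows = [dict(row) for row in reader]

# CSV values are always strings; convert numeric columns explicitly
for row in rows:
    row["age"] = int(row["age"])

print(rows[0]["name"], rows[0]["age"])
```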