A Brief Look at LLM-Driven Summarization of Meeting Speech

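1. Introduction

Summarizing meeting content is key to efficient collaboration. Combining automatic speech recognition (ASR), natural language processing, and large language models (LLMs) makes it possible to generate structured summaries automatically. This article covers the main technical points and several proven implementation routes.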

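2. Architecture Overview

2.1 Overall Flow

Audio input → Speech recognition (ASR) → Text preprocessing → LLM summarization → Structured output
     ↓                   ↓                       ↓                   ↓                   ↓
Noise reduction   Speaker diarization      Text cleaning    Content understanding  Format refinement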

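2.2 Core Modules

Audio processing module: noise reduction, format conversion, segmentation
Speech recognition module: ASR, speaker identification, timestamps
Text processing module: cleaning, segmentation, deduplication
LLM module: content understanding, summary generation, structuring
Post-processing module: format refinement, key information extraction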
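
The five modules can be wired together as a chain of interchangeable stages. The sketch below is one possible arrangement, not the article's implementation: `MeetingPipeline` and its field names are hypothetical, and each stage is any callable with the shape shown in the comments.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MeetingPipeline:
    """Chains the five modules; every stage is a plain callable (assumed shapes)."""
    denoise: Callable      # audio path -> cleaned audio
    transcribe: Callable   # cleaned audio -> list of segments
    preprocess: Callable   # segments -> cleaned, structured records
    summarize: Callable    # records -> raw summary text
    postprocess: Callable  # (raw summary, records) -> structured output

    def run(self, audio_path):
        audio = self.denoise(audio_path)
        segments = self.transcribe(audio)
        records = self.preprocess(segments)
        raw_summary = self.summarize(records)
        return self.postprocess(raw_summary, records)
```

Keeping each stage behind a callable makes it possible to swap one ASR engine or LLM for another without touching the rest of the flow, and to test the wiring with stub functions.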

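3. Core Techniques

3.1 Speech Recognition (ASR)

Comparison of Mainstream Options

Option	Strengths	Best for	Cost
Whisper (OpenAI)	High accuracy, multilingual, open source	General use	Free when self-hosted (compute costs apply)
Azure Speech	Enterprise-grade, high availability	Commercial applications	Pay-as-you-go
Google Speech-to-Text	Cloud service, easy to integrate	Web applications	Pay-as-you-go
Alibaba Cloud ASR	Optimized for Chinese, hosted in mainland China	Projects in China	Pay-as-you-go
iFLYTEK ASR	Strong Chinese recognition	Chinese-language scenarios	Pay-as-you-go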

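Whisper Example

import whisper

# Load a model (base / small / medium / large); larger models are more accurate but slower
model = whisper.load_model("base")

# Transcribe the audio
result = model.transcribe(
    "meeting.mp3",
    language="zh",       # language spoken in the recording
    task="transcribe",   # transcribe rather than translate
    verbose=True,        # print segments as they are decoded
    fp16=False,          # use FP32 (avoids the FP16 warning on CPU)
)

# Full text and timestamped segments
transcription = result["text"]
segments = result["segments"]  # each segment has "start", "end", and "text"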

Speaker Diarization

from pyannote.audio import Pipeline

# Load the diarization pipeline (a gated model: accept its terms on Hugging Face first)
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="YOUR_TOKEN"  # Hugging Face access token
)

# Run diarization
diarization = pipeline("meeting.wav")

# Print each speaker turn with its time range
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")

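3.2 Text Preprocessing

Key Processing Steps

import re

def preprocess_transcription(segments, diarization):
    """
    Preprocess the transcript: clean each segment and attach a speaker and a timestamp.
    get_speaker and format_timestamp are not defined in this snippet.
    """
    processed_text = []

    for segment in segments:
        # 1. Clean the text
        text = clean_text(segment['text'])

        # 2. Attach the speaker label
        speaker = get_speaker(segment['start'], diarization)

        # 3. Format the timestamp
        timestamp = format_timestamp(segment['start'])

        # 4. Build the structured record
        processed_text.append({
            'speaker': speaker,
            'timestamp': timestamp,
            'text': text,
            'duration': segment['end'] - segment['start']
        })

    return processed_text

def clean_text(text):
    """Clean one transcript segment."""
    # Remove filler words. No \b anchors: Chinese characters count as word
    # characters, so \b only matches where a filler is set off by punctuation or spaces.
    # 那个/这个/就是 are also ordinary words, so this rule is deliberately crude.
    text = re.sub(r'(嗯|啊|那个|这个|就是)', '', text)
    # Collapse repeated punctuation
    text = re.sub(r'[。，]{2,}', '。', text)
    # Collapse runs of whitespace
    text = re.sub(r'\s+', ' ', text)
    return text.strip()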
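
The preprocessing function calls `get_speaker` and `format_timestamp` without defining them. A minimal sketch of both follows, under one assumption that differs from the call above: `get_speaker` takes a plain list of `(start, end, speaker)` tuples rather than the pyannote result object. Such a list can be built with `[(t.start, t.end, s) for t, _, s in diarization.itertracks(yield_label=True)]`. The `default` label is also an illustrative choice.

```python
def format_timestamp(seconds):
    """Format a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def get_speaker(time_point, turns, default="UNKNOWN"):
    """Return the speaker whose turn contains time_point.

    turns is a list of (start, end, speaker) tuples (an assumed shape).
    If no turn covers the time point, the default label is returned.
    """
    for start, end, speaker in turns:
        if start <= time_point < end:
            return speaker
    return default
```

Matching on the segment's start time is the simplest rule. When ASR segments straddle speaker changes, picking the turn with the largest time overlap is usually more robust.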

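3.3 LLM Summarization

Option 1: OpenAI GPT

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

def summarize_meeting(transcription_text):
    """
    Summarize meeting content with GPT.
    """
    prompt = f"""
Summarize the following meeting. Requirements:
1. Extract the key topics and decisions
2. List the action items
3. Note important dates and deadlines
4. Identify each speaker's main points

Meeting content:
{transcription_text}
"""

    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[
            {"role": "system", "content": "You are a professional meeting-minutes assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.3,   # a low temperature keeps the summary close to the source
        max_tokens=2000    # upper bound on the length of the summary
    )

    return response.choices[0].message.content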
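
The fourth requirement in the prompt, attributing views to speakers, only works if the text sent to the model carries speaker labels. The sketch below flattens the records produced in Section 3.2 into one labeled line per utterance. `render_transcript` is a hypothetical helper, and it assumes each record has `speaker`, `timestamp`, and `text` keys.

```python
def render_transcript(records):
    """Render preprocessed records as '[timestamp] speaker: text' lines.

    Records whose text is empty after cleaning are skipped.
    """
    lines = []
    for record in records:
        text = record["text"].strip()
        if not text:
            continue
        lines.append(f"[{record['timestamp']}] {record['speaker']}: {text}")
    return "\n".join(lines)
```

The returned string can be passed to the summarization function in place of the raw `result["text"]`.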

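Option 2: Claude (Anthropic)

import anthropic

client = anthropic.Anthropic(api_key="YOUR_API_KEY")

def summarize_with_claude(transcription_text):
    """
    Summarize meeting content with Claude.
    """
    message = client.messages.create(
        # Model IDs are retired over time; check the provider's current model list
        model="claude-3-opus-20240229",
        max_tokens=2000,
        temperature=0.3,
        system="You are a professional meeting-minutes assistant, skilled at extracting key information and writing structured summaries.",
        messages=[
            {
                "role": "user",
                "content": f"Summarize the following meeting:\n\n{transcription_text}"
            }
        ]
    )

    return message.content[0].text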

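Option 3: Chinese LLMs (Zhipu GLM, Tongyi Qianwen, and others)

from zhipuai import ZhipuAI

# glm-4 is served through the v2 SDK client, which mirrors the OpenAI chat interface
client = ZhipuAI(api_key="YOUR_API_KEY")

def summarize_with_glm(transcription_text):
    """
    Summarize meeting content with Zhipu GLM.
    """
    response = client.chat.completions.create(
        model="glm-4",
        messages=[
            {
                "role": "system",
                "content": "You are a professional meeting-minutes assistant."
            },
            {
                "role": "user",
                "content": f"Summarize the following meeting:\n\n{transcription_text}"
            }
        ],
        temperature=0.3,
        top_p=0.7
    )

    return response.choices[0].message.content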

3.4 Structured Output

Defining the Output Format

import re

def generate_structured_summary(raw_summary, transcription_data):
    """
    Build a structured meeting summary.
    The helpers called here, other than extract_action_items, are not defined in this snippet.
    """
    structured_summary = {
        "meeting_info": {
            "date": get_meeting_date(),
            "duration": calculate_duration(transcription_data),
            "participants": extract_participants(transcription_data)
        },
        "key_topics": extract_topics(raw_summary),
        "decisions": extract_decisions(raw_summary),
        "action_items": extract_action_items(raw_summary),
        "timeline": build_timeline(transcription_data),
        "full_summary": raw_summary
    }

    return structured_summary

def extract_action_items(text):
    """
    Extract action items from the summary text.
    """
    # A regex pass over labeled lines; an LLM call is the alternative for free-form text
    pattern = r'(?:行动项|待办|TODO|Action Item)[：:]\s*(.+?)(?:\n|$)'
    action_items = re.findall(pattern, text, re.IGNORECASE)
    return action_items
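
Three of the helpers used by `generate_structured_summary` can be derived directly from the preprocessed records. The sketch below assumes `transcription_data` is the list of dicts built in Section 3.2 (`speaker`, `timestamp`, `text`, `duration`). The function bodies and the `max_chars` parameter are illustrative, not taken from the article.

```python
def extract_participants(records):
    """Unique speaker labels in order of first appearance."""
    seen = []
    for record in records:
        if record["speaker"] not in seen:
            seen.append(record["speaker"])
    return seen


def calculate_duration(records):
    """Total speaking time in seconds (sum of segment durations).

    This excludes silence between segments, so it can be shorter
    than the wall-clock length of the meeting.
    """
    return sum(record["duration"] for record in records)


def build_timeline(records, max_chars=40):
    """One entry per segment, with the text shortened to a preview."""
    timeline = []
    for record in records:
        text = record["text"]
        preview = text if len(text) <= max_chars else text[:max_chars] + "..."
        timeline.append({
            "timestamp": record["timestamp"],
            "speaker": record["speaker"],
            "preview": preview,
        })
    return timeline
```

Topics and decisions are harder to pull out with rules and are better requested from the LLM in a fixed format.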

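4. Proven Technical Routes

4.1 Route 1: Whisper + GPT-4 (Recommended)

Best for: general use where accuracy is the priority

Tech stack:

ASR: Whisper (large-v3)
Speaker diarization: pyannote.audio
LLM: GPT-4 Turbo
Deployment: Python + FastAPI

Advantages:

High accuracy
Multilingual support
Open-source ASR component that you control

Example:

import os
import tempfile

from fastapi import FastAPI, UploadFile, File
import whisper
from openai import OpenAI

app = FastAPI()

# Load models once at startup
whisper_model = whisper.load_model("large-v3")
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_with_gpt(text):
    """Summarize text with GPT (same call as the GPT helper in Section 3.3)."""
    response = openai_client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[
            {"role": "system", "content": "You are a professional meeting-minutes assistant."},
            {"role": "user", "content": f"Summarize the following meeting:\n\n{text}"}
        ],
        temperature=0.3
    )
    return response.choices[0].message.content

# A plain def (not async def) lets FastAPI run this blocking handler in a worker thread
@app.post("/summarize-meeting")
def summarize_meeting_endpoint(audio: UploadFile = File(...)):
    # 1. Save the upload to a temp file; never build a path from the client-supplied name
    suffix = os.path.splitext(audio.filename or "")[1]
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        f.write(audio.file.read())
        audio_path = f.name

    try:
        # 2. Speech recognition
        result = whisper_model.transcribe(audio_path, language="zh")
        transcription = result["text"]

        # 3. LLM summarization
        summary = summarize_with_gpt(transcription)
    finally:
        os.remove(audio_path)

    # 4. Return the result
    return {
        "transcription": transcription,
        "summary": summary
    }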

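4.2 Route 2: Azure Speech + Claude

Best for: enterprise applications that need high availability

Tech stack:

ASR: Azure Speech Services
Speaker diarization: Azure Speech real-time diarization (conversation transcription)
LLM: Claude 3 Opus
Deployment: Node.js + Express

Advantages:

Enterprise-grade SLA
Easy to scale
Multi-region deployment

Example:

const sdk = require("microsoft-cognitiveservices-speech-sdk");
const Anthropic = require("@anthropic-ai/sdk");

// Azure Speech configuration
const speechConfig = sdk.SpeechConfig.fromSubscription(
    process.env.AZURE_SPEECH_KEY,
    process.env.AZURE_SPEECH_REGION
);
speechConfig.speechRecognitionLanguage = "zh-CN"; // the default is en-US

// Claude client
const anthropic = new Anthropic({
    apiKey: process.env.ANTHROPIC_API_KEY
});

// audioBuffer: raw PCM audio (16 kHz, 16-bit, mono, the push stream's default format)
async function transcribeAndSummarize(audioBuffer) {
    // 1. Recognize speech with Azure: feed the audio through a push stream
    const pushStream = sdk.AudioInputStream.createPushStream();
    pushStream.write(audioBuffer);
    pushStream.close();

    const audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
    const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

    const transcription = await new Promise((resolve, reject) => {
        let fullText = "";
        recognizer.recognized = (s, e) => {
            if (e.result.reason === sdk.ResultReason.RecognizedSpeech) {
                fullText += e.result.text + " ";
            }
        };
        // Surface service or network errors instead of waiting forever
        recognizer.canceled = (s, e) => {
            if (e.reason === sdk.CancellationReason.Error) {
                reject(new Error(e.errorDetails));
            }
        };
        // Fires once the whole stream has been consumed
        recognizer.sessionStopped = () => {
            recognizer.close();
            resolve(fullText.trim());
        };
        recognizer.startContinuousRecognitionAsync(() => {}, reject);
    });

    // 2. Summarize with Claude
    const message = await anthropic.messages.create({
        model: "claude-3-opus-20240229",
        max_tokens: 2000,
        messages: [{
            role: "user",
            content: `Summarize the following meeting:\n\n${transcription}`
        }]
    });

    return {
        transcription,
        summary: message.content[0].text
    };
}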

4.3 Route 3: Local Deployment (Whisper + Local LLM)

Best for: strict data-security requirements or offline operation

Tech stack:

ASR: Whisper (deployed locally)
LLM: Llama 3 / Qwen / ChatGLM (deployed locally)
Deployment: Python + Ollama / vLLM

Advantages:

Data never leaves the local environment
No API usage fees
Customizable

Example:

import whisper
import ollama

# Local Whisper
whisper_model = whisper.load_model("base")

# Local LLM served by Ollama (the model must be pulled beforehand)
def summarize_with_local_llm(text):
    response = ollama.chat(
        model='llama3:8b',
        messages=[
            {
                'role': 'system',
                'content': 'You are a professional meeting-minutes assistant.'
            },
            {
                'role': 'user',
                'content': f'Summarize the following meeting:\n\n{text}'
            }
        ]
    )
    return response['message']['content']

# End-to-end flow
def process_meeting(audio_path):
    # 1. Speech recognition
    result = whisper_model.transcribe(audio_path)
    transcription = result["text"]

    # 2. Summarize with the local LLM
    summary = summarize_with_local_llm(transcription)

    return {
        "transcription": transcription,
        "summary": summary
    }

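4.4 Route 4: Streaming

Best for: live meetings that need summaries in real time

Tech stack:

ASR: streaming Whisper / WebRTC
LLM: streaming APIs (GPT-4 streaming / Claude streaming)
Deployment: WebSocket + Server-Sent Events

Example:

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
import asyncio
import numpy as np
import whisper
from openai import AsyncOpenAI

app = FastAPI()
whisper_model = whisper.load_model("base")
openai_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

# Assumes the client sends raw 16 kHz, 16-bit, mono PCM
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2
WINDOW_BYTES = 5 * SAMPLE_RATE * BYTES_PER_SAMPLE  # 5 seconds of audio

@app.websocket("/ws/meeting")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()

    # Audio buffer
    audio_buffer = bytearray()

    try:
        while True:
            # Receive audio data
            data = await websocket.receive_bytes()
            audio_buffer.extend(data)

            # Process once 5 seconds of audio have accumulated
            if len(audio_buffer) >= WINDOW_BYTES:
                usable = len(audio_buffer) - len(audio_buffer) % BYTES_PER_SAMPLE
                pcm = bytes(audio_buffer[:usable])
                del audio_buffer[:usable]

                # Whisper expects float32 samples in [-1, 1], not raw bytes
                samples = np.frombuffer(pcm, dtype=np.int16).astype(np.float32) / 32768.0

                # Transcribe in a worker thread so the event loop stays responsive
                result = await asyncio.to_thread(
                    whisper_model.transcribe, samples, language="zh"
                )
                transcription = result["text"]

                # Stream the summary back chunk by chunk
                async for chunk in stream_summarize(transcription):
                    await websocket.send_json({
                        "type": "summary_chunk",
                        "content": chunk
                    })
    except WebSocketDisconnect:
        pass  # the client closed the connection
    except Exception:
        await websocket.close(code=1011)  # internal error

async def stream_summarize(text):
    """Stream a summary chunk by chunk."""
    stream = await openai_client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
        stream=True
    )
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content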
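
Processing "every 5 seconds" means measuring buffered audio duration, not counting WebSocket messages, because message sizes vary by client. The sketch below isolates that bookkeeping in a small class. `PcmWindowBuffer` is a hypothetical name, and the defaults assume raw 16 kHz, 16-bit, mono PCM, which the article does not specify.

```python
class PcmWindowBuffer:
    """Accumulates raw PCM bytes and hands back fixed-duration windows."""

    def __init__(self, window_seconds=5.0, sample_rate=16000, bytes_per_sample=2):
        # Size of one window in bytes, always a whole number of samples
        self.window_bytes = int(window_seconds * sample_rate) * bytes_per_sample
        self._buffer = bytearray()

    def push(self, data):
        """Add incoming bytes; return a list of complete windows (possibly empty)."""
        self._buffer.extend(data)
        windows = []
        while len(self._buffer) >= self.window_bytes:
            windows.append(bytes(self._buffer[:self.window_bytes]))
            del self._buffer[:self.window_bytes]
        return windows

    def flush(self):
        """Return whatever is left over, for example when the meeting ends."""
        remaining = bytes(self._buffer)
        self._buffer.clear()
        return remaining
```

In a WebSocket handler, each received message would go through `push`, and every returned window would be transcribed. Calling `flush` on disconnect avoids dropping the last few seconds.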

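5. Key Optimizations

5.1 Audio Preprocessing

import librosa
import noisereduce as nr

def preprocess_audio(audio_path):
    """
    Audio preprocessing: noise reduction and normalization.
    """
    # Load the audio, resampled to 16 kHz mono
    audio, sr = librosa.load(audio_path, sr=16000)

    # Noise reduction
    audio_denoised = nr.reduce_noise(
        y=audio,
        sr=sr,
        stationary=False,   # adapt to noise that changes over time
        prop_decrease=0.8   # remove 80% of the estimated noise
    )

    # Peak-normalize the volume
    audio_normalized = librosa.util.normalize(audio_denoised)

    return audio_normalized, sr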

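5.2 Chunking Long Transcripts

def chunk_text(text, max_length=4000):
    """
    Split long text into chunks so each request stays within the model's context limit.
    max_length is counted in characters, a rough stand-in for tokens.
    """
    chunks = []
    # Drop empty pieces, such as the one left after a trailing "。"
    sentences = [s for s in text.split('。') if s.strip()]

    current_chunk = ""
    for sentence in sentences:
        # An empty chunk always accepts the sentence, so no empty chunk is emitted
        if not current_chunk or len(current_chunk) + len(sentence) < max_length:
            current_chunk += sentence + "。"
        else:
            chunks.append(current_chunk)
            current_chunk = sentence + "。"

    if current_chunk:
        chunks.append(current_chunk)

    return chunks

def summarize_long_meeting(transcription):
    """
    Summarize a long meeting: summarize each chunk, then summarize the summaries.
    summarize_with_gpt is any text-in, summary-out helper, such as the GPT call in Section 3.3.
    """
    chunks = chunk_text(transcription)
    summaries = []

    for chunk in chunks:
        summary = summarize_with_gpt(chunk)
        summaries.append(summary)

    # A single chunk needs no second pass
    if len(summaries) == 1:
        return summaries[0]

    # Merge the partial summaries
    final_summary = summarize_with_gpt("\n\n".join(summaries))
    return final_summary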
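
Cutting at hard chunk boundaries can separate a decision from the discussion that led to it. One common mitigation, sketched below as a variation rather than the article's method, is to repeat the last few sentences of each chunk at the start of the next. `split_sentences`, `chunk_with_overlap`, and the `overlap_sentences` parameter are illustrative names, and lengths are still counted in characters.

```python
import re


def split_sentences(text):
    """Split on 。！？ and ASCII ! ? (and newlines), keeping the punctuation."""
    parts = re.findall(r'[^。！？!?\n]+[。！？!?]?', text)
    return [p.strip() for p in parts if p.strip()]


def chunk_with_overlap(text, max_length=4000, overlap_sentences=2):
    """Group sentences into chunks of roughly max_length characters.

    Each new chunk starts with the last overlap_sentences sentences of the
    previous one. A chunk may exceed max_length when a single sentence or
    the carried-over overlap is itself long.
    """
    chunks = []
    current = []
    current_len = 0
    for sentence in split_sentences(text):
        if current and current_len + len(sentence) > max_length:
            chunks.append("".join(current))
            # Guard the zero case: current[-0:] would keep the whole list
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
            current_len = sum(len(s) for s in current)
        current.append(sentence)
        current_len += len(sentence)
    if current:
        chunks.append("".join(current))
    return chunks
```

The overlap costs extra tokens per request, so a small value (one to three sentences) is a reasonable starting point.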

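5.3 Prompt Engineering

MEETING_SUMMARY_PROMPT = """
You are a professional meeting-minutes assistant. Produce a structured summary of the meeting below.

Requirements:
1. **Basic information**
   - Meeting topic
   - Participants
   - Meeting duration

2. **Core topics**
   - List the main topics discussed (3-5)
   - Key points for each topic

3. **Important decisions**
   - Record every decision explicitly
   - Note who made each decision and when

4. **Action items**
   - Owner
   - Deadline
   - Specific task

5. **Follow-up**
   - Items that need further discussion
   - Information still to be confirmed

Output in Markdown with a clear structure.

Meeting content:
{transcription}
"""

def summarize_with_optimized_prompt(transcription):
    prompt = MEETING_SUMMARY_PROMPT.format(transcription=transcription)
    # ... call the LLM with this prompt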

6. Performance and Cost Control

6.1 Caching

import redis
import hashlib
import json

redis_client = redis.Redis(host='localhost', port=6379)

def get_cached_summary(audio_hash):
    """Return the cached summary, or None on a cache miss."""
    cached = redis_client.get(f"summary:{audio_hash}")
    if cached:
        return json.loads(cached)
    return None

def cache_summary(audio_hash, summary):
    """Cache a summary."""
    redis_client.setex(
        f"summary:{audio_hash}",
        3600 * 24,  # expires after 24 hours
        json.dumps(summary)
    )

def process_with_cache(audio_path):
    # Hash the audio content; MD5 serves only as a cache key here, not for security
    with open(audio_path, 'rb') as f:
        audio_hash = hashlib.md5(f.read()).hexdigest()

    # Check the cache
    cached = get_cached_summary(audio_hash)
    if cached:
        return cached

    # Process, then cache the result
    result = process_meeting(audio_path)
    cache_summary(audio_hash, result)
    return result
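
Meeting recordings can run to hundreds of megabytes, so reading the whole file just to hash it is wasteful. The sketch below hashes in fixed-size blocks and separates the get-or-compute logic from the storage backend. `hash_file` and `get_or_compute` are hypothetical names, SHA-256 is a substitution for the MD5 used above, and the `cache` argument is a plain dict standing in for Redis.

```python
import hashlib


def hash_file(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in blocks so large recordings are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def get_or_compute(path, cache, compute):
    """Return the cached result for this file's content, computing it on a miss.

    cache is any dict-like store; compute is the expensive function,
    called with the file path.
    """
    key = f"summary:{hash_file(path)}"
    if key in cache:
        return cache[key]
    result = compute(path)
    cache[key] = result
    return result
```

Because the key comes from file content rather than file name, re-uploading the same recording under a different name still hits the cache.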

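6.2 Asynchronous Processing

from celery import Celery

# A broker queues the tasks and a result backend stores their results;
# the local Redis URLs below are placeholders
celery_app = Celery(
    'meeting_processor',
    broker='redis://localhost:6379/0',
    backend='redis://localhost:6379/1'
)

@celery_app.task
def process_meeting_async(audio_path):
    """Process meeting audio in a background worker."""
    return process_meeting(audio_path)

# Usage
task = process_meeting_async.delay(audio_path)  # returns immediately
# get() blocks until the worker finishes; in a web handler, return task.id and poll instead
result = task.get()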
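
The value of a task queue lies in the submit-then-poll pattern: the caller gets a job ID at once and checks back later instead of blocking. The sketch below illustrates that pattern with the standard library's `ThreadPoolExecutor` in place of Celery, so it runs without a broker. `JobRegistry` and the state names are hypothetical, and a single-process thread pool is not a substitute for a distributed queue in production.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobRegistry:
    """In-process stand-in for a task queue: submit work, then poll by job ID."""

    def __init__(self, max_workers=2):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}

    def submit(self, func, *args):
        """Start func(*args) in the background and return a job ID immediately."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._executor.submit(func, *args)
        return job_id

    def status(self, job_id):
        """Non-blocking status check, suitable for a polling endpoint."""
        future = self._jobs[job_id]
        if not future.done():
            return {"state": "PENDING"}
        error = future.exception()
        if error is not None:
            return {"state": "FAILURE", "error": str(error)}
        return {"state": "SUCCESS", "result": future.result()}

    def wait(self, job_id, timeout=None):
        """Block until the job finishes; re-raises the job's exception if it failed."""
        return self._jobs[job_id].result(timeout=timeout)
```

An upload endpoint would call `submit` and return the job ID, and a second endpoint would expose `status` so the client can show progress.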

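7. Best Practices

7.1 Choosing a Stack

Scenario	Recommended stack	Rationale
General use	Whisper + GPT-4	Balances accuracy and cost
Enterprise	Azure Speech + Claude	High availability, SLA-backed
Sensitive data	Local Whisper + local LLM	Data never leaves the local environment
Real time	Streaming ASR + streaming LLM	Low latency
Cost-sensitive	Whisper + Chinese LLMs	Lower cost

7.2 Implementation Steps

MVP stage: Whisper + GPT-3.5 for quick validation
Refinement stage: add speaker diarization and text preprocessing
Production stage: performance tuning, caching, asynchronous processing
Expansion stage: multilingual support, real-time processing

7.3 Points to Watch

Privacy and security: store sensitive meeting data encrypted
Accuracy: fine-tune models for the specific domain
Cost control: use caching sensibly to avoid repeated computation
User experience: give progress feedback and keep response times short

8. Conclusion

A meeting summarization system draws on speech recognition, natural language processing, and large language models. With a suitable stack and the optimizations described above, it is possible to build an efficient and accurate automatic summarizer. As model capabilities improve, there is still room for this area to get better.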

Interview · November 27, 2025
