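"Have you ever seen a community report written by AI? It never slacks off, it cites its data, groups things automatically, and sums up insights so well that even your boss would feel outclassed!"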
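Project Overview
What is GraphRAG? In one sentence:
It lets AI "drop in on the neighbors" across a knowledge graph, automatically discover the cliques, and then write structured, data-cited "community reports" one after another!
- Multiple data import options (CSV, Neo4j, etc.)
- Automatic knowledge graph construction
- Graph clustering / community detection
- Text chunking + embedding + retrieval
- RAG-enhanced question answering
- Automatic community report generation (today's star!)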
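The GraphRAG Workflow
Let's get a feel for it with a "picture-it-in-your-head flowchart":
graph TD
A[Raw data] --> B[Graph construction]
B --> C[Community detection]
C --> D[Community content aggregation]
D --> E[Community report generation]
E --> F[RAG-enhanced Q&A / visualization]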
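1. Data import and graph construction
- Supports CSV, databases, Neo4j, and other formats
- Automatically identifies entities and relationships, then turns them into nodes and edges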
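To make the node-and-edge idea concrete, here is a minimal sketch that builds an in-memory graph from a CSV edge list using only the standard library. The column names (`source`, `target`, `relation`), the sample rows, and the `build_graph` helper are all assumptions made for illustration. They are not GraphRAG's actual input schema, and the sketch skips entity extraction by starting from rows where the entities are already known.

```python
import csv
import io
from collections import defaultdict

# Hypothetical edge list; the column names are assumptions for this sketch.
CSV_TEXT = """source,target,relation
Alice,Bob,works_with
Bob,Carol,works_with
Carol,Alice,manages
Dave,Erin,works_with
"""


def build_graph(csv_text):
    """Return (nodes, edges, adjacency) parsed from a source/target/relation CSV."""
    nodes = set()
    edges = []
    adjacency = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        src, dst = row["source"], row["target"]
        nodes.update((src, dst))
        edges.append((src, dst, row["relation"]))
        # Treat the graph as undirected so it can feed community detection later.
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    return nodes, edges, dict(adjacency)


nodes, edges, adjacency = build_graph(CSV_TEXT)
print(sorted(nodes))
print(len(edges), "edges")
```

The adjacency map is the only structure the later steps really need; the raw edge tuples are kept so that relation labels survive for reporting.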
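2. Community detection
- Uses algorithms such as Louvain and Leiden to split the big graph into a number of "communities"
- Each community is a "topical inner circle"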
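Louvain and Leiden both search for a partition with high modularity, so a quick way to see what "a good community split" means is to compute that score by hand. The sketch below is not an implementation of either algorithm. It only evaluates the modularity Q of a partition you give it, on a toy undirected, unweighted graph, and the `modularity` helper and sample data are made up for this example.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2m))^2
    where m is the edge count, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # summed degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge (c, d).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(round(q_two, 4), round(q_one, 4))
```

Splitting along the bridge scores 5/14 (about 0.357), while lumping everything into one community scores 0, which is why a modularity optimizer would prefer the two-triangle split.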
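3. Community content aggregation
- Collects the nodes, edges, and text of each community
- Tallies keywords, representative nodes, and main relationships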
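The aggregation step can be pictured as a group-by over the community assignment. The following sketch is an assumption-laden illustration rather than GraphRAG's own code: `aggregate_communities`, the "highest degree inside the community" rule for picking representative nodes, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_communities(edges, community_of, top_n=1):
    """Group nodes and intra-community edges, and rank nodes by degree."""
    summary = defaultdict(
        lambda: {"nodes": set(), "edges": [], "degree": Counter()}
    )
    for node, cid in community_of.items():
        summary[cid]["nodes"].add(node)
    for src, dst, relation in edges:
        cid = community_of[src]
        if cid != community_of[dst]:
            continue  # skip edges that cross community boundaries
        info = summary[cid]
        info["edges"].append((src, dst, relation))
        info["degree"][src] += 1
        info["degree"][dst] += 1
    return {
        cid: {
            "nodes": sorted(info["nodes"]),
            "edges": info["edges"],
            # Representative nodes: the best-connected ones inside the community.
            "representatives": [n for n, _ in info["degree"].most_common(top_n)],
            # Main relationships: relation labels ordered by frequency.
            "relations": Counter(r for _, _, r in info["edges"]).most_common(),
        }
        for cid, info in summary.items()
    }


edges = [
    ("Alice", "Bob", "works_with"),
    ("Bob", "Carol", "works_with"),
    ("Bob", "Dave", "mentors"),
    ("Erin", "Frank", "works_with"),
    ("Dave", "Erin", "knows"),  # crosses communities, so it is ignored
]
community_of = {
    "Alice": 0, "Bob": 0, "Carol": 0, "Dave": 0,
    "Erin": 1, "Frank": 1,
}
result = aggregate_communities(edges, community_of)
print(result[0]["representatives"], result[0]["relations"])
```

Each per-community dictionary is roughly the raw material a report prompt would need: who is in the group, who sits at its center, and which relationships dominate.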
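4. Community report generation
- Uses an LLM (large language model) plus a custom prompt to write the "community report" automatically
- The report contains: title, summary, impact rating, detailed insights, and data references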
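5. RAG-enhanced Q&A / visualization
- When a user asks a question, the relevant communities and reports are retrieved first, and then the LLM generates the answer
- Supports visualizing the community structure and the report content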
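The retrieve-then-generate step in item 5 can be sketched without any model at all. The example below swaps in plain word overlap as a stand-in for vector search, which is a deliberate simplification, and stops at building the prompt instead of calling an LLM. `retrieve_reports`, `build_prompt`, and the sample reports are hypothetical names and data for this illustration.

```python
import re


def tokenize(text):
    """Lowercase word set; a crude substitute for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_reports(question, reports, top_k=2):
    """Rank reports by word overlap with the question, breaking ties by rating."""
    q_tokens = tokenize(question)
    scored = []
    for report in reports:
        overlap = len(q_tokens & tokenize(report["title"] + " " + report["summary"]))
        if overlap:
            scored.append((overlap, report["rating"], report))
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [report for _, _, report in scored[:top_k]]


def build_prompt(question, reports):
    """Assemble the context that would be handed to the LLM."""
    context = "\n\n".join(f"# {r['title']}\n{r['summary']}" for r in reports)
    return f"Answer using only the reports below.\n\n{context}\n\nQuestion: {question}"


reports = [
    {
        "title": "Verdant Oasis Plaza and Unity March",
        "summary": "The plaza hosts the Unity March.",
        "rating": 5.0,
    },
    {
        "title": "Harbor Logistics Group",
        "summary": "Shipping companies clustered around a harbor.",
        "rating": 3.0,
    },
]
question = "Where does the Unity March take place?"
hits = retrieve_reports(question, reports)
prompt = build_prompt(question, hits)
print(prompt)
```

In a real pipeline the overlap score would be replaced by embedding similarity, but the shape stays the same: pick a few reports, paste them into the context, and ask the question.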
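Community Reports: AI's "Gossip Tabloid"
Think AI can only answer questions with a straight face?
Wrong! In GraphRAG, the AI writes "community reports" on its own, and it summarizes better than your boss does!
What does a community report look like?
- Title: cuts straight to the point and names names
- Summary: a few sentences that capture the whole picture
- Impact rating: 0-10, so you can see at a glance who the big shots are
- Detailed insights: 5-10 of them, each backed by data references, never empty talk
- Data references: every conclusion traces back to the source data, so your boss can never again say "you just made this up!"
Prompt excerpt (graphrag/prompts/index/community_report.py):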
COMMUNITY_REPORT_PROMPT = """
...
Write a comprehensive report of a community, given a list of entities that belong to the community as well as their relationships and optional associated claims.
...
Return output as a well-formed JSON-formatted string with the following format:
{
    "title": <report_title>,
    "summary": <executive_summary>,
    "rating": <impact_severity_rating>,
    "rating_explanation": <rating_explanation>,
    "findings": [
        {
            "summary":<insight_1_summary>,
            "explanation": <insight_1_explanation>
        },
        ...
    ]
}
...
Points supported by data should list their data references as follows:
"This is an example sentence supported by multiple data references [Data: <dataset name> (record ids); ...]"
...
"""
Sample report
{
    "title": "Verdant Oasis Plaza and Unity March",
    "summary": "The community revolves around the Verdant Oasis Plaza, which is the location of the Unity March...",
    "rating": 5.0,
    "rating_explanation": "The impact severity rating is moderate due to the potential for unrest or conflict during the Unity March.",
    "findings": [
        {
            "summary": "Verdant Oasis Plaza as the central location",
            "explanation": "Verdant Oasis Plaza is the central entity in this community... [Data: Entities (5), Relationships (37, 38, 39, 40, 41,+more)]"
        },
        ...
    ]
}
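Because every finding carries its references in the `[Data: <dataset name> (record ids); ...]` format, the citations can be pulled back out mechanically. The sketch below parses a trimmed copy of the sample report and maps each dataset name to the record ids it cites. The regular expressions and the `extract_references` helper are assumptions written for this post, not part of GraphRAG, and the `+more` marker is simply ignored.

```python
import json
import re

# Trimmed copy of the sample report, with the elided parts left out.
REPORT_JSON = """
{
  "title": "Verdant Oasis Plaza and Unity March",
  "rating": 5.0,
  "findings": [
    {
      "summary": "Verdant Oasis Plaza as the central location",
      "explanation": "Verdant Oasis Plaza is the central entity in this community... [Data: Entities (5), Relationships (37, 38, 39, 40, 41,+more)]"
    }
  ]
}
"""

DATA_BLOCK = re.compile(r"\[Data:\s*(.*?)\]")
DATASET = re.compile(r"(\w[\w ]*?)\s*\(([^)]*)\)")


def extract_references(text):
    """Map dataset name -> list of record ids cited in [Data: ...] blocks."""
    refs = {}
    for block in DATA_BLOCK.findall(text):
        for name, ids in DATASET.findall(block):
            found = [int(tok) for tok in re.findall(r"\d+", ids)]
            refs.setdefault(name.strip(), []).extend(found)
    return refs


report = json.loads(REPORT_JSON)
# The prompt asks for a 0-10 rating, so a cheap sanity check is possible.
rating_ok = 0 <= report["rating"] <= 10
refs = extract_references(report["findings"][0]["explanation"])
print(rating_ok, refs)
```

A check like this is handy for spotting findings that cite nothing at all, or that cite record ids missing from the source tables.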
Code Walkthrough: How Is a Community Report Forged?
1. The "alchemist" behind community reports: CommunityReportsExtractor
graphrag/index/operations/summarize_communities/community_reports_extractor.py
class CommunityReportsExtractor:
    ...
    async def __call__(self, input_text: str):
        # 1. Build the prompt
        prompt = self._extraction_prompt.format(
            input_text=input_text,
            max_report_length=str(self._max_report_length),
        )
        # 2. Call the LLM to generate the report
        response = await self._model.achat(
            prompt,
            json=True,
            name="create_community_report",
            json_model=CommunityReportResponse,
        )
        # 3. Parse the structured output
        output = response.parsed_response
        ...
        return CommunityReportsResult(
            structured_output=output,
            output=text_output,
        )
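The excerpt returns both a structured object and a plain-text version of the report. As a rough illustration of how a structured report could be flattened into readable text, here is a self-contained sketch. The `Report` and `Finding` dataclasses and the `render_report_text` helper are hypothetical stand-ins that mirror the JSON fields from the prompt; the real extractor's helper and response model may look different.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    summary: str
    explanation: str


@dataclass
class Report:
    # Field names follow the JSON format requested in the prompt.
    title: str
    summary: str
    rating: float
    rating_explanation: str
    findings: list = field(default_factory=list)


def render_report_text(report):
    """Flatten a structured report into Markdown-style text."""
    sections = "\n\n".join(
        f"## {finding.summary}\n\n{finding.explanation}"
        for finding in report.findings
    )
    return f"# {report.title}\n\n{report.summary}\n\n{sections}"


report = Report(
    title="Verdant Oasis Plaza and Unity March",
    summary="The community revolves around the Verdant Oasis Plaza.",
    rating=5.0,
    rating_explanation="Moderate impact.",
    findings=[
        Finding(
            summary="Verdant Oasis Plaza as the central location",
            explanation="Verdant Oasis Plaza is the central entity. [Data: Entities (5)]",
        )
    ],
)
text_output = render_report_text(report)
print(text_output)
```

Keeping both forms is convenient: the structured one is easy to filter and score, while the text one can be dropped straight into a retrieval context.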
2. The community report "assembly line": summarize_communities
graphrag/index/operations/summarize_communities/summarize_communities.py
async def summarize_communities(
    nodes, communities, local_contexts, level_context_builder, ...
):
    # 1. Collect the community levels
    levels = get_levels(nodes)
    # 2. Build the context for each community
    for level in levels:
        level_context = level_context_builder(...)
        ...
    # 3. Call _generate_report for every community
    for i, level_context in enumerate(level_contexts):
        async def run_generate(record):
            result = await _generate_report(
                strategy_exec,
                community_id=record[schemas.COMMUNITY_ID],
                community_context=record[schemas.CONTEXT_STRING],
                ...
            )
            return result
        local_reports = await derive_from_rows(
            level_context,
            run_generate,
            ...
        )
        reports.extend([lr for lr in local_reports if lr is not None])
    return pd.DataFrame(reports)
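The control flow of that pipeline (levels handled one after another, communities within a level handled concurrently, failed reports dropped) can be reproduced with plain `asyncio`. In the sketch below, `fake_generate_report` stands in for the LLM call and `asyncio.gather` with a semaphore stands in for `derive_from_rows`. Both substitutions, along with the record layout, are assumptions made so the example runs on its own.

```python
import asyncio


async def fake_generate_report(community_id, context):
    """Stand-in for the LLM call; returns None when there is no context."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    if not context:
        return None
    return {"community": community_id, "title": f"Report for community {community_id}"}


async def summarize_levels(level_contexts, max_concurrency=4):
    """Process levels in order and the communities of each level concurrently."""
    semaphore = asyncio.Semaphore(max_concurrency)
    reports = []

    async def run_generate(record):
        async with semaphore:  # cap the number of in-flight requests
            return await fake_generate_report(record["community"], record["context"])

    for level_context in level_contexts:
        local_reports = await asyncio.gather(
            *(run_generate(record) for record in level_context)
        )
        # Drop communities whose report could not be generated.
        reports.extend(r for r in local_reports if r is not None)
    return reports


level_contexts = [
    [
        {"community": 3, "context": "entities and relationships of community 3"},
        {"community": 4, "context": ""},
    ],
    [{"community": 1, "context": "entities and relationships of community 1"}],
]
reports = asyncio.run(summarize_levels(level_contexts))
print([r["community"] for r in reports])
```

Community 4 has an empty context, so it yields `None` and is filtered out, just as the `if lr is not None` guard does in the excerpt.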
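3. Where do the generated reports go?
- Stored in a database or cache
- Used as "knowledge chunks" for RAG retrieval
- Displayed in a web UI or notebook, where users can browse, filter, and compare them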
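Closing Thoughts
GraphRAG's "community report" feature turns AI from a mere "retrieval tool" into a "knowledge analyst" that writes reports, cites data, groups things automatically, and produces structured summaries!
Next time your boss asks for a community analysis report, why not let GraphRAG "knock out a first draft" for you?
It never slacks off, and it might even dig up "gossip" you never noticed yourself!
Want to dig deeper? Go browse the source code under
graphrag/index/operations/summarize_communities/,
and let AI join you in becoming the "King of Community Gossip"!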