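# Knowledge Base API
1. Purpose and scope
This document describes the REST API endpoints for knowledge base management in Ragflow-Plus, covering knowledge base CRUD operations, document management, parsing control, and embedding configuration. The knowledge base API is the primary interface for managing document collections and their associated processing pipelines.
For document and file storage operations, see File Management API. For conversation and chat features that use knowledge bases, see Conversation and Chat API. For management features, including user management and system configuration, see Management and Admin API.
2. API architecture overview
The knowledge base API uses a layered architecture that cleanly separates HTTP routing, business logic, and data processing components.
2.1 API request flow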
API request flow architecture:
- Client --> Routes (Flask routes)
- Routes --> Service (KnowledgebaseService, business logic)
- Service --> Parser (document_parser.py, document processing)
- Service --> DB (MySQL database: knowledgebase and document tables)
- Parser --> MinIO (MinIO object storage, file storage)
- Parser --> ES (Elasticsearch, vector index)
- Parser --> Embedding (embedding API: bge-m3, Ollama)
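2.2 Core API endpoints
The knowledge base API provides the following main endpoint categories. The paths in this overview are written without the /api/v1 prefix that appears in the detailed sections.
2.2.1 Knowledge base management endpoints
- GET /knowledgebases: list knowledge bases
- POST /knowledgebases: create a knowledge base
- GET /knowledgebases/{kb_id}: get knowledge base details
- PUT /knowledgebases/{kb_id}: update a knowledge base
- DELETE /knowledgebases/{kb_id}: delete a knowledge base
- DELETE /knowledgebases/batch: batch delete knowledge bases
2.2.2 Document management endpoints
- GET /knowledgebases/{kb_id}/documents: list the documents in a knowledge base
- POST /knowledgebases/{kb_id}/documents: add documents to a knowledge base
- DELETE /knowledgebases/documents/{doc_id}: delete a document
2.2.3 Document processing endpoints
- POST /knowledgebases/documents/{doc_id}/parse: parse a document
- GET /knowledgebases/documents/{doc_id}/parse/progress: get parse progress
- POST /knowledgebases/{kb_id}/batch_parse_sequential/start: start sequential batch parsing
- GET /knowledgebases/{kb_id}/batch_parse_sequential/progress: get batch parse progress
2.2.4 System configuration endpoints
- GET /knowledgebases/system_embedding_config: get the system embedding configuration
- POST /knowledgebases/system_embedding_config: set the system embedding configuration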
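A client may find it convenient to keep the endpoint overview in one table and derive full paths from it. The following is a minimal sketch only: the `ENDPOINTS` keys, `API_PREFIX`, and `resolve` are hypothetical names chosen for illustration, and the /api/v1 prefix is taken from the detailed sections of this document.

```python
from typing import Dict, Tuple
from urllib.parse import quote

API_PREFIX = "/api/v1"  # prefix as it appears in the detailed endpoint sections

# (HTTP method, path template) pairs copied from the endpoint overview.
# The dictionary keys are hypothetical labels, not names used by Ragflow-Plus.
ENDPOINTS: Dict[str, Tuple[str, str]] = {
    "list_kbs": ("GET", "/knowledgebases"),
    "create_kb": ("POST", "/knowledgebases"),
    "get_kb": ("GET", "/knowledgebases/{kb_id}"),
    "update_kb": ("PUT", "/knowledgebases/{kb_id}"),
    "delete_kb": ("DELETE", "/knowledgebases/{kb_id}"),
    "batch_delete_kbs": ("DELETE", "/knowledgebases/batch"),
    "list_documents": ("GET", "/knowledgebases/{kb_id}/documents"),
    "add_documents": ("POST", "/knowledgebases/{kb_id}/documents"),
    "delete_document": ("DELETE", "/knowledgebases/documents/{doc_id}"),
    "parse_document": ("POST", "/knowledgebases/documents/{doc_id}/parse"),
    "parse_progress": ("GET", "/knowledgebases/documents/{doc_id}/parse/progress"),
}


def resolve(name: str, **path_params: str) -> Tuple[str, str]:
    """Return (method, full path) for a named endpoint, URL-escaping path values."""
    method, template = ENDPOINTS[name]
    safe_params = {key: quote(value, safe="") for key, value in path_params.items()}
    return method, API_PREFIX + template.format(**safe_params)
```

For example, `resolve("parse_document", doc_id="doc_uuid")` yields the method and path of the parse endpoint for that document.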
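3. Knowledge base management
The knowledge base management endpoints provide full lifecycle control over knowledge base entities stored in the MySQL knowledgebase table.
3.1 Knowledge base CRUD operations
3.1.1 List knowledge bases
Endpoint: GET /api/v1/knowledgebases
Query parameters:
- currentPage (int): page number (default: 1)
- size (int): page size (default: 10)
- name (string): filter by knowledge base name
- sort_by (string): sort field, one of "name", "create_time", "create_date" (default: "create_time")
- sort_order (string): sort order, "asc" or "desc" (default: "desc")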
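The query parameters above can be assembled into a request URL as in the sketch below. It is illustrative only: `BASE_URL` and `build_list_url` are hypothetical names, the host and port are placeholders, and the client-side validation simply mirrors the allowed values listed above rather than any documented server behavior.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"  # hypothetical host; replace with your deployment
ALLOWED_SORT_BY = {"name", "create_time", "create_date"}
ALLOWED_SORT_ORDER = {"asc", "desc"}


def build_list_url(
    current_page: int = 1,
    size: int = 10,
    name: Optional[str] = None,
    sort_by: str = "create_time",
    sort_order: str = "desc",
) -> str:
    """Build the URL for listing knowledge bases with the documented defaults."""
    if sort_by not in ALLOWED_SORT_BY:
        raise ValueError(f"unsupported sort_by: {sort_by!r}")
    if sort_order not in ALLOWED_SORT_ORDER:
        raise ValueError(f"unsupported sort_order: {sort_order!r}")
    params = {
        "currentPage": current_page,
        "size": size,
        "sort_by": sort_by,
        "sort_order": sort_order,
    }
    if name:  # the name filter is only sent when provided
        params["name"] = name
    return f"{BASE_URL}/api/v1/knowledgebases?{urlencode(params)}"
```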
Response example:
{
  "code": 0,
  "data": {
    "list": [
      {
        "id": "kb_uuid",
        "name": "Knowledge Base Name",
        "description": "Description text",
        "language": "Chinese",
        "permission": "me",
        "create_time": "2024-01-01T00:00:00Z",
        "document_count": 10,
        "chunk_count": 150
      }
    ],
    "total": 100,
    "currentPage": 1,
    "size": 10
  }
}
3.1.2 Create a knowledge base
Endpoint: POST /api/v1/knowledgebases
Request body:
{
  "name": "My Knowledge Base",
  "description": "Knowledge base description",
  "language": "Chinese",
  "permission": "me",
  "creator_id": "user_uuid",
  "embd_id": "bge-m3"
}
Response example:
{
  "code": 0,
  "data": {
    "id": "kb_uuid",
    "name": "My Knowledge Base",
    "description": "Knowledge base description",
    "language": "Chinese",
    "permission": "me",
    "create_time": "2024-01-01T00:00:00Z"
  }
}
3.1.3 Get knowledge base details
Endpoint: GET /api/v1/knowledgebases/{kb_id}
Response example:
{
  "code": 0,
  "data": {
    "id": "kb_uuid",
    "name": "Knowledge Base Name",
    "description": "Description text",
    "language": "Chinese",
    "permission": "me",
    "create_time": "2024-01-01T00:00:00Z",
    "update_time": "2024-01-02T00:00:00Z",
    "document_count": 10,
    "chunk_count": 150,
    "embd_id": "bge-m3",
    "embd_model": "BGE-M3"
  }
}
3.1.4 Update a knowledge base
Endpoint: PUT /api/v1/knowledgebases/{kb_id}
Request body:
{
  "name": "Updated Knowledge Base Name",
  "description": "Updated description",
  "permission": "team"
}
3.1.5 Delete a knowledge base
Endpoint: DELETE /api/v1/knowledgebases/{kb_id}
Response example:
{
  "code": 0,
  "message": "Knowledge base deleted successfully"
}
3.1.6 Batch delete knowledge bases
Endpoint: DELETE /api/v1/knowledgebases/batch
Request body:
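{
  "kb_ids": ["kb_uuid_1", "kb_uuid_2", "kb_uuid_3"]
}
4. Document management
The document management endpoints handle adding, listing, and deleting documents in a knowledge base.
4.1 Document integration pipeline
Document integration flow, in order: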
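- File upload: the file is uploaded to MinIO object storage
- Document creation: a document record is created in MySQL
- File association: the file is linked to the document
- Document parsing: the document parser is invoked to process the document content
- Chunk generation: document chunks are generated
- Vector generation: embedding vectors are generated for the chunks
- Index storage: vectors and text are stored in Elasticsearch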
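The ordering of the steps above can be pictured as a simple sequential pipeline. The sketch below is purely illustrative and does not reproduce the Ragflow-Plus implementation: the stage names, `IngestionContext`, and `run_pipeline` are hypothetical, and each handler is a stub supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Stage names mirror the seven steps listed above (hypothetical identifiers).
STAGES: List[str] = [
    "upload_file",       # file goes to object storage
    "create_document",   # document record is created
    "link_file",         # file is associated with the document
    "parse_document",    # parser processes the content
    "generate_chunks",   # content is split into chunks
    "generate_vectors",  # embeddings are computed for the chunks
    "store_index",       # vectors and text are indexed
]


@dataclass
class IngestionContext:
    """Carries state through the pipeline and records finished stages."""
    filename: str
    completed: List[str] = field(default_factory=list)


def run_pipeline(
    ctx: IngestionContext,
    handlers: Dict[str, Callable[[IngestionContext], None]],
) -> IngestionContext:
    """Run every stage in order; an exception in a handler stops later stages."""
    for stage in STAGES:
        handlers[stage](ctx)
        ctx.completed.append(stage)
    return ctx
```

Because a failing handler raises before its stage is recorded, `ctx.completed` shows how far a document got, which is the kind of information a progress endpoint would need to report.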
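4.2 Add documents to a knowledge base
Endpoint: POST /api/v1/knowledgebases/{kb_id}/documents
Request format: multipart/form-data
Request parameters:
- files (file[]): files to upload (multiple files supported)
- parser_id (string, optional): parser ID
- parser_config (object, optional): parser configuration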
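A multipart/form-data body for this endpoint could be built with the standard library as sketched below. Several points are assumptions rather than documented behavior: `encode_upload` is a hypothetical helper, `parser_config` is assumed to be sent as a JSON-encoded string field, and each file part is assumed to reuse the field name `files`.

```python
import json
import uuid
from typing import List, Optional, Tuple


def encode_upload(
    files: List[Tuple[str, bytes]],
    parser_id: Optional[str] = None,
    parser_config: Optional[dict] = None,
    boundary: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Return (body, Content-Type header value) for a multipart upload."""
    boundary = boundary or uuid.uuid4().hex
    parts: List[bytes] = []

    def add_field(name: str, value: str) -> None:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )

    if parser_id is not None:
        add_field("parser_id", parser_id)
    if parser_config is not None:
        # Assumption: the object is serialized to a JSON string form field.
        add_field("parser_config", json.dumps(parser_config))

    for filename, content in files:
        # Filenames are assumed not to contain double quotes or line breaks.
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + content + b"\r\n")

    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body and header value can be passed to any HTTP client; in practice a library that handles multipart encoding would usually replace this hand-written encoder.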
Response example:
{
  "code": 0,
  "data": {
    "documents": [
      {
        "id": "doc_uuid",
        "name": "document.pdf",
        "status": "pending",
        "file_id": "file_uuid"
      }
    ]
  }
}
4.3 List knowledge base documents
Endpoint: GET /api/v1/knowledgebases/{kb_id}/documents
Query parameters:
- currentPage (int): page number (default: 1)
- size (int): page size (default: 10)
- status (string, optional): filter by document status, one of "pending", "parsing", "parsed", "error"
Response example:
{
  "code": 0,
  "data": {
    "list": [
      {
        "id": "doc_uuid",
        "name": "document.pdf",
        "status": "parsed",
        "chunk_count": 50,
        "create_time": "2024-01-01T00:00:00Z",
        "parse_time": "2024-01-01T00:05:00Z"
      }
    ],
    "total": 10,
    "currentPage": 1,
    "size": 10
  }
}
4.4 Delete a document
Endpoint: DELETE /api/v1/knowledgebases/documents/{doc_id}
Response example:
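{
  "code": 0,
  "message": "Document deleted successfully"
}
5. Document parsing
The document parsing endpoints control execution of the document processing pipeline.
5.1 Parse a document
Endpoint: POST /api/v1/knowledgebases/documents/{doc_id}/parse
Request body: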
{
  "parser_id": "mineru",
  "parser_config": {
    "mode": "ocr",
    "chunk_size": 500,
    "chunk_overlap": 50
  }
}
Response example:
{
  "code": 0,
  "data": {
    "doc_id": "doc_uuid",
    "status": "parsing",
    "message": "Document parsing started"
  }
}
5.2 Get parse progress
Endpoint: GET /api/v1/knowledgebases/documents/{doc_id}/parse/progress
Response example:
{
  "code": 0,
  "data": {
    "doc_id": "doc_uuid",
    "status": "parsing",
    "progress": 65,
    "current_step": "chunking",
    "total_chunks": 100,
    "processed_chunks": 65,
    "message": "Processing chunks..."
  }
}
5.3 Batch parsing
Endpoint: POST /api/v1/knowledgebases/{kb_id}/batch_parse_sequential/start
Request body:
{
  "doc_ids": ["doc_uuid_1", "doc_uuid_2", "doc_uuid_3"],
  "parser_config": {
    "mode": "ocr",
    "chunk_size": 500
  }
}
Response example:
{
  "code": 0,
  "data": {
    "batch_id": "batch_uuid",
    "total_docs": 3,
    "status": "processing"
  }
}
5.4 Get batch parse progress
Endpoint: GET /api/v1/knowledgebases/{kb_id}/batch_parse_sequential/progress
Query parameters:
- batch_id (string): batch processing ID
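The response example for this endpoint reports a progress of 66.67 with 2 of 3 documents processed, which is consistent with a plain percentage. The sketch below assumes progress is processed_docs divided by total_docs, times 100, rounded to two decimals; the server's actual formula, and whether failed documents count as processed, is not stated in this document. `batch_progress` is a hypothetical helper name.

```python
def batch_progress(processed_docs: int, total_docs: int) -> float:
    """Percentage of documents processed, rounded to two decimals.

    Assumption: progress = processed_docs / total_docs * 100.
    """
    if total_docs <= 0:
        return 0.0  # avoid division by zero for an empty batch
    return round(100.0 * processed_docs / total_docs, 2)
```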
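Response example:
{
  "code": 0,
  "data": {
    "batch_id": "batch_uuid",
    "total_docs": 3,
    "processed_docs": 2,
    "failed_docs": 0,
    "status": "processing",
    "progress": 66.67
  }
}
6. Embedding model configuration
The embedding model configuration endpoints manage the embedding model settings for knowledge bases.
6.1 Get system embedding configuration
Endpoint: GET /api/v1/knowledgebases/system_embedding_config
Response example: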
{
  "code": 0,
  "data": {
    "default_embd_id": "bge-m3",
    "available_models": [
      {
        "id": "bge-m3",
        "name": "BGE-M3",
        "dimension": 1024,
        "max_length": 8192
      },
      {
        "id": "text-embedding-ada-002",
        "name": "OpenAI Ada-002",
        "dimension": 1536,
        "max_length": 8191
      }
    ]
  }
}
6.2 Set system embedding configuration
Endpoint: POST /api/v1/knowledgebases/system_embedding_config
Request body:
{
  "default_embd_id": "bge-m3"
}
6.3 Get knowledge base embedding configuration
Endpoint: GET /api/v1/knowledgebases/embedding_config?kb_id={kb_id}
Response example:
{
  "code": 0,
  "data": {
    "kb_id": "kb_uuid",
    "embd_id": "bge-m3",
    "embd_model": "BGE-M3",
    "dimension": 1024
  }
}
6.4 Get available embedding models
Endpoint: GET /api/v1/knowledgebases/embedding_models/{kb_id}
Response example:
{
  "code": 0,
  "data": {
    "available_models": [
      {
        "id": "bge-m3",
        "name": "BGE-M3",
        "dimension": 1024,
        "max_length": 8192,
        "recommended": true
      },
      {
        "id": "text-embedding-ada-002",
        "name": "OpenAI Ada-002",
        "dimension": 1536,
        "max_length": 8191,
        "recommended": false
      }
    ]
  }
}
7. Chunk management
The chunk management endpoints provide access to and management of document chunks.
7.1 Get document chunks
Endpoint: GET /api/v1/knowledgebases/documents/{doc_id}/chunks
Query parameters:
- currentPage (int): page number (default: 1)
- size (int): page size (default: 20)
Response example:
{
  "code": 0,
  "data": {
    "list": [
      {
        "id": "chunk_uuid",
        "content": "Chunk content text",
        "chunk_index": 0,
        "token_count": 150,
        "vector_id": "vector_uuid"
      }
    ],
    "total": 50,
    "currentPage": 1,
    "size": 20
  }
}
7.2 Update a chunk
Endpoint: PUT /api/v1/knowledgebases/chunks/{chunk_id}
Request body:
{
  "content": "Updated chunk content"
}
7.3 Delete a chunk
Endpoint: DELETE /api/v1/knowledgebases/chunks/{chunk_id}
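The update and delete calls for a chunk differ only in method and body, as the following sketch shows. It builds `urllib.request.Request` objects without sending them. `BASE_URL` and the two helper names are hypothetical, and authentication headers are omitted because this document does not describe them.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # hypothetical host; replace with your deployment


def _chunk_url(chunk_id: str) -> str:
    # Escape the ID so it is safe to embed as a single path segment.
    return f"{BASE_URL}/api/v1/knowledgebases/chunks/{quote(chunk_id, safe='')}"


def update_chunk_request(chunk_id: str, content: str) -> Request:
    """Build a PUT request carrying the new chunk content as JSON."""
    body = json.dumps({"content": content}).encode("utf-8")
    return Request(
        _chunk_url(chunk_id),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def delete_chunk_request(chunk_id: str) -> Request:
    """Build a DELETE request for the chunk; no body is sent."""
    return Request(_chunk_url(chunk_id), method="DELETE")
```

Either request can then be sent with `urllib.request.urlopen(request)` once the host and any required credentials are in place.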
8. Image management
The image management endpoints handle image content extracted from documents.
8.1 Get document images
Endpoint: GET /api/v1/knowledgebases/documents/{doc_id}/images
Response example:
{
  "code": 0,
  "data": {
    "images": [
      {
        "id": "image_uuid",
        "url": "https://minio.example.com/bucket/image.jpg",
        "chunk_id": "chunk_uuid",
        "page_number": 1
      }
    ]
  }
}
8.2 Search images
Endpoint: GET /api/v1/knowledgebases/{kb_id}/images/search
Query parameters:
- query (string): search keyword
- currentPage (int): page number
- size (int): page size
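9. Error handling
9.1 Common errors
- 400 Bad Request: invalid request parameters
- 401 Unauthorized: unauthorized access
- 403 Forbidden: access forbidden
- 404 Not Found: resource does not exist
- 500 Internal Server Error: internal server error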
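The response examples in this document share an envelope in which code is 0 on success, and the error format in 9.2 adds message and error fields. A client could centralize that check as sketched below. `ApiError` and `unwrap` are hypothetical names, and treating every non-zero code as a failure is an assumption based on those examples.

```python
import json
from typing import Any, Optional


class ApiError(Exception):
    """Raised when the response envelope reports a non-zero code."""

    def __init__(self, code: Any, message: str, detail: Optional[str] = None):
        super().__init__(f"[{code}] {message}")
        self.code = code
        self.message = message
        self.detail = detail


def unwrap(raw: str) -> Any:
    """Parse a JSON response body and return its data, or raise ApiError."""
    payload = json.loads(raw)
    if payload.get("code") != 0:
        raise ApiError(
            payload.get("code"),
            payload.get("message", ""),
            payload.get("error"),
        )
    return payload.get("data")
```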
9.2 Error response format
{
  "code": 400,
  "message": "Invalid request parameters",
  "data": null,
  "error": "Detailed error information"
}
10. Best practices
10.1 Knowledge base creation
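- Choose an embedding model that matches the language of your documents
- Set an appropriate permission level ("me" or "team")
- Provide a clear description to make later management easier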
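10.2 Document upload
- Upload multiple files in a single request; the add-documents endpoint in 4.2 accepts several files
- Monitor parse progress to avoid timeouts
- Use chunked upload when handling large files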
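Monitoring parse progress usually means polling the progress endpoint until the document reaches a final state or a client-side deadline passes. The sketch below takes the HTTP call as an injected function so that it stays independent of any client library. `wait_for_parse` is a hypothetical helper, and treating "parsed" and "error" as the terminal statuses is an assumption based on the status values listed in 4.3.

```python
import time
from typing import Callable, Dict

TERMINAL_STATUSES = {"parsed", "error"}  # assumed final states


def wait_for_parse(
    fetch_progress: Callable[[], Dict],
    timeout_s: float = 600.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll until the status is terminal; raise TimeoutError after timeout_s."""
    deadline = clock() + timeout_s
    while True:
        data = fetch_progress()  # expected to return the "data" object
        if data.get("status") in TERMINAL_STATUSES:
            return data
        if clock() >= deadline:
            raise TimeoutError(
                f"parsing not finished after {timeout_s}s "
                f"(last progress: {data.get('progress')})"
            )
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays; production callers can rely on the defaults.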
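10.3 Parsing configuration
- Choose a parsing mode that suits the document type
- Tune the chunk size to balance retrieval accuracy against performance
- Use batch parsing for large numbers of documents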
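To make the chunk size trade-off concrete, the sketch below shows one common interpretation of chunk_size and chunk_overlap: a sliding window over a token sequence. This is an assumption for illustration; the document does not state the unit of these parameters or how the Ragflow-Plus parser applies them, and `split_with_overlap` is a hypothetical function.

```python
from typing import List, Sequence


def split_with_overlap(
    tokens: Sequence[str],
    chunk_size: int = 500,
    chunk_overlap: int = 50,
) -> List[Sequence[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # distance between window starts
    chunks: List[Sequence[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks
```

Under this interpretation, smaller chunks give more precise matches but produce more vectors to embed and index, while a larger overlap reduces the chance of splitting a relevant passage at the cost of duplicated text.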
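10.4 Performance optimization
- Use the pagination parameters to limit response size
- Cache frequently accessed knowledge base information
- Process long-running parsing tasks asynchronously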
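The paginated list responses in this document carry list, total, currentPage, and size fields, so a client can walk through all pages lazily instead of requesting one large page. The sketch below assumes that shape; `iter_items` and the injected `fetch_page` callable are hypothetical, with `fetch_page(page, size)` expected to return the data object of one response.

```python
from typing import Callable, Dict, Iterator


def iter_items(
    fetch_page: Callable[[int, int], Dict],
    size: int = 10,
) -> Iterator[Dict]:
    """Yield items page by page until `total` items have been seen."""
    page = 1
    seen = 0
    while True:
        data = fetch_page(page, size)
        items = data.get("list", [])
        for item in items:
            yield item
        seen += len(items)
        # Stop on an empty page as well, in case `total` is stale or missing.
        if not items or seen >= data.get("total", 0):
            break
        page += 1
```

Because the function is a generator, callers that only need the first few items never trigger requests for later pages.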
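11. Summary
The knowledge base API gives Ragflow-Plus complete knowledge base and document management capabilities. Through its RESTful interface, users can create and manage knowledge bases, upload and process documents, configure embedding models, and monitor processing progress. These APIs form the foundation for building RAG-based applications.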