nexent File Upload: Image Processing Logic

Based on nexent version 1.7.5.2

Detailed Code Trace

nexent preprocesses uploaded files; this article analyzes the case where the uploaded file is an image.

First, look at the function in file_management_app.py that receives the preprocessing request.

```python
@router.post("/preprocess")
async def agent_preprocess_api(
        request: Request, query: str = Form(...),
        files: List[UploadFile] = File(...),
        authorization: Optional[str] = Header(None)
):
    """
    Preprocess uploaded files and return streaming response
    """
    try:
        # Pre-read and cache all file contents
        user_id, tenant_id, language = get_current_user_info(
            authorization, request)
        file_cache = []
        for file in files:
            try:
                content = await file.read()
                file_cache.append({
                    "filename": file.filename or "",
                    "content": content,
                    "ext": os.path.splitext(file.filename or "")[1].lower()
                })
            except Exception as e:
                file_cache.append({
                    "filename": file.filename or "",
                    "error": str(e)
                })

        # Generate unique task ID for this preprocess operation
        import uuid
        task_id = str(uuid.uuid4())
        conversation_id = request.query_params.get("conversation_id")
        if conversation_id:
            conversation_id = int(conversation_id)
        else:
            conversation_id = -1  # Default for cases without conversation_id

        # Call service layer to generate streaming response
        return StreamingResponse(
            preprocess_files_generator(
                query=query,
                file_cache=file_cache,
                tenant_id=tenant_id,
                language=language,
                task_id=task_id,
                conversation_id=conversation_id
            ),
            media_type="text/event-stream",
            headers={
                "Cache-Control": "no-cache",
                "Connection": "keep-alive"
            }
        )
    except Exception as e:
        raise HTTPException(
            status_code=500, detail=f"File preprocessing error: {str(e)}")
```
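Note how each cached entry lowercases the file extension via os.path.splitext(...).lower(); that normalized extension is what the later type dispatch keys on. A minimal standalone sketch of that per-file dict (build_cache_entry is a hypothetical helper name for illustration):

```python
import os

def build_cache_entry(filename: str, content: bytes) -> dict:
    # Mirrors the per-file dict built in agent_preprocess_api:
    # the extension is lowercased so ".JPEG" and ".jpeg" dispatch the same way
    return {
        "filename": filename or "",
        "content": content,
        "ext": os.path.splitext(filename or "")[1].lower(),
    }

entry = build_cache_entry("Photo.JPEG", b"\xff\xd8\xff")
# entry["ext"] is ".jpeg", so an uppercase .JPEG upload is still treated as an image
```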

Next, look at the preprocess_files_generator function, which contains this snippet:

```python
if file_data["ext"] in ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp']:
    description = await process_image_file(query, file_data["filename"], file_data["content"], tenant_id, language)
    truncation_percentage = None
else:
    description, truncation_percentage = await process_text_file(query, file_data["filename"], file_data["content"], tenant_id, language)
file_descriptions.append(description)
```

In other words, when the file extension is one of ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp'], the image is processed into a description.

Next, look at the process_image_file function:

```python
async def process_image_file(query: str, filename: str, file_content: bytes, tenant_id: str, language: str = LANGUAGE["ZH"]) -> str:
    """
    Process image file, convert to text using external API
    """
    # Load messages based on language
    messages = get_file_processing_messages_template(language)

    try:
        image_stream = BytesIO(file_content)
        text = convert_image_to_text(query, image_stream, tenant_id, language)
        return messages["IMAGE_CONTENT_SUCCESS"].format(filename=filename, content=text)
    except Exception as e:
        return messages["IMAGE_CONTENT_ERROR"].format(filename=filename, error=str(e))
```

Here the raw file bytes are wrapped in a BytesIO object before the image is converted to text.

io.BytesIO is a class in Python's standard io module: an in-memory buffer that behaves like a binary file object.
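A quick illustration of why this matters: code written against file APIs (read/seek) works on a BytesIO without touching disk, which is exactly how the in-memory upload bytes are fed onward here.

```python
from io import BytesIO

# Wrap raw bytes in a file-like object; no temp file needed
buf = BytesIO(b"\x89PNG\r\n")
first_two = buf.read(2)   # reads the first two bytes
buf.seek(0)               # rewind, as an image decoder typically would
all_bytes = buf.read()    # reads the full buffer again
```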

Now look at the convert_image_to_text function.

It hands the image input (a path or a binary stream) to a vision-language model (OpenAIVLModel), which analyzes the image and generates a description oriented toward the user's query; the function finally returns the .content field of the model's result.

```python
def convert_image_to_text(query: str, image_input: Union[str, BinaryIO], tenant_id: str, language: str = LANGUAGE["ZH"]):
    """
    Convert image to text description based on user query

    Args:
        query: User's question
        image_input: Image input (file path or binary data)
        tenant_id: Tenant ID for model configuration
        language: Language code ('zh' for Chinese, 'en' for English)

    Returns:
        str: Image description text
    """
    vlm_model_config = tenant_config_manager.get_model_config(
        key=MODEL_CONFIG_MAPPING["vlm"], tenant_id=tenant_id)
    image_to_text_model = OpenAIVLModel(
        observer=MessageObserver(),
        model_id=get_model_name_from_config(
            vlm_model_config) if vlm_model_config else "",
        api_base=vlm_model_config.get("base_url", ""),
        api_key=vlm_model_config.get("api_key", ""),
        temperature=0.7,
        top_p=0.7,
        frequency_penalty=0.5,
        max_tokens=512
    )

    # Load prompts from yaml file
    prompts = get_analyze_file_prompt_template(language)
    system_prompt = Template(prompts['image_analysis']['system_prompt'],
                             undefined=StrictUndefined).render({'query': query})

    return image_to_text_model.analyze_image(image_input=image_input, system_prompt=system_prompt).content
```
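One detail worth noting: the system prompt is rendered with Jinja2's StrictUndefined, which turns a missing template variable into an error instead of silently emitting an empty string. A minimal illustration (assuming the jinja2 package, which the Template/StrictUndefined names above come from):

```python
from jinja2 import Template, StrictUndefined
from jinja2.exceptions import UndefinedError

tpl = Template("User asked: {{ query }}", undefined=StrictUndefined)
rendered = tpl.render({"query": "What is in this image?"})

# With StrictUndefined, forgetting to pass `query` raises instead of
# producing a prompt with a silent blank in it:
try:
    Template("User asked: {{ query }}", undefined=StrictUndefined).render({})
    raised = False
except UndefinedError:
    raised = True
```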

Now look at the system_prompt under image_analysis in analyze_file.yaml (kept verbatim; it asks the model to describe the image concisely, in under 200 characters, from the angle of answering the user's question):

```yaml
# 文件分析 Prompt 模板
# 用于图片和长文本内容分析

image_analysis:
  system_prompt: |-
    用户提出了一个问题:{{ query }},请从回答这个问题的角度精简、仔细描述一下这个图片,200字以内。

    **图片分析要求:**
    1. 重点关注与用户问题相关的图片内容
    2. 描述要精简明了,突出关键信息
    3. 避免无关细节,专注于能帮助回答问题的内容
    4. 保持客观描述,不要过度解读

  user_prompt: |
    请仔细观察这张图片,并从回答用户问题的角度进行描述。
```
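A side note on the YAML syntax here: the `|-` block scalar keeps the prompt's internal line breaks but strips the trailing newline, while `|` keeps it. A quick sketch of how such a template file behaves when loaded (PyYAML assumed for illustration; the actual loader behind get_analyze_file_prompt_template is not shown in this article):

```python
import yaml

doc = """
image_analysis:
  system_prompt: |-
    line one
    line two
  user_prompt: |
    single line
"""
prompts = yaml.safe_load(doc)
# |- strips the trailing newline; | keeps it
sys_p = prompts["image_analysis"]["system_prompt"]
usr_p = prompts["image_analysis"]["user_prompt"]
```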

The image and the rendered prompt are then handed to the model created above, and analyze_image turns them into a description.

OpenAIVLModel is nexent's model wrapper, inheriting from OpenAIModel.

The analyze_image function is essentially a thin wrapper around OpenAIModel's prepare_image_message: it builds the image message and then invokes the model.

```python
class OpenAIVLModel(OpenAIModel):
    def __init__(self, observer: MessageObserver, temperature=0.7, top_p=0.7, frequency_penalty=0.5, max_tokens=512,
                 *args, **kwargs):
        super().__init__(observer=observer, *args, **kwargs)
        self.temperature = temperature
        self.top_p = top_p
        self.frequency_penalty = frequency_penalty
        self.max_tokens = max_tokens
        self._current_request = None  # Used to store the current request

    ...

    def analyze_image(self, image_input: Union[str, BinaryIO],
                      system_prompt: str = "Please describe this picture concisely and carefully, within 200 words.", stream: bool = True,
                      **kwargs) -> ChatMessage:
        """
        Analyze image content.

        Args:
            image_input: Image file path or file stream object.
            system_prompt: System prompt.
            stream: Whether to output in streaming mode.
            **kwargs: Other parameters.

        Returns:
            ChatMessage: Message returned by the model.
        """
        messages = self.prepare_image_message(image_input, system_prompt)
        return self(messages=messages, **kwargs)
```
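prepare_image_message itself is not shown in this article, but messages of this kind are typically built in the OpenAI chat-completions style: the image bytes are base64-encoded into a data-URL image part alongside the text prompt. A hedged sketch of what such a payload generally looks like (build_image_messages is a hypothetical name; the exact shape nexent builds may differ):

```python
import base64
from io import BytesIO
from typing import BinaryIO, Union

def build_image_messages(image_input: Union[str, BinaryIO], system_prompt: str) -> list:
    # Accept either a file path or a file-like stream, as analyze_image does
    if isinstance(image_input, str):
        with open(image_input, "rb") as f:
            raw = f.read()
    else:
        raw = image_input.read()
    b64 = base64.b64encode(raw).decode("ascii")
    # OpenAI-style multimodal message: system text + user image part as a data URL
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]},
    ]

msgs = build_image_messages(BytesIO(b"\xff\xd8"), "Describe this picture.")
```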

Once preprocessing finishes, the front end takes the result and wraps the file in an object like this:

```js
minio_files: [{
    "object_name": "attachments/20251116205511_52154066408b41d7b09a049c07cdc654.JPEG",
    "name": "6.JPEG",
    "type": "image",
    "size": 342328,
    "url": "",
    "description":
}]
```

It then sends a request to the large language model for the subsequent steps (covered in other articles).

Summary

nexent preprocesses uploaded files. For images, it uses the following prompt:

```yaml
system_prompt: |-
  用户提出了一个问题:{{ query }},请从回答这个问题的角度精简、仔细描述一下这个图片,200字以内。

  **图片分析要求:**
  1. 重点关注与用户问题相关的图片内容
  2. 描述要精简明了,突出关键信息
  3. 避免无关细节,专注于能帮助回答问题的内容
  4. 保持客观描述,不要过度解读
```

The user's question is substituted into query, and together with the image it is handed to the OpenAIVLModel; its analyze_image function runs and returns a description of the image.

After preprocessing finishes, the front end sends another request to the back end containing the file information, including the previously generated description. The back end processes that request and then forwards it to the large language model to answer the question.