Why does the model served with vLLM reply with nothing but a pile of "!" no matter what I ask? It's all !!!!!!!!!!!!!!!!

#18
by chaochaoli - opened

I'm on V100 GPUs, so I can only launch the model with float16.
With float32 the output is normal, but it takes 8 GPUs, which is rather wasteful.
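The 8-GPU figure is consistent with a rough weight-memory estimate. The sketch below is only a back-of-the-envelope check under stated assumptions: about 32 billion parameters (inferred from the model name, not an exact count), a 32 GB V100 treated as 32 GiB, vLLM's default `gpu_memory_utilization` of 0.9, and a power-of-two tensor-parallel size (a simplification). It counts weights only and ignores the KV cache and activations, so real requirements are higher.

```python
def weight_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory taken by the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


def min_gpus(num_params: float, bytes_per_param: int,
             gpu_mem_gib: float = 32.0, usable_fraction: float = 0.9) -> int:
    """Smallest power-of-two GPU count whose usable memory holds the weights.

    Simplifying assumptions: weights are split evenly across GPUs and the
    KV cache / activations are ignored, so this is a lower bound.
    """
    need = weight_gib(num_params, bytes_per_param)
    per_gpu = gpu_mem_gib * usable_fraction
    n = 1
    while n * per_gpu < need:
        n *= 2  # tensor-parallel sizes are kept to powers of two here
    return n


NUM_PARAMS = 32e9  # assumption: ~32B parameters, inferred from "glm4-32b"
for name, nbytes in (("float16", 2), ("float32", 4)):
    print(f"{name}: {weight_gib(NUM_PARAMS, nbytes):.1f} GiB of weights, "
          f"at least {min_gpus(NUM_PARAMS, nbytes)} x 32 GB V100")
```

Under these assumptions float16 needs about 59.6 GiB of weights (at least 4 cards) and float32 about 119.2 GiB (at least 8 cards), so doubling the bytes per parameter is what pushes the deployment from 4 to 8 GPUs.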

{
  "id": "chatcmpl-35d9b03dec7d42079ea18490a820cd9d",
  "object": "chat.completion",
  "created": 1748922394,
  "model": "glm4-32b-0414",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 59,
    "total_tokens": 149,
    "completion_tokens": 90,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}
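One plausible mechanism, offered as a hypothesis rather than a confirmed diagnosis. The V100 (compute capability 7.0) has no bfloat16 support, and float16 overflows above 65504. If an activation in the residual stream exceeds that range it becomes inf, a normalisation such as RMSNorm then computes inf/inf = NaN, the NaN spreads into every logit, and an argmax over all-NaN logits falls back to index 0 on every step. The toy below emulates half and single precision with Python's `struct` module. The 4-dimensional hidden state, the 5-token `lm_head` and the activation values are made-up illustrations, not GLM-4 internals, and the helper names are hypothetical.

```python
import math
import struct


def make_rounder(fmt: str):
    """Return a function that rounds a float to the IEEE format `fmt`
    ("<e" = float16, "<f" = float32). Values too large for the format
    become +/-inf, mimicking overflow on the GPU."""
    def rnd(x: float) -> float:
        try:
            return struct.unpack(fmt, struct.pack(fmt, x))[0]
        except (OverflowError, struct.error):
            return math.copysign(math.inf, x)
    return rnd


def rms_norm(x, rnd, eps=1e-5):
    # RMSNorm with every intermediate rounded to the working precision
    mean_sq = rnd(sum(rnd(v * v) for v in x) / len(x))
    denom = rnd(math.sqrt(mean_sq + eps))
    return [rnd(v / denom) for v in x]


def logits(h, w, rnd):
    # lm_head projection: logit_j = sum_i h_i * w[i][j], each product rounded
    return [rnd(sum(rnd(h[i] * w[i][j]) for i in range(len(h))))
            for j in range(len(w[0]))]


def argmax(xs):
    # Greedy pick. Any comparison involving NaN is False, so an
    # all-NaN row never moves past index 0.
    best = 0
    for i in range(1, len(xs)):
        if xs[i] > xs[best]:
            best = i
    return best


# Hypothetical toy values: a residual stream where two large updates add up.
residual = [40000.0, 1.0, -2.0, 0.5]
update = [40000.0, 0.5, 1.0, -0.5]
lm_head = [[0.1, -0.3, 0.2, 0.9, -0.5],
           [0.4, 0.1, -0.2, 0.3, 0.2],
           [-0.1, 0.5, 0.3, -0.4, 0.1],
           [0.2, -0.2, 0.6, 0.1, -0.3]]

for name, fmt in (("float16", "<e"), ("float32", "<f")):
    rnd = make_rounder(fmt)
    h = [rnd(rnd(a) + rnd(b)) for a, b in zip(residual, update)]
    out = logits(rms_norm(h, rnd), lm_head, rnd)
    print(name, "h[0] =", h[0], "logits =", out, "token =", argmax(out))
```

In the float16 pass the first hidden value (40000 + 40000) overflows to inf, all five logits become NaN and the chosen token is 0; in the float32 pass the logits stay finite and the argmax is token 3. Assuming token id 0 decodes to "!" in this model's tokenizer (it does in byte-level BPE vocabularies such as GPT-2's, but this is not verified for GLM-4), emitting the same token until `max_tokens` is reached would match the all-"!" content and `"finish_reason": "length"` in the response above.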
