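This is the Int4 GPTQ Marlin quantized version of DeepSeek-R1-Distill-Qwen-32B (DeepSeek R1 distilled into Qwen2.5 32B). Inference on two RTX 4090 24GB GPUs is recommended; throughput reaches 1,700 tokens/second, and the optimal concurrency is 128 simultaneous requests. Performance is comparable to the FP16 version, with a C-Eval score of 82.3 and GPU memory usage reduced by 50%.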
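The figures above can be put into rough perspective with a little arithmetic. The sketch below is not from the model author. It assumes roughly 32e9 quantized parameters, reads "W4A16-G128" as 4-bit weights with one 16-bit scale per group of 128 weights, and treats 1,700 tokens/second as the aggregate rate across all 128 requests. The weight estimate ignores zero points, unquantized layers, KV cache, and activations, so real usage is higher. The `vllm_serve_argv` helper is hypothetical: it only builds a command line, and choosing vLLM as the serving engine is an assumption, since the card does not name one.

```python
# Back-of-the-envelope helpers. All inputs are assumptions taken from or
# inferred from the model card, not measurements.

MODEL_ID = "ExceedZhang/DeepSeek-R1-Distill-Qwen-32B-W4A16-G128"


def quantized_weight_bytes(n_params, weight_bits=4, group_size=128, scale_bits=16):
    """Estimate weight storage for group-wise quantization.

    Each weight costs `weight_bits`, plus one `scale_bits` scale shared by
    every `group_size` weights. Zero points and unquantized layers are ignored.
    """
    bits_per_weight = weight_bits + scale_bits / group_size
    return n_params * bits_per_weight / 8


def per_request_rate(total_tokens_per_s, concurrency):
    """Average tokens/second seen by one request, assuming an even split."""
    return total_tokens_per_s / concurrency


def vllm_serve_argv(model_id=MODEL_ID, gpus=2, max_concurrency=128):
    """Build (but do not run) a vLLM launch command; vLLM is an assumed engine."""
    return [
        "vllm", "serve", model_id,
        "--tensor-parallel-size", str(gpus),      # shard across both GPUs
        "--quantization", "gptq_marlin",          # GPTQ weights, Marlin kernels
        "--max-num-seqs", str(max_concurrency),   # cap on concurrent sequences
    ]


if __name__ == "__main__":
    n_params = 32e9  # assumed parameter count for a "32B" model
    gb = quantized_weight_bytes(n_params) / 1e9
    print(f"estimated quantized weights: {gb:.1f} GB")
    print(f"per-request rate: {per_request_rate(1700, 128):.2f} tokens/s")
    print(" ".join(vllm_serve_argv()))
```

Under these assumptions the weights alone fit within the 48 GB offered by two 24 GB cards, leaving the remainder for KV cache and activations, which is what a concurrency of 128 mostly consumes.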

Format: Safetensors
Model size: 5.7B params
Tensor types: I64, I32, BF16


Model repository: ExceedZhang/DeepSeek-R1-Distill-Qwen-32B-W4A16-G128 (a quantized derivative of its base model)