DeepSeek-R1-0528-Qwen3-8B-GPTQ-Int4-Int8Mix

Base model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

【Model Update Date】

2025-05-29
1. Initial commit

【Dependencies】

vllm==0.9.0
transformers==4.52.3

【💡Notes for Newer vLLM Versions💡】

1. The V0 inference mode is recommended

Before starting vLLM, set the environment variable:

export VLLM_USE_V1=0
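
A minimal offline-inference sketch of this setting, assuming the quantized checkpoint has already been downloaded locally (the model path, prompt, and sampling values below are illustrative, not part of the original card):

```python
import os

# Equivalent to `export VLLM_USE_V1=0`: select the V0 engine before vLLM starts.
os.environ["VLLM_USE_V1"] = "0"

from vllm import LLM, SamplingParams

# Path to the locally downloaded quantized checkpoint (placeholder).
llm = LLM(model="/path/to/DeepSeek-R1-0528-Qwen3-8B-GPTQ-Int4-Int8Mix")

sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=4096)
outputs = llm.generate(["How many prime numbers are there below 100?"], sampling)
print(outputs[0].outputs[0].text)
```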

【Model List】

| File Size | Last Updated |
|-----------|--------------|
| 6.9 GB    | 2025-05-29   |

【Model Download】

from modelscope import snapshot_download
snapshot_download('tclf90/DeepSeek-R1-0528-Qwen3-8B-GPTQ-Int4-Int8Mix', cache_dir="/your/local/path")
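
`snapshot_download` returns the local directory the files were placed in, so the result can be fed directly to vLLM. A minimal sketch, with a placeholder cache directory:

```python
from modelscope import snapshot_download

# Download the quantized weights; the return value is the local model directory.
model_dir = snapshot_download(
    'tclf90/DeepSeek-R1-0528-Qwen3-8B-GPTQ-Int4-Int8Mix',
    cache_dir="/your/local/path",  # placeholder
)
print(model_dir)
```

The returned directory can then be served, for example with `VLLM_USE_V1=0 vllm serve <model_dir>`.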

【Introduction】

DeepSeek-R1-0528

1. Introduction

The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528. In the latest update, DeepSeek R1 has significantly improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic. Its overall performance is now approaching that of leading models, such as O3 and Gemini 2.5 Pro.

Compared to the previous version, the upgraded model shows significant improvements in handling complex reasoning tasks. For instance, in the AIME 2025 test, the model’s accuracy has increased from 70% in the previous version to 87.5% in the current version. This advancement stems from enhanced thinking depth during the reasoning process: in the AIME test set, the previous model used an average of 12K tokens per question, whereas the new version averages 23K tokens per question.

Beyond its improved reasoning capabilities, this version also offers a reduced hallucination rate, enhanced support for function calling, and better experience for vibe coding.

2. Evaluation Results

DeepSeek-R1-0528

For all our models, the maximum generation length is set to 64K tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 16 responses per query to estimate pass@1.

| Category | Benchmark (Metric) | DeepSeek R1 | DeepSeek R1 0528 |
|----------|--------------------|-------------|------------------|
| General | MMLU-Redux (EM) | 92.9 | 93.4 |
| General | MMLU-Pro (EM) | 84.0 | 85.0 |
| General | GPQA-Diamond (Pass@1) | 71.5 | 81.0 |
| General | SimpleQA (Correct) | 30.1 | 27.8 |
| General | FRAMES (Acc.) | 82.5 | 83.0 |
| General | Humanity's Last Exam (Pass@1) | 8.5 | 17.7 |
| Code | LiveCodeBench (2408-2505) (Pass@1) | 63.5 | 73.3 |
| Code | Codeforces-Div1 (Rating) | 1530 | 1930 |
| Code | SWE Verified (Resolved) | 49.2 | 57.6 |
| Code | Aider-Polyglot (Acc.) | 53.3 | 71.6 |
| Math | AIME 2024 (Pass@1) | 79.8 | 91.4 |
| Math | AIME 2025 (Pass@1) | 70.0 | 87.5 |
| Math | HMMT 2025 (Pass@1) | 41.7 | 79.4 |
| Math | CNMO 2024 (Pass@1) | 78.8 | 86.9 |
| Tools | BFCL_v3_MultiTurn (Acc) | - | 37.0 |
| Tools | Tau-Bench (Pass@1) | - | 53.5 (Airline) / 63.9 (Retail) |
Note: We use the Agentless framework to evaluate model performance on SWE-Verified. We only evaluate text-only prompts in the HLE test set. GPT-4.1 is employed to act as the user role in the Tau-Bench evaluation.
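
As a concrete illustration of the sampling setup described above (temperature 0.6, top-p 0.95, 16 responses per query, 64K-token generation limit), pass@1 can be estimated as the average fraction of correct samples per question. A minimal sketch; the `is_correct` checker and the question list are hypothetical stand-ins for a benchmark-specific harness:

```python
from vllm import LLM, SamplingParams

def estimate_pass_at_1(llm: LLM, questions: list[str], is_correct) -> float:
    """Estimate pass@1 by drawing 16 samples per question and averaging correctness."""
    sampling = SamplingParams(temperature=0.6, top_p=0.95, n=16, max_tokens=65536)
    outputs = llm.generate(questions, sampling)
    per_question = [
        sum(is_correct(question, sample.text) for sample in out.outputs) / len(out.outputs)
        for question, out in zip(questions, outputs)
    ]
    return sum(per_question) / len(per_question)
```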

DeepSeek-R1-0528-Qwen3-8B

Meanwhile, we distilled the chain-of-thought from DeepSeek-R1-0528 to post-train Qwen3 8B Base, obtaining DeepSeek-R1-0528-Qwen3-8B. This model achieves state-of-the-art (SOTA) performance among open-source models on the AIME 2024, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking. We believe that the chain-of-thought from DeepSeek-R1-0528 will hold significant importance for both academic research on reasoning models and industrial development focused on small-scale models.

| Model | AIME 24 | AIME 25 | HMMT Feb 25 | GPQA Diamond | LiveCodeBench (2408-2505) |
|-------|---------|---------|-------------|--------------|---------------------------|
| Qwen3-235B-A22B | 85.7 | 81.5 | 62.5 | 71.1 | 66.5 |
| Qwen3-32B | 81.4 | 72.9 | - | 68.4 | - |
| Qwen3-8B | 76.0 | 67.3 | - | 62.0 | - |
| Phi-4-Reasoning-Plus-14B | 81.3 | 78.0 | 53.6 | 69.3 | - |
| Gemini-2.5-Flash-Thinking-0520 | 82.3 | 72.0 | 64.2 | 82.8 | 62.3 |
| o3-mini (medium) | 79.6 | 76.7 | 53.3 | 76.8 | 65.9 |
| DeepSeek-R1-0528-Qwen3-8B | 86.0 | 76.3 | 61.5 | 61.1 | 60.5 |

5. License

This code repository is licensed under MIT License. The use of DeepSeek-R1 models is also subject to MIT License. DeepSeek-R1 series (including Base and Chat) supports commercial use and distillation.

6. Citation

@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
      title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning}, 
      author={DeepSeek-AI},
      year={2025},
      eprint={2501.12948},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.12948}, 
}

7. Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.
