deepseek-ai
/
DeepSeek-R1-0528-Qwen3-8B

Text Generation
Transformers
Safetensors
qwen3
conversational
text-generation-inference

Model collapse after SFT

1
#14 opened 3 days ago by
Banjiuyufen

Vocab missing tool-related strings in chat template, poor performance with tools

#13 opened 3 days ago by
mattjcly

Can you please release how you post-train qwen3 on deepseek?

2
#12 opened 7 days ago by
ZeroWw

Tried it, but not as good as expected.

3
#11 opened 8 days ago by
kk3dmax

Does the /no_think tag no longer work?

4
#10 opened 8 days ago by
loong

Any plans for a Qwen3-32B model?

👍 13
7
#9 opened 8 days ago by
wanghf

BTW, for programmers, the `Gemma` series is best for helping you write comments, docstrings, and documents.

#8 opened 8 days ago by
DOFOFFICIAL

DeepSeek-R1-Lite

❤️ 🔥 19
7
#6 opened 8 days ago by
Dampfinchen

generation_config.json is missing

👀 👍 2
#5 opened 8 days ago by
Doctor-Chad-PhD

Model broken

👍 3
8
#4 opened 8 days ago by
sm54

Awesome, awesome

2
#3 opened 8 days ago by
mrli008

Any plans on gemma series? ;-;

❤️ 4
4
#2 opened 8 days ago by
Nakdesu

Any plans on 30B-A3B model?

🔥 30
7
#1 opened 8 days ago by
xxx777xxxASD