deepseek-ai
/
DeepSeek-R1-Distill-Llama-70B

Text Generation
Transformers
Safetensors
llama
conversational
text-generation-inference
Model card Files and versions Community
20
Tool Use

#21 opened 2 months ago by
jhuntbach

What is the minimum hardware configuration for training the 70B model with llama-factory?

#20 opened 3 months ago by
Lraos

Do not require reasoning, just the output

1
#19 opened 3 months ago by
ameyv6

Why does the chat_template strip the <think> process from assistant-role messages?

👍 3
#18 opened 3 months ago by
zhm0

Could you release an AWQ version of the model: deepseek-r1-distill-llama-70b-AWQ

#17 opened 3 months ago by
classdemo

Update README.md

#16 opened 4 months ago by
shubham-kothari

chnsmth

#15 opened 4 months ago by
chnsmth

Does DeepSeek-Llama-70B support tensor parallelism for multi-GPU inference?

1
#14 opened 4 months ago by
Merk0701234

Weight file naming does not follow a regular rule

#13 opened 4 months ago by
haili-tian

How much VRAM do you need?

8
#12 opened 4 months ago by
hyun10

Upload IMG_4815.jpeg

#11 opened 4 months ago by
H3mzy11

Amazon Sagemaker deployment failing with CUDA OutOfMemory error

3
#10 opened 4 months ago by
neelkapadia

<thinking> is the proper tag?

👍 1
4
#8 opened 4 months ago by
McUH

Add pipeline tag

#7 opened 4 months ago by
nielsr

Template

👍 1
#6 opened 5 months ago by
tugot17

Lora

#4 opened 5 months ago by
PSM24

SFT (Non-RL) distillation is this good on a sub-100B model?

3
#2 opened 5 months ago by
KrishnaKaasyap

Lfg

🔥 9
#1 opened 5 months ago by
Prakh24s