Any chance for IQ3_XXS/IQ3_XS or similar size?

Opened by Panchovix

Hi there, thanks for the quant! I was wondering if it would be possible to get a quant of ~300GB or so, as I have 344GB of memory (VRAM + RAM combined), so I can't load the IQ4 :(

For example, I can load https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF-UD/tree/main/UD-Q3_K_XL which is ~276GB.
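For what it's worth, here's the rough math I use to guess whether a quant fits; the 20GB of headroom for the KV cache, compute buffers, and the OS is just an assumption, and the real figure depends on context length and settings:

```python
# Toy sanity check: does a quant fit in combined VRAM + RAM with some headroom?
# The 20 GB headroom is a guess, not a measured number -- real overhead depends
# on context size, KV-cache type, and compute buffer sizes.

def fits(quant_gb: float, total_mem_gb: float, headroom_gb: float = 20.0) -> bool:
    return quant_gb + headroom_gb <= total_mem_gb

total = 344  # my VRAM + RAM
for name, size_gb in [("UD-Q3_K_XL", 276), ("this IQ4 quant", 339)]:
    verdict = "should fit" if fits(size_gb, total) else "too tight"
    print(f"{name}: {size_gb} GB -> {verdict}")
```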

@Panchovix I think the guy who quantized that pruned coder variant of V3-0324 has done it?

DevQuasar/tngtech.DeepSeek-R1T-Chimera-GGUF

I haven't tested them myself because I only have 248GB of combined VRAM/RAM.

Yeah, I realize this quant weighs in a bit heavy at 339GB, which is tight even for 256GB RAM + 96GB VRAM... Honestly, I'm not even sure the upload will finish... :crossed_fingers:

This one uses a lot of iq4_ks layers, which are pretty fast on CUDA, but yeah, I don't have two RTX PRO 6000s myself either, hah...
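For anyone wondering where 339G lands in bits-per-weight terms, here's a quick back-of-the-envelope sketch, assuming the ~671B total parameters usually quoted for the DeepSeek V3/R1 family (whether "339G" means GB or GiB shifts the number slightly):

```python
# Back-of-the-envelope bits-per-weight from file size.
# Assumes ~671B total parameters (the usual figure for the DeepSeek V3/R1 family).

PARAMS = 671e9

def bpw(size_bytes: float, params: float = PARAMS) -> float:
    return size_bytes * 8 / params

for label, size_bytes in [("339 GB", 339e9), ("339 GiB", 339 * 2**30)]:
    print(f"{label}: ~{bpw(size_bytes):.2f} bits per weight")
```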

Oh, I don't think I can fit Q3_K_M (or it's right at the limit), but I got Q3_K_S from here and it works.

https://huggingface.co/bullerwins/DeepSeek-R1T-Chimera-GGUF/tree/main/DeepSeek-R1T-Chimera-Q3_K_S

But I feel @ubergarm's quants could have better quality thanks to the imatrix.
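Very roughly, my understanding is that the imatrix lets the quantizer weight its rounding error by how much each weight actually matters on real activations. Here's a toy sketch of that idea (not llama.cpp's actual algorithm, just the core concept):

```python
# Toy illustration: pick the quantization scale that minimizes
# *importance-weighted* squared error instead of plain squared error.
import numpy as np

def quantize(weights, importance, bits=4, n_scales=64):
    qmax = 2 ** (bits - 1) - 1
    base = np.max(np.abs(weights)) / qmax
    best_scale, best_err = base, np.inf
    for f in np.linspace(0.5, 1.0, n_scales):          # search candidate scales
        scale = base * f
        q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
        err = np.sum(importance * (weights - q * scale) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    q = np.clip(np.round(weights / best_scale), -qmax - 1, qmax)
    return q, best_scale

rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)
imp = rng.uniform(0.1, 10.0, size=256)                 # stand-in for imatrix stats
q, s = quantize(w, imp)
rmse = np.sqrt(np.sum(imp * (w - q * s) ** 2) / np.sum(imp))
print("importance-weighted RMSE:", float(rmse))
```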
