Any chance for IQ3_XXS/IQ3_XS or similar size?

Opened by Panchovix

Hi there, thanks for the quant! I was wondering if it would be possible to get a quant of ~300GB or so, as I have 344GB of memory (VRAM + RAM combined), so I can't load the IQ4 :(

For example, I can load https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF-UD/tree/main/UD-Q3_K_XL which is ~276GB.
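For what it's worth, here's the rough math I use to guess whether a quant fits; the 20GB of headroom for the KV cache, compute buffers, and the OS is just an assumption, and the real figure depends on context length and settings:

```python
# Toy sanity check: does a quant fit in combined VRAM + RAM with some headroom?
# The 20 GB headroom is a guess, not a measured number -- real overhead depends
# on context size, KV-cache type, and compute buffer sizes.

def fits(quant_gb: float, total_mem_gb: float, headroom_gb: float = 20.0) -> bool:
    return quant_gb + headroom_gb <= total_mem_gb

total = 344  # my VRAM + RAM
for name, size_gb in [("UD-Q3_K_XL", 276), ("this IQ4 quant", 339)]:
    verdict = "should fit" if fits(size_gb, total) else "too tight"
    print(f"{name}: {size_gb} GB -> {verdict}")
```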

@Panchovix I think the guy who quantized that pruned coder variant of V3-0324 has done it?

DevQuasar/tngtech.DeepSeek-R1T-Chimera-GGUF

I haven't tested them myself because I only have 248GB of combined VRAM/RAM.

Yeah, I realize this quant weighs in a bit heavy at 339GB, which is tight even for 256GB RAM + 96GB VRAM... Honestly, I'm not even sure the upload will finish... :crossed_fingers:

This one uses a lot of iq4_ks layers, which are pretty fast on CUDA, but yeah, I don't have two RTX PRO 6000s myself either, hah...
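For anyone wondering where 339G lands in bits-per-weight terms, here's a quick back-of-the-envelope sketch, assuming the ~671B total parameters usually quoted for the DeepSeek V3/R1 family (whether "339G" means GB or GiB shifts the number slightly):

```python
# Back-of-the-envelope bits-per-weight from file size.
# Assumes ~671B total parameters (the usual figure for the DeepSeek V3/R1 family).

PARAMS = 671e9

def bpw(size_bytes: float, params: float = PARAMS) -> float:
    return size_bytes * 8 / params

for label, size_bytes in [("339 GB", 339e9), ("339 GiB", 339 * 2**30)]:
    print(f"{label}: ~{bpw(size_bytes):.2f} bits per weight")
```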

Oh, I don't think I can fit Q3_K_M (or it's right at the limit), but I got Q3_K_S from here and it works.

https://huggingface.co/bullerwins/DeepSeek-R1T-Chimera-GGUF/tree/main/DeepSeek-R1T-Chimera-Q3_K_S

But I feel @ubergarm's quants could have better quality thanks to the imatrix.
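Very roughly, my understanding is that the imatrix lets the quantizer weight its rounding error by how much each weight actually matters on real activations. Here's a toy sketch of that idea (not llama.cpp's actual algorithm, just the core concept):

```python
# Toy illustration: pick the quantization scale that minimizes
# *importance-weighted* squared error instead of plain squared error.
import numpy as np

def quantize(weights, importance, bits=4, n_scales=64):
    qmax = 2 ** (bits - 1) - 1
    base = np.max(np.abs(weights)) / qmax
    best_scale, best_err = base, np.inf
    for f in np.linspace(0.5, 1.0, n_scales):          # search candidate scales
        scale = base * f
        q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
        err = np.sum(importance * (weights - q * scale) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    q = np.clip(np.round(weights / best_scale), -qmax - 1, qmax)
    return q, best_scale

rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)
imp = rng.uniform(0.1, 10.0, size=256)                 # stand-in for imatrix stats
q, s = quantize(w, imp)
rmse = np.sqrt(np.sum(imp * (w - q * s) ** 2) / np.sum(imp))
print("importance-weighted RMSE:", float(rmse))
```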
