Any chance for IQ3_XXS/IQ3_XS or similar size?
Hi there, thanks for the quant! I was wondering if it would be possible to get a quant of ~300GB or so, as I have 344GB of memory (VRAM + RAM combined), so I can't load IQ4 :(
For example, I can load https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF-UD/tree/main/UD-Q3_K_XL which is ~276GB.
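For anyone checking whether a given quant fits their setup, here's a minimal sketch of the arithmetic. The overhead figure is a hypothetical placeholder; real usage also needs room for the KV cache, compute buffers, and the OS, which vary with context size and backend.

```python
# Rough fit check: weight file size + runtime overhead vs. combined VRAM + RAM.
# The 20 GB overhead is an assumed placeholder, not a measured value.

def fits(model_gb: float, vram_gb: float, ram_gb: float,
         overhead_gb: float = 20.0) -> bool:
    """True if the weights plus estimated overhead fit in total memory."""
    return model_gb + overhead_gb <= vram_gb + ram_gb

# Numbers from this thread: 96GB VRAM + 248GB RAM = 344GB total.
print(fits(276, 96, 248))  # the ~276GB UD-Q3_K_XL
print(fits(339, 96, 248))  # the 339GB quant discussed below
```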
@Panchovix I think the guy who quantized that pruned coder variant of V3-0324 has done it?
DevQuasar/tngtech.DeepSeek-R1T-Chimera-GGUF
I haven't tested them myself because I only have 248GB of combined VRAM/RAM.
Yeah, I realize this quant weighs in a little heavy at 339G, which is tight even for 256GB RAM + 96GB VRAM... Honestly, I'm not sure it will even finish uploading... :fingers_crossed:
This one has a lot of iq4_ks layers, which are pretty fast on CUDA, but yeah, I don't have two RTX PRO 6000s myself either hah...
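For a rough sense of where a number like 339G comes from: GGUF weight size scales with average bits-per-weight, and a mix dominated by ~4-bit types on a DeepSeek-class model lands in that neighborhood. A back-of-envelope sketch with assumed figures (the ~4.0 average bpw is a guess, not the actual mix):

```python
# Back-of-envelope GGUF weight size: params * bits-per-weight / 8.
# Both inputs below are illustrative assumptions, not measured values.

def gguf_size_gb(n_params: float, avg_bpw: float) -> float:
    """Approximate weight file size in GB for a given average bpw."""
    return n_params * avg_bpw / 8 / 1e9

# ~671B params at an assumed ~4.0 average bpw lands in the low-to-mid 300s.
print(gguf_size_gb(671e9, 4.0))
```

The actual file ends up smaller or larger than this depending on which tensors get smaller quant types and on GGUF metadata/scale overhead.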
Oh, I think I can't fit Q3_K_M (or it's right at the limit), but I got Q3_K_S from here and it works.
https://huggingface.co/bullerwins/DeepSeek-R1T-Chimera-GGUF/tree/main/DeepSeek-R1T-Chimera-Q3_K_S
But I feel @ubergarm's quants could have better quality thanks to the imatrix.