John Leimgruber III

ubergarm

AI & ML interests

Open LLMs and Astrophotography image processing.

Organizations

None yet

ubergarm's activity

New activity in ubergarm/DeepSeek-R1-0528-GGUF about 2 hours ago:
"1.5 bpw" (#6, opened 6 days ago by lmganon123)
New activity in anikifoss/DeepSeek-R1-0528-DQ4_K_R4 about 2 hours ago
New activity in ubergarm/DeepSeek-R1-0528-GGUF about 2 hours ago:
"benchmarks" (#8, opened 1 day ago by BernardH)
replied to eaddario's post 3 days ago

This is a good question, Ed. As we've discussed, I'm still developing an intuition for these kinds of things.

In my limited experience, there tend to be two common scenarios:

  1. The quantization doesn't damage the model too much and so is not immediately noticeable during inference. Probably 𝜌PPL is over 95%.
  2. The model barely works: it can't form sentences, repeats small phrases forever, and is very damaged.

Rarely have I seen a situation that is kind of in-between, where the model is obviously acting differently but is still somewhat coherent, though it makes a lot of mistakes. That might be an interesting place to explore for this "cut-off", so to speak. I wish I had more stats on it. Specifically, it happened on my first exllamav3 exl3 quantization of a "failed" ParetoQ QAT of a 1B model quantized to 2bpw lol:

https://gist.github.com/ubergarm/9d560bab80241b90dac802e91b656743#references

The dropdown there shows the model is somewhat coherent, but definitely goofed up pretty good haha...

Anyway, I'll keep a closer eye on 𝜌PPL as I'm running a lot of KLD comparisons lately. Cheers!
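
For what it's worth, here's a rough Python sketch of the kind of numbers I mean (my own loose definitions, not necessarily the exact statistics llama.cpp reports in its KLD output): the mean KL divergence between the baseline and quantized token distributions, plus the correlation of per-token NLLs, which is the 𝜌PPL-style signal I'm watching. The function name and array layout are just illustrative.

```python
import numpy as np

def kld_and_ppl_corr(base_logprobs, quant_logprobs, token_ids):
    """base_logprobs, quant_logprobs: [n_tokens, vocab_size] log-probabilities
    for the same eval text from the baseline and the quantized model.
    token_ids: [n_tokens] the actual next-token ids from the eval text."""
    base_logprobs = np.asarray(base_logprobs, dtype=np.float64)
    quant_logprobs = np.asarray(quant_logprobs, dtype=np.float64)
    token_ids = np.asarray(token_ids)

    # Mean KL(base || quant) averaged over token positions.
    p_base = np.exp(base_logprobs)
    kld = np.mean(np.sum(p_base * (base_logprobs - quant_logprobs), axis=-1))

    # Per-token negative log-likelihood of the true token under each model.
    idx = np.arange(len(token_ids))
    nll_base = -base_logprobs[idx, token_ids]
    nll_quant = -quant_logprobs[idx, token_ids]

    # Pearson correlation of per-token NLLs: close to 1.0 means the quant
    # ranks hard/easy tokens the same way the baseline does.
    rho = np.corrcoef(nll_base, nll_quant)[0, 1]
    return kld, rho
```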

reacted to eaddario's post with 🚀 3 days ago
Layer-wise and Pruned versions of google/gemma-3-12b-it

After enhancing llama.cpp to handle user-defined quantization levels for arbitrary tensors (https://github.com/ggml-org/llama.cpp/pull/12511), I have added an option to prune whole layers (https://github.com/ggml-org/llama.cpp/pull/13037), and have published two versions of google/gemma-3-12b-it for demo and testing purposes:

* Tensor-wise: eaddario/gemma-3-12b-it-GGUF
* Pruned: eaddario/gemma-3-12b-it-pruned-GGUF

Even though the Perplexity scores of the pruned version are 3 times higher, the ARC, HellaSwag, MMLU, TruthfulQA and WinoGrande scores are holding up remarkably well, considering two layers (26 and 29) were removed. This seems to support Xin Men et al.'s conclusions in ShortGPT: Layers in Large Language Models are More Redundant Than You Expect (arXiv:2403.03853).
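
As a rough illustration of the layer-pruning idea (not the llama.cpp GGUF-level implementation from the PR above), here is a minimal PyTorch/transformers sketch that drops decoder layers 26 and 29. It assumes a Llama-style stack exposed as model.model.layers; the exact attribute path may differ for the multimodal Gemma 3 checkpoints.

```python
# Minimal sketch: drop whole decoder layers before evaluation/re-export.
# This is a conceptual illustration only, not the llama.cpp pruning code.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-12b-it", torch_dtype=torch.bfloat16
)

prune = {26, 29}  # layer indices from the results above
layers = model.model.layers  # assumed ModuleList of decoder blocks
model.model.layers = torch.nn.ModuleList(
    [layer for i, layer in enumerate(layers) if i not in prune]
)
model.config.num_hidden_layers = len(model.model.layers)

# model.save_pretrained("gemma-3-12b-it-pruned")  # then convert/quantize as usual
```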

A results summary is in the model card, and test results are in the ./scores directory. Questions and feedback are always welcome.
New activity in eaddario/Qwen3-30B-A3B-GGUF 3 days ago