LeanQuant committed
Commit 220cb33 · verified · 1 parent: 3556a42

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +1 -0
  2. README.md +58 -0
  3. config.json +48 -0
  4. generation_config.json +9 -0
  5. lm_head.safetensors +3 -0
  6. model.safetensors +3 -0
  7. model_embed_tokens.safetensors +3 -0
  8. model_layers_0.safetensors +3 -0
  9. model_layers_1.safetensors +3 -0
  10. model_layers_10.safetensors +3 -0
  11. model_layers_11.safetensors +3 -0
  12. model_layers_12.safetensors +3 -0
  13. model_layers_13.safetensors +3 -0
  14. model_layers_14.safetensors +3 -0
  15. model_layers_15.safetensors +3 -0
  16. model_layers_16.safetensors +3 -0
  17. model_layers_17.safetensors +3 -0
  18. model_layers_18.safetensors +3 -0
  19. model_layers_19.safetensors +3 -0
  20. model_layers_2.safetensors +3 -0
  21. model_layers_20.safetensors +3 -0
  22. model_layers_21.safetensors +3 -0
  23. model_layers_22.safetensors +3 -0
  24. model_layers_23.safetensors +3 -0
  25. model_layers_24.safetensors +3 -0
  26. model_layers_25.safetensors +3 -0
  27. model_layers_26.safetensors +3 -0
  28. model_layers_27.safetensors +3 -0
  29. model_layers_28.safetensors +3 -0
  30. model_layers_29.safetensors +3 -0
  31. model_layers_3.safetensors +3 -0
  32. model_layers_30.safetensors +3 -0
  33. model_layers_31.safetensors +3 -0
  34. model_layers_32.safetensors +3 -0
  35. model_layers_33.safetensors +3 -0
  36. model_layers_34.safetensors +3 -0
  37. model_layers_35.safetensors +3 -0
  38. model_layers_36.safetensors +3 -0
  39. model_layers_37.safetensors +3 -0
  40. model_layers_38.safetensors +3 -0
  41. model_layers_39.safetensors +3 -0
  42. model_layers_4.safetensors +3 -0
  43. model_layers_40.safetensors +3 -0
  44. model_layers_41.safetensors +3 -0
  45. model_layers_42.safetensors +3 -0
  46. model_layers_43.safetensors +3 -0
  47. model_layers_44.safetensors +3 -0
  48. model_layers_45.safetensors +3 -0
  49. model_layers_46.safetensors +3 -0
  50. model_layers_47.safetensors +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,58 @@
+ ## DFloat11 Compressed Model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-14B`
+
+ This is a **losslessly compressed** version of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-14B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) in our custom **DFloat11** format. Its outputs are **bit-for-bit identical** to those of the original BFloat16 model, while its GPU memory consumption is approximately **30%** lower.
+
+ ### 🔍 How It Works
+
+ DFloat11 compresses model weights by applying **Huffman coding** to the exponent bits of BFloat16 values, combined with **hardware-aware algorithmic designs** that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights stay compressed in GPU memory; each weight matrix is **decompressed just before its matrix multiplication** and **discarded immediately afterwards**, which keeps the memory footprint small.
+
+ Key benefits:
+
+ * **No CPU decompression or host-device data transfer**: all operations run entirely on the GPU.
+ * **Decompression overhead is constant** per forward pass and **independent of batch size**, so DFloat11 becomes relatively more efficient as the batch size grows.
+ * DFloat11 is **much faster than CPU-offloading approaches**, enabling practical deployment in memory-constrained environments.
+ * At **batch size = 1**, inference is approximately **2× slower** than with the original BF16 model, but the gap **narrows significantly** at larger batch sizes.
+ * The compression is **fully lossless**: the model's outputs are guaranteed to be **bit-for-bit identical** to those of the original model.
+
+ ### 🔧 How to Use
+
+ 1. Install the DFloat11 pip package *(this installs the CUDA kernel automatically; it requires a CUDA-compatible GPU and an existing PyTorch installation)*:
+
+ ```bash
+ pip install "dfloat11[cuda12]"
+ # or, if you have CUDA 11:
+ # pip install "dfloat11[cuda11]"
+ ```
+
+ 2. To use the DFloat11 model, run the following example code in Python:
+
+ ```python
+ import torch
+ from dfloat11 import DFloat11Model
+ from transformers import AutoTokenizer
+
+ model_id = "DFloat11/DeepSeek-R1-Distill-Qwen-14B-DF11"
+
+ # Weights stay DFloat11-compressed in GPU memory and are decompressed on the fly.
+ model = DFloat11Model.from_pretrained(model_id, device_map="auto")
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token
+
+ prompt = "Question: What is a binary tree and its applications? Answer:"
+ inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(model.device)
+
+ # Inference only: disable gradient tracking while sampling up to 256 new tokens.
+ with torch.no_grad():
+     output = model.generate(
+         **inputs,
+         max_new_tokens=256,
+         do_sample=True,
+     )
+
+ print(tokenizer.batch_decode(output, skip_special_tokens=True))
+ ```
+
+ ### 📄 Learn More
+
+ * **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
+ * **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
+ * **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)
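The README describes DFloat11 as Huffman coding of the BFloat16 exponent bits. The Python sketch below illustrates only that core idea on a hypothetical stand-in for a weight matrix (Gaussian values with standard deviation 0.02, not real model weights): it builds a Huffman code over the 8-bit exponent field and counts the resulting bits per weight, assuming the sign bit and the 7 mantissa bits are stored uncompressed. The helper names `bf16_exponent` and `huffman_code_lengths` are illustrative and not part of the `dfloat11` package, and the GPU-side decoding layout (the `bytes_per_thread` and `threads_per_block` settings in `config.json`) is not modelled.

```python
import heapq
import itertools
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """8-bit exponent field of x. BF16 keeps float32's sign and exponent
    bits and only shortens the mantissa, so the float32 field is used."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits32 >> 23) & 0xFF


def huffman_code_lengths(freqs: dict) -> dict:
    """Return {symbol: Huffman code length in bits} for symbol frequencies."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}  # a lone symbol still needs one bit
    tiebreak = itertools.count()  # keeps heapq from ever comparing dicts
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes all of their symbols one level deeper.
        merged = {s: depth + 1 for s, depth in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical stand-in for one weight matrix: zero-mean Gaussian values.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(100_000)]

freqs = Counter(bf16_exponent(w) for w in weights)
lengths = huffman_code_lengths(freqs)
avg_exp_bits = sum(freqs[s] * lengths[s] for s in freqs) / sum(freqs.values())

# Sign (1 bit) + mantissa (7 bits) are kept raw; only the exponent is coded.
bits_per_weight = 1 + 7 + avg_exp_bits
print(f"{len(freqs)} distinct exponents, {avg_exp_bits:.2f} bits each on average")
print(f"{bits_per_weight:.2f} bits/weight = {bits_per_weight / 16:.0%} of BF16")
```

Because the values cluster in a narrow range of magnitudes, only a handful of exponent values are common and they receive short codes, so the exponent costs only a few bits on average instead of 8. That redundancy is the kind the README's ~30% memory reduction relies on; real weight matrices have their own exponent histograms, so treat the printed figure as illustrative.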
config.json ADDED
@@ -0,0 +1,48 @@
+ {
+   "architectures": [
+     "Qwen2ForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "dfloat11_config": {
+     "bytes_per_thread": 8,
+     "pattern_dict": {
+       "lm_head": [],
+       "model.embed_tokens": [],
+       "model.layers.\\d+": [
+         "self_attn.q_proj",
+         "self_attn.k_proj",
+         "self_attn.v_proj",
+         "self_attn.o_proj",
+         "mlp.gate_proj",
+         "mlp.up_proj",
+         "mlp.down_proj"
+       ]
+     },
+     "threads_per_block": [
+       512
+     ],
+     "version": "0.2.0"
+   },
+   "eos_token_id": 151643,
+   "hidden_act": "silu",
+   "hidden_size": 5120,
+   "initializer_range": 0.02,
+   "intermediate_size": 13824,
+   "max_position_embeddings": 131072,
+   "max_window_layers": 48,
+   "model_type": "qwen2",
+   "num_attention_heads": 40,
+   "num_hidden_layers": 48,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 1000000.0,
+   "sliding_window": 131072,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.51.3",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 152064
+ }
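The `pattern_dict` inside `dfloat11_config` appears to select which modules hold DFloat11-compressed weights: its keys are regular expressions over module names (note the JSON-escaped `"model.layers.\\d+"`), and its values list sub-modules. The Python sketch below reads it under an assumed interpretation that the source does not document: an empty list means "the matched module itself", a non-empty list names the matched module's compressed children. `compressed_tensors` is a hypothetical helper, not an API of the `dfloat11` package, and `model.norm` is used only as an example of a module that matches no pattern.

```python
import json
import re

# The "pattern_dict" fragment of config.json (JSON escaping kept intact).
CONFIG_FRAGMENT = r'''
{
  "lm_head": [],
  "model.embed_tokens": [],
  "model.layers.\\d+": [
    "self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj",
    "self_attn.o_proj", "mlp.gate_proj", "mlp.up_proj", "mlp.down_proj"
  ]
}
'''
pattern_dict = json.loads(CONFIG_FRAGMENT)  # JSON "\\d+" becomes regex \d+


def compressed_tensors(module_names, pattern_dict):
    """Map each matching module to the weights assumed to be DF11-compressed.

    Hypothetical helper: an empty list is read as "compress the module
    itself", a non-empty list as "compress these sub-modules".
    """
    result = {}
    for name in module_names:
        for pattern, children in pattern_dict.items():
            if re.fullmatch(pattern, name):
                result[name] = [f"{name}.{c}" for c in children] or [name]
                break  # first matching pattern wins
    return result


NUM_HIDDEN_LAYERS = 48  # from config.json
modules = ["model.embed_tokens", "lm_head", "model.norm"] + [
    f"model.layers.{i}" for i in range(NUM_HIDDEN_LAYERS)
]
compressed = compressed_tensors(modules, pattern_dict)
n_tensors = sum(len(v) for v in compressed.values())
print(n_tensors)  # 1 + 1 + 48 * 7 = 338
```

Under this reading, 338 weight matrices are compressed and unmatched modules stay in BF16. The file layout in this commit (`lm_head`, `model_embed_tokens`, and one `model_layers_N` file per layer) lines up with the three pattern keys.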
generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 151646,
+   "do_sample": true,
+   "eos_token_id": 151643,
+   "temperature": 0.6,
+   "top_p": 0.95,
+   "transformers_version": "4.51.3"
+ }
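`generation_config.json` enables sampling with `temperature` 0.6 and `top_p` 0.95. As a generic illustration of what those two numbers do, the Python sketch below implements textbook temperature scaling followed by nucleus (top-p) filtering; it is not the code path that `transformers` actually runs, and `nucleus_distribution`, `sample`, and the toy logits are hypothetical.

```python
import math
import random


def nucleus_distribution(logits, temperature=0.6, top_p=0.95):
    """Temperature-scaled softmax restricted to the top-p nucleus.

    Returns {token_index: renormalised_probability}.
    """
    # Divide logits by the temperature (< 1 sharpens the distribution),
    # then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalise so the kept probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


def sample(logits, rng, temperature=0.6, top_p=0.95):
    """Draw one token index from the filtered distribution."""
    dist = nucleus_distribution(logits, temperature, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


# Toy logits for a 4-token vocabulary (hypothetical values).
toy_logits = [2.0, 1.0, 0.0, -5.0]
dist = nucleus_distribution(toy_logits)
print(dist)  # only tokens 0 and 1 survive at top_p=0.95
print(sample(toy_logits, random.Random(0)))
```

A temperature below 1 concentrates probability on the top candidates, and `top_p` then discards the long tail, so the low-probability tokens 2 and 3 can never be drawn in this toy case.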
lm_head.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f4eecf4772bdac3bb9c84d51fa2228cbd2c3ce165d1328dd3155615381c1c18e
+ size 1056784505
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b2cffc93ba4adf070473e7ce99d9cf0e86c18f6ebeb2b8ff31aa77c1add4ea57
+ size 10360
model_embed_tokens.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c40cc4c5d2d761f18fff8a7123cae61ba0409c966a0ddb087bf6aee86ac8bd1
+ size 1071226508
model_layers_0.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64592e1de150efa621f05780c7f600d470871b87535ccdb91f755dce1c0cfd52
+ size 373047105
model_layers_1.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:466ac8e8297745cec7a05a96e47f462aca00e9dd5d520ef5c1e164c5a973d475
+ size 407124933
model_layers_10.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c71eaa3f0fa3de50249353c28d03302b71d5f5eca3c889fa36e21d4927744d8a
+ size 372696451
model_layers_11.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b2e58cfeeb590d2bb82379ea6dfb8f53c5c43dc7024b5250e75d9b0d62d64b9d
+ size 372735217
model_layers_12.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:127023e6a433ee5b95102faa6925d247708df11021852b107db2b10a389650a8
+ size 372512186
model_layers_13.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f1ab9aa91cbefae8dec0fb7720a1b3d1cd5bea9ad879d52b8e2979644c5567cf
+ size 372644304
model_layers_14.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f1d1e5f24f38635c8621062de0e5d55146a359a77f11d9653848d79a348f9e3
+ size 372689109
model_layers_15.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a563e438369a75b25fb2ddda9a794fb2607104022573d45671d18097563fb173
+ size 372827573
model_layers_16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:14c54b5cac32bd8adc33a770fb36ed078c075b403737bfc4cd889b99269876d1
+ size 372841593
model_layers_17.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f0e359b3e1354b2b33bd067facc9f7b73560c32d7b5606dc49c0d007effa0512
+ size 372850701
model_layers_18.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3d3f49e2e46e29f2445b4052d18b8977900591f39d57e3e139510a552bdfd4a5
+ size 372969609
model_layers_19.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a2e31168db3bfa6e72083fa50eec09d3a42c409d80562bf71719eb267a99f14a
+ size 372981033
model_layers_2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fd9b8c08dc171b33db62865878cdd91e13253b1a43bc36775b9dd1e8ef0db9a5
+ size 404482467
model_layers_20.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66a570c8005cfebf78591c31b15a921fe5ca89682c1980b1a678beec6abea6d4
+ size 373027443
model_layers_21.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:485274bd83d85e4d9525ffc38a91fe51f531ac230481c04fec8d0414d6a1fab9
+ size 372985375
model_layers_22.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fe00a055b7145fa00824418f1bfcb8954a697de04e1967efc683a3a106f27218
+ size 373042609
model_layers_23.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9184467af1930ceb44a3509e8e65807be0b9e9c6887ebecf0d906dea7e6e44de
+ size 373149813
model_layers_24.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:89740d9a88aab40899c7110af66825acef3fc530c27b34bf0f6418a8ceb114ce
+ size 373064580
model_layers_25.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cc413b679743fd878d69fc17f082e37c8a50b11b7e9c9b24ba1b5b934f132f2c
+ size 373101108
model_layers_26.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7d2e46a671e40725463997bc5252911fb04f7d874d4e66b128b2d24dc3f3037e
+ size 373020025
model_layers_27.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:38e0e436ec35a653564e30985233b3e0d6be927c20c11c821dcf96d405f6f2c8
+ size 373088055
model_layers_28.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cbbcca033611ca315614aebc33dc905e6fd662b62a8f0992565835cb4a6acfc0
+ size 373286696
model_layers_29.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64278000c1b16ab10ec6d6b873453c54d81b9056def35a4806abbf31390b622d
+ size 373241854
model_layers_3.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:620cc20a943e8dee242ad79567f511e535f8a2dac4029595cdc1c0d33b6ab530
+ size 399937743
model_layers_30.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:68b6bfbc38362951fc091dba8add7bdc8da4200b2a57ed3bcc1a1a8ff99cc8e7
+ size 373062312
model_layers_31.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4aa426af68b53e64575db31045967c9bdb0807706ec35f2e062033bde8b9570f
+ size 372957134
model_layers_32.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2963fcdb6dbb0a0d26caa89c7702bba3d4229eebc13164ea7edf2ad42ae25557
+ size 372970261
model_layers_33.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:75ededb70f30c148589eee0e2932066da0d5ffe5097d640d8483eccfbadc1e3c
+ size 373000655
model_layers_34.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec98b16ba564b3da43c95106ebaa13a77eb597a9ea23c090b04325eda62be552
+ size 372965181
model_layers_35.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0188c4d576a6c9fb8e1e3de08cfc5737702b1629b34ebb451d3b439478c409a6
+ size 372928983
model_layers_36.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:03013e93fa7a89b0dbe2c9169385f55be68dda93d0378aeb899ed38e808eda67
+ size 373134987
model_layers_37.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2a7f5ad89914849f2d941b76e69240cb38bfecadb433ddc7bd64cfcb527d86ff
+ size 372845454
model_layers_38.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ea3532d5f432c5337310ddf2600e883f909fb834f95f8546a88649b1e02e487b
+ size 372803552
model_layers_39.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9985c3fdd12379c58fd8de831d326435630a4028f699cae2278e3a6fac087c95
+ size 372868360
model_layers_4.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6badf9c17e52e2e528ff39ab353d39d3f995900a368d8872fd961dfd5c459766
+ size 399649128
model_layers_40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bf93dfd5a74ec1b40d12cfa506fb60ab59a6095319729a3c63f2ff7d9785bc1b
+ size 372918947
model_layers_41.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:26d845994e38c1a4ff8a431427294c7ec8d9624ce7bf998364dac36faa273b8d
+ size 372935894
model_layers_42.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:42ad6f8434059cbefa0883b51ef4498bfbcbc6f80723d90b1f0328dc23d6a8d3
+ size 372957980
model_layers_43.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dad2b60612dc3c0156fd633effc2afe9cbb19c685b34a9940e2ccfd2d18fa1b1
+ size 373331033
model_layers_44.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:47815aa00bbdd8712ee2f23131d8f3ca05750660c5570f7dfa3628d9036846da
+ size 373460684
model_layers_45.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0b24af7998a022273d5eff0d65108fe424b9e103f061faf73bda8aa7dcb7d255
+ size 373484283
model_layers_46.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:781ed23e51fe41dbef445aff47bebd0d41eeab2eae1149f24b917d6cc50ae7f1
+ size 373640710
model_layers_47.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8fc3c41c904f744d70044c8004eed03b10ad3e3e2e14177a0894af169b408b90
+ size 373214307
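Each `.safetensors` entry above is a Git LFS pointer (spec v1: `version`, `oid`, and `size` lines) rather than the tensor data itself. As a rough plausibility check of the README's ~30% figure, the Python sketch below parses one pointer and compares the stored sizes of three files against what the same weight matrices would occupy in BF16 (2 bytes per element), with element counts derived from `config.json`. Assumptions: each file is treated as holding only the weight matrices named in `pattern_dict`; Qwen2's q/k/v biases, the norm weights, and any DFloat11 metadata (such as decoding tables) are ignored, so the ratios are approximate. `parse_lfs_pointer` is a hypothetical helper.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer for model_layers_0.safetensors, copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:64592e1de150efa621f05780c7f600d470871b87535ccdb91f755dce1c0cfd52
size 373047105
"""

# Shapes from config.json.
HIDDEN = 5120
INTERMEDIATE = 13824
N_HEADS = 40
N_KV_HEADS = 8
VOCAB = 152064
HEAD_DIM = HIDDEN // N_HEADS       # 128
KV_DIM = N_KV_HEADS * HEAD_DIM     # 1024 (grouped-query attention)

# Weight-matrix elements per decoder layer (biases and norms ignored).
LAYER_PARAMS = (
    2 * HIDDEN * HIDDEN            # q_proj, o_proj
    + 2 * HIDDEN * KV_DIM          # k_proj, v_proj
    + 3 * HIDDEN * INTERMEDIATE    # gate_proj, up_proj, down_proj
)
VOCAB_PARAMS = VOCAB * HIDDEN      # embed_tokens or lm_head (untied)

info = parse_lfs_pointer(POINTER)
assert info["oid"].startswith("sha256:") and len(info["oid"]) == len("sha256:") + 64

# Stored (DF11) size vs. the size of the same weights in BF16.
files = {
    "model_layers_0": (int(info["size"]), LAYER_PARAMS),
    "model_embed_tokens": (1071226508, VOCAB_PARAMS),
    "lm_head": (1056784505, VOCAB_PARAMS),
}
ratios = {}
for name, (df11_bytes, n_params) in files.items():
    bf16_bytes = 2 * n_params
    ratios[name] = df11_bytes / bf16_bytes
    print(f"{name}: {df11_bytes:,} B stored vs {bf16_bytes:,} B BF16 -> {ratios[name]:.3f}")
```

The three ratios come out at about 0.68 to 0.69, in line with the README's ~30% reduction. Layers 1 to 4 are noticeably larger files (about 400 MB versus about 373 MB for most other layers), so compressibility varies somewhat from layer to layer.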