LeanQuant committed on May 5

Commit

220cb33

verified

1 Parent(s): 3556a42

Add files using upload-large-folder tool


This view is limited to 50 files because it contains too many changes.

Files changed (50)

.gitattributes +1 -0
README.md +58 -0
config.json +48 -0
generation_config.json +9 -0
lm_head.safetensors +3 -0
model.safetensors +3 -0
model_embed_tokens.safetensors +3 -0
model_layers_0.safetensors +3 -0
model_layers_1.safetensors +3 -0
model_layers_10.safetensors +3 -0
model_layers_11.safetensors +3 -0
model_layers_12.safetensors +3 -0
model_layers_13.safetensors +3 -0
model_layers_14.safetensors +3 -0
model_layers_15.safetensors +3 -0
model_layers_16.safetensors +3 -0
model_layers_17.safetensors +3 -0
model_layers_18.safetensors +3 -0
model_layers_19.safetensors +3 -0
model_layers_2.safetensors +3 -0
model_layers_20.safetensors +3 -0
model_layers_21.safetensors +3 -0
model_layers_22.safetensors +3 -0
model_layers_23.safetensors +3 -0
model_layers_24.safetensors +3 -0
model_layers_25.safetensors +3 -0
model_layers_26.safetensors +3 -0
model_layers_27.safetensors +3 -0
model_layers_28.safetensors +3 -0
model_layers_29.safetensors +3 -0
model_layers_3.safetensors +3 -0
model_layers_30.safetensors +3 -0
model_layers_31.safetensors +3 -0
model_layers_32.safetensors +3 -0
model_layers_33.safetensors +3 -0
model_layers_34.safetensors +3 -0
model_layers_35.safetensors +3 -0
model_layers_36.safetensors +3 -0
model_layers_37.safetensors +3 -0
model_layers_38.safetensors +3 -0
model_layers_39.safetensors +3 -0
model_layers_4.safetensors +3 -0
model_layers_40.safetensors +3 -0
model_layers_41.safetensors +3 -0
model_layers_42.safetensors +3 -0
model_layers_43.safetensors +3 -0
model_layers_44.safetensors +3 -0
model_layers_45.safetensors +3 -0
model_layers_46.safetensors +3 -0
model_layers_47.safetensors +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,58 @@

+## DFloat11 Compressed Model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-14B`
+This is a **losslessly compressed** version of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-14B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) using our custom **DFloat11** format. The outputs of this compressed model are **bit-for-bit identical** to the original BFloat16 model, while reducing GPU memory consumption by approximately **30%**.
+### 🔍 How It Works
+DFloat11 compresses model weights using **Huffman coding** of BFloat16 exponent bits, combined with **hardware-aware algorithmic designs** that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are **decompressed just before matrix multiplications**, then **immediately discarded after use** to minimize memory footprint.
+Key benefits:
+* **No CPU decompression or host-device data transfer** -- all operations are handled entirely on the GPU.
+* **Decompression overhead is constant** per forward pass and **independent of batch size**, making DFloat11 increasingly efficient at larger batch sizes.
+* DFloat11 is **much faster than CPU-offloading approaches**, enabling practical deployment in memory-constrained environments.
+* At **batch size = 1**, inference is approximately **2× slower** than the original BF16 model, but the performance gap **narrows significantly** with larger batches.
+* The compression is **fully lossless**, guaranteeing that the model’s outputs are **bit-for-bit identical** to those of the original model.
+### 🔧 How to Use
+1. Install the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*:
+    ```bash
+    pip install dfloat11[cuda12]
+    # or if you have CUDA version 11:
+    # pip install dfloat11[cuda11]
+    ```
+2. To use the DFloat11 model, run the following example code in Python:
+    ```python
+    import torch
+    from dfloat11 import DFloat11Model
+    from transformers import AutoTokenizer
+    model_id = "DFloat11/DeepSeek-R1-Distill-Qwen-14B-DF11"
+    model = DFloat11Model.from_pretrained(model_id, device_map="auto")
+    tokenizer = AutoTokenizer.from_pretrained(model_id)
+    tokenizer.pad_token = tokenizer.eos_token
+    prompt = "Question: What is a binary tree and its applications? Answer:"
+    inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(model.device)
+    with torch.no_grad():
+        output = model.generate(
+            **inputs,
+            max_new_tokens=256,
+            do_sample=True,
+        )
+    print(tokenizer.batch_decode(output, skip_special_tokens=True))
+    ```
+### 📄 Learn More
+* **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
+* **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
+* **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)
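
The README's description of Huffman coding over BFloat16 exponent bits can be illustrated with a small, CPU-only sketch. This is not the DFloat11 implementation: the weights are synthetic Gaussian samples, BFloat16 conversion is approximated by truncating a float32, and the helper names (`bf16_exponent`, `huffman_code_lengths`, `average_bits_per_weight`) are hypothetical. It only shows why an entropy code over the 8-bit exponent field can bring the average cost per weight below 16 bits while the sign and mantissa bits are stored unchanged.

```python
import heapq
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """Return the 8-bit exponent field of x after truncating float32 to BFloat16."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    bf16 = bits32 >> 16  # keeps sign (1 bit) + exponent (8 bits) + mantissa (7 bits)
    return (bf16 >> 7) & 0xFF


def huffman_code_lengths(freqs: Counter) -> dict:
    """Map each symbol to the length in bits of its Huffman code."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap entries: (total count, unique tie-breaker, symbols in this subtree).
    heap = [(n, i, [sym]) for i, (sym, n) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, syms1 = heapq.heappop(heap)
        n2, _, syms2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper in the tree.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (n1 + n2, tie, syms1 + syms2))
        tie += 1
    return lengths


def average_bits_per_weight(weights) -> float:
    """Sign and mantissa stay raw (1 + 7 bits); only the exponent is Huffman-coded."""
    exps = Counter(bf16_exponent(w) for w in weights)
    lengths = huffman_code_lengths(exps)
    total = sum(exps.values())
    avg_exp_bits = sum(exps[s] * lengths[s] for s in exps) / total
    return 1 + 7 + avg_exp_bits


# Synthetic stand-in for model weights (an assumption, not data from this repository).
rng = random.Random(0)
synthetic = [rng.gauss(0.0, 0.02) for _ in range(50_000)]
bits = average_bits_per_weight(synthetic)
print(f"average bits per weight: {bits:.2f} (BFloat16 uses 16)")
```

Because only a handful of exponent values dominate in a narrow weight distribution, the coded exponent typically needs far fewer than 8 bits, which is the effect the README attributes the memory saving to.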

config.json ADDED

	@@ -0,0 +1,48 @@

+{
+  "architectures": [
+    "Qwen2ForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "dfloat11_config": {
+    "bytes_per_thread": 8,
+    "pattern_dict": {
+      "lm_head": [],
+      "model.embed_tokens": [],
+      "model.layers.\\d+": [
+        "self_attn.q_proj",
+        "self_attn.k_proj",
+        "self_attn.v_proj",
+        "self_attn.o_proj",
+        "mlp.gate_proj",
+        "mlp.up_proj",
+        "mlp.down_proj"
+      ]
+    },
+    "threads_per_block": [
+      512
+    ],
+    "version": "0.2.0"
+  },
+  "eos_token_id": 151643,
+  "hidden_act": "silu",
+  "hidden_size": 5120,
+  "initializer_range": 0.02,
+  "intermediate_size": 13824,
+  "max_position_embeddings": 131072,
+  "max_window_layers": 48,
+  "model_type": "qwen2",
+  "num_attention_heads": 40,
+  "num_hidden_layers": 48,
+  "num_key_value_heads": 8,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": null,
+  "rope_theta": 1000000.0,
+  "sliding_window": 131072,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.51.3",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 152064
+}
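
The `pattern_dict` entry in `dfloat11_config` appears to map regular-expression keys to lists of submodule names. The sketch below shows one plausible reading, in which a key is matched against the full module path and an empty list means the matched module itself is compressed. That interpretation and the helper name `compressed_groups` are assumptions for illustration; the DFloat11 package may resolve these patterns differently.

```python
import re

# Copied from "dfloat11_config.pattern_dict" in config.json (JSON "\\d" decodes to "\d").
pattern_dict = {
    "lm_head": [],
    "model.embed_tokens": [],
    r"model.layers.\d+": [
        "self_attn.q_proj",
        "self_attn.k_proj",
        "self_attn.v_proj",
        "self_attn.o_proj",
        "mlp.gate_proj",
        "mlp.up_proj",
        "mlp.down_proj",
    ],
}


def compressed_groups(module_names, patterns):
    """Group module paths by the pattern that fully matches them (assumed semantics)."""
    groups = {}
    for name in module_names:
        for pattern, children in patterns.items():
            if re.fullmatch(pattern, name):
                # Assumption: an empty child list means "compress the module itself".
                groups[name] = [f"{name}.{child}" for child in children] or [name]
    return groups


names = ["lm_head", "model.embed_tokens", "model.layers.0", "model.layers.47", "model.norm"]
groups = compressed_groups(names, pattern_dict)
for name, members in groups.items():
    print(name, "->", len(members), "tensor(s)")
```

Under this reading, `model.norm` matches no pattern and would stay uncompressed, while each of the 48 decoder layers contributes its seven projection matrices as one group, which lines up with the one-file-per-layer layout of this commit.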

generation_config.json ADDED

	@@ -0,0 +1,9 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 151646,
+  "do_sample": true,
+  "eos_token_id": 151643,
+  "temperature": 0.6,
+  "top_p": 0.95,
+  "transformers_version": "4.51.3"
+}
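
For readers unfamiliar with the sampling defaults above, the following is a simplified, pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering with `temperature=0.6` and `top_p=0.95`. It is not the `transformers` implementation, and the function names are hypothetical. Note that with `do_sample` set to `true`, generated text varies from run to run unless the random seed is fixed.

```python
import math
import random


def top_p_filter(logits, temperature=0.6, top_p=0.95):
    """Return {token_id: prob} over the smallest set whose cumulative probability reaches top_p."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract the max for numerical stability
    z = sum(exps)
    probs = [e / z for e in exps]

    # Walk tokens from most to least likely until the nucleus is filled.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


def sample(logits, rng, temperature=0.6, top_p=0.95):
    """Draw one token id from the filtered, renormalized distribution."""
    dist = top_p_filter(logits, temperature, top_p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


rng = random.Random(0)
print(sample([2.0, 1.5, 0.3, -1.0], rng))
```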

lm_head.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f4eecf4772bdac3bb9c84d51fa2228cbd2c3ce165d1328dd3155615381c1c18e
+size 1056784505
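
Each `.safetensors` entry in this commit is a Git LFS pointer (a `version` line, an `oid sha256:` digest, and a `size` in bytes) rather than the file itself. A downloaded file can be checked against its pointer with a short script along these lines. The helper names are hypothetical, and the demo uses an in-memory payload because the real file is not available here.

```python
import hashlib
import io


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(stream, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Read a binary stream in chunks and compare its size and digest with the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    while chunk := stream.read(chunk_size):
        hasher.update(chunk)
        size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the lm_head.safetensors entry above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:f4eecf4772bdac3bb9c84d51fa2228cbd2c3ce165d1328dd3155615381c1c18e\n"
    "size 1056784505\n"
)

# Demo with a synthetic payload standing in for a real download.
payload = b"example payload, standing in for a real .safetensors file"
demo_pointer = {
    "algo": "sha256",
    "digest": hashlib.sha256(payload).hexdigest(),
    "size": len(payload),
}
print(matches_pointer(io.BytesIO(payload), demo_pointer))  # True
```

To check a real download, pass `open(path, "rb")` as the stream together with the parsed pointer.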

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b2cffc93ba4adf070473e7ce99d9cf0e86c18f6ebeb2b8ff31aa77c1add4ea57
+size 10360

model_embed_tokens.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6c40cc4c5d2d761f18fff8a7123cae61ba0409c966a0ddb087bf6aee86ac8bd1
+size 1071226508

model_layers_0.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:64592e1de150efa621f05780c7f600d470871b87535ccdb91f755dce1c0cfd52
+size 373047105
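
The README's "approximately 30%" figure can be sanity-checked against the file sizes in this commit. The sketch below counts the parameters of the seven projection matrices listed in `pattern_dict` for one decoder layer, using the dimensions from `config.json`, and compares the BFloat16 size with the LFS size of `model_layers_0.safetensors`. It assumes `head_dim = hidden_size / num_attention_heads`, and it ignores biases, norm weights, and safetensors metadata that the file may also contain, so the ratios are estimates only.

```python
# Values copied from config.json in this commit.
hidden_size = 5120
intermediate_size = 13824
num_attention_heads = 40
num_key_value_heads = 8
vocab_size = 152064

# Assumption: head_dim is hidden_size / num_attention_heads.
head_dim = hidden_size // num_attention_heads
kv_dim = num_key_value_heads * head_dim

layer_params = (
    2 * hidden_size * hidden_size            # q_proj, o_proj
    + 2 * hidden_size * kv_dim               # k_proj, v_proj
    + 3 * hidden_size * intermediate_size    # gate_proj, up_proj, down_proj
)
layer_bf16_bytes = 2 * layer_params          # 2 bytes per BFloat16 weight
lm_head_bf16_bytes = 2 * vocab_size * hidden_size

# LFS sizes copied from the pointer files in this commit.
layer_0_file_bytes = 373047105
lm_head_file_bytes = 1056784505

layer_ratio = layer_0_file_bytes / layer_bf16_bytes
lm_head_ratio = lm_head_file_bytes / lm_head_bf16_bytes
print(f"layer 0: {layer_params:,} params, compressed/BF16 = {layer_ratio:.2f}")
print(f"lm_head: compressed/BF16 = {lm_head_ratio:.2f}")
```

Both ratios come out at about 0.68, that is, roughly 32% smaller than BFloat16, which is in line with the README's claim and with the "70% size" in the paper title.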

model_layers_1.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:466ac8e8297745cec7a05a96e47f462aca00e9dd5d520ef5c1e164c5a973d475
+size 407124933

model_layers_10.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c71eaa3f0fa3de50249353c28d03302b71d5f5eca3c889fa36e21d4927744d8a
+size 372696451

model_layers_11.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b2e58cfeeb590d2bb82379ea6dfb8f53c5c43dc7024b5250e75d9b0d62d64b9d
+size 372735217

model_layers_12.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:127023e6a433ee5b95102faa6925d247708df11021852b107db2b10a389650a8
+size 372512186

model_layers_13.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f1ab9aa91cbefae8dec0fb7720a1b3d1cd5bea9ad879d52b8e2979644c5567cf
+size 372644304

model_layers_14.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1f1d1e5f24f38635c8621062de0e5d55146a359a77f11d9653848d79a348f9e3
+size 372689109

model_layers_15.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a563e438369a75b25fb2ddda9a794fb2607104022573d45671d18097563fb173
+size 372827573

model_layers_16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:14c54b5cac32bd8adc33a770fb36ed078c075b403737bfc4cd889b99269876d1
+size 372841593

model_layers_17.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f0e359b3e1354b2b33bd067facc9f7b73560c32d7b5606dc49c0d007effa0512
+size 372850701

model_layers_18.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3d3f49e2e46e29f2445b4052d18b8977900591f39d57e3e139510a552bdfd4a5
+size 372969609

model_layers_19.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a2e31168db3bfa6e72083fa50eec09d3a42c409d80562bf71719eb267a99f14a
+size 372981033

model_layers_2.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fd9b8c08dc171b33db62865878cdd91e13253b1a43bc36775b9dd1e8ef0db9a5
+size 404482467

model_layers_20.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:66a570c8005cfebf78591c31b15a921fe5ca89682c1980b1a678beec6abea6d4
+size 373027443

model_layers_21.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:485274bd83d85e4d9525ffc38a91fe51f531ac230481c04fec8d0414d6a1fab9
+size 372985375

model_layers_22.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fe00a055b7145fa00824418f1bfcb8954a697de04e1967efc683a3a106f27218
+size 373042609

model_layers_23.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9184467af1930ceb44a3509e8e65807be0b9e9c6887ebecf0d906dea7e6e44de
+size 373149813

model_layers_24.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:89740d9a88aab40899c7110af66825acef3fc530c27b34bf0f6418a8ceb114ce
+size 373064580

model_layers_25.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cc413b679743fd878d69fc17f082e37c8a50b11b7e9c9b24ba1b5b934f132f2c
+size 373101108

model_layers_26.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7d2e46a671e40725463997bc5252911fb04f7d874d4e66b128b2d24dc3f3037e
+size 373020025

model_layers_27.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:38e0e436ec35a653564e30985233b3e0d6be927c20c11c821dcf96d405f6f2c8
+size 373088055

model_layers_28.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cbbcca033611ca315614aebc33dc905e6fd662b62a8f0992565835cb4a6acfc0
+size 373286696

model_layers_29.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:64278000c1b16ab10ec6d6b873453c54d81b9056def35a4806abbf31390b622d
+size 373241854

model_layers_3.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:620cc20a943e8dee242ad79567f511e535f8a2dac4029595cdc1c0d33b6ab530
+size 399937743

model_layers_30.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:68b6bfbc38362951fc091dba8add7bdc8da4200b2a57ed3bcc1a1a8ff99cc8e7
+size 373062312

model_layers_31.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4aa426af68b53e64575db31045967c9bdb0807706ec35f2e062033bde8b9570f
+size 372957134

model_layers_32.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2963fcdb6dbb0a0d26caa89c7702bba3d4229eebc13164ea7edf2ad42ae25557
+size 372970261

model_layers_33.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:75ededb70f30c148589eee0e2932066da0d5ffe5097d640d8483eccfbadc1e3c
+size 373000655

model_layers_34.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ec98b16ba564b3da43c95106ebaa13a77eb597a9ea23c090b04325eda62be552
+size 372965181

model_layers_35.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0188c4d576a6c9fb8e1e3de08cfc5737702b1629b34ebb451d3b439478c409a6
+size 372928983

model_layers_36.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:03013e93fa7a89b0dbe2c9169385f55be68dda93d0378aeb899ed38e808eda67
+size 373134987

model_layers_37.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2a7f5ad89914849f2d941b76e69240cb38bfecadb433ddc7bd64cfcb527d86ff
+size 372845454

model_layers_38.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ea3532d5f432c5337310ddf2600e883f909fb834f95f8546a88649b1e02e487b
+size 372803552

model_layers_39.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9985c3fdd12379c58fd8de831d326435630a4028f699cae2278e3a6fac087c95
+size 372868360

model_layers_4.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6badf9c17e52e2e528ff39ab353d39d3f995900a368d8872fd961dfd5c459766
+size 399649128

model_layers_40.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bf93dfd5a74ec1b40d12cfa506fb60ab59a6095319729a3c63f2ff7d9785bc1b
+size 372918947

model_layers_41.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:26d845994e38c1a4ff8a431427294c7ec8d9624ce7bf998364dac36faa273b8d
+size 372935894

model_layers_42.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:42ad6f8434059cbefa0883b51ef4498bfbcbc6f80723d90b1f0328dc23d6a8d3
+size 372957980

model_layers_43.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dad2b60612dc3c0156fd633effc2afe9cbb19c685b34a9940e2ccfd2d18fa1b1
+size 373331033

model_layers_44.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:47815aa00bbdd8712ee2f23131d8f3ca05750660c5570f7dfa3628d9036846da
+size 373460684

model_layers_45.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0b24af7998a022273d5eff0d65108fe424b9e103f061faf73bda8aa7dcb7d255
+size 373484283

model_layers_46.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:781ed23e51fe41dbef445aff47bebd0d41eeab2eae1149f24b917d6cc50ae7f1
+size 373640710

model_layers_47.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8fc3c41c904f744d70044c8004eed03b10ad3e3e2e14177a0894af169b408b90
+size 373214307