LeanQuant committed · Commit 76a500c · verified · 1 Parent(s): 7abe3ca

Update README.md

Files changed (1): README.md +34 -3
@@ -1,10 +1,41 @@
  ---
- title: README
- emoji: 👁
+ title: DFloat11 - Lossless LLM Compression for Efficient GPU Inference
+ emoji:
  colorFrom: red
  colorTo: gray
  sdk: static
  pinned: false
  ---

- Edit this `README.md` markdown file to author your organization card.
+ # ⚡️ DFloat11: Lossless LLM Compression for Efficient GPU Inference
+
+ DFloat11 is a lossless compression framework that reduces the size of LLMs and Diffusion Models by approximately 30% while preserving bit-for-bit identical outputs to the original model. It enables efficient GPU inference on resource-constrained hardware without sacrificing accuracy.
+
+ ## 🚀 Key Features
+
+ * **Lossless Compression**: Achieves \~30% model size reduction with outputs identical to the original BFloat16 models.
+ * **GPU-Efficient**: All decompression is handled on-GPU, eliminating CPU overhead and host-device data transfers.
+ * **Scalable Performance**: Decompression overhead remains constant per forward pass and is independent of batch size.
+ * **Broad Model Support**: Compatible with various models, including Qwen3, Gemma3, Llama3, Phi4, Wan2.1, FLUX.1, and BAGEL.
+
+ ## 🛠 Installation
+
+ Ensure you have a CUDA-compatible GPU and PyTorch installed.
+
+ ```bash
+ # For CUDA 12
+ pip install -U dfloat11[cuda12]
+
+ # For CUDA 11
+ pip install -U dfloat11[cuda11]
+ ```
+
+ ## 🧪 Quick Start
+
+ For example usage, refer to the [examples directory](https://github.com/LeanModels/DFloat11/tree/master/examples) in the GitHub repository.
+
+ ## 📄 Learn More
+
+ * **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
+ * **GitHub Repository**: [LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
+ * **Hugging Face Models**: [DFloat11 on Hugging Face](https://huggingface.co/DFloat11)
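
The Installation section added in this commit offers two pip extras, `cuda12` and `cuda11`, but does not say how to tell which one applies. The following is a minimal sketch, not part of the commit, of one way to choose: it assumes the CUDA version that the installed PyTorch build was compiled against (reported by `torch.version.cuda`) is the right one to match. The helper names `pick_dfloat11_extra` and `installed_cuda_version` are hypothetical and do not come from the DFloat11 package.

```python
from typing import Optional


def pick_dfloat11_extra(cuda_version: Optional[str]) -> Optional[str]:
    """Map a CUDA version string such as '12.1' to a pip extra name.

    Hypothetical helper: returns 'cuda12' or 'cuda11', or None when the
    version is missing or has a major version other than 11 or 12.
    """
    if not cuda_version:
        return None  # e.g. a CPU-only PyTorch build reports no CUDA version
    major = int(cuda_version.split(".")[0])
    if major == 12:
        return "cuda12"
    if major == 11:
        return "cuda11"
    return None


def installed_cuda_version() -> Optional[str]:
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    extra = pick_dfloat11_extra(installed_cuda_version())
    if extra is None:
        print("No PyTorch build with CUDA 11 or 12 detected.")
    else:
        # Quoting the requirement keeps shells such as zsh from expanding the brackets.
        print(f'pip install -U "dfloat11[{extra}]"')
```

Running the script prints the matching install command from the README, or a short notice when no suitable PyTorch build is found.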