Add link to paper (#1)
(4848ceb47ad401827547eb58d409745f66614b6f)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md
CHANGED
@@ -3,13 +3,15 @@ license: mit
 ---
 
 # Sparge-attention model zoo
+
 Welcome to the Sparge-attention model zoo. This repo contains lists of hyperparameters pre-tuned for a range of models.
 
+It was presented in the paper [SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference](https://huggingface.co/papers/2502.18137).
+
 ## Naming of ckpt
 A tuned ckpt is typically named in the format `${model name or type}_${l1}_${pv_l1}.pt`; in some cases `pv_l1` is omitted when `pv` was not tuned.
 Larger `l1` and `pv_l1` values make the model more sparse, but may sacrifice output quality.
 
-
 ## Overview
 
 | model name | tuned ckpt dir |
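The checkpoint naming convention described in the README (`${model name or type}_${l1}_${pv_l1}.pt`, with `pv_l1` sometimes omitted) can be sketched as a small parser. This is an illustrative sketch only; the filenames below are hypothetical examples, not actual checkpoints from this repo.

```python
import re

def parse_ckpt_name(filename):
    """Parse a tuned-checkpoint filename of the form
    "<model name or type>_<l1>_<pv_l1>.pt".

    pv_l1 may be omitted when pv was not tuned, in which case
    None is returned for it. Returns (model, l1, pv_l1).
    """
    stem = filename[:-3] if filename.endswith(".pt") else filename
    # l1 and pv_l1 are numeric thresholds; the model name may itself
    # contain underscores, so anchor the numeric groups at the end.
    m = re.match(r"^(.*)_([0-9.]+)_([0-9.]+)$", stem)
    if m:
        return m.group(1), float(m.group(2)), float(m.group(3))
    m = re.match(r"^(.*)_([0-9.]+)$", stem)
    if m:
        return m.group(1), float(m.group(2)), None
    raise ValueError(f"unrecognized checkpoint name: {filename}")

# Hypothetical example filenames, following the stated convention:
print(parse_ckpt_name("llama3_0.06_0.07.pt"))  # ('llama3', 0.06, 0.07)
print(parse_ckpt_name("cogvideox_0.05.pt"))    # ('cogvideox', 0.05, None)
```

The greedy leading group keeps underscores inside the model name intact (e.g. `my_model_0.06_0.07.pt` still splits off the two trailing thresholds correctly).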