Xiang-cd nielsr HF Staff committed on
Commit 6e9a505 · verified · 1 Parent(s): 51f414c

Add link to paper (#1)

- Add link to paper (4848ceb47ad401827547eb58d409745f66614b6f)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -3,13 +3,15 @@ license: mit
 ---
 
 # Sparge-attention model zoo
+
 Welcome to Sparge-attention model zoo, this repo contains list of hyperparameters pre-tuned for branch of models.
 
+It was presented in the paper [SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference](https://huggingface.co/papers/2502.18137).
+
 ## Naming of ckpt
 The tuned ckpt is often named by following format:`${moddel name or type}_${l1}_${pv_l1}.pt`, in some cases the pv_l1 will be omitted when not choose to tune pv.
 The larger l1 and pv_l1 make model more sparse, but may sacrifice output quality.
 
-
 ## Overview
 
 | model name | tuned ckpt dir |