patrickvonplaten cross-attention committed on
Commit 4bd5031 · 0 Parent(s):

Duplicate from cross-attention/asymmetric-autoencoder-kl-x-2

Co-authored-by: Ruslan Vorovchenko <cross-attention@users.noreply.huggingface.co>

Files changed (5)
  1. .gitattributes +35 -0
  2. README.md +61 -0
  3. compare.jpeg +0 -0
  4. config.json +37 -0
  5. diffusion_pytorch_model.bin +3 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
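
Each `.gitattributes` line above pairs a path pattern with `filter=lfs`, so matching files are stored through Git LFS. As a rough illustration, the sketch below checks which of this commit's files match a handful of those patterns. It relies on Python's `fnmatch`, which only approximates Git's own pattern matching, and the helper name `matches_lfs_pattern` is hypothetical.

```python
from fnmatch import fnmatch

# A subset of the patterns listed in the .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.ckpt", "*.pt", "*.tar.*", "*tfevents*"]


def matches_lfs_pattern(path: str, patterns=LFS_PATTERNS) -> bool:
    """Return True if the file's base name matches any of the given patterns.

    Assumption: fnmatch is close enough to gitattributes matching for these
    simple patterns, none of which contain a directory component.
    """
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatch(basename, pattern) for pattern in patterns)


for name in ["diffusion_pytorch_model.bin", "config.json", "README.md"]:
    status = "matches an LFS pattern" if matches_lfs_pattern(name) else "no pattern matches"
    print(f"{name}: {status}")
```

Only `diffusion_pytorch_model.bin` matches, which is consistent with that file appearing as a three-line LFS pointer in this diff rather than as binary content.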
README.md ADDED
@@ -0,0 +1,61 @@
+ ---
+ tags:
+ - stable-diffusion
+ - stable-diffusion-diffusers
+ inference: false
+ library_name: diffusers
+ duplicated_from: cross-attention/asymmetric-autoencoder-kl-x-2
+ ---
+ # Asymmetric Autoencoder KL
+
+ [Designing a Better Asymmetric VQGAN for StableDiffusion](https://arxiv.org/abs/2306.04632)
+
+ ## Abstract
+ *StableDiffusion is a revolutionary text-to-image generator that is causing a stir in the world of image generation and editing. Unlike traditional methods that learn a diffusion model in pixel space, StableDiffusion learns a diffusion model in the latent space via a VQGAN, ensuring both efficiency and quality. It not only supports image generation tasks, but also enables image editing for real images, such as image inpainting and local editing. However, we have observed that the vanilla VQGAN used in StableDiffusion leads to significant information loss, causing distortion artifacts even in non-edited image regions. To this end, we propose a new asymmetric VQGAN with two simple designs. Firstly, in addition to the input from the encoder, the decoder contains a conditional branch that incorporates information from task-specific priors, such as the unmasked image region in inpainting. Secondly, the decoder is much heavier than the encoder, allowing for more detailed recovery while only slightly increasing the total inference cost. The training cost of our asymmetric VQGAN is cheap, and we only need to retrain a new asymmetric decoder while keeping the vanilla VQGAN encoder and StableDiffusion unchanged. Our asymmetric VQGAN can be widely used in StableDiffusion-based inpainting and local editing methods. Extensive experiments demonstrate that it can significantly improve the inpainting and editing performance, while maintaining the original text-to-image capability. The code is available at https://github.com/buxiangzhiren/Asymmetric_VQGAN/tree/main*
+
+ ## Scales
+ * https://huggingface.co/cross-attention/asymmetric-autoencoder-kl-x-1-5
+ * https://huggingface.co/cross-attention/asymmetric-autoencoder-kl-x-2
+
+ ## Diffusers
+ ```python
+ from io import BytesIO
+ from PIL import Image
+ import requests
+ from diffusers import AsymmetricAutoencoderKL, StableDiffusionInpaintPipeline
+
+
+ def download_image(url: str) -> Image.Image:
+     # Fetch the image over HTTP and decode it as RGB.
+     response = requests.get(url, timeout=30)
+     response.raise_for_status()
+     return Image.open(BytesIO(response.content)).convert("RGB")
+
+
+ prompt = "a photo of a person"
+ img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png"
+ mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png"
+
+ image = download_image(img_url).resize((256, 256))
+ mask_image = download_image(mask_url).resize((256, 256))
+
+ # Load the inpainting pipeline, then swap its VAE for the asymmetric one.
+ pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
+ pipe.vae = AsymmetricAutoencoderKL.from_pretrained("cross-attention/asymmetric-autoencoder-kl-x-2")
+ pipe.to("cuda")
+
+ image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
+ image.save("image.jpeg")
+ ```
+
+ ### Visual
+ _Visualization of VAE performance on a 512x512 image with runwayml/stable-diffusion-inpainting_
+
+ <p align="center">
+ <br>original image, masked image, mask
+ <br><b>runwayml/stable-diffusion-inpainting original VAE</b>
+ <br><b>stabilityai/sd-vae-ft-mse VAE</b>
+ <br><b>Asymmetric Autoencoder KL x1.5 VAE</b>
+ <br><b>Asymmetric Autoencoder KL x2 VAE</b>
+ </p>
+
+ <p align="center">
+ <img src=https://huggingface.co/cross-attention/asymmetric-autoencoder-kl-x-2/resolve/main/compare.jpeg width="50%"/>
+ </p>
compare.jpeg ADDED
config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "_class_name": "AsymmetricAutoencoderKL",
+   "_diffusers_version": "0.19.0.dev0",
+   "act_fn": "silu",
+   "down_block_out_channels": [
+     128,
+     256,
+     512,
+     512
+   ],
+   "down_block_types": [
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D"
+   ],
+   "in_channels": 3,
+   "latent_channels": 4,
+   "layers_per_down_block": 2,
+   "layers_per_up_block": 5,
+   "norm_num_groups": 32,
+   "out_channels": 3,
+   "sample_size": 256,
+   "scaling_factor": 0.18215,
+   "up_block_out_channels": [
+     256,
+     512,
+     1024,
+     1024
+   ],
+   "up_block_types": [
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D"
+   ]
+ }
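
The config above encodes the asymmetry described in the abstract: the decoder (`up_*` fields) is wider and deeper than the encoder (`down_*` fields). The sketch below reads those fields and derives the per-block width ratio and the expected latent size using only the standard library. The downsampling rule in it is an assumption based on how AutoencoderKL-style encoders in diffusers are usually built (every block except the last halves the resolution); it is not read from the checkpoint.

```python
import json

# The shape-related fields of config.json, copied from the diff above.
config = json.loads("""
{
  "down_block_out_channels": [128, 256, 512, 512],
  "up_block_out_channels": [256, 512, 1024, 1024],
  "layers_per_down_block": 2,
  "layers_per_up_block": 5,
  "latent_channels": 4,
  "sample_size": 256
}
""")

down = config["down_block_out_channels"]
up = config["up_block_out_channels"]

# Decoder-to-encoder channel ratio for each block position.
width_ratios = [u / d for u, d in zip(up, down)]

# Assumption: every encoder block except the last halves the spatial size.
downsample_factor = 2 ** (len(down) - 1)
latent_size = config["sample_size"] // downsample_factor
latent_shape = (config["latent_channels"], latent_size, latent_size)

print("decoder/encoder width ratios:", width_ratios)
print("layers per block (encoder, decoder):",
      (config["layers_per_down_block"], config["layers_per_up_block"]))
print("latent shape for a sample_size input:", latent_shape)
```

The uniform width ratio of 2 matches the `x-2` suffix in the repository name, and the encoder widths (128, 256, 512, 512) are unchanged, in line with the abstract's statement that only the decoder is retrained.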
diffusion_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d8bf37bcf0b253256a18485a98aa35fdcc9f87375ad617128a687ae26a974e1
+ size 1624658301
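
The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. The sketch below parses the pointer text and shows one way a downloaded file could be checked against it. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, as is any local path passed to them.

```python
import hashlib

# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9d8bf37bcf0b253256a18485a98aa35fdcc9f87375ad617128a687ae26a974e1
size 1624658301
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line, then break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Stream a local file and compare its size and hash with the pointer."""
    hasher = hashlib.new(pointer["hash_algo"])
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size_bytes"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(f"{pointer['hash_algo']} digest, {pointer['size_bytes'] / 1e9:.2f} GB")
```

Reading the file in chunks keeps memory use flat, which matters for a checkpoint of roughly 1.6 GB.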