alexnasa committed
Commit 802a102 · verified · 1 Parent(s): 0301e15

Update README.md

Files changed (1)
  1. README.md +8 -89
README.md CHANGED
@@ -1,89 +1,8 @@
- # Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
-
- This repository is the official implementation of [Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment](https://arxiv.org/abs/2505.18600), led by
-
- [Bryan Sangwoo Kim](https://scholar.google.com/citations?user=ndWU-84AAAAJ&hl=en), [Jeongsol Kim](https://jeongsol.dev/), [Jong Chul Ye](https://bispl.weebly.com/professor.html)
-
- ![main figure](assets/teaser.jpg)
-
- [![Project Website](https://img.shields.io/badge/Project-Website-blue)](https://bryanswkim.github.io/chain-of-zoom/)
- [![arXiv](https://img.shields.io/badge/arXiv-2505.18600-b31b1b.svg)](https://arxiv.org/abs/2505.18600)
-
- ---
- ## 🔥 Summary
-
- Modern single-image super-resolution (SISR) models deliver photo-realistic results at the scale factors on which they are trained, but show notable drawbacks:
-
- 1. **Blur and artifacts** when pushed to magnify beyond their training regime
- 2. **High computational costs and inefficiency** of retraining models whenever we want to magnify further
-
- This brings us to the fundamental question: \
- _How can we effectively utilize super-resolution models to explore much higher resolutions than they were originally trained for?_
-
- We address this via **Chain-of-Zoom** 🔎, a model-agnostic framework that factorizes SISR into an autoregressive chain of intermediate scale-states with multi-scale-aware prompts.
- CoZ repeatedly reuses a backbone SR model, decomposing the conditional probability into tractable sub-problems to achieve extreme resolutions without additional training.
- Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a prompt-extractor VLM.
- This prompt extractor can be fine-tuned through GRPO with a critic VLM to further align the text guidance with human preference.
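-
- For intuition, the following is a minimal sketch of the scale-autoregressive loop; `sr_model` and `extract_prompt` are hypothetical callables standing in for the SD3-based SR backbone and the Qwen2.5-VL prompt extractor, not the actual API of this repository.
-
- ```python
- # Minimal sketch of the Chain-of-Zoom recursion (hypothetical names, not
- # this repository's actual API): the same SR backbone is reused at every
- # zoom step, guided by a fresh multi-scale-aware text prompt.
- def chain_of_zoom(image, sr_model, extract_prompt, num_steps=4):
-     for _ in range(num_steps):
-         prompt = extract_prompt(image)   # multi-scale-aware text prompt
-         image = sr_model(image, prompt)  # one zoom step, no retraining
-     return image
- ```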
-
- ## 🗓️ News
- - [May 2025] Code and paper are uploaded.
-
- ## 🛠️ Setup
- First, create your environment. We recommend using the following commands.
-
- ```
- git clone https://github.com/bryanswkim/Chain-of-Zoom.git
- cd Chain-of-Zoom
-
- conda create -n coz python=3.10
- conda activate coz
- pip install -r requirements.txt
- ```
-
- ## ⏳ Models
-
- |Models|Checkpoints|
- |:---------|:--------|
- |Stable Diffusion v3|[Hugging Face](https://huggingface.co/stabilityai/stable-diffusion-3-medium)|
- |Qwen2.5-VL-3B-Instruct|[Hugging Face](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)|
- |RAM|[Hugging Face](https://huggingface.co/spaces/xinyu1205/recognize-anything/blob/main/ram_swin_large_14m.pth)|
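-
- The README does not spell out how to fetch these checkpoints. As one option, a sketch assuming the `ckpt/` layout used by the inference command below, the RAM weights can be pulled with `huggingface_hub` (the SD3 weights should be fetched automatically by diffusers via `--pretrained_model_name_or_path`):
-
- ```python
- # One way to fetch the RAM checkpoint into the ckpt/RAM/ path expected by
- # the example command below; the directory layout is an assumption taken
- # from that command, not something this README prescribes.
- from huggingface_hub import hf_hub_download
-
- hf_hub_download(
-     repo_id="xinyu1205/recognize-anything",
-     repo_type="space",                  # the .pth file lives in a Space repo
-     filename="ram_swin_large_14m.pth",
-     local_dir="ckpt/RAM",
- )
- ```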
-
- ## 🌄 Example
- You can quickly check the results of using **CoZ** with the following example:
- ```
- python inference_coz.py \
- -i samples \
- -o inference_results/coz_vlmprompt \
- --rec_type recursive_multiscale \
- --prompt_type vlm \
- --lora_path ckpt/SR_LoRA/model_20001.pkl \
- --vae_path ckpt/SR_VAE/vae_encoder_20001.pt \
- --pretrained_model_name_or_path 'stabilityai/stable-diffusion-3-medium-diffusers' \
- --ram_ft_path ckpt/DAPE/DAPE.pth \
- --ram_path ckpt/RAM/ram_swin_large_14m.pth
- ```
- This will give a result like the one below:
-
- ![main figure](assets/example_result.png)
-
- ## 🔬 Efficient Memory
- Passing `--efficient_memory` allows CoZ to run on a single GPU with 24 GB of VRAM, but significantly increases inference time due to offloading. \
- We recommend using two GPUs.
-
- ## 📝 Citation
- If you find our method useful, please cite us below or leave a star on this repository.
-
- ```
- @article{kim2025chain,
-   title={Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment},
-   author={Kim, Bryan Sangwoo and Kim, Jeongsol and Ye, Jong Chul},
-   journal={arXiv preprint arXiv:2505.18600},
-   year={2025}
- }
- ```
-
- ## 🤗 Acknowledgements
- We thank the authors of [OSEDiff](https://github.com/cswry/OSEDiff) for sharing their awesome work!
-
- > [!NOTE]
- > This work is currently in the preprint stage, and there may be some changes to the code.
 
+ title: Chain-of-Zoom
+ emoji: 🚀
+ colorFrom: green
+ colorTo: green
+ sdk: gradio
+ sdk_version: 5.31.0
+ app_file: app.py
+ pinned: false