Update README.md
README.md CHANGED
@@ -1,89 +1,8 @@
Removed:

[Project Page](https://bryanswkim.github.io/chain-of-zoom/)
[arXiv](https://arxiv.org/abs/2505.18600)

---

## 🔥 Summary

Modern single-image super-resolution (SISR) models deliver photo-realistic results at the scale factors on which they are trained, but show notable drawbacks:

1. **Blur and artifacts** when pushed to magnify beyond their training regime
2. **High computational costs and inefficiency** of retraining models when we want to magnify further

This brings us to the fundamental question: \
_How can we effectively utilize super-resolution models to explore much higher resolutions than they were originally trained for?_

We address this via **Chain-of-Zoom** 🔍, a model-agnostic framework that factorizes SISR into an autoregressive chain of intermediate scale-states with multi-scale-aware prompts.
CoZ repeatedly re-uses a backbone SR model, decomposing the conditional probability into tractable sub-problems to achieve extreme resolutions without additional training.
Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a prompt-extractor VLM.
This prompt extractor can be fine-tuned through GRPO with a critic VLM to further align text guidance with human preference.
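
To make the scale-autoregressive idea concrete, here is a minimal Python sketch of one zoom chain. It is illustrative only: `center_crop`, `extract_prompt`, `super_resolve`, the center-crop zoom target, and the 4x-per-step factor are assumptions for exposition, not the repository's actual API (see `inference_coz.py` for that).

```python
# Minimal sketch of the Chain-of-Zoom recursion (illustrative only; the real
# pipeline lives in inference_coz.py and uses SD3 + LoRA + a VLM prompt extractor).
from PIL import Image

def center_crop(img: Image.Image, factor: int = 4) -> Image.Image:
    """Take the central (1/factor x 1/factor) region as the next scale-state."""
    w, h = img.size
    cw, ch = w // factor, h // factor
    left, top = (w - cw) // 2, (h - ch) // 2
    return img.crop((left, top, left + cw, top + ch))

def chain_of_zoom(img, num_steps, extract_prompt, super_resolve):
    """Reuse one SR backbone autoregressively: crop, prompt, super-resolve, repeat.

    `extract_prompt(crop, context)` and `super_resolve(crop, prompt)` stand in for
    the VLM prompt extractor and the backbone SR model; no retraining happens
    between steps.
    """
    for _ in range(num_steps):
        crop = center_crop(img)                      # next intermediate scale-state
        prompt = extract_prompt(crop, context=img)   # multi-scale-aware text prompt
        img = super_resolve(crop, prompt)            # same backbone, reused each step
    return img
```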

## 🛠️ News

- [May 2025] Code and paper are released.

## 🛠️ Setup

First, create your environment. We recommend using the following commands.

```bash
git clone https://github.com/bryanswkim/Chain-of-Zoom.git
cd Chain-of-Zoom

conda create -n coz python=3.10
conda activate coz
pip install -r requirements.txt
```

## ⏳ Models

| Models | Checkpoints |
|:---------|:--------|
| Stable Diffusion v3 | [Hugging Face](https://huggingface.co/stabilityai/stable-diffusion-3-medium) |
| Qwen2.5-VL-3B-Instruct | [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) |
| RAM | [Hugging Face](https://huggingface.co/spaces/xinyu1205/recognize-anything/blob/main/ram_swin_large_14m.pth) |
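
If you want to fetch these checkpoints from the command line, the sketch below uses `huggingface_hub`. The `ckpt/RAM` target directory mirrors the path used by the example command in the next section, but treat the layout as an assumption; also note that Stable Diffusion 3 is a gated repository, so you may need to accept its license and authenticate (e.g. `huggingface-cli login`) before downloading.

```python
# Sketch: pull the RAM checkpoint and (optionally) snapshot the model repos
# with huggingface_hub. The ckpt/ layout here is an assumption, not a repo default.
from huggingface_hub import hf_hub_download, snapshot_download

# RAM weights are hosted inside a Space, hence repo_type="space".
ram_path = hf_hub_download(
    repo_id="xinyu1205/recognize-anything",
    repo_type="space",
    filename="ram_swin_large_14m.pth",
    local_dir="ckpt/RAM",
)

# Full model repos; the gated SD3 repo requires an accepted license + HF token.
# (diffusers can also fetch these lazily from the Hub at inference time.)
sd3_dir = snapshot_download("stabilityai/stable-diffusion-3-medium-diffusers")
vlm_dir = snapshot_download("Qwen/Qwen2.5-VL-3B-Instruct")

print(ram_path, sd3_dir, vlm_dir)
```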

## 🚀 Example

You can quickly check the results of using **CoZ** with the following example:

```bash
python inference_coz.py \
    -i samples \
    -o inference_results/coz_vlmprompt \
    --rec_type recursive_multiscale \
    --prompt_type vlm \
    --lora_path ckpt/SR_LoRA/model_20001.pkl \
    --vae_path ckpt/SR_VAE/vae_encoder_20001.pt \
    --pretrained_model_name_or_path 'stabilityai/stable-diffusion-3-medium-diffusers' \
    --ram_ft_path ckpt/DAPE/DAPE.pth \
    --ram_path ckpt/RAM/ram_swin_large_14m.pth
```

This will give a result like the one below:



## 🎬 Efficient Memory

Using `--efficient_memory` allows CoZ to run on a single GPU with 24 GB of VRAM, but it significantly increases inference time due to model offloading. \
We recommend using two GPUs.
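
For instance, the single-GPU run is simply the example command from above with the flag appended:

```bash
# Same invocation as the example above; offloading lets a single 24 GB GPU
# hold the pipeline at the cost of noticeably slower inference.
python inference_coz.py \
    -i samples \
    -o inference_results/coz_vlmprompt \
    --rec_type recursive_multiscale \
    --prompt_type vlm \
    --lora_path ckpt/SR_LoRA/model_20001.pkl \
    --vae_path ckpt/SR_VAE/vae_encoder_20001.pt \
    --pretrained_model_name_or_path 'stabilityai/stable-diffusion-3-medium-diffusers' \
    --ram_ft_path ckpt/DAPE/DAPE.pth \
    --ram_path ckpt/RAM/ram_swin_large_14m.pth \
    --efficient_memory
```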

## 📝 Citation

If you find our method useful, please cite it as below or leave a star on this repository.

```bibtex
@article{kim2025chain,
  title={Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment},
  author={Kim, Bryan Sangwoo and Kim, Jeongsol and Ye, Jong Chul},
  journal={arXiv preprint arXiv:2505.18600},
  year={2025}
}
```

## 🤗 Acknowledgements

We thank the authors of [OSEDiff](https://github.com/cswry/OSEDiff) for sharing their awesome work!

> [!NOTE]
> This work is currently in the preprint stage, and there may be some changes to the code.

Added:

```yaml
title: Chain-of-Zoom
emoji: 🔍
colorFrom: green
colorTo: green
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
```