---
license: cc-by-nc-sa-4.0
---
# ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text
[![arXiv Paper](https://img.shields.io/badge/arXiv-2407.15886%20(base)-B31B1B?style=flat&logo=arXiv)](https://arxiv.org/abs/2401.01456) [![WACV 2025](https://img.shields.io/badge/WACV%202025-v1-0CA4A5?style=flat&logo=Semantic%20Web)](https://openaccess.thecvf.com/content/WACV2025/html/Yan_ColorizeDiffusion_Improving_Reference-Based_Sketch_Colorization_with_Latent_Diffusion_Model_WACV_2025_paper.html) [![arXiv v1.5 Paper](https://img.shields.io/badge/arXiv-2502.19937%20(v1.5)-B31B1B?style=flat&logo=arXiv)](https://arxiv.org/abs/2502.19937) [![arXiv v2 Paper](https://img.shields.io/badge/arXiv-2504.06895%20(v2)-B31B1B?style=flat&logo=arXiv)](https://arxiv.org/abs/2504.06895) [![Model Weights](https://img.shields.io/badge/Hugging%20Face-Model%20Weights-FF9D00?style=flat&logo=Hugging%20Face)](https://huggingface.co/tellurion/ColorizeDiffusion/tree/main) [![License](https://img.shields.io/badge/License-CC--BY--NC--SA%204.0-4CAF50?style=flat&logo=Creative%20Commons)](https://github.com/tellurion-kanata/colorizeDiffusion/blob/master/LICENSE)

![img](assets/teaser.png)

(April 2025) Official implementation of ColorizeDiffusion, an SD-based colorization framework that achieves high-quality results with arbitrary sketch-reference input pairs. The foundational paper for this repository: [ColorizeDiffusion (e-print)](https://arxiv.org/abs/2401.01456).

***Version 1*** - Base training, 512px. Released; checkpoints start with **mult**.

***Version 1.5*** - Solves spatial entanglement, 512px. Released; checkpoints start with **switch**.

***Version 2*** - Enhances background and style transfer, 768px. Released; checkpoints start with **v2**.

***Version XL*** - Enhances embedding guidance for character colorization and geometry disentanglement, 1024px. Available soon.

## Getting Started
-------------------------------------------------------------------------------------------
```shell
conda env create -f environment.yaml
conda activate hf
```

## User Interface
-------------------------------------------------------------------------------------------
We implement a fully-featured UI. To run it, just:
```shell
python -u app.py
```
The default server address is http://localhost:7860.

#### Important inference options

| Options               | Description                                                                                            |
|:----------------------|:-------------------------------------------------------------------------------------------------------|
| BG enhance            | Low-level feature injection for v2 models.                                                              |
| FG enhance            | Has no effect on the currently open-sourced models.                                                     |
| Reference strength    | Decrease it to increase semantic fidelity to the sketch input.                                          |
| Foreground strength   | Similar to reference strength, but applied only to the foreground region. Requires FG or BG enhance.    |
| Preprocessor          | Sketch preprocessing. **Extract** is suggested if the sketch input is a complicated pencil drawing.     |
| Line extractor        | The line extractor used when the preprocessor is **Extract**.                                           |
| Sketch guidance scale | Classifier-free guidance scale of the sketch image; 1 is suggested (see the illustrative snippet below). |
| Attention injection   | Noised low-level feature injection; doubles inference time.                                             |
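For intuition, the sketch guidance scale behaves like a standard classifier-free guidance weight: the denoiser is evaluated with and without the sketch condition, and the two noise predictions are blended. The minimal sketch below illustrates that blending only; the function and tensor names are hypothetical and do not come from this repository's sampler.

```python
import torch

def guided_noise_prediction(
    eps_uncond: torch.Tensor,   # denoiser output without the sketch condition
    eps_sketch: torch.Tensor,   # denoiser output with the sketch condition
    sketch_scale: float = 1.0,  # the "Sketch guidance scale" UI option
) -> torch.Tensor:
    """Standard classifier-free guidance blend of two noise predictions.

    Illustrative only: the names here are hypothetical, not the repo's API.
    """
    # scale 1.0 reduces to the sketch-conditioned prediction unchanged;
    # larger values exaggerate the sketch's influence on the result.
    return eps_uncond + sketch_scale * (eps_sketch - eps_uncond)
```

With `sketch_scale = 1.0` the blend collapses to the conditional prediction, which is consistent with 1 being the suggested default.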
### 768-level Cross-content colorization results (from v2)
![img](assets/cross-1.png)
![img](assets/cross-2.png)

### 1536-level Character colorization results (from XL)
![img](assets/disentanglement2.png)
![img](assets/demon.png)

## Manipulation
-------------------------------------------------------------------------------------------
The colorization results can be manipulated using text prompts; see [ColorizeDiffusion (e-print)](https://arxiv.org/abs/2401.01456). Manipulation is deactivated by default. To activate it, use:
```shell
python -u app.py -manipulate
```

For local manipulations, a visualization is provided to show the correlation between each prompt and the tokens of the reference image. The manipulation result and correlation visualization below were produced with the following settings:

- Target prompt: the girl's blonde hair
- Anchor prompt: the girl's brown hair
- Control prompt: the girl's brown hair
- Target scale: 8
- Enhanced: false
- Thresholds: 0.5, 0.55, 0.65, 0.95

![img](assets/preview1.png)
![img](assets/preview2.png)

As shown above, the manipulation unavoidably changes some unrelated regions, since it is applied to the reference embeddings.

#### Manipulation options

| Options        | Description                                                                                                                                                                                                  |
|:---------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Group index    | The index of the selected parameter group in the manipulation sequence.                                                                                                                                        |
| Target prompt  | The prompt specifying the desired visual attribute of the image after manipulation.                                                                                                                            |
| Anchor prompt  | The prompt specifying the anchored visual attribute of the image before manipulation.                                                                                                                          |
| Control prompt | Used for local manipulation (crossattn-based models). The prompt specifying the target regions.                                                                                                                |
| Enhance        | Whether this manipulation should be enhanced (enhanced manipulations are more likely to influence unrelated attributes).                                                                                       |
| Target scale   | The scale used to progressively control the manipulation.                                                                                                                                                      |
| Thresholds     | Used for local manipulation (crossattn-based models). Four hyperparameters that reduce the influence on irrelevant visual attributes, where 0.0 < threshold 0 < threshold 1 < threshold 2 < threshold 3 < 1.0 (see the illustrative snippet after this table). |
| Threshold 3    | Selects the most unrelated regions, indicated by brown.                                                                                                                                                        |
| Add            | Click to save the current manipulation into the sequence.                                                                                                                                                      |
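To make the role of the four ordered thresholds concrete, the sketch below shows how they could band a per-token score into five regions that receive progressively weaker edits. This is only an illustration of the banding idea under an assumed score definition; the actual score and banding logic live in this repository's manipulation code, and all names here are hypothetical.

```python
import numpy as np

def band_tokens(irrelevance: np.ndarray, thresholds: tuple) -> np.ndarray:
    """Band per-token scores using four strictly ordered thresholds.

    irrelevance : assumed scores in [0, 1]; higher = less related to the
                  control prompt (an assumption, not the repo's definition).
    thresholds  : (t0, t1, t2, t3) with 0.0 < t0 < t1 < t2 < t3 < 1.0.

    Returns band indices 0..4. Band 0 (below threshold 0) would be edited in
    full; band 4 (above threshold 3) corresponds to the "most unrelated"
    regions shown in brown by the correlation visualization.
    """
    t0, t1, t2, t3 = thresholds
    assert 0.0 < t0 < t1 < t2 < t3 < 1.0, "thresholds must be strictly increasing"
    return np.digitize(irrelevance, [t0, t1, t2, t3])

# With the thresholds used in the example above (0.5, 0.55, 0.65, 0.95):
scores = np.array([0.10, 0.52, 0.60, 0.80, 0.98])
print(band_tokens(scores, (0.5, 0.55, 0.65, 0.95)))  # -> [0 1 2 3 4]
```

Band 4, everything above threshold 3, is the brown "most unrelated" region in the visualization and is the part the manipulation should leave untouched.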
## Training
Our implementation is based on Accelerate and Deepspeed. Before starting training, collect your data and organize the training dataset as follows (a packaging sketch follows at the end of this section):
```
[dataset_path]
├── image_list.json        # Optional, for image indexing
├── color/                 # Color images
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
├── sketch/                # Sketch images
│   ├── 0001.zip
│   │   ├── 10001.png
│   │   ├── 100001.jpg
│   │   └── ...
│   ├── 0002.zip
│   └── ...
└── mask/                  # Mask images (required for fg-bg training)
    ├── 0001.zip
    │   ├── 10001.png
    │   ├── 100001.jpg
    │   └── ...
    ├── 0002.zip
    └── ...
```
For details of the dataset organization, check `data/dataloader.py`.

Training command example:
```shell
accelerate launch --config_file [accelerate_config_file] \
    train.py \
    --name base \
    --dataroot [dataset_path] \
    --batch_size 64 \
    --num_threads 8 \
    -cfg configs/train/sd2.1/mult.yaml \
    -pt [pretrained_model_path]
```
Refer to `options.py` for training/inference/validation arguments. Note that `batch_size` here is the micro batch size per GPU: running the command on 8 GPUs gives a total batch size of 512.
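To reduce setup friction, here is a small, hypothetical packaging script for producing the zip-sharded layout above. It assumes `image_list.json` is a flat JSON list of image filenames; the actual schema is defined in `data/dataloader.py`, so treat this as a starting point rather than the repository's tooling.

```python
import json
import zipfile
from pathlib import Path

def shard_images(src_dir: Path, dst_dir: Path, shard_size: int = 1000) -> list:
    """Pack loose images from src_dir into numbered zip shards (0001.zip, ...)."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    images = sorted(p for p in src_dir.iterdir() if p.suffix.lower() in {".png", ".jpg"})
    for start in range(0, len(images), shard_size):
        shard_path = dst_dir / f"{start // shard_size + 1:04d}.zip"
        with zipfile.ZipFile(shard_path, "w") as zf:
            for img in images[start : start + shard_size]:
                zf.write(img, arcname=img.name)  # stored flat, as in the layout above
    return [p.name for p in images]

dataset = Path("dataset")  # stands in for [dataset_path]
names = shard_images(Path("raw/color"), dataset / "color")
shard_images(Path("raw/sketch"), dataset / "sketch")
# Assumed schema: a flat list of filenames. Verify against data/dataloader.py.
(dataset / "image_list.json").write_text(json.dumps(names, indent=2))
```

Color and sketch shards should stay aligned (same filenames in the same shard numbers) so paired samples can be indexed consistently.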
## Code reference
1. [Stable Diffusion v2](https://github.com/Stability-AI/stablediffusion)
2. [Stable Diffusion XL](https://github.com/Stability-AI/generative-models)
3. [SD-webui-ControlNet](https://github.com/Mikubill/sd-webui-controlnet)
4. [Stable-Diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
5. [K-diffusion](https://github.com/crowsonkb/k-diffusion)
6. [Deepspeed](https://github.com/microsoft/DeepSpeed)
7. [sketchKeras-PyTorch](https://github.com/higumax/sketchKeras-pytorch)

## Citation
```
@article{2024arXiv240101456Y,
    author = {{Yan}, Dingkun and {Yuan}, Liang and {Wu}, Erwin and {Nishioka}, Yuma and {Fujishiro}, Issei and {Saito}, Suguru},
    title = "{ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text}",
    journal = {arXiv e-prints},
    year = {2024},
    doi = {10.48550/arXiv.2401.01456},
}

@InProceedings{Yan_2025_WACV,
    author = {Yan, Dingkun and Yuan, Liang and Wu, Erwin and Nishioka, Yuma and Fujishiro, Issei and Saito, Suguru},
    title = {ColorizeDiffusion: Improving Reference-Based Sketch Colorization with Latent Diffusion Model},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    year = {2025},
    pages = {5092-5102}
}

@article{2025arXiv250219937Y,
    author = {{Yan}, Dingkun and {Wang}, Xinrui and {Li}, Zhuoru and {Saito}, Suguru and {Iwasawa}, Yusuke and {Matsuo}, Yutaka and {Guo}, Jiaxian},
    title = "{Image Referenced Sketch Colorization Based on Animation Creation Workflow}",
    journal = {arXiv e-prints},
    year = {2025},
    doi = {10.48550/arXiv.2502.19937},
}

@article{yan2025colorizediffusionv2enhancingreferencebased,
    title = {ColorizeDiffusion v2: Enhancing Reference-based Sketch Colorization Through Separating Utilities},
    author = {Dingkun Yan and Xinrui Wang and Yusuke Iwasawa and Yutaka Matsuo and Suguru Saito and Jiaxian Guo},
    journal = {arXiv e-prints},
    year = {2025},
    doi = {10.48550/arXiv.2504.06895},
}
```