Gui28F committed
Commit d7977ea · verified · 1 Parent(s): 173ea2b

Upload 2 files

Files changed (2)
  1. README.md +28 -8
  2. requirements.txt +75 -0
README.md CHANGED
@@ -14,7 +14,24 @@ pipeline_tag: text-to-image
  **BeamDiffusion** introduces a novel approach for generating coherent image sequences from text prompts by employing beam search in latent space. Unlike traditional methods that generate images independently, BeamDiffusion iteratively explores latent representations, ensuring smooth transitions and visual continuity across frames. A cross-attention mechanism efficiently scores and prunes search paths, optimizing both textual alignment and visual coherence.
  BeamDiffusion addresses the challenge of maintaining visual consistency in image sequences generated from text prompts. By leveraging a beam search strategy in the latent space, it refines the generation process to produce sequences with enhanced coherence and alignment with textual descriptions, as outlined in the [paper](https://arxiv.org/abs/2503.20429).
 
- ## Quickstart Guide
+ ---
+ ## 🛠️ Setup Instructions
+
+ Before using BeamDiffusion, follow these steps to set up your environment:
+
+ ```bash
+ # 1. Create a virtual environment (recommended)
+ python3 -m venv beam_env
+
+ # 2. Activate the virtual environment
+ source beam_env/bin/activate   # On macOS/Linux
+ # beam_env\Scripts\activate    # On Windows
+
+ # 3. Install required dependencies
+ pip install -r ./BeamDiffusionModel/requirements.txt
+ ```
+ ---
+ ## 🚀 Quickstart Guide
 
  Here's a basic example of how to use BeamDiffusion with the `transformers` library to generate an image sequence based on a series of text prompts:
 
@@ -50,7 +67,7 @@ sequence_imgs = pipe(input_data)
 
  ![Generated Image Sequence](./example.png)
 
- ## Input Parameters Explained
+ ## 🔍 Input Parameters Explained
 
  - **`steps`** (`list of strings`): Descriptions for each step in the image generation process. The model generates one image per step, forming a sequence that aligns with these descriptions.
 
@@ -68,15 +85,18 @@ sequence_imgs = pipe(input_data)
 
  - **`use_rand`** (`bool`): Flag to introduce randomness in the inference process. If set to `True`, the model generates more varied and creative results; if `False`, it produces more deterministic outputs.
 
- ## Citation
+ ## 📚 Citation
 
  If you use BeamDiffusion in your research or projects, please cite the following paper:
 
  ```
- @article{fernandes2025latent,
-   title={Latent Beam Diffusion Models for Decoding Image Sequences},
-   author={Fernandes, Guilherme and Ramos, Vasco and Cohen, Regev and Szpektor, Idan and Magalh{\~a}es, Jo{\~a}o},
-   journal={arXiv preprint arXiv:2503.20429},
-   year={2025}
+ @misc{fernandes2025latentbeamdiffusionmodels,
+   title={Latent Beam Diffusion Models for Decoding Image Sequences},
+   author={Guilherme Fernandes and Vasco Ramos and Regev Cohen and Idan Szpektor and João Magalhães},
+   year={2025},
+   eprint={2503.20429},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+   url={https://arxiv.org/abs/2503.20429},
  }
  ```
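The diff's quickstart code is elided here (only the hunk header `sequence_imgs = pipe(input_data)` survives), but the parameter section documents the shape of `input_data`. The sketch below is a hypothetical reconstruction of that payload using only the two parameters visible in this excerpt (`steps`, `use_rand`); the step texts, the `validate` helper, and the pipeline object `pipe` are illustrative assumptions, not BeamDiffusion's actual API.

```python
# Hypothetical sketch of the `input_data` payload described in the README diff.
# Only `steps` and `use_rand` are documented in this excerpt; any other keys
# the real pipeline accepts are not shown here.

input_data = {
    # One description per generated frame; the model emits one image per step.
    "steps": [
        "Crack two eggs into a bowl.",
        "Whisk the eggs until smooth.",
        "Pour the mixture into a hot pan.",
    ],
    # True -> more varied, creative outputs; False -> more deterministic outputs.
    "use_rand": True,
}

def validate(payload: dict) -> bool:
    """Check the payload matches the types documented in the README."""
    return (
        isinstance(payload.get("steps"), list)
        and all(isinstance(s, str) for s in payload["steps"])
        and isinstance(payload.get("use_rand"), bool)
    )

assert validate(input_data)
# With a loaded pipeline, the excerpt's hunk header shows the call as:
# sequence_imgs = pipe(input_data)
```

The model call itself is left as a comment because this commit does not show how `pipe` is constructed.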
requirements.txt ADDED
@@ -0,0 +1,75 @@
+ --extra-index-url https://download.pytorch.org/whl/cu126
+ accelerate==1.6.0
+ aiohappyeyeballs==2.6.1
+ aiohttp==3.11.16
+ aiosignal==1.3.2
+ annotated-types==0.7.0
+ async-timeout==5.0.1
+ attrs==25.3.0
+ certifi==2025.1.31
+ charset-normalizer==3.4.1
+ click==8.1.8
+ diffusers==0.32.2
+ docker-pycreds==0.4.0
+ eval_type_backport==0.2.2
+ filelock==3.18.0
+ frozenlist==1.5.0
+ fsspec==2025.3.2
+ gitdb==4.0.12
+ GitPython==3.1.44
+ huggingface-hub==0.30.1
+ idna==3.10
+ importlib_metadata==8.6.1
+ Jinja2==3.1.3
+ lightning==2.5.1
+ lightning-utilities==0.14.3
+ MarkupSafe==2.1.5
+ mpmath==1.3.0
+ multidict==6.3.2
+ networkx==3.2.1
+ numpy==2.0.2
+ nvidia-cublas-cu12==12.6.4.1
+ nvidia-cuda-cupti-cu12==12.6.80
+ nvidia-cuda-nvrtc-cu12==12.6.77
+ nvidia-cuda-runtime-cu12==12.6.77
+ nvidia-cudnn-cu12==9.5.1.17
+ nvidia-cufft-cu12==11.3.0.4
+ nvidia-curand-cu12==10.3.7.77
+ nvidia-cusolver-cu12==11.7.1.2
+ nvidia-cusparse-cu12==12.5.4.2
+ nvidia-cusparselt-cu12==0.6.3
+ nvidia-nccl-cu12==2.21.5
+ nvidia-nvjitlink-cu12==12.6.85
+ nvidia-nvtx-cu12==12.6.77
+ packaging==24.2
+ pillow==11.0.0
+ platformdirs==4.3.7
+ propcache==0.3.1
+ protobuf==5.29.4
+ psutil==7.0.0
+ pydantic==2.11.2
+ pydantic_core==2.33.1
+ pytorch-lightning==2.5.1
+ PyYAML==6.0.2
+ regex==2024.11.6
+ requests==2.32.3
+ safetensors==0.5.3
+ sentry-sdk==2.25.1
+ setproctitle==1.3.5
+ six==1.17.0
+ smmap==5.0.2
+ sympy==1.13.1
+ tokenizers==0.21.1
+ torch==2.6.0+cu126
+ torchaudio==2.6.0+cu126
+ torchmetrics==1.7.0
+ torchvision==0.21.0+cu126
+ tqdm==4.67.1
+ transformers==4.51.0
+ triton==3.2.0
+ typing-inspection==0.4.0
+ typing_extensions==4.13.1
+ urllib3==2.3.0
+ wandb==0.19.9
+ yarl==1.19.0
+ zipp==3.21.0
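The added requirements file mixes a pip option (`--extra-index-url`, which points at PyTorch's CUDA 12.6 wheel index) with exact `name==version` pins, some carrying local version tags like `+cu126`. A minimal sketch of how such a file can be parsed, assuming this two-kind format; the `parse_pins` helper and the sample text are illustrative, not part of the commit:

```python
# Minimal sketch: extract exact `name==version` pins from a requirements file
# like the one added in this commit, skipping blanks, comments, and pip
# options such as --extra-index-url.
import re

def parse_pins(text: str) -> dict:
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "--")):
            continue  # blank line, comment, or pip option -> not a pin
        m = re.match(r"([A-Za-z0-9_.\-]+)==(\S+)", line)
        if m:
            pins[m.group(1)] = m.group(2)  # version may carry "+cu126" etc.
    return pins

sample = """--extra-index-url https://download.pytorch.org/whl/cu126
torch==2.6.0+cu126
diffusers==0.32.2"""

print(parse_pins(sample))  # {'torch': '2.6.0+cu126', 'diffusers': '0.32.2'}
```

Local version tags such as `+cu126` are why the extra index URL is required: those wheels are hosted on PyTorch's index, not PyPI.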