deepsuchak committed
Commit 949df4a · verified · 1 Parent(s): a61d4bf

Upload 8 files

Files changed (8)
  1. .gitattributes +35 -0
  2. LICENSE +437 -0
  3. README.md +201 -0
  4. environment.yaml +197 -0
  5. main.py +738 -0
  6. test.py +447 -0
  7. test.sh +13 -0
  8. train.sh +1 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
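
These are the standard Hugging Face Git LFS rules: any file matching a pattern is stored via the LFS filter rather than directly in Git (new entries are usually added with `git lfs track "<pattern>"`). As a quick way to preview which local files the rules would capture, here is a minimal sketch — a hypothetical helper, not part of this upload, and it only handles simple filename globs, not path patterns like `saved_model/**/*`:

```python
import fnmatch
from pathlib import Path

# A subset of the patterns from the .gitattributes above.
LFS_PATTERNS = ["*.ckpt", "*.pt", "*.pth", "*.safetensors", "*.bin", "*.zip"]

def lfs_candidates(root="."):
    """Yield files under `root` whose names match one of the LFS patterns."""
    for path in Path(root).rglob("*"):
        if path.is_file() and any(fnmatch.fnmatch(path.name, pat) for pat in LFS_PATTERNS):
            yield path

if __name__ == "__main__":
    for f in lfs_candidates():
        print(f)
```
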
LICENSE ADDED
@@ -0,0 +1,437 @@
+ Attribution-NonCommercial-ShareAlike 4.0 International
+
+ =======================================================================
+
+ Creative Commons Corporation ("Creative Commons") is not a law firm and
+ does not provide legal services or legal advice. Distribution of
+ Creative Commons public licenses does not create a lawyer-client or
+ other relationship. Creative Commons makes its licenses and related
+ information available on an "as-is" basis. Creative Commons gives no
+ warranties regarding its licenses, any material licensed under their
+ terms and conditions, or any related information. Creative Commons
+ disclaims all liability for damages resulting from their use to the
+ fullest extent possible.
+
+ Using Creative Commons Public Licenses
+
+ Creative Commons public licenses provide a standard set of terms and
+ conditions that creators and other rights holders may use to share
+ original works of authorship and other material subject to copyright
+ and certain other rights specified in the public license below. The
+ following considerations are for informational purposes only, are not
+ exhaustive, and do not form part of our licenses.
+
+ Considerations for licensors: Our public licenses are
+ intended for use by those authorized to give the public
+ permission to use material in ways otherwise restricted by
+ copyright and certain other rights. Our licenses are
+ irrevocable. Licensors should read and understand the terms
+ and conditions of the license they choose before applying it.
+ Licensors should also secure all rights necessary before
+ applying our licenses so that the public can reuse the
+ material as expected. Licensors should clearly mark any
+ material not subject to the license. This includes other CC-
+ licensed material, or material used under an exception or
+ limitation to copyright. More considerations for licensors:
+ wiki.creativecommons.org/Considerations_for_licensors
+
+ Considerations for the public: By using one of our public
+ licenses, a licensor grants the public permission to use the
+ licensed material under specified terms and conditions. If
+ the licensor's permission is not necessary for any reason--for
+ example, because of any applicable exception or limitation to
+ copyright--then that use is not regulated by the license. Our
+ licenses grant only permissions under copyright and certain
+ other rights that a licensor has authority to grant. Use of
+ the licensed material may still be restricted for other
+ reasons, including because others have copyright or other
+ rights in the material. A licensor may make special requests,
+ such as asking that all changes be marked or described.
+ Although not required by our licenses, you are encouraged to
+ respect those requests where reasonable. More considerations
+ for the public:
+ wiki.creativecommons.org/Considerations_for_licensees
+
+ =======================================================================
+
+ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
+ Public License
+
+ By exercising the Licensed Rights (defined below), You accept and agree
+ to be bound by the terms and conditions of this Creative Commons
+ Attribution-NonCommercial-ShareAlike 4.0 International Public License
+ ("Public License"). To the extent this Public License may be
+ interpreted as a contract, You are granted the Licensed Rights in
+ consideration of Your acceptance of these terms and conditions, and the
+ Licensor grants You such rights in consideration of benefits the
+ Licensor receives from making the Licensed Material available under
+ these terms and conditions.
+
+
+ Section 1 -- Definitions.
+
+ a. Adapted Material means material subject to Copyright and Similar
+ Rights that is derived from or based upon the Licensed Material
+ and in which the Licensed Material is translated, altered,
+ arranged, transformed, or otherwise modified in a manner requiring
+ permission under the Copyright and Similar Rights held by the
+ Licensor. For purposes of this Public License, where the Licensed
+ Material is a musical work, performance, or sound recording,
+ Adapted Material is always produced where the Licensed Material is
+ synched in timed relation with a moving image.
+
+ b. Adapter's License means the license You apply to Your Copyright
+ and Similar Rights in Your contributions to Adapted Material in
+ accordance with the terms and conditions of this Public License.
+
+ c. BY-NC-SA Compatible License means a license listed at
+ creativecommons.org/compatiblelicenses, approved by Creative
+ Commons as essentially the equivalent of this Public License.
+
+ d. Copyright and Similar Rights means copyright and/or similar rights
+ closely related to copyright including, without limitation,
+ performance, broadcast, sound recording, and Sui Generis Database
+ Rights, without regard to how the rights are labeled or
+ categorized. For purposes of this Public License, the rights
+ specified in Section 2(b)(1)-(2) are not Copyright and Similar
+ Rights.
+
+ e. Effective Technological Measures means those measures that, in the
+ absence of proper authority, may not be circumvented under laws
+ fulfilling obligations under Article 11 of the WIPO Copyright
+ Treaty adopted on December 20, 1996, and/or similar international
+ agreements.
+
+ f. Exceptions and Limitations means fair use, fair dealing, and/or
+ any other exception or limitation to Copyright and Similar Rights
+ that applies to Your use of the Licensed Material.
+
+ g. License Elements means the license attributes listed in the name
+ of a Creative Commons Public License. The License Elements of this
+ Public License are Attribution, NonCommercial, and ShareAlike.
+
+ h. Licensed Material means the artistic or literary work, database,
+ or other material to which the Licensor applied this Public
+ License.
+
+ i. Licensed Rights means the rights granted to You subject to the
+ terms and conditions of this Public License, which are limited to
+ all Copyright and Similar Rights that apply to Your use of the
+ Licensed Material and that the Licensor has authority to license.
+
+ j. Licensor means the individual(s) or entity(ies) granting rights
+ under this Public License.
+
+ k. NonCommercial means not primarily intended for or directed towards
+ commercial advantage or monetary compensation. For purposes of
+ this Public License, the exchange of the Licensed Material for
+ other material subject to Copyright and Similar Rights by digital
+ file-sharing or similar means is NonCommercial provided there is
+ no payment of monetary compensation in connection with the
+ exchange.
+
+ l. Share means to provide material to the public by any means or
+ process that requires permission under the Licensed Rights, such
+ as reproduction, public display, public performance, distribution,
+ dissemination, communication, or importation, and to make material
+ available to the public including in ways that members of the
+ public may access the material from a place and at a time
+ individually chosen by them.
+
+ m. Sui Generis Database Rights means rights other than copyright
+ resulting from Directive 96/9/EC of the European Parliament and of
+ the Council of 11 March 1996 on the legal protection of databases,
+ as amended and/or succeeded, as well as other essentially
+ equivalent rights anywhere in the world.
+
+ n. You means the individual or entity exercising the Licensed Rights
+ under this Public License. Your has a corresponding meaning.
+
+
+ Section 2 -- Scope.
+
+ a. License grant.
+
+ 1. Subject to the terms and conditions of this Public License,
+ the Licensor hereby grants You a worldwide, royalty-free,
+ non-sublicensable, non-exclusive, irrevocable license to
+ exercise the Licensed Rights in the Licensed Material to:
+
+ a. reproduce and Share the Licensed Material, in whole or
+ in part, for NonCommercial purposes only; and
+
+ b. produce, reproduce, and Share Adapted Material for
+ NonCommercial purposes only.
+
+ 2. Exceptions and Limitations. For the avoidance of doubt, where
+ Exceptions and Limitations apply to Your use, this Public
+ License does not apply, and You do not need to comply with
+ its terms and conditions.
+
+ 3. Term. The term of this Public License is specified in Section
+ 6(a).
+
+ 4. Media and formats; technical modifications allowed. The
+ Licensor authorizes You to exercise the Licensed Rights in
+ all media and formats whether now known or hereafter created,
+ and to make technical modifications necessary to do so. The
+ Licensor waives and/or agrees not to assert any right or
+ authority to forbid You from making technical modifications
+ necessary to exercise the Licensed Rights, including
+ technical modifications necessary to circumvent Effective
+ Technological Measures. For purposes of this Public License,
+ simply making modifications authorized by this Section 2(a)
+ (4) never produces Adapted Material.
+
+ 5. Downstream recipients.
+
+ a. Offer from the Licensor -- Licensed Material. Every
+ recipient of the Licensed Material automatically
+ receives an offer from the Licensor to exercise the
+ Licensed Rights under the terms and conditions of this
+ Public License.
+
+ b. Additional offer from the Licensor -- Adapted Material.
+ Every recipient of Adapted Material from You
+ automatically receives an offer from the Licensor to
+ exercise the Licensed Rights in the Adapted Material
+ under the conditions of the Adapter's License You apply.
+
+ c. No downstream restrictions. You may not offer or impose
+ any additional or different terms or conditions on, or
+ apply any Effective Technological Measures to, the
+ Licensed Material if doing so restricts exercise of the
+ Licensed Rights by any recipient of the Licensed
+ Material.
+
+ 6. No endorsement. Nothing in this Public License constitutes or
+ may be construed as permission to assert or imply that You
+ are, or that Your use of the Licensed Material is, connected
+ with, or sponsored, endorsed, or granted official status by,
+ the Licensor or others designated to receive attribution as
+ provided in Section 3(a)(1)(A)(i).
+
+ b. Other rights.
+
+ 1. Moral rights, such as the right of integrity, are not
+ licensed under this Public License, nor are publicity,
+ privacy, and/or other similar personality rights; however, to
+ the extent possible, the Licensor waives and/or agrees not to
+ assert any such rights held by the Licensor to the limited
+ extent necessary to allow You to exercise the Licensed
+ Rights, but not otherwise.
+
+ 2. Patent and trademark rights are not licensed under this
+ Public License.
+
+ 3. To the extent possible, the Licensor waives any right to
+ collect royalties from You for the exercise of the Licensed
+ Rights, whether directly or through a collecting society
+ under any voluntary or waivable statutory or compulsory
+ licensing scheme. In all other cases the Licensor expressly
+ reserves any right to collect such royalties, including when
+ the Licensed Material is used other than for NonCommercial
+ purposes.
+
+
+ Section 3 -- License Conditions.
+
+ Your exercise of the Licensed Rights is expressly made subject to the
+ following conditions.
+
+ a. Attribution.
+
+ 1. If You Share the Licensed Material (including in modified
+ form), You must:
+
+ a. retain the following if it is supplied by the Licensor
+ with the Licensed Material:
+
+ i. identification of the creator(s) of the Licensed
+ Material and any others designated to receive
+ attribution, in any reasonable manner requested by
+ the Licensor (including by pseudonym if
+ designated);
+
+ ii. a copyright notice;
+
+ iii. a notice that refers to this Public License;
+
+ iv. a notice that refers to the disclaimer of
+ warranties;
+
+ v. a URI or hyperlink to the Licensed Material to the
+ extent reasonably practicable;
+
+ b. indicate if You modified the Licensed Material and
+ retain an indication of any previous modifications; and
+
+ c. indicate the Licensed Material is licensed under this
+ Public License, and include the text of, or the URI or
+ hyperlink to, this Public License.
+
+ 2. You may satisfy the conditions in Section 3(a)(1) in any
+ reasonable manner based on the medium, means, and context in
+ which You Share the Licensed Material. For example, it may be
+ reasonable to satisfy the conditions by providing a URI or
+ hyperlink to a resource that includes the required
+ information.
+
+ 3. If requested by the Licensor, You must remove any of the
+ information required by Section 3(a)(1)(A) to the extent
+ reasonably practicable.
+
+ b. ShareAlike.
+
+ In addition to the conditions in Section 3(a), if You Share
+ Adapted Material You produce, the following conditions also apply.
+
+ 1. The Adapter's License You apply must be a Creative Commons
+ license with the same License Elements, this version or
+ later, or a BY-NC-SA Compatible License.
+
+ 2. You must include the text of, or the URI or hyperlink to, the
+ Adapter's License You apply. You may satisfy this condition
+ in any reasonable manner based on the medium, means, and
+ context in which You Share Adapted Material.
+
+ 3. You may not offer or impose any additional or different terms
+ or conditions on, or apply any Effective Technological
+ Measures to, Adapted Material that restrict exercise of the
+ rights granted under the Adapter's License You apply.
+
+
+ Section 4 -- Sui Generis Database Rights.
+
+ Where the Licensed Rights include Sui Generis Database Rights that
+ apply to Your use of the Licensed Material:
+
+ a. for the avoidance of doubt, Section 2(a)(1) grants You the right
+ to extract, reuse, reproduce, and Share all or a substantial
+ portion of the contents of the database for NonCommercial purposes
+ only;
+
+ b. if You include all or a substantial portion of the database
+ contents in a database in which You have Sui Generis Database
+ Rights, then the database in which You have Sui Generis Database
+ Rights (but not its individual contents) is Adapted Material,
+ including for purposes of Section 3(b); and
+
+ c. You must comply with the conditions in Section 3(a) if You Share
+ all or a substantial portion of the contents of the database.
+
+ For the avoidance of doubt, this Section 4 supplements and does not
+ replace Your obligations under this Public License where the Licensed
+ Rights include other Copyright and Similar Rights.
+
+
+ Section 5 -- Disclaimer of Warranties and Limitation of Liability.
+
+ a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
+ EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
+ AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
+ ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
+ IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
+ WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
+ PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
+ ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
+ KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
+ ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
+
+ b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
+ TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
+ NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
+ INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
+ COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
+ USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
+ ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
+ DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
+ IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
+
+ c. The disclaimer of warranties and limitation of liability provided
+ above shall be interpreted in a manner that, to the extent
+ possible, most closely approximates an absolute disclaimer and
+ waiver of all liability.
+
+
+ Section 6 -- Term and Termination.
+
+ a. This Public License applies for the term of the Copyright and
+ Similar Rights licensed here. However, if You fail to comply with
+ this Public License, then Your rights under this Public License
+ terminate automatically.
+
+ b. Where Your right to use the Licensed Material has terminated under
+ Section 6(a), it reinstates:
+
+ 1. automatically as of the date the violation is cured, provided
+ it is cured within 30 days of Your discovery of the
+ violation; or
+
+ 2. upon express reinstatement by the Licensor.
+
+ For the avoidance of doubt, this Section 6(b) does not affect any
+ right the Licensor may have to seek remedies for Your violations
+ of this Public License.
+
+ c. For the avoidance of doubt, the Licensor may also offer the
+ Licensed Material under separate terms or conditions or stop
+ distributing the Licensed Material at any time; however, doing so
+ will not terminate this Public License.
+
+ d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
+ License.
+
+
+ Section 7 -- Other Terms and Conditions.
+
+ a. The Licensor shall not be bound by any additional or different
+ terms or conditions communicated by You unless expressly agreed.
+
+ b. Any arrangements, understandings, or agreements regarding the
+ Licensed Material not stated herein are separate from and
+ independent of the terms and conditions of this Public License.
+
+
+ Section 8 -- Interpretation.
+
+ a. For the avoidance of doubt, this Public License does not, and
+ shall not be interpreted to, reduce, limit, restrict, or impose
+ conditions on any use of the Licensed Material that could lawfully
+ be made without permission under this Public License.
+
+ b. To the extent possible, if any provision of this Public License is
+ deemed unenforceable, it shall be automatically reformed to the
+ minimum extent necessary to make it enforceable. If the provision
+ cannot be reformed, it shall be severed from this Public License
+ without affecting the enforceability of the remaining terms and
+ conditions.
+
+ c. No term or condition of this Public License will be waived and no
+ failure to comply consented to unless expressly agreed to by the
+ Licensor.
+
+ d. Nothing in this Public License constitutes or may be interpreted
+ as a limitation upon, or waiver of, any privileges and immunities
+ that apply to the Licensor or You, including from the legal
+ processes of any jurisdiction or authority.
+
+ =======================================================================
+
+ Creative Commons is not a party to its public
+ licenses. Notwithstanding, Creative Commons may elect to apply one of
+ its public licenses to material it publishes and in those instances
+ will be considered the “Licensor.” The text of the Creative Commons
+ public licenses is dedicated to the public domain under the CC0 Public
+ Domain Dedication. Except for the limited purpose of indicating that
+ material is shared under a Creative Commons public license or as
+ otherwise permitted by the Creative Commons policies published at
+ creativecommons.org/policies, Creative Commons does not authorize the
+ use of the trademark "Creative Commons" or any other trademark or logo
+ of Creative Commons without its prior written consent including,
+ without limitation, in connection with any unauthorized modifications
+ to any of its public licenses or any other arrangements,
+ understandings, or agreements concerning use of licensed material. For
+ the avoidance of doubt, this paragraph does not form part of the
+ public licenses.
+
+ Creative Commons may be contacted at creativecommons.org.
README.md ADDED
@@ -0,0 +1,201 @@
+ ---
+ title: Mv Vton Demo
+ emoji: 👁
+ colorFrom: gray
+ colorTo: pink
+ sdk: gradio
+ sdk_version: 5.26.0
+ app_file: app.py
+ pinned: false
+ ---
+
+ Check out the Spaces configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # MV-VTON
+
+ PyTorch implementation of **MV-VTON: Multi-View Virtual Try-On with Diffusion Models**
+
+ [![arXiv](https://img.shields.io/badge/arXiv-2404.17364-b31b1b.svg)](https://arxiv.org/abs/2404.17364)
+ [![Project](https://img.shields.io/badge/Project-Website-orange)](https://hywang2002.github.io/MV-VTON/)
+ ![visitors](https://visitor-badge.laobi.icu/badge?page_id=hywang2002.MV-VTON)
+ [![LICENSE](https://img.shields.io/badge/license-CC--BY--NC--SA--4.0-lightgrey)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
+
+ ## News
+ - 🔥 The first multi-view virtual try-on dataset, MVG, is now available.
+ - 🔥 Checkpoints for both the frontal-view and multi-view virtual try-on tasks are released.
+
+ ## Overview
+
+ ![](assets/framework.png)
+ > **Abstract:**
+ > The goal of image-based virtual try-on is to generate an image of the target person naturally wearing the given
+ > clothing. However, most existing methods solely focus on frontal try-on using frontal clothing. When the views
+ > of the clothing and person are significantly inconsistent, particularly when the person’s view is non-frontal, the
+ > results are unsatisfactory. To address this challenge, we introduce Multi-View Virtual Try-On (MV-VTON), which aims to
+ > reconstruct the dressing results of a person from multiple views using the given clothes. On the one hand, given that
+ > single-view clothes provide insufficient information for MV-VTON, we instead employ two images, i.e., the frontal and
+ > back views of the clothing, to encompass the complete view as much as possible. On the other hand, diffusion models
+ > that have demonstrated superior abilities are adopted to perform our MV-VTON. In particular, we propose a
+ > view-adaptive selection method where hard-selection and soft-selection are applied to the global and local clothing
+ > feature extraction, respectively. This ensures that the clothing features roughly fit the person’s view. Subsequently,
+ > we suggest a joint attention block to align and fuse clothing features with person features. Additionally, we collect
+ > an MV-VTON dataset, i.e., Multi-View Garment (MVG), in which each person has multiple photos with diverse views and
+ > poses. Experiments show that the proposed method not only achieves state-of-the-art results on the MV-VTON task using
+ > our MVG dataset, but also has superiority on the frontal-view virtual try-on task using the VITON-HD and DressCode
+ > datasets.
+
+ ## Getting Started
+
+ ### Installation
+
+ 1. Clone the repository
+
+ ```shell
+ git clone https://github.com/hywang2002/MV-VTON.git
+ cd MV-VTON
+ ```
+
+ 2. Install Python dependencies
+
+ ```shell
+ conda env create -f environment.yaml
+ conda activate mv-vton
+ ```
+
+ 3. Download the pretrained [vgg](https://drive.google.com/file/d/1rvow8jStPt8t2prDcSRlnf8yzXhrYeGo/view?usp=sharing)
+ checkpoint and put it in `models/vgg/` for Multi-View VTON and in `Frontal-View VTON/models/vgg/` for Frontal-View VTON.
+ 4. Download the pretrained models: `mvg.ckpt` via [Baidu Cloud](https://pan.baidu.com/s/17SC8fHE5w2g7gEtzJgRRew?pwd=cshy) or [Google Drive](https://drive.google.com/file/d/1J91PoT8A9yqHWNxkgRe6ZCnDEhN-H9O6/view?usp=sharing),
+ and `vitonhd.ckpt` via [Baidu Cloud](https://pan.baidu.com/s/1R2yGgm35UwTpnXPEU6-tlA?pwd=cshy) or [Google Drive](https://drive.google.com/file/d/13A0uzUY6PuvitLOqzyHzWASOh0dNXdem/view?usp=sharing).
+ Put `mvg.ckpt` in `checkpoint/` and `vitonhd.ckpt` in `Frontal-View VTON/checkpoint/`; a quick check of this layout is sketched below.
+
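+ A minimal sketch for checking the layout from steps 3-4 (the vgg checkpoint is only checked at directory level, since its exact filename depends on the download):
+
+ ```python
+ from pathlib import Path
+
+ # Locations taken from the installation steps above.
+ expected = [
+     Path("models/vgg"),
+     Path("Frontal-View VTON/models/vgg"),
+     Path("checkpoint/mvg.ckpt"),
+     Path("Frontal-View VTON/checkpoint/vitonhd.ckpt"),
+ ]
+
+ for p in expected:
+     print(f"[{'ok' if p.exists() else 'missing'}] {p}")
+ ```
+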
+ ### Datasets
+
+ #### MVG
+
+ 1. Fill in the `Dataset Request Form` via [Baidu Cloud](https://pan.baidu.com/s/12HAq0V4FfgpU_q8AeyZzwA?pwd=cshy) or [Google Drive](https://drive.google.com/file/d/1zWt6HYBz7Vzaxu8rp1bwkhRoBkxbwQjw/view?usp=sharing), and
+ contact `cshy2mvvton@outlook.com` with this form to obtain the MVG dataset (non-institutional emails
+ such as gmail.com are not allowed; please provide your institutional email address).
+
+ After these steps, the folder structure should look like this (the `warp_feat_unpair*` folders are only included in the test directory):
+
+ ```
+ ├── MVG
+ │   ├── unpaired.txt
+ │   ├── [train | test]
+ │   │   ├── image-wo-bg
+ │   │   ├── cloth
+ │   │   ├── cloth-mask
+ │   │   ├── warp_feat
+ │   │   ├── warp_feat_unpair
+ │   │   ├── ...
+ ```
+
+ #### VITON-HD
+
+ 1. Download the [VITON-HD](https://github.com/shadow2496/VITON-HD) dataset.
+ 2. Download the pre-warped cloth images/masks via [Baidu Cloud](https://pan.baidu.com/s/1uQM0IOltOmbeqwdOKX5kCw?pwd=cshy) or [Google Drive](https://drive.google.com/file/d/18DTWfhxUnfg41nnwwpCKN--akC4eT9DM/view?usp=sharing) and
+ put them under the VITON-HD dataset directory.
+
+ After these steps, the folder structure should look like this (the `unpaired-cloth*` folders are only included in the test directory):
+
+ ```
+ ├── VITON-HD
+ │   ├── test_pairs.txt
+ │   ├── train_pairs.txt
+ │   ├── [train | test]
+ │   │   ├── image
+ │   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
+ │   │   ├── cloth
+ │   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
+ │   │   ├── cloth-mask
+ │   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
+ │   │   ├── cloth-warp
+ │   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
+ │   │   ├── cloth-warp-mask
+ │   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
+ │   │   ├── unpaired-cloth-warp
+ │   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
+ │   │   ├── unpaired-cloth-warp-mask
+ │   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
+ ```
+
+ ### Inference
+
+ #### MVG
+
+ To test in the paired setting (which uses `cp_dataset_mv_paired.py`), you can either modify `configs/viton512.yaml` and `main.py`,
+ or simply rename `cp_dataset_mv_paired.py` to `cp_dataset.py` (recommended). Then run:
+
+ ```shell
+ sh test.sh
+ ```
+
+ To test in the unpaired setting, rename `cp_dataset_mv_unpaired.py` to `cp_dataset.py` instead and run the same command; a helper for switching between the two settings is sketched below.
+
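+ A minimal sketch of such a switch (a hypothetical helper, run from the repo root — it copies rather than renames, so both variants remain available):
+
+ ```python
+ import shutil
+ import sys
+
+ # Usage: python select_setting.py [paired|unpaired]
+ setting = sys.argv[1] if len(sys.argv) > 1 else "paired"
+ src = f"cp_dataset_mv_{setting}.py"
+ shutil.copyfile(src, "cp_dataset.py")
+ print(f"cp_dataset.py now uses the {setting} setting ({src})")
+ ```
+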
+ #### VITON-HD
+
+ To test in the paired setting, run `cd Frontal-View\ VTON/`, then:
+
+ ```shell
+ sh test.sh
+ ```
+
+ To test in the unpaired setting, run `cd Frontal-View\ VTON/`, add `--unpaired` to `test.sh`, and then run:
+
+ ```shell
+ sh test.sh
+ ```
+
+ #### Metrics
+
+ We compute `LPIPS`, `SSIM`, `FID`, and `KID` using the same tools as [LaDI-VTON](https://github.com/miccunifi/ladi-vton); a torchmetrics-based sketch is given below.
+
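+ For reference, the same four metrics can be computed with torchmetrics. A minimal sketch, assuming a recent torchmetrics release (the `torchmetrics==0.6.0` pinned in `environment.yaml` predates some of these classes) and `uint8` image batches of shape `(N, 3, H, W)`:
+
+ ```python
+ from torchmetrics import StructuralSimilarityIndexMeasure
+ from torchmetrics.image.fid import FrechetInceptionDistance
+ from torchmetrics.image.kid import KernelInceptionDistance
+ from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity
+
+ def evaluate(fake_u8, real_u8):
+     """fake_u8 / real_u8: uint8 tensors of shape (N, 3, H, W)."""
+     fid = FrechetInceptionDistance(feature=2048)
+     kid = KernelInceptionDistance(subset_size=50)  # subset_size must be <= N
+     lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex")
+     ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
+
+     fid.update(real_u8, real=True); fid.update(fake_u8, real=False)
+     kid.update(real_u8, real=True); kid.update(fake_u8, real=False)
+
+     fake = fake_u8.float() / 255.0
+     real = real_u8.float() / 255.0
+     lpips.update(fake * 2 - 1, real * 2 - 1)  # LPIPS expects inputs in [-1, 1]
+     ssim.update(fake, real)
+
+     kid_mean, _ = kid.compute()
+     return {"FID": fid.compute().item(), "KID": kid_mean.item(),
+             "LPIPS": lpips.compute().item(), "SSIM": ssim.compute().item()}
+ ```
+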
+ ### Training
+
+ #### MVG
+
+ We use Paint-by-Example as initialization; please download its pretrained model
+ from [Google Drive](https://drive.google.com/file/d/15QzaTWsvZonJcXsNv-ilMRCYaQLhzR_i/view) and save it to the
+ `checkpoints` directory. Rename `cp_dataset_mv_paired.py` to `cp_dataset.py`, then run:
+
+ ```shell
+ sh train.sh
+ ```
+
+ #### VITON-HD
+
+ Run `cd Frontal-View\ VTON/`, then:
+
+ ```shell
+ sh train.sh
+ ```
+
+ ## Acknowledgements
+
+ Our code borrows heavily from [Paint-by-Example](https://github.com/Fantasy-Studio/Paint-by-Example)
+ and [DCI-VTON](https://github.com/bcmi/DCI-VTON-Virtual-Try-On). We also
+ thank the previous works [PF-AFN](https://github.com/geyuying/PF-AFN), [GP-VTON](https://github.com/xiezhy6/GP-VTON),
+ [LaDI-VTON](https://github.com/miccunifi/ladi-vton),
+ and [StableVITON](https://github.com/rlawjdghek/StableVITON).
+
+ ## LICENSE
+ MV-VTON: Multi-View Virtual Try-On with Diffusion Models © 2024 by Haoyu Wang, Zhilu Zhang, Donglin Di, Shiliang Zhang, and Wangmeng Zuo is licensed under CC BY-NC-SA 4.0.
+
+ ## Citation
+
+ ```
+ @article{wang2024mv,
+   title={MV-VTON: Multi-View Virtual Try-On with Diffusion Models},
+   author={Wang, Haoyu and Zhang, Zhilu and Di, Donglin and Zhang, Shiliang and Zuo, Wangmeng},
+   journal={arXiv preprint arXiv:2404.17364},
+   year={2024}
+ }
+ ```
environment.yaml ADDED
@@ -0,0 +1,197 @@
+ name: mv-vton
+ channels:
+ - pytorch
+ - defaults
+ dependencies:
+ - _libgcc_mutex=0.1=main
+ - _openmp_mutex=5.1=1_gnu
+ - blas=1.0=mkl
+ - brotli-python=1.0.9=py38h6a678d5_7
+ - bzip2=1.0.8=h7b6447c_0
+ - ca-certificates=2023.08.22=h06a4308_0
+ - certifi=2023.11.17=py38h06a4308_0
+ - cffi=1.15.1=py38h74dc2b5_0
+ - charset-normalizer=2.0.4=pyhd3eb1b0_0
+ - cryptography=41.0.3=py38h130f0dd_0
+ - cudatoolkit=11.3.1=h2bc3f7f_2
+ - ffmpeg=4.3=hf484d3e_0
+ - freetype=2.12.1=h4a9f257_0
+ - giflib=5.2.1=h5eee18b_3
+ - gmp=6.2.1=h295c915_3
+ - gnutls=3.6.15=he1e5248_0
+ - idna=3.4=py38h06a4308_0
+ - intel-openmp=2021.4.0=h06a4308_3561
+ - jpeg=9e=h5eee18b_1
+ - lame=3.100=h7b6447c_0
+ - lcms2=2.12=h3be6417_0
+ - ld_impl_linux-64=2.38=h1181459_1
+ - lerc=3.0=h295c915_0
+ - libdeflate=1.17=h5eee18b_1
+ - libffi=3.3=he6710b0_2
+ - libgcc-ng=11.2.0=h1234567_1
+ - libgfortran-ng=11.2.0=h00389a5_1
+ - libgfortran5=11.2.0=h1234567_1
+ - libgomp=11.2.0=h1234567_1
+ - libiconv=1.16=h7f8727e_2
+ - libidn2=2.3.4=h5eee18b_0
+ - libpng=1.6.39=h5eee18b_0
+ - libstdcxx-ng=11.2.0=h1234567_1
+ - libtasn1=4.19.0=h5eee18b_0
+ - libtiff=4.5.1=h6a678d5_0
+ - libunistring=0.9.10=h27cfd23_0
+ - libuv=1.44.2=h5eee18b_0
+ - libwebp=1.3.2=h11a3e52_0
+ - libwebp-base=1.3.2=h5eee18b_0
+ - lz4-c=1.9.4=h6a678d5_0
+ - mkl=2021.4.0=h06a4308_640
+ - mkl-service=2.4.0=py38h7f8727e_0
+ - mkl_fft=1.3.1=py38hd3c417c_0
+ - mkl_random=1.2.2=py38h51133e4_0
+ - ncurses=6.4=h6a678d5_0
+ - nettle=3.7.3=hbbd107a_1
+ - openh264=2.1.1=h4ff587b_0
+ - openjpeg=2.4.0=h3ad879b_0
+ - openssl=1.1.1w=h7f8727e_0
+ - pillow=10.0.1=py38ha6cbd5a_0
+ - pip=20.3.3=py38h06a4308_0
+ - pycparser=2.21=pyhd3eb1b0_0
+ - pyopenssl=23.2.0=py38h06a4308_0
+ - pysocks=1.7.1=py38h06a4308_0
+ - python=3.8.5=h7579374_1
+ - pytorch=1.11.0=py3.8_cuda11.3_cudnn8.2.0_0
+ - pytorch-mutex=1.0=cuda
+ - readline=8.2=h5eee18b_0
+ - requests=2.31.0=py38h06a4308_0
+ - setuptools=68.0.0=py38h06a4308_0
+ - six=1.16.0=pyhd3eb1b0_1
+ - sqlite=3.41.2=h5eee18b_0
+ - tk=8.6.12=h1ccaba5_0
+ - torchvision=0.12.0=py38_cu113
+ - typing_extensions=4.7.1=py38h06a4308_0
+ - urllib3=1.26.18=py38h06a4308_0
+ - wheel=0.41.2=py38h06a4308_0
+ - xz=5.4.5=h5eee18b_0
+ - zlib=1.2.13=h5eee18b_0
+ - zstd=1.5.5=hc292b87_0
+ - pip:
+   - absl-py==2.0.0
+   - aiohttp==3.9.1
+   - aiosignal==1.3.1
+   - albumentations==0.4.3
+   - altair==5.2.0
+   - antlr4-python3-runtime==4.9.3
+   - async-timeout==4.0.3
+   - attrs==23.1.0
+   - av==12.0.0
+   - backports-zoneinfo==0.2.1
+   - bezier==2023.7.28
+   - black==24.2.0
+   - blinker==1.7.0
+   - cachetools==5.3.2
+   - click==8.1.7
+   - clip==0.2.0
+   - cloudpickle==3.0.0
+   - contourpy==1.1.1
+   - cupy==12.3.0
+   - cycler==0.12.1
+   - diffusers==0.20.0
+   - einops==0.3.0
+   - fastrlock==0.8.2
+   - filelock==3.13.1
+   - fonttools==4.45.1
+   - frozenlist==1.4.0
+   - fsspec==2023.10.0
+   - future==0.18.3
+   - fvcore==0.1.5.post20221221
+   - gitdb==4.0.11
+   - gitpython==3.1.40
+   - google-auth==2.23.4
+   - google-auth-oauthlib==1.0.0
+   - grpcio==1.59.3
+   - huggingface-hub==0.19.4
+   - hydra-core==1.3.2
+   - imageio==2.9.0
+   - imageio-ffmpeg==0.4.2
+   - imgaug==0.2.6
+   - importlib-metadata==6.8.0
+   - importlib-resources==6.1.1
+   - invisible-watermark==0.2.0
+   - iopath==0.1.9
+   - jinja2==3.1.2
+   - jsonschema==4.20.0
+   - jsonschema-specifications==2023.11.1
+   - kiwisolver==1.4.5
+   - kornia==0.6.0
+   - lazy-loader==0.3
+   - markdown==3.5.1
+   - markdown-it-py==3.0.0
+   - markupsafe==2.1.3
+   - matplotlib==3.7.4
+   - mdurl==0.1.2
+   - multidict==6.0.4
+   - mypy-extensions==1.0.0
+   - networkx==3.1
+   - numpy==1.24.4
+   - oauthlib==3.2.2
+   - omegaconf==2.3.0
+   - opencv-python==4.1.2.30
+   - opencv-python-headless==4.8.1.78
+   - packaging==23.2
+   - pandas==2.0.3
+   - pathspec==0.12.1
+   - pkgutil-resolve-name==1.3.10
+   - platformdirs==4.2.0
+   - portalocker==2.8.2
+   - protobuf==4.25.1
+   - pudb==2019.2
+   - pyarrow==14.0.1
+   - pyasn1==0.5.1
+   - pyasn1-modules==0.3.0
+   - pycocotools==2.0.7
+   - pydeck==0.8.1b0
+   - pydeprecate==0.3.1
+   - pygments==2.17.2
+   - pyparsing==3.1.1
+   - python-dateutil==2.8.2
+   - pytorch-lightning==1.4.2
+   - pytz==2023.3.post1
+   - pywavelets==1.4.1
+   - pyyaml==6.0.1
+   - referencing==0.31.1
+   - regex==2023.10.3
+   - requests-oauthlib==1.3.1
+   - rich==13.7.0
+   - rpds-py==0.13.2
+   - rsa==4.9
+   - safetensors==0.4.1
+   - scikit-image==0.20.0
+   - scipy==1.9.1
+   - smmap==5.0.1
+   - streamlit==1.28.2
+   - tabulate==0.9.0
+   - taming-transformers==0.0.1
+   - tenacity==8.2.3
+   - tensorboard==2.14.0
+   - tensorboard-data-server==0.7.2
+   - termcolor==2.4.0
+   - test-tube==0.7.5
+   - tifffile==2023.7.10
+   - tokenizers==0.12.1
+   - toml==0.10.2
+   - tomli==2.0.1
+   - toolz==0.12.0
+   - torch-fidelity==0.3.0
+   - torchmetrics==0.6.0
+   - tornado==6.4
+   - tqdm==4.66.1
+   - transformers==4.27.3
+   - tzdata==2023.3
+   - tzlocal==5.2
+   - urwid==2.2.3
+   - validators==0.22.0
+   - watchdog==3.0.0
+   - werkzeug==3.0.1
+   - yacs==0.1.8
+   - yarl==1.9.3
+   - zipp==3.17.0
+ prefix: /mnt/pfs-mc0p4k/cvg/team/didonglin/conda_envs/mv-vton
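
Note that the `prefix:` line points at the original author's conda path; the environment name `mv-vton` is what `conda activate` uses. After `conda env create -f environment.yaml`, a quick sanity check that the key pins above took effect:

```python
import torch
import torchvision

print(torch.__version__)          # expected: 1.11.0
print(torchvision.__version__)    # expected: 0.12.0
print(torch.version.cuda)         # expected: 11.3
print(torch.cuda.is_available())  # True on a machine with a working CUDA setup
```
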
main.py ADDED
@@ -0,0 +1,738 @@
1
+ import argparse, os, sys, datetime, glob, importlib, csv
2
+ import numpy as np
3
+ import time
4
+ import torch
5
+ import torchvision
6
+ import pytorch_lightning as pl
7
+
8
+ sys.setrecursionlimit(10000)
9
+ from packaging import version
10
+ from omegaconf import OmegaConf
11
+ from torch.utils.data import random_split, DataLoader, Dataset, Subset
12
+ from functools import partial
13
+ from PIL import Image
14
+
15
+ from pytorch_lightning import seed_everything
16
+ from pytorch_lightning.trainer import Trainer
17
+ from pytorch_lightning.callbacks import ModelCheckpoint, Callback, LearningRateMonitor
18
+ from pytorch_lightning.utilities.distributed import rank_zero_only
19
+ from pytorch_lightning.utilities import rank_zero_info
20
+
21
+ from ldm.data.base import Txt2ImgIterableBaseDataset
22
+ from ldm.util import instantiate_from_config
23
+ import socket
24
+ from pytorch_lightning.plugins.environments import ClusterEnvironment, SLURMEnvironment
25
+
26
+
27
+ def get_parser(**parser_kwargs):
28
+ def str2bool(v):
29
+ if isinstance(v, bool):
30
+ return v
31
+ if v.lower() in ("yes", "true", "t", "y", "1"):
32
+ return True
33
+ elif v.lower() in ("no", "false", "f", "n", "0"):
34
+ return False
35
+ else:
36
+ raise argparse.ArgumentTypeError("Boolean value expected.")
37
+
38
+ parser = argparse.ArgumentParser(**parser_kwargs)
39
+ parser.add_argument(
40
+ "-n",
41
+ "--name",
42
+ type=str,
43
+ const=True,
44
+ default="",
45
+ nargs="?",
46
+ help="postfix for logdir",
47
+ )
48
+ parser.add_argument(
49
+ "-r",
50
+ "--resume",
51
+ type=str,
52
+ const=True,
53
+ default="",
54
+ nargs="?",
55
+ help="resume from logdir or checkpoint in logdir",
56
+ )
57
+ parser.add_argument(
58
+ "-b",
59
+ "--base",
60
+ nargs="*",
61
+ metavar="base_config.yaml",
62
+ help="paths to base configs. Loaded from left-to-right. "
63
+ "Parameters can be overwritten or added with command-line options of the form `--key value`.",
64
+ default=["configs/stable-diffusion/v1-inference-inpaint.yaml"],
65
+ )
66
+ parser.add_argument(
67
+ "-t",
68
+ "--train",
69
+ type=str2bool,
70
+ const=True,
71
+ default=True,
72
+ nargs="?",
73
+ help="train",
74
+ )
75
+ parser.add_argument(
76
+ "--no-test",
77
+ type=str2bool,
78
+ const=True,
79
+ default=False,
80
+ nargs="?",
81
+ help="disable test",
82
+ )
83
+ parser.add_argument(
84
+ "-p",
85
+ "--project",
86
+ help="name of new or path to existing project"
87
+ )
88
+ parser.add_argument(
89
+ "-d",
90
+ "--debug",
91
+ type=str2bool,
92
+ nargs="?",
93
+ const=True,
94
+ default=False,
95
+ help="enable post-mortem debugging",
96
+ )
97
+ parser.add_argument(
98
+ "-s",
99
+ "--seed",
100
+ type=int,
101
+ default=23,
102
+ help="seed for seed_everything",
103
+ )
104
+ parser.add_argument(
105
+ "-f",
106
+ "--postfix",
107
+ type=str,
108
+ default="",
109
+ help="post-postfix for default name",
110
+ )
111
+ parser.add_argument(
112
+ "-l",
113
+ "--logdir",
114
+ type=str,
115
+ default="logs",
116
+ help="directory for logging dat shit",
117
+ )
118
+ parser.add_argument(
119
+ "--pretrained_model",
120
+ type=str,
121
+ default="",
122
+ help="path to pretrained model",
123
+ )
124
+ parser.add_argument(
125
+ "--scale_lr",
126
+ type=str2bool,
127
+ nargs="?",
128
+ const=True,
129
+ default=True,
130
+ help="scale base-lr by ngpu * batch_size * n_accumulate",
131
+ )
132
+ parser.add_argument(
133
+ "--train_from_scratch",
134
+ type=str2bool,
135
+ nargs="?",
136
+ const=True,
137
+ default=False,
138
+ help="Train from scratch",
139
+ )
140
+ return parser
141
+
142
+
143
+ def nondefault_trainer_args(opt):
144
+ parser = argparse.ArgumentParser()
145
+ parser = Trainer.add_argparse_args(parser)
146
+ args = parser.parse_args([])
147
+ return sorted(k for k in vars(args) if getattr(opt, k) != getattr(args, k))
148
+
149
+
150
+ class WrappedDataset(Dataset):
151
+ """Wraps an arbitrary object with __len__ and __getitem__ into a pytorch dataset"""
152
+
153
+ def __init__(self, dataset):
154
+ self.data = dataset
155
+
156
+ def __len__(self):
157
+ return len(self.data)
158
+
159
+ def __getitem__(self, idx):
160
+ return self.data[idx]
161
+
162
+
163
+ def worker_init_fn(_):
164
+ worker_info = torch.utils.data.get_worker_info()
165
+
166
+ dataset = worker_info.dataset
167
+ worker_id = worker_info.id
168
+
169
+ if isinstance(dataset, Txt2ImgIterableBaseDataset):
170
+ split_size = dataset.num_records // worker_info.num_workers
171
+ # reset num_records to the true number to retain reliable length information
172
+ dataset.sample_ids = dataset.valid_ids[worker_id * split_size:(worker_id + 1) * split_size]
173
+ current_id = np.random.choice(len(np.random.get_state()[1]), 1)
174
+ return np.random.seed(np.random.get_state()[1][current_id] + worker_id)
175
+ else:
176
+ return np.random.seed(np.random.get_state()[1][0] + worker_id)
177
+
178
+
179
+ class DataModuleFromConfig(pl.LightningDataModule):
180
+ def __init__(self, batch_size, train=None, validation=None, test=None, predict=None,
181
+ wrap=False, num_workers=None, shuffle_test_loader=False, use_worker_init_fn=False,
182
+ shuffle_val_dataloader=False):
183
+ super().__init__()
184
+ self.batch_size = batch_size
185
+ self.dataset_configs = dict()
186
+ self.num_workers = num_workers if num_workers is not None else batch_size * 2
187
+ self.use_worker_init_fn = use_worker_init_fn
188
+ if train is not None:
189
+ self.dataset_configs["train"] = train
190
+ self.train_dataloader = self._train_dataloader
191
+ if validation is not None:
192
+ self.dataset_configs["validation"] = validation
193
+ self.val_dataloader = partial(self._val_dataloader, shuffle=shuffle_val_dataloader)
194
+ if test is not None:
195
+ self.dataset_configs["test"] = test
196
+ self.test_dataloader = partial(self._test_dataloader, shuffle=shuffle_test_loader)
197
+ if predict is not None:
198
+ self.dataset_configs["predict"] = predict
199
+ self.predict_dataloader = self._predict_dataloader
200
+ self.wrap = wrap
201
+
202
+ def prepare_data(self):
203
+ for data_cfg in self.dataset_configs.values():
204
+ instantiate_from_config(data_cfg)
205
+
206
+ def setup(self, stage=None):
207
+ self.datasets = dict(
208
+ (k, instantiate_from_config(self.dataset_configs[k]))
209
+ for k in self.dataset_configs)
210
+ if self.wrap:
211
+ for k in self.datasets:
212
+ self.datasets[k] = WrappedDataset(self.datasets[k])
213
+
214
+ def _train_dataloader(self):
215
+ is_iterable_dataset = isinstance(self.datasets['train'], Txt2ImgIterableBaseDataset)
216
+ if is_iterable_dataset or self.use_worker_init_fn:
217
+ init_fn = worker_init_fn
218
+ else:
219
+ init_fn = None
220
+ return DataLoader(self.datasets["train"], batch_size=self.batch_size,
221
+ num_workers=self.num_workers, shuffle=False if is_iterable_dataset else True,
222
+ worker_init_fn=init_fn)
223
+
224
+ def _val_dataloader(self, shuffle=False):
225
+ if isinstance(self.datasets['validation'], Txt2ImgIterableBaseDataset) or self.use_worker_init_fn:
226
+ init_fn = worker_init_fn
227
+ else:
228
+ init_fn = None
229
+ return DataLoader(self.datasets["validation"],
230
+ batch_size=self.batch_size,
231
+ num_workers=self.num_workers,
232
+ worker_init_fn=init_fn,
233
+ shuffle=shuffle)
234
+
235
+ def _test_dataloader(self, shuffle=False):
236
+ is_iterable_dataset = isinstance(self.datasets['train'], Txt2ImgIterableBaseDataset)
237
+ if is_iterable_dataset or self.use_worker_init_fn:
238
+ init_fn = worker_init_fn
239
+ else:
240
+ init_fn = None
241
+
242
+ # do not shuffle dataloader for iterable dataset
243
+ shuffle = shuffle and (not is_iterable_dataset)
244
+
245
+ return DataLoader(self.datasets["test"], batch_size=self.batch_size,
246
+ num_workers=self.num_workers, worker_init_fn=init_fn, shuffle=shuffle)
247
+
248
+ def _predict_dataloader(self, shuffle=False):
249
+ if isinstance(self.datasets['predict'], Txt2ImgIterableBaseDataset) or self.use_worker_init_fn:
250
+ init_fn = worker_init_fn
251
+ else:
252
+ init_fn = None
253
+ return DataLoader(self.datasets["predict"], batch_size=self.batch_size,
254
+ num_workers=self.num_workers, worker_init_fn=init_fn)
255
+
256
+
257
+ class SetupCallback(Callback):
258
+ def __init__(self, resume, now, logdir, ckptdir, cfgdir, config, lightning_config):
259
+ super().__init__()
260
+ self.resume = resume
261
+ self.now = now
262
+ self.logdir = logdir
263
+ self.ckptdir = ckptdir
264
+ self.cfgdir = cfgdir
265
+ self.config = config
266
+ self.lightning_config = lightning_config
267
+
268
+ def on_keyboard_interrupt(self, trainer, pl_module):
269
+ if trainer.global_rank == 0:
270
+ print("Summoning checkpoint.")
271
+ if hasattr(self.config, 'lora_config'):
272
+ ckpt_path = os.path.join(self.ckptdir, "lora_last.ckpt")
273
+ from lora.lora import save_lora_weight
274
+ save_lora_weight(trainer.model, path=ckpt_path)
275
+ else:
276
+ ckpt_path = os.path.join(self.ckptdir, "last.ckpt")
277
+ trainer.save_checkpoint(ckpt_path)
278
+
279
+ def on_pretrain_routine_start(self, trainer, pl_module):
280
+ if trainer.global_rank == 0:
281
+ # Create logdirs and save configs
282
+ os.makedirs(self.logdir, exist_ok=True)
283
+ os.makedirs(self.ckptdir, exist_ok=True)
284
+ os.makedirs(self.cfgdir, exist_ok=True)
285
+
286
+ if "callbacks" in self.lightning_config:
287
+ if 'metrics_over_trainsteps_checkpoint' in self.lightning_config['callbacks']:
288
+ os.makedirs(os.path.join(self.ckptdir, 'trainstep_checkpoints'), exist_ok=True)
289
+ print("Project config")
290
+ print(OmegaConf.to_yaml(self.config))
291
+ OmegaConf.save(self.config,
292
+ os.path.join(self.cfgdir, "{}-project.yaml".format(self.now)))
293
+
294
+ print("Lightning config")
295
+ print(OmegaConf.to_yaml(self.lightning_config))
296
+ OmegaConf.save(OmegaConf.create({"lightning": self.lightning_config}),
297
+ os.path.join(self.cfgdir, "{}-lightning.yaml".format(self.now)))
298
+
299
+ else:
300
+ # ModelCheckpoint callback created log directory --- remove it
301
+ if not self.resume and os.path.exists(self.logdir):
302
+ dst, name = os.path.split(self.logdir)
303
+ dst = os.path.join(dst, "child_runs", name)
304
+ os.makedirs(os.path.split(dst)[0], exist_ok=True)
305
+ try:
306
+ os.rename(self.logdir, dst)
307
+ except FileNotFoundError:
308
+ pass
309
+
310
+
311
+ class ImageLogger(Callback):
312
+ def __init__(self, batch_frequency, max_images, clamp=True, increase_log_steps=True,
313
+ rescale=True, disabled=False, log_on_batch_idx=False, log_first_step=False,
314
+ log_images_kwargs=None):
315
+ super().__init__()
316
+ self.rescale = rescale
317
+ self.batch_freq = batch_frequency
318
+ self.max_images = max_images
319
+ self.logger_log_images = {
320
+ pl.loggers.TestTubeLogger: self._testtube,
321
+ }
322
+ self.log_steps = [2 ** n for n in range(int(np.log2(self.batch_freq)) + 1)]
323
+ if not increase_log_steps:
324
+ self.log_steps = [self.batch_freq]
325
+ self.clamp = clamp
326
+ self.disabled = disabled
327
+ self.log_on_batch_idx = log_on_batch_idx
328
+ self.log_images_kwargs = log_images_kwargs if log_images_kwargs else {}
329
+ self.log_first_step = log_first_step
330
+
331
+ @rank_zero_only
332
+ def _testtube(self, pl_module, images, batch_idx, split):
333
+ for k in images:
334
+ grid = torchvision.utils.make_grid(images[k])
335
+ grid = (grid + 1.0) / 2.0 # -1,1 -> 0,1; c,h,w
336
+
337
+ tag = f"{split}/{k}"
338
+ pl_module.logger.experiment.add_image(
339
+ tag, grid,
340
+ global_step=pl_module.global_step)
341
+
342
+ @rank_zero_only
343
+ def log_local(self, save_dir, split, images,
344
+ global_step, current_epoch, batch_idx):
345
+ root = os.path.join(save_dir, "images", split)
346
+ for k in images:
347
+ grid = torchvision.utils.make_grid(images[k], nrow=4)
348
+ if self.rescale:
349
+ grid = (grid + 1.0) / 2.0 # -1,1 -> 0,1; c,h,w
350
+ grid = grid.transpose(0, 1).transpose(1, 2).squeeze(-1)
351
+ grid = grid.numpy()
352
+ grid = (grid * 255).astype(np.uint8)
353
+ filename = "{}_gs-{:06}_e-{:06}_b-{:06}.png".format(
354
+ k,
355
+ global_step,
356
+ current_epoch,
357
+ batch_idx)
358
+ path = os.path.join(root, filename)
359
+ os.makedirs(os.path.split(path)[0], exist_ok=True)
360
+ Image.fromarray(grid).save(path)
361
+
362
+ def log_img(self, pl_module, batch, batch_idx, split="train"):
363
+ check_idx = batch_idx if self.log_on_batch_idx else pl_module.global_step
364
+ if (self.check_frequency(check_idx) and # batch_idx % self.batch_freq == 0
365
+ hasattr(pl_module, "log_images") and
366
+ callable(pl_module.log_images) and
367
+ self.max_images > 0):
368
+ logger = type(pl_module.logger)
369
+
370
+ is_train = pl_module.training
371
+ if is_train:
372
+ pl_module.eval()
373
+
374
+ with torch.no_grad():
375
+ images = pl_module.log_images(batch, split=split, **self.log_images_kwargs)
376
+
377
+ for k in images:
378
+ N = min(images[k].shape[0], self.max_images)
379
+ images[k] = images[k][:N]
380
+ if isinstance(images[k], torch.Tensor):
381
+ images[k] = images[k].detach().cpu()
382
+ if self.clamp:
383
+ images[k] = torch.clamp(images[k], -1., 1.)
384
+
385
+ self.log_local(pl_module.logger.save_dir, split, images,
386
+ pl_module.global_step, pl_module.current_epoch, batch_idx)
387
+
388
+ logger_log_images = self.logger_log_images.get(logger, lambda *args, **kwargs: None)
389
+ logger_log_images(pl_module, images, pl_module.global_step, split)
390
+
391
+ if is_train:
392
+ pl_module.train()
393
+
394
+ def check_frequency(self, check_idx):
395
+ if ((check_idx % self.batch_freq) == 0 or (check_idx in self.log_steps)) and (
396
+ check_idx > 0 or self.log_first_step):
397
+ try:
398
+ self.log_steps.pop(0)
399
+ except IndexError as e:
400
+ print(e)
401
+ pass
402
+ return True
403
+ return False
404
+
405
+ def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):
406
+ if not self.disabled and (pl_module.global_step > 0 or self.log_first_step):
407
+ self.log_img(pl_module, batch, batch_idx, split="train")
408
+
409
+ def on_validation_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):
410
+ if not self.disabled and pl_module.global_step > 0:
411
+ self.log_img(pl_module, batch, batch_idx, split="val")
412
+ if hasattr(pl_module, 'calibrate_grad_norm'):
413
+ if (pl_module.calibrate_grad_norm and batch_idx % 25 == 0) and batch_idx > 0:
414
+ self.log_gradients(trainer, pl_module, batch_idx=batch_idx)
415
+
416
+
417
+ class CUDACallback(Callback):
418
+ # see https://github.com/SeanNaren/minGPT/blob/master/mingpt/callback.py
419
+ def on_train_epoch_start(self, trainer, pl_module):
420
+ # Reset the memory use counter
421
+ torch.cuda.reset_peak_memory_stats(trainer.root_gpu)
422
+ torch.cuda.synchronize(trainer.root_gpu)
423
+ self.start_time = time.time()
424
+
425
+ def on_train_epoch_end(self, trainer, pl_module, outputs):
426
+ torch.cuda.synchronize(trainer.root_gpu)
427
+ max_memory = torch.cuda.max_memory_allocated(trainer.root_gpu) / 2 ** 20
428
+ epoch_time = time.time() - self.start_time
429
+
430
+ try:
431
+ max_memory = trainer.training_type_plugin.reduce(max_memory)
432
+ epoch_time = trainer.training_type_plugin.reduce(epoch_time)
433
+
434
+ rank_zero_info(f"Average Epoch time: {epoch_time:.2f} seconds")
435
+ rank_zero_info(f"Average Peak memory {max_memory:.2f}MiB")
436
+ except AttributeError:
437
+ pass
438
+
439
+
440
+ if __name__ == "__main__":
441
+
442
+ now = datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
443
+ sys.path.append(os.getcwd())
444
+
445
+ parser = get_parser()
446
+ parser = Trainer.add_argparse_args(parser)
447
+
448
+ opt, unknown = parser.parse_known_args()
449
+ if opt.name and opt.resume:
450
+ raise ValueError(
451
+ "-n/--name and -r/--resume cannot be specified both."
452
+ "If you want to resume training in a new log folder, "
453
+ "use -n/--name in combination with --resume_from_checkpoint"
454
+ )
455
+ if opt.resume:
456
+ if not os.path.exists(opt.resume):
457
+ raise ValueError("Cannot find {}".format(opt.resume))
458
+ if os.path.isfile(opt.resume):
459
+ paths = opt.resume.split("/")
460
+ # idx = len(paths)-paths[::-1].index("logs")+1
461
+ # logdir = "/".join(paths[:idx])
462
+ logdir = "/".join(paths[:-2])
463
+ ckpt = opt.resume
464
+ else:
465
+ assert os.path.isdir(opt.resume), opt.resume
466
+ logdir = opt.resume.rstrip("/")
467
+ ckpt = os.path.join(logdir, "checkpoints", "last.ckpt")
468
+
469
+ opt.resume_from_checkpoint = ckpt
470
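+ # Load the configs saved in the resumed run's logdir first, so configs passed via -b/--base can still override them in the merge below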
+ base_configs = sorted(glob.glob(os.path.join(logdir, "configs/*.yaml")))
471
+ opt.base = base_configs + opt.base
472
+ _tmp = logdir.split("/")
473
+ nowname = _tmp[-1]
474
+ else:
475
+ if opt.name:
476
+ name = "_" + opt.name
477
+ elif opt.base:
478
+ cfg_fname = os.path.split(opt.base[0])[-1]
479
+ cfg_name = os.path.splitext(cfg_fname)[0]
480
+ name = "_" + cfg_name
481
+ else:
482
+ name = ""
483
+ nowname = now + name + opt.postfix
484
+ logdir = os.path.join(opt.logdir, nowname)
485
+
486
+ ckptdir = os.path.join(logdir, "checkpoints")
487
+ cfgdir = os.path.join(logdir, "configs")
488
+ seed_everything(opt.seed)
489
+
490
+ # try:
491
+ # init and save configs
492
+ configs = [OmegaConf.load(cfg) for cfg in opt.base]
493
+ cli = OmegaConf.from_dotlist(unknown)
494
+ config = OmegaConf.merge(*configs, cli)
495
+ lightning_config = config.pop("lightning", OmegaConf.create())
496
+ # merge trainer cli with config
497
+ trainer_config = lightning_config.get("trainer", OmegaConf.create())
498
+ # default to ddp
499
+ trainer_config["accelerator"] = "ddp"
500
+ for k in nondefault_trainer_args(opt):
501
+ trainer_config[k] = getattr(opt, k)
502
+ if not "gpus" in trainer_config:
503
+ del trainer_config["accelerator"]
504
+ cpu = True
505
+ else:
506
+ gpuinfo = trainer_config["gpus"]
507
+ print(f"Running on GPUs {gpuinfo}")
508
+ cpu = False
509
+ trainer_opt = argparse.Namespace(**trainer_config)
510
+ lightning_config.trainer = trainer_config
511
+
512
+ # model
513
+ model = instantiate_from_config(config.model)
514
+ if not opt.resume:
515
+ if opt.train_from_scratch:
516
+ ckpt_file = torch.load(opt.pretrained_model, map_location='cpu')['state_dict']
517
+ ckpt_file = {key: value for key, value in ckpt_file.items() if not key.startswith('model.')}
518
+ model.load_state_dict(ckpt_file, strict=False)
519
+ print("Train from scratch!")
520
+ else:
521
+ model.load_state_dict(torch.load(opt.pretrained_model, map_location='cpu')['state_dict'], strict=False)
522
+ print("Load Stable Diffusion v1-4!")
523
+
524
+ # lora
525
+ if hasattr(config, 'lora_config'):
526
+ model.eval()
527
+ model._requires_grad = False
528
+ from lora.lora import inject_trainable_lora_extended
529
+
530
+ params, names = inject_trainable_lora_extended(model, r=config.lora_config.rank)
531
+
532
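+ # Freeze the whole model, then unfreeze only the U-Net decoder transformer blocks and the local ControlNet / pose branches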
+ model.requires_grad_(False)
533
+ for name, param in model.named_parameters():
534
+ if "diffusion_model.output_blocks" in name and "transformer_blocks" in name:
535
+ param.requires_grad = True
536
+ if "local_controlnet" in name or "pose" in name:
537
+ param.requires_grad = True
538
+ # Open a file to record the names of the trainable modules
539
+ with open("module_names.txt", "w") as file:
540
+ # Iterate over all model parameters and write the names of the trainable ones to the file
541
+ for name, param in model.named_parameters():
542
+ if param.requires_grad:
543
+ file.write(name + "\n")
544
+
545
+ # trainer and callbacks
546
+ trainer_kwargs = dict()
547
+
548
+ # default logger configs
549
+ default_logger_cfgs = {
550
+ "wandb": {
551
+ "target": "pytorch_lightning.loggers.WandbLogger",
552
+ "params": {
553
+ "name": nowname,
554
+ "save_dir": logdir,
555
+ "offline": opt.debug,
556
+ "id": nowname,
557
+ }
558
+ },
559
+ "testtube": {
560
+ "target": "pytorch_lightning.loggers.TestTubeLogger",
561
+ "params": {
562
+ "name": "testtube",
563
+ "save_dir": logdir,
564
+ }
565
+ },
566
+ }
567
+ default_logger_cfg = default_logger_cfgs["testtube"]
568
+ if "logger" in lightning_config:
569
+ logger_cfg = lightning_config.logger
570
+ else:
571
+ logger_cfg = OmegaConf.create()
572
+ logger_cfg = OmegaConf.merge(default_logger_cfg, logger_cfg)
573
+ trainer_kwargs["logger"] = instantiate_from_config(logger_cfg)
574
+
575
+ # modelcheckpoint - use TrainResult/EvalResult(checkpoint_on=metric) to
576
+ # specify which metric is used to determine best models
577
+ default_modelckpt_cfg = {
578
+ "target": "pytorch_lightning.callbacks.ModelCheckpoint",
579
+ "params": {
580
+ "dirpath": ckptdir,
581
+ "filename": "{epoch:06}",
582
+ "verbose": True,
583
+ "save_last": False,
584
+ "every_n_epochs": 1
585
+ }
586
+ }
587
+ if hasattr(model, "monitor"):
588
+ print(f"Monitoring {model.monitor} as checkpoint metric.")
589
+ default_modelckpt_cfg["params"]["monitor"] = model.monitor
590
+ default_modelckpt_cfg["params"]["save_top_k"] = 30
591
+
592
+ if "modelcheckpoint" in lightning_config:
593
+ modelckpt_cfg = lightning_config.modelcheckpoint
594
+ else:
595
+ modelckpt_cfg = OmegaConf.create()
596
+ modelckpt_cfg = OmegaConf.merge(default_modelckpt_cfg, modelckpt_cfg)
597
+ print(f"Merged modelckpt-cfg: \n{modelckpt_cfg}")
598
+ if version.parse(pl.__version__) < version.parse('1.4.0'):
599
+ trainer_kwargs["checkpoint_callback"] = instantiate_from_config(modelckpt_cfg)
600
+
601
+ # add callback which sets up log directory
602
+ default_callbacks_cfg = {
603
+ "setup_callback": {
604
+ "target": "main.SetupCallback",
605
+ "params": {
606
+ "resume": opt.resume,
607
+ "now": now,
608
+ "logdir": logdir,
609
+ "ckptdir": ckptdir,
610
+ "cfgdir": cfgdir,
611
+ "config": config,
612
+ "lightning_config": lightning_config,
613
+ }
614
+ },
615
+ "image_logger": {
616
+ "target": "main.ImageLogger",
617
+ "params": {
618
+ "batch_frequency": 500,
619
+ "max_images": 4,
620
+ "clamp": True
621
+ }
622
+ },
623
+ "learning_rate_logger": {
624
+ "target": "main.LearningRateMonitor",
625
+ "params": {
626
+ "logging_interval": "step",
627
+ # "log_momentum": True
628
+ }
629
+ },
630
+ "cuda_callback": {
631
+ "target": "main.CUDACallback"
632
+ },
633
+ }
634
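+ # From PL 1.4 on, ModelCheckpoint is registered as an ordinary callback instead of the checkpoint_callback kwarg used above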
+ if version.parse(pl.__version__) >= version.parse('1.4.0'):
635
+ default_callbacks_cfg.update({'checkpoint_callback': modelckpt_cfg})
636
+
637
+ if "callbacks" in lightning_config:
638
+ callbacks_cfg = lightning_config.callbacks
639
+ else:
640
+ callbacks_cfg = OmegaConf.create()
641
+
642
+ if 'metrics_over_trainsteps_checkpoint' in callbacks_cfg:
643
+ print(
644
+ 'Caution: Saving checkpoints every n train steps without deleting. This might require some free space.')
645
+ default_metrics_over_trainsteps_ckpt_dict = {
646
+ 'metrics_over_trainsteps_checkpoint':
647
+ {"target": 'pytorch_lightning.callbacks.ModelCheckpoint',
648
+ 'params': {
649
+ "dirpath": os.path.join(ckptdir, 'trainstep_checkpoints'),
650
+ "filename": "{epoch:06}-{step:09}",
651
+ "verbose": True,
652
+ 'save_top_k': -1,
653
+ 'every_n_train_steps': 10000,
654
+ 'save_weights_only': True
655
+ }
656
+ }
657
+ }
658
+ default_callbacks_cfg.update(default_metrics_over_trainsteps_ckpt_dict)
659
+
660
+ callbacks_cfg = OmegaConf.merge(default_callbacks_cfg, callbacks_cfg)
661
+ if 'ignore_keys_callback' in callbacks_cfg and hasattr(trainer_opt, 'resume_from_checkpoint'):
662
+ callbacks_cfg.ignore_keys_callback.params['ckpt_path'] = trainer_opt.resume_from_checkpoint
663
+ elif 'ignore_keys_callback' in callbacks_cfg:
664
+ del callbacks_cfg['ignore_keys_callback']
665
+
666
+ trainer_kwargs["callbacks"] = [instantiate_from_config(callbacks_cfg[k]) for k in callbacks_cfg]
667
+
668
+ trainer = Trainer.from_argparse_args(trainer_opt, **trainer_kwargs)
669
+ # trainer.plugins = [MyCluster()]
670
+ trainer.logdir = logdir ###
671
+
672
+ # data
673
+ data = instantiate_from_config(config.data)
674
+ # NOTE according to https://pytorch-lightning.readthedocs.io/en/latest/datamodules.html
675
+ # calling these ourselves should not be necessary but it is.
676
+ # lightning still takes care of proper multiprocessing though
677
+ data.prepare_data()
678
+ data.setup()
679
+ print("#### Data #####")
680
+ for k in data.datasets:
681
+ print(f"{k}, {data.datasets[k].__class__.__name__}, {len(data.datasets[k])}")
682
+
683
+ # configure learning rate
684
+ bs, base_lr = config.data.params.batch_size, config.model.base_learning_rate
685
+ if not cpu:
686
+ ngpu = len(lightning_config.trainer.gpus.strip(",").split(','))
687
+ else:
688
+ ngpu = 1
689
+ if 'accumulate_grad_batches' in lightning_config.trainer:
690
+ accumulate_grad_batches = lightning_config.trainer.accumulate_grad_batches
691
+ else:
692
+ accumulate_grad_batches = 1
693
+ # if 'num_nodes' in lightning_config.trainer:
694
+ # num_nodes = lightning_config.trainer.num_nodes
695
+ # else:
696
+ num_nodes = 1
697
+ print(f"accumulate_grad_batches = {accumulate_grad_batches}")
698
+ lightning_config.trainer.accumulate_grad_batches = accumulate_grad_batches
699
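+ # Linear LR scaling; e.g. accumulate=2, 1 node, 4 GPUs, batch size 8, base_lr=1e-6 gives lr = 2 * 1 * 4 * 8 * 1e-6 = 6.4e-5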
+ if opt.scale_lr:
700
+ model.learning_rate = accumulate_grad_batches * num_nodes * ngpu * bs * base_lr
701
+ print(
702
+ "Setting learning rate to {:.2e} = {} (accumulate_grad_batches) * {} (num_nodes) * {} (num_gpus) * {} (batchsize) * {:.2e} (base_lr)".format(
703
+ model.learning_rate, accumulate_grad_batches, num_nodes, ngpu, bs, base_lr))
704
+ else:
705
+ model.learning_rate = base_lr
706
+ print("++++ NOT USING LR SCALING ++++")
707
+ print(f"Setting learning rate to {model.learning_rate:.2e}")
708
+
709
+
710
+ # allow checkpointing via USR1
711
+ def melk(*args, **kwargs):
712
+ # run all checkpoint hooks
713
+ if trainer.global_rank == 0:
714
+ print("Summoning checkpoint.")
715
+ ckpt_path = os.path.join(ckptdir, "last.ckpt")
716
+ trainer.save_checkpoint(ckpt_path)
717
+
718
+
719
+ def divein(*args, **kwargs):
720
+ if trainer.global_rank == 0:
721
+ import pudb
722
+ pudb.set_trace()
723
+
724
+
725
+ import signal
726
+
727
+ signal.signal(signal.SIGUSR1, melk)
728
+ signal.signal(signal.SIGUSR2, divein)
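+ # e.g. "kill -USR1 <pid>" checkpoints the run to last.ckpt; "kill -USR2 <pid>" drops rank 0 into the pudb debugger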
729
+
730
+ # run
731
+ if opt.train:
732
+ try:
733
+ trainer.fit(model, data)
734
+ except Exception:
735
+ melk()
736
+ raise
737
+ if not opt.no_test and not trainer.interrupted:
738
+ trainer.test(model, data)
test.py ADDED
@@ -0,0 +1,447 @@
1
+ import argparse, os, sys, glob
2
+ import cv2
3
+ import torch
4
+ import numpy as np
5
+ from omegaconf import OmegaConf
6
+ from PIL import Image
7
+ from torch.utils.data import DataLoader
8
+ from torchvision import transforms
9
+ from tqdm import tqdm, trange
10
+ from itertools import islice
11
+ from einops import rearrange
12
+ from torchvision.utils import make_grid
13
+ import time
14
+ from pytorch_lightning import seed_everything
15
+ from torch import autocast
16
+ from contextlib import contextmanager, nullcontext
17
+ import torchvision
18
+
19
+ from ldm.data.cp_dataset import CPDataset
20
+ from ldm.resizer import Resizer
21
+ from ldm.util import instantiate_from_config
22
+ from ldm.models.diffusion.ddim import DDIMSampler
23
+ from ldm.models.diffusion.plms import PLMSSampler
24
+ from ldm.data.deepfashions import DFPairDataset
25
+
26
+ import clip
27
+ from torchvision.transforms import Resize
28
+
29
+
30
+ def chunk(it, size):
31
+ it = iter(it)
32
+ return iter(lambda: tuple(islice(it, size)), ())
33
+
34
+
35
+ def get_tensor_clip(normalize=True, toTensor=True):
36
+ transform_list = []
37
+ if toTensor:
38
+ transform_list += [torchvision.transforms.ToTensor()]
39
+
40
+ if normalize:
41
+ transform_list += [torchvision.transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
42
+ (0.26862954, 0.26130258, 0.27577711))]
43
+ return torchvision.transforms.Compose(transform_list)
44
+
45
+
46
+ def numpy_to_pil(images):
47
+ """
48
+ Convert a numpy image or a batch of images to a PIL image.
49
+ """
50
+ if images.ndim == 3:
51
+ images = images[None, ...]
52
+ images = (images * 255).round().astype("uint8")
53
+ pil_images = [Image.fromarray(image) for image in images]
54
+
55
+ return pil_images
56
+
57
+
58
+ def load_model_from_config(config, ckpt, verbose=False):
59
+ print(f"Loading model from {ckpt}")
60
+ pl_sd = torch.load(ckpt, map_location="cpu")
61
+ if "global_step" in pl_sd:
62
+ print(f"Global Step: {pl_sd['global_step']}")
63
+ sd = pl_sd["state_dict"]
64
+ model = instantiate_from_config(config.model)
65
+ m, u = model.load_state_dict(sd, strict=False)
66
+ if len(m) > 0 and verbose:
67
+ print("missing keys:")
68
+ print(m)
69
+ if len(u) > 0 and verbose:
70
+ print("unexpected keys:")
71
+ print(u)
72
+
73
+ model.cuda()
74
+ model.eval()
75
+ return model
76
+
77
+
78
+ def put_watermark(img, wm_encoder=None):
79
+ if wm_encoder is not None:
80
+ img = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
81
+ img = wm_encoder.encode(img, 'dwtDct')
82
+ img = Image.fromarray(img[:, :, ::-1])
83
+ return img
84
+
85
+
86
+ def load_replacement(x):
87
+ try:
88
+ hwc = x.shape
89
+ y = Image.open("assets/rick.jpeg").convert("RGB").resize((hwc[1], hwc[0]))
90
+ y = (np.array(y) / 255.0).astype(x.dtype)
91
+ assert y.shape == x.shape
92
+ return y
93
+ except Exception:
94
+ return x
95
+
96
+
97
+ def get_tensor(normalize=True, toTensor=True):
98
+ transform_list = []
99
+ if toTensor:
100
+ transform_list += [torchvision.transforms.ToTensor()]
101
+
102
+ if normalize:
103
+ transform_list += [torchvision.transforms.Normalize((0.5, 0.5, 0.5),
104
+ (0.5, 0.5, 0.5))]
105
+ return torchvision.transforms.Compose(transform_list)
106
+
118
+
119
+ def main():
120
+ parser = argparse.ArgumentParser()
121
+
122
+ parser.add_argument(
123
+ "--outdir",
124
+ type=str,
125
+ nargs="?",
126
+ help="dir to write results to",
127
+ default="outputs/txt2img-samples"
128
+ )
129
+ parser.add_argument(
130
+ "--skip_grid",
131
+ action='store_true',
132
+ help="do not save a grid, only individual samples. Helpful when evaluating lots of samples",
133
+ )
134
+ parser.add_argument(
135
+ "--skip_save",
136
+ action='store_true',
137
+ help="do not save individual samples. For speed measurements.",
138
+ )
139
+ parser.add_argument(
140
+ "--gpu_id",
141
+ type=int,
142
+ default=0,
143
+ help="which gpu to use",
144
+ )
145
+ parser.add_argument(
146
+ "--ddim_steps",
147
+ type=int,
148
+ default=30,
149
+ help="number of ddim sampling steps",
150
+ )
151
+ parser.add_argument(
152
+ "--plms",
153
+ action='store_true',
154
+ help="use plms sampling",
155
+ )
156
+ parser.add_argument(
157
+ "--fixed_code",
158
+ action='store_true',
159
+ help="if enabled, uses the same starting code across samples ",
160
+ )
161
+ parser.add_argument(
162
+ "--ddim_eta",
163
+ type=float,
164
+ default=0.0,
165
+ help="ddim eta (eta=0.0 corresponds to deterministic sampling",
166
+ )
167
+ parser.add_argument(
168
+ "--n_iter",
169
+ type=int,
170
+ default=2,
171
+ help="sample this often",
172
+ )
173
+ parser.add_argument(
174
+ "--H",
175
+ type=int,
176
+ default=512,
177
+ help="image height, in pixel space",
178
+ )
179
+ parser.add_argument(
180
+ "--W",
181
+ type=int,
182
+ default=512,
183
+ help="image width, in pixel space",
184
+ )
185
+ parser.add_argument(
186
+ "--n_imgs",
187
+ type=int,
188
+ default=100,
189
+ help="image width, in pixel space",
190
+ )
191
+ parser.add_argument(
192
+ "--C",
193
+ type=int,
194
+ default=4,
195
+ help="latent channels",
196
+ )
197
+ parser.add_argument(
198
+ "--f",
199
+ type=int,
200
+ default=8,
201
+ help="downsampling factor",
202
+ )
203
+ parser.add_argument(
204
+ "--n_samples",
205
+ type=int,
206
+ default=1,
207
+ help="how many samples to produce for each given reference image. A.k.a. batch size",
208
+ )
209
+ parser.add_argument(
210
+ "--n_rows",
211
+ type=int,
212
+ default=0,
213
+ help="rows in the grid (default: n_samples)",
214
+ )
215
+ parser.add_argument(
216
+ "--scale",
217
+ type=float,
218
+ default=1,
219
+ help="unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))",
220
+ )
221
+ parser.add_argument(
222
+ "--config",
223
+ type=str,
224
+ default="",
225
+ help="path to config which constructs model",
226
+ )
227
+ parser.add_argument(
228
+ "--ckpt",
229
+ type=str,
230
+ default="",
231
+ help="path to checkpoint of model",
232
+ )
233
+ parser.add_argument(
234
+ "--seed",
235
+ type=int,
236
+ default=42,
237
+ help="the seed (for reproducible sampling)",
238
+ )
239
+ parser.add_argument(
240
+ "--precision",
241
+ type=str,
242
+ help="evaluate at this precision",
243
+ choices=["full", "autocast"],
244
+ default="autocast"
245
+ )
246
+ parser.add_argument(
247
+ "--unpaired",
248
+ action='store_true',
249
+ help="if enabled, uses the same starting code across samples "
250
+ )
251
+ parser.add_argument(
252
+ "--dataroot",
253
+ type=str,
254
+ help="path to dataroot of the dataset",
255
+ default=""
256
+ )
257
+
258
+ opt = parser.parse_args()
259
+
260
+ seed_everything(opt.seed)
261
+
262
+ device = torch.device("cuda:{}".format(opt.gpu_id)) if torch.cuda.is_available() else torch.device("cpu")
263
+ torch.cuda.set_device(device)
264
+
265
+ config = OmegaConf.load(f"{opt.config}")
266
+ version = opt.config.split('/')[-1].split('.')[0]
267
+ model = load_model_from_config(config, f"{opt.ckpt}")
268
+
269
+ # model = model.to(device)
270
+ dataset = CPDataset(opt.dataroot, opt.H, mode='test', unpaired=opt.unpaired)
271
+ loader = DataLoader(dataset, batch_size=opt.n_samples, shuffle=False, num_workers=4, pin_memory=True)
272
+ if opt.plms:
273
+ sampler = PLMSSampler(model)
274
+ else:
275
+ sampler = DDIMSampler(model)
276
+
277
+ os.makedirs(opt.outdir, exist_ok=True)
278
+ outpath = opt.outdir
279
+
280
+ result_path = os.path.join(outpath, "upper_body")
281
+ os.makedirs(result_path, exist_ok=True)
282
+
283
+ start_code = None
284
+ if opt.fixed_code:
285
+ start_code = torch.randn([opt.n_samples, opt.C, opt.H // opt.f, opt.W // opt.f], device=device)
286
+
287
+ iterator = tqdm(loader, desc='Test Dataset', total=len(loader))
288
+ precision_scope = autocast if opt.precision == "autocast" else nullcontext
289
+ with torch.no_grad():
290
+ with precision_scope("cuda"):
291
+ with model.ema_scope():
292
+ for data in iterator:
293
+ mask_tensor = data['inpaint_mask']
294
+ inpaint_image = data['inpaint_image']
295
+ ref_tensor_f = data['ref_imgs_f']
296
+ ref_tensor_b = data['ref_imgs_b']
297
+ skeleton_cf = data['skeleton_cf']
298
+ skeleton_cb = data['skeleton_cb']
299
+ skeleton_p = data['skeleton_p']
300
+ order = data['order']
301
+ feat_tensor = data['warp_feat']
302
+ image_tensor = data['GT']
303
+
304
+ controlnet_cond_f = data['controlnet_cond_f']
305
+ controlnet_cond_b = data['controlnet_cond_b']
306
+
307
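+ # Pick the reference view per sample: orders "1"/"2" keep the front garment image, order "3" swaps in the back view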
+ ref_tensor = ref_tensor_f
308
+ for i in range(len(order)):
309
+ if order[i] == "1" or order[i] == "2":
310
+ continue
311
+ elif order[i] == "3":
312
+ ref_tensor[i] = ref_tensor_b[i]
313
+ else:
314
+ raise ValueError("Invalid order")
315
+
316
+ # filename = data['file_name']
317
+
318
+ test_model_kwargs = {}
319
+ test_model_kwargs['inpaint_mask'] = mask_tensor.to(device)
320
+ test_model_kwargs['inpaint_image'] = inpaint_image.to(device)
321
+ feat_tensor = feat_tensor.to(device)
322
+ ref_tensor = ref_tensor.to(device)
323
+
324
+ controlnet_cond_f = controlnet_cond_f.to(device)
325
+ controlnet_cond_b = controlnet_cond_b.to(device)
326
+ skeleton_cf = skeleton_cf.to(device)
327
+ skeleton_cb = skeleton_cb.to(device)
328
+ skeleton_p = skeleton_p.to(device)
329
+
330
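+ # Classifier-free guidance: for scale != 1, the model's learnable null embedding serves as the unconditional branch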
+ uc = None
331
+ if opt.scale != 1.0:
332
+ uc = model.learnable_vector
333
+ uc = uc.repeat(ref_tensor.size(0), 1, 1)
334
+ c = model.get_learned_conditioning(ref_tensor.to(torch.float16))
335
+ c = model.proj_out(c)
336
+
337
+ # z_gt = model.encode_first_stage(image_tensor.to(device))
338
+ # z_gt = model.get_first_stage_encoding(z_gt).detach()
339
+
340
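+ # Encode the masked person image into latent space and resize the inpainting mask to the latent resolution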
+ z_inpaint = model.encode_first_stage(test_model_kwargs['inpaint_image'])
341
+ z_inpaint = model.get_first_stage_encoding(z_inpaint).detach()
342
+ test_model_kwargs['inpaint_image'] = z_inpaint
343
+ test_model_kwargs['inpaint_mask'] = Resize([z_inpaint.shape[-2], z_inpaint.shape[-1]])(
344
+ test_model_kwargs['inpaint_mask'])
345
+
346
+ warp_feat = model.encode_first_stage(feat_tensor)
347
+ warp_feat = model.get_first_stage_encoding(warp_feat).detach()
348
+
349
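+ # Diffuse the warped-garment latent to the final timestep (t=999) and use the result as the sampler's start code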
+ ts = torch.full((1,), 999, device=device, dtype=torch.long)
350
+ start_code = model.q_sample(warp_feat, ts)
351
+
352
+ # local_controlnet
353
+ ehs_cf = model.pose_model(skeleton_cf)
354
+ ehs_cb = model.pose_model(skeleton_cb)
355
+ ehs_p = model.pose_model(skeleton_p)
356
+ ehs_text = torch.zeros((c.shape[0], 1, 768)).to(device)
357
+ # controlnet_cond = torch.cat((controlnet_cond_f, controlnet_cond_b, ehs_cf, ehs_cb, ehs_p), dim=1)
358
+ x_noisy = torch.cat(
359
+ (start_code, test_model_kwargs['inpaint_image'], test_model_kwargs['inpaint_mask']), dim=1)
360
+
361
+ down_samples_f, mid_samples_f = model.local_controlnet(x_noisy, ts,
362
+ encoder_hidden_states=ehs_text, controlnet_cond=controlnet_cond_f, ehs_c=ehs_cf, ehs_p=ehs_p)
363
+ down_samples_b, mid_samples_b = model.local_controlnet(x_noisy, ts,
364
+ encoder_hidden_states=ehs_text, controlnet_cond=controlnet_cond_b, ehs_c=ehs_cb, ehs_p=ehs_p)
365
+
366
+ # print(torch.max(down_samples_f[0]))
367
+ # print(torch.min(down_samples_f[0]))
368
+
369
+ # normalized_tensor = (down_samples_f[0] + 1) / 2
370
+
371
+ # # Scale values from [0, 1] to [0, 255]
372
+ # scaled_tensor = normalized_tensor * 255
373
+
374
+ # # Convert the tensor to a NumPy array
375
+ # numpy_array = scaled_tensor.squeeze().cpu().numpy().astype(np.uint8)
376
+
377
+ # # Convert the NumPy array to a PIL image
378
+ # image = Image.fromarray(numpy_array)
379
+
380
+ # # Save the image
381
+ # image.save("down_samples_f.jpg")
382
+
383
+ # normalized_tensor = (down_samples_b[0] + 1) / 2
384
+
385
+ # # Scale values from [0, 1] to [0, 255]
386
+ # scaled_tensor = normalized_tensor * 255
387
+
388
+ # # Convert the tensor to a NumPy array
389
+ # numpy_array = scaled_tensor.squeeze().cpu().numpy().astype(np.uint8)
390
+
391
+ # # Convert the NumPy array to a PIL image
392
+ # image = Image.fromarray(numpy_array)
393
+
394
+ # # Save the image
395
+ # image.save("down_samples_b.jpg")
396
+
397
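+ # Fuse front/back ControlNet outputs: sum the mid-block features and channel-concatenate each pair of down-block features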
+ mid_samples = mid_samples_f + mid_samples_b
398
+ down_samples = ()
399
+ for ds in range(len(down_samples_f)):
400
+ tmp = torch.cat((down_samples_f[ds], down_samples_b[ds]), dim=1)
401
+ down_samples = down_samples + (tmp,)
402
+
403
+ shape = [opt.C, opt.H // opt.f, opt.W // opt.f]
404
+ samples_ddim, _ = sampler.sample(S=opt.ddim_steps,
405
+ conditioning=c,
406
+ batch_size=opt.n_samples,
407
+ shape=shape,
408
+ verbose=False,
409
+ unconditional_guidance_scale=opt.scale,
410
+ unconditional_conditioning=uc,
411
+ eta=opt.ddim_eta,
412
+ x_T=start_code,
413
+ down_samples=down_samples,
414
+ test_model_kwargs=test_model_kwargs)
415
+
416
+ x_samples_ddim = model.decode_first_stage(samples_ddim)
417
+ x_sample_result = x_samples_ddim
418
+ x_samples_ddim = torch.clamp((x_samples_ddim + 1.0) / 2.0, min=0.0, max=1.0)
419
+ x_samples_ddim = x_samples_ddim.cpu().permute(0, 2, 3, 1).numpy()
420
+
421
+ x_checked_image = x_samples_ddim
422
+ x_checked_image_torch = torch.from_numpy(x_checked_image).permute(0, 3, 1, 2)
423
+ x_source = torch.clamp((image_tensor + 1.0) / 2.0, min=0.0, max=1.0)
424
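+ # Composite: keep generated pixels where inpaint_mask == 0 and copy the source image where inpaint_mask == 1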
+ x_result = x_checked_image_torch * (1 - mask_tensor) + mask_tensor * x_source
425
+ # x_result = x_checked_image_torch
426
+
427
+ resize = transforms.Resize((opt.H, int(opt.H / 256 * 192)))
428
+
429
+ if not opt.skip_save:
430
+
431
+ def un_norm(x):
432
+ return (x + 1.0) / 2.0
433
+
434
+ for i, x_sample in enumerate(x_result):
435
+ filename = data['file_name'][i]
436
+ # filename = data['file_name']
437
+ save_x = resize(x_sample)
438
+ save_x = 255. * rearrange(save_x.cpu().numpy(), 'c h w -> h w c')
439
+ img = Image.fromarray(save_x.astype(np.uint8))
440
+ img.save(os.path.join(result_path, filename[:-4] + ".png"))
441
+
442
+ print(f"Your samples are ready and waiting for you here: \n{outpath} \n"
443
+ f" \nEnjoy.")
444
+
445
+
446
+ if __name__ == "__main__":
447
+ main()
test.sh ADDED
@@ -0,0 +1,13 @@
1
+ #!/bin/bash
+ CUDA_VISIBLE_DEVICES=3 python test.py --gpu_id 0 \
2
+ --ddim_steps 50 \
3
+ --outdir results/try/ \
4
+ --config configs/viton512.yaml \
5
+ --dataroot /datasets/NVG \
6
+ --ckpt checkpoints/mvg.ckpt \
7
+ --n_samples 1 \
8
+ --seed 23 \
9
+ --scale 1 \
10
+ --H 512 \
11
+ --W 384
12
+
train.sh ADDED
@@ -0,0 +1 @@
1
+ CUDA_VISIBLE_DEVICES=4,5 python -u main.py --logdir models/oc --pretrained_model checkpoints/model.ckpt --base configs/viton512.yaml --scale_lr False