# Xiang_Lookalike Text-to-Image (14B) Generation
This repository contains the steps and scripts needed to generate Xiang_Lookalike images and videos using the Wan2.1-T2V-14B text-to-video model with LoRA (Low-Rank Adaptation) weights.
## Prerequisites
Before proceeding, ensure that you have the following installed on your system:
- Ubuntu (or a compatible Linux distribution)
- Python 3.x
- pip (Python package manager)
- Git
- Git LFS (Git Large File Storage)
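A quick way to confirm these are all available (assuming the standard binaries are on your PATH):

```bash
# Each command should print a version string if the prerequisite is installed
python3 --version
pip --version
git --version
git lfs version
```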
## Installation

### Update and Install Dependencies

```bash
sudo apt-get update && sudo apt-get install build-essential git-lfs
```
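Git LFS also needs a one-time, per-user setup so that large files are fetched automatically on clone:

```bash
# Register the Git LFS filters with Git (one-time, per user)
git lfs install
```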
### Clone the Repository

> ⚠️ **Note:** You can use any existing Wan2.1-compatible repo structure or clone directly from Hugging Face.

```bash
git clone https://huggingface.co/svjack/Xiang_Lookalike_wan_2_1_14_B_text2video_lora
cd Xiang_Lookalike_wan_2_1_14_B_text2video_lora
```
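If the clone completes but the `.safetensors` files look like tiny text pointer files, the LFS payloads were not downloaded; they can be fetched explicitly:

```bash
# Fetch any LFS-tracked files that were skipped during clone
git lfs pull
```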
### Install Python Dependencies

```bash
pip install torch torchvision
pip install -r requirements.txt
pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
pip install sageattention==1.0.6
```
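Before downloading the large model weights, it is worth confirming that PyTorch can see your GPU (a minimal check, assuming a CUDA build of torch):

```bash
# Should print True followed by the CUDA device name
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no GPU')"
```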
### Download Model Weights

```bash
# Base models
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_14B_bf16.safetensors
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth
```
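The 14B DiT checkpoint alone is tens of gigabytes, so it is worth sanity-checking that the downloads completed:

```bash
# A truncated download will show an unexpectedly small size here
ls -lh wan2.1_t2v_14B_bf16.safetensors models_t5_umt5-xxl-enc-bf16.pth Wan2.1_VAE.pth
```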
## Usage

To generate a video, use the `wan_generate_video.py` script with the `--task t2v-14B` parameter (for single-image generation, see the `--task` entry in the Key Parameters table below).

### Example: Xiang lookalike boy
```bash
python wan_generate_video.py --fp8 --task t2v-14B --video_size 480 832 --infer_steps 35 --video_length 45 \
    --save_path save --output_type both \
    --dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
    --t5 models_t5_umt5-xxl-enc-bf16.pth \
    --attn_mode torch \
    --lora_weight XiangLooklike_w14_outputs/XiangLooklike_w14_lora-000005.safetensors \
    --lora_multiplier 1.0 \
    --interactive
```
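The `--interactive` flag reads prompts from the terminal at runtime. For scripted runs, the prompt can instead be passed on the command line via the `--prompt` parameter listed in the table below (a minimal sketch, assuming the other flags stay identical):

```bash
# Non-interactive run: supply the prompt directly instead of --interactive
python wan_generate_video.py --fp8 --task t2v-14B --video_size 480 832 --infer_steps 35 --video_length 45 \
    --save_path save --output_type both \
    --dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
    --t5 models_t5_umt5-xxl-enc-bf16.pth \
    --attn_mode torch \
    --lora_weight XiangLooklike_w14_outputs/XiangLooklike_w14_lora-000005.safetensors \
    --lora_multiplier 1.0 \
    --prompt "一个年轻的男子在喝奶。"
```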
### Prompts

1. 一个年轻的男子在喝奶。 (A young man is drinking milk.)
2. 一个年轻男子在舔棒棒糖。 (A young man is licking a lollipop.)
3. 一个年轻的男子赤裸全身站在镜头前，正在吃冰淇凌。 (A young man stands fully unclothed in front of the camera, eating ice cream.)
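To render all of the example prompts in one pass, a plain shell loop over the non-interactive invocation sketched above works (again assuming the `--prompt` flag):

```bash
#!/usr/bin/env bash
# Generate one output per example prompt (assumes the --prompt flag shown above)
prompts=(
  "一个年轻的男子在喝奶。"
  "一个年轻男子在舔棒棒糖。"
)
for p in "${prompts[@]}"; do
  python wan_generate_video.py --fp8 --task t2v-14B --video_size 480 832 \
    --infer_steps 35 --video_length 45 --save_path save --output_type both \
    --dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
    --t5 models_t5_umt5-xxl-enc-bf16.pth --attn_mode torch \
    --lora_weight XiangLooklike_w14_outputs/XiangLooklike_w14_lora-000005.safetensors \
    --lora_multiplier 1.0 --prompt "$p"
done
```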
## Key Parameters

| Parameter | Description |
|---|---|
| `--fp8` | Enable FP8 precision for improved performance |
| `--task` | Set to `t2i-14B` for image generation; the example above uses `t2v-14B` for video |
| `--video_size` | Output resolution (e.g., `480 832`) |
| `--infer_steps` | Speed vs. quality trade-off (20 recommended for a quick test) |
| `--lora_weight` | Path to LoRA weight files (can specify multiple) |
| `--lora_multiplier` | Strength of the LoRA effect (default: `1.0`) |
| `--prompt` | Include "3D Chibi Style" for best results |
## Style Characteristics
For optimal results, prompts should emphasize:
- Chibi-style characters with exaggerated heads and facial expressions
- Vibrant colors and dynamic lighting effects
- Fantasy or magical settings (e.g., gardens, castles, floating islands)
- Neon or glowing elements, especially in futuristic or energetic scenes
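For illustration only, a prompt combining these elements might read as follows (hypothetical wording, not taken from this repository):

```
3D Chibi Style, a chibi character with an exaggerated head and an expressive face,
standing in a glowing fantasy garden on a floating island,
vibrant colors, dynamic lighting, neon accents
```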
## Output

Generated results are saved in the directory given by `--save_path`:

- a PNG image file
- (optional) an MP4 video, when `--output_type both` is used
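To inspect a generated clip frame by frame, a standard ffmpeg call can dump the frames as PNGs (the MP4 name below is a placeholder; use the actual file written to `save/`):

```bash
# Extract every frame of a clip as numbered PNGs (file name is a placeholder)
mkdir -p frames
ffmpeg -i save/output.mp4 frames/frame_%04d.png
```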
## Troubleshooting

- Ensure all model weights are correctly downloaded and placed in the right directories.
- Check GPU memory availability; at least 20 GB of VRAM is recommended for 14B models (see the snippet after this list).
- Verify no conflicts exist between Python packages using `pip check`.
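On NVIDIA GPUs, available VRAM can be checked with:

```bash
# Reports GPU name plus total and currently used memory
nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv
```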
## License
This project is licensed under the MIT License.
## Acknowledgments
- Hugging Face – For hosting the model and dataset repositories
- Wan-AI – For providing base diffusion models
- svjack – For adapting and sharing LoRA weights for various styles
For support or feedback, please open an issue in this repository.