# Xiang_Lookalike Text-to-Image (14B) Generation
This repository contains the steps and scripts needed to generate Xiang_Lookalike images and videos using the Wan2.1-T2V-14B text-to-video model with LoRA (Low-Rank Adaptation) weights.
## Prerequisites
Before proceeding, ensure that you have the following installed on your system:
- Ubuntu (or a compatible Linux distribution)
- Python 3.x
- pip (Python package manager)
- Git
- Git LFS (Git Large File Storage)
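A quick way to confirm these are all available (assuming the standard binaries are on your PATH):

```bash
# Each command should print a version string if the prerequisite is installed
python3 --version
pip --version
git --version
git lfs version
```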
## Installation

### Update and Install Dependencies

```bash
sudo apt-get update && sudo apt-get install build-essential git-lfs
```
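Git LFS also needs a one-time, per-user setup so that large files are fetched automatically on clone:

```bash
# Register the Git LFS filters with Git (one-time, per user)
git lfs install
```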
### Clone the Repository

> ⚠️ **Note:** You can use any existing Wan2.1-compatible repo structure or clone directly from Hugging Face.

```bash
git clone https://huggingface.co/svjack/Xiang_Lookalike_wan_2_1_14_B_text2video_lora
cd Xiang_Lookalike_wan_2_1_14_B_text2video_lora
```
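If the clone completes but the `.safetensors` files look like tiny text pointer files, the LFS payloads were not downloaded; they can be fetched explicitly:

```bash
# Fetch any LFS-tracked files that were skipped during clone
git lfs pull
```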
### Install Python Dependencies

```bash
pip install torch torchvision
pip install -r requirements.txt
pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
pip install sageattention==1.0.6
```
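Before downloading the large model weights, it is worth confirming that PyTorch can see your GPU (a minimal check, assuming a CUDA build of torch):

```bash
# Should print True followed by the CUDA device name
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no GPU')"
```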
### Download Model Weights

```bash
# Base models
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_14B_bf16.safetensors
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth
```
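The 14B DiT checkpoint alone is tens of gigabytes, so it is worth sanity-checking that the downloads completed:

```bash
# A truncated download will show an unexpectedly small size here
ls -lh wan2.1_t2v_14B_bf16.safetensors models_t5_umt5-xxl-enc-bf16.pth Wan2.1_VAE.pth
```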
## Usage

To generate a video, use the `wan_generate_video.py` script with the `--task t2v-14B` parameter (for single-image generation, see the `--task` entry in the Key Parameters table below).

### Example: Xiang lookalike boy
```bash
python wan_generate_video.py --fp8 --task t2v-14B --video_size 480 832 --infer_steps 35 --video_length 45 \
    --save_path save --output_type both \
    --dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
    --t5 models_t5_umt5-xxl-enc-bf16.pth \
    --attn_mode torch \
    --lora_weight XiangLooklike_w14_outputs/XiangLooklike_w14_lora-000005.safetensors \
    --lora_multiplier 1.0 \
    --interactive
```
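The `--interactive` flag reads prompts from the terminal at runtime. For scripted runs, the prompt can instead be passed on the command line via the `--prompt` parameter listed in the table below (a minimal sketch, assuming the other flags stay identical):

```bash
# Non-interactive run: supply the prompt directly instead of --interactive
python wan_generate_video.py --fp8 --task t2v-14B --video_size 480 832 --infer_steps 35 --video_length 45 \
    --save_path save --output_type both \
    --dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
    --t5 models_t5_umt5-xxl-enc-bf16.pth \
    --attn_mode torch \
    --lora_weight XiangLooklike_w14_outputs/XiangLooklike_w14_lora-000005.safetensors \
    --lora_multiplier 1.0 \
    --prompt "一个年轻的男子在喝奶。"
```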
### Prompts

1. 一个年轻的男子在喝奶。 (A young man is drinking milk.)
2. 一个年轻男子在舔棒棒糖。 (A young man is licking a lollipop.)
3. 一个年轻的男子赤裸全身站在镜头前，正在吃冰淇凌。 (A young man stands fully unclothed in front of the camera, eating ice cream.)
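To render all of the example prompts in one pass, a plain shell loop over the non-interactive invocation sketched above works (again assuming the `--prompt` flag):

```bash
#!/usr/bin/env bash
# Generate one output per example prompt (assumes the --prompt flag shown above)
prompts=(
  "一个年轻的男子在喝奶。"
  "一个年轻男子在舔棒棒糖。"
)
for p in "${prompts[@]}"; do
  python wan_generate_video.py --fp8 --task t2v-14B --video_size 480 832 \
    --infer_steps 35 --video_length 45 --save_path save --output_type both \
    --dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
    --t5 models_t5_umt5-xxl-enc-bf16.pth --attn_mode torch \
    --lora_weight XiangLooklike_w14_outputs/XiangLooklike_w14_lora-000005.safetensors \
    --lora_multiplier 1.0 --prompt "$p"
done
```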
## Key Parameters

| Parameter | Description |
|---|---|
| `--fp8` | Enable FP8 precision for improved performance |
| `--task` | Set to `t2i-14B` for image generation; the example above uses `t2v-14B` for video |
| `--video_size` | Output resolution (e.g., `480 832`) |
| `--infer_steps` | Speed vs. quality trade-off (20 recommended for a quick test) |
| `--lora_weight` | Path to LoRA weight files (can specify multiple) |
| `--lora_multiplier` | Strength of the LoRA effect (default: `1.0`) |
| `--prompt` | Include "3D Chibi Style" for best results |
## Style Characteristics
For optimal results, prompts should emphasize:
- Chibi-style characters with exaggerated heads and facial expressions
- Vibrant colors and dynamic lighting effects
- Fantasy or magical settings (e.g., gardens, castles, floating islands)
- Neon or glowing elements, especially in futuristic or energetic scenes
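For illustration only, a prompt combining these elements might read as follows (hypothetical wording, not taken from this repository):

```
3D Chibi Style, a chibi character with an exaggerated head and an expressive face,
standing in a glowing fantasy garden on a floating island,
vibrant colors, dynamic lighting, neon accents
```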
## Output

Generated results are saved in the directory given by `--save_path`:

- a PNG image file
- (optional) an MP4 video, when `--output_type both` is used
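To inspect a generated clip frame by frame, a standard ffmpeg call can dump the frames as PNGs (the MP4 name below is a placeholder; use the actual file written to `save/`):

```bash
# Extract every frame of a clip as numbered PNGs (file name is a placeholder)
mkdir -p frames
ffmpeg -i save/output.mp4 frames/frame_%04d.png
```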
## Troubleshooting

- Ensure all model weights are correctly downloaded and placed in the right directories.
- Check GPU memory availability; at least 20 GB of VRAM is recommended for 14B models (see the snippet after this list).
- Verify no conflicts exist between Python packages using `pip check`.
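On NVIDIA GPUs, available VRAM can be checked with:

```bash
# Reports GPU name plus total and currently used memory
nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv
```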
## License
This project is licensed under the MIT License.
## Acknowledgments
- Hugging Face – For hosting the model and dataset repositories
- Wan-AI – For providing base diffusion models
- svjack – For adapting and sharing LoRA weights for various styles
For support or feedback, please open an issue in this repository.