This PR adds AoTI (ahead-of-time Inductor) compilation and Flash Attention 3 (FA3).

A working demo with the changes is available at: https://huggingface.co/spaces/zerogpu-aoti/Qwen-Image-Edit-aot-dynamic-fa3-fix-cfg

Average inference time for the diffusion part improves as follows:

  • 64.8s Dynamic AoT + FA3
  • 100.1s original demo
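
For reference, the relative gain implied by these two timings can be worked out as below. This is a minimal sketch: the function names are illustrative and not part of the PR, and the only inputs are the two averages quoted above.

```python
# Relative gain implied by the two average timings quoted above.
# Helper names are hypothetical; only the two timings come from the PR.

def speedup(baseline_s: float, optimized_s: float) -> float:
    """How many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


def percent_reduction(baseline_s: float, optimized_s: float) -> float:
    """Percentage of wall-clock time saved relative to the baseline."""
    return (baseline_s - optimized_s) / baseline_s * 100.0


ORIGINAL_DEMO_S = 100.1   # original demo
AOT_FA3_S = 64.8          # Dynamic AoT + FA3

gain = speedup(ORIGINAL_DEMO_S, AOT_FA3_S)
saved = percent_reduction(ORIGINAL_DEMO_S, AOT_FA3_S)

print(f"speedup: {gain:.2f}x, time saved: {saved:.1f}%")
```

That is roughly a 1.5x speedup, or about a third less time per diffusion run.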

PS: The previous version of this PR broke when CFG (classifier-free guidance) was enabled. This version fixes that.

multimodalart changed pull request status to open
littlebird13 changed pull request status to merged
