NSFW Wan 1.3B T2V - Uncensored Text-to-Video Model

🚨 IMPORTANT UPDATE: New Experimental Checkpoints Available! 🚨

A new, experimental set of checkpoints (wan_1.3B_exp_e1 through wan_1.3B_exp_e14) has been released. These were trained using a revised methodology to fix significant image quality degradation issues (e.g., body horror, artifacts) found in the original e4-e20 checkpoints.

We strongly recommend that new users start with the experimental wan_1.3B_exp_e14.safetensors checkpoint. For more details, see the "The 'Fix'" section below. Your feedback on these new models is crucial and will help determine whether they replace the original series.

Model Description

NSFW Wan 1.3B T2V is a 1.3-billion-parameter text-to-video generation model fine-tuned for generating Not Safe For Work (NSFW) content. It was trained with multiple methodologies to build a solid understanding across the entire NSFW spectrum and to generate videos with coherent motion natively.

The primary goal of this model is to provide a research and creative tool capable of generating thematically relevant short video clips based on text prompts within the adult content domain. It aims to understand and render a wide array of NSFW scenarios, aesthetics, and actions described in natural language, now with improved temporal consistency.

Model Details

  • Architecture: Text-to-Video Transformer Architecture
  • Parameters: 1.3 Billion
  • Type: Text-to-Video (T2V)
  • Specialization: NSFW Content Generation

🚨 The "Fix" - Experimental Epochs 1-14 (Recommended)

After user feedback and internal review revealed significant image quality degradation and "body horror" artifacts in the original training run (specifically after epoch 3), a new training procedure was designed and executed.

The Problem with the Original Run

The original two-phase approach, while sound in theory, suffered in practice. The initial image-only training phase (epochs 1-10) was too aggressive, causing "catastrophic forgetting" where the model's understanding of coherent anatomy (faces, hands, etc.) collapsed. The subsequent video-only training (epochs 11-20) could not fully recover from this damage, resulting in outputs that were often distorted or of low quality.

The Revised Training Solution

A new, single-run training configuration was developed to address these flaws from the ground up:

  • Mixed Dataset: Instead of separate phases, the new run was trained on a mixed dataset of 30k video clips and 20k still images simultaneously. This provided constant spatial regularization, preventing the anatomical drift and quality collapse seen previously.
  • Stable Configuration: A more conservative learning rate (LR), smaller batch sizes, and a shorter overall training schedule were used to ensure the model learned the new concepts without destroying its foundational knowledge.
  • Outcome: The result is a series of 14 new experimental epochs that demonstrate vastly improved spatial quality, stable motion, and reliable NSFW fidelity without the "glitching" or "body horror" of the original run.
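
The mixed-dataset idea above can be sketched in a few lines. The actual sampler used in training is not published, so this is a hypothetical illustration that simply interleaves video clips and still images into shared batches:

```python
import random

def mixed_batches(videos, images, batch_size=4, seed=0):
    """Yield batches drawn from both datasets at once, so stills can
    regularize spatial quality in every batch while clips teach motion.
    Hypothetical sketch; the real training sampler is not published."""
    rng = random.Random(seed)
    pool = [("video", v) for v in videos] + [("image", i) for i in images]
    rng.shuffle(pool)  # on average, batches mix the two modalities 3:2 for 30k/20k
    for start in range(0, len(pool) - batch_size + 1, batch_size):
        yield pool[start:start + batch_size]

# Toy stand-ins for the 30k clips / 20k stills described above.
batches = list(mixed_batches(range(30), range(20), batch_size=5))
```

Because both modalities appear throughout the run, there is no image-only phase for the model to "forget" during, which is the failure mode the original two-phase schedule hit.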

We strongly recommend using wan_1.3B_exp_e14.safetensors for all general use cases and LoRA training. This checkpoint represents the best trade-off between explicit content generation and visual coherence from the new, improved training run.


Two-Phase Training: Image and Video (Legacy)

Note: This section describes the original training process, which resulted in the e1 through e20 checkpoints. This process had known flaws that led to quality degradation. For the best results, please use the new experimental models described above.

The model's original training was split into two distinct phases to first build a strong aesthetic foundation and then learn motion.

  • Epochs 1-10 (Image-Trained): The initial checkpoints were fine-tuned primarily on a massive NSFW image dataset. These epochs excel at style and detail but have limited native motion capabilities. Quality degrades significantly after epoch 3.
  • Epochs 11-20 (Video-Trained): These later checkpoints were trained exclusively on a video dataset. This phase taught the model temporal coherence and motion. The result is a model that can generate quality video directly, without the need for any helper LoRAs. wan_1.3B_e20.safetensors is the best of this original series.

Training Data

The model was trained on a dataset comprising the top 1,000 posts from approximately 1,250 distinct NSFW subreddits. This dataset was carefully curated to capture a broad spectrum of adult themes, visual styles, character archetypes, specific kinks, and actions prevalent in these online communities. The second phase of training utilized a video dataset sourced from similar communities.

The captions associated with the training data follow the language and tagging conventions found within these subreddits. For insights into effective prompting strategies for specific styles or content, refer to the prompting-guide.json file included in this repository.

Note: Due to the nature of the source material, the training dataset inherently contains explicit adult content.

Files Included

  • Experimental (Recommended):
    • wan_1.3B_exp_e1.safetensors
    • ... (and all intermediate epochs)
    • wan_1.3B_exp_e14.safetensors
  • Original (Legacy):
    • wan_1.3B_e1.safetensors
    • ... (and all intermediate epochs)
    • wan_1.3B_e20.safetensors
  • prompting-guide.json: This crucial JSON file contains an analysis of common keywords, phrases, and descriptive language associated with the content from various source subreddits. It is designed to help users craft more effective prompts.
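
The guide's exact schema is not documented in this card, so the sketch below assumes a hypothetical layout that maps a community name to keyword and phrase lists; adapt the lookup to whatever structure the real file uses:

```python
import json

# Hypothetical schema stand-in; in practice you would use
# json.load(open("prompting-guide.json")) and inspect the real keys.
guide = json.loads(json.dumps({
    "example_community": {
        "keywords": ["cinematic lighting", "close-up"],
        "phrases": ["slow pan across the scene"],
    }
}))

def prompt_terms(guide, community):
    """Collect the keywords and phrases recorded for one community."""
    entry = guide.get(community, {})
    return entry.get("keywords", []) + entry.get("phrases", [])

terms = prompt_terms(guide, "example_community")
```

The returned terms can then be woven into a natural-language prompt to steer the model toward that community's style.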

How to Use

This model is intended for generating short video clips (typically a few seconds) from descriptive text prompts.

  1. Select a Checkpoint: We now strongly recommend using the experimental wan_1.3B_exp_e14.safetensors. This checkpoint is from the revised training run and offers superior visual quality and motion coherence.
  2. No Helper LoRA Needed: With the video-trained checkpoints (e11-e20 and all exp models), you do not need to use the old NSFW_Wan_1.3b_motion_helper LoRA. The model generates motion natively.
  3. Craft Your Prompt: Utilize natural language to describe the desired scene, subjects, actions, and style.
  4. Consult prompting-guide.json: For best results, especially when targeting specific sub-community styles or niche fetishes, refer to the prompting-guide.json. This guide will provide insights into the terminology and phrasing most likely to elicit the desired output.
  5. Generate: Use your preferred inference pipeline compatible with this model architecture.
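
Steps 3 and 4 can be combined into a small helper. This is illustrative only: the model accepts free-form text, and the subject, action, and style terms below are placeholders, not tested prompt fragments:

```python
def build_prompt(subject, action, style_terms):
    """Assemble a natural-language prompt from scene parts plus
    community-specific terms pulled from prompting-guide.json."""
    base = f"{subject}, {action}"
    if style_terms:
        base += ", " + ", ".join(style_terms)
    return base

prompt = build_prompt(
    "a couple on a beach at sunset",
    "walking toward the camera",
    ["cinematic lighting", "shallow depth of field"],
)
```

The resulting string is passed as the text prompt to whatever inference pipeline you use in step 5.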

The Ideal Base for LoRA Fine-Tuning

While NSFW Wan 1.3B T2V is a capable standalone model, its greatest strength lies in its efficacy as a foundational base for training specialized LoRAs (Low-Rank Adaptations).

We highly recommend using the new wan_1.3B_exp_e14.safetensors as the base for all LoRA training.

Its improved and more stable training provides an even more robust understanding of:

  • Core NSFW Anatomy & Aesthetics: The mixed-data training provides a strong, non-degraded grasp of anatomy and visual styles.
  • Coherent Motion & Actions: The video component provides foundational knowledge of common sexual acts and temporal consistency.

Because this new base model is not "damaged," you don't need to waste training cycles teaching your LoRA to fix underlying anatomical problems. You can focus your LoRA training dataset exclusively on the specific niche concept, character, artistic style, or unique action you want to master. This leads to more efficient LoRA training and superior results.
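
The standard LoRA merge this workflow relies on, W' = W + (alpha / r) * B @ A, can be shown with plain Python lists. The shapes here are toy values; a real pipeline would apply the same formula to tensors when merging a trained adapter into the frozen base weights:

```python
def matmul(A, B):
    """Plain-Python matrix multiply for small lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def apply_lora(W, A, B, alpha, rank):
    """Return W + (alpha / rank) * (B @ A): how a LoRA's low-rank
    update is folded into a base weight matrix at merge time."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

# Tiny example: 2x2 base weight, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]            # rank x in_features
B = [[0.5], [0.25]]         # out_features x rank
W_merged = apply_lora(W, A, B, alpha=1.0, rank=1)
```

Because the adapter only ever learns the small A and B matrices, a LoRA trained on a healthy base spends all of its capacity on your niche concept rather than on repairing anatomy.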

Community & Support

Join our Discord server!

Connect with other users, share your creations, get help with prompting, discuss the new experimental models, and contribute to the community:

https://discord.gg/mjnStFuCYh

We encourage active participation and feedback to help improve future iterations and resources! Your feedback on the experimental models is especially valuable.

Limitations and Bias

  • NSFW Focus: The model's knowledge is heavily biased towards the content prevalent in the NSFW subreddits it was trained on. It will likely perform poorly on SFW (Safe For Work) prompts.
  • Specificity & Artifacts: While greatly improved in the experimental checkpoints, the model may still produce visual artifacts, anatomical inaccuracies, or fail to perfectly capture highly complex or nuanced prompts. Video generation is an evolving field.
  • Bias: The training data reflects the content, biases, preferences, and potentially problematic depictions present in the source NSFW communities. The model may generate content that perpetuates these biases.
  • Safety: This model does not have built-in safety filters. Users are responsible for the ethical application of the model.
  • Temporal Coherence: Coherence is significantly improved. However, very long or complex actions might still exhibit some temporal inconsistencies.

Ethical Considerations & Responsible AI

This model is intended for adult users (18+/21+ depending on local regulations) only.

  • Consent and Harm: This model generates fictional, synthetic media. It must not be used to create non-consensual depictions of real individuals, to impersonate, defame, harass, or generate content that could cause harm.
  • Legal Use: Users are solely responsible for ensuring that their use of this model and the content they generate complies with all applicable local, national, and international laws and regulations.
  • Distribution: Exercise extreme caution and responsibility if distributing content generated by this model. Be mindful of platform terms of service and legal restrictions regarding adult content.
  • No Endorsement: The creators of this model do not endorse or condone the creation or distribution of illegal, unethical, or harmful content.

We strongly recommend users familiarize themselves with responsible AI practices and the potential societal impacts of generative NSFW media.

License

Steal this model!

Disclaimer

The outputs of this model are entirely synthetic and computer-generated. They do not depict real people or events unless explicitly prompted to do so with user-provided data (which is not the intended use of this pre-trained model). The developers of this model are not responsible for the outputs created by users.
