Troubleshooting Guide

This document provides solutions for common issues encountered when running the Toxic Eye application.

Gradio Version Compatibility

Ensure that you're using Gradio version 5.23.2 as specified in the project's README.md file:

pip install gradio==5.23.2

You can check your current Gradio version with:

pip show gradio

If you're running on HuggingFace Spaces, check that the sdk_version in the README.md frontmatter is set to 5.23.2:

sdk: gradio
sdk_version: 5.23.2

Using older or newer versions might cause unexpected behavior with the Spaces GPU integration.

GPU Acceleration Issues

spaces.GPU Decorator Issues

We've observed that the spaces.GPU decorator may not work correctly when used with methods inside a class. This can lead to errors like:

HTTP Request: POST http://device-api.zero/release?allowToken=... "HTTP/1.1 404 Not Found"
Error in text generation: 'GPU task aborted'

Solution

  1. The spaces.GPU decorator can be used with or without parentheses. Both forms should work:

    @spaces.GPU
    def generate_text(model_path, text):
        # ...
    
    @spaces.GPU()
    def generate_text(model_path, text):
        # ...
    

    If you need to specify a duration for longer GPU operations, use parentheses:

    @spaces.GPU(duration=120)  # Set 120-second duration
    def generate_long_text(model_path, text):
        # ...
    
  2. Use standalone functions instead of class methods with spaces.GPU:

    Problematic:

    class ModelManager:
        @spaces.GPU
        def generate_text(self, model_path, text):  # Class methods don't work reliably with spaces.GPU
            # ...
    

    Recommended:

    @spaces.GPU
    def generate_text_local(model_path, text):  # Standalone function
        # ...
    
  3. Create the pipeline directly from the model ID/path instead of loading the model object separately:

    Recommended:

    from transformers import AutoTokenizer, pipeline
    import torch

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    pipe = pipeline(
        "text-generation",
        model=model_path,  # Pass the model ID/path directly
        tokenizer=tokenizer,
        torch_dtype=torch.bfloat16,
        device_map="auto"
    )
    
  4. Use synchronous InferenceClient instead of AsyncInferenceClient for API calls:

    Recommended:

    from huggingface_hub import InferenceClient
    client = InferenceClient(model_id)
    response = client.text_generation(text)  # Synchronous call
    
  5. Implement appropriate error handling to gracefully recover from GPU task aborts (a combined sketch follows this list):

    try:
        result = pipe(text)  # the text-generation pipeline created in step 3
        return result
    except Exception as e:
        logger.error(f"Error in text generation: {str(e)}")
        return f"Error: {str(e)}"  # Return error message instead of raising
    

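Putting these recommendations together, a minimal sketch of a standalone generation function might look like the following. This is a sketch only: the duration and generation settings are assumptions, while generate_text_local matches the standalone-function naming used above.

import logging

import spaces
import torch
from transformers import AutoTokenizer, pipeline

logger = logging.getLogger(__name__)

@spaces.GPU(duration=120)  # allow extra time for model loading on the first call
def generate_text_local(model_path, text):
    """Standalone function (not a class method) so the spaces.GPU decorator works reliably."""
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        pipe = pipeline(
            "text-generation",
            model=model_path,  # pass the model ID/path directly
            tokenizer=tokenizer,
            torch_dtype=torch.bfloat16,
            device_map="auto"
        )
        outputs = pipe(text, max_new_tokens=40, do_sample=False)
        return outputs[0]["generated_text"]
    except Exception as e:
        logger.error(f"Error in text generation: {str(e)}")
        return f"Error: {str(e)}"  # return the error message instead of raising
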
Other Common Issues

Multiple Models Loading Timeout

When preloading multiple large models, the application might time out or crash due to memory constraints.

Solution:

  • Use torch.bfloat16 or torch.float16 precision to reduce memory usage
  • Add the trust_remote_code=True parameter when loading models that require custom code
  • Use do_sample=False to make text generation deterministic
  • Keep token generation limits reasonable (max_new_tokens=40 or less); see the sketch below
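
As a rough illustration only, these settings can be combined when each pipeline is built. The model IDs below are placeholders, not the application's actual model list:

import torch
from transformers import pipeline

# Placeholder model IDs -- substitute the models the application actually preloads
MODEL_PATHS = ["model-id-1", "model-id-2"]

pipelines = {}
for model_path in MODEL_PATHS:
    pipelines[model_path] = pipeline(
        "text-generation",
        model=model_path,
        torch_dtype=torch.bfloat16,  # half-precision weights to cut memory usage
        device_map="auto",
        trust_remote_code=True,
    )

# At generation time, keep outputs short and deterministic
result = pipelines[MODEL_PATHS[0]]("Some prompt", max_new_tokens=40, do_sample=False)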

API vs Local Model Performance

When mixing API and local models, you might encounter inconsistent behavior.

Solution:

  • Keep separate functions for API and local model execution
  • Handle errors distinctly for each type
  • Use non-async code for simpler execution flow; see the sketch below
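
A minimal sketch of keeping the two paths separate, each with its own error handling. Function names here are illustrative, and generate_text_local refers to the standalone GPU function sketched earlier:

import logging

from huggingface_hub import InferenceClient

logger = logging.getLogger(__name__)

def generate_text_api(model_id, text):
    # Remote generation via the synchronous InferenceClient, with its own error handling
    try:
        client = InferenceClient(model_id)
        return client.text_generation(text, max_new_tokens=40)
    except Exception as e:
        logger.error(f"API error: {str(e)}")
        return f"API error: {str(e)}"

def generate_text(model_name, text, use_api=False):
    # Dispatch to the API or local path; each path handles its own errors
    if use_api:
        return generate_text_api(model_name, text)
    return generate_text_local(model_name, text)  # standalone local function from the GPU section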

Reporting Issues

If you encounter issues not covered in this guide, please report them by creating an issue in the repository with:

  • A detailed description of the problem
  • Relevant error messages
  • Steps to reproduce the issue
  • Your environment information (OS, Python version, GPU, etc.)