Troubleshooting Guide

This document provides solutions for common issues encountered when running the Toxic Eye application.

Gradio Version Compatibility

Ensure that you're using Gradio version 5.23.2 as specified in the project's README.md file:

pip install gradio==5.23.2

You can check your current Gradio version with:

pip show gradio

If you're running on HuggingFace Spaces, check that the sdk_version in the README.md frontmatter is set to 5.23.2:

sdk: gradio
sdk_version: 5.23.2

Using older or newer versions might cause unexpected behavior with the Spaces GPU integration.

GPU Acceleration Issues

spaces.GPU Decorator Issues

We've observed that the spaces.GPU decorator may not work correctly when used with methods inside a class. This can lead to errors like:

HTTP Request: POST http://device-api.zero/release?allowToken=... "HTTP/1.1 404 Not Found"
Error in text generation: 'GPU task aborted'

Solution

  1. The spaces.GPU decorator can be used with or without parentheses. Both forms should work:

    @spaces.GPU
    def generate_text(model_path, text):
        # ...
    
    @spaces.GPU()
    def generate_text(model_path, text):
        # ...
    

    If you need to specify a duration for longer GPU operations, use parentheses:

    @spaces.GPU(duration=120)  # Set 120-second duration
    def generate_long_text(model_path, text):
        # ...
    
  2. Use standalone functions instead of class methods with spaces.GPU:

    Problematic:

    class ModelManager:
        @spaces.GPU
        def generate_text(self, model_path, text):  # Class methods don't work reliably with spaces.GPU
            # ...
    

    Recommended:

    @spaces.GPU
    def generate_text_local(model_path, text):  # Standalone function
        # ...
    
  3. Create the pipeline directly from the model ID/path instead of loading the model object separately:

    Recommended:

    from transformers import AutoTokenizer, pipeline
    import torch

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    pipe = pipeline(
        "text-generation",
        model=model_path,  # Pass the model ID/path directly
        tokenizer=tokenizer,
        torch_dtype=torch.bfloat16,
        device_map="auto"
    )
    
  4. Use synchronous InferenceClient instead of AsyncInferenceClient for API calls:

    Recommended:

    from huggingface_hub import InferenceClient
    client = InferenceClient(model_id)
    response = client.text_generation(text)  # Synchronous call
    
  5. Implement appropriate error handling to gracefully recover from GPU task aborts (a combined sketch follows this list):

    try:
        result = pipe(text)  # the text-generation pipeline created in step 3
        return result
    except Exception as e:
        logger.error(f"Error in text generation: {str(e)}")
        return f"Error: {str(e)}"  # Return error message instead of raising
    

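Putting these recommendations together, a minimal sketch of a standalone generation function might look like the following. This is a sketch only: the duration and generation settings are assumptions, while generate_text_local matches the standalone-function naming used above.

import logging

import spaces
import torch
from transformers import AutoTokenizer, pipeline

logger = logging.getLogger(__name__)

@spaces.GPU(duration=120)  # allow extra time for model loading on the first call
def generate_text_local(model_path, text):
    """Standalone function (not a class method) so the spaces.GPU decorator works reliably."""
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        pipe = pipeline(
            "text-generation",
            model=model_path,  # pass the model ID/path directly
            tokenizer=tokenizer,
            torch_dtype=torch.bfloat16,
            device_map="auto"
        )
        outputs = pipe(text, max_new_tokens=40, do_sample=False)
        return outputs[0]["generated_text"]
    except Exception as e:
        logger.error(f"Error in text generation: {str(e)}")
        return f"Error: {str(e)}"  # return the error message instead of raising
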
Other Common Issues

Multiple Models Loading Timeout

When preloading multiple large models, the application might time out or crash due to memory constraints.

Solution:

  • Use torch.bfloat16 or torch.float16 precision to reduce memory usage
  • Add the trust_remote_code=True parameter when loading models that require custom code
  • Use do_sample=False to make text generation deterministic
  • Keep token generation limits reasonable (max_new_tokens=40 or less); see the sketch below
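
As a rough illustration only, these settings can be combined when each pipeline is built. The model IDs below are placeholders, not the application's actual model list:

import torch
from transformers import pipeline

# Placeholder model IDs -- substitute the models the application actually preloads
MODEL_PATHS = ["model-id-1", "model-id-2"]

pipelines = {}
for model_path in MODEL_PATHS:
    pipelines[model_path] = pipeline(
        "text-generation",
        model=model_path,
        torch_dtype=torch.bfloat16,  # half-precision weights to cut memory usage
        device_map="auto",
        trust_remote_code=True,
    )

# At generation time, keep outputs short and deterministic
result = pipelines[MODEL_PATHS[0]]("Some prompt", max_new_tokens=40, do_sample=False)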

API vs Local Model Performance

When mixing API and local models, you might encounter inconsistent behavior.

Solution:

  • Keep separate functions for API and local model execution
  • Handle errors distinctly for each type
  • Use non-async code for simpler execution flow; see the sketch below
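
A minimal sketch of keeping the two paths separate, each with its own error handling. Function names here are illustrative, and generate_text_local refers to the standalone GPU function sketched earlier:

import logging

from huggingface_hub import InferenceClient

logger = logging.getLogger(__name__)

def generate_text_api(model_id, text):
    # Remote generation via the synchronous InferenceClient, with its own error handling
    try:
        client = InferenceClient(model_id)
        return client.text_generation(text, max_new_tokens=40)
    except Exception as e:
        logger.error(f"API error: {str(e)}")
        return f"API error: {str(e)}"

def generate_text(model_name, text, use_api=False):
    # Dispatch to the API or local path; each path handles its own errors
    if use_api:
        return generate_text_api(model_name, text)
    return generate_text_local(model_name, text)  # standalone local function from the GPU section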

Reporting Issues

If you encounter issues not covered in this guide, please report them by creating an issue in the repository with:

  • A detailed description of the problem
  • Relevant error messages
  • Steps to reproduce the issue
  • Your environment information (OS, Python version, GPU, etc.)