# Troubleshooting Guide

This document provides solutions for common issues encountered when running the Toxic Eye application.

## Gradio Version Compatibility

Ensure that you're using Gradio version 5.23.2, as specified in the project's `README.md` file:

```bash
pip install gradio==5.23.2
```

You can check your current Gradio version with:

```bash
pip show gradio
```

If you're running on HuggingFace Spaces, check that the `sdk_version` in the `README.md` frontmatter is set to 5.23.2:

```yaml
sdk: gradio
sdk_version: 5.23.2
```

Using older or newer versions might cause unexpected behavior with the Spaces GPU integration.

## GPU Acceleration Issues

### spaces.GPU Decorator Issues

We've observed that the `spaces.GPU` decorator may not work correctly when used with methods inside a class. This can lead to errors like:

```
HTTP Request: POST http://device-api.zero/release?allowToken=... "HTTP/1.1 404 Not Found"
Error in text generation: 'GPU task aborted'
```

### Solution

1. The `spaces.GPU` decorator can be used with or without parentheses. Both of these forms should work:

```python
@spaces.GPU
def generate_text(model_path, text):
    # ...
```

```python
@spaces.GPU()
def generate_text(model_path, text):
    # ...
```

If you need to specify a duration for longer GPU operations, use the parenthesized form:

```python
@spaces.GPU(duration=120)  # Allow up to 120 seconds of GPU time
def generate_long_text(model_path, text):
    # ...
```

2. Use standalone functions instead of class methods with `spaces.GPU`:

**Problematic:**

```python
class ModelManager:
    @spaces.GPU
    def generate_text(self, model_path, text):  # Class method doesn't work well
        # ...
```

**Recommended:**

```python
@spaces.GPU
def generate_text_local(model_path, text):  # Standalone function
    # ...
```

3. Create the pipeline directly from the model ID or path instead of loading the model object separately:

**Recommended:**

```python
import torch
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained(model_path)
pipe = pipeline(
    "text-generation",
    model=model_path,  # Pass the model ID/path directly
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
```

4. Use the synchronous `InferenceClient` instead of `AsyncInferenceClient` for API calls:

**Recommended:**

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model_id)
response = client.text_generation(text)  # Synchronous call
```

5. Implement appropriate error handling to recover gracefully from GPU task aborts:

```python
import logging

logger = logging.getLogger(__name__)

def generate_safely(pipe, text):
    try:
        result = pipe(text)
        return result
    except Exception as e:
        logger.error(f"Error: {str(e)}")
        return f"Error: {str(e)}"  # Return an error message instead of raising
```

## Other Common Issues

### Multiple Models Loading Timeout

When preloading multiple large models, the application might time out or crash due to memory constraints.

**Solution** (see the sketch below):
- Use `torch.bfloat16` or `torch.float16` precision to reduce memory usage
- Add the `trust_remote_code=True` parameter when loading models
- Use `do_sample=False` to make text generation deterministic
- Keep token generation limits reasonable (`max_new_tokens=40` or less)
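The snippet below is a minimal sketch that combines these settings in one place; the model ID is a placeholder rather than one of the application's actual models, and the dtype and token limit should be tuned to your hardware:

```python
# Minimal sketch of the memory-saving settings above.
# The model ID is a placeholder, not one of the application's models.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="your-org/your-model",   # placeholder model ID/path
    torch_dtype=torch.bfloat16,    # reduced precision to save memory
    device_map="auto",
    trust_remote_code=True,        # only needed for models that ship custom code
)

output = pipe(
    "Hello, how are you?",
    max_new_tokens=40,             # keep generation limits small
    do_sample=False,               # deterministic (greedy) decoding
)
print(output[0]["generated_text"])
```

On GPUs without bfloat16 support, substitute `torch.float16`.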
### API vs Local Model Performance

When mixing API and local models, you might encounter inconsistent behavior.

**Solution:**
- Keep separate functions for API and local model execution
- Handle errors separately for each path
- Use synchronous (non-async) code for a simpler execution flow

## Reporting Issues

If you encounter an issue not covered in this guide, please report it by creating an issue in the repository with:

- A detailed description of the problem
- Relevant error messages
- Steps to reproduce the issue
- Your environment information (OS, Python version, GPU, etc.); the sketch below shows one way to collect it
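The snippet below is a convenience sketch for gathering most of that environment information for a bug report; it is not part of the application:

```python
# Convenience sketch for collecting environment details to include in a bug report.
import platform
import sys

import gradio
import torch

print(f"OS: {platform.platform()}")
print(f"Python: {sys.version.split()[0]}")
print(f"Gradio: {gradio.__version__}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```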