# Troubleshooting Guide
This document provides solutions for common issues encountered when running the Toxic Eye application.
## Gradio Version Compatibility
Ensure that you're using Gradio version 5.23.2 as specified in the project's `README.md` file:
```bash
pip install gradio==5.23.2
```
You can check your current Gradio version with:
```bash
pip show gradio
```
If you're running on Hugging Face Spaces, check that the `sdk_version` in the `README.md` frontmatter is set to 5.23.2:
```yaml
sdk: gradio
sdk_version: 5.23.2
```
Using older or newer versions might cause unexpected behavior with the Spaces GPU integration.
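To confirm which version is actually running inside the Space, you can also log it from the app itself (a small optional check, not part of the project code):
```python
import gradio as gr

# Log the running Gradio version so it shows up in the Space logs
print(f"Gradio version: {gr.__version__}")
```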
## GPU Acceleration Issues
### spaces.GPU Decorator Issues
We've observed that the `spaces.GPU` decorator may not work correctly when used with methods inside a class. This can lead to errors like:
```
HTTP Request: POST http://device-api.zero/release?allowToken=... "HTTP/1.1 404 Not Found"
Error in text generation: 'GPU task aborted'
```
### Solution
1. The `spaces.GPU` decorator can be used with or without parentheses. Both of these forms should work:
```python
@spaces.GPU
def generate_text(model_path, text):
    # ...
```
```python
@spaces.GPU()
def generate_text(model_path, text):
    # ...
```
If you need to specify a duration for longer GPU operations, use parentheses:
```python
@spaces.GPU(duration=120) # Set 120-second duration
def generate_long_text(model_path, text):
    # ...
```
2. Use standalone functions instead of class methods with `spaces.GPU`:
**Problematic:**
```python
class ModelManager:
    @spaces.GPU
    def generate_text(self, model_path, text):  # Class method doesn't work well
        # ...
```
**Recommended:**
```python
@spaces.GPU
def generate_text_local(model_path, text): # Standalone function
    # ...
```
3. Use direct pipeline creation instead of loading the model and tokenizer separately:
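**Problematic** (a minimal sketch of the separate-loading pattern; `AutoModelForCausalLM` is an assumed example class, and `model_path` is defined elsewhere as in the other snippets):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Loading the model object separately and handing it to pipeline()
# is the pattern this item recommends against
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
```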
**Recommended:**
```python
import torch
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained(model_path)
pipe = pipeline(
    "text-generation",
    model=model_path,  # Pass the model ID/path directly
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
```
4. Use synchronous `InferenceClient` instead of `AsyncInferenceClient` for API calls:
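**Problematic** (a sketch of the async pattern that was replaced; the wrapper function name is illustrative):
```python
from huggingface_hub import AsyncInferenceClient

async def generate_via_api_async(model_id, text):
    client = AsyncInferenceClient(model_id)
    # Async call; the synchronous client below replaces this for a simpler flow
    return await client.text_generation(text)
```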
**Recommended:**
```python
from huggingface_hub import InferenceClient
client = InferenceClient(model_id)
response = client.text_generation(text) # Synchronous call
```
5. Implement appropriate error handling to gracefully recover from GPU task aborts:
```python
try:
    result = pipeline(text)
    return result
except Exception as e:
    logger.error(f"Error: {str(e)}")
    return f"Error: {str(e)}"  # Return error message instead of raising
```
## Other Common Issues
### Multiple Models Loading Timeout
When preloading multiple large models, the application might time out or crash due to memory constraints.
**Solution:**
- Use `torch.bfloat16` or `torch.float16` precision to reduce memory usage
- Pass `trust_remote_code=True` when loading models that require it
- Use `do_sample=False` to make text generation deterministic (greedy decoding)
- Keep token generation limits low (`max_new_tokens=40` or fewer); a combined example is sketched below
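A minimal sketch combining the settings above (the model ID and prompt are placeholders, not from this project):
```python
import torch
from transformers import pipeline

# Reduced-precision pipeline to keep memory usage down when several models are preloaded
pipe = pipeline(
    "text-generation",
    model="my-org/my-model",      # placeholder model ID
    torch_dtype=torch.bfloat16,   # half the memory of float32 weights
    device_map="auto",
    trust_remote_code=True,
)

result = pipe(
    "Example prompt",
    do_sample=False,      # deterministic (greedy) decoding
    max_new_tokens=40,    # keep generation short
)
```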
### API vs Local Model Performance
When mixing API and local models, you might encounter inconsistent behavior.
**Solution:**
- Keep separate functions for API and local model execution (see the sketch below)
- Handle errors separately for each path
- Use non-async code for a simpler execution flow
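A minimal sketch of that separation (function names and error messages are illustrative, not the project's actual code):
```python
from huggingface_hub import InferenceClient

def generate_via_api(model_id, text):
    """Remote inference through the Inference API, with its own error handling."""
    try:
        client = InferenceClient(model_id)
        return client.text_generation(text, max_new_tokens=40)
    except Exception as e:
        return f"API error: {e}"

def generate_locally(pipe, text):
    """Local inference with an already-created transformers pipeline."""
    try:
        return pipe(text, max_new_tokens=40)[0]["generated_text"]
    except Exception as e:
        return f"Local error: {e}"
```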
## Reporting Issues
If you encounter issues not covered in this guide, please report them by creating an issue in the repository with:
- A detailed description of the problem
- Relevant error messages
- Steps to reproduce the issue
- Your environment information (OS, Python version, GPU, etc.)