# Troubleshooting Guide

This document provides solutions for common issues encountered when running the Toxic Eye application.

## Gradio Version Compatibility

Ensure that you're using Gradio version 5.23.2, as specified in the project's `README.md` file:

```bash
pip install gradio==5.23.2
```

You can check your current Gradio version with:

```bash
pip show gradio
```

If you're running on HuggingFace Spaces, check that the `sdk_version` in the `README.md` frontmatter is set to 5.23.2:

```yaml
sdk: gradio
sdk_version: 5.23.2
```

Using older or newer versions might cause unexpected behavior with the Spaces GPU integration.

## GPU Acceleration Issues

### spaces.GPU Decorator Issues

We've observed that the `spaces.GPU` decorator may not work correctly when used with methods inside a class. This can lead to errors like:

```
HTTP Request: POST http://device-api.zero/release?allowToken=... "HTTP/1.1 404 Not Found"
Error in text generation: 'GPU task aborted'
```

### Solution

1. The `spaces.GPU` decorator can be used with or without parentheses. Both of these forms should work:

```python
@spaces.GPU
def generate_text(model_path, text):
    # ...
```

```python
@spaces.GPU()
def generate_text(model_path, text):
    # ...
```

If you need to specify a duration for longer GPU operations, use the parenthesized form:

```python
@spaces.GPU(duration=120)  # Allow up to 120 seconds of GPU time
def generate_long_text(model_path, text):
    # ...
```

2. Use standalone functions instead of class methods with `spaces.GPU`:

**Problematic:**

```python
class ModelManager:
    @spaces.GPU
    def generate_text(self, model_path, text):  # Class method doesn't work well
        # ...
```

**Recommended:**

```python
@spaces.GPU
def generate_text_local(model_path, text):  # Standalone function
    # ...
```

3. Create the pipeline directly from the model ID or path instead of loading the model object separately:

**Recommended:**

```python
import torch
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained(model_path)
pipe = pipeline(
    "text-generation",
    model=model_path,  # Pass the model ID/path directly
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
```

4. Use the synchronous `InferenceClient` instead of `AsyncInferenceClient` for API calls:

**Recommended:**

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model_id)
response = client.text_generation(text)  # Synchronous call
```

5. Implement appropriate error handling to recover gracefully from GPU task aborts:

```python
import logging

logger = logging.getLogger(__name__)

def generate_safely(pipe, text):
    try:
        result = pipe(text)
        return result
    except Exception as e:
        logger.error(f"Error: {str(e)}")
        return f"Error: {str(e)}"  # Return an error message instead of raising
```

## Other Common Issues

### Multiple Models Loading Timeout

When preloading multiple large models, the application might time out or crash due to memory constraints.

**Solution** (see the sketch below):
- Use `torch.bfloat16` or `torch.float16` precision to reduce memory usage
- Add the `trust_remote_code=True` parameter when loading models
- Use `do_sample=False` to make text generation deterministic
- Keep token generation limits reasonable (`max_new_tokens=40` or less)
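The snippet below is a minimal sketch that combines these settings in one place; the model ID is a placeholder rather than one of the application's actual models, and the dtype and token limit should be tuned to your hardware:

```python
# Minimal sketch of the memory-saving settings above.
# The model ID is a placeholder, not one of the application's models.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="your-org/your-model",   # placeholder model ID/path
    torch_dtype=torch.bfloat16,    # reduced precision to save memory
    device_map="auto",
    trust_remote_code=True,        # only needed for models that ship custom code
)

output = pipe(
    "Hello, how are you?",
    max_new_tokens=40,             # keep generation limits small
    do_sample=False,               # deterministic (greedy) decoding
)
print(output[0]["generated_text"])
```

On GPUs without bfloat16 support, substitute `torch.float16`.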
### API vs Local Model Performance

When mixing API and local models, you might encounter inconsistent behavior.

**Solution:**
- Keep separate functions for API and local model execution
- Handle errors separately for each path
- Use synchronous (non-async) code for a simpler execution flow

## Reporting Issues

If you encounter an issue not covered in this guide, please report it by creating an issue in the repository with:

- A detailed description of the problem
- Relevant error messages
- Steps to reproduce the issue
- Your environment information (OS, Python version, GPU, etc.); the sketch below shows one way to collect it
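The snippet below is a convenience sketch for gathering most of that environment information for a bug report; it is not part of the application:

```python
# Convenience sketch for collecting environment details to include in a bug report.
import platform
import sys

import gradio
import torch

print(f"OS: {platform.platform()}")
print(f"Python: {sys.version.split()[0]}")
print(f"Gradio: {gradio.__version__}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```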