# Troubleshooting Guide
This document provides solutions for common issues encountered when running the Toxic Eye application.
## Gradio Version Compatibility
Ensure that you're using Gradio version 5.23.2 as specified in the project's `README.md` file:
```bash
pip install gradio==5.23.2
```
You can check your current Gradio version with:
```bash
pip show gradio
```
If you're running on Hugging Face Spaces, check that the `sdk_version` in the `README.md` frontmatter is set to 5.23.2:
```yaml
sdk: gradio
sdk_version: 5.23.2
```
Using older or newer versions might cause unexpected behavior with the Spaces GPU integration.
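To confirm which version is actually running inside the Space, you can also log it from the app itself (a small optional check, not part of the project code):
```python
import gradio as gr

# Log the running Gradio version so it shows up in the Space logs
print(f"Gradio version: {gr.__version__}")
```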
## GPU Acceleration Issues
### spaces.GPU Decorator Issues
We've observed that the `spaces.GPU` decorator may not work correctly when used with methods inside a class. This can lead to errors like:
```
HTTP Request: POST http://device-api.zero/release?allowToken=... "HTTP/1.1 404 Not Found"
Error in text generation: 'GPU task aborted'
```
### Solution
1. The `spaces.GPU` decorator can be used with or without parentheses. Both of these forms should work:
```python
@spaces.GPU
def generate_text(model_path, text):
    # ...
```
```python
@spaces.GPU()
def generate_text(model_path, text):
    # ...
```
If you need to specify a duration for longer GPU operations, use parentheses:
```python
@spaces.GPU(duration=120) # Set 120-second duration
def generate_long_text(model_path, text):
    # ...
```
2. Use standalone functions instead of class methods with `spaces.GPU`:
**Problematic:**
```python
class ModelManager:
    @spaces.GPU
    def generate_text(self, model_path, text):  # Class method doesn't work well
        # ...
```
**Recommended:**
```python
@spaces.GPU
def generate_text_local(model_path, text): # Standalone function
    # ...
```
3. Use direct pipeline creation instead of loading the model and tokenizer separately:
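**Problematic** (a minimal sketch of the separate-loading pattern; `AutoModelForCausalLM` is an assumed example class, and `model_path` is defined elsewhere as in the other snippets):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Loading the model object separately and handing it to pipeline()
# is the pattern this item recommends against
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
```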
**Recommended:**
```python
import torch
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained(model_path)
pipe = pipeline(
    "text-generation",
    model=model_path,  # Pass the model ID/path directly
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
```
4. Use synchronous `InferenceClient` instead of `AsyncInferenceClient` for API calls:
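**Problematic** (a sketch of the async pattern that was replaced; the wrapper function name is illustrative):
```python
from huggingface_hub import AsyncInferenceClient

async def generate_via_api_async(model_id, text):
    client = AsyncInferenceClient(model_id)
    # Async call; the synchronous client below replaces this for a simpler flow
    return await client.text_generation(text)
```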
**Recommended:**
```python
from huggingface_hub import InferenceClient
client = InferenceClient(model_id)
response = client.text_generation(text) # Synchronous call
```
5. Implement appropriate error handling to gracefully recover from GPU task aborts:
```python
try:
    result = pipeline(text)
    return result
except Exception as e:
    logger.error(f"Error: {str(e)}")
    return f"Error: {str(e)}"  # Return error message instead of raising
```
## Other Common Issues
### Multiple Models Loading Timeout
When preloading multiple large models, the application might time out or crash due to memory constraints.
**Solution:**
- Use `torch.bfloat16` or `torch.float16` precision to reduce memory usage
- Pass `trust_remote_code=True` when loading models that require it
- Use `do_sample=False` to make text generation deterministic (greedy decoding)
- Keep token generation limits low (`max_new_tokens=40` or fewer); a combined example is sketched below
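A minimal sketch combining the settings above (the model ID and prompt are placeholders, not from this project):
```python
import torch
from transformers import pipeline

# Reduced-precision pipeline to keep memory usage down when several models are preloaded
pipe = pipeline(
    "text-generation",
    model="my-org/my-model",      # placeholder model ID
    torch_dtype=torch.bfloat16,   # half the memory of float32 weights
    device_map="auto",
    trust_remote_code=True,
)

result = pipe(
    "Example prompt",
    do_sample=False,      # deterministic (greedy) decoding
    max_new_tokens=40,    # keep generation short
)
```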
### API vs Local Model Performance
When mixing API and local models, you might encounter inconsistent behavior.
**Solution:**
- Keep separate functions for API and local model execution (see the sketch below)
- Handle errors separately for each path
- Use non-async code for a simpler execution flow
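A minimal sketch of that separation (function names and error messages are illustrative, not the project's actual code):
```python
from huggingface_hub import InferenceClient

def generate_via_api(model_id, text):
    """Remote inference through the Inference API, with its own error handling."""
    try:
        client = InferenceClient(model_id)
        return client.text_generation(text, max_new_tokens=40)
    except Exception as e:
        return f"API error: {e}"

def generate_locally(pipe, text):
    """Local inference with an already-created transformers pipeline."""
    try:
        return pipe(text, max_new_tokens=40)[0]["generated_text"]
    except Exception as e:
        return f"Local error: {e}"
```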
## Reporting Issues
If you encounter issues not covered in this guide, please report them by creating an issue in the repository with:
- A detailed description of the problem
- Relevant error messages
- Steps to reproduce the issue
- Your environment information (OS, Python version, GPU, etc.)