General Agent with Audio, Image, and File Processing
This is a general-purpose agent built with LangChain/LangGraph that includes tools for processing web content, documents, images, audio, and code.
Tool Categories
The agent includes tools for working with:
- Web Content: Search web pages, news articles, Wikipedia, and arXiv papers
- Files: Read PDF, DOCX, and Excel files
- Media: Process images, transcribe audio, and extract text from YouTube videos
- Code: Analyze code structure, read code files, and analyze functions
- Math: Basic arithmetic (add, subtract, multiply, divide, modulus)
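The math tools are named in the import example further down (multiply, add, subtract, divide, modulus), but their signatures are not documented. The following is a minimal plain-Python sketch, assuming each tool takes two numbers; the real versions may be wrapped with LangChain's @tool decorator and may handle errors differently:

```python
# Sketch only: assumes each math tool takes two numeric arguments.


def add(a: float, b: float) -> float:
    """Return a + b."""
    return a + b


def subtract(a: float, b: float) -> float:
    """Return a - b."""
    return a - b


def multiply(a: float, b: float) -> float:
    """Return a * b."""
    return a * b


def divide(a: float, b: float) -> float:
    """Return a / b, refusing to divide by zero."""
    if b == 0:
        raise ValueError("Cannot divide by zero.")
    return a / b


def modulus(a: int, b: int) -> int:
    """Return the remainder of a divided by b, refusing a zero divisor."""
    if b == 0:
        raise ValueError("Cannot take modulus by zero.")
    return a % b


print(divide(10, 4))   # 2.5
print(modulus(10, 3))  # 1
```

Raising a clear ValueError on a zero divisor gives the LLM an error message it can read and recover from, rather than an unhandled ZeroDivisionError.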
Testing Your Installation
Before using the agent, check that all dependencies are installed and that the tools work correctly:
Check and install dependencies:
python fix_dependencies.py
This script will check for missing Python packages and system dependencies.
Test all tools:
python test_all_tools.py
This will test all tools and report any issues.
Test image and audio processing specifically:
python test_image_audio.py
This focuses on testing media processing tools and provides detailed troubleshooting steps.
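The contents of fix_dependencies.py are not shown here. A minimal sketch of the same kind of check, using only the standard library: importlib.util.find_spec tests whether a Python package is importable, and shutil.which tests whether an executable is on PATH. The package list below is a hypothetical example, not the project's actual requirements:

```python
import importlib.util
import shutil

# Hypothetical lists: the real fix_dependencies.py may check different packages.
PYTHON_PACKAGES = ["langchain", "langgraph", "pytesseract"]
SYSTEM_BINARIES = ["tesseract", "ffmpeg"]


def missing_packages(names):
    """Return the packages that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_binaries(names):
    """Return the executables that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    for pkg in missing_packages(PYTHON_PACKAGES):
        print(f"Missing Python package: {pkg} (try: pip install {pkg})")
    for exe in missing_binaries(SYSTEM_BINARIES):
        print(f"Missing system dependency: {exe} (install it and add it to PATH)")
```

Checking system binaries separately matters because pip can install pytesseract but not the Tesseract program it wraps, and audio libraries typically shell out to an FFmpeg binary that must already be on PATH.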
System Requirements
For full functionality, you'll need:
- Python 3.8+
- Tesseract OCR (for image text extraction)
- FFmpeg (for audio processing)
- Internet connection (for web search, YouTube, etc.)
- API key: GROQ_API_KEY must be set in a .env file or as an environment variable
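A quick way to confirm the Python version and API key requirements before starting the agent is sketched below. The .env parser is a deliberately minimal stand-in written for this example; projects commonly use the python-dotenv package's load_dotenv() instead, and the agent itself may load its configuration differently:

```python
import os
import sys
from pathlib import Path


def load_dotenv_file(path=".env"):
    """Copy KEY=VALUE lines from a .env file into os.environ.

    Minimal parser: skips blank lines and # comments, strips surrounding
    quotes, and does not override variables that are already set.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def check_requirements():
    """Return a list of problems with the Python version or the API key."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required.")
    load_dotenv_file()
    if not os.environ.get("GROQ_API_KEY"):
        problems.append("GROQ_API_KEY is not set in .env or the environment.")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("Problem:", problem)
```

Using setdefault means a variable exported in the shell takes precedence over the .env file, which is the usual convention.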
Agent Structure
The agent uses a 3-node graph:
- PerceptionAgent: Handles web searches and information lookup
- ActionAgent: Performs calculations, file operations, code analysis
- EvaluationAgent: Ensures answers are properly formatted
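In LangGraph, nodes read a shared state and return updates to it. The plain-Python analogue below (no LangGraph dependency) only illustrates the perception, action, evaluation flow. The node bodies are hypothetical placeholders, and the "FINAL ANSWER:" prefix is an assumption; the real nodes call the LLM and the tools listed above:

```python
import re
from typing import Callable, Dict, List

State = Dict[str, str]


def perception_agent(state: State) -> State:
    """Placeholder for information gathering (the real node would search the web)."""
    return {"context": f"notes about: {state['question']}"}


def action_agent(state: State) -> State:
    """Placeholder for tool use: multiplies the numbers in 'A times B' questions."""
    match = re.search(r"(\d+)\s+times\s+(\d+)", state["question"])
    draft = str(int(match.group(1)) * int(match.group(2))) if match else "unknown"
    return {"draft": draft}


def evaluation_agent(state: State) -> State:
    """Format the draft answer (the 'FINAL ANSWER:' prefix is an assumption)."""
    return {"answer": f"FINAL ANSWER: {state['draft'].strip()}"}


# Nodes run in a fixed order: perception -> action -> evaluation.
PIPELINE: List[Callable[[State], State]] = [perception_agent, action_agent, evaluation_agent]


def run(question: str) -> str:
    state: State = {"question": question}
    for node in PIPELINE:
        # Like LangGraph nodes, each function returns a partial update
        # that is merged into the shared state.
        state.update(node(state))
    return state["answer"]


print(run("What is 5 times 7?"))  # FINAL ANSWER: 35
```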
Common Issues
If you encounter issues:
- Web Scraping Errors: The agent includes error handling for 403 Forbidden responses from websites
- Audio Processing Errors: Make sure FFmpeg is installed and in your PATH
- Image Processing Errors: Make sure Tesseract OCR is installed and in your PATH
- Groq API Rate Limits: The agent includes automatic rate limiting and retry mechanisms
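The agent's retry mechanism is not shown in this README. A common pattern is exponential backoff with jitter, sketched below. RateLimitError is a hypothetical stand-in; in practice you would catch whatever exception your client raises for HTTP 429. The sleep parameter exists only so the function can be tested without waiting:

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a client raises on HTTP 429."""


def call_with_retry(fn, *args, max_retries=5, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn, retrying on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn(*args, **kwargs)
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the final attempt
            # Wait base_delay * 2^attempt seconds, plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))


# Usage: a function that is rate-limited twice before succeeding.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


print(call_with_retry(flaky, base_delay=0.01))  # ok
```

Only rate-limit errors are retried here: a 403 Forbidden from a website usually will not succeed on retry, so it is better handled by skipping that source.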
Running GAIA Tests
To test whether the agent correctly handles factual questions in the GAIA format, run:
python test_factual_questions.py
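GAIA grades answers by quasi-exact match against a short ground-truth string, so the final answer has to be extracted cleanly from the model's output. A minimal sketch, assuming the model ends with a "FINAL ANSWER:" line (the template used in the GAIA paper's prompt). The normalization rules here are simplified for illustration and are not the official GAIA scorer:

```python
import re
from typing import Optional


def extract_final_answer(text: str) -> Optional[str]:
    """Return the text after the last 'FINAL ANSWER:' marker, or None."""
    matches = re.findall(r"FINAL ANSWER:\s*(.+)", text)
    return matches[-1].strip() if matches else None


def normalize(answer: str) -> str:
    """Simplified normalization: lowercase, drop thousands separators and
    surrounding punctuation, and collapse whitespace."""
    answer = answer.strip().lower()
    answer = re.sub(r"(?<=\d),(?=\d{3}\b)", "", answer)  # 1,000 -> 1000
    answer = answer.strip(" .\"'")
    return re.sub(r"\s+", " ", answer)


def is_match(model_output: str, ground_truth: str) -> bool:
    """True if the extracted answer equals the ground truth after normalization."""
    predicted = extract_final_answer(model_output)
    return predicted is not None and normalize(predicted) == normalize(ground_truth)


print(is_match("Reasoning...\nFINAL ANSWER: 1,000", "1000"))  # True
```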
Testing Individual Tools
from agent import multiply, add, subtract, divide, modulus  # Math tools
from agent import web_search, wiki_search  # Web tools
from agent import read_text_from_pdf, read_text_from_docx  # Document tools
from agent import image_processing  # Image tools
from agent import transcribe_audio  # Audio tools
from agent import analyze_code, read_code_file  # Code tools
# Test a tool directly
result = multiply(5, 7)
print(result)  # 35
# If the tools are LangChain @tool objects rather than plain functions,
# call them through .invoke() instead (argument names a and b are assumed):
# result = multiply.invoke({"a": 5, "b": 7})
# Process an image: the instruction comes first, then the image path
image_description = image_processing("Describe this image", "path/to/image.jpg")
print(image_description)
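test_all_tools.py is described above as testing all tools and reporting issues, but its contents are not shown. A minimal harness in that spirit is sketched below, assuming tools can be called as plain Python functions. The stand-in tools and sample arguments are hypothetical; in practice you would register the functions imported from agent:

```python
from typing import Any, Callable, Dict, List, Tuple


def smoke_test(cases: Dict[str, Tuple[Callable[..., Any], tuple]]) -> List[str]:
    """Call each tool with sample arguments and collect one report line per tool."""
    report = []
    for name, (tool, args) in cases.items():
        try:
            result = tool(*args)
            report.append(f"PASS {name}: {result!r}")
        except Exception as exc:  # record the failure and keep testing other tools
            report.append(f"FAIL {name}: {type(exc).__name__}: {exc}")
    return report


# Hypothetical stand-ins for real tools.
def fake_multiply(a, b):
    return a * b


def fake_transcribe_audio(path):
    raise FileNotFoundError(f"No such file: {path}")


cases = {
    "multiply": (fake_multiply, (5, 7)),
    "transcribe_audio": (fake_transcribe_audio, ("missing.mp3",)),
}
for line in smoke_test(cases):
    print(line)
```

Catching every exception per tool means one broken dependency (for example, a missing FFmpeg) shows up as a single FAIL line instead of aborting the whole run.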