General Agent with Audio, Image, and File Processing

This is a general-purpose agent built with LangChain/LangGraph, with tools for processing web content, documents, images, audio, and code.

Tool Categories

The agent includes tools for working with:

Web Content: Search web pages, news articles, Wikipedia, and arXiv papers
Files: Read PDF, DOCX, and Excel files
Media: Process images, transcribe audio, extract text from YouTube videos
Code: Analyze code structure, read code files, analyze functions
Math: Basic math operations

Testing Your Installation

Before using the agent, check that all dependencies are installed and the tools work correctly:

Check and install dependencies:
```
python fix_dependencies.py
```
This script will check for missing Python packages and system dependencies.

Test all tools:
```
python test_all_tools.py
```
This will test all tools and report any issues.

Test image and audio processing specifically:
```
python test_image_audio.py
```
This focuses on testing media processing tools and provides detailed troubleshooting steps.

System Requirements

For full functionality, you'll need:

Python 3.8+
Tesseract OCR (for image text extraction)
FFmpeg (for audio processing)
Internet connection (for web search, YouTube, etc.)
API Keys: GROQ_API_KEY must be set in a .env file or as an environment variable
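
The requirements above can be checked up front with a short script. This is a minimal sketch using only the standard library, not one of the project's own scripts. The function name `check_requirements` is hypothetical, and `tesseract` and `ffmpeg` are assumed to be the command names those tools install under. It reads only the process environment, so a key stored solely in a .env file will show as missing unless that file has been loaded first.

```python
import os
import shutil
import sys


def check_requirements(env=None):
    """Return a dict mapping each requirement to True (met) or False (missing).

    Hypothetical helper for illustration; not part of the agent's code.
    """
    env = os.environ if env is None else env
    return {
        "Python 3.8+": sys.version_info >= (3, 8),
        # shutil.which returns None when the executable is not on PATH.
        "Tesseract OCR": shutil.which("tesseract") is not None,
        "FFmpeg": shutil.which("ffmpeg") is not None,
        # An empty string counts as missing.
        "GROQ_API_KEY": bool(env.get("GROQ_API_KEY")),
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        status = "OK     " if ok else "MISSING"
        print(f"{status}  {name}")
```

Passing a custom mapping as `env` makes the check easy to exercise without touching the real environment.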

Agent Structure

This agent uses a streamlined 3-node graph structure:

PerceptionAgent: Handles web searches and information lookup
ActionAgent: Performs calculations, file operations, code analysis
EvaluationAgent: Ensures answers are properly formatted
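
The flow through the three nodes can be pictured with a plain-Python sketch. It deliberately does not use LangGraph, and it assumes the nodes run in a fixed order (perception, then action, then evaluation), which this README does not state. The state keys (`question`, `context`, `draft`, `answer`) and the function bodies are hypothetical placeholders, not the agent's actual implementation.

```python
from typing import Callable, Dict, List

State = Dict[str, str]


def perception_agent(state: State) -> State:
    # Placeholder: a real node would call web search or lookup tools here.
    return {**state, "context": f"notes for: {state['question']}"}


def action_agent(state: State) -> State:
    # Placeholder: a real node would run calculations, file, or code tools.
    return {**state, "draft": f"draft answer using {state['context']}"}


def evaluation_agent(state: State) -> State:
    # Placeholder: a real node would enforce the required answer format.
    return {**state, "answer": state["draft"].strip()}


# Assumed linear order; the real graph may route between nodes differently.
PIPELINE: List[Callable[[State], State]] = [
    perception_agent,
    action_agent,
    evaluation_agent,
]


def run(question: str) -> State:
    """Pass a shared state dict through each node in turn."""
    state: State = {"question": question}
    for node in PIPELINE:
        state = node(state)
    return state
```

Each node takes the shared state and returns an updated copy, which is the same shape of contract a graph framework expects from its node functions.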

Common Issues

If you encounter issues:

Web Scraping Errors: The agent has robust error handling for 403 Forbidden errors
Audio Processing Errors: Make sure FFmpeg is installed and in your PATH
Image Processing Errors: Make sure Tesseract OCR is installed and in your PATH
GROQ API Rate Limits: The agent includes automatic rate limiting and retry mechanisms
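
For the rate-limit case, retrying with exponential backoff is one common approach. The sketch below illustrates that general technique and is not the agent's actual mechanism, which this README does not describe. `RateLimitError` and `call_with_retry` are hypothetical names, and the delay values are arbitrary.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error an API client raises when rate limited."""


def call_with_retry(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on RateLimitError, wait and retry with a doubling delay.

    The wait before retry number k (0-based) is base_delay * 2**k seconds.
    The final failure is re-raised so the caller can handle it.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without actually waiting.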

Running GAIA Tests

To test whether the agent handles factual questions in the GAIA format:

```
python test_factual_questions.py
```

Testing Individual Tools

```
from agent import multiply, add, subtract, divide, modulus  # Math tools
from agent import web_search, wiki_search  # Web tools
from agent import read_text_from_pdf, read_text_from_docx  # Document tools
from agent import image_processing  # Image tools
from agent import transcribe_audio  # Audio tools
from agent import analyze_code, read_code_file  # Code tools

# Test a tool directly
result = multiply(5, 7)
print(result)  # 35

# Process an image
image_description = image_processing("Describe this image", "path/to/image.jpg")
print(image_description)
```