
General Agent with Audio, Image, and File Processing

This is a general-purpose agent built with LangChain/LangGraph. It includes tools for processing web content, documents, media, code, and basic math.

Tool Categories

The agent includes tools for working with:

  • Web Content: Search web pages, news articles, Wikipedia, ArXiv papers
  • Files: Read PDF, DOCX, and Excel files
  • Media: Process images, transcribe audio, extract text from YouTube videos
  • Code: Analyze code structure, read code files, analyze functions
  • Math: Basic math operations

Testing Your Installation

Before using the agent, check that all dependencies are installed and that the tools work correctly:

  1. Check and install dependencies:

    python fix_dependencies.py
    

    This script will check for missing Python packages and system dependencies.

  2. Test all tools:

    python test_all_tools.py
    

    This will test all tools and report any issues.

  3. Test image and audio processing specifically:

    python test_image_audio.py
    

    This focuses on testing media processing tools and provides detailed troubleshooting steps.

System Requirements

For full functionality, you'll need:

  • Python 3.8+
  • Tesseract OCR (for image text extraction)
  • FFmpeg (for audio processing)
  • Internet connection (for web search, YouTube, etc.)
  • API Keys: GROQ_API_KEY must be set in a .env file or as an environment variable
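
The requirements above can be checked by hand with a short script. This is a minimal sketch, not the repository's fix_dependencies.py: the function name check_requirements is hypothetical, it assumes the executables are named tesseract and ffmpeg on your PATH, and it only looks at the process environment (it does not parse a .env file).

```python
import os
import shutil
import sys


def check_requirements(env=None):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    # Hypothetical helper for illustration; pass a dict as `env` to test it.
    env = os.environ if env is None else env
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        # shutil.which returns None when the executable is not on PATH.
        "tesseract": shutil.which("tesseract") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Only the process environment is checked, not a .env file.
        "GROQ_API_KEY": bool(env.get("GROQ_API_KEY")),
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

If GROQ_API_KEY is stored only in a .env file, it will show as missing here until that file has been loaded into the environment.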

Agent Structure

This agent uses a streamlined 3-node graph structure:

  1. PerceptionAgent: Handles web searches and information lookup
  2. ActionAgent: Performs calculations, file operations, code analysis
  3. EvaluationAgent: Ensures answers are properly formatted
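
To make the division of labor concrete, here is a plain-Python sketch of three stages passing a shared state dict. It is not the agent's LangGraph code: it assumes the nodes run in the listed order (the text does not say how they are wired), and every function name and node body is a hypothetical placeholder.

```python
from typing import Callable, Dict, List

State = Dict[str, str]


def perception_node(state: State) -> State:
    # Placeholder: a real node would call search or lookup tools here.
    return {**state, "context": f"(lookup results for: {state['question']})"}


def action_node(state: State) -> State:
    # Placeholder: a real node would run calculations, file, or code tools.
    return {**state, "draft": f"draft answer using {state['context']}"}


def evaluation_node(state: State) -> State:
    # Placeholder formatting step: strip surrounding whitespace.
    return {**state, "answer": state["draft"].strip()}


# Assumed linear order: perception, then action, then evaluation.
PIPELINE: List[Callable[[State], State]] = [
    perception_node,
    action_node,
    evaluation_node,
]


def run(question: str) -> State:
    """Pass a state dict through each node in turn and return the final state."""
    state: State = {"question": question}
    for node in PIPELINE:
        state = node(state)
    return state
```

Each node returns a new dict instead of mutating its input, which keeps the intermediate states easy to inspect when debugging.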

Common Issues

If you encounter issues:

  1. Web Scraping Errors: The agent has robust error handling for 403 Forbidden errors
  2. Audio Processing Errors: Make sure FFmpeg is installed and in your PATH
  3. Image Processing Errors: Make sure Tesseract OCR is installed and in your PATH
  4. GROQ API Rate Limits: The agent includes automatic rate limiting and retry mechanisms
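
For readers unfamiliar with retry mechanisms, the general pattern is to retry a failed call after an increasing delay. The sketch below shows generic exponential backoff; it is not the agent's actual implementation, and call_with_retry and its parameters are hypothetical names chosen for illustration.

```python
import time


def call_with_retry(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure, wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Broad catch for illustration only; real code would catch the
            # specific rate-limit error raised by the API client in use.
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * (2 ** attempt))
```

The sleep parameter exists so the delays can be replaced in tests; in normal use the default time.sleep applies.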

Running GAIA Tests

To test whether the agent correctly handles factual questions in the GAIA format:

python test_factual_questions.py

Testing Individual Tools

from agent import multiply, add, subtract, divide, modulus  # Math tools
from agent import web_search, wiki_search  # Web tools
from agent import read_text_from_pdf, read_text_from_docx  # Document tools
from agent import image_processing  # Image tools
from agent import transcribe_audio  # Audio tools
from agent import analyze_code, read_code_file  # Code tools

# Test a tool directly
result = multiply(5, 7)
print(result)  # 35

# Process an image
image_description = image_processing("Describe this image", "path/to/image.jpg")
print(image_description)