---
title: Agentic HF Analyzer
emoji: 🌍
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
short_description: Recommends which Repos/Spaces to look at
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🚀 HF Repo Analyzer

An AI-powered Hugging Face repository discovery and analysis tool that helps you find, evaluate, and explore the best repositories for your specific needs.

![HF Repo Analyzer](https://img.shields.io/badge/Powered%20by-Gradio-orange) ![Python](https://img.shields.io/badge/Python-3.8+-blue) ![Hugging Face](https://img.shields.io/badge/Hugging%20Face-Spaces-yellow)

## ✨ Features

- 🤖 **AI Assistant**: Intelligent conversation-based repository discovery
- 🔍 **Smart Search**: Auto-detection of repository IDs vs. keywords
- 📊 **Automated Analysis**: LLM-powered repository evaluation and ranking
- 🏆 **Top 3 Selection**: AI-curated list of the most relevant repositories
- 💬 **Repository Explorer**: Interactive chat with repository contents
- 🎯 **Requirements Extraction**: Automatic keyword extraction from conversations
- 📋 **Comprehensive Results**: Detailed analysis with strengths, weaknesses, and specialities

## 🚦 Quick Start

### Prerequisites

- Python 3.8+
- OpenAI API key (for LLM analysis)
- Hugging Face access (for repository downloads)

### Installation

1. **Clone the repository**
   ```bash
   git clone
   cd Agentic_HF_Analyzer
   ```
2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```
3. **Set up environment variables**
   ```bash
   export modal_api="your_openai_api_key"
   export base_url="your_openai_base_url"
   ```
4. **Run the application**
   ```bash
   python app.py
   ```
5. **Open your browser** to `http://localhost:7860`

## 📖 User Guide

### 🤖 Using the AI Assistant (Recommended)
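The two environment variables from the Quick Start (`modal_api` and `base_url`) are the only credentials the app needs. A minimal fail-fast loading sketch — the helper name `load_llm_config` is illustrative, not part of the codebase:

```python
import os

def load_llm_config() -> dict:
    """Read the LLM credentials the app expects; fail fast with a clear message.

    Illustrative helper: the variable names come from this README, but the
    function itself is not part of the codebase.
    """
    config = {
        "api_key": os.environ.get("modal_api"),
        "base_url": os.environ.get("base_url"),
    }
    missing = [name for name, value in
               [("modal_api", config["api_key"]), ("base_url", config["base_url"])]
               if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return config
```

Failing at startup with the missing variable named is friendlier than an opaque authentication error from the LLM endpoint later.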
1. **Start a Conversation**
   - Navigate to the "🤖 AI Assistant" tab
   - Describe your project: "I'm building a chatbot for customer service"
   - The AI will ask clarifying questions about your needs
2. **Automatic Discovery**
   - When the AI has enough information, it will automatically:
     - Extract relevant keywords from your conversation
     - Search for matching repositories
     - Analyze and rank them by relevance
3. **Review Results**
   - The interface automatically switches to "🔬 Analysis & Results"
   - View the top 3 most relevant repositories
   - Browse all analyzed repositories with detailed insights

### 📝 Using Smart Search (Direct Input)

1. **Repository IDs**
   ```
   microsoft/DialoGPT-medium
   openai/whisper
   huggingface/transformers
   ```
2. **Keywords**
   ```
   text generation
   image classification
   sentiment analysis
   ```
3. **Mixed Input**
   - The system automatically detects the input type
   - Repository IDs (containing `/`) are processed directly
   - Keywords trigger an automatic repository search

### 🔬 Analyzing Results

- **Top 3 Repositories**: AI-selected as most relevant based on your requirements
- **Detailed Analysis**: Strengths, weaknesses, specialities, and relevance ratings
- **Quick Actions**: Click repository names to visit or explore them
- **Repository Explorer**: Deep dive into individual repositories with AI chat

### 🔍 Repository Explorer

1. **Access Methods**:
   - Click "🔍 Open in Repo Explorer" from repository actions
   - Manually enter a repository ID in the Repo Explorer tab
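The Mixed Input behaviour described above can be sketched as a small splitter. This is a hypothetical helper for illustration (`split_input` is not in the codebase; the app's own detection, `is_repo_id_format`, is shown under Technical Architecture):

```python
import re

def split_input(text):
    """Split raw Smart Search input into repo IDs (contain '/') and keyword queries.

    Hypothetical helper: entries may be separated by newlines or commas,
    mirroring the input formats shown in the guide above.
    """
    items = [part.strip() for part in re.split(r"[\n,]+", text) if part.strip()]
    repo_ids = [item for item in items if "/" in item]
    keywords = [item for item in items if "/" not in item]
    return repo_ids, keywords
```

For example, `split_input("openai/whisper, text generation")` yields `(["openai/whisper"], ["text generation"])`.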
2. **Features**:
   - Automatic repository loading and analysis
   - Interactive chat about repository contents
   - File structure exploration
   - Code analysis and explanations

## 🛠️ Technical Architecture

### Core Components

```
app.py                  # Main Gradio interface and orchestration
├── analyzer.py         # Repository analysis and LLM processing
├── hf_utils.py         # Hugging Face API interactions
├── chatbot_page.py     # AI assistant conversation logic
└── repo_explorer.py    # Repository exploration interface
```

### Key Features Implementation

#### 🤖 AI Assistant

- **System Prompt**: Focused on requirements gathering, not recommendations
- **Auto-Extraction**: Detects when the conversation is ready for keyword extraction
- **Smart Processing**: Converts natural language into actionable search queries

#### 🔍 Smart Input Detection

```python
import re

def is_repo_id_format(text: str) -> bool:
    """Detect whether the input consists of repository IDs (with '/') rather than keywords."""
    lines = [line.strip() for line in re.split(r'[\n,]+', text) if line.strip()]
    slash_count = sum(1 for line in lines if '/' in line)
    # Treat the input as repo IDs when at least half of the entries contain a slash
    return slash_count >= len(lines) * 0.5
```

#### 🏆 LLM-Powered Repository Ranking

- **Model**: `Orion-zhen/Qwen2.5-Coder-7B-Instruct-AWQ`
- **Criteria**: Requirements matching, strengths, relevance rating, speciality alignment
- **Output**: JSON-formatted repository rankings

#### 📊 Analysis Pipeline

1. **Download**: Repository files (`.py`, `.md`, `.txt`)
2. **Combine**: Merge files into a single analyzable document
3. **Analyze**: LLM evaluation for strengths, weaknesses, specialities
4. **Rank**: Relevance scoring against the user's requirements
5. **Select**: Top 3 most relevant repositories

### Data Flow

```mermaid
graph TD
    A[User Input] --> B{Input Type?}
    B -->|Keywords| C[Repository Search]
    B -->|Repo IDs| D[Direct Processing]
    C --> E[Repository List]
    D --> E
    E --> F[Download & Analyze]
    F --> G[LLM Evaluation]
    G --> H[Ranking & Selection]
    H --> I[Results Display]
    I --> J[Repository Explorer]
```

### File Structure

```
📦 Agentic_HF_Analyzer/
├── 📄 app.py              # Main application
├── 📄 analyzer.py         # Repository analysis logic
├── 📄 hf_utils.py         # Hugging Face utilities
├── 📄 chatbot_page.py     # AI assistant functionality
├── 📄 repo_explorer.py    # Repository exploration
├── 📄 requirements.txt    # Python dependencies
├── 📄 README.md           # Documentation
├── 📄 repo_ids.csv        # Analysis results storage
└── 📁 repo_files/         # Temporary repository downloads
```

### Dependencies

```
gradio>=4.0.0            # Web interface framework
pandas>=1.5.0            # Data manipulation
regex>=2022.0.0          # Advanced regex operations
openai>=1.0.0            # LLM API access
huggingface_hub>=0.16.0  # HF repository access
requests>=2.28.0         # HTTP requests
```

### Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `modal_api` | OpenAI API key for LLM analysis | ✅ |
| `base_url` | OpenAI API base URL | ✅ |

### LLM Integration

#### Analysis Prompt Structure

```python
ANALYSIS_PROMPT = """
Analyze this repository and provide:
1. Strengths and capabilities
2. Potential weaknesses or limitations
3. Primary speciality/use case
4. Relevance rating for: {user_requirements}

Return valid JSON with: strength, weaknesses, speciality, relevance rating
"""
```

#### Repository Ranking System

- **Input**: User requirements + repository analysis data
- **Processing**: The LLM evaluates relevance and ranks the repositories
- **Output**: The top 3 most relevant repositories, in order

### UI Components

#### Modern Design Features

- **Gradient Backgrounds**: Linear gradients for visual appeal
- **Glassmorphism**: Backdrop blur effects for a modern look
- **Responsive Layout**: Adapts to different screen sizes
- **Interactive Elements**: Hover effects and smooth transitions
- **Modal System**: Repository action selection popups

#### Tab Organization

1. **🤖 AI Assistant**: Conversation-based discovery
2. **📝 Smart Search**: Direct input processing
3. **🔬 Analysis & Results**: Comprehensive analysis display
4. **🔍 Repo Explorer**: Interactive repository exploration

### Advanced Features

#### Auto-Navigation

- Automatic tab switching based on workflow state
- Smooth scrolling to the top on tab changes
- Progressive disclosure of information

#### Error Handling

- Graceful fallbacks for LLM failures
- CSV update retry mechanisms
- User-friendly error messages

#### Performance Optimizations

- Parallel processing for multiple repositories
- Progress tracking for long operations
- Efficient file caching and cleanup

## 🔧 Configuration

### Customizing Analysis

- Modify `CHATBOT_SYSTEM_PROMPT` for different assistant behavior
- Adjust repository search limits in `search_top_spaces()`
- Configure analysis criteria in `get_top_relevant_repos()`

### Adding File Types

```python
# In analyzer.py: pass extra extensions to include more file types in the analysis
download_filtered_space_files(
    repo_id,
    local_dir="repo_files",
    file_extensions=['.py', '.md', '.txt', '.js', '.ts']  # Add more as needed
)
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Implement your changes
4. Add tests if applicable
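Both the analysis and ranking calls expect JSON back from the model, and real replies sometimes arrive wrapped in prose or code fences — which is where the graceful fallbacks mentioned under Error Handling come in. A defensive parsing sketch (the helper name is illustrative, not the codebase's actual implementation):

```python
import json
import re

def parse_llm_json(reply):
    """Extract the first JSON object from an LLM reply.

    Illustrative fallback parser: tolerates replies wrapped in prose or
    code fences by locating the outermost braces before decoding.
    """
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in LLM reply")
    return json.loads(match.group(0))
```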
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- **Gradio**: For the amazing web interface framework
- **Hugging Face**: For the incredible repository ecosystem
- **OpenAI**: For powerful language model capabilities

---

Built with ❤️ for the open source community

🚀 Happy repository hunting! 🚀