---
title: AI-driven Candidate Matcher
emoji: 🎯
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---

# AI-driven Candidate Matcher

An AI-powered resume screening application that ranks resumes against a job description using a 5-stage pipeline: embedding-based recall, cross-encoder re-ranking, BM25 keyword matching, LLM intent analysis, and a combined final score.

## 🚀 Features

- **5-Stage Advanced Pipeline**: Multi-layered approach combining semantic similarity, keyword matching, and AI intent analysis
- **State-of-the-Art Models**: Uses BAAI/bge-large-en-v1.5 embeddings and Cross-Encoder re-ranking
- **FAISS Integration**: Lightning-fast similarity search for large resume collections
- **AI Intent Analysis**: Qwen3-1.7B model analyzes candidate job-seeking intent
- **Multi-format Support**: Processes PDFs, DOCX, TXT, and CSV files
- **Interactive Visualizations**: Comprehensive score breakdowns and comparative analysis
- **Batch Processing**: Upload and analyze multiple resumes simultaneously
- **Export Results**: Download detailed analysis as CSV

## 🔧 How It Works

### 5-Stage Advanced Pipeline

1. **FAISS Recall (Top 50)**: Initial semantic similarity search using BAAI/bge-large-en-v1.5 embeddings
2. **Cross-Encoder Re-ranking (Top 20)**: Deep semantic relevance scoring with ms-marco-MiniLM-L6-v2
3. **BM25 Keyword Matching**: Traditional keyword-based scoring for skill alignment
4. **LLM Intent Analysis**: Qwen3-1.7B analyzes candidate suitability and job-seeking intent
5. **Combined Scoring**: Weighted combination of all scores for final ranking
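
The funnel in stages 1 and 2 (recall a broad set cheaply, then re-score only those candidates with a more expensive model) can be illustrated with a minimal sketch. This is not the application's code: a brute-force cosine search stands in for the FAISS index, and `rerank_score` is a hypothetical callable standing in for the cross-encoder. Only the cut-offs of 50 and 20 come from the list above.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_then_rerank(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    rerank_score: Callable[[int], float],  # hypothetical stand-in for the cross-encoder
    recall_k: int = 50,
    rerank_k: int = 20,
) -> List[Tuple[int, float]]:
    """Return (doc_index, rerank_score) pairs for the best rerank_k documents."""
    # Stage 1: brute-force similarity search (stand-in for a FAISS index lookup).
    recalled = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )[:recall_k]
    # Stage 2: apply the expensive scorer only to the recalled candidates.
    rescored = [(i, rerank_score(i)) for i in recalled]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:rerank_k]
```

A document dropped in stage 1 never reaches stage 2, so the recall cut-off bounds what the re-ranker can recover.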

### Scoring Formula
**Final Score = Cross-Encoder (0-0.7) + BM25 (0.1-0.2) + Intent (0-0.1)**
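
As a worked illustration of the formula, the sketch below sums three components after clamping each to its stated range. It assumes the components arrive already scaled; how the application maps raw model outputs into these ranges is not described here, and `clamp` and `final_score` are hypothetical names.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def final_score(cross_encoder: float, bm25: float, intent: float) -> float:
    """Sum the three components, each clamped to the range given in the formula.

    Assumption: inputs are already scaled; the scaling step itself is not shown.
    """
    return (
        clamp(cross_encoder, 0.0, 0.7)
        + clamp(bm25, 0.1, 0.2)
        + clamp(intent, 0.0, 0.1)
    )
```

With these ranges the final score lies between 0.1 and 1.0; for example, components of 0.42, 0.15, and 0.05 sum to about 0.62.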

### Input & Output
- **Input**: Job description + Resume files (PDF/DOCX/TXT/CSV)
- **Output**: Ranked candidates with detailed score breakdowns and AI explanations

## 🤖 Technical Details

### Models Used
- **BAAI/bge-large-en-v1.5**: Advanced embedding model for semantic similarity
- **Cross-Encoder/ms-marco-MiniLM-L6-v2**: Deep re-ranking for relevance scoring
- **Qwen3-1.7B**: Compact 1.7B-parameter language model for intent analysis and explanations

### Key Libraries
- **FAISS**: Facebook AI Similarity Search for efficient vector operations
- **Sentence Transformers**: For embedding generation and cross-encoding
- **rank_bm25**: BM25 algorithm implementation for keyword matching
- **Streamlit**: Interactive web interface
- **PyTorch**: Deep learning framework
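
To make the keyword-matching stage concrete, here is a dependency-free sketch of Okapi BM25 scoring. It is a textbook variant (with `+ 1` inside the IDF logarithm so the weights stay non-negative) and is not guaranteed to match the numbers produced by `rank_bm25`; the function name, tokenization, and the `k1` and `b` defaults are assumptions for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str], docs: List[List[str]], k1: float = 1.5, b: float = 0.75
) -> List[float]:
    """Score each tokenized document against the tokenized query with Okapi BM25."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in docs) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq: Counter = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(doc) / avg_len) if avg_len else k1
        score = 0.0
        for term in query:
            freq = term_freq.get(term, 0)
            if freq == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            score += idf * freq * (k1 + 1) / (freq + norm)
        scores.append(score)
    return scores
```

A resume that mentions more of the job description's terms, or rarer ones, scores higher, while a resume that shares no terms with the query scores zero.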

## 📊 Configuration Options

The sidebar provides several customization options:
- **Results Count**: Choose how many top candidates to display (1-5)
- **Pipeline Visualization**: Real-time progress through the 5-stage pipeline
- **Score Breakdown**: Detailed view of individual scoring components

## 🚀 Getting Started

### Online Usage
1. Visit the application
2. Enter a comprehensive job description
3. Upload resume files or a CSV dataset
4. Click "Advanced Pipeline Analysis"
5. Review ranked candidates with detailed insights

### Local Installation

```bash
git clone <repository-url>
cd Resume_Screener_and_Skill_Extractor
pip install -r requirements.txt
streamlit run app.py
```

### Requirements
- Python 3.8+
- CUDA-compatible GPU (optional, for faster processing)
- Minimum 8GB RAM recommended

## 📋 Supported File Formats

- **PDF**: Extracted using pdfplumber with PyPDF2 fallback
- **DOCX**: Microsoft Word documents
- **TXT**: Plain text files
- **CSV**: Structured datasets with resume text columns
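
The primary-then-fallback pattern for PDFs and the CSV column lookup can be sketched as follows. The extractors are passed in as plain callables so the example does not depend on pdfplumber or PyPDF2, and the column name `resume_text` is hypothetical. Whether the application falls back only on errors or also on empty output is not stated; this sketch does both.

```python
import csv
import io
from typing import Callable, List


def extract_with_fallback(
    path: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Try the primary extractor; use the fallback if it raises or returns no text."""
    try:
        text = primary(path)
        if text and text.strip():
            return text
    except Exception:
        pass  # Assumption: any primary failure triggers the fallback.
    return fallback(path)


def resumes_from_csv(csv_text: str, column: str = "resume_text") -> List[str]:
    """Collect non-empty resume texts from one column of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    texts = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            texts.append(value)
    return texts
```

In practice `primary` and `fallback` would wrap the two PDF libraries named above, each taking a file path and returning the extracted text.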

## 🔒 Privacy & Security

### Data Privacy Statement

**Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data processed through this application.**

#### Data Handling
- **No Data Storage**: Resume content is processed in memory only and never stored permanently
- **Session-Based**: All data is cleared when you close the browser or reset the application
- **Local Processing**: All AI analysis happens locally within the application environment
- **No External Transmission**: Resume data is never sent to external services or third parties

#### Security Measures
- **Temporary Files**: Uploaded files are written to temporary locations and deleted immediately after processing
- **Memory Management**: Automatic cleanup of resume data from system memory
- **No Logging**: Resume content is never logged or cached
- **Secure Processing**: All text extraction and analysis occurs within isolated processing environments
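
One common way to get the temporary-file behavior described above is to delete the file in a `finally` block, so it is removed even when extraction fails. The sketch below uses only the standard library; `process_upload` and its `extractor` argument are hypothetical names, not the application's API.

```python
import os
import tempfile
from typing import Callable


def process_upload(data: bytes, suffix: str, extractor: Callable[[str], str]) -> str:
    """Write uploaded bytes to a temp file, extract text, and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # Close the handle before extraction so the extractor can reopen the path.
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return extractor(path)
    finally:
        # Runs on success and on error, so no upload is left on disk.
        os.remove(path)
```

The returned text still lives in process memory until the caller drops it; this pattern only covers the on-disk copy.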

#### User Control
- **Clear Data Options**: Multiple options to clear resume data and free memory
- **Session Management**: Complete control over when and how data is processed
- **Transparent Processing**: Full visibility into what data is being analyzed

**We recommend reviewing your organization's data handling policies before uploading sensitive resume information.**

## 📈 Performance Metrics

- **Accuracy**: The multi-stage pipeline combines semantic, keyword, and intent signals rather than relying on a single score
- **Speed**: FAISS indexing enables sub-second search across thousands of resumes
- **Scalability**: Efficient memory management for large resume datasets
- **Reliability**: Fallback models ensure consistent operation

## 🔮 Future Enhancements

- **Multi-language Support**: Extend to non-English resumes and job descriptions
- **Custom Scoring Weights**: User-configurable importance of different scoring components
- **Advanced Skill Extraction**: Enhanced NLP for technical skill identification
- **Integration APIs**: Connect with ATS and HR management systems
- **Batch Job Processing**: Queue-based processing for large-scale screening

## 📄 License

MIT License - See LICENSE file for details

## 🤝 Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.

---

*Built with ❤️ using Streamlit, Transformers, and FAISS*

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference