---
title: AI-driven Candidate Matcher
emoji: 🎯
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---
# AI-driven Candidate Matcher
A resume screening application that ranks resumes against a job description using a 5-stage pipeline: embedding-based recall, cross-encoder re-ranking, BM25 keyword matching, LLM intent analysis, and combined scoring.
## Features
- **5-Stage Advanced Pipeline**: Multi-layered approach combining semantic similarity, keyword matching, and AI intent analysis
- **State-of-the-Art Models**: Uses BAAI/bge-large-en-v1.5 embeddings and Cross-Encoder re-ranking
- **FAISS Integration**: Lightning-fast similarity search for large resume collections
- **AI Intent Analysis**: Qwen3-1.7B model analyzes candidate job-seeking intent
- **Multi-format Support**: Processes PDF, DOCX, TXT, and CSV files
- **Interactive Visualizations**: Comprehensive score breakdowns and comparative analysis
- **Batch Processing**: Upload and analyze multiple resumes simultaneously
- **Export Results**: Download detailed analysis as CSV
## How It Works
### 5-Stage Advanced Pipeline
1. **FAISS Recall (Top 50)**: Initial semantic similarity search using BAAI/bge-large-en-v1.5 embeddings
2. **Cross-Encoder Re-ranking (Top 20)**: Deep semantic relevance scoring with ms-marco-MiniLM-L6-v2
3. **BM25 Keyword Matching**: Traditional keyword-based scoring for skill alignment
4. **LLM Intent Analysis**: Qwen3-1.7B analyzes candidate suitability and job-seeking intent
5. **Combined Scoring**: Weighted combination of all scores for final ranking
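
The first two stages form a recall-then-re-rank funnel: a cheap similarity search narrows the pool, and a more expensive scorer orders the survivors. The sketch below illustrates that shape in plain Python and is not the application's code. Brute-force cosine similarity stands in for the FAISS index, and the `rerank_score` callback is a hypothetical placeholder for the cross-encoder; the function names and the toy vectors are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_then_rerank(
    query_vec: Vector,
    resume_vecs: List[Vector],
    rerank_score: Callable[[int], float],
    recall_k: int = 50,
    rerank_k: int = 20,
) -> List[Tuple[int, float]]:
    """Return (resume_index, rerank_score) pairs for the final shortlist."""
    # Stage 1: keep the recall_k resumes closest to the job description.
    candidates = sorted(
        range(len(resume_vecs)),
        key=lambda i: cosine(query_vec, resume_vecs[i]),
        reverse=True,
    )[:recall_k]
    # Stage 2: score only the survivors with the more expensive model.
    rescored = [(i, rerank_score(i)) for i in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:rerank_k]


# Toy data: three 2-dimensional "embeddings" and a made-up re-rank score.
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
shortlist = recall_then_rerank(
    [1.0, 0.0], vecs, rerank_score=lambda i: float(i), recall_k=2, rerank_k=1
)
```

Because the re-ranker only sees `recall_k` candidates, its cost stays bounded no matter how many resumes are uploaded.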
### Scoring Formula
**Final Score = Cross-Encoder (0-0.7) + BM25 (0.1-0.2) + Intent (0-0.1)**
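
One way to read this formula is that each component is first normalized to [0, 1] and then mapped linearly onto its stated range before the three are summed. That mapping is an assumption; the README does not say how the application scales each score. Under that assumption, a minimal sketch with hypothetical names looks like this:

```python
# Contribution ranges taken from the formula above: (minimum, maximum).
RANGES = {
    "cross_encoder": (0.0, 0.7),
    "bm25": (0.1, 0.2),
    "intent": (0.0, 0.1),
}


def scale(value: float, low: float, high: float) -> float:
    """Map a normalized score in [0, 1] linearly onto [low, high]."""
    clamped = min(1.0, max(0.0, value))
    return low + clamped * (high - low)


def final_score(cross_encoder: float, bm25: float, intent: float) -> float:
    """Sum the three scaled components; inputs are assumed to lie in [0, 1]."""
    parts = {"cross_encoder": cross_encoder, "bm25": bm25, "intent": intent}
    return sum(scale(parts[name], *RANGES[name]) for name in RANGES)
```

With these ranges the total always falls between 0.1 and 1.0, and the cross-encoder accounts for up to 70% of the maximum score.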
### Input & Output
- **Input**: Job description + Resume files (PDF/DOCX/TXT/CSV)
- **Output**: Ranked candidates with detailed score breakdowns and AI explanations
## Technical Details
### Models Used
- **BAAI/bge-large-en-v1.5**: Advanced embedding model for semantic similarity
- **Cross-Encoder/ms-marco-MiniLM-L6-v2**: Deep re-ranking for relevance scoring
- **Qwen3-1.7B**: Large language model for intent analysis and explanations
### Key Libraries
- **FAISS**: Facebook AI Similarity Search for efficient vector operations
- **Sentence Transformers**: For embedding generation and cross-encoding
- **rank_bm25**: BM25 algorithm implementation for keyword matching
- **Streamlit**: Interactive web interface
- **PyTorch**: Deep learning framework
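
To show what the keyword stage measures, here is a small pure-Python Okapi BM25 scorer. It is an illustrative sketch, not the `rank_bm25` implementation the application uses: the whitespace tokenization, the smoothed IDF variant, and the `k1` and `b` values are assumptions chosen for readability.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str], docs: List[List[str]], k1: float = 1.5, b: float = 0.75
) -> List[float]:
    """Score every tokenized document against the tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Smoothed IDF variant that never goes negative.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Longer-than-average documents are penalized in proportion to b.
            length_norm = 1.0 - b + b * len(d) / avg_len
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


resumes = [
    "python developer with faiss and pytorch experience".split(),
    "sales manager with retail experience".split(),
]
scores = bm25_scores("python pytorch".split(), resumes)
```

Here the first resume receives a positive score because it contains both query terms, while the second scores zero because it contains neither.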
## Configuration Options
The sidebar provides several customization options:
- **Results Count**: Choose how many top candidates to display (1-5)
- **Pipeline Visualization**: Real-time progress through the 5-stage pipeline
- **Score Breakdown**: Detailed view of individual scoring components
## Getting Started
### Online Usage
1. Visit the application
2. Enter a comprehensive job description
3. Upload resume files or CSV dataset
4. Click "Advanced Pipeline Analysis"
5. Review ranked candidates with detailed insights
### Local Installation
```bash
git clone <repository-url>
cd Resume_Screener_and_Skill_Extractor
pip install -r requirements.txt
streamlit run app.py
```
### Requirements
- Python 3.8+
- CUDA-compatible GPU (optional, for faster processing)
- Minimum 8GB RAM recommended
## Supported File Formats
- **PDF**: Extracted using pdfplumber with PyPDF2 fallback
- **DOCX**: Microsoft Word documents
- **TXT**: Plain text files
- **CSV**: Structured datasets with resume text columns
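
The sketch below shows one way to dispatch on file extension with the pdfplumber-then-PyPDF2 fallback described above. It is not the application's loader. The function names, the `resume_text` column name, and the use of `python-docx` for DOCX files are assumptions; the README does not name the DOCX library or the CSV column. Third-party imports are deferred so the TXT and CSV paths work without them.

```python
import csv
from pathlib import Path
from typing import List


def extract_pdf_text(path: Path) -> str:
    """Try pdfplumber first; fall back to PyPDF2 if it is missing or fails."""
    try:
        import pdfplumber

        with pdfplumber.open(str(path)) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    except Exception:
        from PyPDF2 import PdfReader

        reader = PdfReader(str(path))
        return "\n".join(page.extract_text() or "" for page in reader.pages)


def load_resumes(path: Path, text_column: str = "resume_text") -> List[str]:
    """Return the resume texts found in one file (a CSV may hold several)."""
    suffix = path.suffix.lower()
    if suffix == ".txt":
        return [path.read_text(encoding="utf-8", errors="replace")]
    if suffix == ".csv":
        # text_column is a hypothetical column name; adjust it to the dataset.
        with path.open(newline="", encoding="utf-8") as handle:
            return [row[text_column] for row in csv.DictReader(handle) if row.get(text_column)]
    if suffix == ".pdf":
        return [extract_pdf_text(path)]
    if suffix == ".docx":
        from docx import Document  # python-docx, assumed here

        return ["\n".join(p.text for p in Document(str(path)).paragraphs)]
    raise ValueError(f"Unsupported file type: {suffix}")
```

Returning a list from every branch lets the caller treat a single PDF and a multi-row CSV the same way.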
## Privacy & Security
### Data Privacy Statement
**Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data processed through this application.**
#### Data Handling
- **No Data Storage**: Resume content is processed in memory only and never stored permanently
- **Session-Based**: All data is cleared when you close the browser or reset the application
- **Local Processing**: All AI analysis happens locally within the application environment
- **No External Transmission**: Resume data is never sent to external services or third parties
#### Security Measures
- **Temporary Files**: Uploaded files are processed in secure temporary locations and immediately deleted
- **Memory Management**: Automatic cleanup of resume data from system memory
- **No Logging**: Resume content is never logged or cached
- **Secure Processing**: All text extraction and analysis occurs within isolated processing environments
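
The temporary-file behavior described above can be sketched with the standard library alone. This is a minimal illustration of the pattern, not the application's code; `temporary_upload` is a hypothetical helper name.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_upload(data: bytes, suffix: str = "") -> Iterator[str]:
    """Write uploaded bytes to a temp file and delete it when the block exits."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path
    finally:
        # Runs even if processing raises, so the file never outlives the block.
        if os.path.exists(path):
            os.remove(path)


with temporary_upload(b"Jane Doe - Data Engineer", suffix=".txt") as path:
    with open(path, "rb") as handle:
        content = handle.read()
```

Putting the cleanup in a `finally` clause means a failed extraction still removes the uploaded file.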
#### User Control
- **Clear Data Options**: Multiple options to clear resume data and free memory
- **Session Management**: Complete control over when and how data is processed
- **Transparent Processing**: Full visibility into what data is being analyzed
**We recommend reviewing your organization's data handling policies before uploading sensitive resume information.**
## Performance Metrics
- **Accuracy**: Each stage refines the previous stage's shortlist, so the final ranking combines semantic, keyword, and intent signals
- **Speed**: FAISS indexing enables sub-second search across thousands of resumes
- **Scalability**: Efficient memory management for large resume datasets
- **Reliability**: Fallback models ensure consistent operation
## Future Enhancements
- **Multi-language Support**: Extend to non-English resumes and job descriptions
- **Custom Scoring Weights**: User-configurable importance of different scoring components
- **Advanced Skill Extraction**: Enhanced NLP for technical skill identification
- **Integration APIs**: Connect with ATS and HR management systems
- **Batch Job Processing**: Queue-based processing for large-scale screening
## License
MIT License - See LICENSE file for details
## Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.
---
*Built with ❤️ using Streamlit, Transformers, and FAISS*
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference