---
title: AI-driven Candidate Matcher
emoji: 🎯
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
license: mit
---

# AI-driven Candidate Matcher

An AI-powered resume screening application that matches resumes against a job description using a 5-stage pipeline: embedding-based recall, cross-encoder re-ranking, BM25 keyword matching, LLM intent analysis, and combined scoring.

## 🚀 Features

- **5-Stage Advanced Pipeline**: Multi-layered approach combining semantic similarity, keyword matching, and AI intent analysis
- **State-of-the-Art Models**: Uses BAAI/bge-large-en-v1.5 embeddings and Cross-Encoder re-ranking
- **FAISS Integration**: Lightning-fast similarity search for large resume collections
- **AI Intent Analysis**: Qwen3-1.7B model analyzes candidate job-seeking intent
- **Multi-format Support**: Processes PDF, DOCX, TXT, and CSV files
- **Interactive Visualizations**: Comprehensive score breakdowns and comparative analysis
- **Batch Processing**: Upload and analyze multiple resumes simultaneously
- **Export Results**: Download detailed analysis as CSV

## 🔧 How It Works

### 5-Stage Advanced Pipeline

1. **FAISS Recall (Top 50)**: Initial semantic similarity search using BAAI/bge-large-en-v1.5 embeddings
2. **Cross-Encoder Re-ranking (Top 20)**: Deep semantic relevance scoring with ms-marco-MiniLM-L6-v2
3. **BM25 Keyword Matching**: Traditional keyword-based scoring for skill alignment
4. **LLM Intent Analysis**: Qwen3-1.7B analyzes candidate suitability and job-seeking intent
5. **Combined Scoring**: Weighted combination of all scores for final ranking
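
The first two stages form a recall-then-re-rank funnel: a cheap similarity search narrows the pool, and a costlier scorer orders what remains. The following is a minimal sketch of that shape in plain Python, not the application's code. It assumes embeddings are already computed, uses brute-force cosine similarity in place of FAISS, and takes a hypothetical `rerank_score` callable in place of the cross-encoder; the names `funnel`, `recall_k`, and `rerank_k` are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def funnel(
    query_vec: Sequence[float],
    resume_vecs: List[Sequence[float]],
    rerank_score: Callable[[int], float],  # hypothetical stand-in for the cross-encoder
    recall_k: int = 50,
    rerank_k: int = 20,
) -> List[Tuple[int, float]]:
    """Return (resume_index, rerank_score) pairs for the re-ranked shortlist."""
    # Stage 1: recall the recall_k nearest resumes by embedding similarity.
    recalled = sorted(
        range(len(resume_vecs)),
        key=lambda i: cosine(query_vec, resume_vecs[i]),
        reverse=True,
    )[:recall_k]
    # Stage 2: score only the recalled candidates with the costlier re-ranker.
    reranked = sorted(
        ((i, rerank_score(i)) for i in recalled),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return reranked[:rerank_k]
```

Because the re-ranker only sees the recalled subset, its cost is bounded by `recall_k` regardless of how many resumes are uploaded.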

### Scoring Formula
**Final Score = Cross-Encoder (0-0.7) + BM25 (0.1-0.2) + Intent (0-0.1)**
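
One way to read the formula is that each component is first normalized to [0, 1] and then mapped linearly onto its quoted range. That normalization is an assumption, as is the function name `combined_score`; the README only states the ranges. A minimal sketch under that reading:

```python
def combined_score(cross_encoder: float, bm25: float, intent: float) -> float:
    """Combine three scores, each assumed normalized to [0, 1], into a final score."""

    def clamp(x: float) -> float:
        # Guard against inputs that fall slightly outside [0, 1].
        return max(0.0, min(1.0, x))

    ce_part = 0.7 * clamp(cross_encoder)  # contributes 0.0 to 0.7
    bm25_part = 0.1 + 0.1 * clamp(bm25)   # contributes 0.1 to 0.2
    intent_part = 0.1 * clamp(intent)     # contributes 0.0 to 0.1
    return ce_part + bm25_part + intent_part
```

Under this reading the final score ranges from 0.1 to 1.0, since the BM25 component has a floor of 0.1.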

### Input & Output
- **Input**: Job description + Resume files (PDF/DOCX/TXT/CSV)
- **Output**: Ranked candidates with detailed score breakdowns and AI explanations

## 🤖 Technical Details

### Models Used
- **BAAI/bge-large-en-v1.5**: Advanced embedding model for semantic similarity
- **cross-encoder/ms-marco-MiniLM-L6-v2**: Deep re-ranking for relevance scoring
- **Qwen3-1.7B**: Large language model for intent analysis and explanations

### Key Libraries
- **FAISS**: Facebook AI Similarity Search for efficient vector operations
- **Sentence Transformers**: For embedding generation and cross-encoding
- **rank_bm25**: BM25 algorithm implementation for keyword matching
- **Streamlit**: Interactive web interface
- **PyTorch**: Deep learning framework
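
For readers unfamiliar with BM25, the keyword-matching stage can be illustrated with a small pure-Python version of the Okapi BM25 formula. This is a sketch for intuition only: it is not the `rank_bm25` implementation, it uses a non-negative IDF variant, and the function name `bm25_scores` and the defaults `k1=1.5`, `b=0.75` are assumptions.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str], docs: List[List[str]], k1: float = 1.5, b: float = 0.75
) -> List[float]:
    """Score each tokenized document against the tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            # Non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
            n = doc_freq[term]
            idf = math.log(1.0 + (n_docs - n + 0.5) / (n + 0.5))
            # Longer-than-average documents are penalized in proportion to b.
            length_norm = 1.0 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores
```

A resume that repeats a queried skill scores higher than one that mentions it once, and a resume that never mentions it scores zero for that term.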

## 📊 Configuration Options

The sidebar provides several customization options:
- **Results Count**: Choose how many top candidates to display (1-5)
- **Pipeline Visualization**: Real-time progress through the 5-stage pipeline
- **Score Breakdown**: Detailed view of individual scoring components

## 🚀 Getting Started

### Online Usage
1. Visit the application
2. Enter a comprehensive job description
3. Upload resume files or CSV dataset
4. Click "Advanced Pipeline Analysis"
5. Review ranked candidates with detailed insights

### Local Installation

```bash
git clone <repository-url>
cd Resume_Screener_and_Skill_Extractor
pip install -r requirements.txt
streamlit run app.py
```

### Requirements
- Python 3.8+
- CUDA-compatible GPU (optional, for faster processing)
- Minimum 8GB RAM recommended

## 📋 Supported File Formats

- **PDF**: Extracted using pdfplumber with PyPDF2 fallback
- **DOCX**: Microsoft Word documents
- **TXT**: Plain text files
- **CSV**: Structured datasets with resume text columns
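
Format handling of this kind usually comes down to dispatching on the file extension. The sketch below covers only the two formats that need nothing beyond the standard library (TXT and CSV); PDF and DOCX extraction are left out. The function name `load_resume_texts` and the column name `resume_text` are hypothetical, since the README does not say which CSV column holds the resume text.

```python
import csv
from pathlib import Path
from typing import List


def load_resume_texts(path: str, text_column: str = "resume_text") -> List[str]:
    """Return resume texts from a TXT or CSV file, dispatching on the extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        # One plain-text file is treated as one resume.
        return [Path(path).read_text(encoding="utf-8", errors="replace")]
    if suffix == ".csv":
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # Skip rows where the (assumed) text column is missing or empty.
            return [row[text_column] for row in reader if row.get(text_column)]
    # PDF and DOCX need third-party extractors and are left out of this sketch.
    raise ValueError(f"Unsupported file type: {suffix or path}")
```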

## 🔒 Privacy & Security

### Data Privacy Statement

**Your privacy is our top priority. We are committed to protecting the confidentiality of all resume data processed through this application.**

#### Data Handling
- **No Data Storage**: Resume content is processed in memory only and never stored permanently
- **Session-Based**: All data is cleared when you close the browser or reset the application
- **Local Processing**: All AI analysis happens locally within the application environment
- **No External Transmission**: Resume data is never sent to external services or third parties

#### Security Measures
- **Temporary Files**: Uploaded files are processed in secure temporary locations and immediately deleted
- **Memory Management**: Automatic cleanup of resume data from system memory
- **No Logging**: Resume content is never logged or cached
- **Secure Processing**: All text extraction and analysis occurs within isolated processing environments
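
A common way to get the temporary-file behavior described above is to scope the file to a context manager so that cleanup happens even if extraction fails. The following is a minimal sketch using the standard `tempfile` module, not the application's actual code; `process_upload` and the `extract` callable are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Callable


def process_upload(data: bytes, filename: str, extract: Callable[[Path], str]) -> str:
    """Write uploaded bytes to a temporary directory, extract text, then clean up."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Keep only the base name so a crafted filename cannot escape tmp_dir.
        tmp_path = Path(tmp_dir) / Path(filename).name
        tmp_path.write_bytes(data)
        text = extract(tmp_path)
    # The directory and the file inside it are removed when the with-block exits.
    return text
```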

#### User Control
- **Clear Data Options**: Multiple options to clear resume data and free memory
- **Session Management**: Complete control over when and how data is processed
- **Transparent Processing**: Full visibility into what data is being analyzed

**We recommend reviewing your organization's data handling policies before uploading sensitive resume information.**

## 📈 Performance Metrics

- **Accuracy**: Advanced multi-stage pipeline ensures high-quality candidate ranking
- **Speed**: FAISS indexing enables sub-second search across thousands of resumes
- **Scalability**: Efficient memory management for large resume datasets
- **Reliability**: Fallback models ensure consistent operation

## 🔮 Future Enhancements

- **Multi-language Support**: Extend to non-English resumes and job descriptions
- **Custom Scoring Weights**: User-configurable importance of different scoring components
- **Advanced Skill Extraction**: Enhanced NLP for technical skill identification
- **Integration APIs**: Connect with ATS and HR management systems
- **Batch Job Processing**: Queue-based processing for large-scale screening

## 📄 License

MIT License - See LICENSE file for details

## 🤝 Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.

---

*Built with ❤️ using Streamlit, Transformers, and FAISS*

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference