---
title: CV to CSV Extraction App
emoji: 📄
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# CV to CSV Extraction App

A Gradio application that extracts publications, talks, and other scholarly accomplishments from faculty CVs (PDFs) using Google's Gemini API.

## Features

- Extract scholarly accomplishments from faculty CVs in PDF format
- Categorize accomplishments into different types (books, journal articles, conference presentations, etc.)
- Display results in a tabular format
- Download results as CSV
- Password protection using Hugging Face secrets

## Installation

1. Clone this repository:
   ```
   git clone <repository-url>
   cd CV_to_CSV
   ```

2. Install the required dependencies:
   ```
   pip install -r requirements.txt
   ```

3. Create a `.env` file in the root directory with your Google API key:
   ```
   GOOGLE_API_KEY=your_google_api_key_here
   ```

## Usage

### Running Locally

1. Run the application:
   ```
   python cv_extraction_app.py
   ```

2. Open your browser and navigate to `http://localhost:7860`
3. Enter the password (if set in the environment variable `APP_PASSWORD`)
4. Upload one or more faculty CV PDFs and click "Extract Accomplishments"
5. View the extracted accomplishments and download as CSV if desired

### Deploying on Hugging Face Spaces

1. Create a new Space on Hugging Face Spaces with the Gradio SDK
2. Upload your code to the Space
3. Set up the following secrets in your Space settings:
   - `GOOGLE_API_KEY`: Your Google Gemini API key
   - `APP_PASSWORD`: The password you want to use for app authentication

## How It Works

1. **Authentication**: The app checks if the provided password matches the one stored in the environment variable `APP_PASSWORD`
2. **PDF Processing**: The app extracts text from uploaded PDF files using PyPDF2
3. **LLM Processing**: The extracted text is sent to Google's Gemini API to identify faculty names and extract scholarly accomplishments
4. **Categorization**: Accomplishments are categorized into different types based on a decision tree approach
5. **Results Display**: The extracted accomplishments are displayed in a tabular format and can be downloaded as CSV

## Customization

### Changing the Password

To change the password, update the `APP_PASSWORD` environment variable:

- Locally: Modify the `.env` file
- On Hugging Face Spaces: Update the secret in the Space settings

### Modifying Categories

To modify the categories of scholarly accomplishments, edit the `MAIN_CATEGORIES` and `SCHOLARLY_WORK_TYPES` lists in `cv_extraction_app.py`.

## Troubleshooting

- **API Key Issues**: Ensure your Google API key is correctly set in the environment variables
- **PDF Extraction Errors**: Some PDFs may be password-protected or have security settings that prevent text extraction
- **LLM Processing Errors**: If the LLM fails to extract accomplishments, try adjusting the prompt or model parameters

## License

This project is licensed under the MIT License.
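
## Appendix: Illustrative Sketch

The authentication and CSV download steps described under "How It Works" could be sketched roughly as follows, using only the Python standard library. This is not the code in `cv_extraction_app.py`: the helper names `check_password` and `rows_to_csv` and the column names in `FIELDS` are hypothetical, and treating an unset `APP_PASSWORD` as "no password required" is an assumption based on the usage notes above.

```python
import csv
import hmac
import io
import os

# Hypothetical column names; the real app may use different ones.
FIELDS = ["faculty_name", "category", "description"]


def check_password(provided: str) -> bool:
    """Compare the provided password with the APP_PASSWORD environment variable.

    Assumption: when APP_PASSWORD is unset or empty, no password is required.
    """
    expected = os.environ.get("APP_PASSWORD", "")
    if not expected:
        return True
    # compare_digest avoids leaking match position through timing.
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize extracted accomplishments (one dict per row) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The string returned by `rows_to_csv` can be written to a temporary file and handed to whatever download component the interface uses.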