Space: awacke1/GPT-4o-omni-text-audio-image-video

awacke1 committed 18 days ago

Commit 4bda7c1 (verified) · 1 parent: 90bb105

Update README.md

README.md +58 -0

@@ -9,6 +9,64 @@ app_file: app.py
 pinned: false
 license: mit
 ---
+| 🧩 **Category**                       | **Info**                                                                                                                                                     |
+|---------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| 🛠️ **Libraries & Modules Used**       | **Web & UI**: streamlit, streamlit.components.v1                                                                                                              |
+|                                       | **AI & API Integration**: openai, gradio_client                                                                                                                |
+|                                       | **File Handling**: base64, os, glob, zipfile, textract, PyPDF2                                                                                                 |
+|                                       | **Image & Video Processing**: cv2, moviepy, PIL                                                                                                                |
+|                                       | **Text Processing**: re, BeautifulSoup, pandas                                                                                                                 |
+|                                       | **Time & Date**: datetime, pytz                                                                                                                                 |
+|                                       | **Utilities & Concurrency**: concurrent.futures, ThreadPoolExecutor, tqdm                                                                                       |
+|                                       | **Speech & Audio**: audio_recorder_streamlit                                                                                                                   |
+| ⚙️ **App Configuration**              | **Site Name**: Scholarly-Article-Document-Search-With-Memory                                                                                                   |
+|                                       | **Page Title/Icon**: 🔬🧠ScienceBrain.AI, custom icon file (icons.ico)                                                                                          |
+|                                       | **Sidebar**: Save session checkbox (should_save)                                                                                                                |
+| 🗣️ **Core Functionalities**           | **Text Interaction**:                                                                                                                                          |
+|                                       | - Chat-based prompts using GPT-4o.                                                                                                                             |
+|                                       | - Saves conversations as Markdown files (md).                                                                                                                  |
+|                                       | - Speech synthesis (HTML5):                                                                                                                                    |
+|                                       | - Embedded JavaScript function (SpeechSynthesis) reads content aloud.                                                                                          |
+|                                       | **Image Interaction**:                                                                                                                                         |
+|                                       | - Upload image, base64 encode, and analyze via GPT-4o.                                                                                                         |
+|                                       | - Results stored as Markdown (md) with filenames including prompts and timestamps.                                                                             |
+|                                       | **Audio Interaction**:                                                                                                                                         |
+|                                       | - Upload or record audio.                                                                                                                                      |
+|                                       | - Audio transcribed using Whisper, summarized or analyzed via GPT-4o, and responses stored as Markdown files.                                                  |
+|                                       | **Video Interaction**:                                                                                                                                         |
+|                                       | - Extracts frames and audio from videos.                                                                                                                       |
+|                                       | - Transcribes audio track, summarizes via GPT-4o.                                                                                                              |
+|                                       | - Markdown files created with summarized content.                                                                                                              |
+| 📚 **Advanced Document Handling**     | **Vector Stores & PDF Galleries**:                                                                                                                             |
+|                                       | - Allows upload of multiple PDF files, generating quizzes, summaries, or key facts.                                                                            |
+|                                       | - Vector stores created and managed for RAG querying.                                                                                                          |
+|                                       | - arXiv scholarly search integration through the Hugging Face Gradio API (awacke1/Arxiv-Paper-Search-And-QA-RAG-Pattern).                                      |
+|                                       | **RAG (Retrieval-Augmented Generation)**:                                                                                                                      |
+|                                       | - Performs semantic search on uploaded PDFs.                                                                                                                    |
+|                                       | - Evaluates retrieval performance metrics (recall@k, MRR, MAP).                                                                                                 |
+| 🗃️ **File Management**                | **Sidebar File Manager**:                                                                                                                                      |
+|                                       | - Filter files (.md, .pdf, .png, etc.).                                                                                                                        |
+|                                       | - Operations: view, edit, download, run, or delete files, individually or in bulk.                                                                             |
+|                                       | - Zip download functionality for filtered files.                                                                                                               |
+| 🛠️ **Helper Functions & Utilities**   | **Filename Generation (generate_filename)**:                                                                                                                   |
+|                                       | - Combines a date/time stamp with the prompt, sanitized for the filesystem.                                                                                    |
+|                                       | **File Saving (create_and_save_file)**:                                                                                                                        |
+|                                       | - Writes the file only when the sidebar save checkbox (should_save) is enabled.                                                                                |
+|                                       | **Concurrent PDF uploads**:                                                                                                                                   |
+|                                       | - Managed via ThreadPoolExecutor with progress tracking.                                                                                                       |
+| 🎨 **UI Enhancements**                | - Chat-style user-assistant interactions.                                                                                                                      |
+|                                       | - Streamlit's columns for organized interface and clean button prompts.                                                                                        |
+|                                       | - Dynamic HTML and JavaScript embeds for rich user interaction.                                                                                                |
+| 🌐 **Environment Variables**          | - **Required variables**:                                                                                                                                     |
+|                                       | - API_KEY                                                                                                                                                      |
+|                                       | - HF_KEY (Hugging Face)                                                                                                                                        |
+|                                       | - OPENAI_API_KEY                                                                                                                                               |
+|                                       | - OPENAI_ORG_ID                                                                                                                                                |
+| 📝 **Note**                           | The app combines text, audio, video, and image inputs and relies on OpenAI and Hugging Face services for interactive document management and analysis.        |
 # 7/9 - evaluate new GPT models
 GPT-4o Documentation:  https://cookbook.openai.com/examples/gpt4o/introduction_to_gpt4o
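
The two helper rows in the table (generate_filename and create_and_save_file) can be sketched roughly as follows. This is a minimal illustration and not the code in app.py: the signatures, the timestamp format, the `max_len` limit, and the `directory` parameter are all assumptions, and only the function names and the should_save flag come from the README.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def generate_filename(prompt: str, ext: str = "md",
                      now: Optional[datetime] = None, max_len: int = 60) -> str:
    """Build '<timestamp>_<sanitized prompt>.<ext>' (format is an assumption)."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    # Replace every run of characters outside [A-Za-z0-9_-] with one underscore,
    # truncate, then trim underscores left at either end.
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", prompt)[:max_len].strip("_")
    return f"{stamp}_{safe or 'untitled'}.{ext}"


def create_and_save_file(content: str, prompt: str, should_save: bool,
                         directory: str = ".") -> Optional[Path]:
    """Write content to a generated filename only when should_save is True."""
    if not should_save:
        return None  # checkbox unchecked: skip writing entirely
    path = Path(directory) / generate_filename(prompt)
    path.write_text(content, encoding="utf-8")
    return path
```

In a Streamlit app the flag would typically come from something like `st.sidebar.checkbox("Save session")`, so that responses are only persisted as Markdown when the user opts in.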