---
title: SmartManuals-AI
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
- RAG
- LLM
- Chroma
- Gradio
- OCR
- HuggingFace
- PDF
- Word
- SemanticSearch
- SmartManualsAI
---
# ✅ SmartManuals-AI for Hugging Face Spaces
SmartManuals-AI is a local-first document QA system that uses **retrieval-augmented generation (RAG)**, **OCR**, and **semantic embeddings** to answer technical questions from equipment manuals, service guides, and parts catalogs.
This app is optimized for Hugging Face Spaces and **requires no user upload**: just preload your manuals in a `Manuals/` folder.
---
## 🔧 Features
- 🧠 Ask **natural-language questions** against your own manuals
- 📄 Supports both **PDF** and **Word (.docx)** documents
- 🔍 Uses `sentence-transformers` for semantic search
- 🗃️ Indexes chunks in **ChromaDB** (stored locally)
- 💬 Generates answers via Hugging Face models (default: **Meta LLaMA 3.1 8B Instruct**)
- 🖥️ Clean **Gradio interface** for querying
---
## πŸ“ Folder Structure
```
SmartManuals-AI/
├── app.py             # Main Hugging Face app
├── Manuals/           # Place your PDF and DOCX manuals here
│   ├── OM_Treadmill.pdf
│   └── Parts_Bike.docx
├── chroma_store/      # Vector database (auto-generated)
├── requirements.txt   # Dependencies
└── README.md          # This file
```
---
## 🚀 Usage on Hugging Face Spaces
### 🔐 Environment Variable
Add this secret in your Space settings:
| Name | Value |
|-----------|----------------------|
| `HF_TOKEN` | Your Hugging Face token |
> **Note**: You must accept model licenses on [Hugging Face Hub](https://huggingface.co/meta-llama) before using gated models like `Llama-3.1-8B-Instruct`.
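At startup, the app can then read the secret from the environment. A minimal sketch (the exact handling in `app.py` may differ):

```python
import os

# HF_TOKEN is injected by Spaces from the secret configured above.
hf_token = os.environ.get("HF_TOKEN")
if not hf_token:
    raise RuntimeError("HF_TOKEN is not set; add it as a secret in your Space settings.")
```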
---
### 📤 Uploading Manuals
- Upload your **PDF and Word documents** directly to the `Manuals/` folder in your Space repository.
- No need for file uploads via the interface.
---
### 🧠 How It Works
- On app startup:
- Text is extracted from **PDFs (with OCR fallback)** and `.docx` Word files
- Sentences are cleaned, chunked, and embedded with `all-MiniLM-L6-v2`
- Chunks are stored in a local **ChromaDB vector database**
- At query time:
- Your question is embedded and semantically compared against chunks
- The most relevant chunks are passed to the LLM
  - The **LLM (LLaMA 3.1)** generates a focused answer from the retrieved context only (see the sketch below)
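A condensed sketch of that flow, using `sentence-transformers` and the `chromadb` client (the collection name, chunk text, and IDs here are illustrative, not taken from `app.py`):

```python
import chromadb
from sentence_transformers import SentenceTransformer

# Same embedding model the app uses for indexing and querying.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Persistent local store, matching the chroma_store/ folder above.
client = chromadb.PersistentClient(path="chroma_store")
collection = client.get_or_create_collection("manuals")

# Startup: embed cleaned chunks and store them with stable IDs.
chunks = [
    "Check drive belt tension monthly.",
    "Lubricate the deck every 150 hours of use.",
]
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embedder.encode(chunks).tolist(),
)

# Query time: embed the question and retrieve the closest chunks.
question = "How often should I check belt tension?"
results = collection.query(
    query_embeddings=embedder.encode([question]).tolist(),
    n_results=2,
)
context = "\n".join(results["documents"][0])  # handed to the LLM as context
```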
---
## 🤖 Default Model
- This app uses **`meta-llama/Llama-3.1-8B-Instruct`** for answer generation (sketched below)
- Additional models (e.g., Mistral, Gemma) are supported behind the scenes
- **No need to manually pick** models, doc types, or categories
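To illustrate the generation step, here is a hypothetical sketch using the chat-style `transformers` pipeline; the prompt wording, generation settings, and context string are illustrative, and `app.py` may load the model differently:

```python
import os

from transformers import pipeline

# Gated model: requires an accepted license and the HF_TOKEN secret (see above).
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",
    token=os.environ.get("HF_TOKEN"),
    device_map="auto",  # requires accelerate; places the model on available GPUs
)

context = "Check drive belt tension monthly."  # retrieved chunks, joined
question = "How often should I check belt tension?"
messages = [
    {"role": "system", "content": "Answer only from the provided manual excerpts."},
    {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
]

result = generator(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```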
---
## 🧩 Supported File Types
- ✅ PDF (`.pdf`) with OCR fallback using Tesseract (sketched below)
- ✅ Word Documents (`.docx`)
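As a sketch of the OCR fallback, using PyMuPDF and `pytesseract` (the actual libraries in `requirements.txt` may differ):

```python
import io

import fitz  # PyMuPDF
import pytesseract
from PIL import Image

def extract_page_text(page) -> str:
    """Use the embedded text layer; fall back to OCR for scanned pages."""
    text = page.get_text().strip()
    if text:
        return text
    # No text layer: render the page to an image and run Tesseract on it.
    pix = page.get_pixmap(dpi=300)
    image = Image.open(io.BytesIO(pix.tobytes("png")))
    return pytesseract.image_to_string(image)

with fitz.open("Manuals/OM_Treadmill.pdf") as doc:
    full_text = "\n".join(extract_page_text(page) for page in doc)
```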
---
## 🧪 Local Development
Clone and run locally:
```bash
git clone https://github.com/damoojeje/SmartManuals-AI.git
cd SmartManuals-AI
pip install -r requirements.txt
python app.py
```
> πŸ“ Place your manuals inside the `Manuals/` directory before running.
---
## 👨🏽‍💻 Created By
**Damilare Eniolabi**
📧 [damilareeniolabi@gmail.com](mailto:damilareeniolabi@gmail.com)
🔗 GitHub: [@damoojeje](https://github.com/damoojeje)
---
## 🔖 Tags
`RAG` Β· `LLM` Β· `Gradio` Β· `ChromaDB` Β· `OCR` Β· `SemanticSearch` Β· `PDF` Β· `Word` Β· `SmartManualsAI` Β· `EquipmentQA`