---
title: SmartManuals-AI
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.30.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
- RAG
- LLM
- Chroma
- Gradio
- OCR
- HuggingFace
- PDF
- Word
- SemanticSearch
- SmartManualsAI
---
# ✅ SmartManuals-AI for Hugging Face Spaces
SmartManuals-AI is a local-first document QA system that uses **retrieval-augmented generation (RAG)**, **OCR**, and **semantic embeddings** to answer technical questions from equipment manuals, service guides, and parts catalogs.
This app is optimized for Hugging Face Spaces and **requires no user upload**: just preload your manuals in a `Manuals/` folder.
---
## 🔧 Features
- 🧠 Ask **natural-language questions** against your own manuals
- 📄 Supports both **PDF** and **Word (.docx)** documents
- 🔍 Uses `sentence-transformers` for semantic search
- 🗃️ Indexes chunks in **ChromaDB** (stored locally)
- 💬 Generates answers via Hugging Face models (default: **Meta LLaMA 3.1 8B Instruct**)
- 🖥️ Clean **Gradio interface** for querying
---
## πŸ“ Folder Structure
```
SmartManuals-AI/
├── app.py             # Main Hugging Face app
├── Manuals/           # Place your PDF and DOCX manuals here
│   ├── OM_Treadmill.pdf
│   └── Parts_Bike.docx
├── chroma_store/      # Vector database (auto-generated)
├── requirements.txt   # Dependencies
└── README.md          # This file
```
---
## 🚀 Usage on Hugging Face Spaces
### 🔐 Environment Variable
Add this secret in your Space settings:
| Name | Value |
|-----------|----------------------|
| `HF_TOKEN` | Your Hugging Face token |
> **Note**: You must accept model licenses on [Hugging Face Hub](https://huggingface.co/meta-llama) before using gated models like `Llama-3.1-8B-Instruct`.
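At startup, the app can then read the secret from the environment. A minimal sketch (the exact handling in `app.py` may differ):

```python
import os

# HF_TOKEN is injected by Spaces from the secret configured above.
hf_token = os.environ.get("HF_TOKEN")
if not hf_token:
    raise RuntimeError("HF_TOKEN is not set; add it as a secret in your Space settings.")
```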
---
### 📤 Uploading Manuals
- Upload your **PDF and Word documents** directly to the `Manuals/` folder in your Space repository.
- No need for file uploads via the interface.
---
### 🧠 How It Works
- On app startup:
- Text is extracted from **PDFs (with OCR fallback)** and `.docx` Word files
- Sentences are cleaned, chunked, and embedded with `all-MiniLM-L6-v2`
- Chunks are stored in a local **ChromaDB vector database**
- At query time:
- Your question is embedded and semantically compared against chunks
- The most relevant chunks are passed to the LLM
  - The **LLM (LLaMA 3.1)** generates a focused answer from the retrieved context only (see the sketch below)
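A condensed sketch of that flow, using `sentence-transformers` and the `chromadb` client (the collection name, chunk text, and IDs here are illustrative, not taken from `app.py`):

```python
import chromadb
from sentence_transformers import SentenceTransformer

# Same embedding model the app uses for indexing and querying.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Persistent local store, matching the chroma_store/ folder above.
client = chromadb.PersistentClient(path="chroma_store")
collection = client.get_or_create_collection("manuals")

# Startup: embed cleaned chunks and store them with stable IDs.
chunks = [
    "Check drive belt tension monthly.",
    "Lubricate the deck every 150 hours of use.",
]
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embedder.encode(chunks).tolist(),
)

# Query time: embed the question and retrieve the closest chunks.
question = "How often should I check belt tension?"
results = collection.query(
    query_embeddings=embedder.encode([question]).tolist(),
    n_results=2,
)
context = "\n".join(results["documents"][0])  # handed to the LLM as context
```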
---
## 🤖 Default Model
- This app uses **`meta-llama/Llama-3.1-8B-Instruct`** for answer generation (sketched below)
- Additional models (e.g., Mistral, Gemma) are supported behind the scenes
- **No need to manually pick** models, doc types, or categories
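To illustrate the generation step, here is a hypothetical sketch using the chat-style `transformers` pipeline; the prompt wording, generation settings, and context string are illustrative, and `app.py` may load the model differently:

```python
import os

from transformers import pipeline

# Gated model: requires an accepted license and the HF_TOKEN secret (see above).
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",
    token=os.environ.get("HF_TOKEN"),
    device_map="auto",  # requires accelerate; places the model on available GPUs
)

context = "Check drive belt tension monthly."  # retrieved chunks, joined
question = "How often should I check belt tension?"
messages = [
    {"role": "system", "content": "Answer only from the provided manual excerpts."},
    {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
]

result = generator(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```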
---
## 🧩 Supported File Types
- ✅ PDF (`.pdf`) with OCR fallback using Tesseract (sketched below)
- ✅ Word Documents (`.docx`)
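As a sketch of the OCR fallback, using PyMuPDF and `pytesseract` (the actual libraries in `requirements.txt` may differ):

```python
import io

import fitz  # PyMuPDF
import pytesseract
from PIL import Image

def extract_page_text(page) -> str:
    """Use the embedded text layer; fall back to OCR for scanned pages."""
    text = page.get_text().strip()
    if text:
        return text
    # No text layer: render the page to an image and run Tesseract on it.
    pix = page.get_pixmap(dpi=300)
    image = Image.open(io.BytesIO(pix.tobytes("png")))
    return pytesseract.image_to_string(image)

with fitz.open("Manuals/OM_Treadmill.pdf") as doc:
    full_text = "\n".join(extract_page_text(page) for page in doc)
```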
---
## 🧪 Local Development
Clone and run locally:
```bash
git clone https://github.com/damoojeje/SmartManuals-AI.git
cd SmartManuals-AI
pip install -r requirements.txt
python app.py
```
> πŸ“ Place your manuals inside the `Manuals/` directory before running.
---
## 👨🏽‍💻 Created By
**Damilare Eniolabi**
📧 [damilareeniolabi@gmail.com](mailto:damilareeniolabi@gmail.com)
🔗 GitHub: [@damoojeje](https://github.com/damoojeje)
---
## 🔖 Tags
`RAG` Β· `LLM` Β· `Gradio` Β· `ChromaDB` Β· `OCR` Β· `SemanticSearch` Β· `PDF` Β· `Word` Β· `SmartManualsAI` Β· `EquipmentQA`