---
title: Web Scraper & Q&A Chatbot with RAG
emoji: 🏃
colorFrom: blue
colorTo: yellow
sdk: streamlit
sdk_version: 1.43.1
app_file: app.py
pinned: false
short_description: AI web-scraping chatbot using RAG
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

---



## English Description

This project combines a web scraper with a RAG (retrieval-augmented generation) AI chatbot.
- Enter any URL and the system crawls the page (with configurable recursion depth and an optional same-domain restriction), splits the content into chunks, vectorizes them, and stores them in a local database.
- You can then ask questions in Chinese or English. The system retrieves the passages most relevant to the question from the crawled content and generates an answer with the Gemini LLM.
- Supports Chinese semantic search; suited to knowledge management, website summarization, and FAQ applications.
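
The crawl-and-split step described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code in `app.py`: the names `crawl`, `chunk_text`, `fetch`, `max_depth`, and `same_domain` are assumptions made for this example, and the chunk sizes are arbitrary.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags and all non-empty text nodes."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Note: this also keeps <script>/<style> text; a real scraper would filter it.
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(start_url, fetch, max_depth=1, same_domain=True):
    """Breadth-first crawl. `fetch(url)` is a hypothetical callable returning HTML."""
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = {}
    while queue:
        url, depth = queue.popleft()
        parser = LinkParser()
        parser.feed(fetch(url))
        parser.close()  # flush any buffered trailing text
        pages[url] = " ".join(parser.text_parts)
        if depth >= max_depth:
            continue  # do not follow links past the depth limit
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # absolute URL, no #fragment
            parts = urlparse(link)
            if parts.scheme not in ("http", "https"):
                continue
            if same_domain and parts.netloc != start_host:
                continue
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages


def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character windows ready for embedding."""
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

In practice `fetch` could wrap an HTTP client call; passing it in as a parameter keeps the traversal logic testable without network access. Each chunk would then be embedded and written to the vector store, which is outside the scope of this sketch.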

### Installation & Usage
1. Install dependencies: `pip install -r requirements.txt`
2. Copy `example.env` to `.env` and fill in your Gemini API key
3. Run: `streamlit run app.py`
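
To check that the `.env` file from step 2 is in place before launching, something like the following sketch may help. It is an illustration only: the variable name `GEMINI_API_KEY` is an assumption (use whatever name `example.env` actually defines), and `load_env` and `require_key` are hypothetical helpers, not functions from this project.

```python
import os
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_key(values, name="GEMINI_API_KEY"):  # assumed variable name
    """Return the key from the parsed file or the process environment, else raise."""
    key = values.get(name) or os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to .env before running the app")
    return key
```

Calling `require_key(load_env())` from the project directory raises a clear error if the key is missing, rather than failing later on the first request to the model.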