# 🚀 FastAPI AI Text Detector
A production-ready FastAPI application for **AI-generated vs. human-written text detection** in both **English** and **Nepali**. Models are auto-managed and endpoints are secured via Bearer token authentication.
---
## πŸ—οΈ Project Structure
```
β”œβ”€β”€ app.py # Main FastAPI app entrypoint
β”œβ”€β”€ config.py # Configuration loader (.env, settings)
β”œβ”€β”€ features/
β”‚ β”œβ”€β”€ text_classifier/ # English (GPT-2) classifier
β”‚ β”‚ β”œβ”€β”€ controller.py
β”‚ β”‚ β”œβ”€β”€ inferencer.py
β”‚ β”‚ β”œβ”€β”€ model_loader.py
β”‚ β”‚ β”œβ”€β”€ preprocess.py
β”‚ β”‚ └── routes.py
β”‚ └── nepali_text_classifier/ # Nepali (sentencepiece) classifier
β”‚ β”œβ”€β”€ controller.py
β”‚ β”œβ”€β”€ inferencer.py
β”‚ β”œβ”€β”€ model_loader.py
β”‚ β”œβ”€β”€ preprocess.py
β”‚ └── routes.py
β”œβ”€β”€ np_text_model/ # Nepali model artifacts (auto-downloaded)
β”‚ β”œβ”€β”€ classifier/
β”‚ β”‚ └── sentencepiece.bpe.model
β”‚ └── model_95_acc.pth
β”œβ”€β”€ models/ # English GPT-2 model/tokenizer (auto-downloaded)
β”‚ β”œβ”€β”€ merges.txt
β”‚ β”œβ”€β”€ tokenizer.json
β”‚ └── model_weights.pth
β”œβ”€β”€ Dockerfile # Container build config
β”œβ”€β”€ Procfile # Deployment entrypoint (for PaaS)
β”œβ”€β”€ requirements.txt # Python dependencies
β”œβ”€β”€ README.md # This file
└── .env # Secret token(s), environment config
```
---
### 🌟 Key Files and Their Roles
- **`app.py`**: Entry point that initializes the FastAPI app and registers the routes.
- **`Procfile`**: Tells Railway (or similar PaaS platforms) how to run the app.
- **`requirements.txt`**: Tracks all Python dependencies for the project.
- **`__init__.py`**: Package initializer for the root module and submodules.
- **`features/text_classifier/`** and **`features/nepali_text_classifier/`** (both packages share the same layout):
  - **`controller.py`**: Handles logic between routes and the model.
  - **`inferencer.py`**: Runs inference, returns predictions, and provides file-system utilities.
  - **`model_loader.py`**: Loads the ML model and tokenizer.
  - **`preprocess.py`**: Prepares input text for the model.
  - **`routes.py`**: Defines the API routes for text classification.
---
## βš™οΈ Setup & Installation
1. **Clone the repository**
```bash
git clone https://github.com/cyberalertnepal/aiapi
cd aiapi
```
2. **Install dependencies**
```bash
pip install -r requirements.txt
```
3. **Configure secrets**
- Create a `.env` file at the project root:
```env
SECRET_TOKEN=your_secret_token_here
```
- **All endpoints require `Authorization: Bearer <SECRET_TOKEN>`**
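`config.py` reads these values at startup. A minimal sketch of such a loader using `python-dotenv` (an illustrative assumption; the repository's actual implementation may differ):
```python
# config.py (illustrative sketch, not the repository's actual code)
import os

from dotenv import load_dotenv

load_dotenv()  # read key=value pairs from .env into the environment

SECRET_TOKEN = os.getenv("SECRET_TOKEN")
if not SECRET_TOKEN:
    raise RuntimeError("SECRET_TOKEN is not set; add it to .env")
```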
---
## 🚦 Running the API Server
```bash
uvicorn app:app --host 0.0.0.0 --port 8000
```
---
## 🔒 Security: Bearer Token Auth
All endpoints require authentication via Bearer token:
- Set `SECRET_TOKEN` in `.env`
- Add header: `Authorization: Bearer <SECRET_TOKEN>`
Unauthorized requests receive `403 Forbidden`.
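This kind of check is typically wired in as a FastAPI dependency. A minimal sketch using FastAPI's built-in `HTTPBearer` scheme (illustrative; the function and route names here are not necessarily the repo's own):
```python
import os

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

bearer = HTTPBearer()  # rejects requests that lack an Authorization header

def verify_token(credentials: HTTPAuthorizationCredentials = Depends(bearer)) -> None:
    # Compare the presented token against SECRET_TOKEN from the environment
    if credentials.credentials != os.getenv("SECRET_TOKEN"):
        raise HTTPException(status_code=403, detail="Invalid or missing token")

app = FastAPI()

@app.get("/text/health", dependencies=[Depends(verify_token)])
def health():
    return {"status": "ok"}
```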
---
## 🧩 API Endpoints
### English (GPT-2) - `/text/`
| Endpoint | Method | Description |
| --------------------------------- | ------ | ----------------------------------------- |
| `/text/analyse` | POST | Classify raw English text |
| `/text/analyse-sentences` | POST | Sentence-by-sentence breakdown |
| `/text/analyse-sentance-file` | POST | Upload file, per-sentence breakdown |
| `/text/upload` | POST | Upload file for overall classification |
| `/text/health` | GET | Health check |
#### Example: Classify English text
```bash
curl -X POST http://localhost:8000/text/analyse \
-H "Authorization: Bearer <SECRET_TOKEN>" \
-H "Content-Type: application/json" \
-d '{"text": "This is a sample text for analysis."}'
```
**Response:**
```json
{
  "result": "AI-generated",
  "perplexity": 55.67,
  "ai_likelihood": 66.6
}
```
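The same request from Python with the `requests` library (a client-side sketch; the base URL and token are placeholders for your deployment):
```python
import requests

API_URL = "http://localhost:8000"  # adjust to your deployment
TOKEN = "your_secret_token_here"   # the SECRET_TOKEN value from .env

response = requests.post(
    f"{API_URL}/text/analyse",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"text": "This is a sample text for analysis."},
)
response.raise_for_status()
print(response.json())  # {"result": ..., "perplexity": ..., "ai_likelihood": ...}
```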
#### Example: File upload
```bash
curl -X POST http://localhost:8000/text/upload \
-H "Authorization: Bearer <SECRET_TOKEN>" \
-F 'file=@yourfile.txt;type=text/plain'
```
---
### Nepali (SentencePiece) - `/NP/`
| Endpoint | Method | Description |
| --------------------------------- | ------ | ----------------------------------------- |
| `/NP/analyse` | POST | Classify Nepali text |
| `/NP/analyse-sentences` | POST | Sentence-by-sentence breakdown |
| `/NP/upload` | POST | Upload Nepali PDF for classification |
| `/NP/file-sentences-analyse` | POST | PDF upload, per-sentence breakdown |
| `/NP/health` | GET | Health check |
#### Example: Nepali text classification
```bash
curl -X POST http://localhost:8000/NP/analyse \
-H "Authorization: Bearer <SECRET_TOKEN>" \
-H "Content-Type: application/json" \
-d '{"text": "ΰ€―ΰ₯‹ ΰ€‰ΰ€¦ΰ€Ύΰ€Ήΰ€°ΰ€£ ΰ€΅ΰ€Ύΰ€•ΰ₯ΰ€― ΰ€Ήΰ₯‹ΰ₯€"}'
```
**Response:**
```json
{
  "label": "Human",
  "confidence": 98.6
}
```
#### Example: Nepali PDF upload
```bash
curl -X POST http://localhost:8000/NP/upload \
-H "Authorization: Bearer <SECRET_TOKEN>" \
-F 'file=@NepaliText.pdf;type=application/pdf'
```
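The equivalent upload from Python (again a sketch; the file name and token are placeholders):
```python
import requests

API_URL = "http://localhost:8000"
TOKEN = "your_secret_token_here"

with open("NepaliText.pdf", "rb") as f:
    response = requests.post(
        f"{API_URL}/NP/upload",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files={"file": ("NepaliText.pdf", f, "application/pdf")},
    )
print(response.json())  # {"label": ..., "confidence": ...}
```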
---
## πŸ“ API Docs
- **Swagger UI:** [http://localhost:8000/docs](http://localhost:8000/docs)
- **ReDoc:** [http://localhost:8000/redoc](http://localhost:8000/redoc)
---
## 🧪 Example: Integration with NestJS
You can easily call this API from a NestJS microservice.
**.env**
```env
FASTAPI_BASE_URL=http://localhost:8000
SECRET_TOKEN=your_secret_token_here
```
**fastapi.service.ts**
```typescript
import { Injectable } from "@nestjs/common";
import { HttpService } from "@nestjs/axios";
import { ConfigService } from "@nestjs/config";
import { firstValueFrom } from "rxjs";

@Injectable()
export class FastAPIService {
  constructor(
    private http: HttpService,
    private config: ConfigService,
  ) {}

  async analyzeText(text: string) {
    const url = `${this.config.get("FASTAPI_BASE_URL")}/text/analyse`;
    const token = this.config.get("SECRET_TOKEN");

    const response = await firstValueFrom(
      this.http.post(
        url,
        { text },
        { headers: { Authorization: `Bearer ${token}` } },
      ),
    );

    return response.data;
  }
}
```
**app.module.ts**
```typescript
import { Module } from "@nestjs/common";
import { ConfigModule } from "@nestjs/config";
import { HttpModule } from "@nestjs/axios";
import { AppController } from "./app.controller";
import { FastAPIService } from "./fastapi.service";

@Module({
  imports: [ConfigModule.forRoot(), HttpModule],
  controllers: [AppController],
  providers: [FastAPIService],
})
export class AppModule {}
```
**app.controller.ts**
```typescript
import { Body, Controller, Post, Get } from '@nestjs/common';
import { FastAPIService } from './fastapi.service';

@Controller()
export class AppController {
  constructor(private readonly fastapiService: FastAPIService) {}

  @Post('analyze-text')
  async callFastAPI(@Body('text') text: string) {
    return this.fastapiService.analyzeText(text);
  }

  @Get()
  getHello(): string {
    return 'NestJS is connected to FastAPI';
  }
}
```
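With both services running, `POST http://localhost:3000/analyze-text` with a JSON body like `{ "text": "..." }` proxies the request to FastAPI and returns its classification response unchanged (assuming NestJS's default port 3000).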
---
## 🧠 Main Functions in the Text Classifiers (`features/text_classifier/` and `features/nepali_text_classifier/`)
- **`load_model()`**
Loads the GPT-2 model and tokenizer from the specified directory paths.
- **`lifespan()`**
Manages the application lifecycle. Initializes the model at startup and handles cleanup on shutdown.
- **`classify_text_sync()`**
  Synchronously tokenizes the input text and runs the GPT-2 model, returning the classification and perplexity (see the sketch after this list).
- **`classify_text()`**
Asynchronously runs `classify_text_sync()` in a thread pool for non-blocking text classification.
- **`analyze_text()`**
**POST** endpoint: Accepts text input, classifies it using `classify_text()`, and returns the result with perplexity.
- **`health()`**
**GET** endpoint: Simple health check for API liveness.
- **`parse_docx()`, `parse_pdf()`, `parse_txt()`**
Utilities to extract and convert `.docx`, `.pdf`, and `.txt` file contents to plain text.
- **`warmup()`**
Downloads the model repository and initializes the model/tokenizer using `load_model()`.
- **`download_model_repo()`**
Downloads the model files from the designated `MODEL` folder.
- **`get_model_tokenizer()`**
Checks whether the model already exists; downloads it if missing, otherwise loads the cached model.
- **`handle_file_upload()`**
Handles file uploads from the `/upload` route. Extracts text, classifies, and returns results.
- **`extract_file_contents()`**
Extracts and returns plain text from uploaded files (PDF, DOCX, TXT).
- **`handle_file_sentence()`**
Processes file uploads by analyzing each sentence (under 10,000 chars) before classification.
- **`handle_sentence_level_analysis()`**
Checks/strips each sentence, then computes AI/human likelihood for each.
- **`analyze_sentences()`**
Splits paragraphs into sentences, classifies each, and returns all results.
- **`analyze_sentence_file()`**
Like `handle_file_sentence()`: analyzes sentences in uploaded files.
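To make the flow above concrete, here is a minimal sketch of perplexity-based detection with GPT-2, in the spirit of `classify_text_sync()`. The threshold and label mapping are illustrative assumptions, not the repository's actual values:
```python
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def classify_text_sync(text: str, threshold: float = 60.0) -> dict:
    # Tokenize and compute the language-model loss over the input
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    perplexity = math.exp(out.loss.item())
    # Low perplexity means GPT-2 finds the text predictable, i.e. more AI-like
    result = "AI-generated" if perplexity < threshold else "Human-written"
    return {"result": result, "perplexity": round(perplexity, 2)}

print(classify_text_sync("This is a sample text for analysis."))
```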
---
## 🚀 Deployment
- **Local**: Use `uvicorn` as above.
- **Railway/Heroku**: Use the provided `Procfile` (see the example below).
- **Hugging Face Spaces**: Use the `Dockerfile` for container deployment.
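A `Procfile` for this kind of app typically declares a single `web` process; the line below is an assumed example, so check the repository's actual file:
```
web: uvicorn app:app --host 0.0.0.0 --port $PORT
```
Platforms such as Railway and Heroku inject the `PORT` environment variable at runtime.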
---
## 💡 Tips
- **Model files are auto-downloaded on first start** if they are not found locally.
- **Keep `requirements.txt` up to date** after adding dependencies.
- **All endpoints require the correct `Authorization` header.**
- **For security**: never commit `.env` to a public repository.
---