# FastAPI AI Text Detector

A production-ready FastAPI application for **AI-generated vs. human-written text detection** in both **English** and **Nepali**. Models are auto-managed and endpoints are secured via Bearer token authentication.

---

## Project Structure
```
├── app.py                      # Main FastAPI app entrypoint
├── config.py                   # Configuration loader (.env, settings)
├── features/
│   ├── text_classifier/        # English (GPT-2) classifier
│   │   ├── controller.py
│   │   ├── inferencer.py
│   │   ├── model_loader.py
│   │   ├── preprocess.py
│   │   └── routes.py
│   └── nepali_text_classifier/ # Nepali (SentencePiece) classifier
│       ├── controller.py
│       ├── inferencer.py
│       ├── model_loader.py
│       ├── preprocess.py
│       └── routes.py
├── np_text_model/              # Nepali model artifacts (auto-downloaded)
│   ├── classifier/
│   │   └── sentencepiece.bpe.model
│   └── model_95_acc.pth
├── models/                     # English GPT-2 model/tokenizer (auto-downloaded)
│   ├── merges.txt
│   ├── tokenizer.json
│   └── model_weights.pth
├── Dockerfile                  # Container build config
├── Procfile                    # Deployment entrypoint (for PaaS)
├── requirements.txt            # Python dependencies
├── README.md                   # This file
└── .env                        # Secret token(s), environment config
```
---

### Key Files and Their Roles

- **`app.py`**: Entry point that initializes the FastAPI app and registers routes.
- **`Procfile`**: Tells Railway (or similar platforms) how to run the program.
- **`requirements.txt`**: Tracks all Python dependencies for the project.
- **`__init__.py`**: Package initializer for the root module and submodules.
- **`features/text_classifier/`**
  - **`controller.py`**: Handles logic between routes and the model.
  - **`inferencer.py`**: Runs inference, returns predictions, and provides file-system utilities.
- **`features/nepali_text_classifier/`**
  - **`controller.py`**: Handles logic between routes and the model.
  - **`inferencer.py`**: Runs inference, returns predictions, and provides file-system utilities.
- **`model_loader.py`** (in each feature package): Loads the ML model and tokenizer.
- **`preprocess.py`** (in each feature package): Prepares input text for the model.
- **`routes.py`** (in each feature package): Defines API routes for text classification.

---
## Setup & Installation

1. **Clone the repository**

   ```bash
   git clone https://github.com/cyberalertnepal/aiapi
   cd aiapi
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Configure secrets**

   Create a `.env` file at the project root:

   ```env
   SECRET_TOKEN=your_secret_token_here
   ```

   **All endpoints require `Authorization: Bearer <SECRET_TOKEN>`.**

---
## Running the API Server

```bash
uvicorn app:app --host 0.0.0.0 --port 8000
```

---

## Security: Bearer Token Auth

All endpoints require authentication via a Bearer token:

- Set `SECRET_TOKEN` in `.env`
- Add the header `Authorization: Bearer <SECRET_TOKEN>` to every request

Unauthorized requests receive `403 Forbidden`.
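The check itself is small. As a minimal sketch (the helper name `is_authorized` is illustrative; the app itself wires this up as a FastAPI dependency that raises a 403 on failure):

```python
def is_authorized(authorization_header, secret_token):
    """Return True only for an exact 'Bearer <SECRET_TOKEN>' header."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    return scheme == "Bearer" and token == secret_token

# Only the exact scheme and token pass the check.
print(is_authorized("Bearer s3cret", "s3cret"))  # True
print(is_authorized("Bearer wrong", "s3cret"))   # False
```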
---
## API Endpoints

### English (GPT-2) - `/text/`

| Endpoint                      | Method | Description                            |
| ----------------------------- | ------ | -------------------------------------- |
| `/text/analyse`               | POST   | Classify raw English text              |
| `/text/analyse-sentences`     | POST   | Sentence-by-sentence breakdown         |
| `/text/analyse-sentance-file` | POST   | Upload file, per-sentence breakdown    |
| `/text/upload`                | POST   | Upload file for overall classification |
| `/text/health`                | GET    | Health check                           |

#### Example: Classify English text

```bash
curl -X POST http://localhost:8000/text/analyse \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is a sample text for analysis."}'
```

**Response:**

```json
{
  "result": "AI-generated",
  "perplexity": 55.67,
  "ai_likelihood": 66.6
}
```
#### Example: File upload

```bash
curl -X POST http://localhost:8000/text/upload \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -F 'file=@yourfile.txt;type=text/plain'
```
---
### Nepali (SentencePiece) - `/NP/`

| Endpoint                     | Method | Description                          |
| ---------------------------- | ------ | ------------------------------------ |
| `/NP/analyse`                | POST   | Classify Nepali text                 |
| `/NP/analyse-sentences`      | POST   | Sentence-by-sentence breakdown       |
| `/NP/upload`                 | POST   | Upload Nepali PDF for classification |
| `/NP/file-sentences-analyse` | POST   | PDF upload, per-sentence breakdown   |
| `/NP/health`                 | GET    | Health check                         |

#### Example: Nepali text classification

```bash
curl -X POST http://localhost:8000/NP/analyse \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"text": "यो उदाहरण वाक्य हो।"}'
```

**Response:**

```json
{
  "label": "Human",
  "confidence": 98.6
}
```

#### Example: Nepali PDF upload

```bash
curl -X POST http://localhost:8000/NP/upload \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -F 'file=@NepaliText.pdf;type=application/pdf'
```
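For programmatic uploads without curl, the multipart body can be built by hand. A minimal standard-library sketch (the helper name is illustrative; in practice a library such as `requests` with `files=` is simpler):

```python
import uuid

def build_multipart(field_name, filename, content, content_type):
    """Build a multipart/form-data body and its Content-Type header value."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"

body, content_type = build_multipart(
    "file", "NepaliText.pdf", b"%PDF-...", "application/pdf"
)
# POST this body to /NP/upload with headers:
#   Authorization: Bearer <SECRET_TOKEN>
#   Content-Type: <the content_type value returned above>
```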
---
## API Docs

- **Swagger UI:** [http://localhost:8000/docs](http://localhost:8000/docs)
- **ReDoc:** [http://localhost:8000/redoc](http://localhost:8000/redoc)

---
## Example: Integration with NestJS

You can call this API from a NestJS microservice.

**.env**

```env
FASTAPI_BASE_URL=http://localhost:8000
SECRET_TOKEN=your_secret_token_here
```

**fastapi.service.ts**

```typescript
import { Injectable } from "@nestjs/common";
import { HttpService } from "@nestjs/axios";
import { ConfigService } from "@nestjs/config";
import { firstValueFrom } from "rxjs";

@Injectable()
export class FastAPIService {
  constructor(
    private http: HttpService,
    private config: ConfigService,
  ) {}

  async analyzeText(text: string) {
    const url = `${this.config.get("FASTAPI_BASE_URL")}/text/analyse`;
    const token = this.config.get("SECRET_TOKEN");
    const response = await firstValueFrom(
      this.http.post(
        url,
        { text },
        {
          headers: {
            Authorization: `Bearer ${token}`,
          },
        },
      ),
    );
    return response.data;
  }
}
```

**app.module.ts**

```typescript
import { Module } from "@nestjs/common";
import { ConfigModule } from "@nestjs/config";
import { HttpModule } from "@nestjs/axios";
import { AppController } from "./app.controller";
import { FastAPIService } from "./fastapi.service";

@Module({
  imports: [ConfigModule.forRoot(), HttpModule],
  controllers: [AppController],
  providers: [FastAPIService],
})
export class AppModule {}
```

**app.controller.ts**

```typescript
import { Body, Controller, Post, Get } from "@nestjs/common";
import { FastAPIService } from "./fastapi.service";

@Controller()
export class AppController {
  constructor(private readonly fastapiService: FastAPIService) {}

  @Post("analyze-text")
  async callFastAPI(@Body("text") text: string) {
    return this.fastapiService.analyzeText(text);
  }

  @Get()
  getHello(): string {
    return "NestJS is connected to FastAPI";
  }
}
```

---
## Main Functions in the Classifiers (`features/text_classifier/` and `features/nepali_text_classifier/`)

- **`load_model()`**
  Loads the GPT-2 model and tokenizer from the specified directory paths.
- **`lifespan()`**
  Manages the application lifecycle: initializes the model at startup and handles cleanup on shutdown.
- **`classify_text_sync()`**
  Synchronously tokenizes input text and predicts with the GPT-2 model. Returns the classification and perplexity.
- **`classify_text()`**
  Asynchronously runs `classify_text_sync()` in a thread pool for non-blocking text classification.
- **`analyze_text()`**
  **POST** endpoint: accepts text input, classifies it using `classify_text()`, and returns the result with perplexity.
- **`health()`**
  **GET** endpoint: simple health check for API liveness.
- **`parse_docx()`, `parse_pdf()`, `parse_txt()`**
  Utilities to extract and convert `.docx`, `.pdf`, and `.txt` file contents to plain text.
- **`warmup()`**
  Downloads the model repository and initializes the model/tokenizer via `load_model()`.
- **`download_model_repo()`**
  Downloads the model files into the designated `MODEL` folder.
- **`get_model_tokenizer()`**
  Checks whether the model already exists; downloads it if not, otherwise loads the cached model.
- **`handle_file_upload()`**
  Handles file uploads from the `/upload` route: extracts text, classifies it, and returns the results.
- **`extract_file_contents()`**
  Extracts and returns plain text from uploaded files (PDF, DOCX, TXT).
- **`handle_file_sentence()`**
  Processes file uploads by analyzing each sentence (under 10,000 characters) before classification.
- **`handle_sentence_level_analysis()`**
  Checks and strips each sentence, then computes the AI/human likelihood for each.
- **`analyze_sentences()`**
  Splits paragraphs into sentences, classifies each, and returns all results.
- **`analyze_sentence_file()`**
  Like `handle_file_sentence()`: analyzes sentences in uploaded files.
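The `classify_text()` / `classify_text_sync()` split above is the standard pattern for keeping blocking model inference off the event loop. A minimal sketch with a stand-in scorer (the real function runs GPT-2; `classify_text_sync` here returns an illustrative fixed perplexity):

```python
import asyncio

def classify_text_sync(text):
    """Stand-in for the blocking GPT-2 call (tokenize + forward pass)."""
    perplexity = 42.0  # illustrative; the real value comes from the model
    # Low perplexity suggests machine-generated text, as in the example response above.
    result = "AI-generated" if perplexity < 60 else "Human-written"
    return {"result": result, "perplexity": perplexity}

async def classify_text(text):
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(classify_text_sync, text)

print(asyncio.run(classify_text("sample")))
```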
---
## Deployment

- **Local**: Use `uvicorn` as above.
- **Railway/Heroku**: Use the provided `Procfile`.
- **Hugging Face Spaces**: Use the `Dockerfile` for container deployment.
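For the container route, a typical build-and-run sequence looks like this (the image name is illustrative, and the published port assumes the `uvicorn` command above):

```shell
# Build the image from the provided Dockerfile
docker build -t ai-text-detector .

# Run it, passing the secret via the environment and publishing port 8000
docker run -e SECRET_TOKEN=your_secret_token_here -p 8000:8000 ai-text-detector
```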
---
## Tips

- **Model files auto-download at first start** if not found.
- **Keep `requirements.txt` up to date** after adding dependencies.
- **All endpoints require the correct `Authorization` header.**
- **For security**: avoid committing `.env` to public repos.

---