# 🚀 FastAPI AI Text Detector
A production-ready FastAPI application for **AI-generated vs. human-written text detection** in both **English** and **Nepali**. Models are auto-managed and endpoints are secured via Bearer token authentication.
---
## πŸ—οΈ Project Structure
```
β”œβ”€β”€ app.py # Main FastAPI app entrypoint
β”œβ”€β”€ config.py # Configuration loader (.env, settings)
β”œβ”€β”€ features/
β”‚ β”œβ”€β”€ text_classifier/ # English (GPT-2) classifier
β”‚ β”‚ β”œβ”€β”€ controller.py
β”‚ β”‚ β”œβ”€β”€ inferencer.py
β”‚ β”‚ β”œβ”€β”€ model_loader.py
β”‚ β”‚ β”œβ”€β”€ preprocess.py
β”‚ β”‚ └── routes.py
β”‚ └── nepali_text_classifier/ # Nepali (sentencepiece) classifier
β”‚ β”œβ”€β”€ controller.py
β”‚ β”œβ”€β”€ inferencer.py
β”‚ β”œβ”€β”€ model_loader.py
β”‚ β”œβ”€β”€ preprocess.py
β”‚ └── routes.py
β”œβ”€β”€ np_text_model/ # Nepali model artifacts (auto-downloaded)
β”‚ β”œβ”€β”€ classifier/
β”‚ β”‚ └── sentencepiece.bpe.model
β”‚ └── model_95_acc.pth
β”œβ”€β”€ models/ # English GPT-2 model/tokenizer (auto-downloaded)
β”‚ β”œβ”€β”€ merges.txt
β”‚ β”œβ”€β”€ tokenizer.json
β”‚ └── model_weights.pth
β”œβ”€β”€ Dockerfile # Container build config
β”œβ”€β”€ Procfile # Deployment entrypoint (for PaaS)
β”œβ”€β”€ requirements.txt # Python dependencies
β”œβ”€β”€ README.md # This file
└── .env # Secret token(s), environment config
```
---
### 🌟 Key Files and Their Roles
- **`app.py`**: Entry point that initializes the FastAPI app and registers the routes.
- **`Procfile`**: Tells Railway (or similar PaaS platforms) how to run the app.
- **`requirements.txt`**: Tracks all Python dependencies for the project.
- **`__init__.py`**: Package initializer for the root module and submodules.
- **`features/text_classifier/`** and **`features/nepali_text_classifier/`** (both packages share the same layout):
  - **`controller.py`**: Handles logic between routes and the model.
  - **`inferencer.py`**: Runs inference, returns predictions, and provides file-system utilities.
  - **`model_loader.py`**: Loads the ML model and tokenizer.
  - **`preprocess.py`**: Prepares input text for the model.
  - **`routes.py`**: Defines the API routes for text classification.
---
## βš™οΈ Setup & Installation
1. **Clone the repository**
```bash
git clone https://github.com/cyberalertnepal/aiapi
cd aiapi
```
2. **Install dependencies**
```bash
pip install -r requirements.txt
```
3. **Configure secrets**
- Create a `.env` file at the project root:
```env
SECRET_TOKEN=your_secret_token_here
```
- **All endpoints require `Authorization: Bearer <SECRET_TOKEN>`**
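`config.py` reads these values at startup. A minimal sketch of such a loader using `python-dotenv` (an illustrative assumption; the repository's actual implementation may differ):
```python
# config.py (illustrative sketch, not the repository's actual code)
import os

from dotenv import load_dotenv

load_dotenv()  # read key=value pairs from .env into the environment

SECRET_TOKEN = os.getenv("SECRET_TOKEN")
if not SECRET_TOKEN:
    raise RuntimeError("SECRET_TOKEN is not set; add it to .env")
```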
---
## 🚦 Running the API Server
```bash
uvicorn app:app --host 0.0.0.0 --port 8000
```
---
## 🔒 Security: Bearer Token Auth
All endpoints require authentication via Bearer token:
- Set `SECRET_TOKEN` in `.env`
- Add header: `Authorization: Bearer <SECRET_TOKEN>`
Unauthorized requests receive `403 Forbidden`.
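This kind of check is typically wired in as a FastAPI dependency. A minimal sketch using FastAPI's built-in `HTTPBearer` scheme (illustrative; the function and route names here are not necessarily the repo's own):
```python
import os

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

bearer = HTTPBearer()  # rejects requests that lack an Authorization header

def verify_token(credentials: HTTPAuthorizationCredentials = Depends(bearer)) -> None:
    # Compare the presented token against SECRET_TOKEN from the environment
    if credentials.credentials != os.getenv("SECRET_TOKEN"):
        raise HTTPException(status_code=403, detail="Invalid or missing token")

app = FastAPI()

@app.get("/text/health", dependencies=[Depends(verify_token)])
def health():
    return {"status": "ok"}
```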
---
## 🧩 API Endpoints
### English (GPT-2) - `/text/`
| Endpoint | Method | Description |
| --------------------------------- | ------ | ----------------------------------------- |
| `/text/analyse` | POST | Classify raw English text |
| `/text/analyse-sentences` | POST | Sentence-by-sentence breakdown |
| `/text/analyse-sentance-file` | POST | Upload file, per-sentence breakdown |
| `/text/upload` | POST | Upload file for overall classification |
| `/text/health` | GET | Health check |
#### Example: Classify English text
```bash
curl -X POST http://localhost:8000/text/analyse \
-H "Authorization: Bearer <SECRET_TOKEN>" \
-H "Content-Type: application/json" \
-d '{"text": "This is a sample text for analysis."}'
```
**Response:**
```json
{
  "result": "AI-generated",
  "perplexity": 55.67,
  "ai_likelihood": 66.6
}
```
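The same request from Python with the `requests` library (a client-side sketch; the base URL and token are placeholders for your deployment):
```python
import requests

API_URL = "http://localhost:8000"  # adjust to your deployment
TOKEN = "your_secret_token_here"   # the SECRET_TOKEN value from .env

response = requests.post(
    f"{API_URL}/text/analyse",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"text": "This is a sample text for analysis."},
)
response.raise_for_status()
print(response.json())  # {"result": ..., "perplexity": ..., "ai_likelihood": ...}
```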
#### Example: File upload
```bash
curl -X POST http://localhost:8000/text/upload \
-H "Authorization: Bearer <SECRET_TOKEN>" \
-F 'file=@yourfile.txt;type=text/plain'
```
---
### Nepali (SentencePiece) - `/NP/`
| Endpoint | Method | Description |
| --------------------------------- | ------ | ----------------------------------------- |
| `/NP/analyse` | POST | Classify Nepali text |
| `/NP/analyse-sentences` | POST | Sentence-by-sentence breakdown |
| `/NP/upload` | POST | Upload Nepali PDF for classification |
| `/NP/file-sentences-analyse` | POST | PDF upload, per-sentence breakdown |
| `/NP/health` | GET | Health check |
#### Example: Nepali text classification
```bash
curl -X POST http://localhost:8000/NP/analyse \
-H "Authorization: Bearer <SECRET_TOKEN>" \
-H "Content-Type: application/json" \
-d '{"text": "ΰ€―ΰ₯‹ ΰ€‰ΰ€¦ΰ€Ύΰ€Ήΰ€°ΰ€£ ΰ€΅ΰ€Ύΰ€•ΰ₯ΰ€― ΰ€Ήΰ₯‹ΰ₯€"}'
```
**Response:**
```json
{
  "label": "Human",
  "confidence": 98.6
}
```
#### Example: Nepali PDF upload
```bash
curl -X POST http://localhost:8000/NP/upload \
-H "Authorization: Bearer <SECRET_TOKEN>" \
-F 'file=@NepaliText.pdf;type=application/pdf'
```
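The equivalent upload from Python (again a sketch; the file name and token are placeholders):
```python
import requests

API_URL = "http://localhost:8000"
TOKEN = "your_secret_token_here"

with open("NepaliText.pdf", "rb") as f:
    response = requests.post(
        f"{API_URL}/NP/upload",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files={"file": ("NepaliText.pdf", f, "application/pdf")},
    )
print(response.json())  # {"label": ..., "confidence": ...}
```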
---
## πŸ“ API Docs
- **Swagger UI:** [http://localhost:8000/docs](http://localhost:8000/docs)
- **ReDoc:** [http://localhost:8000/redoc](http://localhost:8000/redoc)
---
## 🧪 Example: Integration with NestJS
You can easily call this API from a NestJS microservice.
**.env**
```env
FASTAPI_BASE_URL=http://localhost:8000
SECRET_TOKEN=your_secret_token_here
```
**fastapi.service.ts**
```typescript
import { Injectable } from "@nestjs/common";
import { HttpService } from "@nestjs/axios";
import { ConfigService } from "@nestjs/config";
import { firstValueFrom } from "rxjs";

@Injectable()
export class FastAPIService {
  constructor(
    private http: HttpService,
    private config: ConfigService,
  ) {}

  async analyzeText(text: string) {
    const url = `${this.config.get("FASTAPI_BASE_URL")}/text/analyse`;
    const token = this.config.get("SECRET_TOKEN");

    const response = await firstValueFrom(
      this.http.post(
        url,
        { text },
        { headers: { Authorization: `Bearer ${token}` } },
      ),
    );

    return response.data;
  }
}
```
**app.module.ts**
```typescript
import { Module } from "@nestjs/common";
import { ConfigModule } from "@nestjs/config";
import { HttpModule } from "@nestjs/axios";
import { AppController } from "./app.controller";
import { FastAPIService } from "./fastapi.service";

@Module({
  imports: [ConfigModule.forRoot(), HttpModule],
  controllers: [AppController],
  providers: [FastAPIService],
})
export class AppModule {}
```
**app.controller.ts**
```typescript
import { Body, Controller, Post, Get } from '@nestjs/common';
import { FastAPIService } from './fastapi.service';

@Controller()
export class AppController {
  constructor(private readonly fastapiService: FastAPIService) {}

  @Post('analyze-text')
  async callFastAPI(@Body('text') text: string) {
    return this.fastapiService.analyzeText(text);
  }

  @Get()
  getHello(): string {
    return 'NestJS is connected to FastAPI';
  }
}
```
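With both services running, `POST http://localhost:3000/analyze-text` with a JSON body like `{ "text": "..." }` proxies the request to FastAPI and returns its classification response unchanged (assuming NestJS's default port 3000).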
---
## 🧠 Main Functions in the Text Classifiers (`features/text_classifier/` and `features/nepali_text_classifier/`)
- **`load_model()`**
Loads the GPT-2 model and tokenizer from the specified directory paths.
- **`lifespan()`**
Manages the application lifecycle. Initializes the model at startup and handles cleanup on shutdown.
- **`classify_text_sync()`**
  Synchronously tokenizes the input text and runs the GPT-2 model, returning the classification and perplexity (see the sketch after this list).
- **`classify_text()`**
Asynchronously runs `classify_text_sync()` in a thread pool for non-blocking text classification.
- **`analyze_text()`**
**POST** endpoint: Accepts text input, classifies it using `classify_text()`, and returns the result with perplexity.
- **`health()`**
**GET** endpoint: Simple health check for API liveness.
- **`parse_docx()`, `parse_pdf()`, `parse_txt()`**
Utilities to extract and convert `.docx`, `.pdf`, and `.txt` file contents to plain text.
- **`warmup()`**
Downloads the model repository and initializes the model/tokenizer using `load_model()`.
- **`download_model_repo()`**
Downloads the model files from the designated `MODEL` folder.
- **`get_model_tokenizer()`**
Checks whether the model already exists; downloads it if missing, otherwise loads the cached model.
- **`handle_file_upload()`**
Handles file uploads from the `/upload` route. Extracts text, classifies, and returns results.
- **`extract_file_contents()`**
Extracts and returns plain text from uploaded files (PDF, DOCX, TXT).
- **`handle_file_sentence()`**
Processes file uploads by analyzing each sentence (under 10,000 chars) before classification.
- **`handle_sentence_level_analysis()`**
Checks/strips each sentence, then computes AI/human likelihood for each.
- **`analyze_sentences()`**
Splits paragraphs into sentences, classifies each, and returns all results.
- **`analyze_sentence_file()`**
Like `handle_file_sentence()`: analyzes sentences in uploaded files.
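To make the flow above concrete, here is a minimal sketch of perplexity-based detection with GPT-2, in the spirit of `classify_text_sync()`. The threshold and label mapping are illustrative assumptions, not the repository's actual values:
```python
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def classify_text_sync(text: str, threshold: float = 60.0) -> dict:
    # Tokenize and compute the language-model loss over the input
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    perplexity = math.exp(out.loss.item())
    # Low perplexity means GPT-2 finds the text predictable, i.e. more AI-like
    result = "AI-generated" if perplexity < threshold else "Human-written"
    return {"result": result, "perplexity": round(perplexity, 2)}

print(classify_text_sync("This is a sample text for analysis."))
```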
---
## 🚀 Deployment
- **Local**: Use `uvicorn` as above.
- **Railway/Heroku**: Use the provided `Procfile` (see the example below).
- **Hugging Face Spaces**: Use the `Dockerfile` for container deployment.
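A `Procfile` for this kind of app typically declares a single `web` process; the line below is an assumed example, so check the repository's actual file:
```
web: uvicorn app:app --host 0.0.0.0 --port $PORT
```
Platforms such as Railway and Heroku inject the `PORT` environment variable at runtime.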
---
## 💡 Tips
- **Model files are auto-downloaded on first start** if they are not found locally.
- **Keep `requirements.txt` up to date** after adding dependencies.
- **All endpoints require the correct `Authorization` header.**
- **For security**: never commit `.env` to a public repository.
---