# πŸš€ FastAPI AI Text Detector A production-ready FastAPI application for **AI-generated vs. human-written text detection** in both **English** and **Nepali**. Models are auto-managed and endpoints are secured via Bearer token authentication. --- ## πŸ—οΈ Project Structure ``` β”œβ”€β”€ app.py # Main FastAPI app entrypoint β”œβ”€β”€ config.py # Configuration loader (.env, settings) β”œβ”€β”€ features/ β”‚ β”œβ”€β”€ text_classifier/ # English (GPT-2) classifier β”‚ β”‚ β”œβ”€β”€ controller.py β”‚ β”‚ β”œβ”€β”€ inferencer.py β”‚ β”‚ β”œβ”€β”€ model_loader.py β”‚ β”‚ β”œβ”€β”€ preprocess.py β”‚ β”‚ └── routes.py β”‚ └── nepali_text_classifier/ # Nepali (sentencepiece) classifier β”‚ β”œβ”€β”€ controller.py β”‚ β”œβ”€β”€ inferencer.py β”‚ β”œβ”€β”€ model_loader.py β”‚ β”œβ”€β”€ preprocess.py β”‚ └── routes.py β”œβ”€β”€ np_text_model/ # Nepali model artifacts (auto-downloaded) β”‚ β”œβ”€β”€ classifier/ β”‚ β”‚ └── sentencepiece.bpe.model β”‚ └── model_95_acc.pth β”œβ”€β”€ models/ # English GPT-2 model/tokenizer (auto-downloaded) β”‚ β”œβ”€β”€ merges.txt β”‚ β”œβ”€β”€ tokenizer.json β”‚ └── model_weights.pth β”œβ”€β”€ Dockerfile # Container build config β”œβ”€β”€ Procfile # Deployment entrypoint (for PaaS) β”œβ”€β”€ requirements.txt # Python dependencies β”œβ”€β”€ README.md # This file └── .env # Secret token(s), environment config ``` --- ### 🌟 Key Files and Their Roles - **`app.py`**: Entry point initializing FastAPI app and routes. - **`Procfile`**: Tells Railway (or similar platforms) how to run the program. - **`requirements.txt`**: Tracks all Python dependencies for the project. - **`__init__.py`**: Package initializer for the root module and submodules. - **`features/text_classifier/`** - **`controller.py`**: Handles logic between routes and the model. - **`inferencer.py`**: Runs inference and returns predictions as well as file system utilities. - **`features/NP/`** - **`controller.py`**: Handles logic between routes and the model. 
  - **`inferencer.py`**: Runs inference, returns predictions, and provides file-system utilities.
  - **`model_loader.py`**: Loads the ML model and tokenizer.
  - **`preprocess.py`**: Prepares input text for the model.
  - **`routes.py`**: Defines API routes for text classification.

---

## ⚙️ Setup & Installation

1. **Clone the repository**
   ```bash
   git clone https://github.com/cyberalertnepal/aiapi
   cd aiapi
   ```
2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```
3. **Configure secrets**
   - Create a `.env` file at the project root:
     ```env
     SECRET_TOKEN=your_secret_token_here
     ```
   - **All endpoints require an `Authorization: Bearer <SECRET_TOKEN>` header.**

---

## 🚦 Running the API Server

```bash
uvicorn app:app --host 0.0.0.0 --port 8000
```

---

## 🔒 Security: Bearer Token Auth

All endpoints require authentication via Bearer token:

- Set `SECRET_TOKEN` in `.env`
- Add the header: `Authorization: Bearer <SECRET_TOKEN>`

Unauthorized requests receive `403 Forbidden`.

---

## 🧩 API Endpoints

### English (GPT-2) - `/text/`

| Endpoint | Method | Description |
| --------------------------------- | ------ | ----------------------------------------- |
| `/text/analyse` | POST | Classify raw English text |
| `/text/analyse-sentences` | POST | Sentence-by-sentence breakdown |
| `/text/analyse-sentance-file` | POST | Upload file, per-sentence breakdown |
| `/text/upload` | POST | Upload file for overall classification |
| `/text/health` | GET | Health check |

#### Example: Classify English text

```bash
curl -X POST http://localhost:8000/text/analyse \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is a sample text for analysis."}'
```

**Response:**

```json
{
  "result": "AI-generated",
  "perplexity": 55.67,
  "ai_likelihood": 66.6
}
```

#### Example: File upload

```bash
curl -X POST http://localhost:8000/text/upload \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -F 'file=@yourfile.txt;type=text/plain'
```

---

### Nepali (SentencePiece) - `/NP/`

| Endpoint | Method | Description |
| --------------------------------- | ------ | ----------------------------------------- |
| `/NP/analyse` | POST | Classify Nepali text |
| `/NP/analyse-sentences` | POST | Sentence-by-sentence breakdown |
| `/NP/upload` | POST | Upload Nepali PDF for classification |
| `/NP/file-sentences-analyse` | POST | PDF upload, per-sentence breakdown |
| `/NP/health` | GET | Health check |

#### Example: Nepali text classification

```bash
curl -X POST http://localhost:8000/NP/analyse \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"text": "यो उदाहरण वाक्य हो।"}'
```

**Response:**

```json
{
  "label": "Human",
  "confidence": 98.6
}
```

#### Example: Nepali PDF upload

```bash
curl -X POST http://localhost:8000/NP/upload \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -F 'file=@NepaliText.pdf;type=application/pdf'
```

---

## 📝 API Docs

- **Swagger UI:** [http://localhost:8000/docs](http://localhost:8000/docs)
- **ReDoc:** [http://localhost:8000/redoc](http://localhost:8000/redoc)

---

## 🧪 Example: Integration with NestJS

You can easily call this API from a NestJS microservice.
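Before wiring up NestJS, the same `/text/analyse` call can be sanity-checked from plain Python. This is a minimal, stdlib-only sketch mirroring the curl example above; `build_request` is an illustrative helper, not part of the project, and it reads the same `FASTAPI_BASE_URL`/`SECRET_TOKEN` variables used in the NestJS setup:

```python
import json
import os
import urllib.request

BASE_URL = os.getenv("FASTAPI_BASE_URL", "http://localhost:8000")
TOKEN = os.getenv("SECRET_TOKEN", "your_secret_token_here")


def build_request(text: str) -> urllib.request.Request:
    """Build the POST request for /text/analyse, mirroring the curl example."""
    return urllib.request.Request(
        url=f"{BASE_URL}/text/analyse",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Requires the API server from this README to be running locally.
    with urllib.request.urlopen(build_request("This is a sample text for analysis.")) as resp:
        print(json.load(resp))
```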
**.env**

```env
FASTAPI_BASE_URL=http://localhost:8000
SECRET_TOKEN=your_secret_token_here
```

**fastapi.service.ts**

```typescript
import { Injectable } from "@nestjs/common";
import { HttpService } from "@nestjs/axios";
import { ConfigService } from "@nestjs/config";
import { firstValueFrom } from "rxjs";

@Injectable()
export class FastAPIService {
  constructor(
    private http: HttpService,
    private config: ConfigService,
  ) {}

  async analyzeText(text: string) {
    const url = `${this.config.get("FASTAPI_BASE_URL")}/text/analyse`;
    const token = this.config.get("SECRET_TOKEN");

    const response = await firstValueFrom(
      this.http.post(
        url,
        { text },
        {
          headers: {
            Authorization: `Bearer ${token}`,
          },
        },
      ),
    );
    return response.data;
  }
}
```

**app.module.ts**

```typescript
import { Module } from "@nestjs/common";
import { ConfigModule } from "@nestjs/config";
import { HttpModule } from "@nestjs/axios";
import { AppController } from "./app.controller";
import { FastAPIService } from "./fastapi.service";

@Module({
  imports: [ConfigModule.forRoot(), HttpModule],
  controllers: [AppController],
  providers: [FastAPIService],
})
export class AppModule {}
```

**app.controller.ts**

```typescript
import { Body, Controller, Post, Get } from '@nestjs/common';
import { FastAPIService } from './fastapi.service';

@Controller()
export class AppController {
  constructor(private readonly fastapiService: FastAPIService) {}

  @Post('analyze-text')
  async callFastAPI(@Body('text') text: string) {
    return this.fastapiService.analyzeText(text);
  }

  @Get()
  getHello(): string {
    return 'NestJS is connected to FastAPI';
  }
}
```

---

## 🧠 Main Functions in the Text Classifiers (`features/text_classifier/` and `features/nepali_text_classifier/`)

- **`load_model()`**: Loads the GPT-2 model and tokenizer from the specified directory paths.
- **`lifespan()`**: Manages the application lifecycle: initializes the model at startup and handles cleanup on shutdown.
- **`classify_text_sync()`**: Synchronously tokenizes input text and predicts with the GPT-2 model. Returns the classification and perplexity.
- **`classify_text()`**: Asynchronously runs `classify_text_sync()` in a thread pool for non-blocking text classification.
- **`analyze_text()`**: **POST** endpoint: accepts text input, classifies it using `classify_text()`, and returns the result with perplexity.
- **`health()`**: **GET** endpoint: simple health check for API liveness.
- **`parse_docx()`, `parse_pdf()`, `parse_txt()`**: Utilities to extract and convert `.docx`, `.pdf`, and `.txt` file contents to plain text.
- **`warmup()`**: Downloads the model repository and initializes the model/tokenizer using `load_model()`.
- **`download_model_repo()`**: Downloads the model files into the designated `MODEL` folder.
- **`get_model_tokenizer()`**: Checks whether the model already exists; downloads it if not, otherwise loads the cached copy.
- **`handle_file_upload()`**: Handles file uploads from the `/upload` route: extracts text, classifies it, and returns the results.
- **`extract_file_contents()`**: Extracts and returns plain text from uploaded files (PDF, DOCX, TXT).
- **`handle_file_sentence()`**: Processes file uploads by analyzing each sentence (under 10,000 characters) before classification.
- **`handle_sentence_level_analysis()`**: Checks and strips each sentence, then computes the AI/human likelihood for each.
- **`analyze_sentences()`**: Splits paragraphs into sentences, classifies each, and returns all results.
- **`analyze_sentence_file()`**: Like `handle_file_sentence()`: analyzes sentences in uploaded files.

---

## 🚀 Deployment

- **Local**: Use `uvicorn` as above.
- **Railway/Heroku**: Use the provided `Procfile`.
- **Hugging Face Spaces**: Use the `Dockerfile` for container deployment.

---

## 💡 Tips

- **Model files auto-download at first start** if not found.
- **Keep `requirements.txt` up-to-date** after adding dependencies.
- **All endpoints require the correct `Authorization` header.**
- **For security**: Avoid committing `.env` to public repos.

---
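The Bearer-token check that guards every endpoint can be illustrated with a small stdlib-only sketch. This is not the project's actual code: the function name `check_bearer` is hypothetical, and in the real app the equivalent logic would live in a FastAPI dependency that returns `403 Forbidden` on failure, as described above:

```python
import hmac
from typing import Optional


def check_bearer(authorization: Optional[str], secret: str) -> bool:
    """Return True only if the Authorization header carries the expected Bearer token."""
    if not authorization or not authorization.startswith("Bearer "):
        return False
    token = authorization[len("Bearer "):]
    # Constant-time comparison avoids leaking the token through timing differences.
    return hmac.compare_digest(token, secret)
```

For example, `check_bearer("Bearer my_token", "my_token")` passes, while a missing header, a `Basic` scheme, or a wrong token all fail.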