# πŸš€ FastAPI AI Text Detector A production-ready FastAPI application for **AI-generated vs. human-written text detection** in both **English** and **Nepali**. Models are auto-managed and endpoints are secured via Bearer token authentication. --- ## πŸ—οΈ Project Structure ``` β”œβ”€β”€ app.py # Main FastAPI app entrypoint β”œβ”€β”€ config.py # Configuration loader (.env, settings) β”œβ”€β”€ features/ β”‚ β”œβ”€β”€ text_classifier/ # English (GPT-2) classifier β”‚ β”‚ β”œβ”€β”€ controller.py β”‚ β”‚ β”œβ”€β”€ inferencer.py β”‚ β”‚ β”œβ”€β”€ model_loader.py β”‚ β”‚ β”œβ”€β”€ preprocess.py β”‚ β”‚ └── routes.py β”‚ └── nepali_text_classifier/ # Nepali (sentencepiece) classifier β”‚ β”œβ”€β”€ controller.py β”‚ β”œβ”€β”€ inferencer.py β”‚ β”œβ”€β”€ model_loader.py β”‚ β”œβ”€β”€ preprocess.py β”‚ └── routes.py β”œβ”€β”€ np_text_model/ # Nepali model artifacts (auto-downloaded) β”‚ β”œβ”€β”€ classifier/ β”‚ β”‚ └── sentencepiece.bpe.model β”‚ └── model_95_acc.pth β”œβ”€β”€ models/ # English GPT-2 model/tokenizer (auto-downloaded) β”‚ β”œβ”€β”€ merges.txt β”‚ β”œβ”€β”€ tokenizer.json β”‚ └── model_weights.pth β”œβ”€β”€ Dockerfile # Container build config β”œβ”€β”€ Procfile # Deployment entrypoint (for PaaS) β”œβ”€β”€ requirements.txt # Python dependencies β”œβ”€β”€ README.md # This file └── .env # Secret token(s), environment config ``` --- ### 🌟 Key Files and Their Roles - **`app.py`**: Entry point initializing FastAPI app and routes. - **`Procfile`**: Tells Railway (or similar platforms) how to run the program. - **`requirements.txt`**: Tracks all Python dependencies for the project. - **`__init__.py`**: Package initializer for the root module and submodules. - **`features/text_classifier/`** - **`controller.py`**: Handles logic between routes and the model. - **`inferencer.py`**: Runs inference and returns predictions as well as file system utilities. - **`features/NP/`** - **`controller.py`**: Handles logic between routes and the model. 
  - **`inferencer.py`**: Runs inference, returns predictions, and provides file-system utilities.
  - **`model_loader.py`**: Loads the ML model and tokenizer.
  - **`preprocess.py`**: Prepares input text for the model.
  - **`routes.py`**: Defines API routes for text classification.

---

## ⚙️ Setup & Installation

1. **Clone the repository**
   ```bash
   git clone https://github.com/cyberalertnepal/aiapi
   cd aiapi
   ```
2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```
3. **Configure secrets**
   - Create a `.env` file at the project root:
     ```env
     SECRET_TOKEN=your_secret_token_here
     ```
   - **All endpoints require an `Authorization: Bearer <SECRET_TOKEN>` header.**

---

## 🚦 Running the API Server

```bash
uvicorn app:app --host 0.0.0.0 --port 8000
```

---

## 🔒 Security: Bearer Token Auth

All endpoints require authentication via Bearer token:

- Set `SECRET_TOKEN` in `.env`
- Add the header: `Authorization: Bearer <SECRET_TOKEN>`

Unauthorized requests receive `403 Forbidden`.

---

## 🧩 API Endpoints

### English (GPT-2) - `/text/`

| Endpoint | Method | Description |
| --------------------------------- | ------ | ----------------------------------------- |
| `/text/analyse` | POST | Classify raw English text |
| `/text/analyse-sentences` | POST | Sentence-by-sentence breakdown |
| `/text/analyse-sentance-file` | POST | Upload file, per-sentence breakdown |
| `/text/upload` | POST | Upload file for overall classification |
| `/text/health` | GET | Health check |

#### Example: Classify English text

```bash
curl -X POST http://localhost:8000/text/analyse \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is a sample text for analysis."}'
```

**Response:**

```json
{
  "result": "AI-generated",
  "perplexity": 55.67,
  "ai_likelihood": 66.6
}
```

#### Example: File upload

```bash
curl -X POST http://localhost:8000/text/upload \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -F 'file=@yourfile.txt;type=text/plain'
```

---

### Nepali (SentencePiece) - `/NP/`

| Endpoint | Method | Description |
| --------------------------------- | ------ | ----------------------------------------- |
| `/NP/analyse` | POST | Classify Nepali text |
| `/NP/analyse-sentences` | POST | Sentence-by-sentence breakdown |
| `/NP/upload` | POST | Upload Nepali PDF for classification |
| `/NP/file-sentences-analyse` | POST | PDF upload, per-sentence breakdown |
| `/NP/health` | GET | Health check |

#### Example: Nepali text classification

```bash
curl -X POST http://localhost:8000/NP/analyse \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"text": "यो उदाहरण वाक्य हो।"}'
```

**Response:**

```json
{
  "label": "Human",
  "confidence": 98.6
}
```

#### Example: Nepali PDF upload

```bash
curl -X POST http://localhost:8000/NP/upload \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -F 'file=@NepaliText.pdf;type=application/pdf'
```

---

## 📝 API Docs

- **Swagger UI:** [http://localhost:8000/docs](http://localhost:8000/docs)
- **ReDoc:** [http://localhost:8000/redoc](http://localhost:8000/redoc)

---

## 🧪 Example: Integration with NestJS

You can easily call this API from a NestJS microservice.
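Before wiring up NestJS, the same `/text/analyse` call can be sanity-checked from plain Python. This is a minimal, stdlib-only sketch mirroring the curl example above; `build_request` is an illustrative helper, not part of the project, and it reads the same `FASTAPI_BASE_URL`/`SECRET_TOKEN` variables used in the NestJS setup:

```python
import json
import os
import urllib.request

BASE_URL = os.getenv("FASTAPI_BASE_URL", "http://localhost:8000")
TOKEN = os.getenv("SECRET_TOKEN", "your_secret_token_here")


def build_request(text: str) -> urllib.request.Request:
    """Build the POST request for /text/analyse, mirroring the curl example."""
    return urllib.request.Request(
        url=f"{BASE_URL}/text/analyse",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Requires the API server from this README to be running locally.
    with urllib.request.urlopen(build_request("This is a sample text for analysis.")) as resp:
        print(json.load(resp))
```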
**.env**

```env
FASTAPI_BASE_URL=http://localhost:8000
SECRET_TOKEN=your_secret_token_here
```

**fastapi.service.ts**

```typescript
import { Injectable } from "@nestjs/common";
import { HttpService } from "@nestjs/axios";
import { ConfigService } from "@nestjs/config";
import { firstValueFrom } from "rxjs";

@Injectable()
export class FastAPIService {
  constructor(
    private http: HttpService,
    private config: ConfigService,
  ) {}

  async analyzeText(text: string) {
    const url = `${this.config.get("FASTAPI_BASE_URL")}/text/analyse`;
    const token = this.config.get("SECRET_TOKEN");

    const response = await firstValueFrom(
      this.http.post(
        url,
        { text },
        {
          headers: {
            Authorization: `Bearer ${token}`,
          },
        },
      ),
    );
    return response.data;
  }
}
```

**app.module.ts**

```typescript
import { Module } from "@nestjs/common";
import { ConfigModule } from "@nestjs/config";
import { HttpModule } from "@nestjs/axios";
import { AppController } from "./app.controller";
import { FastAPIService } from "./fastapi.service";

@Module({
  imports: [ConfigModule.forRoot(), HttpModule],
  controllers: [AppController],
  providers: [FastAPIService],
})
export class AppModule {}
```

**app.controller.ts**

```typescript
import { Body, Controller, Post, Get } from '@nestjs/common';
import { FastAPIService } from './fastapi.service';

@Controller()
export class AppController {
  constructor(private readonly fastapiService: FastAPIService) {}

  @Post('analyze-text')
  async callFastAPI(@Body('text') text: string) {
    return this.fastapiService.analyzeText(text);
  }

  @Get()
  getHello(): string {
    return 'NestJS is connected to FastAPI';
  }
}
```

---

## 🧠 Main Functions in the Text Classifiers (`features/text_classifier/` and `features/nepali_text_classifier/`)

- **`load_model()`**: Loads the GPT-2 model and tokenizer from the specified directory paths.
- **`lifespan()`**: Manages the application lifecycle: initializes the model at startup and handles cleanup on shutdown.
- **`classify_text_sync()`**: Synchronously tokenizes input text and predicts with the GPT-2 model. Returns the classification and perplexity.
- **`classify_text()`**: Asynchronously runs `classify_text_sync()` in a thread pool for non-blocking text classification.
- **`analyze_text()`**: **POST** endpoint: accepts text input, classifies it using `classify_text()`, and returns the result with perplexity.
- **`health()`**: **GET** endpoint: simple health check for API liveness.
- **`parse_docx()`, `parse_pdf()`, `parse_txt()`**: Utilities to extract and convert `.docx`, `.pdf`, and `.txt` file contents to plain text.
- **`warmup()`**: Downloads the model repository and initializes the model/tokenizer using `load_model()`.
- **`download_model_repo()`**: Downloads the model files into the designated `MODEL` folder.
- **`get_model_tokenizer()`**: Checks whether the model already exists; downloads it if not, otherwise loads the cached copy.
- **`handle_file_upload()`**: Handles file uploads from the `/upload` route: extracts text, classifies it, and returns the results.
- **`extract_file_contents()`**: Extracts and returns plain text from uploaded files (PDF, DOCX, TXT).
- **`handle_file_sentence()`**: Processes file uploads by analyzing each sentence (under 10,000 characters) before classification.
- **`handle_sentence_level_analysis()`**: Checks and strips each sentence, then computes the AI/human likelihood for each.
- **`analyze_sentences()`**: Splits paragraphs into sentences, classifies each, and returns all results.
- **`analyze_sentence_file()`**: Like `handle_file_sentence()`: analyzes sentences in uploaded files.

---

## 🚀 Deployment

- **Local**: Use `uvicorn` as above.
- **Railway/Heroku**: Use the provided `Procfile`.
- **Hugging Face Spaces**: Use the `Dockerfile` for container deployment.

---

## 💡 Tips

- **Model files auto-download at first start** if not found.
- **Keep `requirements.txt` up-to-date** after adding dependencies.
- **All endpoints require the correct `Authorization` header.**
- **For security**: Avoid committing `.env` to public repos.

---
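The Bearer-token check that guards every endpoint can be illustrated with a small stdlib-only sketch. This is not the project's actual code: the function name `check_bearer` is hypothetical, and in the real app the equivalent logic would live in a FastAPI dependency that returns `403 Forbidden` on failure, as described above:

```python
import hmac
from typing import Optional


def check_bearer(authorization: Optional[str], secret: str) -> bool:
    """Return True only if the Authorization header carries the expected Bearer token."""
    if not authorization or not authorization.startswith("Bearer "):
        return False
    token = authorization[len("Bearer "):]
    # Constant-time comparison avoids leaking the token through timing differences.
    return hmac.compare_digest(token, secret)
```

For example, `check_bearer("Bearer my_token", "my_token")` passes, while a missing header, a `Basic` scheme, or a wrong token all fail.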