# 🚀 FastAPI AI Text Detector

A production-ready FastAPI application for **AI-generated vs. human-written text detection** in both **English** and **Nepali**. Models are auto-managed and endpoints are secured via Bearer token authentication.

---

## πŸ—οΈ Project Structure

```
β”œβ”€β”€ app.py                   # Main FastAPI app entrypoint
β”œβ”€β”€ config.py                # Configuration loader (.env, settings)
β”œβ”€β”€ features/
β”‚   β”œβ”€β”€ text_classifier/     # English (GPT-2) classifier
β”‚   β”‚   β”œβ”€β”€ controller.py
β”‚   β”‚   β”œβ”€β”€ inferencer.py
β”‚   β”‚   β”œβ”€β”€ model_loader.py
β”‚   β”‚   β”œβ”€β”€ preprocess.py
β”‚   β”‚   └── routes.py
β”‚   └── nepali_text_classifier/ # Nepali (sentencepiece) classifier
β”‚       β”œβ”€β”€ controller.py
β”‚       β”œβ”€β”€ inferencer.py
β”‚       β”œβ”€β”€ model_loader.py
β”‚       β”œβ”€β”€ preprocess.py
β”‚       └── routes.py
β”œβ”€β”€ np_text_model/           # Nepali model artifacts (auto-downloaded)
β”‚   β”œβ”€β”€ classifier/
β”‚   β”‚   └── sentencepiece.bpe.model
β”‚   └── model_95_acc.pth
β”œβ”€β”€ models/                  # English GPT-2 model/tokenizer (auto-downloaded)
β”‚   β”œβ”€β”€ merges.txt
β”‚   β”œβ”€β”€ tokenizer.json
β”‚   └── model_weights.pth
β”œβ”€β”€ Dockerfile               # Container build config
β”œβ”€β”€ Procfile                 # Deployment entrypoint (for PaaS)
β”œβ”€β”€ requirements.txt         # Python dependencies
β”œβ”€β”€ README.md                # This file
└── .env                     # Secret token(s), environment config
```

---

### 🌟 Key Files and Their Roles

- **`app.py`**: Entry point initializing FastAPI app and routes.
- **`Procfile`**: Tells Railway (or similar platforms) how to run the program.
- **`requirements.txt`**: Tracks all Python dependencies for the project.
- **`__init__.py`**: Package initializer for the root module and submodules.
- **`features/text_classifier/`** (English, GPT-2)
  - **`controller.py`**: Handles logic between routes and the model.
  - **`inferencer.py`**: Runs inference and returns predictions; also provides file-system utilities.
  - **`model_loader.py`**: Loads the ML model and tokenizer.
  - **`preprocess.py`**: Prepares input text for the model.
  - **`routes.py`**: Defines API routes for text classification.
- **`features/nepali_text_classifier/`** (Nepali, SentencePiece)
  - Mirrors the layout above (`controller.py`, `inferencer.py`, `model_loader.py`, `preprocess.py`, `routes.py`), specialized for Nepali text.

---

## βš™οΈ Setup & Installation

1. **Clone the repository**

   ```bash
   git clone https://github.com/cyberalertnepal/aiapi
   cd aiapi
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Configure secrets**

   - Create a `.env` file at the project root:

     ```env
     SECRET_TOKEN=your_secret_token_here
     ```

   - **All endpoints require `Authorization: Bearer <SECRET_TOKEN>`**
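   For illustration, `config.py` could read the token with a few lines like the following. This is a minimal standard-library sketch; the project's actual loader may use `python-dotenv` or pydantic settings instead:

   ```python
   import os
   from pathlib import Path

   def load_env(path: str = ".env") -> None:
       """Minimal .env reader: KEY=VALUE lines; blanks and '#' comments skipped."""
       env_file = Path(path)
       if not env_file.exists():
           return
       for line in env_file.read_text().splitlines():
           line = line.strip()
           if not line or line.startswith("#") or "=" not in line:
               continue
           key, _, value = line.partition("=")
           # Existing environment variables take precedence over .env values.
           os.environ.setdefault(key.strip(), value.strip())

   load_env()
   SECRET_TOKEN = os.getenv("SECRET_TOKEN", "")
   ```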

---

## 🚦 Running the API Server

```bash
uvicorn app:app --host 0.0.0.0 --port 8000
```

---

## 🔒 Security: Bearer Token Auth

All endpoints require authentication via Bearer token:

- Set `SECRET_TOKEN` in `.env`
- Add header: `Authorization: Bearer <SECRET_TOKEN>`

Unauthorized requests receive `403 Forbidden`.

---

## 🧩 API Endpoints

### English (GPT-2) - `/text/`

| Endpoint                         | Method | Description                               |
| --------------------------------- | ------ | ----------------------------------------- |
| `/text/analyse`                  | POST   | Classify raw English text                 |
| `/text/analyse-sentences`        | POST   | Sentence-by-sentence breakdown            |
| `/text/analyse-sentance-file`    | POST   | Upload file, per-sentence breakdown       |
| `/text/upload`                   | POST   | Upload file for overall classification    |
| `/text/health`                   | GET    | Health check                             |

#### Example: Classify English text

```bash
curl -X POST http://localhost:8000/text/analyse \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is a sample text for analysis."}'
```

**Response:**
```json
{
  "result": "AI-generated",
  "perplexity": 55.67,
  "ai_likelihood": 66.6
}
```

#### Example: File upload

```bash
curl -X POST http://localhost:8000/text/upload \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -F 'file=@yourfile.txt;type=text/plain'
```

---

### Nepali (SentencePiece) - `/NP/`

| Endpoint                         | Method | Description                               |
| --------------------------------- | ------ | ----------------------------------------- |
| `/NP/analyse`                    | POST   | Classify Nepali text                      |
| `/NP/analyse-sentences`          | POST   | Sentence-by-sentence breakdown            |
| `/NP/upload`                     | POST   | Upload Nepali PDF for classification      |
| `/NP/file-sentences-analyse`     | POST   | PDF upload, per-sentence breakdown        |
| `/NP/health`                     | GET    | Health check                             |

#### Example: Nepali text classification

```bash
curl -X POST http://localhost:8000/NP/analyse \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"text": "यो उदाहरण वाक्य हो।"}'
```

**Response:**
```json
{
  "label": "Human",
  "confidence": 98.6
}
```

#### Example: Nepali PDF upload

```bash
curl -X POST http://localhost:8000/NP/upload \
  -H "Authorization: Bearer <SECRET_TOKEN>" \
  -F 'file=@NepaliText.pdf;type=application/pdf'
```

---

## πŸ“ API Docs

- **Swagger UI:** [http://localhost:8000/docs](http://localhost:8000/docs)
- **ReDoc:** [http://localhost:8000/redoc](http://localhost:8000/redoc)

---

## 🧪 Example: Integration with NestJS

You can easily call this API from a NestJS microservice.

**.env**
```env
FASTAPI_BASE_URL=http://localhost:8000
SECRET_TOKEN=your_secret_token_here
```

**fastapi.service.ts**
```typescript
import { Injectable } from "@nestjs/common";
import { HttpService } from "@nestjs/axios";
import { ConfigService } from "@nestjs/config";
import { firstValueFrom } from "rxjs";

@Injectable()
export class FastAPIService {
  constructor(
    private http: HttpService,
    private config: ConfigService,
  ) {}

  async analyzeText(text: string) {
    const url = `${this.config.get("FASTAPI_BASE_URL")}/text/analyse`;
    const token = this.config.get("SECRET_TOKEN");

    const response = await firstValueFrom(
      this.http.post(
        url,
        { text },
        {
          headers: {
            Authorization: `Bearer ${token}`,
          },
        },
      ),
    );

    return response.data;
  }
}
```

**app.module.ts**
```typescript
import { Module } from "@nestjs/common";
import { ConfigModule } from "@nestjs/config";
import { HttpModule } from "@nestjs/axios";
import { AppController } from "./app.controller";
import { FastAPIService } from "./fastapi.service";

@Module({
  imports: [ConfigModule.forRoot(), HttpModule],
  controllers: [AppController],
  providers: [FastAPIService],
})
export class AppModule {}
```

**app.controller.ts**
```typescript
import { Body, Controller, Post, Get } from '@nestjs/common';
import { FastAPIService } from './fastapi.service';

@Controller()
export class AppController {
  constructor(private readonly fastapiService: FastAPIService) {}

  @Post('analyze-text')
  async callFastAPI(@Body('text') text: string) {
    return this.fastapiService.analyzeText(text);
  }

  @Get()
  getHello(): string {
    return 'NestJS is connected to FastAPI';
  }
}
```

---

## 🧠 Main Functions in the Text Classifiers (`features/text_classifier/` and `features/nepali_text_classifier/`)

- **`load_model()`**  
  Loads the GPT-2 model and tokenizer from the specified directory paths.

- **`lifespan()`**  
  Manages the application lifecycle. Initializes the model at startup and handles cleanup on shutdown.

- **`classify_text_sync()`**  
  Synchronously tokenizes input text and predicts using the GPT-2 model. Returns classification and perplexity.

- **`classify_text()`**  
  Asynchronously runs `classify_text_sync()` in a thread pool for non-blocking text classification.

- **`analyze_text()`**  
  **POST** endpoint: Accepts text input, classifies it using `classify_text()`, and returns the result with perplexity.

- **`health()`**  
  **GET** endpoint: Simple health check for API liveness.

- **`parse_docx()`, `parse_pdf()`, `parse_txt()`**  
  Utilities to extract and convert `.docx`, `.pdf`, and `.txt` file contents to plain text.

- **`warmup()`**  
  Downloads the model repository and initializes the model/tokenizer using `load_model()`.

- **`download_model_repo()`**  
  Downloads the model files into the designated `MODEL` folder.

- **`get_model_tokenizer()`**  
  Checks whether the model already exists; downloads it if not, otherwise loads the cached model.

- **`handle_file_upload()`**  
  Handles file uploads from the `/upload` route. Extracts text, classifies, and returns results.

- **`extract_file_contents()`**  
  Extracts and returns plain text from uploaded files (PDF, DOCX, TXT).

- **`handle_file_sentence()`**  
  Processes file uploads by analyzing each sentence (under 10,000 chars) before classification.

- **`handle_sentence_level_analysis()`**  
  Checks/strips each sentence, then computes AI/human likelihood for each.

- **`analyze_sentences()`**  
  Splits paragraphs into sentences, classifies each, and returns all results.

- **`analyze_sentence_file()`**  
  Like `handle_file_sentence()`: analyzes the sentences in uploaded files.
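To make the perplexity-based verdict concrete, here is a toy version of the mapping from perplexity to a label and AI likelihood. The threshold (60.0) and the logistic formula are illustrative assumptions, not the actual values used by `classify_text_sync()`:

```python
import math

# Hypothetical decision rule: low perplexity under GPT-2 suggests AI text.
PERPLEXITY_THRESHOLD = 60.0  # illustrative, not the project's real cutoff

def classify_from_perplexity(perplexity: float) -> dict:
    """Map a perplexity score to a label and an AI-likelihood percentage."""
    # Logistic curve: likelihood falls as perplexity rises past the threshold.
    ai_likelihood = 100.0 / (1.0 + math.exp((perplexity - PERPLEXITY_THRESHOLD) / 10.0))
    label = "AI-generated" if perplexity < PERPLEXITY_THRESHOLD else "Human-written"
    return {
        "result": label,
        "perplexity": round(perplexity, 2),
        "ai_likelihood": round(ai_likelihood, 1),
    }
```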

---

## 🚀 Deployment

- **Local**: Use `uvicorn` as above.
- **Railway/Heroku**: Use the provided `Procfile`.
- **Hugging Face Spaces**: Use the `Dockerfile` for container deployment.

---

## 💡 Tips

- **Model files auto-download at first start** if not found.
- **Keep `requirements.txt` up-to-date** after adding dependencies.
- **All endpoints require the correct `Authorization` header**.
- **For security**: Avoid committing `.env` to public repos.

---