### **FastAPI AI**
This FastAPI app loads a GPT-2 model, tokenizes input text, classifies it, and returns whether the text is AI-generated or human-written.
### **Install Dependencies**
```bash
pip install -r requirements.txt
```
This command installs all the dependencies listed in the `requirements.txt` file. It ensures that your environment has the required packages to run the project smoothly.
**NOTE: If you add or change any dependencies, don't forget to record them in `requirements.txt` using `pip freeze > requirements.txt`.**
---
### File Structure
```
├── app.py
├── features
│   └── text_classifier
│       ├── controller.py
│       ├── inferencer.py
│       ├── __init__.py
│       ├── model_loader.py
│       ├── preprocess.py
│       └── routes.py
├── __init__.py
├── Procfile
├── readme.md
└── requirements.txt
```
**`app.py`**: Entry point that initializes the FastAPI app and registers the routes
**`Procfile`**: Tells Railway how to run the program
**`requirements.txt`**: Lists all the packages the project depends on
**`__init__.py`**: Package initializer for the root module
**Folder: `features/text_classifier`**
**`controller.py`**: Handles logic between the routes and the model
**`inferencer.py`**: Runs inference and returns predictions, including for file input
**`__init__.py`**: Initializes the module as a package
**`model_loader.py`**: Loads the ML model and tokenizer
**`preprocess.py`**: Prepares input text for the model
**`routes.py`**: Defines the API routes for text classification
### **Functions**
1. **`load_model()`**
Loads the GPT-2 model and tokenizer from the specified directory paths.
2. **`lifespan()`**
Manages the application lifecycle. It initializes the model at startup and performs cleanup during shutdown.
3. **`classify_text_sync()`**
Synchronously tokenizes the input text and performs classification using the GPT-2 model. Returns both the classification result and perplexity score.
4. **`classify_text()`**
Asynchronously runs `classify_text_sync()` in a thread pool for non-blocking text classification.
5. **`analyze_text()`**
**POST** endpoint: Accepts text input, classifies it using `classify_text()`, and returns the result along with perplexity.
6. **`health()`**
**GET** endpoint: Performs a simple health check to confirm the API is operational.
7. **`parse_docx()`, `parse_pdf()`, `parse_txt()`**
Utility functions to extract and convert the contents of `.docx`, `.pdf`, and `.txt` files into plain text for classification.
8. **`warmup()`**
Downloads the model repository and initializes the model and tokenizer using the `load_model()` function.
9. **`download_model_repo()`**
Handles downloading the model files from the designated `MODEL` folder.
10. **`get_model_tokenizer()`**
Similar to `warmup()`, but includes a check to see if the model already exists. If not, it downloads the model; otherwise, it uses the previously downloaded one.
11. **`handle_file_upload()`**
Manages file uploads from the `/upload` route. Extracts text from the uploaded file, classifies it, and returns the results.
12. **`extract_file_contents()`**
Extracts and returns plain text content from uploaded files (e.g., PDF, DOCX, TXT).
13. **`handle_file_sentence()`**
Processes uploaded files by analyzing each sentence. Ensures the total file text is under 10,000 characters before classification.
14. **`handle_sentence_level_analysis()`**
Strips and checks each sentence's length, then evaluates the likelihood of AI vs. human generation for each sentence.
15. **`analyze_sentences()`**
Divides long paragraphs into individual sentences, classifies each one, and returns a list of their classification results.
16. **`analyze_sentence_file()`**
A route function that analyzes sentences in uploaded files, similar to `handle_file_sentence()`.
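As a rough illustration of how a perplexity score might be turned into a label and an AI-likelihood percentage, here is a hypothetical sketch; the thresholds and linear interpolation below are illustrative assumptions, not the app's actual values:

```python
def perplexity_to_result(perplexity: float,
                         ai_threshold: float = 60.0,
                         human_threshold: float = 100.0) -> dict:
    """Map a perplexity score to a classification result (hypothetical thresholds).

    Low perplexity means the model found the text predictable, which is
    treated here as evidence of AI generation.
    """
    label = "AI-generated" if perplexity <= ai_threshold else "Human-written"
    # Linearly interpolate the likelihood between the two thresholds,
    # then clamp it to the [0, 100] range.
    span = human_threshold - ai_threshold
    likelihood = (human_threshold - perplexity) / span * 100
    likelihood = max(0.0, min(100.0, likelihood))
    return {
        "label": label,
        "perplexity": round(perplexity, 2),
        "ai_likelihood": round(likelihood, 2),
    }
```

With these illustrative thresholds, the sample scores from the responses below map the same way: a perplexity of 8.17 yields a likelihood of 100, while 510.28 yields 0.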
---
### **Code Overview**
### **Running and Load Balancing:**
To run the app in production with multiple worker processes:
```bash
uvicorn app:app --host 0.0.0.0 --port 8000 --workers 4
```
This launches the FastAPI app on port 8000; the `--workers` flag starts several worker processes so incoming requests are balanced across them.
### **Endpoints**
#### 1. **`/text/analyze`**
- **Method:** `POST`
- **Description:** Classifies whether the text is AI-generated or human-written.
- **Request:**
```json
{ "text": "sample text" }
```
- **Response:**
```json
{ "result": "AI-generated", "perplexity": 55.67, "ai_likelihood": 66.6 }
```
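A minimal Python client sketch for this endpoint. The base URL and token follow the examples later in this document; adjust them to your deployment:

```python
import json
import urllib.request

# Deployment URL taken from the testing examples in this document.
BASE_URL = "https://can-org-canspace.hf.space"

def build_analyze_request(text: str, token: str) -> urllib.request.Request:
    """Build the POST /text/analyze request, including the bearer-token
    handshake header that the API checks before processing."""
    return urllib.request.Request(
        url=f"{BASE_URL}/text/analyze",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it (requires network access):
# with urllib.request.urlopen(build_analyze_request("sample text", "SECRET_CODE")) as resp:
#     print(json.load(resp))
```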
#### 2. **`/health`**
- **Method:** `GET`
- **Description:** Returns the status of the API.
- **Response:**
```json
{ "status": "ok" }
```
#### 3. **`/text/upload`**
- **Method:** `POST`
- **Description:** Accepts an uploaded file, extracts its contents, classifies them, and returns the result.
- **Request:** Files
- **Response:**
```json
{ "result": "AI-generated", "perplexity": 55.67, "ai_likelihood": 66.6 }
```
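The upload flow depends on the parsers listed earlier (`parse_docx()`, `parse_pdf()`, `parse_txt()`). Below is a hypothetical sketch of the extension-based dispatch that a function like `extract_file_contents()` might perform; only the plain-text branch is implemented here:

```python
from pathlib import Path

def extract_text(path: str) -> str:
    """Dispatch on file extension, mirroring what extract_file_contents()
    might do. Only .txt is handled directly; a real implementation would
    call a PDF/DOCX parser for the other branches."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".txt":
        return p.read_text(encoding="utf-8")
    if suffix == ".pdf":
        raise NotImplementedError("plug in a PDF parser (e.g. pypdf) here")
    if suffix == ".docx":
        raise NotImplementedError("plug in a DOCX parser (e.g. python-docx) here")
    raise ValueError(f"unsupported file type: {suffix}")
```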
#### 4. **`/text/analyze_sentence_file`**
- **Method:** `POST`
- **Description:** Accepts an uploaded file, splits its contents into sentences, classifies each sentence, and returns the results.
- **Request:** Files
- **Response:**
```json
{
"content": "Artificial Intelligence (AI) and Machine Learning (ML) are rapidly transforming the way we \ninteract with technology. AI refers to the broader concept of machines being able to carry out \ntasks in a way that we would consider \"smart,\" while ML is a subset of AI that focuses on the \ndevelopment of algorithms that allow computers to learn from and make decisions based on \ndata. These technologies are behind innovations such as voice assistants, recommendation \nsystems, self-driving cars, and medical diagnosis tools. By analyzing large amounts of data, \nAI and ML can identify patterns, make predictions, and continuously improve their \nperformance over time, making them essential tools in modern industries ranging from \nhealthcare and finance to education and entertainment. \n \n",
"analysis": [
{
"sentence": "Artificial Intelligence (AI) and Machine Learning (ML) are rapidly transforming the way we interact with technology.",
"label": "AI-generated",
"perplexity": 8.17,
"ai_likelihood": 100
},
{
"sentence": "AI refers to the broader concept of machines being able to carry out tasks in a way that we would consider \"smart,\" while ML is a subset of AI that focuses on the development of algorithms that allow computers to learn from and make decisions based on data.",
"label": "AI-generated",
"perplexity": 19.34,
"ai_likelihood": 89.62
},
{
"sentence": "These technologies are behind innovations such as voice assistants, recommendation systems, self-driving cars, and medical diagnosis tools.",
"label": "AI-generated",
"perplexity": 40.31,
"ai_likelihood": 66.32
},
{
"sentence": "By analyzing large amounts of data, AI and ML can identify patterns, make predictions, and continuously improve their performance over time, making them essential tools in modern industries ranging from healthcare and finance to education and entertainment.",
"label": "AI-generated",
"perplexity": 26.15,
"ai_likelihood": 82.05
}
]
}
```
#### 5. **`/text/analyze_sentences`**
- **Method:** `POST`
- **Description:** Splits the submitted text into sentences, classifies each one, and returns the results.
- **Request:**
```json
{
"text": "This is an test text. This is an another Text "
}
```
- **Response:**
```json
{
"analysis": [
{
"sentence": "This is an test text.",
"label": "Human-written",
"perplexity": 510.28,
"ai_likelihood": 0
},
{
"sentence": "This is an another Text",
"label": "Human-written",
"perplexity": 3926.05,
"ai_likelihood": 0
}
]
}
```
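The sentence-level endpoints split the text before classifying each piece. A naive splitter sketch is shown below; the app may well use a different, tokenizer-based approach:

```python
import re

def split_sentences(text: str) -> list:
    """Split text on sentence-ending punctuation followed by whitespace.
    A simple regex heuristic; it will mis-split abbreviations like 'e.g.'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part.strip() for part in parts if part.strip()]
```

Each resulting sentence would then be classified individually, producing the per-sentence `analysis` list shown in the responses above.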
---
### **Running the API**
Start the server with:
```bash
uvicorn app:app --host 0.0.0.0 --port 8000
```
---
### **Testing the API**
You can test the FastAPI endpoint using `curl` like this:
```bash
curl -X POST https://can-org-canspace.hf.space/analyze \
-H "Authorization: Bearer SECRET_CODE" \
-H "Content-Type: application/json" \
-d '{"text": "This is a sample sentence for analysis."}'
```
- The `-H "Authorization: Bearer SECRET_CODE"` header carries the **handshake** token.
- FastAPI checks this token against the one loaded from the `.env` file.
- If the token matches, the request is accepted and processed.
- Otherwise, it responds with a `403 Forbidden` error.
---
### **API Documentation**
- **Swagger UI:** `https://can-org-canspace.hf.space/docs`
- **ReDoc:** `https://can-org-canspace.hf.space/redoc`
### **Handshake Mechanism**
In this part, we're implementing a simple handshake to verify that the request is coming from a trusted source (e.g., our NestJS server). Here's how it works:
- We load a secret token from the `.env` file.
- When a request is made to the FastAPI server, we extract the `Authorization` header and compare it with our expected secret token.
- If the token does **not** match, we immediately return a **403 Forbidden** response with the message `"Unauthorized"`.
- If the token **does** match, we allow the request to proceed to the next step.
The verification function looks like this:
```python
def verify_token(auth: str):
if auth != f"Bearer {EXPECTED_TOKEN}":
raise HTTPException(status_code=403, detail="Unauthorized")
```
This provides a basic but effective layer of security to prevent unauthorized access to the API.
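One possible hardening of this check is a constant-time comparison, which avoids leaking information through response timing. This is a stdlib sketch of that variant, not the app's actual code; the token value is a placeholder that the real app loads from `.env`:

```python
import hmac

# Placeholder; the real app loads this from the .env file.
EXPECTED_TOKEN = "SECRET_CODE"

def token_matches(auth_header: str, expected: str = EXPECTED_TOKEN) -> bool:
    """Compare the Authorization header against the expected bearer token
    in constant time, so mismatches take the same time regardless of where
    the first differing character is."""
    return hmac.compare_digest(auth_header or "", f"Bearer {expected}")
```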
### **Implementing it with NestJS**
NOTE: Create a microservice in NestJS, implement the FastAPI call there, and invoke it from `app.controller.ts`. The sections below walk through what `fastapi.service.ts` does.
### Project Structure
```files
nestjs-fastapi-bridge/
βββ src/
β βββ app.controller.ts
β βββ app.module.ts
β βββ fastapi.service.ts
βββ .env
```
---
### Step-by-Step Setup
#### 1. `.env`
Create a `.env` file at the root with the following:
```environment
FASTAPI_BASE_URL=https://can-org-canspace.hf.space/
SECRET_TOKEN="SECRET_CODE_TOKEN"
```
#### 2. `fastapi.service.ts`
```typescript
// src/fastapi.service.ts
import { Injectable } from "@nestjs/common";
import { HttpService } from "@nestjs/axios";
import { ConfigService } from "@nestjs/config";
import { firstValueFrom } from "rxjs";
@Injectable()
export class FastAPIService {
constructor(
private http: HttpService,
private config: ConfigService,
) {}
async analyzeText(text: string) {
const url = `${this.config.get("FASTAPI_BASE_URL")}/analyze`;
const token = this.config.get("SECRET_TOKEN");
const response = await firstValueFrom(
this.http.post(
url,
{ text },
{
headers: {
Authorization: `Bearer ${token}`,
},
},
),
);
return response.data;
}
}
```
#### 3. `app.module.ts`
```typescript
// src/app.module.ts
import { Module } from "@nestjs/common";
import { ConfigModule } from "@nestjs/config";
import { HttpModule } from "@nestjs/axios";
import { AppController } from "./app.controller";
import { FastAPIService } from "./fastapi.service";
@Module({
imports: [ConfigModule.forRoot(), HttpModule],
controllers: [AppController],
providers: [FastAPIService],
})
export class AppModule {}
```
---
#### 4. `app.controller.ts`
```typescript
// src/app.controller.ts
import { Body, Controller, Post, Get, Query } from '@nestjs/common';
import { FastAPIService } from './fastapi.service';
@Controller()
export class AppController {
constructor(private readonly fastapiService: FastAPIService) {}
@Post('analyze-text')
async callFastAPI(@Body('text') text: string) {
return this.fastapiService.analyzeText(text);
}
@Get()
getHello(): string {
return 'NestJS is connected to FastAPI ';
}
}
```
### How to Run
Run both the FastAPI and NestJS servers:
- For NestJS:
```bash
npm run start
```
- For FastAPI:
```bash
uvicorn app:app --reload
```
Make sure your FastAPI service is running at `http://localhost:8000`.
### Test with CURL
The NestJS server runs at `http://localhost:3000`:
```bash
curl -X POST http://localhost:3000/analyze-text \
-H 'Content-Type: application/json' \
-d '{"text": "This is a test input"}'
```