---
title: AI Content Summariser API
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
license: mit
---
# AI Content Summariser API (Backend)
This is the backend API for the AI Content Summariser, a tool that automatically generates concise summaries of articles, documents, and web content using natural language processing.
The frontend application is available in a separate repository: [ai-content-summariser](https://github.com/dang-w/ai-content-summariser).
## Features
- **Text Summarisation**: Generate concise summaries using the BART-large-CNN model
- **URL Content Extraction**: Automatically extract and process content from web pages
- **Adjustable Parameters**: Control summary length (30-500 chars) and style
- **Advanced Generation Options**: Temperature control (0.7-2.0) and sampling options
- **Caching System**: Store results to improve performance and reduce redundant processing
- **Status Monitoring**: Track model loading and summarization progress in real-time
- **Error Handling**: Robust error handling for various input scenarios
- **CORS Support**: Configured for cross-origin requests from the frontend
## API Endpoints
- `POST /api/summarise` - Summarise text content
- `POST /api/summarise-url` - Extract and summarise content from a URL
- `GET /api/status` - Get the current status of the model and any running jobs
- `GET /health` - Health check endpoint for monitoring
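
The two `POST` endpoints accept the same tuning fields. As an illustration, here is a sketch of those payload shapes using plain dataclasses (field names are taken from the curl examples later in this README; the response shape, and the fact that the real service likely defines these as Pydantic models, are assumptions):

```python
from dataclasses import dataclass

# Hypothetical request/response shapes for POST /api/summarise.
# Field names follow the curl examples below; the actual service
# may define these as Pydantic models with extra validation.

@dataclass
class SummariseRequest:
    text: str
    max_length: int = 150      # upper bound on summary length
    min_length: int = 50       # lower bound on summary length
    do_sample: bool = False    # enable sampling instead of greedy decoding
    temperature: float = 1.0   # only meaningful when do_sample is True

@dataclass
class SummariseResponse:
    summary: str               # assumed response field, not from the source
    source_type: str = "text"  # "text" or "url" (assumption)

req = SummariseRequest(
    text="Some long article text...", do_sample=True, temperature=1.2
)
print(req.max_length)  # 150 (default)
```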
## Technology Stack
- **Framework**: FastAPI for efficient API development
- **NLP Models**: Hugging Face Transformers (BART-large-CNN)
- **Web Scraping**: BeautifulSoup4 for extracting content from URLs
- **HTTP Client**: HTTPX for asynchronous web requests
- **ML Framework**: PyTorch for running the NLP models
- **Testing**: Pytest for unit and integration testing
- **Deployment**: Docker containers on Hugging Face Spaces
## Project Structure
```
ai-content-summariser-api/
├── app/
│   ├── api/
│   │   └── routes.py              # API endpoints
│   ├── services/
│   │   ├── summariser.py          # Text summarisation service
│   │   ├── url_extractor.py       # URL content extraction
│   │   └── cache.py               # Caching functionality
│   └── check_transformers.py      # Utility to verify model setup
├── tests/
│   ├── test_api.py                # API endpoint tests
│   └── test_summariser.py         # Summariser service tests
├── main.py                        # Application entry point
├── Dockerfile                     # Docker configuration
├── requirements.txt               # Python dependencies
└── .env                           # Environment variables (not in repo)
```
## Getting Started
### Prerequisites
- Python (v3.8+)
- pip
- At least 4GB of RAM (8GB recommended for optimal performance)
- GPU support (optional, but recommended for faster processing)
### Installation
```bash
# Clone the repository
git clone https://github.com/dang-w/ai-content-summariser-api.git
cd ai-content-summariser-api
# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
### Environment Setup
Create a `.env` file in the root directory with the following variables:
```
ENVIRONMENT=development
CORS_ORIGINS=http://localhost:3000,https://ai-content-summariser.vercel.app
TRANSFORMERS_CACHE=/path/to/cache # Optional: custom cache location
```
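
`CORS_ORIGINS` holds a comma-separated list of allowed origins. A minimal sketch of how the application might split it before handing the list to FastAPI's CORS middleware (the exact parsing in `main.py` may differ):

```python
import os

# Fall back to the development origins shown in the .env example above.
raw = os.environ.get("CORS_ORIGINS") or (
    "http://localhost:3000,https://ai-content-summariser.vercel.app"
)

# Split on commas and drop stray whitespace or empty entries.
origins = [o.strip() for o in raw.split(",") if o.strip()]
print(origins)
```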
### Running Locally
```bash
# Start the backend server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```
The API will be available at `http://localhost:8000`. You can access the API documentation at `http://localhost:8000/docs`.
## Testing
The project includes a comprehensive test suite covering both unit and integration tests.
### Running Tests
```bash
# Run all tests
pytest
# Run tests with verbose output
pytest -v
# Run tests and generate coverage report
pytest --cov=app tests/
# Run tests and generate detailed coverage report
pytest --cov=app --cov-report=term-missing tests/
# Run specific test file
pytest tests/test_api.py
# Run tests without warnings
pytest -W ignore::FutureWarning -W ignore::UserWarning
```
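
For a flavour of what the unit tests look like, here is an illustrative pytest-style test against a hypothetical length-validation helper (`clamp_lengths` is not taken from the repository; the real tests in `tests/test_summariser.py` exercise the actual service):

```python
# Illustrative test in the style of tests/test_summariser.py.
# clamp_lengths is a hypothetical helper; it enforces the documented
# 30-500 range and keeps min_length below max_length.

def clamp_lengths(max_length, min_length):
    """Clamp requested lengths into the documented 30-500 range."""
    max_length = max(30, min(500, max_length))
    min_length = max(30, min(max_length, min_length))
    return max_length, min_length

def test_clamp_lengths():
    assert clamp_lengths(1000, 10) == (500, 30)   # out-of-range values clamped
    assert clamp_lengths(150, 50) == (150, 50)    # in-range values untouched

test_clamp_lengths()
```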
## Docker Deployment
```bash
# Build and run with Docker
docker build -t ai-content-summariser-api .
docker run -p 8000:8000 ai-content-summariser-api
```
## Deployment to Hugging Face Spaces
When deploying to Hugging Face Spaces:
1. Fork this repository to your Hugging Face account
2. Set the following environment variables in the Space settings:
- `TRANSFORMERS_CACHE=/tmp/huggingface_cache`
- `HF_HOME=/tmp/huggingface_cache`
- `HUGGINGFACE_HUB_CACHE=/tmp/huggingface_cache`
- `CORS_ORIGINS=https://ai-content-summariser.vercel.app,http://localhost:3000`
3. Ensure the Space is configured to use the Docker SDK
4. Your API will be available at `https://huggingface.co/spaces/your-username/ai-content-summariser-api`
## Performance Optimizations
The API includes several performance optimizations:
1. **Model Caching**: Models are loaded once and cached for subsequent requests
2. **Result Caching**: Frequently requested summaries are cached to avoid redundant processing
3. **Asynchronous Processing**: Long-running tasks are processed asynchronously
4. **Text Preprocessing**: Input text is cleaned and normalized before processing
5. **Batched Processing**: Large texts are processed in batches for better memory management
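
The result-caching idea (point 2) can be sketched as a dictionary keyed on a hash of the input text plus the generation parameters. This is a simplified stand-in for `app/services/cache.py`, which may additionally handle expiry and eviction:

```python
import hashlib
import json

_cache = {}  # key -> cached summary

def cache_key(text, **params):
    """Stable key: hash of the input text plus sorted generation params."""
    payload = json.dumps({"text": text, **params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def summarise_cached(text, summarise, **params):
    key = cache_key(text, **params)
    if key not in _cache:                  # miss: run the model once
        _cache[key] = summarise(text, **params)
    return _cache[key]                     # hit: reuse the stored summary

# Demonstrate with a stub "model" that records how often it runs.
calls = []
def fake_summarise(text, **params):
    calls.append(text)
    return text[:10]

summarise_cached("hello world, this is long", fake_summarise, max_length=150)
summarise_cached("hello world, this is long", fake_summarise, max_length=150)
print(len(calls))  # 1 -- the second call was served from the cache
```

Because the key includes the generation parameters, the same text summarised with a different `temperature` or `max_length` gets its own cache entry.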
## API Request Examples
### Text Summarisation
```bash
curl -X 'POST' \
'http://localhost:8000/api/summarise' \
-H 'Content-Type: application/json' \
-d '{
  "text": "Your long text to summarise goes here...",
"max_length": 150,
"min_length": 50,
"do_sample": true,
"temperature": 1.2
}'
```
### URL Summarisation
```bash
curl -X 'POST' \
'http://localhost:8000/api/summarise-url' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com/article",
"max_length": 150,
"min_length": 50,
"do_sample": true,
"temperature": 1.2
}'
```
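
The same requests can be issued from Python. The sketch below builds (but does not send) a request with the standard library; the service itself uses HTTPX, which would be the natural choice for an async client:

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # local dev server from "Running Locally"

def build_summarise_request(text, max_length=150, min_length=50):
    """Build (but don't send) a POST to /api/summarise."""
    body = json.dumps({
        "text": text,
        "max_length": max_length,
        "min_length": min_length,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/summarise",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_summarise_request("Your long text to summarise goes here...")
print(req.full_url)  # http://localhost:8000/api/summarise
# To actually send it once the server is running:
# with urllib.request.urlopen(req) as resp: print(resp.read())
```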
## License
This project is licensed under the MIT License.