---
title: AI Content Summariser API
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
license: mit
---
# AI Content Summariser API (Backend)

This is the backend API for the AI Content Summariser, a tool that automatically generates concise summaries of articles, documents, and web content using natural language processing.

The frontend application is available in a separate repository: [ai-content-summariser](https://github.com/dang-w/ai-content-summariser).
## Features

- Text summarisation using state-of-the-art NLP models (BART-large-CNN)
- URL content extraction and summarisation
- Adjustable parameters for summary length and style
- Efficient API endpoints with proper error handling
## API Endpoints

- `POST /api/summarise` - Summarise text content
- `POST /api/summarise-url` - Extract and summarise content from a URL
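As an illustration, a request body for the text endpoint might be built as follows. The field names (`text`, `max_length`, `min_length`) are assumptions rather than the documented schema; check the Swagger docs at `/docs` for the real one:

```python
import json

# Hypothetical JSON body for POST /api/summarise -- the field names
# here are assumed for illustration, not taken from the actual schema.
def build_summarise_payload(text: str, max_length: int = 150, min_length: int = 30) -> dict:
    return {"text": text, "max_length": max_length, "min_length": min_length}

payload = build_summarise_payload("A long article to condense.")
body = json.dumps(payload)
# With HTTPX this could then be sent as:
#   httpx.post("http://localhost:8000/api/summarise", json=payload)
```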
## Technology Stack

- **Framework**: FastAPI for efficient API endpoints
- **NLP Models**: Transformer-based models (BART) for summarisation
- **Web Scraping**: BeautifulSoup4 for extracting content from URLs
- **HTTP Client**: HTTPX for asynchronous web requests
- **Deployment**: Hugging Face Spaces or Docker containers
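The URL-extraction step can be sketched roughly as below. This is an illustrative approximation, not the service's actual code: `extract_text` is a hypothetical helper, the tag choices are assumptions, and a hard-coded HTML string stands in for an async `httpx` response body.

```python
from bs4 import BeautifulSoup

# Rough sketch of content extraction with BeautifulSoup4. In the real
# service the HTML would come from an asynchronous HTTPX request.
def extract_text(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    # Drop script/style noise, then join paragraph text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    return " ".join(p for p in paragraphs if p)

html = "<html><body><p>First paragraph.</p><script>x=1</script><p>Second.</p></body></html>"
text = extract_text(html)
```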
## Getting Started

### Prerequisites

- Python (v3.8+)
- pip

### Installation

```bash
# Clone the repository
git clone https://github.com/dang-w/ai-content-summariser-api.git
cd ai-content-summariser-api

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
### Running Locally

```bash
# Start the backend server
uvicorn main:app --reload
```

The API will be available at `http://localhost:8000`.
## Testing

The project includes a comprehensive test suite covering both unit and integration tests.

### Installing Test Dependencies

```bash
pip install pytest pytest-cov httpx
```
### Running Tests

```bash
# Run all tests
pytest

# Run tests with verbose output
pytest -v

# Run tests and generate coverage report
pytest --cov=app tests/

# Run tests and generate detailed coverage report
pytest --cov=app --cov-report=term-missing tests/

# Run specific test file
pytest tests/test_api.py

# Run tests without warnings
pytest -W ignore::FutureWarning -W ignore::UserWarning
```
### Test Structure

- **Unit Tests**: Test individual components in isolation
  - `tests/test_summariser.py`: Tests for the summarisation service
- **Integration Tests**: Test API endpoints and component interactions
  - `tests/test_api.py`: Tests for API endpoints
### Mocking Strategy

For faster and more reliable tests, we use mocking to avoid loading large ML models during testing:

```python
from unittest.mock import patch

# Example of a mocked test
def test_summariser_with_mock():
    with patch('app.services.summariser.AutoTokenizer') as mock_tokenizer_class, \
         patch('app.services.summariser.AutoModelForSeq2SeqLM') as mock_model_class:
        # Test implementation...
        ...
```
### Continuous Integration

Tests are automatically run on pull requests and pushes to the main branch using GitHub Actions.
## Running with Docker

```bash
# Build and run with Docker
docker build -t ai-content-summariser-api .
docker run -p 8000:8000 ai-content-summariser-api
```
## Deployment

See the deployment guide in the frontend repository for detailed instructions on deploying both the frontend and backend components.

### Deploying to Hugging Face Spaces

When deploying to Hugging Face Spaces, make sure to:

1. Set the following environment variables in the Space settings:
   - `TRANSFORMERS_CACHE=/tmp/huggingface_cache`
   - `HF_HOME=/tmp/huggingface_cache`
   - `HUGGINGFACE_HUB_CACHE=/tmp/huggingface_cache`
2. Use the Docker SDK in your Space settings
3. If you encounter memory issues, consider using a smaller model by changing the `model_name` in `summariser.py`
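Alternatively, the same cache variables can be baked into the image itself. A minimal Dockerfile fragment (illustrative only; adapt it to your actual base image and app setup):

```dockerfile
# Point all Hugging Face caches at a writable tmp directory,
# matching the Space settings listed above.
ENV TRANSFORMERS_CACHE=/tmp/huggingface_cache \
    HF_HOME=/tmp/huggingface_cache \
    HUGGINGFACE_HUB_CACHE=/tmp/huggingface_cache
```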
## Performance Optimisations

The API includes several performance optimisations:

1. **Model Caching**: Models are loaded once and cached for subsequent requests
2. **Result Caching**: Frequently requested summaries are cached to avoid redundant processing
3. **Asynchronous Processing**: Long-running tasks are processed asynchronously
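The result-caching idea can be sketched with the standard library. This is a minimal illustration, not the service's actual implementation: `summarise_text` is a hypothetical stand-in for the real model call, and the cache key and size are assumptions.

```python
from functools import lru_cache

# Sketch of result caching: identical (text, max_length) requests are
# served from the cache instead of re-running inference.
@lru_cache(maxsize=256)
def summarise_text(text: str, max_length: int = 150) -> str:
    # Placeholder for the expensive model inference step.
    return text[:max_length]

first = summarise_text("Some long article text", 10)
second = summarise_text("Some long article text", 10)  # cache hit, no recompute
```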
## Development

### Testing the API

You can test the API endpoints using the built-in Swagger documentation at `/docs` when running locally.

### Checking Transformers Installation

To verify that the transformers library is installed correctly:

```bash
python -m app.check_transformers
```
## License

This project is licensed under the MIT License.