---
title: AI Content Summariser API
emoji: πŸ“
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
license: mit
---
# AI Content Summariser API (Backend)
This is the backend API for the AI Content Summariser, a tool that automatically generates concise summaries of articles, documents, and web content using natural language processing.
The frontend application is available in a separate repository: [ai-content-summariser](https://github.com/dang-w/ai-content-summariser).
## Features
- Text summarisation using state-of-the-art NLP models (BART-large-CNN)
- URL content extraction and summarisation
- Adjustable parameters for summary length and style
- Efficient API endpoints with proper error handling
## API Endpoints
- `POST /api/summarise` - Summarise text content
- `POST /api/summarise-url` - Extract and summarise content from a URL
## Technology Stack
- **Framework**: FastAPI for efficient API endpoints
- **NLP Models**: Transformer-based models (BART) for summarisation
- **Web Scraping**: BeautifulSoup4 for extracting content from URLs (see the extraction sketch after this list)
- **HTTP Client**: HTTPX for asynchronous web requests
- **Deployment**: Hugging Face Spaces or Docker containers
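To illustrate how the HTTP client and scraping pieces fit together, here is a minimal sketch of URL content extraction using HTTPX and BeautifulSoup4. The function name and the tag filtering are illustrative assumptions, not the exact code in this repository.

```python
# Illustrative sketch only: fetch a page asynchronously and extract readable text.
# The function name and tag filtering are assumptions, not this repository's exact code.
import httpx
from bs4 import BeautifulSoup

async def extract_text_from_url(url: str) -> str:
    async with httpx.AsyncClient(follow_redirects=True, timeout=30.0) as client:
        response = await client.get(url)
        response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    # Remove non-content elements before collecting the visible text
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    return "\n".join(p for p in paragraphs if p)
```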
## Getting Started
### Prerequisites
- Python (v3.8+)
- pip
### Installation
```bash
# Clone the repository
git clone https://github.com/dang-w/ai-content-summariser-api.git
cd ai-content-summariser-api
# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
### Running Locally
```bash
# Start the backend server
uvicorn main:app --reload
```
The API will be available at `http://localhost:8000`.
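With the server running, you can call the summarisation endpoint directly. The snippet below is only a sketch: the payload field names (`text`, `max_length`, `min_length`) are assumptions based on the adjustable-length feature, so confirm the exact schema in the Swagger docs at `/docs`.

```python
# Example client call; the payload field names are assumptions -- verify the schema at /docs.
import httpx

payload = {
    "text": "Long article text goes here...",
    "max_length": 150,  # assumed knob for the maximum summary length
    "min_length": 50,   # assumed knob for the minimum summary length
}

response = httpx.post("http://localhost:8000/api/summarise", json=payload, timeout=60.0)
response.raise_for_status()
print(response.json())
```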
## Testing
The project includes a comprehensive test suite covering both unit and integration tests.
### Installing Test Dependencies
```bash
pip install pytest pytest-cov httpx
```
### Running Tests
```bash
# Run all tests
pytest
# Run tests with verbose output
pytest -v
# Run tests and generate coverage report
pytest --cov=app tests/
# Run tests and generate detailed coverage report
pytest --cov=app --cov-report=term-missing tests/
# Run specific test file
pytest tests/test_api.py
# Run tests, ignoring FutureWarning and UserWarning
pytest -W ignore::FutureWarning -W ignore::UserWarning
```
### Test Structure
- **Unit Tests**: Test individual components in isolation
- `tests/test_summariser.py`: Tests for the summarisation service
- **Integration Tests**: Test API endpoints and component interactions
- `tests/test_api.py`: Tests for API endpoints (a minimal sketch follows below)
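As a reference point, an integration test against the summarise endpoint could look like the sketch below, built on FastAPI's `TestClient`. The application import path (`main:app`) and the request/response field names are assumptions.

```python
# Sketch of an integration test; the `main:app` import path and field names are assumptions.
from fastapi.testclient import TestClient

from main import app

client = TestClient(app)

def test_summarise_endpoint_returns_summary():
    response = client.post(
        "/api/summarise",
        json={"text": "Some long text that needs summarising...", "max_length": 100},
    )
    assert response.status_code == 200
    assert "summary" in response.json()  # assumed response field
```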
### Mocking Strategy
For faster and more reliable tests, we use mocking to avoid loading large ML models during testing:
```python
# Example of a mocked test
from unittest.mock import patch

def test_summariser_with_mock():
    with patch('app.services.summariser.AutoTokenizer') as mock_tokenizer_class, \
         patch('app.services.summariser.AutoModelForSeq2SeqLM') as mock_model_class:
        # Test implementation...
        ...
```
### Continuous Integration
Tests are automatically run on pull requests and pushes to the main branch using GitHub Actions.
## Running with Docker
```bash
# Build and run with Docker
docker build -t ai-content-summariser-api .
docker run -p 8000:8000 ai-content-summariser-api
```
## Deployment
See the deployment guide in the frontend repository for detailed instructions on deploying both the frontend and backend components.
### Deploying to Hugging Face Spaces
When deploying to Hugging Face Spaces, make sure to:
1. Set the following environment variables in the Space settings:
- `TRANSFORMERS_CACHE=/tmp/huggingface_cache`
- `HF_HOME=/tmp/huggingface_cache`
- `HUGGINGFACE_HUB_CACHE=/tmp/huggingface_cache`
2. Use the Docker SDK in your Space settings
3. If you encounter memory issues, consider switching to a smaller model by changing the `model_name` in `summariser.py` (see the sketch after this list)
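If you prefer to configure the cache locations in code rather than in the Space settings, or want to try a lighter checkpoint, the following sketch shows one way to do it. The environment variables mirror the list above; the smaller model shown (`sshleifer/distilbart-cnn-12-6`) is an example alternative, not necessarily what this project ships with.

```python
# Sketch: point the Hugging Face caches at /tmp and load a smaller checkpoint.
# Set the environment variables before importing transformers so they take effect.
# The model name below is an example of a lighter BART variant, not this project's default.
import os

os.environ.setdefault("TRANSFORMERS_CACHE", "/tmp/huggingface_cache")
os.environ.setdefault("HF_HOME", "/tmp/huggingface_cache")
os.environ.setdefault("HUGGINGFACE_HUB_CACHE", "/tmp/huggingface_cache")

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "sshleifer/distilbart-cnn-12-6"  # smaller alternative to facebook/bart-large-cnn
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
```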
## Performance Optimizations
The API includes several performance optimizations:
1. **Model Caching**: Models are loaded once and cached for subsequent requests (see the sketch after this list)
2. **Result Caching**: Frequently requested summaries are cached to avoid redundant processing
3. **Asynchronous Processing**: Long-running tasks are processed asynchronously
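A minimal sketch of the model-caching idea, assuming the model is loaded lazily on first use and then reused for every request (the function names, parameters, and defaults are illustrative):

```python
# Sketch of model caching: load the pipeline once and reuse it for every request.
# Function names, parameters, and defaults here are illustrative assumptions.
from functools import lru_cache

from transformers import pipeline

@lru_cache(maxsize=1)
def get_summariser():
    # Runs only on the first call; later calls return the cached pipeline.
    return pipeline("summarization", model="facebook/bart-large-cnn")

def summarise(text: str, max_length: int = 150, min_length: int = 50) -> str:
    summariser = get_summariser()
    result = summariser(text, max_length=max_length, min_length=min_length, do_sample=False)
    return result[0]["summary_text"]
```

Caching the pipeline this way limits the cold-start cost to the first request; result caching and asynchronous processing apply the same idea at the request level.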
## Development
### Testing the API
You can test the API endpoints using the built-in Swagger documentation at `/docs` when running locally.
### Checking Transformers Installation
To verify that the transformers library is installed correctly:
```bash
python -m app.check_transformers
```
## License
This project is licensed under the MIT License.