Dan Walsh committed
Commit 77a88ff · Parent(s): 6f0ac93
Updating README

README.md CHANGED
## Features

- **Text Summarization**: Generate concise summaries using the BART-large-CNN model
- **URL Content Extraction**: Automatically extract and process content from web pages
- **Adjustable Parameters**: Control summary length (30-500 chars) and style
- **Advanced Generation Options**: Temperature control (0.7-2.0) and sampling options
- **Caching System**: Store results to improve performance and reduce redundant processing
- **Status Monitoring**: Track model loading and summarization progress in real time
- **Error Handling**: Robust error handling for various input scenarios
- **CORS Support**: Configured for cross-origin requests from the frontend
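The ranges quoted above (summary length 30-500 chars, temperature 0.7-2.0) suggest clamping user input before generation. A minimal sketch — the `clamp_params` helper is hypothetical, not this project's actual validation code:

```python
# Hypothetical sketch of enforcing the documented parameter ranges.
# The ranges come from the feature list above; the function name and
# shape are illustrative, not this project's API.

def clamp_params(max_length: int, min_length: int, temperature: float) -> dict:
    """Clamp user-supplied generation parameters to the documented ranges."""
    max_length = max(30, min(500, max_length))         # summary length: 30-500
    min_length = max(30, min(max_length, min_length))  # keep min <= max
    temperature = max(0.7, min(2.0, temperature))      # temperature: 0.7-2.0
    return {"max_length": max_length, "min_length": min_length, "temperature": temperature}
```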
## API Endpoints

- `POST /api/summarise` - Summarize text content
- `POST /api/summarise-url` - Extract and summarize content from a URL
- `GET /api/status` - Get the current status of the model and any running jobs
- `GET /health` - Health check endpoint for monitoring
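Besides curl (examples further down), the endpoints can be called from any HTTP client. A stdlib-only Python sketch that builds a request for `/api/summarise` — the helper name is hypothetical:

```python
# Build (but do not send) a POST request for /api/summarise using only
# the standard library. Payload fields mirror the curl examples in this
# README; the helper name is illustrative.
import json
import urllib.request

def build_summarise_request(text: str, max_length: int = 150, min_length: int = 50) -> urllib.request.Request:
    payload = json.dumps({
        "text": text,
        "max_length": max_length,
        "min_length": min_length,
    }).encode("utf-8")
    return urllib.request.Request(
        "http://localhost:8000/api/summarise",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it against a running server:
# with urllib.request.urlopen(build_summarise_request("Some long text...")) as resp:
#     print(json.load(resp))
```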
## Technology Stack

- **Framework**: FastAPI for efficient API development
- **NLP Models**: Hugging Face Transformers (BART-large-CNN)
- **Web Scraping**: BeautifulSoup4 for extracting content from URLs
- **HTTP Client**: HTTPX for asynchronous web requests
- **ML Framework**: PyTorch for running the NLP models
- **Testing**: Pytest for unit and integration testing
- **Deployment**: Docker containers on Hugging Face Spaces
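The extraction step pairs HTTPX (fetching) with BeautifulSoup4 (parsing). As a rough stdlib-only sketch of the parsing half — illustrative, not this project's `url_extractor.py`:

```python
# Illustrative approximation of URL content extraction using only the
# standard library. The real service uses HTTPX + BeautifulSoup4; this
# just shows the idea: strip tags, skip scripts/styles, keep visible text.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```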
## Project Structure

```
ai-content-summariser-api/
├── app/
│   ├── api/
│   │   └── routes.py            # API endpoints
│   ├── services/
│   │   ├── summariser.py        # Text summarization service
│   │   ├── url_extractor.py     # URL content extraction
│   │   └── cache.py             # Caching functionality
│   └── check_transformers.py    # Utility to verify model setup
├── tests/
│   ├── test_api.py              # API endpoint tests
│   └── test_summariser.py       # Summarizer service tests
├── main.py                      # Application entry point
├── Dockerfile                   # Docker configuration
├── requirements.txt             # Python dependencies
└── .env                         # Environment variables (not in repo)
```
## Getting Started

### Prerequisites

- Python (v3.8+)
- pip
- At least 4GB of RAM (8GB recommended for optimal performance)
- GPU support (optional, but recommended for faster processing)
### Installation

```bash
# ...
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
### Environment Setup

Create a `.env` file in the root directory with the following variables:

```
ENVIRONMENT=development
CORS_ORIGINS=http://localhost:3000,https://ai-content-summariser.vercel.app
TRANSFORMERS_CACHE=/path/to/cache  # Optional: custom cache location
```
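How these variables are loaded isn't shown here; a stdlib-only sketch of parsing such a file (the project may instead rely on python-dotenv or read `os.environ` directly):

```python
# Minimal sketch of parsing a .env file like the one above, standard
# library only. Illustrative; the project may load these differently.
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, ignoring blanks, comments, and inline comments."""
    env = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments (assumes no '#' in values)
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```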
### Running Locally

```bash
# Start the backend server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

The API will be available at `http://localhost:8000`. You can access the API documentation at `http://localhost:8000/docs`.
## Testing

The project includes a comprehensive test suite covering both unit and integration tests.

### Running Tests

```bash
pytest tests/test_api.py
# ...
pytest -W ignore::FutureWarning -W ignore::UserWarning
```
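Pytest collects plain assertion-style tests. A self-contained illustration of that style — the helper and test here are hypothetical, not the project's actual tests in `tests/`:

```python
# Hypothetical example of the kind of unit test pytest would collect.
# The helper under test is inlined so the snippet is self-contained; the
# real tests live in tests/test_summariser.py and tests/test_api.py.
def truncate_summary(text: str, max_length: int) -> str:
    """Cut a summary at a word boundary within max_length characters."""
    if len(text) <= max_length:
        return text
    cut = text[:max_length].rsplit(" ", 1)[0]
    return cut + "..."

def test_truncate_summary():
    assert truncate_summary("short", 10) == "short"
    out = truncate_summary("one two three four", 10)
    assert out.startswith("one two") and out.endswith("...")

test_truncate_summary()
```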
## Docker Deployment

```bash
# Build and run with Docker
docker build -t ai-content-summariser-api .
docker run -p 8000:8000 ai-content-summariser-api
```
## Deployment to Hugging Face Spaces

When deploying to Hugging Face Spaces:

1. Fork this repository to your Hugging Face account
2. Set the following environment variables in the Space settings:
   - `TRANSFORMERS_CACHE=/tmp/huggingface_cache`
   - `HF_HOME=/tmp/huggingface_cache`
   - `HUGGINGFACE_HUB_CACHE=/tmp/huggingface_cache`
   - `CORS_ORIGINS=https://ai-content-summariser.vercel.app,http://localhost:3000`
3. Ensure the Space is configured to use the Docker SDK
4. Your API will be available at `https://huggingface.co/spaces/your-username/ai-content-summariser-api`
## Performance Optimizations

The API includes several performance optimizations:

1. **Model Caching**: Models are loaded once and cached for subsequent requests
2. **Result Caching**: Frequently requested summaries are cached to avoid redundant processing
3. **Asynchronous Processing**: Long-running tasks are processed asynchronously
4. **Text Preprocessing**: Input text is cleaned and normalized before processing
5. **Batched Processing**: Large texts are processed in batches for better memory management
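Items 2 and 5 can be sketched roughly as follows — an illustrative sketch under assumed interfaces, not the contents of `app/services/cache.py`:

```python
# Rough sketch of result caching (item 2) and chunked processing (item 5).
# Illustrative only; the project's actual implementation lives in
# app/services/cache.py and the summariser service.
import hashlib

_cache = {}  # summary cache: key -> summary text

def cache_key(text: str, max_length: int, min_length: int) -> str:
    """Key summaries by input text plus generation parameters."""
    raw = f"{max_length}:{min_length}:{text}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

def chunk_text(text: str, chunk_chars: int = 1000) -> list:
    """Split long input into word-boundary chunks to bound memory use."""
    chunks, current, size = [], [], 0
    for word in text.split():
        if size + len(word) + 1 > chunk_chars and current:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(word)
        size += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```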
## API Request Examples

### Text Summarization

```bash
curl -X 'POST' \
  'http://localhost:8000/api/summarise' \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Your long text to summarize goes here...",
    "max_length": 150,
    "min_length": 50,
    "do_sample": true,
    "temperature": 1.2
  }'
```

### URL Summarization

```bash
curl -X 'POST' \
  'http://localhost:8000/api/summarise-url' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/article",
    "max_length": 150,
    "min_length": 50,
    "do_sample": true,
    "temperature": 1.2
  }'
```
## License

app/api/__pycache__/routes.cpython-311.pyc CHANGED
Binary files a/app/api/__pycache__/routes.cpython-311.pyc and b/app/api/__pycache__/routes.cpython-311.pyc differ

app/services/__pycache__/url_extractor.cpython-311.pyc CHANGED
Binary files a/app/services/__pycache__/url_extractor.cpython-311.pyc and b/app/services/__pycache__/url_extractor.cpython-311.pyc differ