Dan Walsh committed
Commit 77a88ff · Parent(s): 6f0ac93
Updating README

README.md CHANGED
## Features

- **Text Summarization**: Generate concise summaries using the BART-large-CNN model
- **URL Content Extraction**: Automatically extract and process content from web pages
- **Adjustable Parameters**: Control summary length (30-500 chars) and style
- **Advanced Generation Options**: Temperature control (0.7-2.0) and sampling options
- **Caching System**: Store results to improve performance and reduce redundant processing
- **Status Monitoring**: Track model loading and summarization progress in real time
- **Error Handling**: Robust error handling for various input scenarios
- **CORS Support**: Configured for cross-origin requests from the frontend
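The ranges quoted above (summary length 30-500 chars, temperature 0.7-2.0) suggest clamping user input before generation. A minimal sketch — the `clamp_params` helper is hypothetical, not this project's actual validation code:

```python
# Hypothetical sketch of enforcing the documented parameter ranges.
# The ranges come from the feature list above; the function name and
# shape are illustrative, not this project's API.

def clamp_params(max_length: int, min_length: int, temperature: float) -> dict:
    """Clamp user-supplied generation parameters to the documented ranges."""
    max_length = max(30, min(500, max_length))         # summary length: 30-500
    min_length = max(30, min(max_length, min_length))  # keep min <= max
    temperature = max(0.7, min(2.0, temperature))      # temperature: 0.7-2.0
    return {"max_length": max_length, "min_length": min_length, "temperature": temperature}
```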
## API Endpoints

- `POST /api/summarise` - Summarize text content
- `POST /api/summarise-url` - Extract and summarize content from a URL
- `GET /api/status` - Get the current status of the model and any running jobs
- `GET /health` - Health check endpoint for monitoring
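Besides curl (examples further down), the endpoints can be called from any HTTP client. A stdlib-only Python sketch that builds a request for `/api/summarise` — the helper name is hypothetical:

```python
# Build (but do not send) a POST request for /api/summarise using only
# the standard library. Payload fields mirror the curl examples in this
# README; the helper name is illustrative.
import json
import urllib.request

def build_summarise_request(text: str, max_length: int = 150, min_length: int = 50) -> urllib.request.Request:
    payload = json.dumps({
        "text": text,
        "max_length": max_length,
        "min_length": min_length,
    }).encode("utf-8")
    return urllib.request.Request(
        "http://localhost:8000/api/summarise",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it against a running server:
# with urllib.request.urlopen(build_summarise_request("Some long text...")) as resp:
#     print(json.load(resp))
```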
## Technology Stack

- **Framework**: FastAPI for efficient API development
- **NLP Models**: Hugging Face Transformers (BART-large-CNN)
- **Web Scraping**: BeautifulSoup4 for extracting content from URLs
- **HTTP Client**: HTTPX for asynchronous web requests
- **ML Framework**: PyTorch for running the NLP models
- **Testing**: Pytest for unit and integration testing
- **Deployment**: Docker containers on Hugging Face Spaces
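The extraction step pairs HTTPX (fetching) with BeautifulSoup4 (parsing). As a rough stdlib-only sketch of the parsing half — illustrative, not this project's `url_extractor.py`:

```python
# Illustrative approximation of URL content extraction using only the
# standard library. The real service uses HTTPX + BeautifulSoup4; this
# just shows the idea: strip tags, skip scripts/styles, keep visible text.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```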
## Project Structure

```
ai-content-summariser-api/
├── app/
│   ├── api/
│   │   └── routes.py            # API endpoints
│   ├── services/
│   │   ├── summariser.py        # Text summarization service
│   │   ├── url_extractor.py     # URL content extraction
│   │   └── cache.py             # Caching functionality
│   └── check_transformers.py    # Utility to verify model setup
├── tests/
│   ├── test_api.py              # API endpoint tests
│   └── test_summariser.py       # Summarizer service tests
├── main.py                      # Application entry point
├── Dockerfile                   # Docker configuration
├── requirements.txt             # Python dependencies
└── .env                         # Environment variables (not in repo)
```
## Getting Started

### Prerequisites

- Python (v3.8+)
- pip
- At least 4GB of RAM (8GB recommended for optimal performance)
- GPU support (optional, but recommended for faster processing)
### Installation

```bash
# ...
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
### Environment Setup

Create a `.env` file in the root directory with the following variables:

```
ENVIRONMENT=development
CORS_ORIGINS=http://localhost:3000,https://ai-content-summariser.vercel.app
TRANSFORMERS_CACHE=/path/to/cache  # Optional: custom cache location
```
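How these variables are loaded isn't shown here; a stdlib-only sketch of parsing such a file (the project may instead rely on python-dotenv or read `os.environ` directly):

```python
# Minimal sketch of parsing a .env file like the one above, standard
# library only. Illustrative; the project may load these differently.
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, ignoring blanks, comments, and inline comments."""
    env = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments (assumes no '#' in values)
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```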
### Running Locally

```bash
# Start the backend server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

The API will be available at `http://localhost:8000`. You can access the API documentation at `http://localhost:8000/docs`.
## Testing

The project includes a comprehensive test suite covering both unit and integration tests.

### Running Tests

```bash
pytest tests/test_api.py
# ...
pytest -W ignore::FutureWarning -W ignore::UserWarning
```
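Pytest collects plain assertion-style tests. A self-contained illustration of that style — the helper and test here are hypothetical, not the project's actual tests in `tests/`:

```python
# Hypothetical example of the kind of unit test pytest would collect.
# The helper under test is inlined so the snippet is self-contained; the
# real tests live in tests/test_summariser.py and tests/test_api.py.
def truncate_summary(text: str, max_length: int) -> str:
    """Cut a summary at a word boundary within max_length characters."""
    if len(text) <= max_length:
        return text
    cut = text[:max_length].rsplit(" ", 1)[0]
    return cut + "..."

def test_truncate_summary():
    assert truncate_summary("short", 10) == "short"
    out = truncate_summary("one two three four", 10)
    assert out.startswith("one two") and out.endswith("...")

test_truncate_summary()
```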
## Docker Deployment

```bash
# Build and run with Docker
docker build -t ai-content-summariser-api .
docker run -p 8000:8000 ai-content-summariser-api
```
## Deployment to Hugging Face Spaces

When deploying to Hugging Face Spaces:

1. Fork this repository to your Hugging Face account
2. Set the following environment variables in the Space settings:
   - `TRANSFORMERS_CACHE=/tmp/huggingface_cache`
   - `HF_HOME=/tmp/huggingface_cache`
   - `HUGGINGFACE_HUB_CACHE=/tmp/huggingface_cache`
   - `CORS_ORIGINS=https://ai-content-summariser.vercel.app,http://localhost:3000`
3. Ensure the Space is configured to use the Docker SDK
4. Your API will be available at `https://huggingface.co/spaces/your-username/ai-content-summariser-api`
## Performance Optimizations

The API includes several performance optimizations:

1. **Model Caching**: Models are loaded once and cached for subsequent requests
2. **Result Caching**: Frequently requested summaries are cached to avoid redundant processing
3. **Asynchronous Processing**: Long-running tasks are processed asynchronously
4. **Text Preprocessing**: Input text is cleaned and normalized before processing
5. **Batched Processing**: Large texts are processed in batches for better memory management
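Items 2 and 5 can be sketched roughly as follows — an illustrative sketch under assumed interfaces, not the contents of `app/services/cache.py`:

```python
# Rough sketch of result caching (item 2) and chunked processing (item 5).
# Illustrative only; the project's actual implementation lives in
# app/services/cache.py and the summariser service.
import hashlib

_cache = {}  # summary cache: key -> summary text

def cache_key(text: str, max_length: int, min_length: int) -> str:
    """Key summaries by input text plus generation parameters."""
    raw = f"{max_length}:{min_length}:{text}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

def chunk_text(text: str, chunk_chars: int = 1000) -> list:
    """Split long input into word-boundary chunks to bound memory use."""
    chunks, current, size = [], [], 0
    for word in text.split():
        if size + len(word) + 1 > chunk_chars and current:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(word)
        size += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```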
## API Request Examples

### Text Summarization

```bash
curl -X 'POST' \
  'http://localhost:8000/api/summarise' \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Your long text to summarize goes here...",
    "max_length": 150,
    "min_length": 50,
    "do_sample": true,
    "temperature": 1.2
  }'
```

### URL Summarization

```bash
curl -X 'POST' \
  'http://localhost:8000/api/summarise-url' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/article",
    "max_length": 150,
    "min_length": 50,
    "do_sample": true,
    "temperature": 1.2
  }'
```
## License

app/api/__pycache__/routes.cpython-311.pyc CHANGED
Binary files a/app/api/__pycache__/routes.cpython-311.pyc and b/app/api/__pycache__/routes.cpython-311.pyc differ

app/services/__pycache__/url_extractor.cpython-311.pyc CHANGED
Binary files a/app/services/__pycache__/url_extractor.cpython-311.pyc and b/app/services/__pycache__/url_extractor.cpython-311.pyc differ