Dan Walsh committed
Commit 77a88ff · 1 Parent(s): 6f0ac93

Updating README

README.md CHANGED
@@ -17,23 +17,52 @@ The frontend application is available in a separate repository: [ai-content-summ

  ## Features

- - Text summarization using state-of-the-art NLP models (BART-large-CNN)
- - URL content extraction and summarization
- - Adjustable parameters for summary length and style
- - Efficient API endpoints with proper error handling
+ - **Text Summarization**: Generate concise summaries using BART-large-CNN model
+ - **URL Content Extraction**: Automatically extract and process content from web pages
+ - **Adjustable Parameters**: Control summary length (30-500 chars) and style
+ - **Advanced Generation Options**: Temperature control (0.7-2.0) and sampling options
+ - **Caching System**: Store results to improve performance and reduce redundant processing
+ - **Status Monitoring**: Track model loading and summarization progress in real-time
+ - **Error Handling**: Robust error handling for various input scenarios
+ - **CORS Support**: Configured for cross-origin requests from the frontend

  ## API Endpoints

  - `POST /api/summarise` - Summarize text content
  - `POST /api/summarise-url` - Extract and summarize content from a URL
+ - `GET /api/status` - Get the current status of the model and any running jobs
+ - `GET /health` - Health check endpoint for monitoring

  ## Technology Stack

- - **Framework**: FastAPI for efficient API endpoints
- - **NLP Models**: Transformer-based models (BART) for summarisation
+ - **Framework**: FastAPI for efficient API development
+ - **NLP Models**: Hugging Face Transformers (BART-large-CNN)
  - **Web Scraping**: BeautifulSoup4 for extracting content from URLs
  - **HTTP Client**: HTTPX for asynchronous web requests
- - **Deployment**: Hugging Face Spaces or Docker containers
+ - **ML Framework**: PyTorch for running the NLP models
+ - **Testing**: Pytest for unit and integration testing
+ - **Deployment**: Docker containers on Hugging Face Spaces
+
+ ## Project Structure
+
+ ```
+ ai-content-summariser-api/
+ ├── app/
+ │   ├── api/
+ │   │   └── routes.py            # API endpoints
+ │   ├── services/
+ │   │   ├── summariser.py        # Text summarization service
+ │   │   ├── url_extractor.py     # URL content extraction
+ │   │   └── cache.py             # Caching functionality
+ │   └── check_transformers.py    # Utility to verify model setup
+ ├── tests/
+ │   ├── test_api.py              # API endpoint tests
+ │   └── test_summariser.py       # Summarizer service tests
+ ├── main.py                      # Application entry point
+ ├── Dockerfile                   # Docker configuration
+ ├── requirements.txt             # Python dependencies
+ └── .env                         # Environment variables (not in repo)
+ ```

  ## Getting Started
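The hunk above adds `GET /api/status` and `GET /health` to the endpoint list. As a rough, hypothetical sketch of what such routes could look like in FastAPI (the paths come from the README; the response fields are illustrative assumptions, not the repository's actual schema):

```python
# Hypothetical sketch of the status and health routes named above.
# The response shapes are assumptions, not the project's actual schema.
from fastapi import APIRouter

router = APIRouter()

# In the real service these would be updated by the summariser as the
# model loads and jobs start/finish; here they are plain module globals.
MODEL_LOADED = False
ACTIVE_JOBS = 0


@router.get("/api/status")
async def get_status():
    # Report whether the summarisation model is ready and how many jobs are running.
    return {"model_loaded": MODEL_LOADED, "active_jobs": ACTIVE_JOBS}


@router.get("/health")
async def health_check():
    # Lightweight liveness probe for monitoring; returns a constant payload.
    return {"status": "ok"}
```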
 
@@ -41,6 +70,8 @@ The frontend application is available in a separate repository: [ai-content-summ

  - Python (v3.8+)
  - pip
+ - At least 4GB of RAM (8GB recommended for optimal performance)
+ - GPU support (optional, but recommended for faster processing)

  ### Installation
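The new prerequisites note that a GPU is optional but speeds up processing. A minimal sketch, assuming the PyTorch/Transformers stack listed in the Technology Stack section, of how a summariser can fall back to CPU when no GPU is present (illustrative only, not the repository's `summariser.py`):

```python
# Illustrative device selection for the BART summarisation model.
# Assumes torch and transformers are installed via requirements.txt.
import torch
from transformers import pipeline

# Use the first GPU if one is available, otherwise run on the CPU.
device = 0 if torch.cuda.is_available() else -1  # -1 = CPU, 0 = first CUDA device

summariser = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",  # BART-large-CNN, as named in the README
    device=device,
)
```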
 
@@ -57,25 +88,29 @@ source venv/bin/activate # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
  ```

+ ### Environment Setup
+
+ Create a `.env` file in the root directory with the following variables:
+
+ ```
+ ENVIRONMENT=development
+ CORS_ORIGINS=http://localhost:3000,https://ai-content-summariser.vercel.app
+ TRANSFORMERS_CACHE=/path/to/cache # Optional: custom cache location
+ ```
+
  ### Running Locally

  ```bash
  # Start the backend server
- uvicorn main:app --reload
+ uvicorn main:app --reload --host 0.0.0.0 --port 8000
  ```

- The API will be available at `http://localhost:8000`.
+ The API will be available at `http://localhost:8000`. You can access the API documentation at `http://localhost:8000/docs`.

  ## Testing

  The project includes a comprehensive test suite covering both unit and integration tests.

- ### Installing Test Dependencies
-
- ```bash
- pip install pytest pytest-cov httpx
- ```
-
  ### Running Tests

  ```bash
@@ -98,31 +133,7 @@ pytest tests/test_api.py
  pytest -W ignore::FutureWarning -W ignore::UserWarning
  ```

- ### Test Structure
-
- - **Unit Tests**: Test individual components in isolation
-   - `tests/test_summariser.py`: Tests for the summarization service
-
- - **Integration Tests**: Test API endpoints and component interactions
-   - `tests/test_api.py`: Tests for API endpoints
-
- ### Mocking Strategy
-
- For faster and more reliable tests, we use mocking to avoid loading large ML models during testing:
-
- ```python
- # Example of mocked test
- def test_summariser_with_mock():
-     with patch('app.services.summariser.AutoTokenizer') as mock_tokenizer_class, \
-          patch('app.services.summariser.AutoModelForSeq2SeqLM') as mock_model_class:
-         # Test implementation...
- ```
-
- ### Continuous Integration
-
- Tests are automatically run on pull requests and pushes to the main branch using GitHub Actions.
-
- ## Running with Docker
+ ## Docker Deployment

  ```bash
  # Build and run with Docker
@@ -130,22 +141,18 @@ docker build -t ai-content-summariser-api .
  docker run -p 8000:8000 ai-content-summariser-api
  ```

- ## Deployment
+ ## Deployment to Hugging Face Spaces

- See the deployment guide in the frontend repository for detailed instructions on deploying both the frontend and backend components.
+ When deploying to Hugging Face Spaces:

- ### Deploying to Hugging Face Spaces
-
- When deploying to Hugging Face Spaces, make sure to:
-
- 1. Set the following environment variables in the Space settings:
+ 1. Fork this repository to your Hugging Face account
+ 2. Set the following environment variables in the Space settings:
     - `TRANSFORMERS_CACHE=/tmp/huggingface_cache`
     - `HF_HOME=/tmp/huggingface_cache`
     - `HUGGINGFACE_HUB_CACHE=/tmp/huggingface_cache`
-
- 2. Use the Docker SDK in your Space settings
-
- 3. If you encounter memory issues, consider using a smaller model by changing the `model_name` in `summariser.py`
+    - `CORS_ORIGINS=https://ai-content-summariser.vercel.app,http://localhost:3000`
+ 3. Ensure the Space is configured to use the Docker SDK
+ 4. Your API will be available at `https://huggingface.co/spaces/your-username/ai-content-summariser-api`

  ## Performance Optimizations
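The hunks above document a comma-separated `CORS_ORIGINS` variable plus the Hugging Face cache variables used on Spaces. A minimal sketch of how `main.py` might consume them, assuming the `.env` format shown in the Environment Setup section; the actual wiring in the repository may differ:

```python
# Hypothetical application setup reading the environment variables documented above.
import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

# Point the Hugging Face libraries at a writable cache directory (as the
# Spaces instructions recommend) before any model download happens.
os.environ.setdefault("HF_HOME", "/tmp/huggingface_cache")

app = FastAPI(title="AI Content Summariser API")

# CORS_ORIGINS is a comma-separated list, e.g.
# "http://localhost:3000,https://ai-content-summariser.vercel.app"
origins = [
    origin.strip()
    for origin in os.getenv("CORS_ORIGINS", "http://localhost:3000").split(",")
    if origin.strip()
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```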
 
@@ -154,19 +161,39 @@ The API includes several performance optimizations:
  1. **Model Caching**: Models are loaded once and cached for subsequent requests
  2. **Result Caching**: Frequently requested summaries are cached to avoid redundant processing
  3. **Asynchronous Processing**: Long-running tasks are processed asynchronously
+ 4. **Text Preprocessing**: Input text is cleaned and normalized before processing
+ 5. **Batched Processing**: Large texts are processed in batches for better memory management

- ## Development
+ ## API Request Examples

- ### Testing the API
+ ### Text Summarization

- You can test the API endpoints using the built-in Swagger documentation at `/docs` when running locally.
-
- ### Checking Transformers Installation
+ ```bash
+ curl -X 'POST' \
+   'http://localhost:8000/api/summarise' \
+   -H 'Content-Type: application/json' \
+   -d '{
+     "text": "Your long text to summarize goes here...",
+     "max_length": 150,
+     "min_length": 50,
+     "do_sample": true,
+     "temperature": 1.2
+   }'
+ ```

- To verify that the transformers library is installed correctly:
+ ### URL Summarization

  ```bash
- python -m app.check_transformers
+ curl -X 'POST' \
+   'http://localhost:8000/api/summarise-url' \
+   -H 'Content-Type: application/json' \
+   -d '{
+     "url": "https://example.com/article",
+     "max_length": 150,
+     "min_length": 50,
+     "do_sample": true,
+     "temperature": 1.2
+   }'
  ```

  ## License
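The Performance Optimizations list above mentions model caching, result caching, and batched processing of large texts. A simplified sketch of those three ideas together; the function names and chunking threshold are illustrative and not taken from the repository's `summariser.py` or `cache.py`:

```python
# Illustrative caching and chunking helpers for a Transformers summarisation service.
from functools import lru_cache
from typing import List

from transformers import pipeline


@lru_cache(maxsize=1)
def get_summariser():
    # Model caching: the pipeline is built once and reused by later requests.
    return pipeline("summarization", model="facebook/bart-large-cnn")


def chunk_text(text: str, max_words: int = 500) -> List[str]:
    # Batched processing: split long inputs into word-bounded chunks the model can handle.
    words = text.split()
    if not words:
        return [""]
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


@lru_cache(maxsize=128)
def summarise_cached(text: str, max_length: int = 150, min_length: int = 50) -> str:
    # Result caching: identical (text, parameters) requests reuse the stored summary.
    summariser = get_summariser()
    parts = [
        summariser(chunk, max_length=max_length, min_length=min_length)[0]["summary_text"]
        for chunk in chunk_text(text)
    ]
    return " ".join(parts)
```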
app/api/__pycache__/routes.cpython-311.pyc CHANGED
Binary files a/app/api/__pycache__/routes.cpython-311.pyc and b/app/api/__pycache__/routes.cpython-311.pyc differ
 
app/services/__pycache__/url_extractor.cpython-311.pyc CHANGED
Binary files a/app/services/__pycache__/url_extractor.cpython-311.pyc and b/app/services/__pycache__/url_extractor.cpython-311.pyc differ