# 🧠 Text Summarization for Product Descriptions
A **T5-small-based** abstractive summarization model fine-tuned on synthetic product description data. This model generates concise summaries of detailed product descriptions, ideal for catalog optimization, e-commerce listings, and content generation.
---
## ✨ Model Highlights
- 📌 Based on [`t5-small`](https://huggingface.co/t5-small)
- 🧪 Fine-tuned on a synthetic dataset of 50+ product descriptions and their summaries
- ⚡ Supports **abstractive summarization** of English product texts
- 🧠 Built using **Hugging Face Transformers** and **PyTorch**
---
## 🧠 Intended Uses
- ✅ Auto-generating product summaries for catalogs or online listings
- ✅ Shortening verbose product descriptions for UI-friendly displays
- ✅ Content creation support for e-commerce and marketing
---
## 🚫 Limitations
- ❌ English-only (not trained for multilingual input)
- 🧠 Cannot fact-check or verify real-world product details
- 🧪 Trained on synthetic data, so real-world generalization may be limited
- ⚠️ May generate generic or repetitive summaries for complex inputs
---
## 🏋️‍♂️ Training Details
| Attribute | Value |
|-------------------|-----------------------------------------------|
| Base Model | `t5-small` |
| Dataset | Custom synthetic CSV of product descriptions and summaries |
| Input Field | `product_description` |
| Target Field | `summary` |
| Max Token Length | 512 input / 64 summary |
| Epochs | 3 |
| Batch Size | 4 |
| Optimizer | AdamW |
| Loss Function | CrossEntropyLoss (via `Trainer`) |
| Framework | PyTorch + Transformers |
| Hardware | CUDA-enabled GPU |
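
As a rough illustration of how the input and target fields in the table fit together, the sketch below turns CSV rows into (input, target) text pairs. It assumes the `summarize: ` prefix from the usage example was also applied during training, which this card does not state. The helper name `build_pairs` and the sample row are hypothetical, and tokenization to the 512/64 token limits is left out.

```python
import csv
import io

PREFIX = "summarize: "  # assumption: same task prefix as at inference time


def build_pairs(csv_file):
    """Yield (model_input, target) text pairs from a CSV file object.

    Expects the `product_description` and `summary` columns named in the
    table above; rows with an empty field are skipped.
    """
    for row in csv.DictReader(csv_file):
        description = (row.get("product_description") or "").strip()
        summary = (row.get("summary") or "").strip()
        if description and summary:
            yield PREFIX + description, summary


# Hypothetical one-row dataset, for illustration only
sample_csv = io.StringIO(
    "product_description,summary\n"
    '"Stainless steel water bottle, 750 ml, keeps drinks cold for 24 hours.",'
    '"Insulated 750 ml steel bottle."\n'
)
pairs = list(build_pairs(sample_csv))
print(pairs[0])
```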
---
## 📊 Evaluation Metrics
| Metric | Score (Synthetic Eval) |
|-----------|------------------------|
| ROUGE-1 | 24.49 |
| ROUGE-2 | 22.10 |
| ROUGE-L | 24.47 |
| ROUGE-lsum| 24.46 |
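
To make the metrics above concrete, here is a simplified pure-Python sketch of ROUGE-1 and ROUGE-L F1 for a single prediction/reference pair. It is not the evaluation code behind the table: this card does not say which tool produced the scores, and the sketch uses plain whitespace tokenization with no stemming, so its values will differ from standard ROUGE packages. The function names and example strings are hypothetical; multiplying by 100 assumes the table reports scores on a 0-100 scale.

```python
from collections import Counter


def _f1(overlap, pred_len, ref_len):
    """F1 from an overlap count and the two sequence lengths."""
    if overlap == 0:
        return 0.0
    precision = overlap / pred_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 (clipped counts) on lowercased whitespace tokens."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    return _f1(overlap, len(pred), len(ref))


def rouge_l_f1(prediction, reference):
    """F1 based on the longest common subsequence (LCS) of tokens."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    prev = [0] * (len(ref) + 1)  # LCS lengths for the previous prediction token
    for p_tok in pred:
        curr = [0]
        for j, r_tok in enumerate(ref, start=1):
            if p_tok == r_tok:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return _f1(prev[-1], len(pred), len(ref))


# Hypothetical prediction/reference pair, for illustration only
prediction = "compact electric kettle with auto shut-off"
reference = "electric kettle with fast boil and auto shut-off"
print("ROUGE-1:", round(100 * rouge1_f1(prediction, reference), 2))
print("ROUGE-L:", round(100 * rouge_l_f1(prediction, reference), 2))
```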
---
## 🚀 Usage
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

model_name = "your-username/Text-Summarization-for-Product-Descriptions"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)


def summarize(text, model, tokenizer, max_input_length=512, max_output_length=64):
    """Generate an abstractive summary of a single product description."""
    device = next(model.parameters()).device  # device the model is on (cpu or cuda)
    input_text = "summarize: " + text.strip()  # T5 task prefix
    inputs = tokenizer(
        input_text,
        return_tensors="pt",
        truncation=True,
        padding="max_length",
        max_length=max_input_length,
    ).to(device)  # move inputs to the model's device

    with torch.no_grad():  # no gradients needed for generation
        summary_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=max_output_length,
            num_beams=4,
            early_stopping=True,
        )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


# Example
text = "This sleek electric kettle features a 1.7-liter capacity, fast-boil tech, auto shut-off, and a 360-degree swivel base."
print("Summary:", summarize(text, model, tokenizer))
```
## 📁 Repository Structure
```
.
├── model/                    # Fine-tuned model files (pytorch_model.bin, config.json)
├── tokenizer/                # Tokenizer config and vocab
├── training_script.py        # Training code
├── product_descriptions.csv  # Source dataset
├── utils.py                  # Preprocessing & summarization utilities
└── README.md                 # Model card
```
## 🀝 Contributing
Feel free to raise issues or suggest improvements via pull requests. More training on real-world data and multilingual support is planned in future updates.