# 🧠 Text Summarization for Product Descriptions

A **T5-small-based** abstractive summarization model fine-tuned on synthetic product description data. This model generates concise summaries of detailed product descriptions, ideal for catalog optimization, e-commerce listings, and content generation.

---
## ✨ Model Highlights

- 📌 Based on [`t5-small`](https://huggingface.co/t5-small)
- 🧪 Fine-tuned on a synthetic dataset of 50+ product descriptions and their summaries
- ⚡ Supports **abstractive summarization** of English product texts
- 🔧 Built using **Hugging Face Transformers** and **PyTorch**

---
## 🧠 Intended Uses

- ✅ Auto-generating product summaries for catalogs or online listings
- ✅ Shortening verbose product descriptions for UI-friendly displays
- ✅ Supporting content creation for e-commerce and marketing

---
## 🚫 Limitations

- ❌ English-only (not trained for multilingual input)
- 🧠 Cannot fact-check or verify real-world product details
- 🧪 Trained on synthetic data, so real-world generalization may be limited
- ⚠️ May generate generic or repetitive summaries for complex inputs

---
## 🏋️‍♀️ Training Details

| Attribute        | Value                                      |
|------------------|--------------------------------------------|
| Base Model       | `t5-small`                                 |
| Dataset          | Custom synthetic CSV of product summaries  |
| Input Field      | `product_description`                      |
| Target Field     | `summary`                                  |
| Max Token Length | 512 (input) / 64 (summary)                 |
| Epochs           | 3                                          |
| Batch Size       | 4                                          |
| Optimizer        | AdamW                                      |
| Loss Function    | CrossEntropyLoss (via `Trainer`)           |
| Framework        | PyTorch + Transformers                     |
| Hardware         | CUDA-enabled GPU                           |
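
The full training code lives in `training_script.py`. For orientation, here is a minimal sketch of a fine-tuning setup consistent with the table above, using the `Seq2SeqTrainer` API; the CSV path and column names follow the table and repository layout, and everything else is an assumption, not the exact script:

```python
# Minimal fine-tuning sketch consistent with the table above; not the exact
# training_script.py. CSV path and column names follow the repository layout.
from datasets import load_dataset
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

dataset = load_dataset("csv", data_files="product_descriptions.csv")["train"]

def preprocess(batch):
    # T5 uses a task prefix; lengths match the 512/64 limits in the table.
    model_inputs = tokenizer(
        ["summarize: " + d for d in batch["product_description"]],
        max_length=512, truncation=True,
    )
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="model",
    num_train_epochs=3,
    per_device_train_batch_size=4,  # Trainer defaults to AdamW
)

Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
).train()
```

The data collator pads inputs and labels per batch, and `Trainer` applies the cross-entropy loss listed above internally.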

---

## 📊 Evaluation Metrics

| Metric     | Score (Synthetic Eval) |
|------------|------------------------|
| ROUGE-1    | 24.49                  |
| ROUGE-2    | 22.10                  |
| ROUGE-L    | 24.47                  |
| ROUGE-Lsum | 24.46                  |
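
For reference, scores like these can be reproduced with the `evaluate` library; a minimal sketch, in which the prediction and reference lists are placeholders rather than the actual evaluation data:

```python
# ROUGE scoring sketch using the `evaluate` library; the two lists are placeholders.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["1.7-liter electric kettle with fast boil and auto shut-off"]  # model outputs
references = ["electric kettle with 1.7 l capacity, fast-boil, auto shut-off"]  # gold summaries

scores = rouge.compute(predictions=predictions, references=references)
# Keys are rouge1, rouge2, rougeL, rougeLsum, with values in [0, 1];
# multiply by 100 to compare against the table above.
print({k: round(v * 100, 2) for k, v in scores.items()})
```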

---

## π Usage |
|
|
|
```python |
|
from transformers import T5Tokenizer, T5ForConditionalGeneration |
|
import torch |
|
|
|
model_name = "your-username/Text-Summarization-for-Product-Descriptions" |
|
tokenizer = T5Tokenizer.from_pretrained(model_name) |
|
model = T5ForConditionalGeneration.from_pretrained(model_name) |
|
model.eval() |
|
|
|
def summarize(text, model, tokenizer, max_input_length=512, max_output_length=64): |
|
model.eval() |
|
device = next(model.parameters()).device # get device (cpu or cuda) |
|
input_text = "summarize: " + text.strip() |
|
inputs = tokenizer( |
|
input_text, |
|
return_tensors="pt", |
|
truncation=True, |
|
padding="max_length", |
|
max_length=max_input_length |
|
).to(device) # move inputs to device |
|
|
|
with torch.no_grad(): |
|
summary_ids = model.generate( |
|
input_ids=inputs["input_ids"], |
|
attention_mask=inputs["attention_mask"], |
|
max_length=max_output_length, |
|
num_beams=4, |
|
early_stopping=True |
|
) |
|
|
|
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True) |
|
return summary |
|
|
|
|
|
# Example |
|
text = "This sleek electric kettle features a 1.7-liter capacity, fast-boil tech, auto shut-off, and a 360-degree swivel base." |
|
print("Summary:", summarize(text)) |
|
``` |
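
If you prefer not to handle tokenization and generation yourself, the checkpoint should also load through the high-level `pipeline` API. A sketch; whether the `summarize:` prefix is applied automatically depends on the saved config, so check your outputs:

```python
# Alternative: the high-level pipeline API. Assumes the checkpoint's config carries
# T5's "summarize: " task prefix; if not, prepend it to the input yourself.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="your-username/Text-Summarization-for-Product-Descriptions",
)
text = "This sleek electric kettle features a 1.7-liter capacity, fast-boil tech, auto shut-off, and a 360-degree swivel base."
print("Summary:", summarizer(text, max_length=64, num_beams=4)[0]["summary_text"])
```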

## 📁 Repository Structure

```
.
├── model/                     # Fine-tuned model files (pytorch_model.bin, config.json)
├── tokenizer/                 # Tokenizer config and vocab
├── training_script.py         # Training code
├── product_descriptions.csv   # Source dataset
├── utils.py                   # Preprocessing & summarization utilities
└── README.md                  # Model card
```

## 🤝 Contributing

Feel free to raise issues or suggest improvements via pull requests. Further training on real-world data and multilingual support are planned for future updates.