Text Summarization for Product Descriptions
A T5-small-based abstractive summarization model fine-tuned on synthetic product description data. This model generates concise summaries of detailed product descriptions, ideal for catalog optimization, e-commerce listings, and content generation.
Model Highlights
- Based on t5-small
- Fine-tuned on a synthetic dataset of 50+ product descriptions and their summaries
- Supports abstractive summarization of English product texts
- Built using Hugging Face Transformers and PyTorch
Intended Uses
- Auto-generating product summaries for catalogs or online listings
- Shortening verbose product descriptions for UI-friendly displays
- Content creation support for e-commerce and marketing
Limitations
- English-only (not trained for multilingual input)
- Cannot fact-check or verify real-world product details
- Trained on synthetic data; real-world generalization may be limited
- May generate generic or repetitive summaries for complex inputs
Training Details
Attribute | Value |
---|---|
Base Model | t5-small |
Dataset | Custom synthetic CSV of product descriptions and summaries |
Input Field | product_description |
Target Field | summary |
Max Token Length | 512 input / 64 summary |
Epochs | 3 |
Batch Size | 4 |
Optimizer | AdamW |
Loss Function | CrossEntropyLoss (via Trainer) |
Framework | PyTorch + Transformers |
Hardware | CUDA-enabled GPU |
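The input and target fields in the table suggest a simple preprocessing step: each CSV row becomes a (source, target) text pair. The sketch below shows one way this might look using only the standard library. The column names `product_description` and `summary` come from the table. The function name `load_pairs`, the sample row, and the assumption that training used the same `summarize: ` prefix as the Usage example are illustrative; the actual training_script.py may differ.

```python
import csv
import io

# Task prefix taken from the Usage example; assumed (not confirmed) to match training.
PREFIX = "summarize: "

def load_pairs(csv_file):
    """Return (source, target) string pairs from a CSV with the card's two columns."""
    pairs = []
    for row in csv.DictReader(csv_file):
        source = PREFIX + row["product_description"].strip()
        target = row["summary"].strip()
        pairs.append((source, target))
    return pairs

# Hypothetical one-row dataset standing in for product_descriptions.csv.
sample = io.StringIO(
    "product_description,summary\n"
    '"Electric kettle with a 1.7-liter capacity and auto shut-off.",'
    '"1.7 L kettle with auto shut-off."\n'
)
pairs = load_pairs(sample)
print(pairs[0])
```

Each pair would then be tokenized with the limits from the table (512 tokens for the source, 64 for the target) before being handed to the Trainer.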
Evaluation Metrics
Metric | Score (Synthetic Eval) |
---|---|
ROUGE-1 | 24.49 |
ROUGE-2 | 22.10 |
ROUGE-L | 24.47 |
ROUGE-lsum | 24.46 |
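For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a generated summary and a reference summary. The sketch below computes a simplified ROUGE-1 F-measure with lowercase whitespace tokenization and no stemming. It is an illustration only, not the evaluation code behind the table, whose tokenization and scaling are not stated in this card. The function name `rouge1_f` and the example strings are hypothetical.

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """Simplified ROUGE-1 F-measure: clipped unigram overlap on whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the per-token minimum of the two counts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f(
    "fast-boil kettle with auto shut-off",
    "electric kettle with auto shut-off and swivel base",
)
print(round(100 * score, 2))  # 61.54 when scaled to 0-100
```

Here 4 of the 5 candidate tokens appear among the 8 reference tokens, giving precision 0.8, recall 0.5, and an F-measure of 8/13.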
Usage
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

model_name = "your-username/Text-Summarization-for-Product-Descriptions"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()

def summarize(text, model, tokenizer, max_input_length=512, max_output_length=64):
    model.eval()
    device = next(model.parameters()).device  # device the model lives on (cpu or cuda)
    input_text = "summarize: " + text.strip()  # T5 task prefix
    inputs = tokenizer(
        input_text,
        return_tensors="pt",
        truncation=True,
        padding="max_length",
        max_length=max_input_length
    ).to(device)  # move inputs to the model's device
    with torch.no_grad():
        summary_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=max_output_length,
            num_beams=4,
            early_stopping=True
        )
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary

# Example
text = "This sleek electric kettle features a 1.7-liter capacity, fast-boil tech, auto shut-off, and a 360-degree swivel base."
print("Summary:", summarize(text, model, tokenizer))
Repository Structure
.
├── model/ # Fine-tuned model files (pytorch_model.bin, config.json)
├── tokenizer/ # Tokenizer config and vocab
├── training_script.py # Training code
├── product_descriptions.csv # Source dataset
├── utils.py # Preprocessing & summarization utilities
└── README.md # Model card
Contributing
Feel free to raise issues or suggest improvements via pull requests. Further training on real-world data and multilingual support are planned for future updates.