
🧠 Text Summarization for Product Descriptions

A T5-small-based abstractive summarization model fine-tuned on synthetic product description data. This model generates concise summaries of detailed product descriptions, ideal for catalog optimization, e-commerce listings, and content generation.


✨ Model Highlights

  • 📌 Based on t5-small
  • 🧪 Fine-tuned on a synthetic dataset of 50+ product descriptions and their summaries
  • ⚡ Supports abstractive summarization of English product texts
  • 🧠 Built using Hugging Face Transformers and PyTorch

🧠 Intended Uses

  • ✅ Auto-generating product summaries for catalogs or online listings
  • ✅ Shortening verbose product descriptions for UI-friendly displays
  • ✅ Content creation support for e-commerce and marketing

🚫 Limitations

  • ❌ English-only (not trained for multilingual input)
  • 🧠 Cannot fact-check or verify real-world product details
  • 🧪 Trained on synthetic data; real-world generalization may be limited
  • ⚠️ May generate generic or repetitive summaries for complex inputs
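
One common way to reduce the repetition mentioned above is to constrain decoding. The sketch below is an assumption, not part of this model's documented setup: `no_repeat_ngram_size` and `repetition_penalty` are standard `generate` arguments in Transformers, but the values shown are illustrative and untuned, and `repeated_ngrams` is a hypothetical helper for flagging summaries that still repeat themselves.

```python
# Illustrative decoding settings (assumed values, not tuned for this model).
GENERATION_KWARGS = {
    "max_length": 64,           # summary length limit used elsewhere in this card
    "num_beams": 4,
    "early_stopping": True,
    "no_repeat_ngram_size": 3,  # forbid any word trigram from appearing twice
    "repetition_penalty": 1.2,  # mildly penalize tokens that were already generated
}


def repeated_ngrams(text, n=3):
    """Return the set of word n-grams that occur more than once in `text`."""
    tokens = text.lower().split()
    seen, repeated = set(), set()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if gram in seen:
            repeated.add(gram)
        seen.add(gram)
    return repeated
```

The dictionary can be unpacked into a call such as `model.generate(**inputs, **GENERATION_KWARGS)`, and `repeated_ngrams(summary)` returning a non-empty set is a cheap signal that a summary may need review.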

🏋️‍♂️ Training Details

| Attribute | Value |
|---|---|
| Base Model | t5-small |
| Dataset | Custom synthetic CSV of product summaries |
| Input Field | product_description |
| Target Field | summary |
| Max Token Length | 512 input / 64 summary |
| Epochs | 3 |
| Batch Size | 4 |
| Optimizer | AdamW |
| Loss Function | CrossEntropyLoss (via Trainer) |
| Framework | PyTorch + Transformers |
| Hardware | CUDA-enabled GPU |
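
The contents of the training script are not shown in this card, so the following is only a sketch of how the fields and length limits listed above could be turned into model inputs. `load_pairs` and `tokenize_pairs` are hypothetical helper names; the `summarize: ` prefix is taken from the usage code in this card, and `text_target` is the tokenizer argument that recent Transformers releases provide for encoding labels.

```python
import csv
import io

PREFIX = "summarize: "  # T5 task prefix, matching the usage code in this card


def load_pairs(csv_text):
    """Parse CSV text into (prefixed description, summary) pairs.

    Rows with an empty product_description or summary are skipped.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        description = (row.get("product_description") or "").strip()
        summary = (row.get("summary") or "").strip()
        if description and summary:
            pairs.append((PREFIX + description, summary))
    return pairs


def tokenize_pairs(pairs, tokenizer, max_input_length=512, max_target_length=64):
    """Encode sources and targets, truncating to the limits from the table."""
    sources = [source for source, _ in pairs]
    targets = [target for _, target in pairs]
    model_inputs = tokenizer(sources, max_length=max_input_length, truncation=True)
    labels = tokenizer(text_target=targets, max_length=max_target_length, truncation=True)
    # Seq2seq training expects the target token ids under the "labels" key.
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

With a real tokenizer loaded through `AutoTokenizer.from_pretrained("t5-small")`, the returned mapping holds `input_ids`, `attention_mask`, and `labels`, which is the shape a seq2seq data collator typically consumes.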

📊 Evaluation Metrics

| Metric | Score (Synthetic Eval) |
|---|---|
| ROUGE-1 | 24.49 |
| ROUGE-2 | 22.10 |
| ROUGE-L | 24.47 |
| ROUGE-Lsum | 24.46 |
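
The card does not say which ROUGE implementation produced these scores. To make the metrics concrete, here is a simplified pure-Python sketch of ROUGE-N and ROUGE-L F1 for a single prediction and reference. It assumes lowercased whitespace tokenization with no stemming, so its numbers will not match a full ROUGE package exactly, and it returns values in the range 0 to 1 rather than the 0 to 100 scale used above.

```python
from collections import Counter


def _ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap, pred_total, ref_total):
    """F1 from an overlap count and the prediction/reference totals."""
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction, reference, n=1):
    """ROUGE-N F1 using clipped n-gram overlap."""
    pred = _ngrams(prediction.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # Counter intersection clips counts
    return _f1(overlap, sum(pred.values()), sum(ref.values()))


def _lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    return _f1(_lcs_length(pred, ref), len(pred), len(ref))
```

For corpus-level numbers, per-example scores like these are usually averaged and multiplied by 100.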

🚀 Usage

from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

model_name = "your-username/Text-Summarization-for-Product-Descriptions"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()

def summarize(text, model, tokenizer, max_input_length=512, max_output_length=64):
    model.eval()
    device = next(model.parameters()).device  # get device (cpu or cuda)
    input_text = "summarize: " + text.strip()
    inputs = tokenizer(
        input_text,
        return_tensors="pt",
        truncation=True,
        padding="max_length",
        max_length=max_input_length
    ).to(device)  # move inputs to device

    with torch.no_grad():
        summary_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=max_output_length,
            num_beams=4,
            early_stopping=True
        )

    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary


# Example
text = "This sleek electric kettle features a 1.7-liter capacity, fast-boil tech, auto shut-off, and a 360-degree swivel base."
print("Summary:", summarize(text, model, tokenizer))
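
For catalog-sized workloads it may be more efficient to summarize several descriptions per forward pass. The following is a sketch under stated assumptions rather than part of the original repository: `summarize_batch` and its `batch_size` default are hypothetical, and it reuses the same prefix, length limits, and beam settings as the single-text example. It pads each batch only to its longest member instead of to the full 512 tokens.

```python
def summarize_batch(texts, model, tokenizer, batch_size=8,
                    max_input_length=512, max_output_length=64):
    """Summarize a list of product descriptions in fixed-size batches."""
    summaries = []
    for start in range(0, len(texts), batch_size):
        chunk = ["summarize: " + t.strip() for t in texts[start:start + batch_size]]
        inputs = tokenizer(
            chunk,
            return_tensors="pt",
            truncation=True,
            padding=True,  # pad to the longest text in this batch
            max_length=max_input_length,
        ).to(model.device)  # keep inputs on the same device as the model
        summary_ids = model.generate(
            **inputs,
            max_length=max_output_length,
            num_beams=4,
            early_stopping=True,
        )
        summaries.extend(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))
    return summaries
```

Called as `summarize_batch(descriptions, model, tokenizer)`, it returns one summary per input, in the same order as the inputs.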

📁 Repository Structure

.
├── model/                    # Fine-tuned model files (pytorch_model.bin, config.json)
├── tokenizer/                # Tokenizer config and vocab
├── training_script.py        # Training code
├── product_descriptions.csv  # Source dataset
├── utils.py                  # Preprocessing & summarization utilities
└── README.md                 # Model card

🀝 Contributing

Feel free to raise issues or suggest improvements via pull requests. Further training on real-world data and multilingual support are planned for future updates.
