
🧠 Text Summarization for Product Descriptions

A T5-small-based abstractive summarization model fine-tuned on synthetic product description data. It generates concise summaries of detailed product descriptions and is intended for catalog optimization, e-commerce listings, and content generation.


✨ Model Highlights

  • 📌 Based on t5-small
  • 🧪 Fine-tuned on a synthetic dataset of 50+ product descriptions and their summaries
  • ⚡ Supports abstractive summarization of English product texts
  • 🧠 Built using Hugging Face Transformers and PyTorch

🧠 Intended Uses

  • ✅ Auto-generating product summaries for catalogs or online listings
  • ✅ Shortening verbose product descriptions for UI-friendly displays
  • ✅ Content creation support for e-commerce and marketing

🚫 Limitations

  • ❌ English-only (not trained for multilingual input)
  • 🧠 Cannot fact-check or verify real-world product details
  • 🧪 Trained on synthetic data; real-world generalization may be limited
  • ⚠️ May generate generic or repetitive summaries for complex inputs
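
One way to catch the repetitive outputs mentioned above is a simple post-generation check. The sketch below is a heuristic and is not part of this repository: the function names, the bigram default, and the 0.3 threshold are all assumptions chosen for illustration. It measures what fraction of word n-grams in a summary repeat an earlier n-gram.

```python
from collections import Counter


def repeated_ngram_ratio(summary: str, n: int = 2) -> float:
    """Return the fraction of word n-grams that repeat an earlier n-gram."""
    tokens = summary.lower().split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(count - 1 for count in counts.values())
    return repeats / len(ngrams)


def looks_repetitive(summary: str, n: int = 2, threshold: float = 0.3) -> bool:
    """Flag a summary whose repeat ratio exceeds a hypothetical threshold."""
    return repeated_ngram_ratio(summary, n) > threshold
```

A flagged summary could be regenerated with different decoding settings or sent for manual review. The threshold would need tuning on real outputs.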

🏋️‍♂️ Training Details

Attribute          Value
Base Model         t5-small
Dataset            Custom synthetic CSV of product summaries
Input Field        product_description
Target Field       summary
Max Token Length   512 input / 64 summary
Epochs             3
Batch Size         4
Optimizer          AdamW
Loss Function      CrossEntropyLoss (via Trainer)
Framework          PyTorch + Transformers
Hardware           CUDA-enabled GPU
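
To show how the input and target fields listed above might be turned into training pairs, here is a minimal sketch using only the Python standard library. It assumes the CSV has `product_description` and `summary` columns, as the table states, and that the `summarize: ` prefix from the usage code was also applied during training. The `load_pairs` helper and the sample summary text are hypothetical and are not taken from `training_script.py`.

```python
import csv
import io

PREFIX = "summarize: "  # assumed to match the prefix used at inference time


def load_pairs(csv_file):
    """Read (input_text, target_text) pairs from a CSV file object.

    Rows with an empty description or summary are skipped.
    """
    pairs = []
    for row in csv.DictReader(csv_file):
        description = (row.get("product_description") or "").strip()
        summary = (row.get("summary") or "").strip()
        if description and summary:
            pairs.append((PREFIX + description, summary))
    return pairs


# Illustrative in-memory CSV; the summary text here is made up.
sample_csv = io.StringIO(
    "product_description,summary\n"
    '"This sleek electric kettle features a 1.7-liter capacity, fast-boil tech, '
    'auto shut-off, and a 360-degree swivel base.",'
    '"1.7L fast-boil kettle with auto shut-off."\n'
    '"   ",""\n'
)
pairs = load_pairs(sample_csv)
```

Each pair would then be tokenized with the limits in the table (512 tokens for the input, 64 for the summary) before being passed to the trainer.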

📊 Evaluation Metrics

Metric        Score (Synthetic Eval)
ROUGE-1       24.49
ROUGE-2       22.10
ROUGE-L       24.47
ROUGE-Lsum    24.46
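
For readers unfamiliar with these metrics, the sketch below shows roughly what ROUGE-N and ROUGE-L measure: n-gram overlap and longest-common-subsequence overlap between a generated summary and a reference, each reported as an F1 score. It is a simplified illustration with whitespace tokenization and no stemming, not the scorer used to produce the table. Standard ROUGE packages differ in tokenization details, and the table's values appear to be on a 0 to 100 scale, which is an assumption here.

```python
from collections import Counter


def _ngram_counts(tokens, n):
    """Count the word n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap, pred_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when there is no overlap."""
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction, reference, n=1):
    """Simplified ROUGE-N F1 on lowercased, whitespace-split tokens."""
    pred = _ngram_counts(prediction.lower().split(), n)
    ref = _ngram_counts(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # clipped n-gram matches
    return _f1(overlap, sum(pred.values()), sum(ref.values()))


def rouge_l(prediction, reference):
    """Simplified ROUGE-L F1 based on the longest common subsequence."""
    a = prediction.lower().split()
    b = reference.lower().split()
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return _f1(prev[-1], len(a), len(b))
```

Multiplying these values by 100 would put them on the same scale the table seems to use.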

🚀 Usage

from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

model_name = "your-username/Text-Summarization-for-Product-Descriptions"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()

def summarize(text, model, tokenizer, max_input_length=512, max_output_length=64):
    model.eval()
    device = next(model.parameters()).device  # get device (cpu or cuda)
    input_text = "summarize: " + text.strip()
    inputs = tokenizer(
        input_text,
        return_tensors="pt",
        truncation=True,
        padding="max_length",
        max_length=max_input_length
    ).to(device)  # move inputs to device

    with torch.no_grad():
        summary_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=max_output_length,
            num_beams=4,
            early_stopping=True
        )

    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary


# Example: pass the model and tokenizer loaded above
text = "This sleek electric kettle features a 1.7-liter capacity, fast-boil tech, auto shut-off, and a 360-degree swivel base."
print("Summary:", summarize(text, model, tokenizer))

📁 Repository Structure

.
├── model/                    # Fine-tuned model files (pytorch_model.bin, config.json)
├── tokenizer/                # Tokenizer config and vocab
├── training_script.py        # Training code
├── product_descriptions.csv  # Source dataset
├── utils.py                  # Preprocessing & summarization utilities
└── README.md                 # Model card

🀝 Contributing

Feel free to raise issues or suggest improvements via pull requests. Further training on real-world data and multilingual support are planned for future updates.