# 🧠 Text Summarization for Product Descriptions

A **T5-small-based** abstractive summarization model fine-tuned on synthetic product description data. This model generates concise summaries of detailed product descriptions, ideal for catalog optimization, e-commerce listings, and content generation.

---

## ✨ Model Highlights

- 📌 Based on [`t5-small`](https://huggingface.co/t5-small)
- 🧪 Fine-tuned on a synthetic dataset of 50+ product descriptions and their summaries
- ⚡ Supports **abstractive summarization** of English product texts
- 🧠 Built using **Hugging Face Transformers** and **PyTorch**

---

## 🧠 Intended Uses

- ✅ Auto-generating product summaries for catalogs or online listings
- ✅ Shortening verbose product descriptions for UI-friendly displays
- ✅ Content creation support for e-commerce and marketing

---

## 🚫 Limitations

- ❌ English-only (not trained for multilingual input)
- 🧠 Cannot fact-check or verify real-world product details
- 🧪 Trained on synthetic data; real-world generalization may be limited
- ⚠️ May generate generic or repetitive summaries for complex inputs

---

## πŸ‹οΈβ€β™‚οΈ Training Details

| Attribute        | Value                                     |
|------------------|-------------------------------------------|
| Base Model       | `t5-small`                                |
| Dataset          | Custom synthetic CSV of product summaries |
| Input Field      | `product_description`                     |
| Target Field     | `summary`                                 |
| Max Token Length | 512 (input) / 64 (summary)                |
| Epochs           | 3                                         |
| Batch Size       | 4                                         |
| Optimizer        | AdamW                                     |
| Loss Function    | CrossEntropyLoss (via `Trainer`)          |
| Framework        | PyTorch + Transformers                    |
| Hardware         | CUDA-enabled GPU                          |
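
The exact training script isn't reproduced here, but a minimal sketch of a `Trainer` run consistent with this table might look like the following. The CSV name, column names, and hyperparameters come from the table above; everything else is illustrative.

```python
# Hypothetical fine-tuning sketch matching the settings in the table above.
import pandas as pd
from datasets import Dataset
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# product_descriptions.csv is assumed to have `product_description` and `summary` columns.
dataset = Dataset.from_pandas(pd.read_csv("product_descriptions.csv"))

def preprocess(batch):
    # T5 expects a task prefix; targets are tokenized via text_target.
    model_inputs = tokenizer(
        ["summarize: " + t for t in batch["product_description"]],
        max_length=512,
        truncation=True,
    )
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="model",
    num_train_epochs=3,
    per_device_train_batch_size=4,  # AdamW is the Trainer default optimizer
    logging_steps=10,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```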

---

## 📊 Evaluation Metrics

| Metric     | Score (Synthetic Eval) |
|------------|------------------------|
| ROUGE-1    | 24.49                  |
| ROUGE-2    | 22.10                  |
| ROUGE-L    | 24.47                  |
| ROUGE-Lsum | 24.46                  |
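
Assuming these are ROUGE F-measures on a 0-100 scale, a minimal sketch of how such scores can be recomputed with the `evaluate` library follows; the prediction/reference pairs below are placeholders, not the actual evaluation split.

```python
# Hypothetical ROUGE computation with the `evaluate` library
# (requires the `rouge_score` package).
import evaluate

rouge = evaluate.load("rouge")

predictions = ["1.7 L electric kettle with fast boil and auto shut-off"]    # model outputs (placeholder)
references = ["Electric kettle with 1.7-liter capacity and auto shut-off"]  # gold summaries (placeholder)

scores = rouge.compute(predictions=predictions, references=references)
# Scale to 0-100 to match the table above; keys are rouge1, rouge2, rougeL, rougeLsum.
print({k: round(v * 100, 2) for k, v in scores.items()})
```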

---

## 🚀 Usage

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

model_name = "your-username/Text-Summarization-for-Product-Descriptions"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()

def summarize(text, model, tokenizer, max_input_length=512, max_output_length=64):
    model.eval()
    device = next(model.parameters()).device  # get device (cpu or cuda)
    input_text = "summarize: " + text.strip()
    inputs = tokenizer(
        input_text,
        return_tensors="pt",
        truncation=True,
        padding="max_length",
        max_length=max_input_length
    ).to(device)  # move inputs to device

    with torch.no_grad():
        summary_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=max_output_length,
            num_beams=4,
            early_stopping=True
        )

    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary


# Example
text = "This sleek electric kettle features a 1.7-liter capacity, fast-boil tech, auto shut-off, and a 360-degree swivel base."
print("Summary:", summarize(text))
```
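
For quick experiments, the same checkpoint can likely also be used through the high-level `pipeline` API. This is a sketch: the repo id is the same placeholder as above, and the `summarize:` prefix mirrors how the model was trained.

```python
# Hypothetical quick-start via the pipeline API (repo id is a placeholder).
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="your-username/Text-Summarization-for-Product-Descriptions",
)

text = (
    "summarize: This sleek electric kettle features a 1.7-liter capacity, "
    "fast-boil tech, auto shut-off, and a 360-degree swivel base."
)
print(summarizer(text, max_length=64, num_beams=4)[0]["summary_text"])
```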
## πŸ“ Repository Structure
```
.
β”œβ”€β”€ model/                    # Fine-tuned model files (pytorch_model.bin, config.json)
β”œβ”€β”€ tokenizer/                # Tokenizer config and vocab
β”œβ”€β”€ training_script.py        # Training code
β”œβ”€β”€ product_descriptions.csv # Source dataset
β”œβ”€β”€ utils.py                  # Preprocessing & summarization utilities
β”œβ”€β”€ README.md                 # Model card
```

---

## 🤝 Contributing

Feel free to raise issues or suggest improvements via pull requests. Further training on real-world data and multilingual support are planned for future updates.