---
title: Masked Word Predictor
emoji: 🌖
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Masked Word Predictor on CPU
---
# 🔍 Masked Word Predictor  
[![Hugging Face Space](https://img.shields.io/badge/HuggingFace-Spaces-blue?logo=huggingface)](https://huggingface.co/spaces/your-username/masked-word-predictor)  
![Gradio UI](https://img.shields.io/badge/Gradio-5.31.0-green?logo=gradio)  
[![Model](https://img.shields.io/badge/Model-distilroberta--base-orange)](https://huggingface.co/distilroberta-base)  
[![License](https://img.shields.io/badge/License-Apache--2.0-lightgrey)](LICENSE)

---

## 🚀 Overview  
Tap into **Masked Language Modeling** with **DistilRoBERTa**, no training required.  
Type a sentence containing the `[MASK]` placeholder and get the model’s **top-K** completions, all on **free CPU** hardware.
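
DistilRoBERTa uses RoBERTa’s tokenizer, whose mask token is `<mask>` rather than the BERT-style `[MASK]`, and the `fill-mask` pipeline raises an error when the input contains no mask token. The sketch below assumes the app translates the user-facing `[MASK]` placeholder before inference; `normalize_masks` is a hypothetical helper, not code taken from this Space’s `app.py`.

```python
import re

# Assumption: users type the BERT-style placeholder used throughout this README.
USER_PLACEHOLDER = "[MASK]"
# DistilRoBERTa inherits RoBERTa's tokenizer, whose mask token is "<mask>".
MODEL_MASK_TOKEN = "<mask>"


def normalize_masks(sentence: str, mask_token: str = MODEL_MASK_TOKEN) -> str:
    """Replace every [MASK] placeholder (any letter case) with the model's mask token.

    Raises ValueError if the result contains no mask token, since the
    fill-mask pipeline would reject such input anyway.
    """
    pattern = re.compile(re.escape(USER_PLACEHOLDER), flags=re.IGNORECASE)
    # A function replacement avoids backslash handling in the replacement string.
    text = pattern.sub(lambda _match: mask_token, sentence)
    if mask_token not in text:
        raise ValueError(f"Add at least one {USER_PLACEHOLDER} to the sentence.")
    return text
```

Input that already uses `<mask>` passes through unchanged, so both spellings work.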

> **Key AI concepts:**  
> • Masked Language Modeling (MLM) • Transformer-based NLP • Distilled Architectures • Real-time Inference • Edge Deployment • Cloud-native Demo

---

## ✨ Features

| 🔑 Feature                | 🔍 Why It’s Cool                                                  |
|---------------------------|-------------------------------------------------------------------|
| **🧠 Transformer MLM**     | DistilRoBERTa (6 layers, 82M parameters) is about 2× faster than RoBERTa-base |
| **⚡ CPU-Only Inference**  | Runs on a free-tier Space (2 vCPU / 16 GB RAM)                    |
| **🔢 Top-K Control**       | Slider to choose how many predictions to show                     |
| **🎨 Interactive UI**      | Gradio Blocks: text input, button, and DataFrame                  |
| **🔧 Zero-Config Deploy**  | Commit three files and Spaces builds the app automatically        |
| **💡 Educational Demos**   | Great for teaching how MLM works                                  |
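
The UI described in the table (text input, Top-K slider, button, DataFrame) maps onto a handful of Gradio Blocks components. This is a minimal sketch, not the Space’s actual `app.py`: the component labels, the 1 to 20 slider range, and the `predict` callback (any function taking a sentence and an integer K and returning `[rank, token, score]` rows) are illustrative assumptions. Gradio is imported inside `build_demo` so the callback logic can be exercised without it installed.

```python
from typing import Callable, List, Sequence

# Hypothetical predictor shape: (sentence, top_k) -> table rows.
Predictor = Callable[[str, int], List[Sequence[object]]]


def make_callback(predict: Predictor, max_k: int = 20) -> Callable[[str, float], List[Sequence[object]]]:
    """Wrap `predict` so the slider value is coerced to an int in [1, max_k]."""

    def on_click(sentence: str, top_k: float) -> List[Sequence[object]]:
        k = max(1, min(int(top_k), max_k))  # slider values can arrive as floats
        return predict(sentence, k)

    return on_click


def build_demo(predict: Predictor, max_k: int = 20):
    """Lay out the UI: text input, Top-K slider, button, and results table."""
    import gradio as gr  # lazy import; requires `pip install gradio`

    with gr.Blocks(title="Masked Word Predictor") as demo:
        gr.Markdown("# 🔍 Masked Word Predictor")
        sentence = gr.Textbox(label="Sentence", placeholder="Paris is the [MASK] of France.")
        top_k = gr.Slider(minimum=1, maximum=max_k, value=5, step=1, label="Top-K")
        predict_btn = gr.Button("Predict")
        results = gr.Dataframe(headers=["Rank", "Token", "Score"], interactive=False)
        # One click runs the model and refreshes the table.
        predict_btn.click(fn=make_callback(predict, max_k), inputs=[sentence, top_k], outputs=results)
    return demo
```

In an `app.py`, a line such as `build_demo(my_predict).launch()` would start the server, where `my_predict` is a hypothetical function with the shape above.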

---

## 🏗️ How It Works

1. **User Input** – Sentence with one or more `[MASK]` tokens.  
2. **MLM Pipeline** – `pipeline("fill-mask")` computes a probability for every vocabulary token at each masked position.  
3. **Ranking** – Returns the top-K predicted tokens with scores.  
4. **UI Rendering** – Gradio shows each filled sentence and its confidence.
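
Steps 2 to 4 can be sketched with the `transformers` fill-mask pipeline. This is a hedged sketch, not the Space’s `app.py`: it assumes the `distilroberta-base` checkpoint named in the badge above, CPU inference via `device=-1`, and the hypothetical helpers `load_fill_mask` and `predict_rows`. It relies on the pipeline’s output format: a list of dicts with `score`, `token`, `token_str`, and `sequence` keys, or one such list per mask when the sentence contains several masks.

```python
from typing import Any, Callable, Dict, List


def load_fill_mask(model_name: str = "distilroberta-base") -> Callable[..., Any]:
    """Create a CPU fill-mask pipeline (requires `pip install transformers torch`)."""
    from transformers import pipeline  # lazy import keeps predict_rows testable without it

    return pipeline("fill-mask", model=model_name, device=-1)  # device=-1 selects the CPU


def predict_rows(fill_mask: Any, sentence: str, top_k: int = 5) -> List[List[Any]]:
    """Run the pipeline and flatten its output into table rows.

    Each row is [mask number, rank, token, score, sequence].
    """
    mask_token = fill_mask.tokenizer.mask_token  # "<mask>" for DistilRoBERTa
    results = fill_mask(sentence.replace("[MASK]", mask_token), top_k=top_k)
    # A single mask yields a flat list of dicts; several masks yield one list per mask.
    per_mask: List[List[Dict[str, Any]]] = [results] if isinstance(results[0], dict) else results
    rows: List[List[Any]] = []
    for mask_no, candidates in enumerate(per_mask, start=1):
        # The pipeline already sorts by score; sorting here makes the ranking explicit.
        ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)[:top_k]
        for rank, cand in enumerate(ranked, start=1):
            token = cand["token_str"].strip()  # RoBERTa tokens often decode with a leading space
            rows.append([mask_no, rank, token, round(cand["score"], 4), cand["sequence"]])
    return rows
```

A call such as `predict_rows(load_fill_mask(), "Paris is the [MASK] of France.", top_k=3)` downloads the checkpoint on first use and returns three rows. With several masks, each position’s candidates are ranked independently rather than as a joint fill.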

---

## 🛠️ Local Development

```bash
git clone https://github.com/your-username/masked-word-predictor.git
cd masked-word-predictor
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python app.py