---
license: mit
datasets:
- hotpotqa/hotpot_qa
- rajpurkar/squad
- allenai/openbookqa
- google/boolq
- ucinlp/drop
base_model:
- google-t5/t5-base
pipeline_tag: text2text-generation
widget:
  - text: "<extra_id_97>short answer <extra_id_98>easy <extra_id_99> The sun is the center of our solar system."
tags:
- chemistry
- biology
- textbook
- question_generation
- exam
- questions
- evaluation
- true_or_false
- multiple_choice_questions
- descriptive
- short_answer_questions
- long_answer
- problems
- quizzes
- physics
language:
- en
---




# Fine-Tuned T5-Base Question Generator Model

This model is a fine-tuned T5 model designed specifically for **automatic question generation** from any given context or passage. It supports several question types, including **short answer**, **multiple choice**, and **true or false**, and allows customization by **difficulty level**: easy, medium, or hard.

---

## Why is this Project Important?

Educational tools, tutoring platforms, and self-learning systems need a way to **generate relevant questions** automatically from content. This model meets that need with a flexible and robust question generation system that uses a **structured prompt** format and is powered by a **fine-tuned `t5-base` model**.
  
### Key Features

- Supports **multiple question types**:  
  - Short answer  
  - Multiple choice  
  - True/false  

- Questions are generated based on:  
  - The **provided context**  
  - The **type of question**  
  - The **difficulty level**  

- Difficulty reflects the **reasoning depth** required to answer (e.g., harder questions may require multi-hop inference).

- Uses a **structured prompt format** with clearly defined tags, making it easy to use or integrate into other systems.

- Fine-tuned from the `t5-base` model:  
  - Lightweight and fast  
  - Easy to run on CPU  
  - Ideal for customization by teachers or educational platforms

### Ideal For

- Teachers creating quizzes or exam material
- EdTech apps generating practice questions  
- Developers building interactive learning tools  
- Automated assessment and content enrichment

### Bonus: Retrieval-Augmented Generation (RAG)

A **custom RAG function** is provided at this GitHub link:
https://github.com/Alla-Avinash/NLP-Question-Generation-with-RAG/blob/main/T5base_question_generation.py

This enables question generation from larger content sources like textbooks:

- Input can be a **subheading** or **small excerpt** from a textbook.
- The model fetches relevant supporting context from the textbook using a retriever.
- Generates questions grounded in the fetched material.

This extends the model beyond single-passage generation into more dynamic, scalable educational use cases.
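
For orientation, here is a minimal sketch of such a retrieve-then-generate step. It assumes a simple TF-IDF retriever from scikit-learn; the retriever, passage list, and query below are illustrative stand-ins, and the actual implementation in the linked repository may differ:

```python
# A minimal retrieve-then-generate sketch (illustrative; see the linked repo
# for the actual RAG implementation). Assumes scikit-learn for retrieval.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

def retrieve_context(query, passages, top_k=2):
    """Return the top_k passages most similar to the query, joined as one context."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(passages + [query])
    # TF-IDF rows are L2-normalized, so the linear kernel equals cosine similarity
    scores = linear_kernel(matrix[-1], matrix[:-1]).flatten()
    best = scores.argsort()[::-1][:top_k]
    return " ".join(passages[i] for i in best)

# Hypothetical textbook passages, pre-chunked (e.g., one per paragraph)
passages = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The mitochondrion is the powerhouse of the cell.",
    "Chlorophyll absorbs light primarily in the blue and red wavelengths.",
]
context = retrieve_context("photosynthesis", passages)
# Feed the retrieved context into the prompt helper defined below
prompt = format_prompt("short answer", "medium", context)
```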


---

## Prompt Format

To generate good-quality questions, the model uses a **structured input prompt** format with special tokens. This helps the model understand the intent and the expected output type.


### Prompt Fields
- `<extra_id_97>` – followed by the **question type**  
  - `short answer`, `multiple choice question`, or `true or false question`
- `<extra_id_98>` – followed by the **difficulty**  
  - `easy`, `medium`, or `hard`
- `<extra_id_99>` – followed by the **[optional answer] context**
  - `optional answer` – a target answer for focused question generation; leave it blank if not needed
  - `context` – the main passage/content from which questions are generated
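
For example, a fully assembled prompt targeting the answer `Jupiter` (an illustrative value) looks like this:

```
<extra_id_97>multiple choice question <extra_id_98>medium <extra_id_99>[Jupiter] Jupiter is the largest planet in our solar system.
```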



### Helper Function to Create the Prompt

To simplify prompt construction, use this Python function:

```python
def format_prompt(qtype, difficulty, context, answer=""):
    """
    Format input prompt for question generation
    """
    answer_part = f"[{answer}]" if answer else ""
    return f"<extra_id_97>{qtype} <extra_id_98>{difficulty} <extra_id_99>{answer_part} {context}"

```
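
For instance, calling the helper without a target answer reproduces the prompt shown in the widget example:

```python
print(format_prompt("short answer", "easy", "The sun is the center of our solar system."))
# <extra_id_97>short answer <extra_id_98>easy <extra_id_99> The sun is the center of our solar system.
```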

---

## Code & Fine-tuning Guide

To see how the T5-base model was fine-tuned, check out the notebook at the GitHub link below:

https://github.com/Alla-Avinash/NLP-Question-Generation-with-RAG/blob/main/Finetune.ipynb

---

## How to Use the Model

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load model from Hugging Face Hub
model_name = "Avinash250325/T5BaseQuestionGeneration"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Format input prompt
def format_prompt(qtype, difficulty, context, answer=""):
    answer_part = f"[{answer}]" if answer else ""
    return f"<extra_id_97>{qtype} <extra_id_98>{difficulty} <extra_id_99>{answer_part} {context}"

# You can put any text here to create a question based on this context
context = "The sun is the center of our solar system."

qtype = "short answer"     # qtype: ("short answer", "multiple choice question", "true or false question")
difficulty = "easy"        # difficulty: ("easy", "medium", "hard")
prompt = format_prompt(qtype, difficulty, context)

# Tokenize and generate
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=150)

# Decode output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

```
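
To compare outputs across question types, you can loop over them; `num_beams=4` below is an illustrative decoding choice, not a setting documented for this model:

```python
# Generate one question of each supported type from the same context
for qtype in ["short answer", "multiple choice question", "true or false question"]:
    prompt = format_prompt(qtype, "medium", context)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=150, num_beams=4)
    print(f"{qtype}: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
```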

---

### Try it out on Hugging Face Spaces (without the RAG implementation)

https://huggingface.co/spaces/Avinash250325/Question_Generation_with_RAG