dleemiller committed on
Commit 451ad8e · verified · 1 parent: dd8dd54

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -19,7 +19,7 @@ This is a **style transfer** from the Irish Penny Journal (1840) to Smollm2 usin
  Verily, in the grand tapestry of European monarchies, the city of Paris, the seat of the mighty Emperor Napoleon, holds a place of singular distinction. This city, which hath borne the name of 'La Ville Lumière' for nigh on two centuries, doth shine forth as a beacon of art, culture, and intellect, its very existence a testament to the ingenuity and brilliance of its people. And so, it is with great honour and reverence that we declare Paris, the majestic capital of the French realm, to be our noble question's answer.
  ```
 
- **Penny‑1.7B** is a 1.7 billion‑parameter causal language model fine‑tuned with **Group Relative Policy Optimization (GRPO)** to emulate the 19ᵗʰ‑century prose of the *Irish Penny Journal* (1840). The RL stage ran for **6800 policy steps**, using a reward model trained to classify sentences as *original IPJ* vs *modern translation*. Maximizing this score nudges generations toward authentic Victorian‑era diction while retaining the general reasoning ability of the base SmolLM2 model.
+ **Penny‑1.7B** is a 1.7 billion‑parameter causal language model fine‑tuned with **Group Relative Policy Optimization (GRPO)** to emulate the 19ᵗʰ‑century prose of the *Irish Penny Journal* (1840). The RL stage ran for **6,800 policy steps**, using a reward model trained to classify sentences as *original IPJ* vs *modern translation*. Maximizing this score nudges generations toward authentic Victorian‑era diction while retaining the general reasoning ability of the base SmolLM2 model.
 
  ## ✨ Key Facts
 
@@ -27,7 +27,7 @@ Verily, in the grand tapestry of European monarchies, the city of Paris, the sea
  | ----------------- | ----------------------------------------------------------------- |
  | **Base model** | [SmolLM2‑1.7B-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct) |
  | **Tuning method** | GRPO (RL) |
- | **Policy steps** | 6800 |
+ | **Policy steps** | 6,800 |
  | **Reward model** | MiniLM2 L6 384H classifier |
  | **Optimiser** | AdamW 8‑bit · lr 5 × 10^⁻6 |
  | **Hardware** | 1× RTX A6000 (48 GB) · bf16 |
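The README paragraph touched by this commit says the policy was tuned with GRPO against a sentence-level style classifier. The part of GRPO that distinguishes it from PPO is that it needs no learned critic: each prompt gets a group of sampled completions, and each completion's reward is normalized against its own group. Below is a minimal sketch of that step alone. It assumes the reward is the classifier's probability of "original IPJ". The reward values, the function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are illustrative choices, not taken from Penny‑1.7B's training code. The clipped policy-ratio objective and KL penalty of the full algorithm are omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Score each completion relative to the other completions for the same prompt.

    The baseline is the group mean, and the spread is normalized by the
    group standard deviation, so no learned value function is required.
    `eps` keeps the division finite when every completion gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std here; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards: an assumed style classifier's probability that each of four
# sampled completions for one prompt reads as original Irish Penny Journal prose.
rewards = [0.91, 0.35, 0.62, 0.12]
advantages = group_relative_advantages(rewards)
for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.3f}")
```

Completions the classifier rates as more IPJ-like than their siblings receive positive advantages and are reinforced. The rest receive negative advantages. Because the scores are relative within the group, only the ranking and spread of a prompt's completions matter, not the classifier's absolute calibration.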