# 🚀 Free H200 Training: Nano-Coder on Hugging Face
This guide shows you how to train a nano-coder model using **Hugging Face's free H200 GPU access** (4 minutes daily).
## 🎯 What You Get
- **Free H200 GPU**: 4 minutes per day
- **No Credit Card Required**: Completely free
- **Easy Setup**: Just a few clicks
- **Model Sharing**: Automatic upload to HF Hub
## 🚀 Quick Start
### Option 1: Hugging Face Space (Recommended)
1. **Create HF Space:**
```bash
huggingface-cli repo create nano-coder-free --type space
```
2. **Upload Files:**
- Upload all the Python files to your space
- Make sure `app.py` is in the root directory
3. **Configure Space:**
- Set **Hardware**: H200 (free tier)
- Set **Python Version**: 3.9+
- Set **Requirements**: `requirements.txt`
4. **Launch Training:**
- Go to your space URL
- Click "π Start Free H200 Training"
- Wait for training to complete (3.5 minutes)
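If you prefer to script step 2 instead of uploading files through the web UI, the `huggingface_hub` client can push the whole project folder to a Space. A minimal sketch, assuming `huggingface_hub` is installed and `HF_TOKEN` is set; `your-username` is a placeholder for your account:

```python
import os

try:
    # Available after `pip install -r requirements.txt`.
    from huggingface_hub import HfApi
    HAVE_HUB = True
except ImportError:
    HAVE_HUB = False

# Hypothetical repo id -- replace with your own username.
REPO_ID = "your-username/nano-coder-free"

def upload_space(folder="."):
    """Push the local project folder (app.py, training scripts, etc.) to the Space."""
    api = HfApi(token=os.environ.get("HF_TOKEN"))
    api.upload_folder(folder_path=folder, repo_id=REPO_ID, repo_type="space")

if __name__ == "__main__" and HAVE_HUB:
    upload_space()
```

Make sure `app.py` ends up at the root of the uploaded folder, as required above.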
### Option 2: Local Setup with HF Free Tier
1. **Install Dependencies:**
```bash
pip install -r requirements.txt
```
2. **Set HF Token:**
```bash
export HF_TOKEN="your_token_here"
```
3. **Run Free Training:**
```bash
python hf_free_training.py
```
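Before burning any of the daily quota, it is worth checking that the token from step 2 is actually visible to Python. A small stdlib-only sketch (the `hf_` prefix check assumes a standard Hugging Face user access token):

```python
import os

def check_hf_token() -> bool:
    """Return True if an HF token appears to be available to the training script."""
    token = os.environ.get("HF_TOKEN", "")
    # User access tokens issued by Hugging Face start with "hf_".
    return token.startswith("hf_")

if __name__ == "__main__":
    if not check_hf_token():
        raise SystemExit("HF_TOKEN is missing or malformed; run `export HF_TOKEN=...` first.")
```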
## 📊 Model Configuration (Free Tier)
| Parameter | Free Tier | Full Model |
|-----------|-----------|------------|
| **Layers** | 6 | 12 |
| **Heads** | 6 | 12 |
| **Embedding** | 384 | 768 |
| **Context** | 512 | 1024 |
| **Parameters** | ~15M | ~124M |
| **Training Time** | 3.5 min | 2-4 hours |
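The parameter counts in the table can be sanity-checked with the usual rule of thumb that a transformer block holds roughly `12 * n_embd^2` weights (4·d² in attention, 8·d² in the MLP). A quick sketch of the non-embedding count; note that token embeddings (vocab_size × n_embd) come on top of these figures:

```python
def approx_transformer_params(n_layer: int, n_embd: int) -> int:
    """Rough non-embedding parameter count for a GPT-style transformer:
    each block contributes about 12 * n_embd^2 weights."""
    return 12 * n_layer * n_embd * n_embd

free = approx_transformer_params(6, 384)    # ~10.6M non-embedding weights
full = approx_transformer_params(12, 768)   # ~84.9M non-embedding weights
```

For the full model, adding GPT-2's 50257 × 768 token embedding (~38.6M) lands close to the ~124M in the table; the free-tier figure likewise depends on the vocabulary size used.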
## ⏰ Time Management
- **Daily Limit**: 4 minutes of H200 time
- **Training Time**: 3.5 minutes (safe buffer)
- **Automatic Stop**: Script stops before time limit
- **Daily Reset**: New 4 minutes every day at midnight UTC
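The automatic stop described above amounts to a wall-clock budget check in the training loop. A minimal sketch, using the `MAX_TRAINING_TIME` constant shown in the customization section below (`step_fn` is a hypothetical single-iteration callback):

```python
import time

MAX_TRAINING_TIME = 3.5 * 60  # seconds; stays safely under the 4-minute daily limit

def train_with_budget(step_fn, max_seconds=MAX_TRAINING_TIME):
    """Run training iterations until the time budget is nearly spent.

    Checks the clock before each iteration, so training stops *before*
    the limit rather than after overshooting it. Returns iterations done.
    """
    start = time.monotonic()
    iters = 0
    while time.monotonic() - start < max_seconds:
        step_fn()
        iters += 1
    return iters
```

With per-iteration times on the order of a few hundred milliseconds, this is consistent with the ~500-1000 iterations quoted under Expected Results.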
## 🎨 Features
### Training Features
- ✅ **Automatic Time Tracking**: Stops before limit
- ✅ **Frequent Checkpoints**: Every 200 iterations
- ✅ **HF Hub Upload**: Models saved automatically
- ✅ **Wandb Logging**: Real-time metrics
- ✅ **Progress Monitoring**: Time remaining display
### Generation Features
- ✅ **Interactive UI**: Gradio interface
- ✅ **Custom Prompts**: Any Python code start
- ✅ **Adjustable Parameters**: Temperature, tokens
- ✅ **Real-time Generation**: Instant results
## 📁 File Structure
```
nano-coder-free/
├── app.py                    # HF Space app
├── hf_free_training.py       # Free H200 training script
├── prepare_code_dataset.py   # Dataset preparation
├── sample_nano_coder.py      # Code generation
├── requirements.txt          # Dependencies
├── model.py                  # nanoGPT model
├── configurator.py           # Configuration
└── README_free_H200.md       # This file
```
## 🔧 Customization
### Adjust Training Parameters
Edit `hf_free_training.py`:
```python
# Model size (smaller = faster training)
n_layer = 4 # Even smaller
n_head = 4 # Even smaller
n_embd = 256 # Even smaller
# Training time (be conservative)
MAX_TRAINING_TIME = 3.0 * 60 # 3 minutes
# Batch size (larger = faster)
batch_size = 128 # If you have memory
```
### Change Dataset
```python
# In prepare_code_dataset.py
from datasets import load_dataset

dataset = load_dataset("your-dataset")  # swap in your own dataset ID
```
## 📈 Expected Results
After 3.5 minutes of training on H200:
- **Training Loss**: ~2.5-3.0
- **Validation Loss**: ~2.8-3.3
- **Model Size**: ~15MB
- **Code Quality**: Basic Python functions
- **Iterations**: ~500-1000
## 🎯 Use Cases
### Perfect For:
- ✅ **Learning**: Understand nanoGPT training
- ✅ **Prototyping**: Test ideas quickly
- ✅ **Experiments**: Try different configurations
- ✅ **Small Models**: Code generation demos
### Not Suitable For:
- ❌ **Production**: Too small for real use
- ❌ **Large Models**: Limited by time/parameters
- ❌ **Long Training**: 4-minute daily limit
## 📅 Daily Workflow
1. **Morning**: Check if you can train today
2. **Prepare**: Have your dataset ready
3. **Train**: Run 3.5-minute training session
4. **Test**: Generate some code samples
5. **Share**: Upload to HF Hub if good
6. **Wait**: Come back tomorrow for more training
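Step 1 can be automated with the `daily_limit_YYYY-MM-DD.txt` usage-tracking file listed under Local Files. A stdlib-only sketch (function names are hypothetical); it tracks the UTC date because the quota resets at midnight UTC:

```python
import datetime
import pathlib

def _today_utc() -> str:
    # The free quota resets at midnight UTC, so track the UTC date.
    return datetime.datetime.now(datetime.timezone.utc).date().isoformat()

def already_trained_today(directory=".") -> bool:
    """True if today's usage-tracking file (daily_limit_YYYY-MM-DD.txt) exists."""
    return (pathlib.Path(directory) / f"daily_limit_{_today_utc()}.txt").exists()

def mark_trained_today(directory=".") -> None:
    """Record that today's 4-minute session has been used."""
    (pathlib.Path(directory) / f"daily_limit_{_today_utc()}.txt").write_text("used\n")
```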
## 🚨 Troubleshooting
### Common Issues
1. **"Daily limit reached"**
- Wait until tomorrow
- Check your timezone
2. **"No GPU available"**
- H200 might be busy
- Try again in a few minutes
3. **"Training too slow"**
- Reduce model size
- Increase batch size
- Use smaller context
4. **"Out of memory"**
- Reduce batch_size
- Reduce block_size
- Reduce model size
### Performance Tips
- **Batch Size**: Use largest that fits in memory
- **Context Length**: 512 is good for free tier
- **Model Size**: 6 layers is optimal
- **Learning Rate**: 1e-3 for fast convergence
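One way to reason about the batch-size and context-length tips together is tokens processed per optimizer step, which is simply their product (ignoring gradient accumulation). A quick sketch:

```python
def tokens_per_iter(batch_size: int, block_size: int) -> int:
    """Tokens seen per optimizer step (ignoring gradient accumulation)."""
    return batch_size * block_size

# With the free-tier context of 512 and the larger batch suggested
# in the customization section:
throughput = tokens_per_iter(128, 512)  # 65,536 tokens per step
```

A larger batch raises tokens per step without extra Python-side overhead, which is why "use the largest batch that fits in memory" pays off on a short time budget.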
## 📊 Monitoring
### Wandb Dashboard
- Real-time loss curves
- Training metrics
- Model performance
### HF Hub
- Model checkpoints
- Training logs
- Generated samples
### Local Files
- `out-nano-coder-free/ckpt.pt` - Latest model
- `daily_limit_YYYY-MM-DD.txt` - Usage tracking
## 🏆 Success Stories
Users have achieved:
- ✅ Basic Python function generation
- ✅ Simple class definitions
- ✅ List comprehensions
- ✅ Error handling patterns
- ✅ Docstring generation
## 🔗 Resources
- [Hugging Face Spaces](https://huggingface.co/spaces)
- [Free GPU Access](https://huggingface.co/docs/hub/spaces-sdks-docker-gpu)
- [NanoGPT Original](https://github.com/karpathy/nanoGPT)
- [Python Code Dataset](https://huggingface.co/datasets/flytech/python-codes-25k)
## 🤝 Contributing
Want to improve the free H200 setup?
1. **Optimize Model**: Make it train faster
2. **Better UI**: Improve the Gradio interface
3. **More Datasets**: Support other code datasets
4. **Documentation**: Help others get started
## 📄 License
This project follows the same license as the original nanoGPT repository.
---
**Happy Free H200 Training! 🚀**
Remember: 4 minutes a day keeps the AI doctor away! 🎉