# 🚀 Free H200 Training: Nano-Coder on Hugging Face
This guide shows you how to train a nano-coder model using **Hugging Face's free H200 GPU access** (4 minutes daily).
## 🎯 What You Get
- **Free H200 GPU**: 4 minutes per day
- **No Credit Card Required**: Completely free
- **Easy Setup**: Just a few clicks
- **Model Sharing**: Automatic upload to HF Hub
## 🚀 Quick Start
### Option 1: Hugging Face Space (Recommended)
1. **Create HF Space:**
```bash
huggingface-cli repo create nano-coder-free --type space
```
2. **Upload Files:**
- Upload all the Python files to your space
- Make sure `app.py` is in the root directory
3. **Configure Space:**
- Set **Hardware**: H200 (free tier)
- Set **Python Version**: 3.9+
- Set **Requirements**: `requirements.txt`
4. **Launch Training:**
- Go to your space URL
- Click "π Start Free H200 Training"
- Wait for training to complete (3.5 minutes)
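If you prefer to script step 2 instead of uploading files through the web UI, the `huggingface_hub` client can push the whole project folder to a Space. A minimal sketch, assuming `huggingface_hub` is installed and `HF_TOKEN` is set; `your-username` is a placeholder for your account:

```python
import os

try:
    # Available after `pip install -r requirements.txt`.
    from huggingface_hub import HfApi
    HAVE_HUB = True
except ImportError:
    HAVE_HUB = False

# Hypothetical repo id -- replace with your own username.
REPO_ID = "your-username/nano-coder-free"

def upload_space(folder="."):
    """Push the local project folder (app.py, training scripts, etc.) to the Space."""
    api = HfApi(token=os.environ.get("HF_TOKEN"))
    api.upload_folder(folder_path=folder, repo_id=REPO_ID, repo_type="space")

if __name__ == "__main__" and HAVE_HUB:
    upload_space()
```

Make sure `app.py` ends up at the root of the uploaded folder, as required above.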
### Option 2: Local Setup with HF Free Tier
1. **Install Dependencies:**
```bash
pip install -r requirements.txt
```
2. **Set HF Token:**
```bash
export HF_TOKEN="your_token_here"
```
3. **Run Free Training:**
```bash
python hf_free_training.py
```
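Before burning any of the daily quota, it is worth checking that the token from step 2 is actually visible to Python. A small stdlib-only sketch (the `hf_` prefix check assumes a standard Hugging Face user access token):

```python
import os

def check_hf_token() -> bool:
    """Return True if an HF token appears to be available to the training script."""
    token = os.environ.get("HF_TOKEN", "")
    # User access tokens issued by Hugging Face start with "hf_".
    return token.startswith("hf_")

if __name__ == "__main__":
    if not check_hf_token():
        raise SystemExit("HF_TOKEN is missing or malformed; run `export HF_TOKEN=...` first.")
```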
## 📊 Model Configuration (Free Tier)
| Parameter | Free Tier | Full Model |
|-----------|-----------|------------|
| **Layers** | 6 | 12 |
| **Heads** | 6 | 12 |
| **Embedding** | 384 | 768 |
| **Context** | 512 | 1024 |
| **Parameters** | ~15M | ~124M |
| **Training Time** | 3.5 min | 2-4 hours |
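The parameter counts in the table can be sanity-checked with the usual rule of thumb that a transformer block holds roughly `12 * n_embd^2` weights (4·d² in attention, 8·d² in the MLP). A quick sketch of the non-embedding count; note that token embeddings (vocab_size × n_embd) come on top of these figures:

```python
def approx_transformer_params(n_layer: int, n_embd: int) -> int:
    """Rough non-embedding parameter count for a GPT-style transformer:
    each block contributes about 12 * n_embd^2 weights."""
    return 12 * n_layer * n_embd * n_embd

free = approx_transformer_params(6, 384)    # ~10.6M non-embedding weights
full = approx_transformer_params(12, 768)   # ~84.9M non-embedding weights
```

For the full model, adding GPT-2's 50257 × 768 token embedding (~38.6M) lands close to the ~124M in the table; the free-tier figure likewise depends on the vocabulary size used.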
## ⏰ Time Management
- **Daily Limit**: 4 minutes of H200 time
- **Training Time**: 3.5 minutes (safe buffer)
- **Automatic Stop**: Script stops before time limit
- **Daily Reset**: New 4 minutes every day at midnight UTC
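The automatic stop described above amounts to a wall-clock budget check in the training loop. A minimal sketch, using the `MAX_TRAINING_TIME` constant shown in the customization section below (`step_fn` is a hypothetical single-iteration callback):

```python
import time

MAX_TRAINING_TIME = 3.5 * 60  # seconds; stays safely under the 4-minute daily limit

def train_with_budget(step_fn, max_seconds=MAX_TRAINING_TIME):
    """Run training iterations until the time budget is nearly spent.

    Checks the clock before each iteration, so training stops *before*
    the limit rather than after overshooting it. Returns iterations done.
    """
    start = time.monotonic()
    iters = 0
    while time.monotonic() - start < max_seconds:
        step_fn()
        iters += 1
    return iters
```

With per-iteration times on the order of a few hundred milliseconds, this is consistent with the ~500-1000 iterations quoted under Expected Results.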
## 🎨 Features
### Training Features
- ✅ **Automatic Time Tracking**: Stops before limit
- ✅ **Frequent Checkpoints**: Every 200 iterations
- ✅ **HF Hub Upload**: Models saved automatically
- ✅ **Wandb Logging**: Real-time metrics
- ✅ **Progress Monitoring**: Time remaining display
### Generation Features
- ✅ **Interactive UI**: Gradio interface
- ✅ **Custom Prompts**: Any Python code start
- ✅ **Adjustable Parameters**: Temperature, tokens
- ✅ **Real-time Generation**: Instant results
## 📁 File Structure
```
nano-coder-free/
├── app.py                    # HF Space app
├── hf_free_training.py       # Free H200 training script
├── prepare_code_dataset.py   # Dataset preparation
├── sample_nano_coder.py      # Code generation
├── requirements.txt          # Dependencies
├── model.py                  # nanoGPT model
├── configurator.py           # Configuration
└── README_free_H200.md       # This file
```
## 🔧 Customization
### Adjust Training Parameters
Edit `hf_free_training.py`:
```python
# Model size (smaller = faster training)
n_layer = 4 # Even smaller
n_head = 4 # Even smaller
n_embd = 256 # Even smaller
# Training time (be conservative)
MAX_TRAINING_TIME = 3.0 * 60 # 3 minutes
# Batch size (larger = faster)
batch_size = 128 # If you have memory
```
### Change Dataset
```python
# In prepare_code_dataset.py
from datasets import load_dataset

dataset = load_dataset("your-dataset")  # swap in your own dataset ID
```
## 📈 Expected Results
After 3.5 minutes of training on H200:
- **Training Loss**: ~2.5-3.0
- **Validation Loss**: ~2.8-3.3
- **Model Size**: ~15MB
- **Code Quality**: Basic Python functions
- **Iterations**: ~500-1000
## 🎯 Use Cases
### Perfect For:
- ✅ **Learning**: Understand nanoGPT training
- ✅ **Prototyping**: Test ideas quickly
- ✅ **Experiments**: Try different configurations
- ✅ **Small Models**: Code generation demos
### Not Suitable For:
- ❌ **Production**: Too small for real use
- ❌ **Large Models**: Limited by time/parameters
- ❌ **Long Training**: 4-minute daily limit
## 📅 Daily Workflow
1. **Morning**: Check if you can train today
2. **Prepare**: Have your dataset ready
3. **Train**: Run 3.5-minute training session
4. **Test**: Generate some code samples
5. **Share**: Upload to HF Hub if good
6. **Wait**: Come back tomorrow for more training
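Step 1 can be automated with the `daily_limit_YYYY-MM-DD.txt` usage-tracking file listed under Local Files. A stdlib-only sketch (function names are hypothetical); it tracks the UTC date because the quota resets at midnight UTC:

```python
import datetime
import pathlib

def _today_utc() -> str:
    # The free quota resets at midnight UTC, so track the UTC date.
    return datetime.datetime.now(datetime.timezone.utc).date().isoformat()

def already_trained_today(directory=".") -> bool:
    """True if today's usage-tracking file (daily_limit_YYYY-MM-DD.txt) exists."""
    return (pathlib.Path(directory) / f"daily_limit_{_today_utc()}.txt").exists()

def mark_trained_today(directory=".") -> None:
    """Record that today's 4-minute session has been used."""
    (pathlib.Path(directory) / f"daily_limit_{_today_utc()}.txt").write_text("used\n")
```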
## 🚨 Troubleshooting
### Common Issues
1. **"Daily limit reached"**
- Wait until tomorrow
- Check your timezone
2. **"No GPU available"**
- H200 might be busy
- Try again in a few minutes
3. **"Training too slow"**
- Reduce model size
- Increase batch size
- Use smaller context
4. **"Out of memory"**
- Reduce batch_size
- Reduce block_size
- Reduce model size
### Performance Tips
- **Batch Size**: Use largest that fits in memory
- **Context Length**: 512 is good for free tier
- **Model Size**: 6 layers is optimal
- **Learning Rate**: 1e-3 for fast convergence
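One way to reason about the batch-size and context-length tips together is tokens processed per optimizer step, which is simply their product (ignoring gradient accumulation). A quick sketch:

```python
def tokens_per_iter(batch_size: int, block_size: int) -> int:
    """Tokens seen per optimizer step (ignoring gradient accumulation)."""
    return batch_size * block_size

# With the free-tier context of 512 and the larger batch suggested
# in the customization section:
throughput = tokens_per_iter(128, 512)  # 65,536 tokens per step
```

A larger batch raises tokens per step without extra Python-side overhead, which is why "use the largest batch that fits in memory" pays off on a short time budget.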
## 📊 Monitoring
### Wandb Dashboard
- Real-time loss curves
- Training metrics
- Model performance
### HF Hub
- Model checkpoints
- Training logs
- Generated samples
### Local Files
- `out-nano-coder-free/ckpt.pt` - Latest model
- `daily_limit_YYYY-MM-DD.txt` - Usage tracking
## 🏆 Success Stories
Users have achieved:
- ✅ Basic Python function generation
- ✅ Simple class definitions
- ✅ List comprehensions
- ✅ Error handling patterns
- ✅ Docstring generation
## 🔗 Resources
- [Hugging Face Spaces](https://huggingface.co/spaces)
- [Free GPU Access](https://huggingface.co/docs/hub/spaces-sdks-docker-gpu)
- [NanoGPT Original](https://github.com/karpathy/nanoGPT)
- [Python Code Dataset](https://huggingface.co/datasets/flytech/python-codes-25k)
## 🤝 Contributing
Want to improve the free H200 setup?
1. **Optimize Model**: Make it train faster
2. **Better UI**: Improve the Gradio interface
3. **More Datasets**: Support other code datasets
4. **Documentation**: Help others get started
## 📄 License
This project follows the same license as the original nanoGPT repository.
---
**Happy Free H200 Training! 🚀**
Remember: 4 minutes a day keeps the AI doctor away! 🎉