# πŸ†“ Free H200 Training: Nano-Coder on Hugging Face
This guide shows you how to train a nano-coder model using **Hugging Face's free H200 GPU access** (4 minutes daily).
## 🎯 What You Get
- **Free H200 GPU**: 4 minutes per day
- **No Credit Card Required**: Completely free
- **Easy Setup**: Just a few clicks
- **Model Sharing**: Automatic upload to HF Hub
## πŸš€ Quick Start
### Option 1: Hugging Face Space (Recommended)
1. **Create HF Space:**
```bash
huggingface-cli repo create nano-coder-free --type space --space_sdk gradio
```
2. **Upload Files:**
- Upload all the Python files to your space
- Make sure `app.py` is in the root directory
3. **Configure Space:**
- Set **Hardware**: H200 (free tier)
- Set **Python Version**: 3.9+
- Set **Requirements**: `requirements.txt`
4. **Launch Training:**
- Go to your space URL
- Click "πŸš€ Start Free H200 Training"
- Wait for training to complete (3.5 minutes)
### Option 2: Local Setup with HF Free Tier
1. **Install Dependencies:**
```bash
pip install -r requirements.txt
```
2. **Set HF Token:**
```bash
export HF_TOKEN="your_token_here"
```
3. **Run Free Training:**
```bash
python hf_free_training.py
```
## πŸ“Š Model Configuration (Free Tier)
| Parameter | Free Tier | Full Model |
|-----------|-----------|------------|
| **Layers** | 6 | 12 |
| **Heads** | 6 | 12 |
| **Embedding** | 384 | 768 |
| **Context** | 512 | 1024 |
| **Parameters** | ~15M | ~124M |
| **Training Time** | 3.5 min | 2-4 hours |
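The table above can be written out as nanoGPT-style hyperparameters (a sketch; the variable names are assumed to match those in `hf_free_training.py`):

```python
# Free-tier vs. full-model hyperparameters from the table above (nanoGPT naming).
free_tier = dict(n_layer=6, n_head=6, n_embd=384, block_size=512)
full_model = dict(n_layer=12, n_head=12, n_embd=768, block_size=1024)

# In both configs the embedding width divides evenly across attention heads.
assert free_tier["n_embd"] % free_tier["n_head"] == 0    # 384 / 6 = 64 per head
assert full_model["n_embd"] % full_model["n_head"] == 0  # 768 / 12 = 64 per head
```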
## ⏰ Time Management
- **Daily Limit**: 4 minutes of H200 time
- **Training Time**: 3.5 minutes, leaving a ~30-second safety buffer
- **Automatic Stop**: Script stops before time limit
- **Daily Reset**: New 4 minutes every day at midnight UTC
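The automatic stop can be sketched as a simple wall-clock budget wrapped around the training loop; `step_fn` below is a stand-in for one real training iteration, not the script's actual function name:

```python
import time

MAX_TRAINING_TIME = 3.5 * 60  # seconds: 3.5 min, a safety buffer under the 4-minute cap

def train_with_budget(step_fn, max_seconds=MAX_TRAINING_TIME):
    """Run training iterations until the wall-clock budget is exhausted."""
    start = time.time()
    steps = 0
    while time.time() - start < max_seconds:
        step_fn()  # one forward/backward/optimizer step in the real script
        steps += 1
    return steps
```

A real loop would also check the remaining budget before long operations such as evaluation or checkpointing, so a single slow step cannot overrun the limit.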
## 🎨 Features
### Training Features
- βœ… **Automatic Time Tracking**: Stops before limit
- βœ… **Frequent Checkpoints**: Every 200 iterations
- βœ… **HF Hub Upload**: Models saved automatically
- βœ… **Wandb Logging**: Real-time metrics
- βœ… **Progress Monitoring**: Time remaining display
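The automatic Hub upload can be sketched with `huggingface_hub` (already listed in `requirements.txt`); `repo_id` is a placeholder for your own repo, and the helper name is illustrative:

```python
def upload_checkpoint(ckpt_path, repo_id, token=None):
    """Push the latest checkpoint file to a Hugging Face Hub repo (sketch)."""
    from huggingface_hub import HfApi  # imported lazily so the sketch loads without the package

    api = HfApi(token=token)  # with token=None, the client falls back to your saved HF login
    api.upload_file(
        path_or_fileobj=ckpt_path,
        path_in_repo="ckpt.pt",
        repo_id=repo_id,
    )
```

For example, `upload_checkpoint("out-nano-coder-free/ckpt.pt", "your-username/nano-coder-free")` after each checkpoint save.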
### Generation Features
- βœ… **Interactive UI**: Gradio interface
- βœ… **Custom Prompts**: Any Python code start
- βœ… **Adjustable Parameters**: Temperature, tokens
- βœ… **Real-time Generation**: Instant results
## πŸ“ File Structure
```
nano-coder-free/
β”œβ”€β”€ app.py # HF Space app
β”œβ”€β”€ hf_free_training.py # Free H200 training script
β”œβ”€β”€ prepare_code_dataset.py # Dataset preparation
β”œβ”€β”€ sample_nano_coder.py # Code generation
β”œβ”€β”€ requirements.txt # Dependencies
β”œβ”€β”€ model.py # nanoGPT model
β”œβ”€β”€ configurator.py # Configuration
└── README_free_H200.md # This file
```
## πŸ”§ Customization
### Adjust Training Parameters
Edit `hf_free_training.py`:
```python
# Model size (smaller = faster training)
n_layer = 4 # Even smaller
n_head = 4 # Even smaller
n_embd = 256 # Even smaller
# Training time (be conservative)
MAX_TRAINING_TIME = 3.0 * 60 # 3 minutes
# Batch size (larger = faster)
batch_size = 128 # If you have memory
```
### Change Dataset
```python
# In prepare_code_dataset.py
dataset = load_dataset("your-dataset") # Your own dataset
```
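A slightly fuller sketch of the swap: `records` stands in for the rows returned by `load_dataset("your-dataset")["train"]`, and `"output"` is an assumed field name; point it at whichever column holds the code in your dataset:

```python
# Stand-in for: records = load_dataset("your-dataset")["train"]
records = [
    {"output": "def add(a, b):\n    return a + b\n"},
    {"output": "print('hello')\n"},
]

texts = [r["output"] for r in records]   # pick the column that holds the code
split = max(1, int(0.9 * len(texts)))    # 90/10 train/val split, nanoGPT-style
train_texts, val_texts = texts[:split], texts[split:]
```

The rest of `prepare_code_dataset.py` (tokenization and writing the binary train/val files) stays the same once `texts` is populated.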
## πŸ“ˆ Expected Results
After 3.5 minutes of training on H200:
- **Training Loss**: ~2.5-3.0
- **Validation Loss**: ~2.8-3.3
- **Model Size**: ~15MB
- **Code Quality**: Basic Python functions
- **Iterations**: ~500-1000
## 🎯 Use Cases
### Perfect For:
- βœ… **Learning**: Understand nanoGPT training
- βœ… **Prototyping**: Test ideas quickly
- βœ… **Experiments**: Try different configurations
- βœ… **Small Models**: Code generation demos
### Not Suitable For:
- ❌ **Production**: Too small for real use
- ❌ **Large Models**: Limited by time/parameters
- ❌ **Long Training**: 4-minute daily limit
## πŸ”„ Daily Workflow
1. **Morning**: Check if you can train today
2. **Prepare**: Have your dataset ready
3. **Train**: Run 3.5-minute training session
4. **Test**: Generate some code samples
5. **Share**: Upload to HF Hub if good
6. **Wait**: Come back tomorrow for more training
## 🚨 Troubleshooting
### Common Issues
1. **"Daily limit reached"**
- Wait until tomorrow
- Check your timezone
2. **"No GPU available"**
- H200 might be busy
- Try again in a few minutes
3. **"Training too slow"**
- Reduce model size
- Increase batch size
- Use smaller context
4. **"Out of memory"**
- Reduce batch_size
- Reduce block_size
- Reduce model size
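One concrete out-of-memory fix is to cut `batch_size` and compensate with gradient accumulation so the effective tokens per optimizer step stay constant (a sketch; `gradient_accumulation_steps` is a standard nanoGPT knob, and the numbers here are assumptions):

```python
block_size = 512                 # context length from the free-tier config
target_batch_size = 128          # the batch size you wanted but cannot fit
batch_size = 32                  # reduced micro-batch that fits in memory

# Accumulate gradients over several micro-batches before each optimizer step.
gradient_accumulation_steps = target_batch_size // batch_size

tokens_per_step = batch_size * block_size * gradient_accumulation_steps
assert tokens_per_step == target_batch_size * block_size  # same effective batch
```

Wall-clock time per optimizer step rises slightly, but peak memory drops by roughly the micro-batch ratio.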
### Performance Tips
- **Batch Size**: Use largest that fits in memory
- **Context Length**: 512 is good for free tier
- **Model Size**: 6 layers balances output quality against the 4-minute budget
- **Learning Rate**: 1e-3 for fast convergence
## πŸ“Š Monitoring
### Wandb Dashboard
- Real-time loss curves
- Training metrics
- Model performance
### HF Hub
- Model checkpoints
- Training logs
- Generated samples
### Local Files
- `out-nano-coder-free/ckpt.pt` - Latest model
- `daily_limit_YYYY-MM-DD.txt` - Usage tracking
## πŸŽ‰ Success Stories
Users have achieved:
- βœ… Basic Python function generation
- βœ… Simple class definitions
- βœ… List comprehensions
- βœ… Error handling patterns
- βœ… Docstring generation
## πŸ”— Resources
- [Hugging Face Spaces](https://huggingface.co/spaces)
- [Free GPU Access](https://huggingface.co/docs/hub/spaces-sdks-docker-gpu)
- [NanoGPT Original](https://github.com/karpathy/nanoGPT)
- [Python Code Dataset](https://huggingface.co/datasets/flytech/python-codes-25k)
## 🀝 Contributing
Want to improve the free H200 setup?
1. **Optimize Model**: Make it train faster
2. **Better UI**: Improve the Gradio interface
3. **More Datasets**: Support other code datasets
4. **Documentation**: Help others get started
## πŸ“ License
This project follows the same license as the original nanoGPT repository.
---
**Happy Free H200 Training! πŸš€**
Remember: 4 minutes a day keeps the AI doctor away! πŸ˜„