Phi-4 Mini Instruct - Fine-tuned on Dolly 15K (MLX)

This model is a fine-tuned version of microsoft/Phi-4-mini-instruct using Apple's MLX framework, trained on the Databricks Dolly 15K instruction dataset.

Model Description

This model was fine-tuned for educational purposes only and aims to enhance Phi-4 Mini's instruction-following capabilities by training it on 15,000 high-quality, human-generated instruction-following examples from the Dolly dataset. Fine-tuning was performed with LoRA (Low-Rank Adaptation) on Apple Silicon hardware, making the model well suited to deployment on Mac devices.

Key Features

  • Base Model: Phi-4-mini-instruct (3.8B parameters)
  • Fine-tuning Method: LoRA with MLX
  • Training Dataset: Dolly 15K (12,000 training, 3,000 validation examples)
  • Optimized for: Apple Silicon (M1, M2, M3, M4)
  • License: MIT

Intended Uses & Limitations

Intended Uses

  • General instruction following and question answering
  • Educational applications
  • Content generation
  • Code assistance
  • Creative writing tasks

Limitations

  • The model inherits limitations from the base Phi-4 model
  • Performance is optimized for Apple Silicon; behavior may differ on other hardware
  • Should not be used for critical decision-making without human oversight
  • May exhibit biases present in the training data

Training Details

Training Configuration

  • LoRA Rank: 64
  • LoRA Alpha: 16
  • LoRA Dropout: 0.1
  • Target Layers: 16
  • Learning Rate: 1e-4
  • Batch Size: 2
  • Training Iterations: 1,000
  • Max Sequence Length: 2,048
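
For reference, the same hyperparameters expressed as a plain Python mapping. In practice they are passed to mlx_lm's LoRA tooling (as CLI flags or a YAML config), and the exact option names vary between mlx-lm versions, so the keys below are illustrative assumptions rather than a definitive config:

# Illustrative only: key names approximate mlx_lm's LoRA options and may differ by version.
training_config = {
    "num_layers": 16,          # transformer layers that receive LoRA adapters
    "batch_size": 2,
    "iters": 1000,
    "learning_rate": 1e-4,
    "max_seq_length": 2048,
    "lora_parameters": {
        "rank": 64,
        "alpha": 16,
        "dropout": 0.1,
    },
}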

Training Hardware

  • Trained on Apple Silicon (M-series chip)
  • Peak memory usage: ~8GB
  • Training time: 25-60 minutes, depending on the specific chip

Training Data

The model was fine-tuned on the Databricks Dolly 15K dataset, which consists of:

  • Total examples: 15,000 instruction-following records
  • Training split: 12,000 examples (80%)
  • Validation split: 3,000 examples (20%)
  • Categories: Brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization

Performance

Inference Speed (on Apple Silicon)

  • Prompt processing: ~240-315 tokens/second
  • Generation: ~66-73 tokens/second
  • Peak memory usage: 7.7-8.0 GB
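
These figures can be reproduced locally with mlx_lm's verbose generation output, which in recent mlx-lm versions reports prompt/generation throughput and peak memory (the exact report format depends on the version). A minimal sketch:

from mlx_lm import load, generate

# Load the fused model from the Hub
model, tokenizer = load("ianphil/phi4-mini-dolly-15k-mlx")

prompt = "<|user|>\nList three benefits of renewable energy.<|end|>\n<|assistant|>\n"

# verbose=True prints the response along with tokens-per-second and peak memory stats
generate(model, tokenizer, prompt=prompt, max_tokens=200, verbose=True)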

Example Outputs

Machine Learning Explanation:

"Machine learning is when our computers learn from experience by looking at examples of things that have already been done, and then apply that knowledge to new situations. For example, if we want to teach a computer to recognize cats, we can show it many pictures of cats and tell it 'this is a cat', and it will eventually be able to recognize when it sees a cat in a picture."

Renewable Energy Benefits:

"Renewable energy is sustainable and minimizes carbon emissions. It is readily available in most locations and can be exploited in the smallest of communities. It can be used for various purposes, such as generating heat and electricity and powering electric vehicles. Renewable energy is a clean fuel source. It helps to reduce air pollution and the risk of climate change."

How to Use

With MLX

from mlx_lm import load, generate

# Load the model
model, tokenizer = load("ianphil/phi4-mini-dolly-15k-mlx")

# Generate text
prompt = "<|user|>\nExplain quantum computing in simple terms<|end|>\n<|assistant|>\n"
response = generate(model, tokenizer, prompt=prompt, max_tokens=300)
print(response)

Chat Format

The model uses Phi-4's chat template with special tokens:

  • <|user|> - User message start
  • <|assistant|> - Assistant message start
  • <|end|> - Message end

Example:

<|user|>
What is machine learning?
<|end|>
<|assistant|>
Machine learning is...
<|end|>
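
Rather than assembling these tokens by hand, the prompt can also be built from the tokenizer's chat template. A minimal sketch, assuming the bundled tokenizer ships Phi-4's template and that mlx_lm's tokenizer wrapper exposes the standard apply_chat_template method:

from mlx_lm import load, generate

model, tokenizer = load("ianphil/phi4-mini-dolly-15k-mlx")

# Build the prompt from the chat template instead of hand-writing special tokens
messages = [{"role": "user", "content": "What is machine learning?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

response = generate(model, tokenizer, prompt=prompt, max_tokens=300)
print(response)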

Training Procedure

Data Preprocessing

  1. The Dolly 15K dataset was downloaded and formatted for Phi-4's chat template
  2. Instructions and responses were wrapped with the appropriate special tokens
  3. The data was split 80/20 into training and validation sets
  4. The formatted examples were saved as JSONL files for MLX compatibility (see the sketch below)
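
A minimal sketch of this preprocessing, assuming the Hugging Face datasets library; the output file names and the {"text": ...} record format are assumptions about what the MLX training tooling expects:

import json
import random

from datasets import load_dataset

# Dolly 15K fields: instruction, context, response, category
dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

def to_chat(example):
    # Wrap the instruction (and optional context) in Phi-4's special tokens
    user = example["instruction"]
    if example["context"]:
        user += "\n\n" + example["context"]
    return {"text": f"<|user|>\n{user}<|end|>\n<|assistant|>\n{example['response']}<|end|>"}

records = [to_chat(ex) for ex in dolly]
random.seed(0)
random.shuffle(records)

# 80/20 train/validation split, saved as JSONL for MLX
split = int(0.8 * len(records))
for name, rows in [("train.jsonl", records[:split]), ("valid.jsonl", records[split:])]:
    with open(name, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")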

LoRA Fine-tuning

  1. Applied LoRA adapters to 16 transformer layers
  2. Trained for 1,000 iterations with validation every 200 steps
  3. Saved checkpoints every 500 iterations
  4. Final adapter weights were fused with the base model

Post-processing

The LoRA adapters were merged with the base model weights to create this standalone model, eliminating the need for adapter loading during inference.
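
In practice this means the model loads like any other MLX model, with no adapter files required at inference time. For contrast, a pre-fuse checkpoint would have to load the base weights plus adapters (the adapter path below is hypothetical, assuming mlx_lm's adapter_path argument):

from mlx_lm import load

# Fused model: the LoRA weights are baked in, so a plain load is enough
model, tokenizer = load("ianphil/phi4-mini-dolly-15k-mlx")

# Before fusing, the same run would need the base model plus the adapter directory, e.g.:
# model, tokenizer = load("microsoft/Phi-4-mini-instruct", adapter_path="adapters/")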

Evaluation

Validation loss decreased from ~3.0 to ~1.5-2.0 during training, indicating successful learning of the instruction-following patterns in the Dolly dataset.

Environmental Impact

This model was trained on energy-efficient Apple Silicon hardware, resulting in lower power consumption compared to traditional GPU training. Estimated carbon footprint is minimal due to:

  • Short training time (< 1 hour)
  • Efficient LoRA method (only 0.082% of parameters trained)
  • Apple Silicon's power efficiency

Citation

If you use this model, please cite:

@misc{phi4-mini-dolly-mlx,
  author = {Your Name},
  title = {Phi-4 Mini Instruct Fine-tuned on Dolly 15K for MLX},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ianphil/phi4-mini-dolly-15k-mlx}
}

Acknowledgments

  • Microsoft for the Phi-4 base model
  • Databricks for the Dolly 15K dataset
  • Apple MLX team for the framework
  • The open-source community

Model Card Contact

For questions or issues with this model, please open an issue on the GitHub repository or contact via Hugging Face.
