---
base_model: unsloth/llama-3.2-1b-instruct-unsloth-bnb-4bit
library_name: peft
license: apache-2.0
datasets:
- garage-bAInd/Open-Platypus
language:
- en
tags:
- MATH
- LEETCODE
- text-generation-inference
- SCIENCE
---

# Model Card for SicMundus

## Model Details

### Model Description

This model, **SicMundus**, is a fine-tuned version of `unsloth/Llama-3.2-1B-Instruct` built with Parameter-Efficient Fine-Tuning (PEFT) using LoRA (Low-Rank Adaptation). It was trained on the `Open-Platypus` dataset using a structured Alpaca-style prompt format. The primary goal is to improve instruction-following capability while keeping memory and compute costs low through 4-bit quantization.

- **Developed by:** Ragul
- **Funded by:** Self-funded
- **Organization:** Pinnacle Organization
- **Shared by:** Ragul
- **Model type:** Instruction-tuned language model
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** `unsloth/Llama-3.2-1B-Instruct`

### Model Sources

- **Repository:** [https://huggingface.co/ragul2607/SicMundus](https://huggingface.co/ragul2607/SicMundus)
- **Paper:** N/A
- **Demo:** Not yet available

## Uses

### Direct Use

- General-purpose instruction-following tasks
- Text generation
- Code generation assistance
- Conversational AI applications

### Downstream Use

- Further fine-tuning on domain-specific datasets
- Deployment in chatbot applications
- Text summarization or document completion

### Out-of-Scope Use

- Not designed for real-time critical applications (e.g., medical or legal advice)
- May not be suitable for handling highly sensitive data

## Bias, Risks, and Limitations

While the model is designed to be a general-purpose assistant, it inherits biases from the pre-trained Llama model and the Open-Platypus dataset. Users should be aware of potential biases in generated responses, particularly on sensitive topics.

### Recommendations

- Use in conjunction with human oversight.
- Avoid deploying in high-stakes scenarios without additional testing.

## How to Get Started with the Model

To use the fine-tuned model, follow these steps:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/SicMundus"  # local checkpoint directory or Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

def generate_response(prompt):
    # Move inputs to the device the model was placed on by device_map="auto".
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=100)
    return tokenizer.decode(output[0], skip_special_tokens=True)

prompt = "Explain the concept of reinforcement learning."
print(generate_response(prompt))
```

## Training Details

### Training Data

- **Dataset:** `garage-bAInd/Open-Platypus`
- **Preprocessing:** The dataset was formatted using Alpaca-style prompts with instruction, input, and output fields.

### Training Procedure

- **Training framework:** Hugging Face `transformers` + `trl` (PEFT + LoRA)
- **Precision:** Mixed precision (FP16/BF16, depending on hardware support)
- **Batch size:** 2 per device, with gradient accumulation
- **Learning rate:** 2e-4
- **Max steps:** 100
- **Optimizer:** AdamW (8-bit)
- **LoRA config:** Applied to key transformer projection layers (`q_proj`, `k_proj`, `v_proj`, etc.); see the sketch below
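The following is a minimal sketch of the training setup described above. Only the values marked "from this card" (learning rate, per-device batch size, max steps, optimizer, target projection layers) are documented here; the LoRA rank and alpha, sequence length, gradient-accumulation steps, and the exact prompt-template wording are illustrative assumptions, and `trl` argument names vary somewhat between versions.

```python
# Hedged sketch of the fine-tuning setup; assumed values are marked.
import torch
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the 4-bit base model named in the card metadata.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.2-1b-instruct-unsloth-bnb-4bit",
    max_seq_length=2048,   # assumed
    load_in_4bit=True,
)

# Attach LoRA adapters to the key projection layers.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # assumed LoRA rank
    lora_alpha=16,         # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # q/k/v from this card
)

# Alpaca-style prompt with instruction, input, and output fields.
ALPACA_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

def format_prompts(batch):
    texts = [
        ALPACA_TEMPLATE.format(instruction=ins, input=inp, output=out)
        + tokenizer.eos_token
        for ins, inp, out in zip(
            batch["instruction"], batch["input"], batch["output"]
        )
    ]
    return {"text": texts}

dataset = load_dataset("garage-bAInd/Open-Platypus", split="train")
dataset = dataset.map(format_prompts, batched=True)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=TrainingArguments(
        per_device_train_batch_size=2,   # from this card
        gradient_accumulation_steps=4,   # assumed
        learning_rate=2e-4,              # from this card
        max_steps=100,                   # from this card
        optim="adamw_8bit",              # from this card
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()
```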
### Speeds, Sizes, Times

- **Checkpoint size:** ~2 GB (LoRA adapters stored separately; see the adapter-loading sketch in the appendix below)
- **Fine-tuning time:** ~1 hour on an A100 GPU

## Evaluation

### Testing Data, Factors & Metrics

- **Testing data:** A subset of Open-Platypus
- **Factors:** Performance on general instruction-following tasks
- **Metrics:**
  - Perplexity (PPL)
  - Response coherence
  - Instruction-following accuracy

### Results

- **Perplexity:** TBD
- **Response quality:** Qualitatively improved over the base model on test prompts

## Model Examination

- **Interpretability:** Standard transformer behavior with LoRA fine-tuning.
- **Explainability:** Outputs can be analyzed with attention-visualization tools.

## Environmental Impact

- **Hardware type:** A100 GPU
- **Hours used:** ~1 hour
- **Cloud provider:** Local GPU / AWS / Hugging Face Accelerate
- **Carbon emitted:** Estimated using the [Machine Learning Impact Calculator](https://mlco2.github.io/impact)

## Technical Specifications

### Model Architecture and Objective

- Transformer-based architecture (Llama-3.2-1B)
- Instruction-following optimization with PEFT-LoRA

### Compute Infrastructure

- **Hardware:** A100 GPU
- **Software:** Python, PyTorch, `transformers`, `unsloth`, `peft`

## Citation

If you use this model, please cite:

```bibtex
@misc{SicMundus,
  author = {Ragul},
  title  = {SicMundus: Fine-Tuned Llama-3.2-1B-Instruct},
  year   = {2025},
  url    = {https://huggingface.co/ragul2607/SicMundus}
}
```

## More Information

- **Contact:** [https://github.com/ragultv](https://github.com/ragultv)
- **Further work:** Integrate RLHF for better alignment

## Model Card Authors

- Ragul
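## Appendix: Loading the LoRA Adapters

Since the LoRA adapters are stored separately from the 4-bit base weights (see "Speeds, Sizes, Times" above), they can be attached at load time with `peft`. Below is a minimal sketch; the adapter location (`ragul2607/SicMundus`) is assumed to be this repository, so adjust it if the adapters live elsewhere.

```python
# Hedged sketch: attaching the separately stored LoRA adapters to the
# 4-bit base model with peft. The adapter repo id is an assumption.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/llama-3.2-1b-instruct-unsloth-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the LoRA adapters on top of the base weights.
model = PeftModel.from_pretrained(base, "ragul2607/SicMundus")  # assumed adapter location
model.eval()
```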