Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models Paper • 2401.00788 • Published Jan 1, 2024 • 24
Article OpenEvolve: An Open Source Implementation of Google DeepMind's AlphaEvolve By codelion • 18 days ago • 19
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 94
Unsloth Dynamic 2.0 Quants Collection New 2.0 version of our Dynamic GGUF + Quants. Dynamic 2.0 achieves superior accuracy & outperforms all leading quantization methods. • 32 items • Updated 8 days ago • 114
🧠Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated 19 days ago • 148
Dolphin 3.0 Collection Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned models. Designed to be the ultimate general purpose local model. • 9 items • Updated Feb 7 • 161
Article Self-Hosting LLaMA 3.1 70B (or any ~70B LLM) Affordably By abhinand • Aug 20, 2024 • 16
Article Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth By mlabonne • Jul 29, 2024 • 329
Article Llama-3.1-Storm-8B: Improved SLM with Self-Curation + Model Merging By akjindal53244 • Aug 19, 2024 • 77
smol llama Collection 🚧"raw" pretrained smol_llama checkpoints - WIP 🚧 • 4 items • Updated Apr 29, 2024 • 6
Foundation Text-Generation Models Below 360M Parameters Collection Great candidates for fine-tuning targeting Wllama and Transformers.js for mobile devices, ordered by number of parameters. • 36 items • Updated Apr 6 • 32
RLHF Workflow: From Reward Modeling to Online RLHF Paper • 2405.07863 • Published May 13, 2024 • 72