synthetic-data - a leonardlin Collection

leonardlin 's Collections

8b-class-japanese-models

speed

sota

evals

tuning

rag

context

safety

image

vision

code

prompt injection

TOREAD

data

voice

synthetic-data

updated May 21, 2024

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

Paper • 2401.16380 • Published Jan 29, 2024 • 50
Best Practices and Lessons Learned on Synthetic Data for Language Models

Paper • 2404.07503 • Published Apr 11, 2024 • 32
WizardLM: Empowering Large Language Models to Follow Complex Instructions

Paper • 2304.12244 • Published Apr 24, 2023 • 14
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

Paper • 2402.13064 • Published Feb 20, 2024 • 49
Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 145
Orca 2: Teaching Small Language Models How to Reason

Paper • 2311.11045 • Published Nov 18, 2023 • 75
Orca: Progressive Learning from Complex Explanation Traces of GPT-4

Paper • 2306.02707 • Published Jun 5, 2023 • 46
WizardCoder: Empowering Code Large Language Models with Evol-Instruct

Paper • 2306.08568 • Published Jun 14, 2023 • 29
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs

Paper • 2310.13961 • Published Oct 21, 2023 • 5