KevinG/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-10-mix-100-10-more-rounds-AT-1 Text Generation • 8B • Updated Jun 26 • 6
KevinG/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-100-reproduce-AT-1 Text Generation • 8B • Updated Jun 26 • 6
KevinG/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-100-reproduce-no-seed-AT-1 Text Generation • 8B • Updated Jun 26 • 6
KevinG/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-100-reproduce-version-AT-1 Text Generation • 8B • Updated Jun 26 • 6
KevinG/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-100-reproduce-version-3-AT-1 Text Generation • 8B • Updated Jun 26 • 6
KevinG/Meta-Llama-3-8B-Instruct-GRPO-alpaca_combine_100_no_KL_42_reproduce Text Generation • 8B • Updated Jun 27 • 7
AmberYifan/llama3-8b-full-pretrain-mix-high-tweet-1m-en-gpt Text Generation • 8B • Updated Jul 3 • 10