REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback
Abstract
A semi-automated framework pairing small open-source LLMs with reinforcement learning improves instruction-dataset generation for LLM fine-tuning across a wide range of tasks.
Instruction-based Large Language Models (LLMs) have proven effective in numerous few-shot and zero-shot Natural Language Processing (NLP) tasks. However, creating human-annotated instruction data is time-consuming, expensive, and often limited in quantity and task diversity. Previous research has attempted to address this challenge by proposing frameworks that generate instructions in a semi-automated, task-agnostic manner directly from the model itself. Many of these efforts have relied on large, API-only models such as GPT-3.5 (175B), which are expensive and subject to query limits. This paper explores the performance of three small open-source LLMs, LLaMA 2-7B, LLaMA 2-13B, and Mistral 7B, within a semi-automated framework, thereby reducing the human intervention, effort, and cost required to generate an instruction dataset for fine-tuning LLMs. Furthermore, we demonstrate that incorporating a Reinforcement Learning (RL) based training algorithm into this framework leads to further enhancements. Our evaluation of the dataset reveals that these RL-based frameworks achieve substantial improvements over previous approaches in 63-66% of the tasks.
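To make the pipeline in the abstract concrete, here is a rough, hypothetical sketch of how self-generated instructions and automated feedback could interact; it is not the authors' implementation. `generate`, `reward`, and `ppo_update` are placeholder names standing in for the small open-source LLM, the automated feedback signal, and the RL optimizer, respectively.

```python
# Minimal sketch of a self-instruct + automated-feedback loop.
# All three helpers below are stubs for illustration only.

import random

SEED_INSTRUCTIONS = [
    "Summarize the following paragraph in one sentence.",
    "Translate the given sentence into French.",
    "List three pros and cons of remote work.",
]

def generate(prompt: str) -> str:
    """Placeholder for a small open LLM (e.g., LLaMA 2-7B) completion call."""
    return "Write a haiku about autumn."  # dummy output for illustration

def reward(instruction: str) -> float:
    """Placeholder automated feedback: scores a candidate in [0, 1].
    A toy length-based proxy stands in for a real reward model here."""
    return min(len(instruction.split()) / 20.0, 1.0)

def ppo_update(instruction: str, score: float) -> None:
    """Placeholder for an RL (e.g., PPO) step that nudges the generator
    toward instructions with higher automated-feedback scores."""
    pass

instruction_pool = list(SEED_INSTRUCTIONS)
for step in range(100):
    # Few-shot prompt built from previously accepted instructions.
    prompt = "\n".join(random.sample(instruction_pool, k=3)) + "\n"
    candidate = generate(prompt)
    score = reward(candidate)
    ppo_update(candidate, score)
    if score > 0.5:  # keep only candidates the feedback rates highly
        instruction_pool.append(candidate)
```

The RL update on the automated-feedback score is what distinguishes this style of framework from plain Self-Instruct-style filtering, where low-scoring candidates are merely discarded rather than used as a training signal.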
Community
PAPER - REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback
AUTHORS - Aniruddha Roy, Pretam Ray, Abhilash Nandy, Somak Aditya, Pawan Goyal
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Improving Model Alignment Through Collective Intelligence of Open-Source LLMS (2025)
- Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts (2025)
- Synthetic Data Generation Using Large Language Models: Advances in Text and Code (2025)
- Improving In-Context Learning with Reasoning Distillation (2025)
- OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs (2025)
- Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking (2025)
- Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models (2025)
@abhi1nandy2 Really interesting work!!
I was curious though: given the availability of more capable models like LLaMA 3, LLaMA 3.1, and Qwen 2.5 at the time of your research, what motivated the choice to focus on LLaMA 1 and LLaMA 2 for your experiments?
Would love to hear your thoughts on how the framework might extend to newer model families.
Hi @Ritvik19 ,
Thanks for the thoughtful question! At the time of our experiments, LLaMA 1 and 2 were the most accessible models with open weights and well-documented setups, making them practical choices for developing and validating our framework.
That said, REFINE-AF is designed to be model-agnostic, and we're currently working on extending it to newer models like LLaMA 3.1, Qwen 2.5, DeepSeek-R1, Mistral, and Google’s Gemma. These mainly require integrating the new models and tuning a few parameters, but the core framework remains unchanged.
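For anyone curious what "integrating the new models" might involve, a minimal sketch is below, assuming the Hugging Face `transformers` library and standard hub repo IDs (some of which are gated and require access approval). This is an illustration of the model-agnostic design, not code from the paper.

```python
# Hypothetical sketch: swapping the backbone only changes the repo ID;
# the surrounding framework stays the same.
from transformers import AutoModelForCausalLM, AutoTokenizer

BACKBONES = {
    "llama-2-7b": "meta-llama/Llama-2-7b-hf",
    "mistral-7b": "mistralai/Mistral-7B-v0.1",
    # Newer families would slot in the same way, e.g.:
    # "llama-3.1-8b": "meta-llama/Llama-3.1-8B",
}

def load_backbone(name: str):
    """Load a tokenizer/model pair for the requested backbone."""
    repo = BACKBONES[name]
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo)
    return tokenizer, model
```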
Thank you for your interest in our work!
Thanks for the clarification!
I would like to know when the experiments were conducted.
It would be really interesting to see how REFINE-AF performs on more capable recent models like LLaMA 3.1 or Qwen 2.5, which already show strong instruction-following out of the box. Curious if evaluating the benefit of your method on such models is part of your roadmap?