MORepair Collection
A collection of MORepair artifacts that includes the multi-objective fine-tuned CodeLlama-13B and all evaluation benchmarks.
CodeLlama-13b-MORepair is a program repair model fine-tuned from CodeLlama-13b-instruct using a novel multi-objective fine-tuning framework called MOREPAIR. This model is specifically designed to improve automated program repair capabilities by learning both code transformations and repair logic reasoning.
If you use this model in your research, please cite:
@article{10.1145/3735129,
author = {Yang, Boyang and Tian, Haoye and Ren, Jiadong and Zhang, Hongyu and Klein, Jacques and Bissyande, Tegawende and Le Goues, Claire and Jin, Shunfu},
title = {MORepair: Teaching LLMs to Repair Code via Multi-Objective Fine-Tuning},
year = {2025},
publisher = {Association for Computing Machinery},
issn = {1049-331X},
url = {https://doi.org/10.1145/3735129},
doi = {10.1145/3735129},
journal = {ACM Trans. Softw. Eng. Methodol.},
}
The model was trained with MOREPAIR, which jointly optimizes two learning objectives: generating the patched code itself and generating natural-language guidance that explains the repair logic.
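The exact loss formulation is given in the paper; as a rough illustration, a multi-objective fine-tuning loss can be sketched as a weighted combination of two per-token cross-entropy terms, one for the patched code and one for the repair guidance. The helper names and the weighting scheme below are illustrative assumptions, not the paper's definition:

```python
import math

def cross_entropy(probs, target_idx):
    # Negative log-likelihood of the target token under the model's distribution.
    return -math.log(probs[target_idx])

def multi_objective_loss(code_probs, code_target,
                         guidance_probs, guidance_target, alpha=0.5):
    # Convex combination of the two training signals; the weighting
    # actually used by MOREPAIR is defined in the paper, alpha here
    # is purely illustrative.
    loss_code = cross_entropy(code_probs, code_target)
    loss_guidance = cross_entropy(guidance_probs, guidance_target)
    return alpha * loss_code + (1 - alpha) * loss_guidance
```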
Here's how to use the model with the Hugging Face Transformers library (accelerate is needed for device_map="auto" and bitsandbytes for 8-bit loading):
pip install transformers torch accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load model and tokenizer
model_name = "barty/CodeLlama-13B-MORepair"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,
    torch_dtype=torch.float16,
)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
def repair_code(buggy_code, filename="example.java"):
    # Construct the prompt in the format the model expects
    prompt = f"""[INST] This is an incorrect code({filename}):
```java
{buggy_code}
```
You are a software engineer. Can you repair the incorrect code?
[/INST]
```java
"""
    # Count prompt tokens so generation stays within a 500-token budget,
    # with a floor so the budget never goes negative for long prompts
    prompt_tokens = len(tokenizer.tokenize(prompt))
    max_new_tokens = max(500 - prompt_tokens, 64)
    # Generate the repair
    output = pipe(
        prompt,
        min_length=prompt_tokens + 64,
        max_length=prompt_tokens + max_new_tokens,
        temperature=1.0,
        do_sample=True,
    )
    # Extract the generated code after the [/INST] marker
    full_text = output[0]['generated_text']
    fixed_code = full_text.split('[/INST]')[1].strip()
    return full_text, fixed_code
# Example usage
buggy_code = """
public static int findMinRotated(int[] arr) {
    int left = 0;
    int right = arr.length - 1;
    while (left < right) {
        int mid = (left + right) / 2;
        if (arr[mid] > arr[right])
            left = mid; // Bug: should be mid + 1
        else
            right = mid;
    }
    return arr[left];
}
"""
full_response, fixed_code = repair_code(buggy_code)
print("Fixed code:")
print(fixed_code)
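Note that splitting on [/INST] leaves the model's opening ```java fence in the extracted text. A small post-processing helper (hypothetical, not part of the model card) can pull out just the code between the fences:

```python
import re

def extract_code_block(text):
    # Pull the first ```java fenced block out of the model's completion;
    # fall back to everything after [/INST] if no closing fence was emitted.
    completion = text.split('[/INST]', 1)[-1]
    match = re.search(r"```(?:java)?\n(.*?)```", completion, re.DOTALL)
    return match.group(1).strip() if match else completion.strip()
```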
load_in_8bit=True: Enables 8-bit quantization for efficient inference
temperature=1.0: Controls randomness in generation
do_sample=True: Enables sampling-based generation
min_length: Minimum number of tokens in the output, prompt included
max_length: Maximum number of tokens in the output, prompt included
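To make the temperature and do_sample settings concrete, here is a minimal pure-Python sketch (not Transformers code) of temperature sampling: the logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it, and the next token is then drawn from the resulting probabilities:

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=None):
    # Temperature-scaled softmax followed by sampling, as done when
    # do_sample=True; temperature < 1 sharpens, > 1 flattens.
    rng = rng or random.Random(0)
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):        # inverse-CDF sampling
        cumulative += p
        if r <= cumulative:
            return i, probs
    return len(probs) - 1, probs
```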