---
license: mit
base_model:
- Qwen/Qwen2.5-14B-Instruct
---
Caigao Jiang\* · Xiang Shu\* · Hong Qian† · Xingyu Lu† · Jun Zhou · Aimin Zhou · Yang Yu

East China Normal University | Ant Group | Nanjing University
## 🤖Model Release

We release the [LLMOPT-Qwen2.5-14B](https://huggingface.co/ant-opt/LLMOPT-Qwen2.5-14B) model on Hugging Face and conduct comprehensive performance evaluations. We have updated the evaluation results as shown in the following table; the original results correspond to Table 1 and Table 2 in the paper. The differences stem from two reasons. First, we exclude all Mamo EasyLP and ComplexLP data from the training process, reserving them exclusively for testing. Second, unlike the version described in our paper, which was based on [Qwen1.5-14B](https://huggingface.co/Qwen/Qwen1.5-14B), this release is fine-tuned from the latest [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) model. The performance metrics for [LLMOPT-Qwen2.5-14B](https://huggingface.co/ant-opt/LLMOPT-Qwen2.5-14B) are as follows:

| Dataset | NL4Opt | Mamo Easy | Mamo Complex | NLP4LP | ComplexOR | IndustryOR | ICML Competition | OptiBench | OptMath | AVG |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| #Questions | 230 | 652 | 211 | 242 | 18 | 100 | 410 | 605 | 166 | - |
| ER with self-correction | 100.00% | 100.00% | 99.05% | 100.00% | 100.00% | 94.00% | 99.66% | 82.31% | 75.30% | 94.48% |
| **SA with self-correction** | **97.31%** | **95.31%** | **85.78%** | **86.49%** | **76.47%** | **44.00%** | **95.76%** | **66.44%** | **40.00%** | **76.40%** |
| AST with self-correction | 1.38 | 1.13 | 2.13 | 1.50 | 3.46 | 2.14 | 1.47 | 1.54 | 4.06 | 2.09 |
| ER w/o self-correction | 97.42% | 98.29% | 77.73% | 97.93% | 88.89% | 61.00% | 93.90% | 73.22% | 31.93% | 80.03% |
| SA w/o self-correction | 80.28% | 89.53% | 44.08% | 73.42% | 35.29% | 29.00% | 75.35% | 53.83% | 12.50% | 54.81% |

In the experiments, we use three performance metrics to comprehensively evaluate the optimization generalization of the algorithm: **Execution Rate (ER), Solving Accuracy (SA), and Average Solving Times (AST)**. Specifically, **ER** is the proportion of solutions whose code runs without any errors and produces output. **SA** is the proportion of solutions that correctly solve the optimization problem, i.e., find the optimal solution. **AST** is the average number of times the self-correction process is performed during testing.

## 📊Dataset Release

### Data Structure

To facilitate evaluation, we process all datasets into a unified data structure. Specifically, each dataset is organized as a `jsonl` file in which every line is an independent record. Each record includes four attributes: `question`, `answer`, `ori`, and `index`. The `question` field is a complete string description of the optimization problem, including all data needed to solve it. The `answer` field is a `float` value giving the objective function value of the optimal solution, i.e., the ground truth. The `ori` field indicates the source of the problem, that is, the name of the dataset. To facilitate collecting statistics, the `index` field numbers the records in each dataset. The data are [available](https://github.com/antgroup/LLMOPT/tree/main/data/testset).

An example (the first record of the NL4Opt dataset):

```json
{
    "question": "There has been an oil spill in the ocean and ducks need to be taken to shore to be cleaned either by boat or by canoe. A boat can take 10 ducks per trip while a canoe can take 8 ducks per trip. Since the boats are motor powered, they take 20 minutes per trip while the canoes take 40 minutes per trip. In order to avoid further environmental damage, there can be at most 12 boat trips and at least 60% of the trips should be by canoe. If at least 300 ducks need to be taken to shore, how many of each transportation method should be used to minimize the total amount of time needed to transport the ducks?",
    "answer": 1160,
    "ori": "5_nl4opt_test",
    "index": 1
}
```
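As a reference for working with these files, below is a minimal Python sketch that loads one test set and checks a candidate objective value against the ground-truth `answer`. The file path, helper names, and numerical tolerance are illustrative only and are not part of the released evaluation code.

```python
import json


def load_testset(path):
    """Load one jsonl test set into a list of dicts with question/answer/ori/index."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def matches_ground_truth(predicted, answer, rel_tol=1e-4):
    """Compare a solver's objective value against the labeled answer.

    The relative tolerance here is an illustrative choice, not the official criterion.
    """
    return abs(predicted - answer) <= rel_tol * max(1.0, abs(answer))


problems = load_testset("./data/testset/nl4opt_test.jsonl")
print(len(problems))                                        # 230 for NL4Opt
print(problems[0]["question"][:80])                         # problem statement
print(problems[0]["answer"])                                # ground-truth objective value
print(matches_ground_truth(1160.0, problems[0]["answer"]))  # True for a correct solution
```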
### Dataset Source

Here we explain the sources of all datasets and the detailed data processing. Ground truth values with more than two decimal places are rounded to two decimal places. If you find any omissions in the manual labeling, please feel free to correct them.

##### 1. NL4Opt

The data for this testset comes from the [NL4Opt](https://nl4opt.github.io/) competition. We only used the test split and manually labeled these 230 optimization problems. The [original dataset](https://huggingface.co/datasets/CardinalOperations/NL4OPT) contains 245 problems, of which 15 were found to be unsolvable after manual inspection, so we removed them. The sorted data can be found in `./data/testset/nl4opt_test.jsonl`.

##### 2. Mamo Easy

This testset comes from the paper [Mamo: a Mathematical Modeling Benchmark with Solvers](https://arxiv.org/pdf/2405.13144v1). We obtained the original dataset of 652 problems from [Hugging Face](https://huggingface.co/datasets/CardinalOperations/MAMO/viewer/default/easy_lp?views%5B%5D=easy_lp). Since we found some wrong ground truth values in the open-source data, we manually checked and re-labeled all the data. The manually checked data is stored in `./data/testset/mamo_easy_test.jsonl`.

##### 3. Mamo Complex

This testset comes from the paper [Mamo: a Mathematical Modeling Benchmark with Solvers](https://arxiv.org/pdf/2405.13144v1). We sorted out 211 original problems from the `complex_lp` split on [Hugging Face](https://huggingface.co/datasets/CardinalOperations/MAMO/viewer/default/complex_lp?views%5B%5D=complex_lp) and stored the original data in a unified format in `./data/testset/mamo_complex_test.jsonl`.

##### 4. NLP4LP

This testset comes from the paper [OptiMUS: Optimization Modeling Using MIP Solvers and large language models](https://arxiv.org/abs/2310.06116). We sorted out these 242 feasible original problems from [Hugging Face](https://huggingface.co/datasets/udell-lab/NLP4LP) and stored the original data in a unified format in `./data/testset/nlp4lp.jsonl`.

##### 5. ComplexOR

This testset comes from the paper [Chain-of-Experts: When LLMs Meet Complex Operation Research Problems](https://openreview.net/pdf?id=HobyL1B9CZ). We sorted out these 18 feasible original problems from the [GitHub repo](https://github.com/xzymustbexzy/Chain-of-Experts/tree/main/dataset/ComplexOR) and stored the original data in a unified format in `./data/testset/complexor.jsonl`.

##### 6. IndustryOR

This testset comes from the paper [ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling](https://arxiv.org/abs/2405.17743).
We sorted out these 100 original problems from [Hugging Face](https://huggingface.co/datasets/CardinalOperations/IndustryOR) and stored the original data in a unified format in `./data/testset/industryor.jsonl`.

##### 7. ICML Competition

The data for this testset comes from the competition [ICML 2024 Challenges on Automated Math Reasoning - Track 3: Automated Optimization Problem-Solving with Code](https://www.codabench.org/competitions/2438/). We only used the test split. Since the competition organizer did not open-source the ground truth of the testset, we manually labeled these 410 problems. The original dataset contains 421 problems, of which 11 were found to be unsolvable after manual inspection, so we removed them. The sorted data can be found in `./data/testset/task3_test.jsonl`.

##### 8. OptiBench

This testset comes from the paper [OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling](https://arxiv.org/pdf/2407.09887v2). We sorted out these 605 original problems from the [repository](https://github.com/yangzhch6/ReSocratic/blob/main/data/OptiBench.json) and stored the original data in a unified format in `./data/testset/optibench.jsonl`.

##### 9. OptMath

This testset comes from the paper [OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling](https://arxiv.org/pdf/2502.11102). We sorted out these 165 original problems from the [repository](https://github.com/AuroraLHL/OptMATH/blob/main/benchmark/OptMATH_Bench.json) and stored the original data in a unified format in `./data/testset/optmath.jsonl`.

## ⚙️Inference

The following example code shows how we run model inference to obtain the experiment results:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "ant-opt/LLMOPT-Qwen2.5-14B"

# Load the released model and its tokenizer.
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(path)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512)
# Strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## ⌛️Future Work

With the remarkable progress and rapid development of reasoning models (such as DeepSeek-R1 and OpenAI o1/o3) in solving complex mathematical problems, we have also developed the LLMOPT Reasoning model. We will soon release our LLMOPT Reasoning version along with a new benchmarking effort.

## 📄Citation

If you encounter any questions about our work, please do not hesitate to submit an issue. If you do find our resources helpful, please cite our [paper](https://huggingface.co/papers/2410.13213).

```
@inproceedings{JiangShu2025llmopt,
  title     = {LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch},
  author    = {Caigao Jiang and Xiang Shu and Hong Qian and Xingyu Lu and Jun Zhou and Aimin Zhou and Yang Yu},
  booktitle = {Proceedings of the Thirteenth International Conference on Learning Representations (ICLR)},
  year      = {2025},
  address   = {Singapore, Singapore},
  url       = {https://openreview.net/pdf?id=9OMvtboTJg}
}
```