---
license: mit
base_model:
- Qwen/Qwen2.5-14B-Instruct
---
Caigao Jiang\* · Xiang Shu\* · Hong Qian† · Xingyu Lu† · Jun Zhou · Aimin Zhou · Yang Yu

East China Normal University | Ant Group | Nanjing University
## 🤖Model Release

We release the [LLMOPT-Qwen2.5-14B](https://huggingface.co/ant-opt/LLMOPT-Qwen2.5-14B) model on Hugging Face and conduct comprehensive performance evaluations. We have updated the evaluation results as shown in the following table; the original results correspond to Table 1 and Table 2 in the paper. The differences stem from two reasons. First, we exclude all Mamo EasyLP and ComplexLP data from the training process, reserving them exclusively for testing. Second, unlike the version described in our paper, which was based on [Qwen1.5-14B](https://huggingface.co/Qwen/Qwen1.5-14B), this release is fine-tuned from the latest [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) model. The performance metrics for [LLMOPT-Qwen2.5-14B](https://huggingface.co/ant-opt/LLMOPT-Qwen2.5-14B) are as follows:

| Dataset | NL4Opt | Mamo Easy | Mamo Complex | NLP4LP | ComplexOR | IndustryOR | ICML Competition | OptiBench | OptMath | AVG |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| #Questions | 230 | 652 | 211 | 242 | 18 | 100 | 410 | 605 | 166 | - |
| ER with self-correction | 100.00% | 100.00% | 99.05% | 100.00% | 100.00% | 94.00% | 99.66% | 82.31% | 75.30% | 94.48% |
| **SA with self-correction** | **97.31%** | **95.31%** | **85.78%** | **86.49%** | **76.47%** | **44.00%** | **95.76%** | **66.44%** | **40.00%** | **76.40%** |
| AST with self-correction | 1.38 | 1.13 | 2.13 | 1.50 | 3.46 | 2.14 | 1.47 | 1.54 | 4.06 | 2.09 |
| ER w/o self-correction | 97.42% | 98.29% | 77.73% | 97.93% | 88.89% | 61.00% | 93.90% | 73.22% | 31.93% | 80.03% |
| SA w/o self-correction | 80.28% | 89.53% | 44.08% | 73.42% | 35.29% | 29.00% | 75.35% | 53.83% | 12.50% | 54.81% |

In the experiments, we use three performance metrics to comprehensively evaluate the optimization generalization of the algorithm: **Execution Rate (ER), Solving Accuracy (SA), and Average Solving Times (AST)**. Specifically, **ER** is the proportion of solutions whose code runs without any errors and produces output. **SA** is the proportion of solutions that correctly solve the optimization problem, i.e., find the optimal solution. **AST** is the average number of times the self-correction process is performed during testing.

## 📊Dataset Release

### Data Structure

To facilitate evaluation, we process all datasets into a unified data structure. Specifically, each dataset is organized as a `jsonl` file in which every line is an independent record. Each record includes four attributes: `question`, `answer`, `ori`, and `index`. The `question` field is a complete string description of the optimization problem, including all data needed to solve it. The `answer` field is a `float` value giving the objective function value of the optimal solution, i.e., the ground truth. The `ori` field indicates the source of the problem, that is, the name of the dataset. To facilitate collecting statistics, the `index` field numbers the records in each dataset. The data are [available](https://github.com/antgroup/LLMOPT/tree/main/data/testset).

An example (the first record of the NL4Opt dataset):

```json
{
    "question": "There has been an oil spill in the ocean and ducks need to be taken to shore to be cleaned either by boat or by canoe. A boat can take 10 ducks per trip while a canoe can take 8 ducks per trip. Since the boats are motor powered, they take 20 minutes per trip while the canoes take 40 minutes per trip. In order to avoid further environmental damage, there can be at most 12 boat trips and at least 60% of the trips should be by canoe. If at least 300 ducks need to be taken to shore, how many of each transportation method should be used to minimize the total amount of time needed to transport the ducks?",
    "answer": 1160,
    "ori": "5_nl4opt_test",
    "index": 1
}
```
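As a reference for working with these files, below is a minimal Python sketch that loads one test set and checks a candidate objective value against the ground-truth `answer`. The file path, helper names, and numerical tolerance are illustrative only and are not part of the released evaluation code.

```python
import json


def load_testset(path):
    """Load one jsonl test set into a list of dicts with question/answer/ori/index."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def matches_ground_truth(predicted, answer, rel_tol=1e-4):
    """Compare a solver's objective value against the labeled answer.

    The relative tolerance here is an illustrative choice, not the official criterion.
    """
    return abs(predicted - answer) <= rel_tol * max(1.0, abs(answer))


problems = load_testset("./data/testset/nl4opt_test.jsonl")
print(len(problems))                                        # 230 for NL4Opt
print(problems[0]["question"][:80])                         # problem statement
print(problems[0]["answer"])                                # ground-truth objective value
print(matches_ground_truth(1160.0, problems[0]["answer"]))  # True for a correct solution
```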
### Dataset Source

Here we explain the sources of all datasets and the detailed data processing. Ground truth values with more than two decimal places are rounded to two decimal places. If you find any omissions in the manual labeling, please feel free to correct them.

##### 1. NL4Opt

The data for this testset comes from the [NL4Opt](https://nl4opt.github.io/) competition. We only used the test split and manually labeled these 230 optimization problems. The [original dataset](https://huggingface.co/datasets/CardinalOperations/NL4OPT) contains 245 problems, of which 15 were found to be unsolvable after manual inspection, so we removed them. The sorted data can be found in `./data/testset/nl4opt_test.jsonl`.

##### 2. Mamo Easy

This testset comes from the paper [Mamo: a Mathematical Modeling Benchmark with Solvers](https://arxiv.org/pdf/2405.13144v1). We obtained the original dataset of 652 problems from [Hugging Face](https://huggingface.co/datasets/CardinalOperations/MAMO/viewer/default/easy_lp?views%5B%5D=easy_lp). Since we found some wrong ground truth values in the open-source data, we manually checked and re-labeled all the data. The manually checked data is stored in `./data/testset/mamo_easy_test.jsonl`.

##### 3. Mamo Complex

This testset comes from the paper [Mamo: a Mathematical Modeling Benchmark with Solvers](https://arxiv.org/pdf/2405.13144v1). We sorted out 211 original problems from the `complex_lp` split on [Hugging Face](https://huggingface.co/datasets/CardinalOperations/MAMO/viewer/default/complex_lp?views%5B%5D=complex_lp) and stored the original data in a unified format in `./data/testset/mamo_complex_test.jsonl`.

##### 4. NLP4LP

This testset comes from the paper [OptiMUS: Optimization Modeling Using MIP Solvers and large language models](https://arxiv.org/abs/2310.06116). We sorted out these 242 feasible original problems from [Hugging Face](https://huggingface.co/datasets/udell-lab/NLP4LP) and stored the original data in a unified format in `./data/testset/nlp4lp.jsonl`.

##### 5. ComplexOR

This testset comes from the paper [Chain-of-Experts: When LLMs Meet Complex Operation Research Problems](https://openreview.net/pdf?id=HobyL1B9CZ). We sorted out these 18 feasible original problems from the [GitHub repo](https://github.com/xzymustbexzy/Chain-of-Experts/tree/main/dataset/ComplexOR) and stored the original data in a unified format in `./data/testset/complexor.jsonl`.

##### 6. IndustryOR

This testset comes from the paper [ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling](https://arxiv.org/abs/2405.17743).
We sorted out these 100 original problems from [Hugging Face](https://huggingface.co/datasets/CardinalOperations/IndustryOR) and stored the original data in a unified format in `./data/testset/industryor.jsonl`.

##### 7. ICML Competition

The data for this testset comes from the competition [ICML 2024 Challenges on Automated Math Reasoning - Track 3: Automated Optimization Problem-Solving with Code](https://www.codabench.org/competitions/2438/). We only used the test split. Since the competition organizer did not open-source the ground truth of the testset, we manually labeled these 410 problems. The original dataset contains 421 problems, of which 11 were found to be unsolvable after manual inspection, so we removed them. The sorted data can be found in `./data/testset/task3_test.jsonl`.

##### 8. OptiBench

This testset comes from the paper [OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling](https://arxiv.org/pdf/2407.09887v2). We sorted out these 605 original problems from the [repository](https://github.com/yangzhch6/ReSocratic/blob/main/data/OptiBench.json) and stored the original data in a unified format in `./data/testset/optibench.jsonl`.

##### 9. OptMath

This testset comes from the paper [OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling](https://arxiv.org/pdf/2502.11102). We sorted out these 165 original problems from the [repository](https://github.com/AuroraLHL/OptMATH/blob/main/benchmark/OptMATH_Bench.json) and stored the original data in a unified format in `./data/testset/optmath.jsonl`.

## ⚙️Inference

The following example code shows how we run model inference to obtain the experiment results:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "ant-opt/LLMOPT-Qwen2.5-14B"

# Load the released model and its tokenizer.
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(path)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512)
# Strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## ⌛️Future Work

With the remarkable progress and rapid development of reasoning models (such as DeepSeek-R1 and OpenAI o1/o3) in solving complex mathematical problems, we have also developed the LLMOPT Reasoning model. We will soon release our LLMOPT Reasoning version along with a new benchmarking effort.

## 📄Citation

If you encounter any questions about our work, please do not hesitate to submit an issue. If you do find our resources helpful, please cite our [paper](https://huggingface.co/papers/2410.13213).

```
@inproceedings{JiangShu2025llmopt,
  title     = {LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch},
  author    = {Caigao Jiang and Xiang Shu and Hong Qian and Xingyu Lu and Jun Zhou and Aimin Zhou and Yang Yu},
  booktitle = {Proceedings of the Thirteenth International Conference on Learning Representations (ICLR)},
  year      = {2025},
  address   = {Singapore, Singapore},
  url       = {https://openreview.net/pdf?id=9OMvtboTJg}
}
```