Ranking LLM-Generated Loop Invariants for Program Verification
Abstract
A re-ranking approach using a contrastive ranker improves the accuracy of synthesized loop invariants from LLMs, reducing the need for program verifier calls.
Synthesizing inductive loop invariants is fundamental to automating program verification. In this work, we observe that Large Language Models (such as gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of programs in a 0-shot setting, yet require several samples to generate the correct invariants. This can lead to a large number of calls to a program verifier to establish an invariant. To address this issue, we propose a {\it re-ranking} approach for the generated results of LLMs. We have designed a ranker that can distinguish between correct inductive invariants and incorrect attempts based on the problem definition. The ranker is optimized as a contrastive ranker. Experimental results demonstrate that this re-ranking mechanism significantly improves the ranking of correct invariants among the generated candidates, leading to a notable reduction in the number of calls to a verifier.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Lemur: Integrating Large Language Models in Automated Program Verification (2023)
- A Deductive Verification Infrastructure for Probabilistic Programs (2023)
- Large Language Model-Aware In-Context Learning for Code Generation (2023)
- VerilogEval: Evaluating Large Language Models for Verilog Code Generation (2023)
- The Program Testing Ability of Large Language Models for Code (2023)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper