Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Capacitated Vehicle Routing, Large Language Model, Reinforcement Finetuning
Abstract:

While large language models (LLMs) are increasingly used as automated heuristic designers for vehicle routing problems (VRPs), current state-of-the-art methods predominantly rely on prompting massive, general-purpose models like GPT-4. This work challenges that paradigm by demonstrating that a smaller, specialized LLM, when meticulously fine-tuned, can generate components that surpass expert-crafted heuristics within advanced solvers. We propose RFTHGS, a novel Reinforcement learning (RL) framework for Fine-Tuning a compact LLM to generate high-performance crossover operators for the Hybrid Genetic Search (HGS) solver, applied to the Capacitated VRP (CVRP). Our method employs a multi-tiered, curriculum-based reward function that progressively guides the LLM to master generating first compilable, then executable, and finally, superior-performing operators that exceed human expert designs. This is coupled with an operator caching mechanism that discourages plagiarism and promotes diversity during training. Comprehensive experiments show that our fine-tuned LLM produces crossover operators which significantly outperform the expert-designed ones in HGS. The performance advantage remains consistent, generalizing from small-scale instances to large-scale problems with up to 1000 nodes. Furthermore, RFTHGS exceeds the performance of leading neuro-combinatorial baselines, prompt-based methods, and commercial LLMs such as GPT-4o and GPT-4o-mini.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes RFTHGS, a reinforcement learning framework that fine-tunes a compact language model to generate crossover operators for the Hybrid Genetic Search solver applied to the Capacitated VRP. This work resides in the 'Heuristic Generation and Automated Design' leaf of the taxonomy, which contains only three papers total. This is a notably sparse research direction within the broader field of automated heuristic design for VRP, suggesting the paper enters relatively unexplored territory focused on generating rather than selecting algorithmic components.

The taxonomy reveals that most related work concentrates in neighboring branches: 'Operator and Heuristic Selection via RL' contains five papers focused on choosing among predefined operators, while 'Construction Heuristics' and 'Improvement Heuristics' branches collectively house over twenty papers building or refining solutions directly. The sibling papers in the same leaf (Reevo LLM and Automated Metaheuristics Design) emphasize code generation via prompting large models or evolutionary search over broad metaheuristic frameworks, whereas this work targets fine-tuning smaller models for specific solver components within an established genetic search architecture.

Among the twenty-three candidates examined through semantic search and citation expansion, none clearly refutes any of the three main contributions. For the core RFTHGS framework, eight candidates were examined with no refutable overlaps; for the multi-tiered curriculum reward mechanism, five candidates were examined and none provides the same training strategy; and for the demonstration that fine-tuned small LLMs can exceed expert designs, ten candidates were examined without finding contradictory evidence. This absence of refutation across all contributions suggests that the specific combination of fine-tuning compact LLMs for operator generation within HGS has not been directly addressed within the limited search scope.

Based on the top twenty-three semantic matches examined, the work appears to occupy a distinct position, combining fine-tuning methodology with operator-level component generation for classical solvers. The sparse population of the 'Heuristic Generation' leaf and the lack of refutable prior work across all contributions indicate novelty within the examined scope, though the limited search scale means potentially relevant work outside these candidates remains unassessed.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 23
Refutable Papers: 0

Research Landscape Overview

Core task: automated heuristic design for vehicle routing problems using reinforcement learning. The field has evolved into a rich landscape organized around several complementary directions. Construction heuristics via deep RL (e.g., Attention Routing[32], Dynamic Attention VRP[15]) build solutions from scratch using neural policies, while improvement and hybrid heuristics (Learning TwoOpt[13], Hybrid Deep Local[47]) refine existing routes through learned operators. Hyper-heuristic and meta-level frameworks (Deep RL Hyper-Heuristic[10], RL Hyper-Heuristic HVRP[9]) operate at a higher level of abstraction, adaptively selecting or generating low-level heuristics.

Parallel branches address dynamic and stochastic settings (Stochastic VRP RL[16], Dynamic Uncertain VRP[8]), electric-vehicle constraints (Q-Learning Electric VRP[34], Last-Mile Electric VRP[36]), and specialized domains such as drone routing (Drone Vehicle Routing[3], SmartPathfinder Drones[6]). Methodological advances explore algorithmic innovations such as inverse RL (Inverse RL TwoOpt[35]) and policy optimization (Policy Optimisation VRP[38]), while surveys and comparative studies (VRP RL Advancements[7], Metaheuristic RL Review[33]) synthesize progress across these threads.

A particularly active line of work centers on heuristic generation and automated design, where systems learn to construct or discover novel heuristics rather than merely applying fixed operators. Reevo LLM[11] and Automated Metaheuristics Design[29] exemplify efforts to automate the design process itself, leveraging large language models or evolutionary search to produce problem-specific strategies. Hybrid Genetic RL CVRP[0] sits squarely within this branch, combining genetic algorithms with reinforcement learning to evolve heuristics for the capacitated VRP.
Compared to Reevo LLM[11], which emphasizes code generation and symbolic reasoning, Hybrid Genetic RL CVRP[0] integrates evolutionary operators more tightly with RL-based policy refinement. Meanwhile, Automated Metaheuristics Design[29] explores broader metaheuristic frameworks, whereas Hybrid Genetic RL CVRP[0] targets a specific hybridization strategy for CVRP instances. These contrasting emphases highlight ongoing questions about the balance between generality and problem-specific tuning in automated heuristic design.

Claimed Contributions

RFTHGS: RL framework for fine-tuning LLMs to generate crossover operators for HGS

The authors propose RFTHGS, a reinforcement learning framework that fine-tunes small language models (14B parameters) to automatically generate crossover operators for the Hybrid Genetic Search solver. This framework uses solution quality as feedback to guide the LLM toward producing operators that outperform expert-designed ones.

8 retrieved papers
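The feedback loop behind this contribution (generate an operator, score it by the solution quality an HGS run achieves, and reinforce the policy accordingly) can be illustrated with a toy Python sketch. Everything below is an assumption for illustration: the candidate table stands in for the LLM's output space, the fixed costs stand in for real HGS evaluations, and the update is a bare REINFORCE-style step, not the paper's actual training algorithm.

```python
import math
import random

def reward_from_hgs(cost: float, expert_cost: float) -> float:
    """Reward is the relative improvement over the expert operator's cost."""
    return (expert_cost - cost) / expert_cost

# Toy stand-ins: the "policy" samples one of three candidate operator
# snippets; a fixed table plays the role of an HGS run returning the
# solution cost each operator achieves. In RFTHGS these would be the
# 14B LLM and a real HGS evaluation; names and numbers are illustrative.
candidates = {"op_a": 105.0, "op_b": 98.0, "op_c": 101.0}
logits = {name: 0.0 for name in candidates}

def sample(policy: dict) -> str:
    """Sample a candidate with probability proportional to exp(logit)."""
    names = list(policy)
    weights = [math.exp(policy[n]) for n in names]
    return random.choices(names, weights=weights)[0]

expert_cost = 100.0   # cost achieved by the expert-designed crossover
lr = 0.5
random.seed(0)
for _ in range(200):
    name = sample(logits)
    r = reward_from_hgs(candidates[name], expert_cost)
    logits[name] += lr * r   # REINFORCE-flavoured: reinforce good samples

# op_b is the only candidate that beats the expert baseline (98 < 100),
# so its logit can only grow while the others can only shrink
best = max(logits, key=logits.get)
```

Because rewards are positive only below the expert's cost, the sketch converges toward the one operator that improves on the baseline, mirroring the "solution quality as feedback" signal described above.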
Multi-tiered curriculum-based reward function with operator caching

The authors develop a hierarchical reward function that decomposes learning into three progressive stages: compilability, executability, and performance superiority. They also introduce an operator caching mechanism using Abstract Syntax Trees to prevent plagiarism and encourage diverse operator exploration during training.

5 retrieved papers
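The three reward tiers and the AST-based duplicate cache described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `run_operator` is a hypothetical callable that plugs the generated code into an HGS run and returns the solution cost, and the tier values and duplicate penalty are invented constants.

```python
import ast
import hashlib

def operator_fingerprint(code: str) -> str:
    """Hash the normalized AST so trivially rewritten copies collide.

    Parsing and dumping the AST discards comments and formatting, so two
    operators differing only in whitespace or comments map to the same key.
    """
    tree = ast.parse(code)
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()

def tiered_reward(code: str, run_operator, expert_cost: float, cache: set) -> float:
    """Three-stage reward: compilable -> executable -> beats the expert."""
    try:
        compile(code, "<operator>", "exec")   # tier 1: does it compile?
    except SyntaxError:
        return -1.0

    fp = operator_fingerprint(code)
    if fp in cache:                           # cached duplicate: penalize
        return -0.5
    cache.add(fp)

    try:
        cost = run_operator(code)             # tier 2: does it run in HGS?
    except Exception:
        return 0.0

    # tier 3: reward scales with relative improvement over the expert
    return 1.0 + max(0.0, (expert_cost - cost) / expert_cost)
```

Gating each tier behind the previous one gives the curriculum effect: early in training most reward comes from simply compiling, and only later does the performance term dominate, while the fingerprint cache blocks the policy from resubmitting cosmetically edited copies of earlier operators.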
Demonstration that fine-tuned small LLMs can exceed expert-designed components in state-of-the-art solvers

The authors demonstrate that a specialized 14B-parameter LLM, when fine-tuned with their RL framework, can generate crossover operators that outperform both human-designed operators and those from much larger general-purpose models like GPT-4o, establishing a new paradigm for automated heuristic design in combinatorial optimization.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

RFTHGS: RL framework for fine-tuning LLMs to generate crossover operators for HGS

Contribution

Multi-tiered curriculum-based reward function with operator caching

Contribution

Demonstration that fine-tuned small LLMs can exceed expert-designed components in state-of-the-art solvers
