Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Capacitated Vehicle Routing, Large Language Model, Reinforcement Finetuning
Abstract:

While large language models (LLMs) are increasingly used as automated heuristic designers for vehicle routing problems (VRPs), current state-of-the-art methods predominantly rely on prompting massive, general-purpose models like GPT-4. This work challenges that paradigm by demonstrating that a smaller, specialized LLM, when meticulously fine-tuned, can generate components that surpass expert-crafted heuristics within advanced solvers. We propose RFTHGS, a novel Reinforcement learning (RL) framework for Fine-Tuning a compact LLM to generate high-performance crossover operators for the Hybrid Genetic Search (HGS) solver, applied to the Capacitated VRP (CVRP). Our method employs a multi-tiered, curriculum-based reward function that progressively guides the LLM to master generating first compilable, then executable, and finally, superior-performing operators that exceed human expert designs. This is coupled with an operator caching mechanism that discourages plagiarism and promotes diversity during training. Comprehensive experiments show that our fine-tuned LLM produces crossover operators which significantly outperform the expert-designed ones in HGS. The performance advantage remains consistent, generalizing from small-scale instances to large-scale problems with up to 1000 nodes. Furthermore, RFTHGS exceeds the performance of leading neuro-combinatorial baselines, prompt-based methods, and commercial LLMs such as GPT-4o and GPT-4o-mini.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes RFTHGS, a reinforcement learning framework that fine-tunes a compact language model to generate crossover operators for the Hybrid Genetic Search solver applied to the Capacitated VRP. This work resides in the 'Heuristic Generation and Automated Design' leaf of the taxonomy, which contains only three papers total. This is a notably sparse research direction within the broader field of automated heuristic design for VRP, suggesting the paper enters relatively unexplored territory focused on generating rather than selecting algorithmic components.

The taxonomy reveals that most related work concentrates in neighboring branches: 'Operator and Heuristic Selection via RL' contains five papers focused on choosing among predefined operators, while 'Construction Heuristics' and 'Improvement Heuristics' branches collectively house over twenty papers building or refining solutions directly. The sibling papers in the same leaf (Reevo LLM and Automated Metaheuristics Design) emphasize code generation via prompting large models or evolutionary search over broad metaheuristic frameworks, whereas this work targets fine-tuning smaller models for specific solver components within an established genetic search architecture.

Among the twenty-three candidates examined through semantic search and citation expansion, none clearly refutes any of the three main contributions. For the core RFTHGS framework, eight candidates were examined with no refutable overlaps; for the multi-tiered curriculum reward mechanism, five candidates were examined and none provides the same training strategy; and for the demonstration that fine-tuned small LLMs can exceed expert designs, ten candidates were examined without finding contradictory evidence. This absence of refutation across all contributions suggests that the specific combination of fine-tuning compact LLMs for operator generation within HGS has not been directly addressed within the limited search scope.

Based on the top twenty-three semantic matches examined, the work appears to occupy a distinct position, combining fine-tuning methodology with operator-level component generation for classical solvers. The sparse population of the 'Heuristic Generation' leaf and the lack of refutable prior work across all contributions indicate novelty within the examined scope, though the limited search scale means potentially relevant work outside these candidates remains unassessed.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 23
Refutable Papers: 0

Research Landscape Overview

Core task: automated heuristic design for vehicle routing problems using reinforcement learning. The field has evolved into a rich landscape organized around several complementary directions. Construction heuristics via deep RL (e.g., Attention Routing[32], Dynamic Attention VRP[15]) build solutions from scratch using neural policies, while improvement and hybrid heuristics (Learning TwoOpt[13], Hybrid Deep Local[47]) refine existing routes through learned operators. Hyper-heuristic and meta-level frameworks (Deep RL Hyper-Heuristic[10], RL Hyper-Heuristic HVRP[9]) operate at a higher level of abstraction, adaptively selecting or generating low-level heuristics.

Parallel branches address dynamic and stochastic settings (Stochastic VRP RL[16], Dynamic Uncertain VRP[8]), electric-vehicle constraints (Q-Learning Electric VRP[34], Last-Mile Electric VRP[36]), and specialized domains such as drone routing (Drone Vehicle Routing[3], SmartPathfinder Drones[6]). Methodological advances explore algorithmic innovations such as inverse RL (Inverse RL TwoOpt[35]) and policy optimization (Policy Optimisation VRP[38]), while surveys and comparative studies (VRP RL Advancements[7], Metaheuristic RL Review[33]) synthesize progress across these threads.

A particularly active line of work centers on heuristic generation and automated design, where systems learn to construct or discover novel heuristics rather than merely applying fixed operators. Reevo LLM[11] and Automated Metaheuristics Design[29] exemplify efforts to automate the design process itself, leveraging large language models or evolutionary search to produce problem-specific strategies. Hybrid Genetic RL CVRP[0] sits squarely within this branch, combining genetic algorithms with reinforcement learning to evolve heuristics for the capacitated VRP.
Compared to Reevo LLM[11], which emphasizes code generation and symbolic reasoning, Hybrid Genetic RL CVRP[0] integrates evolutionary operators more tightly with RL-based policy refinement. Meanwhile, Automated Metaheuristics Design[29] explores broader metaheuristic frameworks, whereas Hybrid Genetic RL CVRP[0] targets a specific hybridization strategy for CVRP instances. These contrasting emphases highlight ongoing questions about the balance between generality and problem-specific tuning in automated heuristic design.

Claimed Contributions

RFTHGS: RL framework for fine-tuning LLMs to generate crossover operators for HGS

The authors propose RFTHGS, a reinforcement learning framework that fine-tunes small language models (14B parameters) to automatically generate crossover operators for the Hybrid Genetic Search solver. This framework uses solution quality as feedback to guide the LLM toward producing operators that outperform expert-designed ones.

8 retrieved papers
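The feedback loop behind this contribution (generate an operator, score it by the solution quality an HGS run achieves, and reinforce the policy accordingly) can be illustrated with a toy Python sketch. Everything below is an assumption for illustration: the candidate table stands in for the LLM's output space, the fixed costs stand in for real HGS evaluations, and the update is a bare REINFORCE-style step, not the paper's actual training algorithm.

```python
import math
import random

def reward_from_hgs(cost: float, expert_cost: float) -> float:
    """Reward is the relative improvement over the expert operator's cost."""
    return (expert_cost - cost) / expert_cost

# Toy stand-ins: the "policy" samples one of three candidate operator
# snippets; a fixed table plays the role of an HGS run returning the
# solution cost each operator achieves. In RFTHGS these would be the
# 14B LLM and a real HGS evaluation; names and numbers are illustrative.
candidates = {"op_a": 105.0, "op_b": 98.0, "op_c": 101.0}
logits = {name: 0.0 for name in candidates}

def sample(policy: dict) -> str:
    """Sample a candidate with probability proportional to exp(logit)."""
    names = list(policy)
    weights = [math.exp(policy[n]) for n in names]
    return random.choices(names, weights=weights)[0]

expert_cost = 100.0   # cost achieved by the expert-designed crossover
lr = 0.5
random.seed(0)
for _ in range(200):
    name = sample(logits)
    r = reward_from_hgs(candidates[name], expert_cost)
    logits[name] += lr * r   # REINFORCE-flavoured: reinforce good samples

# op_b is the only candidate that beats the expert baseline (98 < 100),
# so its logit can only grow while the others can only shrink
best = max(logits, key=logits.get)
```

Because rewards are positive only below the expert's cost, the sketch converges toward the one operator that improves on the baseline, mirroring the "solution quality as feedback" signal described above.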
Multi-tiered curriculum-based reward function with operator caching

The authors develop a hierarchical reward function that decomposes learning into three progressive stages: compilability, executability, and performance superiority. They also introduce an operator caching mechanism using Abstract Syntax Trees to prevent plagiarism and encourage diverse operator exploration during training.

5 retrieved papers
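The three reward tiers and the AST-based duplicate cache described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `run_operator` is a hypothetical callable that plugs the generated code into an HGS run and returns the solution cost, and the tier values and duplicate penalty are invented constants.

```python
import ast
import hashlib

def operator_fingerprint(code: str) -> str:
    """Hash the normalized AST so trivially rewritten copies collide.

    Parsing and dumping the AST discards comments and formatting, so two
    operators differing only in whitespace or comments map to the same key.
    """
    tree = ast.parse(code)
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()

def tiered_reward(code: str, run_operator, expert_cost: float, cache: set) -> float:
    """Three-stage reward: compilable -> executable -> beats the expert."""
    try:
        compile(code, "<operator>", "exec")   # tier 1: does it compile?
    except SyntaxError:
        return -1.0

    fp = operator_fingerprint(code)
    if fp in cache:                           # cached duplicate: penalize
        return -0.5
    cache.add(fp)

    try:
        cost = run_operator(code)             # tier 2: does it run in HGS?
    except Exception:
        return 0.0

    # tier 3: reward scales with relative improvement over the expert
    return 1.0 + max(0.0, (expert_cost - cost) / expert_cost)
```

Gating each tier behind the previous one gives the curriculum effect: early in training most reward comes from simply compiling, and only later does the performance term dominate, while the fingerprint cache blocks the policy from resubmitting cosmetically edited copies of earlier operators.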
Demonstration that fine-tuned small LLMs can exceed expert-designed components in state-of-the-art solvers

The authors demonstrate that a specialized 14B-parameter LLM, when fine-tuned with their RL framework, can generate crossover operators that outperform both human-designed operators and those from much larger general-purpose models like GPT-4o, establishing a new paradigm for automated heuristic design in combinatorial optimization.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

RFTHGS: RL framework for fine-tuning LLMs to generate crossover operators for HGS

Contribution

Multi-tiered curriculum-based reward function with operator caching

Contribution

Demonstration that fine-tuned small LLMs can exceed expert-designed components in state-of-the-art solvers
