RECAST: Expanding the Boundaries of LLMs' Complex Instruction Following with Multi-Constraint Data

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: LLM, Complex Instruction Following, Data Synthesis, Reinforcement Learning
Abstract:

Large language models (LLMs) are increasingly expected to tackle complex tasks, driven by their expanding applications and users' growing proficiency in crafting sophisticated prompts. However, as the number of explicitly stated requirements increases (particularly beyond 10 constraints), LLMs often struggle to follow such complex instructions accurately, which limits their applicability in complex real-world scenarios. To the best of our knowledge, existing datasets do not exceed 10 constraints per instance. To address this challenge, we propose RECAST, an efficient and scalable framework for synthesizing datasets in which each example incorporates far more constraints than those in existing benchmarks, aiming to challenge and extend the boundaries of models' ability to follow complex instructions. These constraints are extracted from real-world prompt-response pairs to ensure practical relevance. Using this framework, we construct RECAST-30K, a large-scale, high-quality dataset comprising 30k instances spanning 19 constraint types. Experimental results demonstrate that models fine-tuned on RECAST-30K improve substantially at following complex instructions while maintaining their general capabilities without degradation. Moreover, RECAST enables automatic verification of constraint satisfaction via rule-based validators for quantitative constraints and LLM-based validators for qualitative ones. This verifiability supports the design of reward functions for reinforcement learning, which further boosts model performance on complex and challenging tasks.
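The abstract's split between rule-based validators (quantitative constraints) and LLM-based validators (qualitative ones) can be illustrated with a minimal sketch. The constraint names, thresholds, and the `satisfaction_rate` helper below are illustrative assumptions, not RECAST's actual schema:

```python
import re

# Illustrative rule-based validators for quantitative constraints.
# Constraint types and parameters here are hypothetical examples.
def check_max_words(response: str, limit: int) -> bool:
    """Pass if the response contains at most `limit` words."""
    return len(response.split()) <= limit

def check_min_bullets(response: str, minimum: int) -> bool:
    """Pass if the response has at least `minimum` bulleted lines."""
    bullets = [ln for ln in response.splitlines()
               if ln.lstrip().startswith(("-", "*"))]
    return len(bullets) >= minimum

def check_contains_keyword(response: str, keyword: str) -> bool:
    """Pass if the keyword appears in the response (case-insensitive)."""
    return re.search(re.escape(keyword), response, re.IGNORECASE) is not None

def satisfaction_rate(response: str, checks) -> float:
    """Fraction of constraints satisfied. Qualitative constraints (e.g.
    tone, helpfulness) would instead be routed to an LLM judge, omitted here."""
    results = [fn(response, arg) for fn, arg in checks]
    return sum(results) / len(results)
```

A per-constraint pass/fail signal of this kind is what makes constraint satisfaction automatically checkable at scale.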

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces RECAST, a framework for synthesizing instruction-following datasets with far more constraints per instance than existing benchmarks (exceeding 10 constraints, targeting practical scenarios with 19 constraint types). It resides in the 'Constraint-Based Data Generation' leaf under 'Training Data Construction and Synthesis', alongside three sibling papers: Constraint Back-translation, RECAST Verifiable, and Conifer. This leaf represents a focused but not overcrowded research direction within the broader taxonomy of 50 papers across 22 leaf nodes, indicating moderate activity in constraint-based data synthesis methods.

The taxonomy reveals neighboring leaves such as 'Iterative and Refinement-Based Generation' (automated complexity expansion) and 'Long-Form and Multi-Constraint Datasets' (specialized corpora for extended text). RECAST's approach diverges from iterative refinement methods by emphasizing static constraint extraction from real-world prompt-response pairs, aligning more closely with verification-oriented synthesis. The broader 'Training Data Construction and Synthesis' branch contains six papers total, suggesting this is an active but not saturated area. Related branches like 'Evaluation Benchmarks and Metrics' (14 papers) and 'Training Methods and Optimization' (7 papers) indicate the field prioritizes assessment and optimization alongside data generation.

Among 30 candidates examined, the RECAST framework (Contribution 1) shows one refutable candidate out of 10 examined, suggesting some overlap with prior constraint-based synthesis work. The RECAST-30K dataset (Contribution 2) and RLVC training method (Contribution 3) each examined 10 candidates with zero refutations, indicating these contributions appear more distinct within the limited search scope. The framework's emphasis on scaling beyond 10 constraints per instance and extracting constraints from real-world data may differentiate it from existing synthesis pipelines, though the single refutable match warrants attention to prior constraint extraction techniques.

Based on the top-30 semantic matches examined, the work appears to occupy a moderately explored niche within constraint-based data generation. The taxonomy structure suggests the field is actively developing evaluation benchmarks and training methods, but data synthesis approaches remain less saturated. The analysis does not cover exhaustive literature review or domain-specific constraint generation methods outside the examined candidates, leaving open questions about overlap with specialized constraint frameworks or industry-scale synthesis pipelines not captured in this search scope.

Taxonomy

- 50 Core-task Taxonomy Papers
- 3 Claimed Contributions
- 30 Contribution Candidate Papers Compared
- 1 Refutable Paper

Research Landscape Overview

Core task: complex instruction following with multiple constraints. This field addresses how language models can satisfy diverse, simultaneous requirements—ranging from format and length restrictions to content and stylistic rules—within a single response. The taxonomy organizes research into several major branches: Training Data Construction and Synthesis focuses on generating high-quality constraint-rich examples, often through automated pipelines or back-translation techniques (e.g., Constraint Back-translation[3], Conifer[7]); Training Methods and Optimization explores curriculum strategies and specialized loss functions; Evaluation Benchmarks and Metrics develops testbeds that measure adherence to multi-dimensional constraints (e.g., FollowBench[17], CFBench[23]); Inference-Time Strategies and Self-Correction examines runtime verification and iterative refinement; and additional branches cover mechanistic analysis, generalization, domain-specific applications, and multi-task architectures. Together, these branches reflect a maturing effort to move beyond single-objective instruction tuning toward systems that juggle competing or layered requirements.

Within the data construction landscape, a handful of works emphasize constraint-based generation to produce training corpora that stress-test models on verifiable rules. RECAST[0] sits squarely in this cluster, proposing methods to synthesize instruction–response pairs where constraints are explicit and checkable, much like RECAST Verifiable[6] and Conifer[7]. Compared to Constraint Back-translation[3], which reverses the generation process to ensure coverage, RECAST[0] prioritizes scalable synthesis with built-in verification hooks. Meanwhile, evaluation-focused efforts such as LIFEBench[4] and Multi-Dimensional Constraint[5] highlight the need for rigorous benchmarks that go beyond surface-level metrics.

Open questions remain around balancing constraint diversity with data efficiency, and whether models trained on synthetic constraint-heavy data generalize to real-world multi-constraint scenarios. RECAST[0] contributes to this ongoing dialogue by offering a framework that bridges data synthesis and verifiability, positioning itself as a practical tool for scaling constraint-aware instruction tuning.

Claimed Contributions

RECAST framework for synthesizing multi-constraint instruction-following datasets

The authors introduce RECAST, a data-synthesis framework that systematically mines and verifies both rule-based and model-based constraints from existing prompt-response pairs. This framework enables the construction of instruction-following datasets with unprecedented constraint complexity, addressing limitations in existing benchmarks that typically contain fewer than 10 constraints per instance.

10 retrieved papers
Can Refute
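The framework's core idea of mining verifiable constraints from existing prompt-response pairs can be sketched as follows. This is a hedged illustration: the constraint schema below is invented for the example, and the paper's 19 actual constraint types are not reproduced here.

```python
# Hedged sketch of back-extracting constraints from an existing response,
# in the spirit of RECAST's extraction step: measure properties the response
# already satisfies, then restate them as explicit, checkable requirements.
def mine_constraints(response: str) -> list[dict]:
    """Derive verifiable constraints from a response so they can be
    appended to its prompt as explicit requirements."""
    constraints = []

    # Quantitative: a length bound the response already meets (margin is arbitrary).
    n_words = len(response.split())
    constraints.append({"type": "length",
                        "rule": f"respond in at most {n_words + 20} words"})

    # Format: detect bulleted structure.
    if any(ln.lstrip().startswith(("-", "*")) for ln in response.splitlines()):
        constraints.append({"type": "format", "rule": "use bulleted lists"})

    # Style: a trivially checkable surface property.
    if response and response[0].isupper():
        constraints.append({"type": "style",
                            "rule": "begin with a capitalized sentence"})
    return constraints
```

Because every mined constraint is derived from a response that satisfies it, the augmented instruction is guaranteed to have at least one valid completion, which is what keeps the synthesis pipeline scalable.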
RECAST-30K dataset with 30k instances spanning 19 constraint types

The authors release RECAST-30K, a large-scale dataset constructed using the RECAST framework. This dataset contains 30,000 training instances with diverse, verifiable constraints across 19 types, designed to benchmark and improve complex instruction-following performance in language models.

10 retrieved papers
RLVC reinforcement learning method using constraint-specific rewards

The authors propose RLVC (Reinforcement Learning with Verifiable Constraints), which exploits the verifiable nature of constraints in RECAST-30K to provide fine-grained, per-constraint reward signals during policy optimization. This method treats each constraint as a separate optimization target, enabling more effective learning for complex multi-constraint scenarios.

10 retrieved papers
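The per-constraint reward idea behind RLVC can be sketched minimally. The binary reward and the simple averaging aggregation below are assumptions for illustration; the paper may weight or shape rewards differently.

```python
# Hedged sketch of per-constraint reward signals in the spirit of RLVC:
# each verifiable constraint yields its own reward, rather than a single
# pass/fail signal for the whole response.
def constraint_rewards(response: str, validators) -> list[float]:
    """Return one binary reward per constraint validator."""
    return [1.0 if validate(response) else 0.0 for validate in validators]

def aggregate_reward(rewards: list[float]) -> float:
    """Collapse per-constraint rewards into a scalar for policy optimization
    (uniform averaging is an assumption, not necessarily the paper's choice)."""
    return sum(rewards) / len(rewards)
```

Keeping rewards separated per constraint preserves credit assignment: the policy learns which specific requirement it violated instead of receiving one coarse signal.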

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

RECAST framework for synthesizing multi-constraint instruction-following datasets


Contribution

RECAST-30K dataset with 30k instances spanning 19 constraint types


Contribution

RLVC reinforcement learning method using constraint-specific rewards
