PRO-MOF: Policy Optimization with Universal Atomistic Models for Controllable MOF Generation

ICLR 2026 Conference Submission (Anonymous Authors)
Keywords: metal-organic framework, material generation, AI for science, physical modeling
Abstract:

Inverse design of metal-organic frameworks (MOFs) that are physically stable, novel, and meet specific performance targets is a significant challenge. Existing generative models often struggle to explore the vast chemical and structural space effectively, leading to suboptimal solutions or mode collapse. To address this, we propose PRO-MOF, a hierarchical reinforcement learning (HRL) framework for controllable MOF generation. Our approach decouples the MOF design process into two policies: a high-level policy that proposes chemical building blocks and a low-level policy that assembles their 3D structures. By converting the deterministic flow matching model into a stochastic differential equation (SDE), we enable the low-level policy to perform effective stochastic exploration. The framework is optimized in a closed loop with high-fidelity physical reward signals provided by a pre-trained universal atomistic model (UMA). Furthermore, we introduce a Pass@K Group Relative Policy Optimization (GRPO) scheme that balances exploration and exploitation by rewarding in-group diversity. Experiments on multiple inverse design tasks, such as maximizing CO2 working capacity and targeting specific pore diameters, show that PRO-MOF significantly outperforms existing baselines, including diffusion-based methods and genetic algorithms, in both success rate and the discovery of top-performing materials. Our work demonstrates that hierarchical reinforcement learning combined with a high-fidelity physical environment is a powerful paradigm for complex material discovery problems.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces PRO-MOF, a hierarchical reinforcement learning framework for controllable MOF generation targeting specific performance metrics such as CO2 working capacity. It resides in the 'Deep Reinforcement Learning for MOF Inverse Design' leaf, which contains only two papers including this work. This sparse population suggests the leaf represents an emerging rather than saturated research direction within the broader taxonomy of fifty papers spanning nine major branches. The hierarchical decomposition into high-level chemical block selection and low-level 3D assembly policies distinguishes PRO-MOF from generic RL approaches.

The taxonomy reveals that PRO-MOF's leaf sits within the 'Reinforcement Learning and Optimization-Based Inverse Design' branch, which also includes a sibling leaf on genetic algorithms and evolutionary optimization. Neighboring branches employ generative AI methods such as diffusion models and large language models, which propose structures without iterative reward-driven refinement. The scope note for PRO-MOF's leaf explicitly excludes genetic algorithms and non-RL optimization, positioning the work at the intersection of sequential decision-making and neural generation. This boundary clarifies that PRO-MOF's novelty lies in combining RL with generative modeling rather than evolutionary search.

Among twenty-five candidates examined, the hierarchical RL framework and Pass@K GRPO scheme show no clear refutation across five and ten candidates respectively, suggesting limited prior work on these specific mechanisms within the search scope. However, the SDE-based stochastic exploration for flow matching models encountered two refutable candidates among ten examined, indicating that converting deterministic flow models to stochastic processes has precedent in related literature. The analysis does not claim exhaustive coverage; these statistics reflect top-K semantic matches and citation expansion, not a comprehensive field survey.

Given the limited search scope of twenty-five candidates, the framework appears to occupy a relatively underexplored niche combining hierarchical RL with flow-based generation for MOF design. The sparse population of the taxonomy leaf and the absence of refutation for two of three contributions suggest potential novelty, though the SDE conversion technique shows overlap with existing methods. A broader literature search would be necessary to confirm whether the hierarchical policy decomposition and GRPO scheme represent substantive advances or incremental refinements of known techniques.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 25
Refutable papers: 2

Research Landscape Overview

Core task: Controllable inverse design of metal-organic frameworks with target properties. The field has evolved into a rich landscape organized around several complementary strategies. Generative AI and deep learning methods leverage neural architectures such as diffusion models and large language models to propose novel MOF structures, while reinforcement learning and optimization-based approaches frame inverse design as a sequential decision-making problem where agents learn to navigate chemical space toward desired properties. Application-specific branches focus on gas separation and storage challenges, machine learning prediction emphasizes descriptor engineering for property forecasting, and reticular chemistry exploits topology-driven design principles. Meanwhile, defect engineering and controlled synthesis branches address structural modification and nanoarchitecture fabrication, and specialized applications explore property tuning for catalysis, sensing, and beyond.

Representative works such as MOFGPT[2] and Deep Dreaming MOFs[1] illustrate how generative models can propose candidates, whereas dziner[11] and Reverse Topology Prediction[13] demonstrate topology-aware inverse design strategies. A particularly active line of work centers on deep reinforcement learning for MOF inverse design, where agents iteratively refine structures by balancing exploration of chemical space with exploitation of known high-performing motifs. PRO-MOF[0] sits squarely within this branch, employing reinforcement learning to steer the generation process toward frameworks meeting specific adsorption or separation targets. This contrasts with purely generative approaches like Multimodal Diffusion MOFs[14] that sample structures without explicit reward-driven optimization, and with application-focused studies such as Direct Air Capture[5] that prioritize experimental validation over algorithmic novelty.
The main trade-off across these directions involves the tension between computational efficiency, chemical validity, and the ability to incorporate domain constraints. PRO-MOF[0] addresses this by integrating property prediction within the RL loop, positioning it alongside works like Inverse Design ZIFs[19] that similarly combine optimization with structural feasibility checks, yet differing in the granularity of control over multivariate composition and defect incorporation.

Claimed Contributions

PRO-MOF hierarchical reinforcement learning framework

The authors introduce a hierarchical RL framework that decouples MOF design into two policies: a high-level policy for proposing chemical building blocks and a low-level policy for assembling 3D structures. The framework is optimized using reward signals from a universal atomistic model.
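The two-level decomposition can be illustrated with a toy rollout. This is a minimal sketch, not the paper's implementation: the names (`BLOCK_LIBRARY`, `propose_blocks`, `assemble`, `uma_reward`) are hypothetical stand-ins, and the reward is a trivial surrogate rather than a real UMA evaluation.

```python
import random

# Illustrative stand-in for a library of chemical building blocks.
BLOCK_LIBRARY = ["Zn4O", "Cu2(COO)4", "BDC", "BTC", "DABCO"]

def propose_blocks(rng, k=3):
    """High-level policy: sample k chemical building blocks."""
    return [rng.choice(BLOCK_LIBRARY) for _ in range(k)]

def assemble(blocks, rng):
    """Low-level policy: assign toy 3D placements to each block."""
    return [(b, (rng.random(), rng.random(), rng.random())) for b in blocks]

def uma_reward(structure):
    """Toy surrogate for the UMA-based physical reward (prefers metal nodes)."""
    metals = sum(1 for b, _ in structure if b in ("Zn4O", "Cu2(COO)4"))
    return metals / len(structure)

def rollout(seed=0):
    """One closed-loop step: high-level action, low-level action, reward."""
    rng = random.Random(seed)
    blocks = propose_blocks(rng)       # high-level decision
    structure = assemble(blocks, rng)  # low-level assembly
    return structure, uma_reward(structure)

structure, reward = rollout()
```

In the actual framework both policies would be learned networks updated from the reward; here the rollout only demonstrates the decoupling of chemistry proposal from geometric assembly.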

5 retrieved papers
Pass@K Group Relative Policy Optimization scheme

The authors propose a Pass@K-inspired reward and advantage estimation scheme adapted from large language model training to the materials domain. This scheme promotes structural diversity and mitigates mode collapse by rewarding the generation of at least one successful candidate within a batch of diverse attempts.
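The group-relative mechanics can be sketched as follows. This is an illustrative reconstruction, not the paper's exact formula: it combines standard GRPO-style advantage normalization over a group of sampled candidates with a shared bonus whenever at least one group member clears the success threshold, which is the Pass@K intuition described above. The threshold and bonus values are assumptions.

```python
import statistics

def passk_grpo_advantages(rewards, success_threshold=0.5, pass_bonus=0.1):
    """Illustrative Pass@K-style GRPO advantages for one group of rollouts.

    Each candidate's reward is normalized against the group mean and
    standard deviation; if any candidate in the group succeeds, every
    member receives a shared pass bonus, rewarding groups that contain
    at least one successful, diverse attempt.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1e-8  # guard against zero spread
    group_passed = any(r >= success_threshold for r in rewards)
    bonus = pass_bonus if group_passed else 0.0
    return [(r - mean) / std + bonus for r in rewards]

advs = passk_grpo_advantages([0.2, 0.8, 0.4, 0.1])
```

Because the normalized terms sum to zero within a group, the pass bonus shifts the whole group's mean advantage, so the policy is credited for producing a batch that contains a success rather than only for individual high scores.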

10 retrieved papers
SDE-based stochastic exploration for flow matching models

The authors convert the deterministic flow matching ODE into an equivalent stochastic differential equation to enable the low-level geometric policy to perform stochastic exploration, which is necessary for effective reinforcement learning-based optimization.
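The conversion the authors describe matches a standard identity in the flow matching literature (and is likely why refuting prior work was found): for a velocity field whose ODE transports the marginals, adding noise of scale $\sigma_t$ together with a compensating score term yields an SDE with the same time marginals. A sketch of that generic identity follows; the paper's specific $\sigma_t$ schedule and parameterization are not reproduced here.

```latex
% Deterministic flow matching ODE with velocity field v_t generating marginals p_t:
\begin{align}
  \mathrm{d}x_t &= v_t(x_t)\,\mathrm{d}t
  && \text{(deterministic ODE)} \\
  \mathrm{d}x_t &= \Big[ v_t(x_t) + \tfrac{\sigma_t^2}{2}\,\nabla_x \log p_t(x_t) \Big]\,\mathrm{d}t
                  + \sigma_t\,\mathrm{d}W_t
  && \text{(marginal-preserving SDE)}
\end{align}
% By the Fokker--Planck equation, both processes share the marginals p_t for any
% sigma_t >= 0; the injected Brownian noise dW_t is what gives the low-level
% policy a stochastic action distribution suitable for RL optimization.
```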

10 retrieved papers
Verdict: can refute. Two of the ten retrieved candidates overlap with this contribution.

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

PRO-MOF hierarchical reinforcement learning framework

The authors introduce a hierarchical RL framework that decouples MOF design into two policies: a high-level policy for proposing chemical building blocks and a low-level policy for assembling 3D structures. The framework is optimized using reward signals from a universal atomistic model.

Contribution

Pass@K Group Relative Policy Optimization scheme

The authors propose a Pass@K-inspired reward and advantage estimation scheme adapted from large language model training to the materials domain. This scheme promotes structural diversity and mitigates mode collapse by rewarding the generation of at least one successful candidate within a batch of diverse attempts.

Contribution

SDE-based stochastic exploration for flow matching models

The authors convert the deterministic flow matching ODE into an equivalent stochastic differential equation to enable the low-level geometric policy to perform stochastic exploration, which is necessary for effective reinforcement learning-based optimization.