PRO-MOF: Policy Optimization with Universal Atomistic Models for Controllable MOF Generation

ICLR 2026 Conference Submission (Anonymous Authors)
Keywords: metal-organic framework, material generation, AI for science, physical modeling
Abstract:

Inverse design of metal-organic frameworks (MOFs) that are physically stable, novel, and meet specific performance targets is a significant challenge. Existing generative models often struggle to explore the vast chemical and structural space effectively, leading to suboptimal solutions or mode collapse. To address this, we propose PRO-MOF, a hierarchical reinforcement learning (HRL) framework for controllable MOF generation. Our approach decouples the MOF design process into two policies: a high-level policy that proposes chemical building blocks and a low-level policy that assembles their 3D structures. By converting the deterministic flow matching model into a stochastic differential equation (SDE), we enable the low-level policy to perform effective stochastic exploration. The framework is optimized in a closed loop with high-fidelity physical reward signals provided by a pre-trained universal atomistic model (UMA). Furthermore, we introduce a Pass@K Group Relative Policy Optimization (GRPO) scheme that balances exploration and exploitation by rewarding in-group diversity. Experiments on multiple inverse design tasks, such as maximizing CO2 working capacity and targeting specific pore diameters, show that PRO-MOF significantly outperforms existing baselines, including diffusion-based methods and genetic algorithms, in both success rate and the discovery of top-performing materials. Our work demonstrates that hierarchical reinforcement learning combined with a high-fidelity physical environment is a powerful paradigm for complex material discovery problems.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces PRO-MOF, a hierarchical reinforcement learning framework for controllable MOF generation targeting specific performance metrics such as CO2 working capacity. It resides in the 'Deep Reinforcement Learning for MOF Inverse Design' leaf, which contains only two papers including this work. This sparse population suggests the leaf represents an emerging rather than saturated research direction within the broader taxonomy of fifty papers spanning nine major branches. The hierarchical decomposition into high-level chemical block selection and low-level 3D assembly policies distinguishes PRO-MOF from generic RL approaches.

The taxonomy reveals that PRO-MOF's leaf sits within the 'Reinforcement Learning and Optimization-Based Inverse Design' branch, which also includes a sibling leaf on genetic algorithms and evolutionary optimization. Neighboring branches employ generative AI methods such as diffusion models and large language models, which propose structures without iterative reward-driven refinement. The scope note for PRO-MOF's leaf explicitly excludes genetic algorithms and non-RL optimization, positioning the work at the intersection of sequential decision-making and neural generation. This boundary clarifies that PRO-MOF's novelty lies in combining RL with generative modeling rather than evolutionary search.

Among twenty-five candidates examined, the hierarchical RL framework and Pass@K GRPO scheme show no clear refutation across five and ten candidates respectively, suggesting limited prior work on these specific mechanisms within the search scope. However, the SDE-based stochastic exploration for flow matching models encountered two refutable candidates among ten examined, indicating that converting deterministic flow models to stochastic processes has precedent in related literature. The analysis does not claim exhaustive coverage; these statistics reflect top-K semantic matches and citation expansion, not a comprehensive field survey.

Given the limited search scope of twenty-five candidates, the framework appears to occupy a relatively underexplored niche combining hierarchical RL with flow-based generation for MOF design. The sparse population of the taxonomy leaf and the absence of refutation for two of three contributions suggest potential novelty, though the SDE conversion technique shows overlap with existing methods. A broader literature search would be necessary to confirm whether the hierarchical policy decomposition and GRPO scheme represent substantive advances or incremental refinements of known techniques.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 25
Refutable papers: 2

Research Landscape Overview

Core task: Controllable inverse design of metal-organic frameworks with target properties. The field has evolved into a rich landscape organized around several complementary strategies. Generative AI and deep learning methods leverage neural architectures such as diffusion models and large language models to propose novel MOF structures, while reinforcement learning and optimization-based approaches frame inverse design as a sequential decision-making problem where agents learn to navigate chemical space toward desired properties. Application-specific branches focus on gas separation and storage challenges, machine learning prediction emphasizes descriptor engineering for property forecasting, and reticular chemistry exploits topology-driven design principles. Meanwhile, defect engineering and controlled synthesis branches address structural modification and nanoarchitecture fabrication, and specialized applications explore property tuning for catalysis, sensing, and beyond.

Representative works such as MOFGPT[2] and Deep Dreaming MOFs[1] illustrate how generative models can propose candidates, whereas dziner[11] and Reverse Topology Prediction[13] demonstrate topology-aware inverse design strategies. A particularly active line of work centers on deep reinforcement learning for MOF inverse design, where agents iteratively refine structures by balancing exploration of chemical space with exploitation of known high-performing motifs. PRO-MOF[0] sits squarely within this branch, employing reinforcement learning to steer the generation process toward frameworks meeting specific adsorption or separation targets. This contrasts with purely generative approaches like Multimodal Diffusion MOFs[14] that sample structures without explicit reward-driven optimization, and with application-focused studies such as Direct Air Capture[5] that prioritize experimental validation over algorithmic novelty.
The main trade-off across these directions involves the tension between computational efficiency, chemical validity, and the ability to incorporate domain constraints. PRO-MOF[0] addresses this by integrating property prediction within the RL loop, positioning it alongside works like Inverse Design ZIFs[19] that similarly combine optimization with structural feasibility checks, yet differing in the granularity of control over multivariate composition and defect incorporation.

Claimed Contributions

PRO-MOF hierarchical reinforcement learning framework

The authors introduce a hierarchical RL framework that decouples MOF design into two policies: a high-level policy for proposing chemical building blocks and a low-level policy for assembling 3D structures. The framework is optimized using reward signals from a universal atomistic model.
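The two-level decomposition can be illustrated with a toy rollout. This is a minimal sketch, not the paper's implementation: the names (`BLOCK_LIBRARY`, `propose_blocks`, `assemble`, `uma_reward`) are hypothetical stand-ins, and the reward is a trivial surrogate rather than a real UMA evaluation.

```python
import random

# Illustrative stand-in for a library of chemical building blocks.
BLOCK_LIBRARY = ["Zn4O", "Cu2(COO)4", "BDC", "BTC", "DABCO"]

def propose_blocks(rng, k=3):
    """High-level policy: sample k chemical building blocks."""
    return [rng.choice(BLOCK_LIBRARY) for _ in range(k)]

def assemble(blocks, rng):
    """Low-level policy: assign toy 3D placements to each block."""
    return [(b, (rng.random(), rng.random(), rng.random())) for b in blocks]

def uma_reward(structure):
    """Toy surrogate for the UMA-based physical reward (prefers metal nodes)."""
    metals = sum(1 for b, _ in structure if b in ("Zn4O", "Cu2(COO)4"))
    return metals / len(structure)

def rollout(seed=0):
    """One closed-loop step: high-level action, low-level action, reward."""
    rng = random.Random(seed)
    blocks = propose_blocks(rng)       # high-level decision
    structure = assemble(blocks, rng)  # low-level assembly
    return structure, uma_reward(structure)

structure, reward = rollout()
```

In the actual framework both policies would be learned networks updated from the reward; here the rollout only demonstrates the decoupling of chemistry proposal from geometric assembly.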

5 retrieved papers
Pass@K Group Relative Policy Optimization scheme

The authors propose a Pass@K-inspired reward and advantage estimation scheme adapted from large language model training to the materials domain. This scheme promotes structural diversity and mitigates mode collapse by rewarding the generation of at least one successful candidate within a batch of diverse attempts.
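The group-relative mechanics can be sketched as follows. This is an illustrative reconstruction, not the paper's exact formula: it combines standard GRPO-style advantage normalization over a group of sampled candidates with a shared bonus whenever at least one group member clears the success threshold, which is the Pass@K intuition described above. The threshold and bonus values are assumptions.

```python
import statistics

def passk_grpo_advantages(rewards, success_threshold=0.5, pass_bonus=0.1):
    """Illustrative Pass@K-style GRPO advantages for one group of rollouts.

    Each candidate's reward is normalized against the group mean and
    standard deviation; if any candidate in the group succeeds, every
    member receives a shared pass bonus, rewarding groups that contain
    at least one successful, diverse attempt.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1e-8  # guard against zero spread
    group_passed = any(r >= success_threshold for r in rewards)
    bonus = pass_bonus if group_passed else 0.0
    return [(r - mean) / std + bonus for r in rewards]

advs = passk_grpo_advantages([0.2, 0.8, 0.4, 0.1])
```

Because the normalized terms sum to zero within a group, the pass bonus shifts the whole group's mean advantage, so the policy is credited for producing a batch that contains a success rather than only for individual high scores.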

10 retrieved papers
SDE-based stochastic exploration for flow matching models

The authors convert the deterministic flow matching ODE into an equivalent stochastic differential equation to enable the low-level geometric policy to perform stochastic exploration, which is necessary for effective reinforcement learning-based optimization.
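The conversion the authors describe matches a standard identity in the flow matching literature (and is likely why refuting prior work was found): for a velocity field whose ODE transports the marginals, adding noise of scale $\sigma_t$ together with a compensating score term yields an SDE with the same time marginals. A sketch of that generic identity follows; the paper's specific $\sigma_t$ schedule and parameterization are not reproduced here.

```latex
% Deterministic flow matching ODE with velocity field v_t generating marginals p_t:
\begin{align}
  \mathrm{d}x_t &= v_t(x_t)\,\mathrm{d}t
  && \text{(deterministic ODE)} \\
  \mathrm{d}x_t &= \Big[ v_t(x_t) + \tfrac{\sigma_t^2}{2}\,\nabla_x \log p_t(x_t) \Big]\,\mathrm{d}t
                  + \sigma_t\,\mathrm{d}W_t
  && \text{(marginal-preserving SDE)}
\end{align}
% By the Fokker--Planck equation, both processes share the marginals p_t for any
% sigma_t >= 0; the injected Brownian noise dW_t is what gives the low-level
% policy a stochastic action distribution suitable for RL optimization.
```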

10 retrieved papers
Verdict: can refute. Two of the ten retrieved candidates overlap with this contribution.

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

PRO-MOF hierarchical reinforcement learning framework

The authors introduce a hierarchical RL framework that decouples MOF design into two policies: a high-level policy for proposing chemical building blocks and a low-level policy for assembling 3D structures. The framework is optimized using reward signals from a universal atomistic model.

Contribution

Pass@K Group Relative Policy Optimization scheme

The authors propose a Pass@K-inspired reward and advantage estimation scheme adapted from large language model training to the materials domain. This scheme promotes structural diversity and mitigates mode collapse by rewarding the generation of at least one successful candidate within a batch of diverse attempts.

Contribution

SDE-based stochastic exploration for flow matching models

The authors convert the deterministic flow matching ODE into an equivalent stochastic differential equation to enable the low-level geometric policy to perform stochastic exploration, which is necessary for effective reinforcement learning-based optimization.