PerfGuard: A Performance-Aware Agent for Visual Content Generation

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Agent, Large Language Model, Image Generation, Image Editing
Abstract:

The advancement of Large Language Model (LLM)-powered agents has enabled automated task processing through reasoning and tool invocation capabilities. However, existing frameworks often operate under the idealized assumption that tool executions are invariably successful, relying solely on textual descriptions that fail to distinguish precise performance boundaries and cannot adapt to iterative tool updates. This gap introduces uncertainty in planning and execution, particularly in domains like visual content generation (AIGC), where nuanced tool performance significantly impacts outcomes. To address this, we propose PerfGuard, a performance-aware agent framework for visual content generation that systematically models tool performance boundaries and integrates them into task planning and scheduling. Our framework introduces three core mechanisms: (1) Performance-Aware Selection Modeling (PASM), which replaces generic tool descriptions with a multi-dimensional scoring system based on fine-grained performance evaluations; (2) Adaptive Preference Update (APU), which dynamically optimizes tool selection by comparing theoretical rankings with actual execution rankings; and (3) Capability-Aligned Planning Optimization (CAPO), which guides the planner to generate subtasks aligned with performance-aware strategies. Experimental comparisons against state-of-the-art methods demonstrate PerfGuard’s advantages in tool selection accuracy, execution reliability, and alignment with user intent, validating its robustness and practical utility for complex AIGC tasks.

Disclaimer
This report is AI-generated using Large Language Models and WisPaper (a scholarly search engine). It analyzes the paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces PerfGuard, a framework that replaces generic tool descriptions with multi-dimensional performance scoring and adaptive preference updates for visual content generation agents. According to the taxonomy, it occupies the 'Multi-Dimensional Performance Modeling and Adaptive Selection' leaf under 'Performance-Aware Agent Frameworks for Visual Generation'. Notably, this leaf contains only the original paper itself, with no sibling papers identified, suggesting this specific combination of fine-grained performance modeling and dynamic optimization represents a relatively sparse research direction within the broader field of agentic visual generation.

The taxonomy reveals that the broader 'Performance-Aware Agent Frameworks' branch contains one sibling leaf focused on 'Agentic Super-Resolution with Customized Pipeline Profiling', indicating that performance-aware approaches exist but target different visual tasks. The neighboring 'General-Purpose Agentic Systems' branch encompasses three leaves addressing minimal agentic behavior, educational content generation, and creative industry applications, all of which incorporate tool selection without specialized performance modeling. This structural positioning suggests PerfGuard bridges a gap between general-purpose tool selection frameworks and domain-specific optimization, carving out a niche that emphasizes explicit performance boundaries rather than relying on textual descriptions or static planning.

Among the 21 candidates examined through semantic search and citation expansion, none were found to refute the three core contributions. For Performance-Aware Selection Modeling (PASM), 10 candidates were examined with zero refutable overlaps; Adaptive Preference Update (APU) examined 1 candidate with no refutations; and Capability-Aligned Planning Optimization (CAPO) examined 10 candidates, also with zero refutations. This limited search scope suggests that within the top-K semantic matches analyzed, no prior work explicitly combines multi-dimensional scoring, dynamic preference updates, and capability-aligned planning in the same manner, though the analysis does not claim exhaustive coverage of all potentially relevant literature.

Based on the taxonomy structure and the limited literature search, the work appears to address an underexplored intersection of performance modeling and adaptive tool selection for visual generation. The absence of sibling papers in its taxonomy leaf and zero refutations across 21 candidates examined indicate potential novelty, though the scope remains constrained to top-K semantic matches and does not encompass broader manual surveys or domain-specific venues that might reveal closer prior work.

Taxonomy

Core-task Taxonomy Papers: 5
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 0

Research Landscape Overview

Core task: performance-aware tool selection for visual content generation agents. The field organizes around three main branches. The first branch, Performance-Aware Agent Frameworks for Visual Generation, focuses on systems that explicitly model and optimize performance dimensions—such as quality, latency, and resource consumption—when selecting among multiple generative tools. The second branch, General-Purpose Agentic Systems with Tool Selection Capabilities, encompasses broader agent architectures that incorporate tool selection as one component among many reasoning and planning tasks, often without specialized performance modeling. The third branch, Multi-Agent Detection and Analysis Systems for Visual Content, addresses collaborative or multi-agent setups where detection, analysis, or verification tasks complement generation, sometimes requiring coordination across agents with different capabilities.

Representative works like 4kagent[1] and UniShield[2] illustrate how agents can integrate tool selection with broader reasoning or safety mechanisms, while AI Green Creativity[3] highlights sustainability considerations in generative workflows. A particularly active line of work explores the trade-offs between generation quality, computational cost, and environmental impact, as seen in discussions around green AI practices[3] and the broader vision for agentic AI systems[4]. Another emerging theme involves personalized or adaptive selection strategies, where agents tailor tool choices to user preferences or educational contexts[5].

Within this landscape, PerfGuard[0] sits squarely in the Performance-Aware Agent Frameworks branch, emphasizing multi-dimensional performance modeling and adaptive selection. Compared to more general-purpose systems like 4kagent[1], which balances tool selection with diverse reasoning tasks, PerfGuard[0] concentrates on explicitly optimizing performance metrics during the selection process. This focus distinguishes it from works that treat tool selection as a secondary concern, positioning it among efforts that prioritize measurable efficiency and quality guarantees in visual generation pipelines.

Claimed Contributions

Performance-Aware Selection Modeling (PASM)

A mechanism that systematically models tool performance boundaries using multi-dimensional scoring across specific capability dimensions (e.g., color, shape, texture for generation; addition, removal, replacement for editing) rather than relying on generic textual descriptions. This enables precise task-tool matching by computing weighted suitability scores for tool selection.

10 retrieved papers
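The weighted suitability scoring that PASM describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tool names, capability dimensions, score values, and task weights are all assumptions chosen for the example.

```python
# Hypothetical sketch of PASM-style weighted suitability scoring.
# Tool names, capability dimensions, and score values are illustrative only.

# Per-tool performance boundary scores on fine-grained dimensions (0-1 scale),
# standing in for the paper's multi-dimensional performance evaluations.
PERF = {
    "tool_a": {"color": 0.9, "shape": 0.6, "texture": 0.7},
    "tool_b": {"color": 0.5, "shape": 0.8, "texture": 0.9},
}

def suitability(tool: str, weights: dict) -> float:
    """Weighted suitability: sum over dimensions of task weight * tool score."""
    return sum(w * PERF[tool].get(dim, 0.0) for dim, w in weights.items())

def select_tool(weights: dict) -> str:
    """Pick the tool whose weighted score best matches the task's needs."""
    return max(PERF, key=lambda t: suitability(t, weights))

# A task that mostly needs accurate color rendition.
task_weights = {"color": 0.7, "shape": 0.1, "texture": 0.2}
print(select_tool(task_weights))  # tool_a: 0.9*0.7 + 0.6*0.1 + 0.7*0.2 = 0.83
```

Replacing generic textual descriptions with such per-dimension scores is what allows the selector to distinguish tools that sound similar on paper but differ on specific capabilities.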
Adaptive Preference Update (APU)

A feedback-driven mechanism that iteratively refines the tool performance boundary matrix by comparing predicted tool rankings with observed execution performance. It employs an exploration-exploitation strategy and adjusts performance scores based on the difference between theoretical and actual rankings to improve real-world adaptability.

1 retrieved paper
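The rank-comparison update at the core of APU can be sketched as below. This is a hedged approximation under stated assumptions: the learning rate, the epsilon-greedy exploration value, and the linear rank-gap update rule are illustrative stand-ins for whatever scheme the paper actually uses.

```python
import random

# Hypothetical sketch of an APU-style update: adjust each tool's score in
# proportion to the gap between its predicted and observed rank.
# The learning rate and epsilon values are illustrative assumptions.

def rank(scores: dict) -> dict:
    """Map each tool to its rank (0 = best) under the given scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {tool: i for i, tool in enumerate(ordered)}

def apu_update(scores: dict, observed_scores: dict, lr: float = 0.1) -> dict:
    """Shift scores toward agreement with the observed execution ranking."""
    predicted = rank(scores)
    actual = rank(observed_scores)
    for tool in scores:
        # Positive gap: the tool performed better than predicted, raise its score.
        gap = predicted[tool] - actual[tool]
        scores[tool] += lr * gap
    return scores

def choose(scores: dict, epsilon: float = 0.1) -> str:
    """Epsilon-greedy exploration-exploitation over the current scores."""
    if random.random() < epsilon:
        return random.choice(list(scores))
    return max(scores, key=scores.get)

scores = {"tool_a": 0.8, "tool_b": 0.6}
# Execution feedback says tool_b actually did better on this task type.
scores = apu_update(scores, {"tool_a": 0.5, "tool_b": 0.9})
print(scores)  # tool_a nudged down, tool_b nudged up
```

The point of the sketch is the feedback loop: theoretical rankings come from the boundary matrix, actual rankings from execution, and the difference drives the score adjustment.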
Capability-Aligned Planning Optimization (CAPO)

An optimization mechanism that extends Step-aware Preference Optimization to align the Planner's autoregressive decision-making with tool performance boundaries. It generates multiple candidate subtasks per step, evaluates them using a Decision Performance Estimator, and optimizes planning through stepwise supervision to ensure consistency with performance-aware tool selection.

10 retrieved papers
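The stepwise loop that CAPO describes, generating candidate subtasks, scoring them, and collecting preference pairs for supervision, can be sketched as follows. Both `generate_candidates` and `estimate` are hypothetical stand-ins: in the paper these roles are played by the LLM Planner and the learned Decision Performance Estimator.

```python
# Hypothetical sketch of CAPO-style stepwise candidate selection.
# generate_candidates and estimate are stand-ins for the LLM planner and
# the Decision Performance Estimator described in the paper.

def generate_candidates(step_context: str) -> list:
    """Stand-in for the planner proposing multiple candidate subtasks."""
    return [f"{step_context}:variant{i}" for i in range(3)]

def estimate(subtask: str) -> float:
    """Stand-in estimator: score a candidate subtask (illustrative heuristic)."""
    return 1.0 / (1 + int(subtask.rsplit("variant", 1)[1]))

def capo_step(step_context: str):
    """Score candidates and return (chosen subtask, preference pairs).

    The (preferred, rejected) pairs provide the stepwise supervision used
    to align the planner's autoregressive decisions with tool performance."""
    ranked = sorted(generate_candidates(step_context), key=estimate, reverse=True)
    best = ranked[0]
    pairs = [(best, worse) for worse in ranked[1:]]
    return best, pairs

best, pairs = capo_step("edit:remove_object")
```

Collecting (preferred, rejected) pairs per step, rather than per full plan, is what makes the supervision "step-aware" in the sense of Step-aware Preference Optimization.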

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, though still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Performance-Aware Selection Modeling (PASM). 10 candidates examined; no refutable overlaps found.

Contribution: Adaptive Preference Update (APU). 1 candidate examined; no refutable overlaps found.

Contribution: Capability-Aligned Planning Optimization (CAPO). 10 candidates examined; no refutable overlaps found.
