Fair Policy Aggregation from Standard Policy Optimization
Overview
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors develop an algorithm that achieves quantile-fair policy aggregation by making O(n) calls to a policy optimization subroutine, avoiding explicit dependence on the MDP's state or action space size. This contrasts with prior methods requiring full access to transition probabilities.
The authors propose using a distribution over policies induced by individually optimal policies (the optimal occupancy distribution) rather than the uniform distribution over all policies. This choice enables tractable quantile estimation and avoids exponential sample complexity issues.
The authors design an algorithm based on the multiplicative weights update method that computes quantile-fair policies efficiently, requiring only O(log n) policy evaluations instead of solving large linear programs over states and actions.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[15] From fair solutions to compromise solutions in multi-objective deep reinforcement learning PDF
[36] Fairness in Preference-based Reinforcement Learning PDF
Contribution Analysis
Detailed comparisons for each claimed contribution
Efficient algorithm for quantile-fair policy aggregation using policy optimization as black-box
The authors develop an algorithm that achieves quantile-fair policy aggregation by making O(n) calls to a policy optimization subroutine, avoiding explicit dependence on the MDP's state or action space size. This contrasts with prior methods requiring full access to transition probabilities.
Optimal occupancy distribution for defining quantile fairness
The authors propose using a distribution over policies induced by individually optimal policies (the optimal occupancy distribution) rather than the uniform distribution over all policies. This choice enables tractable quantile estimation and avoids exponential sample complexity issues.
Multiplicative weights update method for computing quantile-fair policies
The authors design an algorithm based on the multiplicative weights update method that computes quantile-fair policies efficiently, requiring only O(log n) policy evaluations instead of solving large linear programs over states and actions.