REALIGN: Regularized Procedure Alignment with Matching Video Embeddings via Partial Gromov-Wasserstein Optimal Transport
Overview
Overall Novelty Assessment
The paper introduces REALIGN, a framework based on Regularized Fused Partial Gromov-Wasserstein Optimal Transport for unsupervised procedure learning from instructional videos. It resides in the Optimal Transport-Based Alignment leaf, which contains only two papers including this one. This leaf sits within the broader Temporal Alignment and Correspondence Learning branch, which encompasses three distinct methodological approaches. The sparse population of this specific leaf suggests that optimal transport formulations for procedural alignment remain relatively underexplored compared to embedding-based or cross-modal methods.
The taxonomy reveals that Temporal Alignment and Correspondence Learning is one of eight major research directions in the field. Neighboring branches include Step Discovery and Segmentation, which focuses on identifying action boundaries without alignment, and Procedure Representation and Task Modeling, which builds structured task graphs. The scope note for Optimal Transport-Based Alignment explicitly excludes contrastive or embedding methods, positioning REALIGN within a methodologically distinct subfield. The broader Temporal Alignment branch contains eleven papers across three leaves, indicating moderate activity in correspondence learning overall but concentration in embedding-based approaches rather than transport-based formulations.
Ten candidate papers were examined in total. For the core REALIGN framework contribution, one of the six candidates examined was flagged as refutable; for the unified alignment loss contribution, one of the four candidates examined was flagged as refutable. The partial alignment scheme contribution was not tested against any candidates. Because the search covered only ten candidates rather than an exhaustive survey, these statistics reflect only top-K semantic matches and immediate citations. The presence of refutable candidates for two of the three contributions suggests some overlap with prior work in the examined sample, though the small scale of the examination leaves substantial uncertainty about the broader literature.
Given the sparse population of the Optimal Transport-Based Alignment leaf and the limited search scope, the analysis captures a narrow slice of potentially relevant work. The taxonomy structure indicates this is a methodologically specialized area within a diverse field, but the ten-candidate examination cannot definitively characterize novelty relative to the full literature. The refutable candidates identified represent overlaps within the examined sample, not a comprehensive prior-art assessment.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose REALIGN, a novel unsupervised procedure learning framework that extends Fused Gromov-Wasserstein Optimal Transport with partial alignment constraints. This formulation jointly models visual correspondences and temporal relations while enabling robust handling of irrelevant frames, repeated actions, and non-monotonic step orders common in instructional videos.
The method introduces a partial transport formulation that relaxes balanced marginal constraints by incorporating a virtual sink node. This allows irrelevant or background frames to be mapped to a null mass instead of being forced into spurious correspondences, addressing a key limitation of prior fully balanced optimal transport methods.
The authors develop a unified loss function that combines Fused Partial Gromov-Wasserstein Optimal Transport (FPGWOT) distances with Laplace-shaped temporal priors, structural regularization, and inter-sequence contrastive learning. This integration stabilizes training by avoiding degenerate solutions and preventing collapse to trivial mappings without requiring multiple separate regularizers.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[7] Unsupervised procedure learning via joint dynamic summarization
Contribution Analysis
Detailed comparisons for each claimed contribution
REALIGN framework based on Regularized Fused Partial Gromov-Wasserstein Optimal Transport
The authors propose REALIGN, a novel unsupervised procedure learning framework that extends Fused Gromov-Wasserstein Optimal Transport with partial alignment constraints. This formulation jointly models visual correspondences and temporal relations while enabling robust handling of irrelevant frames, repeated actions, and non-monotonic step orders common in instructional videos.
[49] Procedure learning via regularized Gromov-Wasserstein optimal transport
[50] THESAURUS: Contrastive Graph Clustering by Swapping Fused Gromov-Wasserstein Couplings
[51] Weakly-Supervised Temporal Action Alignment Driven by Unbalanced Spectral Fused Gromov-Wasserstein Distance
[52] A Fused Gromov-Wasserstein Approach to Subgraph Contrastive Learning
[53] A Fused Gromov-Wasserstein Framework for Unsupervised Knowledge Graph Entity Alignment
[54] Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation
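For orientation, the Fused Gromov-Wasserstein objective that the REALIGN framework contribution extends can be evaluated for a given coupling roughly as in the sketch below. This is a minimal NumPy illustration and not the authors' implementation: the function name `fgw_cost`, the trade-off weight `alpha`, the squared loss on the structure matrices, and the toy data are all assumptions made here for illustration.

```python
import numpy as np

def fgw_cost(M, C1, C2, T, alpha=0.5):
    """Fused Gromov-Wasserstein cost of a coupling T (hypothetical helper).

    M  : (n, m) feature cost between frames of the two videos
    C1 : (n, n) intra-video structure (e.g. temporal distance) for video 1
    C2 : (m, m) intra-video structure for video 2
    T  : (n, m) coupling (transport plan)
    """
    # Linear (Wasserstein) term: visual feature cost weighted by the coupling.
    linear = np.sum(M * T)
    # Quadratic (Gromov-Wasserstein) term, brute force over (i, j, k, l):
    # diff[i, j, k, l] = C1[i, k] - C2[j, l]. Only suitable for small inputs.
    diff = C1[:, None, :, None] - C2[None, :, None, :]
    quadratic = np.einsum("ijkl,ij,kl->", diff ** 2, T, T)
    return (1.0 - alpha) * linear + alpha * quadratic

# Toy data (assumed): two identical 6-frame videos with 4-d frame features.
rng = np.random.default_rng(0)
n = 6
X = rng.normal(size=(n, 4))
Y = X.copy()
M = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost
t = np.arange(n) / n
C = np.abs(t[:, None] - t[None, :])                 # normalized temporal distance

T_identity = np.eye(n) / n              # frame i matched to frame i
T_uniform = np.full((n, n), 1.0 / n**2)  # every frame spread over all frames

cost_identity = fgw_cost(M, C, C, T_identity)
cost_uniform = fgw_cost(M, C, C, T_uniform)
```

For identical videos the diagonal coupling incurs no feature or structure cost, whereas the uniform coupling is penalized by both terms, which is the behavior the fused objective is meant to reward.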
Partial alignment scheme with virtual sink node for handling background and redundant frames
The method introduces a partial transport formulation that relaxes balanced marginal constraints by incorporating a virtual sink node. This allows irrelevant or background frames to be mapped to a null mass instead of being forced into spurious correspondences, addressing a key limitation of prior fully balanced optimal transport methods.
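The virtual sink idea can be illustrated with the standard reduction of partial optimal transport to a balanced problem on an augmented cost matrix. The sketch below is an assumption-laden toy, not the paper's formulation: it covers only the linear (Wasserstein) case rather than the fused Gromov-Wasserstein objective, it uses a plain Sinkhorn solver, and the names `sinkhorn`, `partial_ot_with_sink`, `sink_cost`, and the transported `mass` fraction are hypothetical.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.05, n_iter=500):
    """Entropic OT between histograms a and b via plain Sinkhorn iterations."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def partial_ot_with_sink(cost, p, q, mass, sink_cost=0.0):
    """Partial OT via one extra (sink) row and column; hypothetical helper."""
    n, m = cost.shape
    # A large corner cost discourages sink-to-sink transport.
    big = 10.0 * max(cost.max(), 1.0)
    aug = np.full((n + 1, m + 1), sink_cost)
    aug[:n, :m] = cost
    aug[n, m] = big
    # The sinks absorb whatever mass is not matched between real frames.
    p_aug = np.append(p, q.sum() - mass)
    q_aug = np.append(q, p.sum() - mass)
    return sinkhorn(p_aug, q_aug, aug)

# Toy 1-d "frame features" (assumed): the last frame of x is background.
x = np.array([0.0, 0.1, 0.2, 5.0])
y = np.array([0.0, 0.1, 0.2])
cost = (x[:, None] - y[None, :]) ** 2
cost /= cost.max()                      # normalize for numerical stability
p = np.full(4, 0.25)
q = np.full(3, 1.0 / 3.0)

T = partial_ot_with_sink(cost, p, q, mass=0.75)
sink_mass_background = T[3, -1]         # background frame -> sink column
sink_mass_relevant = T[:3, -1].sum()    # relevant frames -> sink column
```

In this toy setup nearly all of the background frame's mass is routed to the sink column, while the three relevant frames are matched to the other sequence. Under a fully balanced formulation the background frame would instead be forced onto a real frame at high cost.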
Unified alignment loss integrating temporal priors and contrastive regularization
The authors develop a unified loss function that combines FPGWOT distances with Laplace-shaped temporal priors, structural regularization, and inter-sequence contrastive learning. This integration stabilizes training by avoiding degenerate solutions and preventing collapse to trivial mappings without requiring multiple separate regularizers.
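As a rough illustration of what a Laplace-shaped temporal prior can look like, the sketch below builds a prior over frame pairs that peaks where the normalized temporal positions agree and decays with a Laplace profile away from that diagonal. The exact parameterization used in the paper is not given in this summary, so the function name `laplace_temporal_prior`, the `scale` parameter, and the normalization to unit total mass are assumptions.

```python
import numpy as np

def laplace_temporal_prior(n, m, scale=0.1):
    """Laplace-shaped prior over (frame i, frame j) pairs; hypothetical helper.

    Mass concentrates where the normalized positions of the two frames agree
    and decays as exp(-|distance| / scale) away from that diagonal.
    """
    i = (np.arange(n) + 0.5) / n            # normalized positions in video 1
    j = (np.arange(m) + 0.5) / m            # normalized positions in video 2
    dist = np.abs(i[:, None] - j[None, :])
    prior = np.exp(-dist / scale) / (2.0 * scale)  # Laplace density, zero mean
    return prior / prior.sum()              # normalize to a joint distribution

prior_square = laplace_temporal_prior(5, 5)
prior_rect = laplace_temporal_prior(4, 8)   # videos of different lengths
```

A prior of this kind is typically used as a soft regularizer that pulls the coupling toward temporally consistent matches without forbidding off-diagonal ones, which is why it can coexist with non-monotonic step orders.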