Efficient Testing for Correlation Clustering: Improved Algorithms and Optimal Bounds

ICLR 2026 Conference SubmissionAnonymous Authors

OpenReview Score: 6.5 Download Report PDF

Correlation ClusteringStructural BalanceProperty Testing

Correlation clustering is an important unsupervised learning problem with broad applications. In this problem, we are given a labeled complete graph $G=(V,E^+ \cup E^-)$ , and the optimal clustering is defined as a partition of the vertices that minimizes the $+$ edges between clusters and $-$ edges within clusters. We investigate efficient algorithms to test the \emph{cost} of correlation clustering: here, we want to know whether the graph could be (nearly) perfectly clustered (with $0$ cost) or is far away from admitting any perfect clustering. The problem has attracted significant attention aimed at modern large-scale applications, and the state-of-the-art results use $\widetilde{O}({1}/{\varepsilon^7})$ queries and time (up to log factors) to decide whether a graph is perfectly clusterable or needs to flip labels of $\varepsilon {\binom n 2}$ edges to become clusterable. In this paper, we improve this bound significantly by designing an algorithm that uses ${O}({1}/{\varepsilon^2})$ queries and time. Furthermore, we derive the first algorithm that tests the cost for the special setting of correlation clustering with $k$ clusters with ${O}(1/{\varepsilon^4})$ queries and time for constant $k$ . Finally, for the special case of $k=2$ , which corresponds to the strong structure balance problem in social networks, we obtain tight bounds of $\Theta({1}/{\varepsilon})$ queries -- the first set of \emph{tight} bounds in these problems. We conduct experiments on simulated and real-world datasets, and empirical results demonstrate the advantages of our algorithms.

Abstract:

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes improved query-efficient algorithms for testing correlation clustering cost, reducing the complexity from Õ(1/ε⁷) to O(1/ε²) queries for the general case and O(1/ε⁴) for fixed-k clusters. Within the taxonomy, it resides in the 'Testing and Query-Efficient Algorithms' leaf, which contains only two papers total. This represents a sparse research direction compared to more crowded branches like 'Combinatorial and LP-Based Approximation' (five papers) or 'Constrained and Capacitated Variants' (five papers), suggesting that query-efficient testing remains an underexplored area despite its practical importance for large-scale applications.

The taxonomy reveals that testing methods occupy a distinct niche between streaming algorithms (four papers focused on dynamic updates) and active learning approaches (three papers using queries to reduce labeling cost). The scope_note clarifies that testing algorithms verify clustering quality without computing full solutions, distinguishing them from streaming methods that maintain solutions incrementally. Neighboring branches like 'Robustness and Noisy Input Handling' (three papers) and 'Theoretical Foundations and Complexity Analysis' (three papers) address complementary concerns—handling uncertain data and establishing hardness results—but do not directly compete with the query-efficiency focus of this work.

Among 25 candidates examined across three contributions, the analysis found limited prior work overlap. The core contribution (improved testing algorithm) examined 10 candidates with zero refutations, while the fixed-k algorithm examined 5 candidates also with zero refutations. Only the structural balance testing contribution showed overlap, with 2 refutable candidates among 10 examined. This suggests that within the limited search scope, the main algorithmic improvements appear relatively novel, though the structural balance component encounters more substantial prior work. The small candidate pool (25 total) indicates this assessment reflects top-semantic-match results rather than exhaustive coverage.

Based on the limited literature search, the work appears to advance a sparse research direction with modest prior overlap in its core contributions. The taxonomy structure confirms that query-efficient testing remains less developed than approximation or streaming branches, and the sibling paper count (one other paper in the same leaf) reinforces this impression. However, the analysis covers only top-25 semantic matches, leaving open the possibility of relevant work outside this scope, particularly in adjacent areas like property testing or sublinear algorithms not captured by correlation-clustering-specific queries.

Taxonomy

Core-task Taxonomy Papers

Claimed Contributions

Contribution Candidate Papers Compared

Refutable Paper

Research Landscape Overview

Core task: testing correlation clustering cost. The field of correlation clustering has evolved into a rich landscape organized around several complementary research directions. At the highest level, one finds branches devoted to algorithmic approximation and optimization (e.g., Combinatorial Correlation Clustering[1], General Weighted Graphs[22]), streaming and dynamic algorithms (Dynamic Correlation Clustering[5], Approximate Dynamic Streams[16]), and testing or query-efficient algorithms (Query-Efficient[42]). Other branches address fairness constraints (Fair Correlation Clustering[14], Cost-Free Fairness[4]), active learning and query-based methods (Cold-Start Active[15], Same-Cluster Queries[26]), objective function variants (Min-Max via MultiCut[13], Overlapping Correlation Clustering[3]), chromatic and categorical relationship clustering (Chromatic Correlation Clustering[6], Chromatic Revisited[27]), robustness to noisy input (Noisy Input[17]), online and incremental settings (Online Correlation Clustering[40]), specialized optimization techniques (Sherali-Adams[39], Quantum-Assisted[31]), practical heuristics and empirical evaluation (Large Scale Optimization[10], Correlation Clustering MapReduce[9]), application-specific adaptations (Multi-robot Task Allocation[44], Higher-order Image Segmentation[50]), and theoretical foundations (Inapproximability Local[34], Methodologies Fundamental Results[49]). A particularly active line of work explores the trade-off between computational efficiency and solution quality in large-scale or resource-constrained settings, with streaming and dynamic algorithms (e.g., Dynamic Correlation Clustering[5]) balancing memory footprint against approximation guarantees, while testing and query-efficient methods aim to verify or estimate clustering cost with minimal probes. Efficient Testing Correlation Clustering[0] sits squarely within the testing and query-efficient branch, sharing its emphasis on sublinear-time verification with Query-Efficient[42], which also investigates how few queries suffice to assess clustering quality. Compared to streaming approaches like Approximate Dynamic Streams[16] that process edges incrementally, Efficient Testing Correlation Clustering[0] focuses on property testing: deciding whether a given clustering is near-optimal without examining the entire input. This distinction highlights an open question in the field—how to certify clustering cost efficiently when full computation is prohibitively expensive—and positions the work as a natural complement to both dynamic and active learning paradigms.

Claimed Contributions

Improved algorithm for testing correlation clustering cost

10 retrieved papers

The authors design an algorithm that tests whether a graph is clusterable or far from clusterable using O(1/ε²) queries and time, significantly improving the previous best bound of Õ(1/ε⁷). The algorithm samples O(1/ε) vertices uniformly and tests their induced subgraph, with analysis using Janson's inequality from random graph theory.

10 retrieved papers

Algorithm for testing k-clusterability with fixed k

5 retrieved papers

The authors present the first nontrivial algorithm for testing k-clusterability in correlation clustering, which uses O(1/ε⁴) queries and time for constant k. The algorithm combines the general clusterability testing algorithm with a new procedure that distinguishes clusterable graphs from k-clusterable graphs.

5 retrieved papers

Tight bounds for structural balance testing

Can Refute

10 retrieved papers

The authors develop an O(1/ε) query algorithm for testing structural balance (2-clusterability) by sampling triangles directly, and prove a matching Ω(1/ε) lower bound. This represents the first set of tight bounds in these property testing problems for correlation clustering.

10 retrieved papers

Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[42] Query-Efficient Correlation Clustering PDF

David GarcÃa-Soriano, Konstantin Kutzkov, David Garc'ia-Soriano, Francesco Bonchi, Charalampos Tsourakakis, F. Bonchi, Charalampos E. Tsourakakis (2022)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Improved algorithm for testing correlation clustering cost

[1] Combinatorial correlation clustering PDF

Cannot Refute

[9] Correlation clustering in mapreduce PDF

Cannot Refute

[26] Approximate Correlation Clustering Using Same-Cluster Queries PDF

Cannot Refute

[33] Information-Theoretic Active Correlation Clustering PDF

Cannot Refute

[64] Query-efficient correlation clustering with noisy oracle PDF

Cannot Refute

[65] Sublinear time and space algorithms for correlation clustering via sparse-dense decompositions PDF

Cannot Refute

[66] A combinatorial multi-armed bandit approach to correlation clustering PDF

Cannot Refute

[67] Clustering with noisy queries PDF

Cannot Refute

[68] SPARSE-PIVOT: Dynamic correlation clustering for node insertions PDF

Cannot Refute

[69] Sample-Efficient" Clustering and Conquer" Procedures for Parallel Large-Scale Ranking and Selection PDF

Cannot Refute

Contribution

Algorithm for testing k-clusterability with fixed k

[41] Correlation clustering with a fixed number of clusters PDF

Cannot Refute

[47] Bipartite Correlation Clustering -- Maximizing Agreements PDF

Cannot Refute

[51] Towards a Query-Optimal and Time-Efficient Algorithm for Clustering with a Faulty Oracle PDF

Cannot Refute

[52] Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost PDF

Cannot Refute

[53] Correlation Clustering with Constrained Cluster Sizes and Extended Weights Bounds PDF

Cannot Refute

Contribution

Tight bounds for structural balance testing

[55] Testing cluster properties of signed graphs PDF

Can Refute

[59] Testing properties of signed graphs PDF

Can Refute

[54] Efficient stable community search in temporal signed graphs PDF

Cannot Refute

[56] Clique identification in signed graphs: a balance theory based model PDF

Cannot Refute

[57] Scalable Algorithm for Finding Balanced Subgraphs with Tolerance in Signed Networks PDF

Cannot Refute

[58] Finding large balanced subgraphs in signed networks PDF

Cannot Refute

[60] Testing the presence of balanced and bipartite components in a sparse graph is QMA1-hard PDF

Cannot Refute

[61] Balanced clique computation in signed networks: Concepts and algorithms PDF

Cannot Refute

[62] Locally Balancing Signed Graphs PDF

Cannot Refute

[63] Low rank modeling of signed networks PDF

Cannot Refute

Efficient Testing for Correlation Clustering: Improved Algorithms and Optimal Bounds

Overview

Overall Novelty Assessment

Taxonomy

Research Landscape Overview

Claimed Contributions

Core Task Comparisons

[42] Query-Efficient Correlation Clustering PDF

Contribution Analysis

Improved algorithm for testing correlation clustering cost

[1] Combinatorial correlation clustering PDF

[9] Correlation clustering in mapreduce PDF

[26] Approximate Correlation Clustering Using Same-Cluster Queries PDF

[33] Information-Theoretic Active Correlation Clustering PDF

[64] Query-efficient correlation clustering with noisy oracle PDF

[65] Sublinear time and space algorithms for correlation clustering via sparse-dense decompositions PDF

[66] A combinatorial multi-armed bandit approach to correlation clustering PDF

[67] Clustering with noisy queries PDF

[68] SPARSE-PIVOT: Dynamic correlation clustering for node insertions PDF

[69] Sample-Efficient" Clustering and Conquer" Procedures for Parallel Large-Scale Ranking and Selection PDF

Algorithm for testing k-clusterability with fixed k

[41] Correlation clustering with a fixed number of clusters PDF

[47] Bipartite Correlation Clustering -- Maximizing Agreements PDF

[51] Towards a Query-Optimal and Time-Efficient Algorithm for Clustering with a Faulty Oracle PDF

[52] Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost PDF

[53] Correlation Clustering with Constrained Cluster Sizes and Extended Weights Bounds PDF

Tight bounds for structural balance testing

[55] Testing cluster properties of signed graphs PDF

[59] Testing properties of signed graphs PDF

[54] Efficient stable community search in temporal signed graphs PDF

[56] Clique identification in signed graphs: a balance theory based model PDF

[57] Scalable Algorithm for Finding Balanced Subgraphs with Tolerance in Signed Networks PDF

[58] Finding large balanced subgraphs in signed networks PDF

[60] Testing the presence of balanced and bipartite components in a sparse graph is QMA1-hard PDF

[61] Balanced clique computation in signed networks: Concepts and algorithms PDF

[62] Locally Balancing Signed Graphs PDF

[63] Low rank modeling of signed networks PDF

Table of Contents