Efficient Testing for Correlation Clustering: Improved Algorithms and Optimal Bounds

ICLR 2026 Conference SubmissionAnonymous Authors
Correlation ClusteringStructural BalanceProperty Testing
Abstract:

Correlation clustering is an important unsupervised learning problem with broad applications. In this problem, we are given a labeled complete graph G=(V,E+E)G=(V,E^+ \cup E^-), and the optimal clustering is defined as a partition of the vertices that minimizes the ++ edges between clusters and - edges within clusters. We investigate efficient algorithms to test the \emph{cost} of correlation clustering: here, we want to know whether the graph could be (nearly) perfectly clustered (with 00 cost) or is far away from admitting any perfect clustering. The problem has attracted significant attention aimed at modern large-scale applications, and the state-of-the-art results use O~(1/ε7)\widetilde{O}({1}/{\varepsilon^7}) queries and time (up to log factors) to decide whether a graph is perfectly clusterable or needs to flip labels of ε(n2)\varepsilon {\binom n 2} edges to become clusterable. In this paper, we improve this bound significantly by designing an algorithm that uses O(1/ε2){O}({1}/{\varepsilon^2}) queries and time. Furthermore, we derive the first algorithm that tests the cost for the special setting of correlation clustering with kk clusters with O(1/ε4){O}(1/{\varepsilon^4}) queries and time for constant kk. Finally, for the special case of k=2k=2, which corresponds to the strong structure balance problem in social networks, we obtain tight bounds of Θ(1/ε)\Theta({1}/{\varepsilon}) queries -- the first set of \emph{tight} bounds in these problems. We conduct experiments on simulated and real-world datasets, and empirical results demonstrate the advantages of our algorithms.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes improved query-efficient algorithms for testing correlation clustering cost, reducing the complexity from Õ(1/ε⁷) to O(1/ε²) queries for the general case and O(1/ε⁴) for fixed-k clusters. Within the taxonomy, it resides in the 'Testing and Query-Efficient Algorithms' leaf, which contains only two papers total. This represents a sparse research direction compared to more crowded branches like 'Combinatorial and LP-Based Approximation' (five papers) or 'Constrained and Capacitated Variants' (five papers), suggesting that query-efficient testing remains an underexplored area despite its practical importance for large-scale applications.

The taxonomy reveals that testing methods occupy a distinct niche between streaming algorithms (four papers focused on dynamic updates) and active learning approaches (three papers using queries to reduce labeling cost). The scope_note clarifies that testing algorithms verify clustering quality without computing full solutions, distinguishing them from streaming methods that maintain solutions incrementally. Neighboring branches like 'Robustness and Noisy Input Handling' (three papers) and 'Theoretical Foundations and Complexity Analysis' (three papers) address complementary concerns—handling uncertain data and establishing hardness results—but do not directly compete with the query-efficiency focus of this work.

Among 25 candidates examined across three contributions, the analysis found limited prior work overlap. The core contribution (improved testing algorithm) examined 10 candidates with zero refutations, while the fixed-k algorithm examined 5 candidates also with zero refutations. Only the structural balance testing contribution showed overlap, with 2 refutable candidates among 10 examined. This suggests that within the limited search scope, the main algorithmic improvements appear relatively novel, though the structural balance component encounters more substantial prior work. The small candidate pool (25 total) indicates this assessment reflects top-semantic-match results rather than exhaustive coverage.

Based on the limited literature search, the work appears to advance a sparse research direction with modest prior overlap in its core contributions. The taxonomy structure confirms that query-efficient testing remains less developed than approximation or streaming branches, and the sibling paper count (one other paper in the same leaf) reinforces this impression. However, the analysis covers only top-25 semantic matches, leaving open the possibility of relevant work outside this scope, particularly in adjacent areas like property testing or sublinear algorithms not captured by correlation-clustering-specific queries.

Taxonomy

Core-task Taxonomy Papers
50
3
Claimed Contributions
25
Contribution Candidate Papers Compared
2
Refutable Paper

Research Landscape Overview

Core task: testing correlation clustering cost. The field of correlation clustering has evolved into a rich landscape organized around several complementary research directions. At the highest level, one finds branches devoted to algorithmic approximation and optimization (e.g., Combinatorial Correlation Clustering[1], General Weighted Graphs[22]), streaming and dynamic algorithms (Dynamic Correlation Clustering[5], Approximate Dynamic Streams[16]), and testing or query-efficient algorithms (Query-Efficient[42]). Other branches address fairness constraints (Fair Correlation Clustering[14], Cost-Free Fairness[4]), active learning and query-based methods (Cold-Start Active[15], Same-Cluster Queries[26]), objective function variants (Min-Max via MultiCut[13], Overlapping Correlation Clustering[3]), chromatic and categorical relationship clustering (Chromatic Correlation Clustering[6], Chromatic Revisited[27]), robustness to noisy input (Noisy Input[17]), online and incremental settings (Online Correlation Clustering[40]), specialized optimization techniques (Sherali-Adams[39], Quantum-Assisted[31]), practical heuristics and empirical evaluation (Large Scale Optimization[10], Correlation Clustering MapReduce[9]), application-specific adaptations (Multi-robot Task Allocation[44], Higher-order Image Segmentation[50]), and theoretical foundations (Inapproximability Local[34], Methodologies Fundamental Results[49]). A particularly active line of work explores the trade-off between computational efficiency and solution quality in large-scale or resource-constrained settings, with streaming and dynamic algorithms (e.g., Dynamic Correlation Clustering[5]) balancing memory footprint against approximation guarantees, while testing and query-efficient methods aim to verify or estimate clustering cost with minimal probes. Efficient Testing Correlation Clustering[0] sits squarely within the testing and query-efficient branch, sharing its emphasis on sublinear-time verification with Query-Efficient[42], which also investigates how few queries suffice to assess clustering quality. Compared to streaming approaches like Approximate Dynamic Streams[16] that process edges incrementally, Efficient Testing Correlation Clustering[0] focuses on property testing: deciding whether a given clustering is near-optimal without examining the entire input. This distinction highlights an open question in the field—how to certify clustering cost efficiently when full computation is prohibitively expensive—and positions the work as a natural complement to both dynamic and active learning paradigms.

Claimed Contributions

Improved algorithm for testing correlation clustering cost

The authors design an algorithm that tests whether a graph is clusterable or far from clusterable using O(1/ε²) queries and time, significantly improving the previous best bound of Õ(1/ε⁷). The algorithm samples O(1/ε) vertices uniformly and tests their induced subgraph, with analysis using Janson's inequality from random graph theory.

10 retrieved papers
Algorithm for testing k-clusterability with fixed k

The authors present the first nontrivial algorithm for testing k-clusterability in correlation clustering, which uses O(1/ε⁴) queries and time for constant k. The algorithm combines the general clusterability testing algorithm with a new procedure that distinguishes clusterable graphs from k-clusterable graphs.

5 retrieved papers
Tight bounds for structural balance testing

The authors develop an O(1/ε) query algorithm for testing structural balance (2-clusterability) by sampling triangles directly, and prove a matching Ω(1/ε) lower bound. This represents the first set of tight bounds in these property testing problems for correlation clustering.

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Improved algorithm for testing correlation clustering cost

The authors design an algorithm that tests whether a graph is clusterable or far from clusterable using O(1/ε²) queries and time, significantly improving the previous best bound of Õ(1/ε⁷). The algorithm samples O(1/ε) vertices uniformly and tests their induced subgraph, with analysis using Janson's inequality from random graph theory.

Contribution

Algorithm for testing k-clusterability with fixed k

The authors present the first nontrivial algorithm for testing k-clusterability in correlation clustering, which uses O(1/ε⁴) queries and time for constant k. The algorithm combines the general clusterability testing algorithm with a new procedure that distinguishes clusterable graphs from k-clusterable graphs.

Contribution

Tight bounds for structural balance testing

The authors develop an O(1/ε) query algorithm for testing structural balance (2-clusterability) by sampling triangles directly, and prove a matching Ω(1/ε) lower bound. This represents the first set of tight bounds in these property testing problems for correlation clustering.