DISCO: Diversifying Sample Condensation for Accelerating Model Evaluation
Overview
Overall Novelty Assessment
The paper proposes DISCO, a greedy sample selection method that identifies test samples maximizing inter-model disagreement to predict full benchmark performance with reduced evaluation cost. Within the taxonomy, DISCO resides in the 'Greedy Sample Selection for Performance Prediction' leaf under 'Benchmark Evaluation Acceleration'. Notably, this leaf contains no sibling papers in the current taxonomy, suggesting a relatively sparse research direction. The broader 'Benchmark Evaluation Acceleration' category includes only three leaves, indicating that greedy disagreement-based approaches represent a less crowded niche compared to training-focused sample selection methods.
The taxonomy reveals that DISCO's closest neighbors lie in adjacent leaves: 'Capability Coverage Maximization' (containing EffiEval) and 'Lifelong Benchmark Evaluation with Model Reuse'. While EffiEval emphasizes clustering-based coverage of task dimensions, DISCO adopts a simpler greedy strategy targeting model response diversity. The broader 'Efficient Evaluation Through Sample Reduction' branch contrasts with 'Strategic Sample Selection for Training Data Efficiency', which dominates the taxonomy with active learning and data curation methods. DISCO's focus on test-time efficiency without model modification distinguishes it from adaptive evaluation techniques like test-time adaptation, which belong to a separate subtopic.
Among the three contributions analyzed, the literature search examined 26 candidates in total. The core DISCO method (9 candidates examined, 0 refutable) and the model signature framework (7 candidates examined, 0 refutable) appear relatively novel within the limited search scope. However, the information-theoretic justification for disagreement-based selection (10 candidates examined, 1 refutable) shows overlap with prior work. The analysis indicates that while the algorithmic approach may be distinctive, the theoretical grounding has some precedent among the examined candidates. The absence of sibling papers in DISCO's taxonomy leaf suggests limited direct competition, though the small search scale (26 papers) leaves open the possibility of unexamined related work.
Based on the top-26 semantic matches and taxonomy structure, DISCO appears to occupy a relatively underexplored niche within benchmark evaluation acceleration. The greedy disagreement-based approach contrasts with clustering-heavy methods in neighboring leaves, and the lack of sibling papers suggests limited direct prior work in this specific formulation. However, the analysis covers a narrow slice of the literature, and the refutable theoretical contribution indicates that some conceptual elements have precedent. A more exhaustive search might reveal additional related efforts in efficient evaluation or active testing domains.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose DISCO, a method that selects evaluation samples based on inter-model disagreement rather than clustering-based representativeness. This greedy, sample-wise approach simplifies subset selection by focusing on samples that maximize diversity in model responses.
The authors establish that inter-model disagreement, measured via Jensen-Shannon Divergence or Predictive Diversity Score, is information-theoretically optimal for selecting samples that best differentiate and rank models when estimating benchmark performance.
The authors introduce a direct prediction approach using model signatures (concatenated outputs on selected samples) as input to simple metamodels, bypassing the complexity of estimating hidden model parameters required by prior methods like IRT-based approaches.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
DISCO method for efficient model evaluation via sample selection
The authors propose DISCO, a method that selects evaluation samples based on inter-model disagreement rather than clustering-based representativeness. This greedy, sample-wise approach simplifies subset selection by focusing on samples that maximize diversity in model responses.
[51] Assessing generalization of SGD via disagreement PDF
[52] Agree to Disagree: Robust Anomaly Detection with Noisy Labels PDF
[53] TriagedMSA: Triaging Sentimental Disagreement in Multimodal Sentiment Analysis PDF
[54] Reward Uncertainty for Exploration in Preference-based Reinforcement Learning PDF
[55] Handling disagreement in hate speech modelling PDF
[56] How does disagreement help generalization against label corruption? PDF
[57] A Note on "Assessing Generalization of SGD via Disagreement" PDF
[58] Agree to disagree: Diversity through disagreement for better transferability PDF
[59] Querying Easily Flip-flopped Samples for Deep Active Learning PDF
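To make the claimed mechanism concrete, the selection idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names (`js_divergence`, `disagreement_score`, `select_disagreeing_samples`) are hypothetical, mean pairwise Jensen-Shannon divergence is one plausible way to aggregate disagreement across models, and top-k scoring stands in for whatever greedy criterion DISCO actually uses.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))  # KL divergence
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def disagreement_score(probs):
    """Mean pairwise JSD across models' predictive distributions
    for one sample. probs has shape (n_models, n_classes)."""
    n = len(probs)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += js_divergence(probs[i], probs[j])
            pairs += 1
    return total / max(pairs, 1)

def select_disagreeing_samples(all_probs, k):
    """Pick the k samples on which the model pool disagrees most.
    all_probs has shape (n_samples, n_models, n_classes)."""
    scores = np.array([disagreement_score(s) for s in all_probs])
    return np.argsort(scores)[::-1][:k]
```

On a toy pool of two models and three samples, the sample where the models flip their prediction receives the highest score and is selected first, matching the intuition that such samples are the most informative for separating models.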
Information-theoretic justification for disagreement-based selection
The authors establish that inter-model disagreement, measured via Jensen-Shannon Divergence or Predictive Diversity Score, is information-theoretically optimal for selecting samples that best differentiate and rank models when estimating benchmark performance.
[60] Theory of disagreement-based active learning PDF
[61] Training Robust Deep Neural Networks on Noisy Labels Using Adaptive Sample Selection with Disagreement PDF
[62] Active learning for estimating reachable sets for systems with unknown dynamics PDF
[63] DISCO: Diversifying Sample Condensation for Efficient Model Evaluation PDF
[64] Committee-Based Sample Selection for Probabilistic Classifiers PDF
[65] Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection PDF
[66] A Spatial-Spectral Disagreement-Based Sample Selection With an Application to Hyperspectral Data Classification PDF
[67] Ensemble multiple kernel active learning for classification of multisource remote sensing data PDF
[68] A disagreement-based active matrix completion approach with provable guarantee PDF
[69] Self-Supervised Exploration via Disagreement PDF
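The information-theoretic quantity underlying this contribution can be illustrated with the standard generalization of Jensen-Shannon divergence to M distributions: the entropy of the mixture minus the mean of the individual entropies. This sketch uses that textbook identity for illustration only; it does not reproduce the paper's Predictive Diversity Score or its optimality argument, and the function names are hypothetical.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float) + eps
    p = p / p.sum()
    return -np.sum(p * np.log(p))

def multi_model_jsd(probs):
    """Generalized JSD over M predictive distributions:
    H(mixture) - mean of individual entropies. It is zero when all
    models output identical distributions and maximal when models
    are individually confident but mutually contradictory."""
    probs = np.asarray(probs, dtype=float)
    mixture = probs.mean(axis=0)
    return entropy(mixture) - np.mean([entropy(p) for p in probs])
```

Two confidently disagreeing models yield a value near log 2 (about 0.693 nats), while perfect agreement yields zero, which is why high-JSD samples are the ones that carry information about which model is which.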
Model signature-based performance prediction framework
The authors introduce a direct prediction approach using model signatures (concatenated outputs on selected samples) as input to simple metamodels, bypassing the complexity of estimating hidden model parameters required by prior methods like IRT-based approaches.
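The direct-prediction idea described above can be sketched as follows. This is an assumption-laden illustration: the signature here is a binary correct/incorrect vector over the selected samples, the "simple metamodel" is ordinary least squares as a stand-in (the paper may use a different regressor), and the function names (`fit_metamodel`, `predict_score`) are hypothetical.

```python
import numpy as np

def fit_metamodel(signatures, full_scores):
    """Fit a least-squares metamodel mapping model signatures
    (n_models, k outputs on the selected samples) to observed
    full-benchmark scores (n_models,). A ones column adds a bias term."""
    X = np.hstack([signatures, np.ones((len(signatures), 1))])
    w, *_ = np.linalg.lstsq(X, full_scores, rcond=None)
    return w

def predict_score(weights, signature):
    """Predict a new model's full-benchmark score from its signature alone,
    with no per-model latent parameters to estimate (unlike IRT)."""
    return float(np.append(signature, 1.0) @ weights)
```

The contrast with IRT-style approaches is that nothing latent is inferred at prediction time: a new model is evaluated only on the k selected samples, and its concatenated outputs are fed straight through the fitted metamodel.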