VisCoder2: Building Multi-Language Visualization Coding Agents

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Code Models, Visualization, Fine-tuning
Abstract:

Large language models (LLMs) have recently enabled coding agents capable of generating, executing, and revising visualization code. However, existing models often fail in practical workflows due to limited language coverage, unreliable execution, and a lack of iterative correction mechanisms. Progress has been constrained by narrow datasets and benchmarks that emphasize single-round generation and single-language tasks. To address these challenges, we introduce three complementary resources for advancing visualization coding agents. VisCode-Multi-679K is a large-scale supervised dataset containing 679K validated and executable visualization samples with multi-turn correction dialogues across 12 programming languages. VisPlotBench is a benchmark for systematic evaluation, featuring executable tasks, rendered outputs, and protocols for both initial generation and multi-round self-debug. Finally, we present VisCoder2, a family of multi-language visualization models trained on VisCode-Multi-679K. Experiments show that VisCoder2 significantly outperforms strong open-source baselines and approaches the performance of proprietary models such as GPT-4.1. Iterative self-debug brings further gains, particularly in symbolic or compiler-dependent languages, reaching an 82.4% overall execution pass rate at the 32B scale.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces three resources for multi-language visualization coding agents: a 679K-sample dataset with multi-turn correction dialogues across 12 languages, a benchmark for generation and self-debug evaluation, and a model family trained on this data. Within the taxonomy, it resides in the 'Multi-Language Visualization Coding Agents with Self-Debug' leaf, which contains only one sibling paper (Multi-Language VisCoding). This leaf sits under 'Visualization Code Generation and Debugging', a moderately populated branch with three sub-categories totaling four papers, indicating a relatively sparse but emerging research direction.

The taxonomy reveals neighboring work in grammar-agnostic visualization pipelines (LIDA) and multimodal code generation from flowcharts, both excluding iterative debugging mechanisms. The broader 'Multi-Language Code Generation and Translation' branch addresses cross-language synthesis but focuses on general-purpose code rather than visualization-specific tasks. The paper's emphasis on executable visualization samples and multi-turn correction dialogues positions it at the intersection of visualization synthesis and iterative debugging, diverging from translation-focused approaches (RepoTransAgent, Rectifier) that handle repository-level code conversion without visualization constraints.

Among the 12 candidates examined, the dataset contribution shows 1 refutable candidate out of 2 examined, the benchmark 2 out of 6, and the model family 1 out of 4. Within the top-ranked semantic matches, some prior work addresses overlapping aspects, particularly multi-language visualization generation and benchmarking. The dataset and benchmark contributions appear to face more substantial prior work than the model family, though the small candidate pool (12 papers) means these findings reflect a narrow slice of the literature rather than exhaustive coverage.

Based on examination of 12 semantically similar papers, the work appears to advance a sparsely populated research direction by combining multi-language support, iterative debugging, and large-scale training data. The analysis captures top-ranked matches but does not encompass the full landscape of visualization generation or code debugging research. The taxonomy structure and sibling count suggest the specific combination of features may be relatively novel within the examined scope.

Taxonomy

Core-task Taxonomy Papers: 10
Claimed Contributions: 3
Contribution Candidate Papers Compared: 12
Refutable Papers: 4

Research Landscape Overview

Core task: multi-language visualization code generation and iterative debugging. The field structure reflects a convergence of code generation, visualization synthesis, and debugging capabilities across multiple programming languages.

The taxonomy organizes work into several main branches. Multi-Language Code Generation and Translation addresses cross-language synthesis and repository-level translation (e.g., RepoTransAgent[3], CruxEval-X[1]). Visualization Code Generation and Debugging focuses on producing and refining chart or plot code from natural language or structured inputs (e.g., LIDA[4], Flowchart to Code[5]). Integrated Development and Debugging Platforms explores holistic environments that combine generation with iterative correction. Cross-Lingual Adaptation and Iterative Training examines training strategies that enable models to handle diverse languages and self-improve. Performance Visualization and Debugging Tools targets runtime analysis and profiling visualizations (e.g., Performance Debugging Visualization[6]). Together, these branches span the spectrum from single-language chart generation to multi-language code translation with debugging loops.

A particularly active line of work centers on agents that generate visualization code in multiple languages and iteratively debug their outputs, exemplified by VisCoder2[0] and Multi-Language VisCoding[9]. These approaches emphasize self-correction mechanisms that let models detect and fix errors across Python, R, and other languages, in contrast to earlier single-language or non-iterative methods such as LIDA[4]. Another emerging theme is cross-language translation with debugging support, as seen in Generate Debug Translate[10] and Rectifier[2], which address syntactic and semantic mismatches when converting code between languages.

VisCoder2[0] sits squarely within the visualization-focused debugging cluster, sharing the multi-language emphasis of Multi-Language VisCoding[9] while extending its iterative refinement capabilities. Compared to broader translation agents like RepoTransAgent[3], VisCoder2[0] narrows its scope to visualization tasks, trading generality for debugging heuristics tailored to chart generation. Open questions remain around scaling these iterative loops to more complex visualizations and integrating runtime feedback from execution environments.

Claimed Contributions

VisCode-Multi-679K dataset

A supervised instruction-tuning dataset comprising 679K executable visualization code samples paired with rendered outputs and multi-turn correction dialogues, spanning twelve programming languages including Python, LaTeX, LilyPond, SVG, HTML, Asymptote, Mermaid, and Vega-Lite.

2 retrieved papers
Can Refute
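A training sample combining executable code, a rendered output, and a multi-turn correction dialogue might be structured roughly as below. All field names here are illustrative assumptions, not the dataset's documented schema.

```python
# Illustrative shape of one VisCode-Multi-679K-style sample.
# Every key name is an assumption made for this sketch.
sample = {
    "language": "python",
    "dialogue": [
        {"role": "user", "content": "Plot monthly revenue as a bar chart."},
        {"role": "assistant", "content": "import matplotlib.pyplot as plt\n..."},
        # Execution feedback turn: the traceback from running the first attempt.
        {"role": "user", "content": "Execution failed:\nNameError: name 'months' is not defined"},
        # Correction turn: the revised, validated code.
        {"role": "assistant", "content": "months = ['Jan', 'Feb', 'Mar']\n..."},
    ],
    "rendered_output": "renders/sample_0001.png",  # path to the validated render
}

# Basic structural checks one might run when validating such data:
assert sample["dialogue"][0]["role"] == "user"
assert any("failed" in turn["content"] for turn in sample["dialogue"])
```

Structuring corrections as extra dialogue turns, rather than as separate samples, is what lets a single supervised pass teach both generation and traceback-conditioned repair.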
VisPlotBench benchmark

A benchmark containing 888 executable visualization tasks across eight programming languages and thirteen visual categories, with standardized evaluation protocols for both single-round code generation and multi-round iterative self-debugging.

6 retrieved papers
Can Refute
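The two evaluation protocols differ only in how many attempts count toward success, so the execution pass rate could be computed along these lines. The per-task log format is an assumption; the benchmark's actual result files are not described in this report.

```python
def pass_rate(results: list[list[bool]], rounds: int) -> float:
    """Fraction of tasks that execute successfully within `rounds` attempts.

    `results[i]` holds per-round success flags for task i: index 0 is the
    initial generation, later indices are self-debug rounds. This layout
    is an assumption made for illustration.
    """
    passed = sum(any(task[:rounds]) for task in results)
    return passed / len(results)


# Toy example: 4 tasks, at most 3 rounds each.
logs = [
    [True],                  # passes on the first attempt
    [False, True],           # fixed in the first debug round
    [False, False, True],    # fixed in the second debug round
    [False, False, False],   # never recovers
]
single_round = pass_rate(logs, rounds=1)  # 0.25
with_debug = pass_rate(logs, rounds=3)    # 0.75
```

Reporting both numbers separates raw generation quality (`rounds=1`) from the added value of the self-debug loop, which is the gap the report's 82.4% figure reflects.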
VisCoder2 model family

A family of visualization coding agents trained on VisCode-Multi-679K at multiple scales (3B, 7B, 14B, 32B parameters) that can generate, execute, and iteratively refine visualization code across multiple programming languages, approaching the performance of proprietary models like GPT-4.1.

4 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: VisCode-Multi-679K dataset
Contribution: VisPlotBench benchmark
Contribution: VisCoder2 model family
