TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning
Overview
Overall Novelty Assessment
The paper introduces TikZilla, a family of small language models (3B and 8B parameters) trained to generate TikZ code from textual descriptions via a two-stage pipeline that combines supervised fine-tuning with reinforcement learning. In the taxonomy tree, this work resides in the 'Reinforcement Learning-Enhanced Generation' leaf under 'Text-to-TikZ Generation Methods'. Notably, this leaf contains only the original paper itself, with no sibling papers listed, suggesting that RL-based training for TikZ generation remains a relatively sparse direction within the broader field of 30 papers examined.
The taxonomy reveals that neighboring leaves include 'Supervised Fine-Tuning Approaches' (containing one paper on prompt-based LLM pipelines) and 'Zero-Shot and Unaligned Data Methods' (one paper on leveraging unaligned graphics programs). The broader 'Text-to-TikZ Generation Methods' branch sits alongside 'Indirect Generation via TikZ Intermediates' (which uses TikZ as a bridge to image synthesis or multimodal understanding) and 'Domain-Specific TikZ Generation' (targeting mathematical diagrams or specialized scientific figures). TikZilla diverges from these directions by directly synthesizing general-purpose TikZ code while incorporating semantic feedback from rendered outputs, rather than relying on intermediate representations or domain-specific templates.
Among 15 total candidates examined across three contributions, no clearly refuting prior work was identified. The 'TikZilla model family with two-stage training' contribution examined 10 candidates with zero refutable matches; the 'domain-specific reward model for RL' contribution examined 5 candidates, also with zero refutations; and the 'DaTikZ-V4 dataset construction' contribution examined no candidates. This limited search scope (15 papers drawn from semantic search and citation expansion) suggests that, within the examined literature, the combination of supervised fine-tuning followed by RL with inverse-graphics-based rewards appears relatively unexplored, though the analysis does not claim exhaustive coverage of all possible prior work.
Based on the top-15 semantic matches and taxonomy structure, the work appears to occupy a novel position by combining RL-based training with TikZ generation, a direction not represented by sibling papers in the same taxonomy leaf. However, the limited search scope and the presence of related supervised and zero-shot approaches in neighboring leaves indicate that the broader problem space is moderately populated. The analysis covers direct methodological overlap but does not exhaustively examine all possible dataset construction techniques or reward modeling strategies in adjacent domains.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors build a new dataset for Text-to-TikZ that is over four times larger than its predecessor, sourced from arXiv, GitHub, TeX StackExchange, and synthetic data. They enhance quality through LLM-based debugging of uncompilable code and VLM-generated figure descriptions, addressing the noise and small scale of prior datasets.
The authors introduce TikZilla, a family of small Qwen-based models trained with supervised fine-tuning for syntax alignment followed by reinforcement learning against a domain-specific reward model. This two-stage approach substantially improves Text-to-TikZ generation quality, enabling even 3B-parameter models to outperform GPT-4o.
The authors propose the first domain-specific reward model for Text-to-TikZ by retraining an image encoder from DeTikZify-V2 on their larger dataset. This encoder provides semantically meaningful reward signals during RL optimization, correlating more strongly with human judgments than general-purpose metrics such as CLIPScore or DreamSim.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
DaTikZ-V4 dataset construction
The authors build a new dataset for Text-to-TikZ that is over four times larger than its predecessor, sourced from arXiv, GitHub, TeX StackExchange, and synthetic data. They enhance quality through LLM-based debugging of uncompilable code and VLM-generated figure descriptions, addressing the noise and small scale of prior datasets.
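The report does not specify how the LLM-based debugging of uncompilable code is implemented. A minimal sketch of the described quality-enhancement loop (compile, and on failure ask an LLM for a repaired version) might look like the following; `compile_tikz`, `repair_with_llm`, and the `ask_llm` callback are hypothetical names, and the use of `pdflatex` with a standalone document class is an assumption, not a detail from the paper:

```python
import os
import subprocess
import tempfile

def compile_tikz(code: str) -> bool:
    """Wrap a TikZ snippet in a standalone document and try to compile it."""
    doc = ("\\documentclass[tikz]{standalone}\n"
           "\\begin{document}\n" + code + "\n\\end{document}\n")
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "figure.tex"), "w") as f:
            f.write(doc)
        result = subprocess.run(
            ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", "figure.tex"],
            cwd=tmp, capture_output=True)
        return result.returncode == 0

def repair_with_llm(code, ask_llm, compiles=compile_tikz, max_rounds=3):
    """Iteratively ask an LLM to repair code until it compiles.

    Returns the (possibly repaired) code, or None if every round fails.
    """
    for _ in range(max_rounds):
        if compiles(code):
            return code
        code = ask_llm("This TikZ code fails to compile. Return a fixed version:\n"
                       + code)
    return code if compiles(code) else None
```

Records whose code never compiles after the repair budget would simply be dropped from the dataset.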
TikZilla model family with two-stage training
The authors introduce TikZilla, a family of small Qwen-based models trained with supervised fine-tuning for syntax alignment followed by reinforcement learning against a domain-specific reward model. This two-stage approach substantially improves Text-to-TikZ generation quality, enabling even 3B-parameter models to outperform GPT-4o.
[31] Training language models to self-correct via reinforcement learning
[32] Teaching large language models to reason with reinforcement learning
[33] Compact language models via pruning and knowledge distillation
[34] Q-SFT: Q-learning for language models via supervised fine-tuning
[35] Hallucination of multimodal large language models: A survey
[36] Mobile edge intelligence for large language models: A contemporary survey
[37] SVDiff: Compact Parameter Space for Diffusion Fine-Tuning
[38] Stop overthinking: A survey on efficient reasoning for large language models
[39] Empowering Compact Language Models with Knowledge Distillation
[40] Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
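The report names RL with a domain-specific reward model but not the exact policy-optimization algorithm. Purely as an illustration, a group-relative advantage computation in the style of GRPO, a common choice for RL fine-tuning of language models, could be sketched as follows; the function name and the choice of algorithm are assumptions, not details from the paper:

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize rewards across a group of completions sampled for the same
    prompt (GRPO-style baseline): advantage_i = (r_i - mean) / std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0 for _ in rewards]  # identical rewards carry no signal
    return [(r - mean) / std for r in rewards]
```

In such a setup, several TikZ programs would be sampled per description, each scored by the reward model on its rendered output, and each sample's log-probability weighted by its normalized advantage during the policy update.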
Domain-specific reward model for RL
The authors propose the first domain-specific reward model for Text-to-TikZ by retraining an image encoder from DeTikZify-V2 on their larger dataset. This encoder provides semantically meaningful reward signals during RL optimization, correlating more strongly with human judgments than general-purpose metrics such as CLIPScore or DreamSim.