Point-MoE: Large-Scale Multi-Dataset Training with Mixture-of-Experts for 3D Semantic Segmentation
Overview
Overall Novelty Assessment
The paper introduces Point-MoE, a Mixture-of-Experts architecture for large-scale multi-dataset 3D semantic segmentation that operates without dataset labels at inference. It resides in the Cross-Dataset Label Harmonization and Taxonomy Alignment leaf, which contains five papers addressing label-space conflicts across heterogeneous datasets. This leaf sits within the broader Multi-Dataset Integration and Domain Adaptation branch, indicating a moderately populated research direction focused on reconciling inconsistent taxonomies and sensor modalities. The taxonomy shows this is an active but not overcrowded area, with sibling papers exploring unified taxonomies and hierarchical mappings.
The taxonomy reveals neighboring research directions that contextualize Point-MoE's positioning. The Unsupervised and Semi-Supervised Domain Adaptation leaf (six papers) addresses similar heterogeneity challenges through unlabeled data, while Multi-Task and Multi-Domain Unified Architectures (three papers) explore shared-parameter models across tasks. The Vision-Language and Open-Vocabulary Segmentation leaf (four papers) offers an alternative approach to label alignment via textual semantics. Point-MoE diverges from these by using sparsely activated expert routing rather than explicit taxonomy engineering or language grounding, suggesting a distinct methodological stance within the broader multi-dataset integration landscape.
Among the thirty candidates examined, none clearly refutes any of the three core contributions. Ten candidates were examined for each contribution (the Point-MoE architecture, the multi-dataset training protocol, and the MoE design-space exploration), and none yielded a refutable match. Within this limited scope of top-thirty semantic matches, no prior work combines mixture-of-experts routing with dataset-agnostic multi-dataset 3D segmentation in the same way. The analysis does not cover the full literature, however: sibling papers such as MSeg3D and Label Name is Mantra [45] address overlapping problems through different mechanisms, and the search may have missed relevant MoE or multi-dataset work.
Based on the limited search of thirty candidates, Point-MoE appears to occupy a relatively novel position by applying sparse expert routing to multi-dataset 3D segmentation without dataset supervision. The taxonomy context shows this sits in a moderately active research area with established sibling approaches, but the specific combination of MoE architecture and dataset-agnostic inference has not been clearly anticipated in the examined literature. A more exhaustive search beyond top-thirty semantic matches would be needed to assess whether similar MoE-based multi-dataset strategies exist in adjacent communities or application domains.
Claimed Contributions
The authors propose Point-MoE, a sparse mixture-of-experts architecture built on Point Transformer V3 that replaces attention projection layers with expert MLPs and a router. This design enables dynamic expert specialization across heterogeneous 3D datasets without using dataset labels during training or inference.
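To make the routing idea concrete, here is a minimal pure-Python sketch of a sparse MoE layer standing in for a dense projection. A router scores each point feature, keeps the top-k experts, and mixes their outputs with gate weights renormalized over the selected experts. This is not the authors' implementation or Point Transformer V3 code. The names `SparseMoEProjection` and `ExpertMLP`, the top-k softmax gating, the ReLU MLP experts, and all sizes are assumptions chosen for illustration. The forward pass receives only the point feature and no dataset identifier, which mirrors the claim above.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer y = Wx + b, with W stored as a list of rows (out_dim x in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_linear(out_dim, in_dim, rng):
    """Uniform init scaled by 1/sqrt(in_dim); returns (weights, bias)."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ExpertMLP:
    """Hypothetical expert: a two-layer ReLU MLP mapping dim -> hidden -> dim."""

    def __init__(self, dim, hidden, rng):
        self.fc1 = init_linear(hidden, dim, rng)
        self.fc2 = init_linear(dim, hidden, rng)

    def __call__(self, x):
        h = [max(0.0, v) for v in linear(*self.fc1, x)]
        return linear(*self.fc2, h)


class SparseMoEProjection:
    """Hypothetical MoE stand-in for a dense projection, with top-k routing per point."""

    def __init__(self, dim, num_experts=4, top_k=2, hidden=8, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.router = init_linear(num_experts, dim, rng)
        self.experts = [ExpertMLP(dim, hidden, rng) for _ in range(num_experts)]

    def route(self, x):
        """Return [(expert_index, gate_weight)] for the top-k experts; gates sum to 1."""
        probs = softmax(linear(*self.router, x))
        chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        norm = sum(probs[i] for i in chosen)
        return [(i, probs[i] / norm) for i in chosen]

    def __call__(self, x):
        # Only the selected experts run; the input is the point feature alone (no dataset id).
        out = [0.0] * len(x)
        for idx, gate in self.route(x):
            y = self.experts[idx](x)
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out


layer = SparseMoEProjection(dim=6, num_experts=4, top_k=2)
feature = [0.1, -0.3, 0.5, 0.0, 0.2, -0.1]  # one point's feature vector
print(layer.route(feature))  # two (expert, gate) pairs
print(layer(feature))        # 6-dim output vector
```

Renormalizing gates over the chosen experts is one common convention. Top-1 routing without renormalization is another. A real model would use batched tensor operations, and MoE implementations commonly add an auxiliary load-balancing loss, which this sketch omits.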
The authors establish a realistic training and evaluation regime for large-scale multi-dataset joint training in 3D semantic segmentation where no dataset labels are available at inference time. This protocol enables fair comparison across diverse indoor and outdoor datasets in both seen and zero-shot settings.
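The evaluation side of such a protocol can be sketched as follows. Each sample carries a dataset name, but that name is used only to group metrics and is never passed to the model. Each dataset is then reported as seen or zero-shot, depending on whether it was part of the training mix. This is a sketch under stated assumptions: it assumes a single shared label space across datasets (label harmonization is not described here), it uses per-dataset mean IoU as the metric, and the functions `evaluate` and `mean_iou` and the dataset names in the example are hypothetical.

```python
from collections import defaultdict


def confusion_update(conf, pred, gt, ignore_index=-1):
    """Accumulate (ground_truth, prediction) counts, skipping ignored points."""
    for p, g in zip(pred, gt):
        if g != ignore_index:
            conf[(g, p)] += 1


def mean_iou(conf, num_classes):
    """Mean IoU over classes that appear in either ground truth or prediction."""
    ious = []
    for c in range(num_classes):
        tp = conf[(c, c)]
        fp = sum(conf[(g, c)] for g in range(num_classes) if g != c)
        fn = sum(conf[(c, p)] for p in range(num_classes) if p != c)
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious) if ious else 0.0


def evaluate(predict, samples, num_classes, train_datasets):
    """samples: iterable of (dataset_name, points, labels).

    The dataset name only selects which confusion table to update;
    predict() receives the points alone, so no dataset label reaches the model.
    """
    confs = defaultdict(lambda: defaultdict(int))
    for name, points, labels in samples:
        preds = predict(points)  # dataset-agnostic inference
        confusion_update(confs[name], preds, labels)
    report = {}
    for name, conf in confs.items():
        split = "seen" if name in train_datasets else "zero-shot"
        report[name] = (split, mean_iou(conf, num_classes))
    return report
```

For example, a toy threshold predictor `lambda pts: [0 if v < 0.5 else 1 for v in pts]`, evaluated on `("indoor_a", [0.1, 0.9, 0.2], [0, 1, 0])` and `("outdoor_b", [0.1, 0.9], [1, 1])` with `train_datasets={"indoor_a"}`, reports `indoor_a` as `("seen", 1.0)` and `outdoor_b` as `("zero-shot", 0.25)`.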
The authors conduct comprehensive ablation studies examining key MoE design choices including number of experts, sparsity level, placement within the architecture, normalization strategies, and training configurations. These experiments reveal effective configurations and trade-offs specific to 3D point cloud understanding.
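The design axes listed above can be written out as an explicit sweep. The sketch below enumerates a grid over expert count, top-k sparsity, placement, and normalization. For each configuration, it reports the fraction of expert parameters that a single point activates, which is roughly k/E under top-k routing. The concrete values, the placement and normalization labels, and the names `MoEConfig` and `ablation_grid` are hypothetical placeholders, not the configurations the authors tested.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class MoEConfig:
    num_experts: int
    top_k: int
    placement: str      # hypothetical label for which blocks receive MoE layers
    normalization: str  # hypothetical normalization variant

    @property
    def active_expert_fraction(self) -> float:
        # With top-k routing each point runs k of E experts, so about k/E of the
        # expert parameters are used per point (router and shared layers excluded).
        return self.top_k / self.num_experts


def ablation_grid(experts=(4, 8, 16), top_ks=(1, 2),
                  placements=("encoder", "decoder", "all"),
                  norms=("layernorm", "batchnorm")):
    """Cartesian product of the design axes, skipping invalid k > E combinations."""
    return [
        MoEConfig(e, k, p, n)
        for e, k, p, n in product(experts, top_ks, placements, norms)
        if k <= e
    ]
```

With the defaults, this yields 36 configurations. For instance, `MoEConfig(8, 2, "all", "layernorm")` activates a quarter of the expert parameters per point. This is the compute-versus-capacity trade-off that a sparsity ablation probes.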
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[8] Cross-dataset collaborative learning for semantic segmentation in autonomous driving
[12] MSeg: A Composite Dataset for Multi-Domain Semantic Segmentation
[32] Training Semantic Segmentation on Heterogeneous Datasets
[45] Label name is mantra: Unifying point cloud segmentation across heterogeneous datasets
Contribution Analysis
Detailed comparisons for each claimed contribution
Point-MoE architecture for multi-dataset 3D semantic segmentation
[61] ME-ODAL: Mixture-of-Experts Ensemble of CNN Models for 3D Object Detection from Automotive LiDAR Point Clouds
[66] FMP-Net: Fractal Multi-Gate Mixture-of-Experts Panoramic Segmentation for Point Cloud
[67] LLM in the Loop: A Framework for Contextualizing Counterfactual Segment Perturbations in Point Clouds
[68] Convoluted mixture of deep experts for robust semantic segmentation
[69] LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes
[70] GraphFit: Learning multi-scale graph-convolutional representation for point cloud normal estimation
[71] U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences
[72] Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts
[73] Fast training of diffusion transformer with extreme masking for 3d point clouds generation
[74] Experts weights averaging: A new general training scheme for vision transformers
Multi-dataset training protocol without dataset labels at inference
[21] COLA: COarse LAbel pre-training for 3D semantic segmentation of sparse LiDAR datasets
[31] CitySeg: A 3D Open Vocabulary Semantic Segmentation Foundation Model in City-scale Scenarios
[32] Training Semantic Segmentation on Heterogeneous Datasets
[37] Real-time joint semantic segmentation and depth estimation using asymmetric annotations
[51] 3DLabelProp: Geometric-Driven Domain Generalization for LiDAR Semantic Segmentation in Autonomous Driving
[52] WildScenes: A benchmark for 2D and 3D semantic segmentation in large-scale natural environments
[53] UniDSeg: Unified cross-domain 3d semantic segmentation via visual foundation models prior
[54] Learning to adapt sam for segmenting cross-domain point clouds
[55] Technical Report for ICRA 2025 GOOSE 3D Semantic Segmentation Challenge: Adaptive Point Cloud Understanding for Heterogeneous Robotic Systems
[56] EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Systematic exploration of MoE design space for 3D point clouds