Deep Learning with Learnable Product-Structured Activations

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: deep learning architecture, implicit neural representation, low-rank tensor decomposition, partial differential equations
Abstract:

Modern neural architectures are fundamentally constrained by their reliance on fixed activation functions, limiting their ability to adapt representations to task-specific structure and efficiently capture high-order interactions. We introduce deep low-rank separated neural networks (LRNNs), a novel architecture generalizing MLPs that achieves enhanced expressivity by learning adaptive, factorized activation functions. LRNNs generalize the core principles underpinning continuous low-rank function decomposition to the setting of deep learning, constructing complex, high-dimensional neuron activations through a multiplicative composition of simpler, learnable univariate transformations. This product structure inherently captures multiplicative interactions and allows each LRNN neuron to learn highly flexible, data-dependent activation functions. We provide a detailed theoretical analysis that establishes the universal approximation property of LRNNs and reveals why they are capable of excellent empirical performance. Specifically, we show that LRNNs can mitigate the curse of dimensionality for functions with low-rank structure. Moreover, the learnable product-structured activations enable LRNNs to adaptively control their spectral bias, crucial for signal representation tasks. These theoretical insights are validated through extensive experiments where LRNNs achieve state-of-the-art performance across diverse domains including image and audio representation, numerical solution of PDEs, sparse-view CT reconstruction, and supervised learning tasks. Our results demonstrate that LRNNs provide a powerful and versatile building block with a distinct inductive bias for learning compact yet expressive representations.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces deep low-rank separated neural networks (LRNNs), which learn adaptive activation functions through multiplicative composition of univariate transformations. According to the taxonomy tree, this work occupies the 'Low-Rank Factorized Activations' leaf under 'Product-Structured Activation Functions'. Notably, this leaf contains only the original paper itself—no sibling papers are present. This isolation suggests the specific combination of low-rank factorization with learnable product-structured activations represents a relatively unexplored niche within the broader field of adaptive activation functions.

The taxonomy reveals that neighboring research directions include 'Fixed Polynomial Product Activations' and 'Logarithmic Product Transformations' within the same parent branch, plus 'Learnable Parametric Activation Functions' in a parallel branch. The scope notes clarify that fixed polynomial approaches lack learnable factorization, while parametric methods avoid product structures entirely. LRNNs appear positioned at the intersection of these themes—combining the adaptivity of learnable parametric activations with the multiplicative interaction modeling of product structures, but through a factorized lens that distinguishes it from both polynomial expansions and simple parameterization.

Among the thirty candidates examined, the contribution-level analysis shows mixed novelty signals. For the core LRNN architecture and for the theoretical analysis, ten candidates each were examined with zero refutations, suggesting these aspects face limited direct prior work within the search scope. For the variance-controlled initialization mechanism, however, ten candidates were examined and one refutable match was found, indicating that this component overlaps more substantially with existing techniques. Because the search covered only the top thirty semantic matches rather than the full literature, unexamined work may contain additional relevant results.

Given the sparse taxonomy leaf and low refutation rates across most contributions, the work appears to occupy a genuinely underexplored intersection of low-rank methods and adaptive activations. The initialization component shows expected overlap with standard neural network practices. The analysis is constrained by examining only thirty candidates from semantic search, leaving open the possibility of relevant work in adjacent subfields not captured by this retrieval strategy.

Taxonomy

Core-task Taxonomy Papers: 8
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 1

Research Landscape Overview

Core task: learning adaptive product-structured activation functions in neural networks.

The field explores how neural networks can move beyond fixed, element-wise nonlinearities by learning activation functions that adapt to data and exploit multiplicative or product-based structures. The taxonomy organizes this landscape into three main branches. Learnable Parametric Activation Functions encompasses methods that parameterize activation shapes with trainable coefficients, allowing networks to discover problem-specific nonlinearities rather than relying on hand-crafted choices like ReLU or sigmoid. Product-Structured Activation Functions focuses on architectures that explicitly incorporate multiplicative interactions—such as low-rank factorizations or polynomial expansions—to capture richer feature dependencies. Attention-Based Product Mechanisms examines how gating and attention operations introduce learned product terms that modulate representations. Representative works illustrate these themes: REAct[1] and Learning Activation Functions[3] exemplify parametric approaches, while Ladder Polynomial Neural Networks[5] and Kernel Product Neural Networks[7] demonstrate polynomial and kernel-based product structures.

Several active lines of work highlight trade-offs between expressiveness and computational cost. Parametric activation methods offer flexibility but require careful regularization to avoid overfitting, whereas product-structured designs can model complex interactions yet may introduce additional parameters or computational overhead.

Deep Learning with Learnable[0] sits within the Product-Structured Activation Functions branch, specifically targeting low-rank factorized activations. This emphasis on factorization distinguishes it from polynomial expansions like Ladder Polynomial Neural Networks[5], which build higher-order terms explicitly, and from kernel-based approaches such as Kernel Product Neural Networks[7], which leverage kernel tricks for feature interactions. By focusing on low-rank structures, Deep Learning with Learnable[0] aims to balance expressive power with parameter efficiency, addressing a central challenge in adaptive activation design.

Claimed Contributions

Deep low-rank separated neural networks (LRNNs) architecture

The authors propose LRNNs, a new neural network architecture that generalizes MLPs by replacing fixed scalar activations with learnable product-structured activation functions. Each LRNN neuron learns a flexible, data-dependent activation through multiplicative composition of simpler univariate transformations, enabling adaptive non-linearities and efficient capture of high-order interactions.
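As a rough illustration of what such a product-structured neuron could look like, the NumPy sketch below composes several univariate transforms multiplicatively. The rank parameter, the `(1 + tanh)` factor, and all names are assumptions chosen for illustration, not the paper's exact construction.

```python
import numpy as np

def product_activation(x, W):
    """Hypothetical rank-R product-structured layer activation.

    W has shape (rank, in_dim, out_dim): one learned projection per
    multiplicative factor. Each output unit is the product of `rank`
    simple univariate transforms of its projections. The (1 + tanh)
    factor is an illustrative choice, not the paper's.
    """
    rank, _, out_dim = W.shape
    out = np.ones(out_dim)
    for r in range(rank):
        z = x @ W[r]                     # per-unit univariate pre-activation
        out = out * (1.0 + np.tanh(z))   # multiplicative composition
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W = rng.standard_normal((4, 8, 16)) / np.sqrt(8)  # rank 4, 8 -> 16 units
y = product_activation(x, W)
print(y.shape)  # (16,)
```

Because every factor `1 + tanh(z)` lies in (0, 2), the learned activation stays positive while its shape depends jointly on all four projections, which is one concrete way multiplicative composition yields data-dependent nonlinearities.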

10 retrieved papers
Theoretical analysis of LRNNs

The authors establish theoretical foundations for LRNNs, proving universal approximation capabilities and demonstrating that LRNNs can overcome the curse of dimensionality for functions with decaying functional ANOVA structure. They also show that learnable product-structured activations enable adaptive control of spectral bias, which is crucial for signal representation tasks.
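The low-rank structure this analysis relies on can be stated in generic separated-rank notation (the symbols here are illustrative, not necessarily the paper's):

```latex
f(x_1, \dots, x_d) \;\approx\; \sum_{r=1}^{R} \prod_{j=1}^{d} g_{r,j}(x_j)
```

The approximation is built from $R \cdot d$ learnable univariate factors rather than a grid that grows exponentially in $d$, which is the usual sense in which separated (low-rank) representations mitigate the curse of dimensionality for functions admitting such structure.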

10 retrieved papers
Variance-controlled initialization mechanism

The authors introduce a scaling mechanism that ensures stable gradient flow through arbitrarily wide product structures. This mechanism bounds the variance of LRNN activations and gradients independently of projection width, enabling automatic relevance determination and stable optimization even for wide product structures.
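One standard way such variance control can work, sketched below under assumed details that may differ from the paper's exact rule, is to initialize each multiplicative factor near 1 with per-factor variance scaled as 1/rank, so the variance of the product stays roughly constant as the number of factors grows.

```python
import numpy as np

def product_factors(rng, width, rank, sigma=0.5):
    """Sketch of variance-controlled initialization for a product of
    `rank` factors: each factor is 1 + eps with Var(eps) = sigma**2/rank,
    so the product's variance stays near sigma**2 for any rank.
    (A standard variance-control idea; not necessarily the paper's rule.)
    """
    eps = rng.standard_normal((rank, width)) * (sigma / np.sqrt(rank))
    return 1.0 + eps  # factors near 1: Var(prod) ~ sum of factor variances

rng = np.random.default_rng(0)
for rank in (2, 16, 128):
    prod = product_factors(rng, width=100_000, rank=rank).prod(axis=0)
    print(rank, round(prod.var(), 3))  # variance stays O(sigma**2) as rank grows
```

For independent factors $1+\varepsilon_r$ with $\mathrm{Var}(\varepsilon_r)=\sigma^2/R$, the product's variance is $(1+\sigma^2/R)^R - 1 \le e^{\sigma^2}-1$, bounded independently of $R$; without the $1/R$ scaling it would grow with the number of factors.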

10 retrieved papers (Can Refute)

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by both search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Deep low-rank separated neural networks (LRNNs) architecture


Contribution

Theoretical analysis of LRNNs


Contribution

Variance-controlled initialization mechanism
