AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Tactile Representation Learning, Tactile Dataset, Dynamic Tactile Perception
Abstract:

Real-world contact-rich manipulation requires robots to perceive temporal tactile feedback, capture subtle surface deformations, and reason about object properties and force dynamics. Although optical tactile sensors are uniquely capable of providing such rich information, existing tactile datasets and models remain limited: they primarily focus on object-level attributes (e.g., material) while largely overlooking fine-grained temporal dynamics. We argue that advancing dynamic tactile perception requires a systematic hierarchy of dynamic perception capabilities to guide both data collection and model design. To address the lack of tactile data with rich dynamic information, we present ToucHD, a large-scale tactile dataset spanning tactile atomic actions, real-world manipulations, and touch-force paired data. Beyond scale, ToucHD establishes a comprehensive dynamic data ecosystem that explicitly supports hierarchical perception capabilities from the data perspective. Building on it, we propose AnyTouch 2, a general tactile representation learning framework for diverse optical tactile sensors that unifies object-level understanding with fine-grained, force-aware dynamic perception. The framework captures both pixel-level and action-specific deformations across frames while explicitly modeling physical force dynamics, thereby learning multi-level dynamic perception capabilities from the model perspective. We evaluate our model on benchmarks that cover static object properties and dynamic physical attributes, as well as on real-world manipulation tasks spanning multiple tiers of dynamic perception capabilities, from basic object-level understanding to force-aware dexterous manipulation. Experimental results demonstrate consistent and strong performance across sensors and tasks, highlighting the framework's effectiveness as a general dynamic tactile perception model.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: tactile representation learning for dynamic perception. The field has organized itself around several complementary directions. Self-supervised and contrastive tactile representation learning explores how to extract generalizable features from raw tactile signals, often drawing on cross-modal alignment or temporal consistency principles (e.g., Multimodal Contrastive Pretraining[1], Temporal Contrastive Learning[9]). Task-specific tactile learning for manipulation focuses on end-to-end policies for grasping and insertion, where tactile feedback directly informs control (e.g., Dexterous Grasping RL[4], Tactile RL Insertion[34]). Meanwhile, tactile sensor hardware and signal processing addresses the physical design and low-level encoding of touch signals, neuromorphic and spiking tactile processing investigates event-driven architectures (Spiking Texture Recognition[45], SpikeTouch Optimization[50]), and human tactile perception and psychophysics examines biological mechanisms. Haptic rendering and human-robot interaction studies how to convey tactile information to users, while non-tactile and peripheral applications extend these ideas to other sensory modalities or domains.

A particularly active line of work concerns cross-sensor and transferable tactile representations, where the goal is to learn encoders that generalize across different sensor types and tasks. AnyTouch Dynamic[0] sits squarely in this branch, emphasizing dynamic perception and the ability to handle temporal sequences from diverse tactile hardware. It shares this transferability emphasis with AnyTouch Unified[6] and Transferable Tactile Transformers[48], both of which also aim to unify representations across sensor modalities. In contrast, works like Incomplete Tactile Autoencoders[5] focus more narrowly on reconstruction under partial observations, and Predictive Visuo Tactile[3] integrates vision and touch through predictive modeling rather than purely tactile transfer. The main open question in this cluster is how to balance sensor-agnostic generality with the fine-grained, sensor-specific details that often matter for downstream manipulation tasks.
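To make the cross-modal contrastive principle named above concrete, the following is a minimal sketch of an InfoNCE-style objective aligning tactile embeddings with a paired modality (e.g., vision or language). The function name, temperature, and the assumption that paired samples share a batch index are illustrative assumptions, not taken from any specific cited work.

```python
import torch
import torch.nn.functional as F

def info_nce(tactile_emb, paired_emb, temperature=0.07):
    # Row i of each batch is assumed to be a matching (tactile, paired-modality) pair.
    t = F.normalize(tactile_emb, dim=-1)               # (B, D)
    p = F.normalize(paired_emb, dim=-1)                # (B, D)
    logits = t @ p.t() / temperature                   # (B, B) cosine similarities
    targets = torch.arange(t.size(0), device=t.device)
    # Symmetric InfoNCE: tactile -> paired and paired -> tactile.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```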

Claimed Contributions

Tactile Dynamic Pyramid and ToucHD Dataset

The authors propose a five-tier tactile dynamic pyramid framework that stratifies tactile data by the complexity of dynamic perception capabilities they support, and introduce ToucHD, a large-scale hierarchical dataset spanning simulated atomic actions, real-world manipulations, and touch-force pairs to enrich higher-tier dynamic tactile data.

10 retrieved papers
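As a rough illustration of the claimed data ecosystem, the sketch below shows how a sample in a ToucHD-style hierarchical dataset might be organized, covering the three data sources the contribution names (simulated atomic actions, real-world manipulations, touch-force pairs). All field names, the tier indexing, and the sensor examples are assumptions for exposition, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class TactileSequenceSample:
    frames: np.ndarray            # (T, H, W, 3) optical tactile video clip
    sensor_type: str              # e.g. "GelSight" or "DIGIT", for sensor-agnostic training
    tier: int                     # level in the dynamic-perception hierarchy (hypothetical 1..5 indexing)
    source: str                   # "simulated_atomic" | "real_manipulation" | "force_paired"
    action_label: Optional[str] = None   # atomic action such as "press" or "slide", when annotated
    forces: Optional[np.ndarray] = None  # (T, 6) force/torque readings for touch-force paired data
```
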
AnyTouch 2 General Tactile Representation Learning Framework

The authors develop AnyTouch 2, a unified representation learning framework that integrates pixel-level deformation modeling, semantic-level tactile feature understanding, and physical-level force dynamics prediction to support hierarchical dynamic tactile perception across multiple sensor types.

10 retrieved papers
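One way to picture such a multi-level objective is a shared, sensor-agnostic encoder with three heads for pixel-level deformation reconstruction, semantic-level action matching, and physical-level force prediction, as in the minimal sketch below. The module names, head designs, output dimensions, and the assumption that the encoder returns both patch tokens and a pooled sequence embedding are ours, not the authors' actual architecture.

```python
import torch.nn as nn

class MultiLevelTactileModel(nn.Module):
    def __init__(self, encoder, embed_dim=768, num_actions=20, patch_dim=768):
        super().__init__()
        self.encoder = encoder                                 # shared, sensor-agnostic video backbone
        self.deform_head = nn.Linear(embed_dim, patch_dim)     # pixel-level: reconstruct frame-difference patches
        self.action_head = nn.Linear(embed_dim, num_actions)   # semantic-level: match the atomic action
        self.force_head = nn.Linear(embed_dim, 6)              # physical-level: predict force/torque dynamics

    def forward(self, frames):
        # Assumes the encoder returns per-patch tokens and a pooled sequence embedding.
        tokens, pooled = self.encoder(frames)
        return {
            "deform": self.deform_head(tokens),
            "action": self.action_head(pooled),
            "force": self.force_head(pooled),
        }
```

A plausible training step would then combine a reconstruction loss on "deform", a classification loss on "action", and a regression loss on "force", with per-objective weights; the actual loss composition used by the authors is not specified here.
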
Multi-Level Dynamic Enhanced Modules

The authors introduce specialized modules including frame-difference reconstruction for capturing fine-grained temporal variations, action matching for semantic-level dynamic understanding, and force prediction tasks to explicitly model physical properties underlying tactile interactions.

10 retrieved papers
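A minimal sketch of what the frame-difference reconstruction objective could look like is given below: the model is asked to reconstruct inter-frame pixel differences, which emphasize subtle temporal deformation rather than static appearance. The masking convention and loss form are assumptions for exposition, not the authors' exact formulation.

```python
import torch.nn.functional as F

def frame_difference_targets(frames):
    """frames: (B, T, C, H, W) tactile video; returns (B, T-1, C, H, W) successive-frame
    differences, which highlight subtle deformation changes over time."""
    return frames[:, 1:] - frames[:, :-1]

def frame_difference_loss(predicted_diffs, frames, mask=None):
    """MSE between predicted and true frame differences; an optional boolean mask
    restricts the loss to masked (to-be-reconstructed) positions."""
    targets = frame_difference_targets(frames)
    loss = F.mse_loss(predicted_diffs, targets, reduction="none")
    if mask is not None:
        loss = loss * mask                                # keep only masked positions
        return loss.sum() / mask.sum().clamp(min=1)
    return loss.mean()
```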

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Tactile Dynamic Pyramid and ToucHD Dataset

The authors propose a five-tier tactile dynamic pyramid framework that stratifies tactile data by the complexity of dynamic perception capabilities they support, and introduce ToucHD, a large-scale hierarchical dataset spanning simulated atomic actions, real-world manipulations, and touch-force pairs to enrich higher-tier dynamic tactile data.

Contribution

AnyTouch 2 General Tactile Representation Learning Framework

The authors develop AnyTouch 2, a unified representation learning framework that integrates pixel-level deformation modeling, semantic-level tactile feature understanding, and physical-level force dynamics prediction to support hierarchical dynamic tactile perception across multiple sensor types.

Contribution

Multi-Level Dynamic Enhanced Modules

The authors introduce specialized modules including frame-difference reconstruction for capturing fine-grained temporal variations, action matching for semantic-level dynamic understanding, and force prediction tasks to explicitly model physical properties underlying tactile interactions.
