Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
Overview
Overall Novelty Assessment
The paper introduces MIKASA, a benchmark suite for evaluating memory capabilities in reinforcement learning, with particular emphasis on tabletop robotic manipulation. It resides in the 'Memory-Intensive Task Benchmarks' leaf of the taxonomy, which contains only two papers, including this one. This sparse population suggests that systematic memory evaluation in RL remains underdeveloped relative to work on memory mechanisms and architectures, where several subtopics are more crowded (e.g., Transformer-Based Memory with four papers, Episodic Memory with three).
The taxonomy reveals that MIKASA sits within 'Memory Benchmarking and Evaluation,' a branch containing three leaves: Memory-Intensive Task Benchmarks, Memory Interpretability and Analysis, and Partially Observable and Control Tasks. Neighboring branches focus on memory mechanisms (External and Episodic Memory Systems, Recurrent and Sequence Models) and applications (Cognitive and Neuroscience-Inspired Memory, Embodied and Robotic Agents). The paper's dual focus on general memory evaluation (MIKASA-Base) and robotic manipulation (MIKASA-Robo) positions it at the intersection of benchmarking and embodied applications, bridging two otherwise separate research directions.
Thirty candidates were examined, ten per contribution. For the classification framework, one of the ten candidates could refute the novelty claim, suggesting some prior taxonomic work exists. For the MIKASA-Base unified benchmark, none of the ten candidates clearly refutes the claim, indicating potential novelty in its cross-scenario evaluation approach. For the MIKASA-Robo robotic benchmark, one of the ten candidates could refute the claim, likely reflecting existing robotic memory tasks that may nonetheless differ in scope or design. Because the search covered only 30 candidates, these statistics capture only the most semantically similar prior work, not an exhaustive field survey.
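The counts in this paragraph can be cross-checked with a short tally. This is only an illustrative sketch: the dictionary keys are shorthand labels chosen here, and the numbers are copied from the paragraph above, not from any additional data.

```python
# (examined, refutable) per contribution, as reported in the assessment.
candidates = {
    "classification framework": (10, 1),
    "MIKASA-Base": (10, 0),
    "MIKASA-Robo": (10, 1),
}

total_examined = sum(examined for examined, _ in candidates.values())
total_refutable = sum(refutable for _, refutable in candidates.values())
# Number of contributions with at least one potentially refuting candidate.
contributions_with_overlap = sum(1 for _, r in candidates.values() if r > 0)

print(total_examined, total_refutable, contributions_with_overlap)
# prints: 30 2 2
```

The tally agrees with the summary that two of the three contributions have at least one overlapping prior work among the 30 candidates.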
Based on the top-30 semantic matches examined, the work appears to occupy a relatively sparse research area within memory benchmarking, particularly for robotic manipulation scenarios. The taxonomy structure confirms that systematic memory evaluation remains less explored than memory mechanism design. However, the presence of at least one overlapping work for two of the three contributions suggests the paper builds incrementally on emerging benchmarking efforts rather than pioneering an entirely new direction.
Claimed Contributions
The authors introduce a systematic taxonomy that organizes memory-intensive tasks into four key categories: object memory, spatial memory, sequential memory, and memory capacity. This framework enables systematic evaluation of memory-enhanced agents across diverse scenarios without added complexity.
The authors present MIKASA-Base, a Gymnasium-based framework that consolidates widely used open-source memory-intensive environments under a common API. This benchmark standardizes task access and evaluation, supporting fair comparisons and reproducible research in memory-centric RL.
The authors develop MIKASA-Robo, an open-source benchmark comprising 32 robotic tabletop manipulation tasks across 12 categories. These tasks target specific memory-dependent skills in realistic settings and address the gap in standardized benchmarks for memory evaluation in robotic manipulation.
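To make the first two claims concrete, the sketch below tags a toy task with one of the four taxonomy categories and exposes it through a Gymnasium-style reset/step interface. It is a minimal illustration under stated assumptions, not MIKASA's actual API: `MemoryCategory`, `RecallCueEnv`, `REGISTRY`, and `run_episode` are hypothetical names, and the code is pure Python that only mirrors the Gymnasium calling convention without importing the library.

```python
import random
from enum import Enum


class MemoryCategory(Enum):
    # The four categories named in the claimed taxonomy.
    OBJECT = "object memory"
    SPATIAL = "spatial memory"
    SEQUENTIAL = "sequential memory"
    CAPACITY = "memory capacity"


class RecallCueEnv:
    """Hypothetical toy task: a cue is visible only in the initial
    observation and must be given as the action on the final step."""

    category = MemoryCategory.OBJECT

    def __init__(self, delay=3, num_cues=2):
        self.delay = delay
        self.num_cues = num_cues
        self._rng = random.Random()
        self._cue = 0
        self._t = 0

    def reset(self, seed=None):
        if seed is not None:
            self._rng.seed(seed)
        self._cue = self._rng.randrange(self.num_cues)
        self._t = 0
        return self._cue, {}  # (observation, info)

    def step(self, action):
        self._t += 1
        terminated = self._t > self.delay
        # Reward is paid only on the terminal step, for recalling the cue.
        reward = float(terminated and action == self._cue)
        # Observation -1 means the cue is hidden after the first step.
        return -1, reward, terminated, False, {}


# Hypothetical registry: one lookup table gives every task the same interface.
REGISTRY = {"RecallCue-v0": RecallCueEnv}


def run_episode(env, policy, seed=0):
    obs, _ = env.reset(seed=seed)
    memory = obs  # the agent must carry the cue across the delay
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(memory))
        total += reward
        done = terminated or truncated
    return total
```

For example, `run_episode(REGISTRY["RecallCue-v0"](), lambda m: m)` returns the full reward because the policy replays the remembered cue, whereas a policy that ignores `memory` succeeds only by chance. Any task added to the registry can be evaluated with the same loop, which is the sense in which a common API supports fair comparison.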
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[20] Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents
Contribution Analysis
Detailed comparisons for each claimed contribution
Comprehensive classification framework for memory-intensive RL tasks
The authors introduce a systematic taxonomy that organizes memory-intensive tasks into four key categories: object memory, spatial memory, sequential memory, and memory capacity. This framework enables systematic evaluation of memory-enhanced agents across diverse scenarios without added complexity.
[30] Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
[58] RMM: Reinforced memory management for class-incremental learning
[59] A review of meta-heuristic high utility patterns mining methods
[60] Inventory planning in capacitated high-tech assembly systems under non-stationary demand
[61] Improving the Accuracy of Extracting Useful Information in Search Engines from the Web Using Deep Reinforcement Learning Based on the Q-Learning Algorithm
[62] Memory Reduction through Experience Classification for Deep Reinforcement Learning with Prioritized Experience Replay
[63] Synthetic POMDPs to Challenge Memory-Augmented RL: Memory Demand Structure Modeling
[64] Dissociations between rule-based and information-integration categorization are not caused by differences in task difficulty
[65] Lightweight Multi-Class Autoencoder Model for Malicious Traffic Detection in Private 5G Networks
[66] Exploring new computing paradigms for data-intensive applications
MIKASA-Base unified benchmark for memory RL
The authors present MIKASA-Base, a Gymnasium-based framework that consolidates widely used open-source memory-intensive environments under a common API. This benchmark standardizes task access and evaluation, supporting fair comparisons and reproducible research in memory-centric RL.
[5] Memory-R1: Enhancing large language model agents to manage and utilize memories via reinforcement learning
[9] Mastering memory tasks with world models
[20] Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents
[51] Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning
[52] COOM: A game benchmark for continual reinforcement learning
[53] Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning
[54] The rise of agentic AI: A review of definitions, frameworks, architectures, applications, evaluation metrics, and challenges
[55] Augmented Memory: Sample-Efficient Generative Molecular Design with Reinforcement Learning
[56] FindingDory: A Benchmark to Evaluate Memory in Embodied Agents
[57] EAST: a comprehensive evaluation framework for swarm intelligence-based UAV path planning
MIKASA-Robo benchmark of memory-intensive robotic manipulation tasks
The authors develop MIKASA-Robo, an open-source benchmark comprising 32 robotic tabletop manipulation tasks across 12 categories. These tasks target specific memory-dependent skills in realistic settings and address the gap in standardized benchmarks for memory evaluation in robotic manipulation.