DeltaDorsal
Enhancing Hand Pose Estimation with Dorsal Features in Egocentric Views
Abstract
The proliferation of XR devices has made egocentric hand pose estimation a vital task, yet this perspective is inherently challenged by frequent finger occlusions. To address this, we propose a novel approach that leverages the rich information in dorsal hand skin deformation, unlocked by recent advances in dense visual featurizers. We introduce a dual-stream delta encoder that learns pose by contrasting features from a dynamic hand with a baseline relaxed position.
DeltaDorsal extracts skin deformations for hand pose estimation without the need for temporal continuity. Our approach improves tracking under self-occlusion and in scenarios where conventional visual cues are weak or absent. Results show that DeltaDorsal outperforms state-of-the-art hand pose models in egocentric, self-occluded conditions and better recognizes subtle gestures previously difficult to capture from purely visual data.
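The core idea of contrasting a dynamic hand against a relaxed baseline can be illustrated with a minimal sketch. This is not the paper's architecture: it assumes per-patch features have already been extracted by some dense visual featurizer, and the function names, the mean pooling, and the linear head (`weights`, `bias`) are hypothetical placeholders chosen for illustration only.

```python
import numpy as np


def delta_features(active_feats, relaxed_feats):
    """Per-patch difference between the current frame and the relaxed baseline.

    Both inputs are (num_patches, feature_dim) arrays, assumed to come from
    the same (hypothetical) dense featurizer applied to aligned hand crops.
    """
    active = np.asarray(active_feats, dtype=np.float64)
    relaxed = np.asarray(relaxed_feats, dtype=np.float64)
    if active.shape != relaxed.shape:
        raise ValueError("feature maps must share the same shape")
    return active - relaxed


def predict_joint_angles(active_feats, relaxed_feats, weights, bias):
    """Toy pose head (an assumption): mean-pool the delta, then apply a linear map."""
    delta = delta_features(active_feats, relaxed_feats)
    pooled = delta.mean(axis=0)  # (feature_dim,)
    return pooled @ weights + bias  # (num_joint_angles,)
```

Because the prediction depends only on the current frame and a single baseline frame, a sketch like this needs no history of previous frames, which matches the claim that temporal continuity is not required.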
Main Contributions
- An analysis of the prevalence and impact of self-occlusion scenarios in common egocentric hand datasets, motivating the use of dorsal features.
- An open-source end-to-end pipeline that transforms dorsal skin imagery into hand pose predictions and click detection without temporal dependencies.
- An evaluation of the system’s performance on 12 participants versus state-of-the-art baselines, as well as analyses with respect to occlusion, skin tone, image size, and backbone.
- Several exemplary applications of the system in key hand interactions, including pinching, tapping, and isometric force clicks.
High-Resolution Dorsal Dataset Collection
We collected a new dataset of over 170,000 high-resolution frames of dorsal hand data across 17 gestures from 12 participants.
System Design
Evaluation
Our proposed system reduces the mean per-joint angle error (MPJAE) by over 18% compared to SOTA models, mitigates the negative impacts of self-occlusion, and is not meaningfully affected by skin color. To demonstrate the practical utility of our approach, we evaluate its performance on downstream applications like pinch and tap detection. Finally, to illustrate the potential of our skin deformation analysis, we showcase an interaction not possible with conventional egocentric methods: isometric “force click” detection with no discernible hand motion, akin to a trackpad press on a surface or pressing fingers together from an already-touching pose.
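For readers unfamiliar with the metric, the following sketch shows one common way to compute an MPJAE-style score and a relative reduction. It assumes MPJAE is the mean absolute difference between predicted and ground-truth joint angles, expressed in degrees and without wrap-around handling; the paper's exact definition may differ, and the function names here are illustrative.

```python
import numpy as np


def mpjae(pred_angles, true_angles):
    """Mean absolute joint-angle error over all frames and joints (assumed definition)."""
    pred = np.asarray(pred_angles, dtype=np.float64)
    true = np.asarray(true_angles, dtype=np.float64)
    if pred.shape != true.shape:
        raise ValueError("predictions and ground truth must share the same shape")
    return float(np.mean(np.abs(pred - true)))


def relative_reduction(baseline_error, new_error):
    """Fractional error reduction relative to a baseline, e.g. 0.18 for 18%."""
    return (baseline_error - new_error) / baseline_error
```

Under this definition, a reduction of over 18% means the new error is less than 82% of the baseline error.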
For more information about the system, please refer to our paper.