Egocentric demonstrations captured on a consumer Pico VR HMD. Operators perform real household manipulation tasks while wearing the headset; every modality the device exposes is recorded synchronously.
Each episode ships a stereo color pair (60 fps), a 320×240 ToF depth stream (5 fps, uint16 mm), 6-axis IMU, head 6-DoF pose, and per-frame skeletal tracking of 25 joints on each hand.
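Because the color and depth streams run at different rates (60 fps vs. 5 fps), pairing a depth frame with a color frame usually comes down to a nearest-timestamp lookup. The sketch below assumes each stream carries a sorted list of per-frame timestamps in seconds; that layout, the `nearest_index` helper, and the synthetic timestamps are illustrative assumptions, not part of the documented episode format.

```python
from bisect import bisect_left

def nearest_index(timestamps, t):
    """Return the index of the entry in sorted `timestamps` closest to `t`.

    Ties are resolved toward the earlier sample.
    """
    if not timestamps:
        raise ValueError("timestamps must be non-empty")
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Compare the neighbours on either side of the insertion point.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

# Synthetic timestamps (assumed layout): 2 s of color at 60 fps, depth at 5 fps.
color_ts = [k / 60 for k in range(120)]
depth_ts = [k / 5 for k in range(10)]

# For each depth frame, find the color frame closest in time.
pairs = [(d, nearest_index(color_ts, t)) for d, t in enumerate(depth_ts)]
```

With perfectly regular clocks each depth frame lands on every 12th color frame; the lookup is mainly useful when real timestamps jitter or a stream drops frames.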
Left, right, and depth cameras come with full intrinsics and 4×4 extrinsics referenced to the HMD body frame, and the stereo baseline is shipped per episode. Together these make the data usable for triangulation, point-cloud lifting, or vision-language pretraining.
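As an illustration of point-cloud lifting, the sketch below back-projects a uint16 millimetre depth image into 3D and moves the points into the HMD body frame. It assumes a plain pinhole model (`fx`, `fy`, `cx`, `cy`, no distortion), that a depth value of 0 means "no measurement", and that the 4×4 extrinsic maps camera coordinates to body coordinates. None of these conventions is confirmed by the description above, so check them against the shipped calibration before relying on the result.

```python
import numpy as np

def lift_depth_to_body(depth_mm, fx, fy, cx, cy, T_body_from_cam):
    """Back-project a depth image into 3D points in the HMD body frame.

    depth_mm        : (H, W) uint16 array, millimetres; 0 treated as invalid (assumption)
    fx, fy, cx, cy  : pinhole intrinsics in pixels (assumed model, no distortion)
    T_body_from_cam : (4, 4) extrinsic, assumed to map camera -> body coordinates
    Returns an (N, 3) float array of valid points, in metres.
    """
    h, w = depth_mm.shape
    z = depth_mm.astype(np.float64) / 1000.0          # mm -> m
    u, v = np.meshgrid(np.arange(w), np.arange(h))    # pixel grid, shape (H, W)
    valid = depth_mm > 0

    # Pinhole back-projection: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy

    # Homogeneous camera-frame points, one row per valid pixel.
    pts_cam = np.stack(
        [x[valid], y[valid], z[valid], np.ones(int(valid.sum()))], axis=1
    )
    pts_body = pts_cam @ T_body_from_cam.T            # apply the rigid transform
    return pts_body[:, :3]
```

If the shipped extrinsic turns out to use the opposite direction (body to camera), invert it with `np.linalg.inv` before passing it in.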
Each episode is segmented into hierarchical pick / place / wipe / rearrange / fold spans with start & end frames, bilingual text, and a clear “Action end” sentinel marking when the operator releases the task.
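One way to work with hierarchical spans is a small tree of labelled frame ranges that can be queried per frame. The `Span` class, its field names, the inclusive end frame, and the sample episode below are hypothetical; the actual annotation schema (including how the "Action end" sentinel is encoded) is not specified here, so it is left out of the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """A labelled frame range; field names are illustrative, not the real schema."""
    label: str
    start_frame: int
    end_frame: int                                   # assumed inclusive
    text: dict = field(default_factory=dict)         # e.g. language code -> description
    children: list = field(default_factory=list)     # nested sub-spans

def active_labels(span, frame):
    """Return labels from the outermost to the innermost span containing `frame`."""
    if not (span.start_frame <= frame <= span.end_frame):
        return []
    path = [span.label]
    for child in span.children:
        sub = active_labels(child, frame)
        if sub:                      # assumes sibling spans do not overlap
            path.extend(sub)
            break
    return path

# Made-up episode: a "rearrange" task containing a pick and a place.
episode = Span("rearrange", 0, 599, children=[
    Span("pick", 30, 200),
    Span("place", 201, 420),
])

labels_at_100 = active_labels(episode, 100)   # ["rearrange", "pick"]
```

A per-frame lookup like this makes it straightforward to attach the innermost label to each color frame when building training samples.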