RQ3: Which visual foundation models are best suited for keypoint extraction?

One representative rollout per condition (in-distribution, novel objects, scene variations) for each combination of feature model and extraction method, evaluated on Place Shoe and Place Mug.