Moving GraspNet Evaluation

Data Preparation

The first step of evaluation is to prepare your data. You are required to predict tracking results for a set of given query grasp poses, which are provided in the following format:

scene_xx
├── cam1
│   ├── color
│   │   ├── f1.png
│   │   └── f2.png
│   ├── depth
│   │   ├── f1.png
│   │   └── f2.png
│   └── query_grasp_poses
│       ├── grasp_group.npy
│       └── grasp_query.json
└── cam2
    ├── color
    │   ├── f1.png
    │   └── f2.png
    ├── depth
    │   ├── f1.png
    │   └── f2.png
    └── query_grasp_poses
        ├── grasp_group.npy
        └── grasp_query.json

grasp_group.npy contains the query grasp poses of each sequence, expressed in the camera frame. However, not all grasp poses are given in the first frame. The frame in which each grasp pose is given is listed in start_frame.json, whose format is shown below.

[
    {
        "start_frame": "f1",
        "object_id": 1
    },
    {
        "start_frame": "f1",
        "object_id": 1
    },
    {
        "start_frame": "f2",
        "object_id": 2
    }
]
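
As a minimal sketch of how such a file could be read without the API, the snippet below parses the sample content above with the standard json module and groups the query indices by start frame. It assumes that the i-th entry describes the i-th query grasp pose, which is not stated explicitly here, and the helper name group_queries_by_start_frame is hypothetical.

```python
import json
from collections import defaultdict

# Sample content copied from the format description above.
SAMPLE_JSON = """
[
    {"start_frame": "f1", "object_id": 1},
    {"start_frame": "f1", "object_id": 1},
    {"start_frame": "f2", "object_id": 2}
]
"""


def group_queries_by_start_frame(entries):
    """Map each start frame to the indices of the queries that begin there.

    Hypothetical helper: it assumes entry i belongs to query grasp pose i.
    """
    groups = defaultdict(list)
    for index, entry in enumerate(entries):
        groups[entry["start_frame"]].append(index)
    return dict(groups)


entries = json.loads(SAMPLE_JSON)
groups = group_queries_by_start_frame(entries)
print(groups)
```

Grouping by start frame makes it easy to see which queries become active at each frame, so that no prediction is emitted for a query before it appears.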

A simple way to load them is to use our API as follows.

from graspnetAPI import MovingGraspNetEval
from graspnetAPI.moving_graspnet import CAMERA_PREFIX
import os
import pickle
import open3d as o3d

MOVING_GRASPNET_ROOT = "/data/moving_graspnet"
SCENE_NAME = "scene_0002"
dump_dir = "fake_pred"

mgne = MovingGraspNetEval(root=MOVING_GRASPNET_ROOT, pred_dir=dump_dir)

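# Example camera serial number; it is replaced below by the first valid camera.
camera_sn = "036422060215"
# Nested dict keyed by scene name, then by camera serial number.
frame_dict = mgne.collision_path_dict_initial_frame
scene_name = list(frame_dict.keys())[0]
print("=" * 40)
print("valid camera:{}".format(frame_dict[scene_name].keys()))
print("=" * 40)
# Use the first camera that is available for this scene.
camera_sn = list(frame_dict[scene_name].keys())[0]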

query_grasp_pose_list = mgne.load_query_pose(scene_name, camera_sn)
print("sample query grasp pose")
print(query_grasp_pose_list[0])
print("=" * 40)

Each query grasp pose is given as a dict. “start_frame” denotes the frame in which the grasp pose first appears. “object_id” denotes the index of the object that the grasp pose is attached to. “grasp_pose” is a Grasp instance that represents one grasp pose to track.

========================================
valid camera:dict_keys(['036422060215', '037522062165', '104122061850', '104122063678', '104422070042', '104422070044', '105422061350', 'f0190496'])
========================================
sample query grasp pose
{'start_frame': '1636200737221', 'object_id': 18, 'grasp_pose': Grasp: score:0.7000000476837158, width:0.0627814531326294, height:0.019999999552965164, depth:0.019999999552965164, translation:[ 0.08247392 -0.1532829   1.27433228]
rotation:
[[ 0.30200437 -0.17476678  0.93711179]
[ 0.16736801 -0.95801312 -0.23261975]
[ 0.93846774  0.22713524 -0.26006883]]
object id:18}
========================================

You can also visualize the query by running the following script.

VIS_GRASP = False
if VIS_GRASP:
    pcd = mgne.load_point_cloud(
        scene_name, camera_sn, query_grasp_pose_list[0]["start_frame"]
    )
    grasp = query_grasp_pose_list[0]["grasp_pose"]
    o3d.visualization.draw_geometries([pcd, grasp.to_open3d_geometry()])
    # o3d.io.write_point_cloud("pcd.pcd", pcd)
    # grasp.save_npy("g.npy")

[Figure: _images/moving_query_example.png]

You are required to give tracking predictions for these poses over the whole sequence. The score is the Multi Grasp Pose Tracking Accuracy (MGTA), which is similar to MOTA in multi-object tracking. However, the distance between two grasp poses is calculated differently, as shown below.

    @classmethod
    def dist_from_grasp_pose(cls, g1: Grasp, g2: Grasp):
        """Calculate the grasp distance between two grasp poses.

        Args:
            g1(Grasp): grasp pose 1.
            g2(Grasp): grasp pose 2.

        Returns:
            GraspDist: distance between the two grasp poses.
        """
        translation = np.linalg.norm(g1.translation - g2.translation)
        trace = np.matmul(g1.rotation_matrix, g2.rotation_matrix.T).trace()
        trace = min(max(trace, 1.0), 3.0)
        rotation = math.acos(0.5 * (trace - 1.0))
        return cls(translation, rotation)
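
The rotation term above is the geodesic angle between two rotations, theta = arccos((tr(R1 R2^T) - 1) / 2), and the translation term is the Euclidean distance between the grasp centers. The standalone sketch below reproduces both terms with NumPy only, so it can be tried without the API. The function name grasp_distance and its tuple return value are hypothetical. Note one deliberate difference: the sketch clamps the trace to [-1, 3], the full range for a rotation matrix, whereas the method above uses a lower bound of 1.0, which limits the reported angle to at most pi/2. Whether that cap is intended is not stated here.

```python
import math

import numpy as np


def grasp_distance(t1, r1, t2, r2):
    """Return (translation distance, rotation angle in radians).

    Hypothetical standalone helper; t1 and t2 are 3-vectors,
    r1 and r2 are 3x3 rotation matrices.
    """
    translation = float(np.linalg.norm(np.asarray(t1) - np.asarray(t2)))
    trace = float(np.trace(np.asarray(r1) @ np.asarray(r2).T))
    # Clamp to the valid trace range so rounding errors cannot break acos.
    # Assumption: lower bound -1.0, unlike the 1.0 used in the method above.
    trace = min(max(trace, -1.0), 3.0)
    rotation = math.acos(0.5 * (trace - 1.0))
    return translation, rotation


identity = np.eye(3)
# Rotation by 90 degrees about the z axis.
rot_z_90 = np.array([[0.0, -1.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0]])

d_same = grasp_distance([0.0, 0.0, 0.0], identity, [0.0, 0.0, 0.0], identity)
d_moved = grasp_distance([0.0, 0.0, 0.0], identity, [0.03, 0.04, 0.0], rot_z_90)
print(d_same, d_moved)
```

Identical poses give a distance of zero in both terms, while the second pair differs by 5 cm in translation and a quarter turn in rotation.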

You should NOT make predictions before the start frame. Violations will lead to false positives and a lower MGTA score. There is no required number of predictions. However, both false negatives and false positives will result in a lower MGTA score.
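
To illustrate why both kinds of error lower the score, the sketch below implements the standard MOTA formula from multi-object tracking, 1 - (FN + FP + IDSW) / GT. It is only an assumption that MGTA aggregates errors in exactly this way, since the exact definition is not given here, and the function name mota_style_accuracy is hypothetical.

```python
def mota_style_accuracy(num_false_negatives, num_false_positives,
                        num_id_switches, num_ground_truth):
    """Standard MOTA formula; assumed, not confirmed, to mirror MGTA.

    num_ground_truth is the total number of ground truth poses summed
    over all frames of the sequence.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = num_false_negatives + num_false_positives + num_id_switches
    return 1.0 - errors / num_ground_truth


# A perfect tracker makes no errors.
perfect = mota_style_accuracy(0, 0, 0, 10)
# One missed pose and two spurious predictions, e.g. made before the start frame.
with_errors = mota_style_accuracy(1, 2, 0, 10)
print(perfect, with_errors)
```

Under this formula every spurious prediction costs as much as a missed one, which is why predicting before the start frame is penalized.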

The file structure of the dump directory should be as follows:

pred
└── scene_xx
    ├── cam1
    │   ├── f1
    │   │   └── 1.npy
    │   └── f2
    │       ├── 1.npy
    │       └── 2.npy
    └── cam2
        ├── f1
        │   ├── 1.npy
        │   └── 2.npy
        └── f2
            ├── 2.npy
            └── 3.npy

It follows the “scene/camera/frame/predict_id” order. Grasp poses with the same predicted ID in different frames are regarded as the tracking result of the same grasp at different time steps.
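
A minimal sketch of writing files into this layout is given below. It uses only NumPy and pathlib and stores a placeholder array; with the API, the file content would presumably be written by the save_npy method of a Grasp instance instead. The helper names prediction_path and dump_prediction, the placeholder array shape, and the scene, camera and frame names are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np


def prediction_path(pred_dir, scene, camera, frame, predict_id):
    """Build <pred_dir>/<scene>/<camera>/<frame>/<predict_id>.npy."""
    return Path(pred_dir) / scene / camera / frame / f"{predict_id}.npy"


def dump_prediction(pred_dir, scene, camera, frame, predict_id, grasp_array):
    """Save one predicted grasp, creating parent directories as needed."""
    path = prediction_path(pred_dir, scene, camera, frame, predict_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    np.save(path, grasp_array)
    return path


with tempfile.TemporaryDirectory() as pred_dir:
    # Placeholder content; the real array layout is an assumption here.
    placeholder = np.zeros(17, dtype=np.float32)
    # Track 1 is predicted in f1 and f2, track 2 only from f2 onward.
    for frame, predict_id in [("f1", 1), ("f2", 1), ("f2", 2)]:
        dump_prediction(pred_dir, "scene_xx", "cam1", frame, predict_id, placeholder)
    written = sorted(
        p.relative_to(pred_dir).as_posix() for p in Path(pred_dir).rglob("*.npy")
    )

print(written)
```

Reusing the same predict_id across frame directories is what links the per-frame poses into one track.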

Evaluation API

As an example, you can generate predictions from the ground truth, for which the MGTA score should be 1. An example is given below.

print("generating ground truth track pose...")
mgne._generate_gt(scene_name, camera_sn, dump_dir=dump_dir)
print("Done.")

========================================
generating ground truth track pose...
Done.

After preparing the prediction files, you can evaluate the Multi Grasp Pose Tracking Accuracy (MGTA) by calling the evaluation API as shown below.

print("Evaluating ground truth track pose...")
mgta = mgne.get_seq_mgta(scene_name, camera_sn)
print("Done.")
print("=" * 40)
print("mgta of ground truth:{}".format(mgta))

Evaluating ground truth track pose...

CLEAR Config:
THRESHOLD            : 0.5
PRINT_CONFIG         : True
Done.
========================================
mgta of ground truth:1.0