# [1,6,256,900,1] -> [1,256,900,6,1]
# Sample the feature map feat at the grid locations given by ref_p; each query now holds the information extracted from the feature map
sampled_feat...= F.grid_sample(feat, reference_points_cam_lvl)
sampled_feat = sampled_feat.view(B, N, C, num_query..., 1).permute(0, 2, 3, 1, 4)
sampled_feats.append(sampled_feat)
sampled_feats = torch.stack...(sampled_feats, -1)
# [1,256,900,6,1,4]
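sampled_feats = sampled_feats.view(B, C, num_query, num_cam...  # collect the samples taken from the different feature maps and combine them
# mask: [1,1,900,6,1,1], records for each camera image whether each query projects onto it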
return reference_points_3d, sampled_feats
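
The sampling steps above can be sketched as a small standalone function. This is a minimal illustration under stated assumptions, not the original implementation: the function name `sample_multilevel_feats`, the argument names, and the input layout (`[B, N, C, H, W]` per level, with reference points already projected and normalized to `[-1, 1]`) are hypothetical choices made for the example.

```python
import torch
import torch.nn.functional as F


def sample_multilevel_feats(mlvl_feats, reference_points_cam):
    """Hypothetical helper: sample per-camera features at projected query points.

    mlvl_feats: list of tensors, one per level, each [B, N, C, H, W] (N cameras).
    reference_points_cam: [B, N, num_query, 2], assumed normalized to [-1, 1].
    Returns a tensor of shape [B, C, num_query, N, 1, num_levels].
    """
    B, N, num_query, _ = reference_points_cam.shape
    # grid_sample expects a grid shaped [batch, H_out, W_out, 2];
    # here H_out = num_query and W_out = 1.
    grid = reference_points_cam.reshape(B * N, num_query, 1, 2)
    sampled_feats = []
    for feat in mlvl_feats:
        _, _, C, H, W = feat.shape
        feat = feat.reshape(B * N, C, H, W)
        # Bilinear lookup at each query location: [B*N, C, num_query, 1]
        sampled = F.grid_sample(feat, grid, align_corners=False)
        # [B, N, C, num_query, 1] -> [B, C, num_query, N, 1]
        sampled = sampled.view(B, N, C, num_query, 1).permute(0, 2, 3, 1, 4)
        sampled_feats.append(sampled)
    # Stack the levels on a new last axis: [B, C, num_query, N, 1, num_levels]
    return torch.stack(sampled_feats, -1)
```

With B=1, N=6, C=256, num_query=900 and four feature levels, the stacked result has shape [1,256,900,6,1,4], matching the shape comment above.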