GripNGrab
Project Overview
Task:
Develop an algorithm for Object Detection, 6DOF Pose Estimation, and Grasp Pose Extraction for the RCUP@Work Objects Dataset.
This project aims to enable robotic systems to detect objects, estimate their 6 Degrees of Freedom (6DOF) pose, and determine optimal grasp poses for manipulation tasks. The implementation is built from scratch using fundamental techniques, avoiding pre-built grasp detection solutions.
How It Works
Pipeline Overview
- Object Detection
  - Detect objects in the RCUP@Work dataset using a deep convolutional neural network (CNN).
  - Output: Bounding boxes for detected objects.
- 6DOF Pose Estimation
  - Estimate the position (x, y, z) and orientation (roll, pitch, yaw) of each detected object from its extracted (u, v) keypoints.
  - Output: A 6DOF pose for each object.
- Grasp Pose Extraction
  - Analyze the object’s geometry and pose to compute optimal grasp points.
  - Output: Grasp poses suitable for robotic manipulation.
Mathematical Concepts
- Keypoint Detection
  - Utilizes a custom algorithm to extract corner points from object images, with fallback mechanisms for low keypoint scenarios.
  - Mathematics:
    - Edge Detection: Apply Gaussian blur to reduce noise and use Canny edge detection to identify edges.
    - Contour Approximation: Filter contours by length and approximate them using the Douglas-Peucker algorithm to extract corner points.
    - Fallback Strategy: Iteratively relax constraints (e.g., reduce contour length and approximation epsilon) to ensure at least 4 keypoints are detected.
    - Keypoint Selection: Limit the number of keypoints to a maximum threshold (e.g., 10) for computational efficiency.
  - Algorithm Overview:

        def extract_corner_points(image, min_contour_length=50, max_corners=10, epsilon_factor=0.02):
            # Step 1: Preprocess image (grayscale, blur, edge detection)
            # Step 2: Find and filter contours by length
            # Step 3: Approximate contours to extract corner points
            # Step 4: Apply fallback if fewer than 4 keypoints are found
            # Step 5: Limit keypoints to `max_corners` and visualize results
            return corner_points, edges

  - Output:
    - Extracted 2D corner keypoints (N, 2).
    - Edge-detected image for visualization.
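The contour approximation and fallback steps can be illustrated without any image library. The following is a minimal pure-Python sketch, not the project's implementation: `douglas_peucker` and `corners_with_fallback` are hypothetical names, the contour is assumed to be an ordered list of (x, y) points, and the fallback here only halves the approximation epsilon (the project also relaxes the minimum contour length).

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment (a == b), e.g. a closed contour's chord.
        return math.hypot(px - ax, py - ay)
    # Projection parameter, clamped so the foot stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Simplify a polyline: drop points closer than epsilon to the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    distances = [_point_segment_distance(p, first, last) for p in points[1:-1]]
    max_dist = max(distances)
    if max_dist <= epsilon:
        return [first, last]
    split = distances.index(max_dist) + 1  # index of the farthest point
    left = douglas_peucker(points[: split + 1], epsilon)
    right = douglas_peucker(points[split:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


def corners_with_fallback(contour, epsilon_factor=0.02, min_points=4,
                          max_corners=10, max_tries=5):
    """Approximate a contour, halving epsilon until enough corners survive."""
    perimeter = sum(math.dist(a, b) for a, b in zip(contour, contour[1:]))
    epsilon = epsilon_factor * perimeter
    corners = douglas_peucker(contour, epsilon)
    for _ in range(max_tries):
        if len(corners) >= min_points:
            break
        epsilon *= 0.5  # assumption: relax only epsilon in this sketch
        corners = douglas_peucker(contour, epsilon)
    return corners[:max_corners]
```

Scaling epsilon by the contour perimeter keeps the tolerance proportional to object size, which is the usual reason for an `epsilon_factor` parameter rather than a fixed pixel threshold.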
- 6DOF Pose Estimation
  - Relies on solving the Perspective-n-Point (PnP) problem.
  - Mathematics:
    - Camera projection model:

          s * [u, v, 1]^T = K * [R | t] * [X, Y, Z, 1]^T

      Where K is the camera intrinsic matrix, [R | t] is the rotation and translation matrix, and [X, Y, Z] are 3D object points.
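To make the projection model concrete, here is a small worked example of the forward mapping from a 3D object point to a pixel. It is a sketch under stated assumptions: `project_point` is a hypothetical helper, the intrinsics in `K` are illustrative values rather than a calibrated camera, and lens distortion is ignored. The factor `s` in the equation is the point's depth along the optical axis. PnP solves the inverse problem, recovering `R` and `t` from several known 3D points and their detected (u, v) keypoints; OpenCV exposes this as `cv2.solvePnP`.

```python
def project_point(K, R, t, point_3d):
    """Project a 3D object point to pixels: s*[u, v, 1]^T = K [R | t] [X, Y, Z, 1]^T."""
    # Object frame -> camera frame: X_cam = R * X + t
    cam = [sum(R[i][j] * point_3d[j] for j in range(3)) + t[i] for i in range(3)]
    # Apply the intrinsics; the third component is the scale factor s.
    homog = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    s = homog[2]
    if s <= 0:
        raise ValueError("point is not in front of the camera")
    return homog[0] / s, homog[1] / s


# Illustrative intrinsics (assumed): 800 px focal length, principal point (320, 240).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
# Object frame aligned with the camera frame, 2 m in front of the camera.
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]

u, v = project_point(K, R, t, (0.1, -0.05, 0.0))
print(u, v)  # approximately 360.0 220.0
```

A pose estimate can be sanity-checked the same way: reproject the 3D model points with the estimated `R` and `t` and compare them with the detected keypoints.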
- Grasp Pose Extraction
  - Computes grasp points based on object geometry and stability analysis.
  - Mathematics:
    - Principal Component Analysis (PCA): Identify the object’s major and minor axes from contour points.
    - Grasp point selection: Choose two points along the minor axis, equidistant from the object’s centroid.
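The PCA step and the grasp point selection can be sketched in a few lines for 2D contour points. This is an illustrative sketch, not the project's code: `grasp_points_from_contour` is a hypothetical name, and placing the grasp points at the object's extent along the minor axis (when `half_width` is not given) is an assumption about how the distance from the centroid is chosen. For a 2x2 covariance matrix the major axis angle has a closed form, so no eigenvalue solver is needed.

```python
import math


def grasp_points_from_contour(points, half_width=None):
    """Return (centroid, major_axis, minor_axis, grasp_a, grasp_b) for 2D points."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix of the contour points.
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Angle of the major principal axis (largest-variance eigenvector).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    major = (math.cos(theta), math.sin(theta))
    minor = (-math.sin(theta), math.cos(theta))  # perpendicular to major
    if half_width is None:
        # Assumption: grasp at the object's extent along the minor axis.
        half_width = max(abs((x - cx) * minor[0] + (y - cy) * minor[1])
                         for x, y in points)
    # Two points on the minor axis, equidistant from the centroid.
    grasp_a = (cx + half_width * minor[0], cy + half_width * minor[1])
    grasp_b = (cx - half_width * minor[0], cy - half_width * minor[1])
    return (cx, cy), major, minor, grasp_a, grasp_b
```

Grasping across the minor axis means the gripper closes over the object's narrow side, which generally needs the smallest opening width.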
Deliverables
Visualizations
- Images or videos demonstrating:
  - Detected objects with bounding boxes.
  - Grasp points overlaid on objects.
  - Estimated 6DOF poses visualized as coordinate frames.
Challenges and Improvements
Current Challenge: 6DOF Pose Estimation Lagging
Reason:
- Keypoint detection is often incorrect because the dataset contains a wide variety of objects, many with complex shapes.
Proposed Improvement:
- Implement a combined keypoint + bounding box detection neural network.
Why It Works:
- Every object’s 2D keypoints can be detected and matched with its 3D points, enabling accurate Perspective-n-Point (PnP) solutions.
Limitation:
- The RCUP@Work dataset only provides bounding box annotations, not keypoints.
- Creating a custom, well-performing model for all 18 objects would require significant time for annotation and training.
Impact:
- This limitation results in lower accuracy and robustness in 6DOF pose estimation.