UC Berkeley Releases Massive Dex-Net 2.0 Dataset

Picking things up is such a fundamental skill for robots, and robots have been picking up things for such a long time, that it’s sometimes difficult to understand how challenging grasping still is.

Where grasping gets really tricky is when you’re trying to design a system that can use standardized (and affordable) grippers and sensors to reliably pick up almost anything, including that infinitely long tail of objects that are, for whatever reason, weird and annoying to grasp.

One way around this is to design grasping hardware that uses clever tricks (like enveloping grasps or adhesives) to compensate for not really knowing the best way to pick up a given object, but this may not be a long-term sustainable approach: Solving the problem in software is much more efficient and scalable, if you can pull it off.

Today, Professor Goldberg and AUTOLAB researcher Jeff Mahler are announcing the release of an enormous dataset that provides the foundation for Dex-Net 2.0, a project that uses neural networks to develop highly reliable robot grasping across a wide variety of rigid objects.

At UC Berkeley, Goldberg and Mahler have been working to solve this problem by training a convolutional neural network (CNN) to be able to predict exactly how robust a particular grasp on a given object will be (whether the grasp will fail when the object is lifted, moved, and shaken a bit).
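One way to picture how such a predictor gets used: score every candidate grasp, then keep the one with the highest predicted robustness. The sketch below is only an illustration under assumptions. The `Grasp` fields, the `plan_grasp` helper, and the stand-in scoring function are hypothetical names, not part of Dex-Net; a trained network would take the place of the lambda.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Grasp:
    """A candidate parallel-jaw grasp (hypothetical layout, for illustration)."""
    row: int          # pixel row of the grasp center
    col: int          # pixel column of the grasp center
    angle_rad: float  # gripper axis angle in the image plane
    depth_m: float    # gripper depth along the camera axis, in meters


def plan_grasp(candidates: Sequence[Grasp],
               predict_robustness: Callable[[Grasp], float]) -> Tuple[Grasp, float]:
    """Return the candidate with the highest predicted robustness and its score."""
    if not candidates:
        raise ValueError("need at least one candidate grasp")
    scored = [(predict_robustness(g), g) for g in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Stand-in scorer (an assumption): prefers grasps whose axis angle is near zero.
candidates = [Grasp(10, 10, 1.0, 0.6), Grasp(20, 20, 0.1, 0.6), Grasp(30, 30, -0.5, 0.6)]
best, score = plan_grasp(candidates, lambda g: 1.0 - abs(g.angle_rad))
print(best, score)
```

Keeping the scorer as a plain callable means the ranking logic stays the same whether the score comes from a toy function or a learned model.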

Dex-Net 2.0 relies on “a probabilistic model to generate synthetic point clouds, grasps, and grasp robustness labels from datasets of 3D object meshes using physics-based models of grasping, image rendering, and camera noise.” In other words, Dex-Net 2.0 leverages cloud computing to rapidly generate a large training set for a CNN, in “a hybrid of well-established analytic methods from robotics and Deep Learning,” as Goldberg explains:

The key to Dex-Net 2.0 is a hybrid approach to machine learning Jeff Mahler and I developed that combines physics with Deep Learning.

It combines a large dataset of 3D object shapes, a physics-based model of grasp mechanics, and sampling statistics to generate 6.7 million training examples, and then uses a Deep Learning network to learn a function that can rapidly find robust grasps when given a 3D sensor point cloud.
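To make the idea of combining a physics model with sampling concrete, here is a toy sketch of how a robustness label could be estimated by Monte Carlo sampling. It is not the metric Dex-Net uses. As a loudly labeled simplification, it swaps in a basic antipodal friction-cone check for the analytic grasp model, and every name, noise level, and default value here is an assumption made for illustration.

```python
import math
import random


def antipodal_success(angle_a: float, angle_b: float, friction: float) -> bool:
    """Toy analytic check: both contacts lie inside the friction cone.

    angle_a and angle_b are the angles (radians) between the gripper closing
    axis and the inward surface normal at each of the two contacts.
    """
    half_angle = math.atan(friction)  # friction cone half-angle
    return abs(angle_a) <= half_angle and abs(angle_b) <= half_angle


def estimate_robustness(angle_a: float, angle_b: float, friction: float,
                        noise_std: float = 0.1, friction_std: float = 0.05,
                        n_samples: int = 1000, seed: int = 0) -> float:
    """Fraction of perturbed samples in which the toy check succeeds."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_samples):
        # Perturb contact angles and friction to mimic pose and material uncertainty.
        a = angle_a + rng.gauss(0.0, noise_std)
        b = angle_b + rng.gauss(0.0, noise_std)
        mu = max(0.0, friction + rng.gauss(0.0, friction_std))
        if antipodal_success(a, b, mu):
            successes += 1
    return successes / n_samples


# A well-aligned grasp versus a grasp near the edge of the friction cone.
print(estimate_robustness(0.0, 0.0, friction=0.5))
print(estimate_robustness(0.4, 0.4, friction=0.5))
```

Labels produced this way are probabilities between 0 and 1, which is the kind of target a network can be trained to predict from sensor data.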

Jeff Mahler: Our current system requires some knowledge specific to the hardware setup, including the focal length and bounds on where the RGB-D sensor will be relative to the robot, the geometry of a parallel-jaw robot gripper (specified as a CAD model), and a friction coefficient for the gripper.


To reduce data collection time for deep learning of robust robotic grasp plans, we explore training from a synthetic dataset of 6.7 million point clouds, grasps, and robust analytic grasp metrics generated from thousands of 3D models from Dex-Net 1.0 in randomized poses on a table.

Experiments with over 1,000 trials on an ABB YuMi comparing grasp planning methods on singulated objects suggest that a GQ-CNN trained with only synthetic data from Dex-Net 2.0 can be used to plan grasps in 0.8s with a success rate of 93% on eight known objects with adversarial geometry and is 3x faster than registering point clouds to a precomputed dataset of objects and indexing grasps.

The GQ-CNN is also the highest performing method on a dataset of ten novel household objects, with zero false positives out of 29 grasps classified as robust (100% precision) and a 1.5x higher success rate than a registration-based method.
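The precision figure follows directly from the counts: precision is the fraction of grasps classified as robust that actually succeeded. A small check of that arithmetic, using a hypothetical helper name:

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of positive predictions that were correct."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        raise ValueError("no positive predictions, precision is undefined")
    return true_positives / predicted_positive


# 29 grasps were classified as robust and none of them failed.
print(precision(true_positives=29, false_positives=0))  # 1.0, i.e. 100%
```

Precision says nothing about robust grasps the classifier rejected, so it is reported alongside the overall success rate rather than instead of it.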

Releasing the Dexterity Network (Dex-Net) 2.0 Dataset for Deep Grasping

Reliable robot grasping across many objects is challenging due to sensor noise and

used to train deep neural networks to plan grasps from point clouds on a physical

grasps across a wide variety of objects directly from images (4) with no

To reduce training time, one alternative is to use Cloud Computing to rapidly compute grasps across a large dataset of object mesh models (5) using physics-based models of grasping (6).

These methods rank grasps by a quantity called the grasp robustness, which is the probability of grasp success predicted by models from mechanics, such as whether

3D object meshes (10) using physics-based models of grasping, image rendering, and camera noise.

The main insight behind the method is that robust parallel-jaw grasps of an object are strongly correlated with the shape of the object.

We hypothesize that Deep CNNs are able to learn these correlations using a hierarchical set of filters that recognize geometric primitives, similar to the Gabor-like filters

and executing the most robust grasp estimated by the GQ-CNN: When trained on Dex-Net 2.0, the GQ-CNN learns a set of low-level filters that appear

the fingers and object:

To evaluate GQ-CNN-based grasp planning on a physical robot, we ran over 1,000 trials of grasping on an ABB YuMi to investigate:

We first measured the ability of our method to plan grasps that could maintain a grasp

on a set of 40 novel objects including objects with moving parts and deformation, such as

important because it suggests that the robot could anticipate failures based on its confidence labels and perform

method: Over summer 2017, we are releasing a subset of our code, datasets, and the trained GQ-CNN weights which we hope will facilitate further research and comparisons.

Today we’re releasing the Dex-Net 2.0 Training Dataset and Code, which includes the Dex-Net 2.0 dataset with 6.7 million synthetic datapoints, pretrained GQ-CNN models from the paper, and the gqcnn Python package for replicating our experiments on classifying robust grasps on synthetic data with GQ-CNNs. We hope this will facilitate development of new GQ-CNN architectures and training methods that perform better on both synthetic datasets and datasets collected with our robot. You can access the release with these links: [datasets] [models] [code]

Please note that strong performance on this particular dataset may not be indicative of performance on other robots because the dataset is specific to, among other factors, the set of poses of the camera relative to the table: 50-70 centimeters directly above a table looking straight down.

Nonetheless, the algorithms behind the dataset can be used to generate datasets for other two-finger grippers, cameras, and camera poses relative to the robot. We hypothesize that GQ-CNN-based grasp planning will perform best if the training datasets are generated using the gripper geometry, camera intrinsics, and camera location specific to the hardware setup.
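As a rough illustration of what "specific to the hardware setup" could mean in practice, the sketch below collects the setup parameters mentioned in this article into one record and checks whether a given camera height falls inside the range the dataset was generated for. The class, field names, file name, and the friction and focal-length values are all hypothetical placeholders, not values from the Dex-Net release; only the 50-70 cm height range comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupSpec:
    """Hardware assumptions baked into a synthetic training set (hypothetical)."""
    gripper_cad_path: str       # path to the parallel-jaw gripper CAD model
    friction_coef: float        # friction coefficient assumed for the gripper
    focal_length_px: float      # camera focal length, in pixels
    min_camera_height_m: float  # lowest camera height above the table
    max_camera_height_m: float  # highest camera height above the table

    def covers_height(self, camera_height_m: float) -> bool:
        """True if the camera height lies within the range used for training."""
        return self.min_camera_height_m <= camera_height_m <= self.max_camera_height_m


# Placeholder values, except the 0.5-0.7 m height range quoted in the article.
spec = SetupSpec("gripper.obj", friction_coef=0.5, focal_length_px=525.0,
                 min_camera_height_m=0.5, max_camera_height_m=0.7)
print(spec.covers_height(0.6))
print(spec.covers_height(0.9))
```

A check like this is only a first filter; a setup that passes it can still differ in gripper geometry or camera intrinsics.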

datasets are specific to a hardware setup, we volunteer to benchmark performance on the physical robot for models that we deem significantly outperform

are tutorials to replicate the results from our RSS paper, and we invite researchers to try to improve classification performance on synthetic datasets as well as datasets of grasps collected with our physical ABB YuMi robot.

We encourage interested parties to set up a Primesense Carmine 1.08 or Microsoft Kinect for Xbox 360 roughly 50-70 cm above a table and attempt grasps planned by a GQ-CNN-based grasp planner. While our dataset may not generalize to other hardware setups as noted above, we hope that with further research it may be possible to use GQ-CNNs for lifting and transporting objects with other robots.

We are also aiming for the following releases and dates of additional data and functionality from Dex-Net over summer and fall 2017: See the project website for updates and progress.

DexNet 2.0: 99% Precision Grasping

UC Berkeley AUTOLAB Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics. Jeffrey Mahler, Jacky Liang, Sherdil Niyaz,..