
OpenAI Demonstrates Complex Manipulation Transfer from Simulation to Real World

In-hand manipulation is one of those things that’s fairly high on the list of “skills that are effortless for humans but extraordinarily difficult for robots.” Without even really thinking about it, we’re able to adaptively coordinate four fingers and a thumb with our palm and friction and gravity to move things around in one hand without using our other hand—you’ve probably done this a handful (heh) of times today already, just with your cellphone.

Learning through practice and experience is still the way to go for complex tasks like this, and the challenge is finding a way to learn faster and more efficiently than just giving a robot hand something to manipulate over and over until it learns what works and what doesn’t, which would probably take about a hundred years.

Rather than wait a hundred years, researchers at OpenAI have used reinforcement learning to train a neural network to control a five-fingered Shadow hand to manipulate objects, all in just 50 hours.

They've managed this by training in simulation, a technique notoriously "doomed to succeed" because behaviors that work in simulation so often fail on real hardware. But by carefully randomizing the simulation to better match real-world variability, a real Shadow hand was able to successfully perform in-hand manipulation on real objects without any retraining at all.

The issue with using simulation to train robots is that the real world is impossible to simulate precisely, and it gets even harder with thorny little things like friction, compliance, and object-object interaction.

OpenAI is instead making accuracy secondary to variability, giving its moderately realistic simulations a bunch of slightly different tweaks with the goal of making the behaviors that they train robust enough to function outside of simulation as well.

To reiterate, OpenAI is well aware that the simulations that it’s using aren’t nearly complex enough to accurately model a huge pile of important things, from friction to the way that the fingertips of a real robot hand experience wear over time.

The randomization covers the mass and dimensions of the object, the friction of both the object's surface and the robot's fingertips, how well the robot's joints are damped, actuator forces, joint limits, motor backlash and noise, and more.
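
As a rough sketch of what per-episode physics randomization like this can look like in code (the parameter names follow MuJoCo's conventions, but the ranges and structure here are illustrative guesses, not OpenAI's actual values):

```python
import numpy as np

class PhysicsRandomizer:
    """Resample simulator parameters around their nominal values at the start
    of each training episode. `model` is assumed to expose MuJoCo-style numpy
    arrays (body_mass, geom_friction, dof_damping); the ranges are illustrative."""

    def __init__(self, model, rng=None):
        self.model = model
        self.rng = rng or np.random.default_rng()
        # Keep copies of the nominal values so repeated randomization
        # doesn't compound episode after episode.
        self.nominal = {
            "body_mass": model.body_mass.copy(),
            "geom_friction": model.geom_friction.copy(),
            "dof_damping": model.dof_damping.copy(),
        }

    def randomize(self):
        m, rng, nom = self.model, self.rng, self.nominal
        m.body_mass[:] = nom["body_mass"] * rng.uniform(0.7, 1.3, nom["body_mass"].shape)
        m.geom_friction[:] = nom["geom_friction"] * rng.uniform(0.5, 1.5, nom["geom_friction"].shape)
        m.dof_damping[:] = nom["dof_damping"] * rng.uniform(0.3, 3.0, nom["dof_damping"].shape)
        # Effects the simulator can't model (backlash, sensor noise, wear) would be
        # approximated by adding noise to observations and actions elsewhere in the loop.
```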

And that's just the manipulation side; there's just as much variability in how the RGB cameras used for object pose estimation were trained, which is a bit easier to visualize. OpenAI calls this "domain randomization," and with in-hand manipulation, OpenAI says, "we wanted to see if scaling up domain randomization could solve a task well beyond the reach of current methods in robotics." Here's how it went, with two independently trained networks (one for vision and one for manipulation) visually detecting the pose of the cube and then manipulating it into different orientations.
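
On the vision side, domain randomization amounts to resampling the look of every rendered training image. The sketch below only samples the randomized parameters; the field names and the renderer call are hypothetical stand-ins, since the actual rendering pipeline isn't described here:

```python
import numpy as np

def sample_visual_randomization(rng):
    """Sample one set of visual conditions for a synthetic training image.
    Every field here is a stand-in; the point is only that colors, textures,
    lighting, and camera pose all vary from sample to sample."""
    return {
        "cube_rgb":        rng.uniform(0.0, 1.0, size=3),   # random cube color
        "background_id":   int(rng.integers(0, 1000)),      # random background texture
        "light_position":  rng.uniform(-1.0, 1.0, size=3),
        "light_intensity": rng.uniform(0.5, 2.0),
        "camera_offset":   rng.normal(0.0, 0.02, size=3),   # jitter camera position (meters)
        "camera_fov_deg":  rng.uniform(40.0, 60.0),
    }

rng = np.random.default_rng(0)
params = sample_visual_randomization(rng)
# image, true_pose = render_scene(**params)   # hypothetical renderer call
```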

These cube manipulations (the system can do at least 50 in a row successfully) are the result of 6,144 CPU cores and 8 GPUs collecting 100 years of simulated robot experience in 50 hours.
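
For a sense of scale, the implied speedup over real time is easy to work out from those two numbers:

```python
years_of_experience = 100
sim_hours = years_of_experience * 365 * 24    # ~876,000 hours of simulated hand time
wall_clock_hours = 50
speedup = sim_hours / wall_clock_hours        # ~17,500x real time, spread across workers
print(f"{sim_hours:,} simulated hours in {wall_clock_hours} real hours ≈ {speedup:,.0f}x real time")
```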

The upshot of all of this is that it turns out you can, in fact, train robots to do complex physical things in simulation and then immediately use those skills outside of simulation. That's a big deal, because training in simulation is much, much faster than training in the real world.

It's not clear what the asymptotic performance curve looks like, but we consider the project pretty much completed, since even achieving one rotation is far beyond what current state-of-the-art methods can do. We initially chose 50 because we thought 25 would clearly demonstrate that we'd solved it, and then we added a 100% safety margin :).

We actually tried manipulating a squishy and slightly smaller foam block with the same policy, just for fun, and the performance doesn't differ significantly from the fully solid block.

We also ran experiments with different-sized blocks in simulation (here are some tiny and giant ones), where we re-trained on the new setting, which worked equally well (haven’t tried this on the real robot though).

One of our summer interns, Hsiao-Yu Fish Tung, is actually working on making the vision model fully invariant to camera placement using the same basic technique of randomizing the camera pose and orientation over a wide range.

In the long term, we’re hoping to give robots general manipulation capabilities so that they can learn about their environment similar to how a toddler would learn, by playing with objects in their vicinity but not necessarily with adult supervision.

We think that intelligence is grounded in interaction with the real world, and that in order to accomplish our mission of building safe artificial general intelligence, we have to be able to learn from real-world sensory experiences as well as from simulation data.

Last month, we showed an earlier version of this robot where we'd trained its vision system using domain randomization, that is, by showing it simulated objects with a variety of colors, backgrounds, and textures, without the use of any real images.

(The vision system is never trained on a real image.) The imitation network observes a demonstration, processes it to infer the intent of the task, and then accomplishes the same intent from a different starting configuration.

Applied to block stacking, the training data consists of pairs of trajectories that stack blocks into a matching set of towers in the same order, but start from different start states.

At test time, the imitation network was able to parse demonstrations produced by a human, even though it had never seen messy human data before.

The imitation network uses soft attention over the demonstration trajectory and over the state vector that represents the locations of the blocks, allowing the system to work with demonstrations of variable length.

It also performs attention over the locations of the different blocks, allowing it to imitate longer trajectories than it's ever seen, and stack blocks into a configuration that has more blocks than any demonstration in its training data.
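
A minimal numpy sketch of the soft-attention idea (not the paper's exact architecture): a fixed-size query is compared against every timestep of a demonstration, so a demo of any length collapses into one fixed-size context vector.

```python
import numpy as np

def soft_attention(query, keys, values):
    """Dot-product soft attention: compare the current state (query) against
    every demonstration timestep (keys) and return a weighted blend of the
    per-timestep features (values). Works for any demonstration length T."""
    scores = keys @ query / np.sqrt(query.shape[-1])   # (T,) similarity per timestep
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                           # softmax over the T timesteps
    return weights @ values                            # fixed-size summary, shape (d,)

# A demonstration of any length collapses to one fixed-size context vector.
T, d = 37, 64
rng = np.random.default_rng(0)
context = soft_attention(rng.standard_normal(d),
                         rng.standard_normal((T, d)),
                         rng.standard_normal((T, d)))
```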

Watch a robot hand learn to manipulate objects just like a human hand

(Most of the work, in both simulation and reality, was done with a child’s building block with letters on its sides.) They also gave the program short-term memory, so after a few seconds of handling the cube, it got a sense of the block’s exact size and other factors and adjusted for them.

In both virtual training and a physical test to see how well the training transferred to the real hand, the hand was instructed to manipulate a cube in a series of new orientations so that, for example, the side with the A on it was facing up and the side with the P on it was facing out.

The virtual hand, after the equivalent of 100 years of trial-and-error practice (sped up in simulation), performed an average of 30 consecutive reorientations without getting stuck or dropping the cube. The physical hand completed an average of 15 consecutive reorientations, the researchers report today.

What didn't pan out

We built a simulated version of our robotics setup using the MuJoCo physics engine.

This simulation is only a coarse approximation of the real robot: it can be made more realistic by calibrating its parameters to match the robot's behavior, but many physical effects simply cannot be modeled accurately in current simulators.
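
For readers who want to poke at a comparable setup, the open-source Gym robotics suite ships a MuJoCo-based Shadow-hand block reorientation environment modeled on this task. This is the public environment with the older Gym API, not OpenAI's internal training stack, and it needs mujoco-py and a MuJoCo install:

```python
import gym

# Public Gym robotics environment modeled on the same Shadow-hand task.
env = gym.make("HandManipulateBlock-v0")
obs = env.reset()
for _ in range(200):
    obs, reward, done, info = env.step(env.action_space.sample())  # random actions
    if done:
        obs = env.reset()
env.close()
```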

By building simulations that support transfer, we have reduced the problem of controlling a robot in the real world to accomplishing a task in simulation, which is a problem well-suited for reinforcement learning.

While the task of manipulating an object in a simulated hand is already somewhat difficult, learning to do so across all combinations of randomized physical parameters is substantially more difficult.

Because most dynamics parameters cannot be inferred from a single observation, we used an LSTM — a type of neural network with memory — to make it possible for the network to learn about the dynamics of the environment.
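
A minimal PyTorch sketch of that idea, with illustrative layer sizes rather than the published architecture: the LSTM's hidden state is what lets the policy accumulate evidence about unobservable dynamics parameters across timesteps.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Mass, friction, and the other randomized parameters can't be read off a
    single observation, so the policy carries a hidden state that accumulates
    evidence about them over time. Layer sizes here are illustrative."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim); `state` carries memory between calls.
        out, state = self.lstm(obs_seq, state)
        return torch.tanh(self.head(out)), state   # actions squashed to [-1, 1]
```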

For development and testing, we validated our control policy against objects with embedded motion tracking sensors to isolate the performance of our control and vision networks.

By combining these two independent networks (the control network that reorients the object given its pose, and the vision network that maps camera images to the object's pose), Dactyl can manipulate an object that it sees.
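
In code form, one control step of that combined pipeline might look like the following sketch (the function and argument names are hypothetical, not OpenAI's):

```python
def dactyl_step(camera_images, goal_pose, vision_net, control_net, memory):
    """One step of the combined system (sketch only): the vision network turns
    camera images into an estimated object pose, and the recurrent control
    network turns (pose, goal, memory) into joint commands for the hand."""
    object_pose = vision_net(camera_images)                       # images -> pose estimate
    action, memory = control_net(object_pose, goal_pose, memory)  # pose -> hand actions
    return action, memory
```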

Learning to rotate an object in simulation without randomizations requires about 3 years of simulated experience, while achieving similar performance in a fully randomized simulation requires about 100 years of experience.

This project completes a full cycle of AI development that OpenAI has been pursuing for the past two years: we've developed a new learning algorithm, scaled it massively to solve hard simulated tasks, and then applied the resulting system to the real world.

Generalizing from Simulation

Our latest robotics techniques allow robot controllers, trained entirely in simulation and deployed on physical robots, to react to unplanned changes in the environment as they solve simple tasks.

Our new results provide more evidence that general-purpose robots can be built by training entirely in simulation, followed by a small amount of self-calibration in the real world.

Some tasks come with a naturally dense reward signal, but most do not: to define a dense reward for block stacking, you'd need to encode that the arm is close to the block, that the arm approaches the block in the correct orientation, that the block is lifted off the ground, the distance of the block from its desired position, and so on.
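
A sketch of the contrast, with illustrative terms and weights rather than anything from the paper: the sparse reward is trivial to write down but nearly signal-free, while the dense version has to hand-encode every stage of the task.

```python
import numpy as np

def sparse_reward(block_pos, target_pos, tol=0.02):
    """Binary reward: 1 only once the block sits at the target. Trivial to
    specify, but gives almost no gradient of progress to learn from."""
    return float(np.linalg.norm(block_pos - target_pos) < tol)

def dense_reward(gripper_pos, block_pos, target_pos):
    """Hand-shaped dense reward of the kind described above (illustrative
    terms and weights): reach the block, lift it, move it toward the goal."""
    reach = -np.linalg.norm(gripper_pos - block_pos)   # arm close to the block
    lift = min(float(block_pos[2]), 0.1)               # block lifted off the ground
    place = -np.linalg.norm(block_pos - target_pos)    # block near its target position
    return 1.0 * reach + 5.0 * lift + 2.0 * place
```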

We spent a number of months unsuccessfully trying to get conventional RL algorithms working on pick-and-place tasks before ultimately developing a new reinforcement learning algorithm, Hindsight Experience Replay (HER), which allows agents to learn from a binary reward by pretending that a failure was what they wanted to do all along and learning from it accordingly.
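
The core relabeling trick is small enough to sketch directly; this is a minimal illustration of the idea, not OpenAI's HER implementation:

```python
import random

def her_relabel(episode, compute_reward, k=4, rng=None):
    """For every transition, also store copies whose goal is replaced by a goal
    actually achieved later in the same episode, so a failed episode still
    yields examples of reaching *some* goal. `episode` is a list of dicts with
    keys 'obs', 'action', 'achieved_goal', 'goal', and 'reward'."""
    rng = rng or random.Random()
    out = []
    for t, tr in enumerate(episode):
        out.append(tr)                                    # keep the original transition
        for _ in range(k):
            future = episode[rng.randrange(t, len(episode))]
            new_goal = future["achieved_goal"]            # something we actually reached
            out.append({**tr,
                        "goal": new_goal,
                        "reward": compute_reward(tr["achieved_goal"], new_goal)})
    return out
```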

(The actor is the policy, and the critic is a network which receives action/state pairs and estimates their Q-value, or sum of future rewards, providing training signal to the actor.) While the critic has access to the full state of the simulator, the actor only has access to RGB and depth data.
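
A PyTorch sketch of that asymmetric arrangement, with illustrative sizes: the critic consumes the privileged simulator state (available only in simulation), while the actor consumes only RGB-D images and is therefore the only part that has to transfer to the real robot.

```python
import torch
import torch.nn as nn

class AsymmetricActorCritic(nn.Module):
    """Sketch of the asymmetric setup described above (sizes are illustrative):
    the critic trains on the full simulator state, which only exists in
    simulation, while the actor sees only RGB-D pixels."""

    def __init__(self, full_state_dim, act_dim):
        super().__init__()
        self.actor = nn.Sequential(            # 4-channel RGB-D image -> action
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, act_dim), nn.Tanh())
        self.critic = nn.Sequential(           # privileged state + action -> Q-value
            nn.Linear(full_state_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def act(self, rgbd):
        # rgbd: (batch, 4, H, W) tensor; returns actions in [-1, 1]
        return self.actor(rgbd)

    def q_value(self, full_state, action):
        return self.critic(torch.cat([full_state, action], dim=-1))
```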

We see three approaches to building general-purpose robots: training on huge fleets of physical robots, making simulators increasingly match the real world, and randomizing the simulator to allow the model to generalize to the real world.

Robots Are Teaching Themselves With Simulations, What’s Next?

This robotic hand practiced rotating a block for 100 years inside a 50-hour simulation! Is this the next revolutionary step for neural networks?

Learning Dexterity

We've trained a human-like robot hand to manipulate physical objects with unprecedented dexterity. Our system, called Dactyl, is trained entirely in simulation ...

OpenAI's Dactyl improves Dexterity of Robotic Hands without Human Input

OpenAI has trained a human-like robot hand to manipulate physical objects with unprecedented dexterity. Their system, called Dactyl, is trained entirely in ...

[ROS tutorial] OpenAI Gym For ROS based Robots 101. Gazebo Simulator

Learn what you need to use the Open AI-Gym in your next robotics project with ROS based robots. [The course is available here] [ A brief ..


Robots learn Grasping by sharing their hand-eye coordination experience with each other | QPT

A human child is able to reliably grasp objects after one year, and takes around four years to acquire more sophisticated precision grasps. However, networked ...


Robots that Learn

Originally on Vimeo. Directed by Jonas Schneider. Starring Josh Tobin.

Intrinsically Motivated Goal Exploration Processes for Open-Ended Robot Learning

Intrinsically Motivated Multi-Task Reinforcement Learning with open-source Explauto library and Poppy Humanoid Robot Sébastien Forestier, Yoan Mollard, ...

Flexible Muscle-Based Locomotion for Bipedal Creatures

We present a control method for simulated bipeds, in which natural gaits are discovered through optimization. No motion capture or key frame animation was ...