AI News, OpenAI Releases Algorithm That Helps Robots Learn from Hindsight
- On Sunday, June 3, 2018
Being able to learn from mistakes is a powerful ability that humans (being mistake-prone) take advantage of all the time.
We also collect information about how we fail, which we may later be able to apply to a slightly different goal, making us much more effective at generalizing what we learn than robots tend to be.
Say you swing for a home run and hit a foul ball instead. It’s a failure to hit a home run, which sucks, but you’ve actually learned two things in the process: You’ve learned one way of not hitting a home run, and you’ve also learned exactly how to hit a foul ball.
With hindsight experience replay, you decide to learn from what you just did anyway, essentially by saying, “You know, if I’d wanted to hit a foul ball, that would have been perfect!” You might not have achieved your original goal, but you’ve still made progress.
The trade-off with sparse rewards, though, is that they make learning slower: the robot isn’t getting incremental feedback, it’s just being told over and over “no cookie for you” unless it gets very lucky and manages to succeed by accident.
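To make the difference between incremental feedback and "no cookie for you" concrete, here is a minimal Python sketch. The function names, the 2-D positions, and the 0.05 tolerance are assumptions chosen for illustration, not taken from OpenAI's release. A dense reward tells the robot how close it got; a sparse reward only says whether it succeeded.

```python
import math


def dense_reward(achieved, goal):
    """Incremental feedback: the closer to the goal, the higher the reward.

    Hypothetical shaping choice: negative Euclidean distance to the goal.
    """
    return -math.dist(achieved, goal)


def sparse_reward(achieved, goal, tolerance=0.05):
    """All-or-nothing feedback: 0.0 on success, -1.0 otherwise.

    The tolerance is an illustrative assumption.
    """
    return 0.0 if math.dist(achieved, goal) <= tolerance else -1.0


goal = (1.0, 1.0)
near_miss = (0.9, 1.0)
far_miss = (0.0, 0.0)

# The dense reward ranks the near miss above the far miss...
print(dense_reward(near_miss, goal) > dense_reward(far_miss, goal))
# ...while the sparse reward cannot tell the two failures apart.
print(sparse_reward(near_miss, goal) == sparse_reward(far_miss, goal))
```

With the sparse version, every failed attempt looks identical to the learner, which is why an unlucky robot can go a long time without any useful signal.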
Just imagine the robot not succeeding and then being like, “Yeah I totally meant to do that.” With HER, you’d say, “Oh, well, in that case, great, have a cookie!” By doing this substitution, the reinforcement learning algorithm can obtain a learning signal since it has achieved some goal.
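The goal substitution described above can be sketched in a few lines of Python. This is a simplified illustration, not OpenAI's implementation: the transition layout, the helper names, and the choice to use the episode's final outcome as the substitute goal are all assumptions made for the example.

```python
import math


def sparse_reward(achieved, goal, tolerance=0.05):
    """0.0 on success, -1.0 otherwise (illustrative tolerance)."""
    return 0.0 if math.dist(achieved, goal) <= tolerance else -1.0


def relabel_with_hindsight(episode):
    """Return replay entries for the original goal plus a hindsight goal.

    `episode` is a list of dicts with the hypothetical keys "action",
    "achieved" (the outcome after the action) and "goal". The hindsight
    goal is whatever the episode actually ended up achieving.
    """
    hindsight_goal = episode[-1]["achieved"]
    replay = []
    for step in episode:
        for goal in (step["goal"], hindsight_goal):
            reward = sparse_reward(step["achieved"], goal)
            replay.append({**step, "goal": goal, "reward": reward})
    return replay


# A failed episode: the robot wanted (1.0, 1.0) but ended at (0.3, 0.0).
episode = [
    {"action": "push", "achieved": (0.1, 0.0), "goal": (1.0, 1.0)},
    {"action": "push", "achieved": (0.2, 0.0), "goal": (1.0, 1.0)},
    {"action": "push", "achieved": (0.3, 0.0), "goal": (1.0, 1.0)},
]
replay = relabel_with_hindsight(episode)

original = [t["reward"] for t in replay if t["goal"] == (1.0, 1.0)]
hindsight = [t["reward"] for t in replay if t["goal"] == (0.3, 0.0)]
print(original)   # every step failed against the intended goal
print(hindsight)  # the final step succeeds against the substituted goal
```

Against the intended goal the episode contains nothing but failures, but the relabeled copies include one success, so the learner gets its cookie for reaching the place it actually reached.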
To learn more about what makes HER more effective than other reinforcement learning algorithms, we spoke via email with Matthias Plappert, a member of the technical staff at OpenAI:
IEEE Spectrum: Can you explain what the difference is between sparse and dense rewards, and why you recommend sparse rewards as being more realistic in robotics applications?
We also found that learning with HER in simulation is often much simpler since it does not require extensive tuning of the reward function (it is typically much easier to detect if an outcome was successful) and due to the fact that the critic (a neural network that tries to predict how well the agent will do in the future) has a much simpler job as well (since it does not need to learn a very complex function but instead also only has to differentiate between successful vs.
OpenAI has made an open source version of HER available, and they’re releasing a set of simulated robot environments based on real robot platforms, including a Shadow hand and a Fetch research robot.
- On Tuesday, February 18, 2020
Lecture 14 | Deep Reinforcement Learning
In Lecture 14 we move from supervised learning to reinforcement learning (RL), in which an agent must learn to interact with an environment in order to ...
MIT 6.S094: Deep Reinforcement Learning for Motion Planning
This is lecture 2 of course 6.S094: Deep Learning for Self-Driving Cars taught in Winter 2017. This lecture introduces types of machine learning, the neuron as a ...
Andrew Trask - Really Quick Questions with an AI Researcher
I ask 67 questions to Oxford Scholar and AI researcher Andrew Trask as we go for a walk through Granary Square in London, England. Trask is a PhD student at ...
How Microsoft AI defeated Ms Pacman : Build 2018
Microsoft Research developed a model called Hybrid Reward Architecture for scaling reinforcement learning to tasks that have extremely complex value ...
How to Make an Amazing Video Game Bot Easily
In this video, we first go over the history of video game AI, then I introduce OpenAI's Universe, which lets you build a bot that can play thousands of different video ...
Stochastic Games and Multiagent RL - Georgia Tech - Machine Learning
Watch on Udacity: Check out the full Advanced Operating Systems course for free ..
Hindsight Experience Replay | Two Minute Papers #192
The paper "Hindsight Experience Replay" is available here: Our Patreon page with the details: ..
Robot Archer iCub
Humanoid robot iCub learns the skill of archery. After being instructed how to hold the bow and release the arrow, the robot learns by itself to aim and shoot ...
An autonomous obstacle-avoiding virtual car, trained with Deep Reinforcement Learning
This video log is part of a series of machine learning experiments. Project site at Github: This ..
Max Tegmark: "Life 3.0: Being Human in the Age of AI" | Talks at Google
Max Tegmark, professor of physics at MIT, comes to Google to discuss his thoughts on the fundamental nature of reality and what it means to be human in the ...