How Robots Can Learn From Your Pathetic Failures

Robots that learn from demonstration can watch a human do something and then copy (or even improve on) the motions the human makes in order to master new tasks.

In the following video, a human shows a robot how to prop up a block and toss a ball into a basket without actually succeeding at either task:

The researchers developed learning algorithms that let the robot analyze your behavior, mathematically determine which parts of the task you're getting right (or think you're getting right) and where you're screwing up, and eventually teach itself to perform the task better than you.

Robot learning by demonstration

Robot Learning from Demonstration (LfD) or Robot Programming by Demonstration (PbD) (also known as Imitation Learning and Apprenticeship Learning) is a paradigm for enabling robots to autonomously perform new tasks.

Rather than requiring users to analytically decompose and manually program a desired behavior, work in LfD - PbD takes the view that an appropriate robot controller can be derived from observations of a human performing that behavior.
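
As a toy illustration of deriving a controller directly from observation, the sketch below fits a simple regression model (behavioral cloning) that maps recorded states to the actions the demonstrator took. The dimensions, the linear policy form, and the synthetic data are assumptions for illustration; real LfD systems use much richer representations.

```python
import numpy as np

# Hypothetical demonstration log: states could be joint angles, actions the
# commands the demonstrator induced (all dimensions here are assumptions).
STATE_DIM, ACTION_DIM = 4, 2

def fit_bc_policy(states, actions, reg=1e-3):
    """Behavioral cloning: ridge-regress demonstrated actions on states."""
    X = np.hstack([states, np.ones((len(states), 1))])          # add bias term
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ actions)

def policy(state, W):
    """The learned controller: map a new state to an action."""
    return np.append(state, 1.0) @ W

# Toy usage with synthetic demonstrations standing in for recorded data.
rng = np.random.default_rng(0)
demo_states = rng.normal(size=(200, STATE_DIM))
demo_actions = demo_states @ rng.normal(size=(STATE_DIM, ACTION_DIM))
W = fit_bc_policy(demo_states, demo_actions)
print(policy(demo_states[0], W))
```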

The task itself may involve multiple subtasks, such as juicing an orange, throwing the rest of the orange in the trash, and pouring the juice into a cup.

In a traditional programming scenario, a human programmer would have to reason in advance and code a robot controller that is capable of responding to any situation the robot may face, no matter how unlikely.

If errors or new circumstances arise after the robot is deployed, the entire costly process may need to be repeated, and the robot recalled or taken out of service while it is fixed.

Furthermore, by utilizing expert knowledge from the user, in the form of demonstrations, the actual learning should be faster than current trial-and-error learning, particularly in high-dimensional spaces (thereby addressing part of the well-known curse of dimensionality).

The How to Imitate problem consists in determining how the robot will actually perform the learned behavior so as to maximize the metric found when solving the What to Imitate problem (i.e., which aspects of the demonstration must be reproduced).

We can think of perceptual equivalence as dealing with the manner in which the two agents perceive the world, ensuring that the information necessary to perform the task is available to both.

Current approaches to encoding skills through LfD - PbD can be broadly divided between two trends: a low-level representation of the skill, taking the form of a non-linear mapping between sensory and motor information, and a high-level representation that decomposes the skill into a sequence of action-perception units.
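
To make the contrast concrete, the toy sketch below shows the high-level flavor: a skill expressed as an ordered sequence of action-perception units, each pairing a perceptual precondition with a motor action. The unit names and the two-step skill are illustrative assumptions; a low-level, regression-based counterpart is sketched further below.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ActionPerceptionUnit:
    """A toy action-perception unit: act only when its precondition holds."""
    name: str
    precondition: Callable[[dict], bool]   # checks the perceived world state
    action: Callable[[dict], None]         # changes the world via the robot

def run_skill(units, world):
    """Execute a high-level skill as an ordered sequence of units."""
    for unit in units:
        if not unit.precondition(world):
            raise RuntimeError(f"precondition of '{unit.name}' not met")
        unit.action(world)

# Example: a two-step "place object in bin" skill on a simulated world dict.
skill = [
    ActionPerceptionUnit("grasp", lambda w: w["object_visible"],
                         lambda w: w.update(holding=True)),
    ActionPerceptionUnit("drop_in_bin", lambda w: w["holding"],
                         lambda w: w.update(holding=False, in_bin=True)),
]
world = {"object_visible": True, "holding": False, "in_bin": False}
run_skill(skill, world)
print(world)
```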

Unlike simple record-and-replay, here the controller is provided with prior knowledge in the form of primitive motion patterns and learns the parameters of these patterns from the demonstration.

Learning generally performs inference from statistical analysis of the data across demonstrations, where the signals are modeled via a probability density function, and analyzed with various non-linear regression techniques stemming from machine learning.
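
One widely used instance of this statistical view encodes time-indexed demonstrations with a Gaussian mixture model and reproduces the motion by Gaussian mixture regression, i.e., conditioning position on time. The sketch below, built on scikit-learn's GaussianMixture, is only a minimal 1-D illustration; the signal shape and component count are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def encode_demos(demos, n_components=5):
    """Stack demonstrations as (time, position) samples and fit a GMM."""
    data = np.vstack([np.column_stack([np.linspace(0, 1, len(d)), d])
                      for d in demos])
    return GaussianMixture(n_components=n_components,
                           covariance_type="full").fit(data)

def gmr(gmm, t):
    """Gaussian mixture regression: E[position | time = t]."""
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
    # Responsibility of each component for the query time t.
    resp = np.array([
        w * np.exp(-0.5 * (t - m[0]) ** 2 / c[0, 0]) / np.sqrt(c[0, 0])
        for w, m, c in zip(weights, means, covs)
    ])
    resp /= resp.sum()
    # Per-component conditional mean of position given time, then mix.
    cond = [m[1] + c[1, 0] / c[0, 0] * (t - m[0]) for m, c in zip(means, covs)]
    return float(np.dot(resp, cond))

# Toy usage: three noisy demonstrations of the same reaching profile.
rng = np.random.default_rng(1)
demos = [np.sin(np.linspace(0, np.pi, 100)) + 0.05 * rng.normal(size=100)
         for _ in range(3)]
gmm = encode_demos(demos)
print([round(gmr(gmm, t), 3) for t in (0.0, 0.5, 1.0)])
```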

For specific tasks a fixed set of primitives may suffice, but to date there is no database of general-purpose primitive actions, and it is unclear whether the variability of human motion can really be reduced to a finite list.

An alternative is to watch the human perform the complete task and to automatically segment it in order to extract the primitive actions (which may then become task-dependent).
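
A crude but illustrative way to segment a continuous demonstration into candidate primitives is to cut it wherever the end-effector speed drops near zero; the threshold, sampling rate, and minimum segment length below are assumptions, and published systems use more principled segmentation criteria.

```python
import numpy as np

def segment_by_pauses(positions, dt=0.01, speed_thresh=0.05, min_len=10):
    """Split a trajectory wherever the motion pauses.

    positions: (T, D) array of end-effector positions sampled every dt seconds.
    Returns a list of (start, end) index pairs, one per candidate primitive.
    """
    speed = np.linalg.norm(np.diff(positions, axis=0), axis=1) / dt
    moving = speed > speed_thresh
    segments, start = [], None
    for i, m in enumerate(moving):
        if m and start is None:
            start = i
        elif not m and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    if start is not None and len(moving) - start >= min_len:
        segments.append((start, len(moving)))
    return segments

# Toy usage: move, pause, then move again along one axis.
traj = np.column_stack([
    np.concatenate([np.linspace(0.0, 1.0, 150),
                    np.full(100, 1.0),
                    np.linspace(1.0, 2.0, 150)]),
    np.zeros(400),
])
print(segment_by_pauses(traj))
```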

One group of work investigates how to combine imitation learning with reinforcement learning, a method by which the robot learns through trial and error, so as to maximize a reward.

Other works take inspiration from the way humans teach each other and introduce interactive, bidirectional teaching scenarios in which the robot becomes an active partner during the teaching.

Reinforcement learning, in contrast, allows the robot to discover new control policies through free exploration of the state-action space, but often takes a long time to converge.

In particular, demonstrations are used to initialize and guide the exploration done during reinforcement learning, reducing the time needed to find an improved control policy, which may depart from the demonstrated behavior.
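
A minimal reading of this combination: fit an initial policy to the demonstrations, then let trial-and-error improvement start from that policy rather than from scratch. The sketch below stands in for a real RL algorithm with simple random-search hill climbing on a user-supplied episodic return; the linear policy and all names are assumptions.

```python
import numpy as np

def bc_init(states, actions, reg=1e-3):
    """Initialize a linear policy from demonstrations (least squares)."""
    X = np.hstack([states, np.ones((len(states), 1))])
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ actions)

def refine(W, rollout_return, iters=200, noise=0.05, rng=None):
    """Hill-climbing refinement: keep perturbations that raise the return.

    rollout_return(W) must run one episode with policy W and return its reward.
    The refined policy may depart from the demonstrated behavior if that pays off.
    """
    rng = rng or np.random.default_rng(0)
    best = rollout_return(W)
    for _ in range(iters):
        cand = W + noise * rng.normal(size=W.shape)
        r = rollout_return(cand)
        if r > best:
            W, best = cand, r
    return W, best
```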

Finally, RL and imitation learning can be used in conjunction at run time, by letting the demonstrator take over part of the control during a trial (Ross et al.).
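
A hedged sketch of that run-time interplay, in the spirit of dataset-aggregation methods: the robot executes its current policy, the demonstrator relabels the visited states with corrected actions, and the policy is refit on the growing dataset. The fitting routine, expert, and rollout function below are placeholders, not a reproduction of any specific published algorithm.

```python
import numpy as np

def interactive_imitation(fit, expert_action, rollout_states,
                          init_states, init_actions, rounds=5):
    """Interleave robot execution with demonstrator corrections.

    fit(states, actions)   -> policy (a callable mapping state -> action)
    expert_action(state)   -> action the demonstrator would take (the takeover)
    rollout_states(policy) -> states visited while the robot runs its policy
    """
    states, actions = list(init_states), list(init_actions)
    policy = fit(np.array(states), np.array(actions))
    for _ in range(rounds):
        visited = rollout_states(policy)                 # robot drives the trial
        for s in visited:                                # demonstrator relabels
            states.append(s)
            actions.append(expert_action(s))
        policy = fit(np.array(states), np.array(actions))  # refit on aggregate
    return policy
```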

Figure 12 and Figure 13 show two examples of techniques that use Reinforcement Learning in conjunction with LfD - PbD to improve the robot's performance beyond that of a demonstrator, with respect to a known reward function.

Whereas early inverse reinforcement learning (IRL) approaches assumed a discrete state-action space, alternative approaches derive a cost function in a continuous space (Ratliff et al. 2006, 2009) and include extensions of IRL to continuous state-action spaces (Howard et al.).
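
For the inverse-RL side, one simple way to recover a reward in terms of hand-chosen features is to adjust linear reward weights until the expert's average feature vector scores at least as well as that of trajectories sampled under the current reward. The feature-matching update below is a simplified sketch (the sampling callback and step size are assumptions), not a faithful reproduction of the cited methods.

```python
import numpy as np

def feature_expectations(trajectories, features):
    """Average feature vector over all states visited in a set of trajectories."""
    feats = [features(s) for traj in trajectories for s in traj]
    return np.mean(feats, axis=0)

def fit_reward_weights(expert_trajs, sample_trajs, features, dim,
                       lr=0.1, iters=20):
    """Simplified feature-matching IRL sketch.

    Reward is modeled as r(s) = w . features(s).  Each iteration, trajectories
    are (re)collected under the current reward via sample_trajs(w), and the
    weights move toward the expert's feature expectations and away from the
    sampled ones.
    """
    w = np.zeros(dim)
    mu_expert = feature_expectations(expert_trajs, features)
    for _ in range(iters):
        mu_sampled = feature_expectations(sample_trajs(w), features)
        w += lr * (mu_expert - mu_sampled)
        w /= max(np.linalg.norm(w), 1e-8)  # keep the weights on the unit sphere
    return w
```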

The hope is that this multiplicity of policies will make the controller more robust, offering alternative ways to complete the task, when the context no longer allows the robot to perform the task in the optimal way.

This work offers an interesting alternative to approaches that combine imitation learning and reinforcement learning, in that no reward needs to be explicitly determined (see Figure 14; see also Learning from Failure).

The combination of reinforcement learning and imitation learning has been shown effective in addressing the acquisition of skills that require fine tuning of the robot's dynamics.

Likewise, more interactive learning techniques have proven successful in allowing for collaborative improvement of the learnt policy by switching between human-guided and robot-initiated learning.

There will need to be a formalism that allows the robot to select information, reduce redundant information, select features, and store new data efficiently.

Interested readers may also consult (Byrne 2002; Call and Carpenter 2002; Tomasello et al. 1993), which provide some of the biological background on imitation learning in animals upon which the Key Issues in LfD – PbD proposed by Nehaniv and Dautenhahn are based.

Robot Fail Compilation

Imitation Learning of Motion Parameters for Dynamic Manipulation Tasks

This video presents an imitation learning approach for a fluid pouring task, which consists of grasping a bottle containing a fluid and pouring a specified amount of the fluid into a container...

Humans Need Not Apply

Learning Continuous Human-Robot Interactions from Human-Human Demonstrations

We present a novel imitation learning approach for learning human-robot interactions from human-human demonstrations. During training, the movements of both interactants are recorded via motion...

RoboCop 2 (3/11) Movie CLIP - Robo Flops (1990) HD

RoboCop 2 movie clip: The Old Man (Dan O'Herlihy) is not...

Atlas Robot - Swearing Mod - Boston Dynamics

Sweet new swearing module for the Atlas Robot! This thing loses his ish like Bill O'Reilly or Casey Kasem! This is a parody! Comedy! I believe the "Fair Use" standards apply to this dumb...

BRETT the Robot learns to put things together on his own

UC Berkeley researchers have developed algorithms that enable robots to learn motor...

Hapless Boston Dynamics robot in shelf-stacking fail

This outtake was part of the advances made by the formerly Google-owned Boston Dynamics. Though robots have come a long way, the clip suggests...

Robot Learns to Flip Pancakes

Pancake day special! The video shows a Barrett WAM robot learning to flip pancakes by reinforcement learning. The motion is encoded in a mixture of basis force fields through an extension...

Introducing SpotMini

SpotMini is a new, smaller version of the Spot robot, weighing 55 lbs dripping wet (65 lbs if you include its arm). SpotMini is all-electric (no hydraulics) and runs for about 90 minutes on...