In October 2017, researchers at Google DeepMind published a paper on an artificial intelligence (AI) program called AlphaGo Zero.

Talking about the achievement, lead researcher David Silver explained that AlphaGo Zero had invented “its own variants which humans don’t even know about or play at the moment.” And it’s here that a new and exciting use for AI comes to light.

During its 2016 match against Lee Sedol, AlphaGo played a handful of highly inventive winning moves, several of which, including move 37 in game two, were so surprising that they overturned hundreds of years of received wisdom and have since been examined extensively by players of all levels.

Unlike the earlier versions of AlphaGo, which learnt how to play the game using thousands of human amateur and professional games, AlphaGo Zero learnt to play Go simply by playing games against itself, starting from completely random play. In doing so, it surpassed the performance of all previous versions, including those which beat the World Go Champions Lee Sedol and Ke Jie, becoming arguably the strongest Go player of all time.

Researchers at DeepMind, part of the Google family of companies, have taken a new approach with this AI. According to team leader Dr. David Silver, older versions learned by assimilating the playing styles of the best human Go players. The new version, called AlphaGo Zero, taught itself how to play, and practiced against itself as well. AlphaGo Zero then won 100 straight games against the program that had beaten the best humans.

How to Go from good to better

The new self-taught version of AlphaGo is not only more effective than older versions, it's more creative. In teaching itself, it rediscovered many of the patterns of play that humans have developed and used, but also found new ones on its own which were superior to the ones human players used.

The software is a distillation of previous systems DeepMind has built: It’s more powerful, but simpler and doesn’t require the computational power of a Google server to run.

AlphaGo Zero isn’t the first algorithm to learn from self-play—Elon Musk’s nonprofit OpenAI has used similar techniques to train an AI playing a video game—but its capabilities show that it’s one of the most powerful examples of the technology to date.

“By not using this human data, by not using human features or human expertise in any fashion, we’ve actually removed the constraints of human knowledge,” Silver said.

DeepMind claims that, instead of Go moves, the AlphaGo Zero algorithm will be able to learn the interactions between proteins in the human body to further scientific research, or the laws of physics to help create new building materials.

The idea of using AI to help mine the vast potential combinations of molecules to build a super-battery or some other futuristic device isn’t new.

DeepMind’s first paper in Nature last year showed that the original AlphaGo algorithm learned for a while from how humans played the game, and then started to play against itself to refine those skills.

AlphaGo Zero could beat the version of AlphaGo that faced Lee Sedol after training for just 36 hours, and earned its 100-0 score after 72 hours.

It’s not brute computing power that did the trick either: AlphaGo Zero was trained on one machine with 4 of Google’s speciality AI chips, TPUs, while the previous version was trained on servers with 48 TPUs.

Simple, general methods are valued in AI research because less effort is required to bring that same solution to other problems, Tim Salimans, an AI research scientist at OpenAI, told Quartz in an email.

Whereas the original AlphaGo learned by ingesting data from hundreds of thousands of games played by human experts, AlphaGo Zero, also developed by the Alphabet subsidiary DeepMind, started with nothing but a blank board and the rules of the game.

Demis Hassabis, DeepMind’s CEO, says the techniques used to build AlphaGo Zero are powerful enough to be applied in real-world situations where it’s necessary to explore a vast landscape of possibilities, including drug discovery and materials science.

Reinforcement learning is inspired by the way animals seem to learn through experimentation and feedback, and DeepMind has used the technique to achieve superhuman performance in simpler Atari games.

It is already being tested as a way to teach robots to grasp awkward objects, for example, and as a means of conserving energy in data centers by reconfiguring hardware on the fly.

“By not using human data or human expertise, we’ve actually removed the constraints of human knowledge,” says David Silver, the lead researcher at DeepMind and a professor at University College London.

DeepMind is already the darling of the AI industry, and its latest achievement is sure to grab headlines and spark debate about progress toward much more powerful forms of AI.

“It’s a nice illustration of the recent progress in deep learning and reinforcement learning, but I wouldn’t read too much into it as a sign of what computers can learn without human knowledge,” says Pedro Domingos, a professor at the University of Washington.

“What would be really impressive would be if AlphaGo beat [legendary South Korean champion] Lee Sedol after playing roughly as many games as he played in his career before becoming a champion.”

But despite the work still to be done, Hassabis is hopeful that within 10 years AI will play a major role in solving important problems in science, medicine, or other fields.

