AI News

AI Alignment Podcast: Human Compatible: Artificial Intelligence and the Problem of Control with Stuart Russell

The book is a cornerstone piece, alongside Superintelligence and Life 3.0, that articulates the civilization-scale problem we face of aligning machine intelligence with human goals and values.

Not only is this a further articulation and development of the AI alignment problem, but Stuart also proposes a novel solution which brings us to a better understanding of what it will take to create beneficial machine intelligence.

Important timestamps:
0:00 Intro
2:10 Intentions and background on the book
4:30 Human intellectual tradition leading up to the problem of control
7:41 Summary of the structure of the book
8:28 The issue with the current formulation of building intelligent machine systems
10:57 Beginnings of a solution
12:54 Might tool AI be of any help here?

So that section in the first edition in ’94 was a little equivocal, let’s say, you know, we could lose control or we could have a golden age, and let’s try to be optimistic.

The way we had set up the whole field was basically kind of a copy of human intelligence, in that a human is intelligent if their actions achieve their goals.

But if machines are more intelligent than humans, then giving them the wrong objective would basically be setting up a kind of chess match between humanity and a machine that has an objective that’s at cross purposes with our own.

One of the things that I found so great about your book was the history of the evolution of concepts and ideas as they pertain to information theory, computer science, decision theory and rationality.

So in the 20th century you had a whole lot of disciplines: economics developed around the idea of maximizing utility or welfare or profit, depending on which branch you look at.

Control theory is about minimizing a cost function, so the cost function describes some deviation from ideal behavior, and then you build systems that minimize the cost.
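
For concreteness, here is the standard quadratic form such a cost function often takes (an assumed textbook illustration, not a formula quoted from the interview): the state x_t is penalized for deviating from the ideal behavior x*_t, the control effort u_t is penalized as well, and the controller is chosen to minimize the total.

```latex
J \;=\; \sum_{t=0}^{T} \Big[\, (x_t - x^{*}_{t})^{\top} Q \,(x_t - x^{*}_{t}) \;+\; u_t^{\top} R \, u_t \,\Big],
\qquad \min_{u_0,\dots,u_T} J
```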

And then the game is: how does the employer get the employee to do something that the employer actually wants them to do, given that the employee, the agent, has their own utility function and would rather be sitting at home drinking beers and watching football on the telly.

In fact, maybe I can’t think of any, where the entity that’s supposedly in control, namely us, is less intelligent than the entity that it’s supposedly controlling, namely the machine.

Lucas: So providing some framing and context here for the listener, the first part of your book, chapters one through three, explores the idea of intelligence in humans and in machines.

There you give this historical development of ideas, and I feel that this history you give of computer science and the AI alignment problem really helps to demystify both the problem and its evolution as a process, and the background behind it.

And then the third part, chapters seven through ten, suggests a new way to think about AI, to ensure that machines remain beneficial to humans forever.

In reaction to this, you’ve developed cooperative inverse reinforcement learning and inverse reinforcement learning, which is sort of part of the latter stages of this book, where you’re arguing for a new definition that is more conducive to alignment.

The more general model would be that the machine understands that the human has, internally, some overall preference structure, of which this particular objective, fetch the coffee or take me to the airport, is just a little local manifestation.

One way of thinking about it is to say that the standard model of AI assumes that the machine has perfect knowledge of the objective, and the model I’m proposing assumes that the machine has imperfect knowledge of the objective, or partial knowledge of the objective.

When the machine has partial knowledge of the objective, there’s a whole lot of new things that come into play that simply don’t arise when the machine thinks it knows the objective.

Whereas a machine that knows that it doesn’t know the full objective could say, well, given what I know, this action looks okay, but I want to check with the boss before going ahead, because it might be that this plan actually violates some part of the human preference structure that it doesn’t know about.
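
As a hedged sketch of that difference (the class name, toy reward functions and threshold below are assumptions introduced purely for illustration, not code from the book): an agent that keeps several hypotheses about the objective acts only when a plan looks acceptable under all of them, and otherwise defers to the human. In the standard model there is a single hypothesis held with certainty, so the deferral branch can never fire.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class UncertainObjectiveAgent:
    # Candidate reward functions: each maps a plan to a score under one
    # hypothesis about what the human actually wants.
    hypotheses: Dict[str, Callable[[str], float]]
    # Probability the agent assigns to each hypothesis being the true objective.
    beliefs: Dict[str, float]

    def choose(self, plans: List[str], risk_threshold: float = 0.0) -> str:
        best_plan, best_value = None, float("-inf")
        for plan in plans:
            scores = {h: reward(plan) for h, reward in self.hypotheses.items()}
            expected = sum(self.beliefs[h] * s for h, s in scores.items())
            # A plan that might violate some plausible part of the human's
            # preference structure is never executed unilaterally.
            if min(scores.values()) < risk_threshold:
                continue
            if expected > best_value:
                best_plan, best_value = plan, expected
        # If every plan carries that risk, check with the boss instead of acting.
        return best_plan if best_plan is not None else "ask the human first"
```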

So, for example, if you were doing handwriting recognition, you might think, oh, okay, well in order to find an ‘S’ I have to look for a line that’s curvy and I follow the line and it has to have three bends, it has to be arranged this way.

The way that they did it was to develop machine learning systems that could take images of characters that were labeled and then train a recognizer that could recognize new instances of characters.
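
A minimal sketch of that workflow, assuming scikit-learn and its bundled digits dataset purely for illustration (the systems Stuart describes predate this library but worked in the same spirit): train a recognizer on labeled character images, then test it on new instances.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()                       # labeled images of the digits 0-9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

clf = SVC(gamma=0.001)                       # the recognizer to be trained
clf.fit(X_train, y_train)                    # learn from the labeled examples

print("accuracy on unseen characters:", clf.score(X_test, y_test))
```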

If it’s a little vacuum cleaning robot or lawn mowing robot, certainly a domestic robot that’s supposed to keep your house clean and look after the dog while you’re out.

There’s simply no way to build those kinds of systems except as agents and as we improve the capabilities of these systems, whether it’s for perception or planning and behaving in the real physical world.

It’s a collection of technologies, and those technologies have been built within a framework, the standard model, that has been very useful and is shared with these other fields: economics, statistics, operations research, control theory.

They happen to have human components, but when you put a couple of hundred thousand humans together into one of these corporations, they kind of have this superintelligent understanding, manipulation capabilities, and so on.

As you move through and past single-agent cases to multiple-agent cases, which give rise to game theory and decision theory, and how that all affects AI alignment.

So for laypersons, I think this book is critical for showing the problem, demystifying it, making it simple, and giving the foundational and core concepts that human beings need to exist in this world today.

And then for the research community, as you just discussed, it seems like this rejection of the standard model and this clear identification of systems with exogenous objectives that are sort of singular and lack context and nuance.

And in the book I give the example of the durian fruit, which some people really love and some people find utterly disgusting, and I don’t know which I am because I’ve never tried it.

The reason why psychology, economics, and moral philosophy become absolutely central is that these fields have studied questions of human preferences, human motivation, and also the fundamental question which machines are going to face: how do you act on behalf of more than one person?

The version of the problem where there’s one machine and one human is relatively constrained and relatively straightforward to solve, but when you get one machine and many humans or many machines and many humans, then all kinds of complications come in, which social scientists have studied for centuries.

And psychology comes in because the process whereby the machine is going to learn about human preferences requires that there be some connection between those preferences and the behavior that humans exhibit, because the inverse reinforcement learning process involves observing the behavior and figuring out what are the underlying preferences that would explain that behavior, and then how can I help the human with those preferences.

I’m just pushing here and wondering more about the value of human speech, about what our revealed preferences might be, how this fits in with your book and narrative, as well as furthering neuroscience and psychology, and how all of these things can decrease uncertainty over human preferences for the AI.

Now it’s entirely possible logically that in fact he wanted to lose every single game that he played, but his decision making was so far from rational that even though he wanted to lose, he kept playing the best possible move.

Donald Davidson calls it radical interpretation: that from the outside, you can sort of flip all the bits and come up with an explanation that’s sort of the complete reverse of what any reasonable person would think the explanation to be.

For example, let’s take the situation where Kasparov can checkmate his opponent in one move, and it’s blatantly obvious and in fact, he’s taken a whole sequence of moves to get to that situation.

If in all such cases where there’s an obvious way to achieve the objective, he simply does something different, in other words, let’s say he resigns, so whenever he’s in a position with an obvious immediate win, he instantly resigns, then in what sense is it meaningful to say that Kasparov actually wants to win the game if he always resigns whenever he has a chance of winning?

So by observing human behavior in situations where the decision is kind of an obvious one that doesn’t require a huge amount of calculation, then it’s reasonable to assume that the preferences are the ones that they reveal by choosing the obvious action.

If you offer someone a lump of coal or a $1,000 bill and they choose the $1,000 bill, it’s unreasonable to say, “Oh, they really prefer the lump of coal, but they’re just really stupid, so they keep choosing the $1,000 bill.”

So in fact it’s quite natural that we’re able to gradually infer the preferences of imperfect entities, but we have to make some assumptions that we might call minimal rationality, which is that in cases where the choice is obvious, people will generally tend to make the obvious choice.
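
To make the minimal-rationality assumption concrete, here is a hedged toy version of the inference; the softmax (Boltzmann) choice model below is a common way to formalize “imperfect but mostly sensible choices”, though the interview does not commit to any specific model, and the numbers are invented. The more obvious the choice, the more strongly observing it shifts belief about the underlying preference.

```python
import math

# Two hypotheses about the underlying preference in the coal-versus-cash example.
hypotheses = {"prefers_cash": {"cash": 1000.0, "coal": 0.0},
              "prefers_coal": {"cash": 0.0, "coal": 1000.0}}
prior = {"prefers_cash": 0.5, "prefers_coal": 0.5}

def choice_prob(utilities, chosen, beta=0.01):
    # Boltzmann-rational choice: obvious decisions are made almost surely,
    # but the chooser is allowed to be imperfect.
    weights = {a: math.exp(beta * u) for a, u in utilities.items()}
    return weights[chosen] / sum(weights.values())

def update(prior, observed_choice):
    # Bayes' rule: which preference hypothesis best explains what we observed?
    unnormalized = {h: prior[h] * choice_prob(u, observed_choice)
                    for h, u in hypotheses.items()}
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}

print(update(prior, "cash"))   # belief shifts almost entirely to "prefers_cash"
```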

Maybe it worries him, but it doesn’t worry me because we do a reasonably good job of inferring each other’s preferences all the time by just ascribing at least a minimum amount of rationality in human decision making behavior.

And if you choose the ham and cheese pizza, they’ll infer that you prefer the ham and cheese pizza, and not the bubblegum and pineapple one, as seems pretty reasonable.

To me, the biggest deviation from rationality that humans exhibit is the fact that our choices are always made in the context of a whole hierarchy of commitments that effectively put us into what’s usually a much, much smaller decision-making situation than the real problem.

So the real problem is I’m alive, I’m in this enormous world, I’m going to live for a few more decades hopefully, and then my descendants will live for years after that and lots of other people on the world will live for a long time.

It’s what motor control commands do I send to my 600 odd muscles in order to optimize my payoff for the rest of time until the heat death of the universe?

For example, my late colleague Bob Wilensky had a project called the Unix Consultant, which was a natural language system, and it was actually built as an agent, that would help you with Unix stuff, so managing files on your desktop and so on.

You could ask it questions like, “Could you make some more space on my disk?”, and the system needs to know that rm *, which removes all the files, is probably not the right thing to do, that this request to make space on the disk is actually part of a larger plan that the user might have.

 This is very natural for humans and in philosophy of language, my other late colleague Paul Grice, was famous for pointing out that many statements, questions, requests, commands in language have this characteristic that they don’t really mean what they say.

So we talk about Gricean analysis, where you don’t take the meaning literally, but you look at the context in which it was said and the motivations of the speaker and so on to infer what is a reasonable course of action when you hear that request.
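
A hedged toy version of that Gricean reasoning (the candidate actions, effects and weights below are invented for illustration; they are not taken from Wilensky’s Unix Consultant): the assistant scores candidate actions against preferences it can safely attribute to the user, not just against the literal words of the request.

```python
# Effects of candidate actions on two things the user plausibly cares about.
candidate_actions = {
    "rm *":                      {"space_freed": 1.0, "files_preserved": 0.0},
    "empty the trash":           {"space_freed": 0.3, "files_preserved": 1.0},
    "compress old log files":    {"space_freed": 0.2, "files_preserved": 1.0},
    "ask which files to delete": {"space_freed": 0.1, "files_preserved": 1.0},
}

# The literal request matters, but the inferred larger plan (keep my work)
# dominates; these weights are assumptions for the sake of the example.
weights = {"space_freed": 1.0, "files_preserved": 10.0}

def score(effects):
    return sum(weights[k] * v for k, v in effects.items())

best = max(candidate_actions, key=lambda a: score(candidate_actions[a]))
print(best)   # never "rm *": freed space is worthless if the user's files are gone
```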

So for laypersons who might not be involved or experts in AI research, plus the AI alignment community, plus potential researchers who might be brought in by this process or book, plus policymakers who may also listen to it, what’s at stake here?

If these were possible at all, they would enable other inventions that people have talked about as possibly the biggest event in human history, for example, creating the ability for people to live forever, or a much, much longer life span than we currently have, or creating the possibility for people to travel faster than light so that we could colonize the universe.

The last part of the book is a proposal for how we could do that, how you could change this notion of what we mean by an intelligent system so that rather than copying this sort of abstract human model, this idea of rationality, of decision making in the interest, in the pursuit of one’s own objectives, we have this other kind of system, this sort of coupled binary system where the machine is necessarily acting in the service of human preferences.

The third alternative is that we create general purpose, superhuman intelligent machines and we lose control of them, and they’re pursuing objectives that are ultimately mistaken objectives.

I already mentioned the possibility that you’d be able to use that capability to solve problems that we find very difficult, such as eternal life, curing disease, solving the problem of climate change, solving the problem of faster than light travel and so on.

They can’t build bridges or lay railroad tracks or build hospitals because they’re really, really expensive and they haven’t yet developed the productive capacities to produce goods that could pay for all those things.

The money all goes to pay all those humans, whether it’s the scientists and engineers who designed the MRI machine or the people who worked on the production line or the people who worked mining the metals that go into making the MRI machine.

One is everyone is relatively much better off, assuming that we can get politics and economics out of the way, and also there’s then much less incentive for people to go around starting wars and killing each other, because there isn’t this struggle which has sort of characterized most of human history.

One thing that the superintelligence will hopefully also do is reduce existential risk to zero, right?  And so if existential risk is reduced to zero, then basically what happens is the entire cosmic endowment, some hundreds of thousands of galaxies, become unlocked to us.

For me personally, and why I’m passionate about AI alignment and existential risk issues, is that the reduction of existential risk to zero and having an aligned intelligence that’s capable of authentically spreading through the cosmic endowment, to me seems to potentially unlock a kind of transcendent object at the end of time, ultimately influenced by what we do here and now, which is directed and created by coming to better know what is good, and spreading that.

What I find so beautiful and important and meaningful about this problem in particular, and why your book is such important core reading for everyone, for computer scientists and laypersons alike, is that if we get this right, this universe can be maybe one of the universes, and perhaps the multiverse, where something like the most beautiful thing physically possible could be made by us within the laws of physics.

If we do go ahead developing general purpose intelligence systems that are beneficial and so on, then parts of that technology, the general purpose intelligent capabilities, could be put into systems that are not beneficial, as it were, that don’t have a safety catch.

We’re kind of totally failing to control malware and the ability of people to inflict damage on others by uncontrolled software, and that’s getting worse.

This notion that if we develop machines that are capable of running every aspect of our civilization, then that changes the dynamic that’s been in place since the beginning of human history or prehistory.

And if you add it all up, if you look, there are about a hundred-odd billion people who’ve ever lived, and they each spend about 10 years learning stuff on average, so that comes to something like a trillion person-years of learning.

Or, if you like something more recent, in WALL-E the human race is on a sort of cruise ship in space, and they all become obese and stupid because the machines look after everything and all they do is consume and enjoy.

Once you turn things over to the machines, it’s practically impossible, I think, to reverse that process. We have to keep our own human civilization going in perpetuity, and that requires a kind of cultural process that I don’t yet understand exactly how it would work.

Because the effort involved in learning, let’s say going to medical school, it’s 15 years of school and then college and then medical school and then residency.

Lucas: This makes me wonder and think, from an evolutionary, cosmological perspective, about this sort of transition from humans being the most intelligent form of life on this planet to machine intelligence being the most intelligent form of life.

I can’t see that that’s a plausible direction, but it could be that we decide at some point that we cannot solve the control problem or we can’t solve the misuse problem or we can’t solve the enfeeblement problem.

I think the click through catastrophe is already pretty big and it results from very, very simple minded algorithms that know nothing about human cognition or politics or anything else.

Several countries have now decided, since Chernobyl and Fukushima, to ban nuclear power; the EU has much stronger restrictions on genetically modified foods than a lot of other countries; so there are pockets where people have pushed back against technological progress and said, “No, not all technology is good and not all uses of technology are good, and so we need to exercise a choice.”

I’m quoting you here, you say, “A compassionate and jubilant use of humanity’s cosmic endowment sounds wonderful, but we also have to reckon with the rapid rate of innovation in the malfeasance sector; ill-intentioned people are thinking up new ways to misuse AI so quickly that this chapter is likely to be outdated even before it attains printed form.”

But when you think about it, why would you want to allow AI systems to impersonate human beings, so that, in other words, the human who’s in the conversation believes that they’re talking to another human being, and that they owe that other human being a whole raft of respect, politeness, all kinds of obligations that are involved in interacting with other humans?

People should be trained in how to recognize potentially unsafe designs for AI systems, but there should, I think, be a role for regulation, where at some point you would say, if you want to put an AI system on the internet, for example, just as if you want to put software into the app store, it has to pass a whole bunch of checks to make sure that it’s safe, to make sure that it won’t wreak havoc.

Lucas: I basically agree that these regulations should be implemented today, but they seem pretty temporary or transient as the AI system’s uncertainty about the human’s objective function or utility function decreases.

So if we have timelines from AI researchers that range from 50 to a hundred years for AGI, we could potentially see laws and regulations like this go up in the next five to 10 and then disappear again somewhere within the next hundred to 150 years max.

Well, self-evidently it’s possible that machines could do a better job than the humans we currently have. That would be better only in a narrow sense: maybe it would reduce crime, maybe it would increase economic output, we’d have better health outcomes, people would be more educated than they would with humans making those decisions, but there would be a dramatic loss in autonomy.

Of course it’s been humans making the decisions, although within any local context it’s only a subset of humans who are making the decisions and a lot of other people don’t have as much autonomy.

Now this is an empirical question across all people where autonomy fits in their preference hierarchies and whether it’s like a terminal value or not, and whether under reflection and self idealization, our preferences distill into something else or not.

One could imagine that if we formulate things not quite right and the effect of the algorithms that we build is to make machines that don’t value autonomy in the right way or don’t have it folded into the overall preference structure in the right way, that we could end up with a subtle but gradual and very serious loss of autonomy in a way that we may not even notice as it happens.

Stuart: I think the two major differences are one, I believe that to understand this whole set of issues or even just to understand what’s happening with AI and what’s going to happen, you have to understand something about AI.

The point is really what is intelligence and how have we taken that qualitative understanding of what that means and turned it into this technical discipline where the standard model is machines that achieve fixed objectives.

It wasn’t invented for this purpose, but it happens to fit this purpose and then the approach of how we solve this problem is fleshed out in terms of understanding that it’s this coupled system between the human that has the preferences and the machine that’s trying to satisfy those preferences and doesn’t know what they are.

I think what I wanted to convey was the essence of intelligence, how that notion has developed, how it is really an integral part of our whole intellectual tradition and our technological society, and how that model is fundamentally wrong, and what’s the new model that we have to replace it with.

I think that you really set the AI alignment problem up well resulting from there being intelligences and multi-agent scenarios, trying to do different things, and then you suggest a solution, which we’ve discussed here already.

Sanne Blauw

She is the author of the Dutch bestseller "Het bestverkochte boek ooit (met deze titel)", which will come out as "The Number Bias"

The Real Reason to be Afraid of Artificial Intelligence | Peter Haas | TEDxDirigo

A robotics researcher afraid of robots, Peter Haas, invites us into his world of understanding where the threats of robots and artificial intelligence lie. Before we get ...

Artificial Intelligence: Mankind's Last Invention

Artificial Intelligence: Mankind's Last Invention - Technological Singularity Explained ...

Why Elon Musk is worried about artificial intelligence

Elon Musk talks to Kristie Lu Stout about hyperloop, the threat of AI, and what he would work on if he had more time.

10 reasons why human level Artificial Intelligence is a false promise

Full video - Androids & Artificial Intelligence: A Modern Myth (2 hrs 40 mins) available as a digital download (site link below). UPDATE 2017 RE: ADDITIONAL ...

Artificial Intelligence for Humanity: Making it So | Matthew Scassero | TEDxGreatMills

What benefits can artificial intelligence offer to humankind for health, food, and connectivity? Matthew Scassero discusses current evolutions in AI, and stresses ...

The Artificial Intelligence That Deleted A Century

In the last week of December, 2028, humanity forgot about more than a century of pop culture. You've probably never thought about it, and never found it strange ...

Artificial Intelligence: Can robots make better music than humans?

An American university has challenged its students to write algorithms capable of creating 'human-quality' stories and music. So are the robots finally taking over ...

Artificial Intelligence and the Future of Business | Hans-Christian Boos | TEDxWHU

Hans-Christian Boos has a mission: empowering human potential, freeing up time for creativity and innovative thinking through artificial intelligence (AI).

The Future of Artificial Intelligence

Humans create an AI designed to produce paperclips. It has one goal, to maximize the number of paperclips in the universe. The machine becomes more ...

✪ Humans Not Needed (Artificial Intelligence Composed Music | Beatles Inspired Pop Song)

As manufacturing, transportation, retail, journalism and many more industries are overtaken by artificial intelligence and automation, the sheer number of new ...