Avoiding Negative Side Effects: Concrete Problems in AI Safety part 1



Robert Miles

We can expect AI systems to accidentally create serious negative side effects – how can we avoid that?
The first of several videos about the paper “Concrete Problems in AI Safety”.

Earlier on Computerphile: https://www.youtube.com/watch?v=AjyM-f8rDpg
The Stop Button Problem: https://youtu.be/3TYT1QfdfsM

Read the full paper here: https://arxiv.org/abs/1606.06565

Thanks again to all my wonderful Patreon Supporters:
– Chad Jones
– Ichiro Dohi
– Stefan Skiles
– Katie Byrne
– Ziyang Liu
– Joshua Richardson
– Fabian Consiglio
– Jonatan R
– Øystein Flygt
– Björn Mosten
– Michael Greve
– robertvanduursen
– The Guru Of Vision
– Fabrizio Pisani
– Alexander Hartvig Nielsen
– Volodymyr
– Peggy Youell
– Konstantin Shabashov
– Adam Dodd
– DGJono
– Matthias Meger
– Scott Stevens
– Emilio Alvarez
– Benjamin Aaron Degenhart
– Michael Ore
– Robert Bridges
– Dmitri Afanasjev
https://www.patreon.com/robertskmiles


36 thoughts on “Avoiding Negative Side Effects: Concrete Problems in AI Safety part 1”
  1. Didn't we just move the problem into the distance function? Any change to a part of the state that the distance function doesn't measure counts as zero distance, so the AI won't care about those parts of the state. (A toy illustration of this appears below, after the thread.)

  2. Another issue I'm thinking about: what if the conditions for making tea aren't met (no water in the kettle, no more tea, the heater is broken) and the robot needs to alter its environment in order to complete its mission (respectively: fill the kettle, order/buy some tea, fix the heater)?

  3. Hi Robert!
    I'm a CS student from Germany, and a few days ago my mother, of all people, asked me if she should be worried about AI.
    So the good news is, this topic has now reached literally everyone and their mother.
    The bad news is that this lovely AI-concerned person doesn't speak English.
    So I decided to add German subtitles to this video. I've never done this before, and apparently the creator has to approve them. Would you mind?

    Thanks,
    Hendrik

  4. It still seems like reducing all side effects would just push the problem back one step. What if there's a scenario where it has to cause a side effect, and it's just a matter of which one? Take the car as an example: say it's driving down the road when a child comes running into the street. The AI can tell that it can't slow down in time, so it has to choose between hitting the child and swerving to the side and hitting a tree. It has to cause a side effect either way, but how would it choose which one?

  5. One way that simple rule ("limit changes to the environment") could go wrong, I think, is that it makes cleaning up pre-existing messes more difficult. And it'll give you hell as the floor gets more and more worn out, or as cups crack, or when any number of other little, unavoidable changes build up. EDIT: okay yeah, I didn't think about Next-Door-Susan or kids or pets causing changes in the environment, and the AI responding to that.

  6. A problem I thought of with the "if I do nothing" distance metric is that the first robot to make a cup of tea will likely be a momentous occasion. News articles will be written, the engineers will go on talk shows and their research will result in a huge impact on the AI research world at large. So from a robot's perspective the least impactful way to make a cup of tea is to make it and then kill the witnesses, frame someone for it and then destroy all the research along with itself. Or, less grimly, make the tea when no one is looking so nobody suspects it was made by an AI.

  7. That would probably mean the AI also tries to avoid changing your behaviour, meaning it will basically be a tea ninja making your tea and sneaking it onto the table while you aren't looking. That sounds kind of cool.

  8. The funny comments you make in your videos are golden, like the one about your rare medical condition, or the bit in the other video about companies being superintelligences where you ask your viewers to come up with an idea no human would come up with. Your comedic timing and wittiness are amazing.

  9. Good explainer. You're getting close to the idea of a General Satisfaction Equilibrium. Much more complex than a simple Nash equilibrium, but vital to the future of AI.

  10. This sounds like the genie problem, where you have three wishes and an asshole genie that grants your wishes but with unintended negative side effects like those described in the video.

  11. What if the first reasonably advanced AGI we made was based on this "don't change things" approach, but it got obsessive and punished any change in technological level, trapping us in a weird, unchanging, yet still quite high-tech world?

  12. In addition to the other safety measures, could you put the robot on a sort of extreme hedonic treadmill? Beyond a certain level of efficiency, amount of tea or frequency of making tea, there are diminishing or even zero returns, so the robot doesn't care if it takes an extra thirty seconds to walk around the baby or makes 7.5 ounces of tea instead of 8. Obviously this isn't a substitute for the other safety measures, but it seems like it would help reduce the likelihood of creating large losses for small gains.

  13. The answer is evolution – in this case, directed evolution. I highly recommend talking with a geneticist, even if they don't know what you're talking about. Evolution by directed means over many generations will produce the desired goal without the nasty side effects, as these will be "bred" out of the population of AIs by selection pressures. Just carefully consider the environment in which you direct the evolution – it is an extremely powerful tool. From what I understand this is already used in computer science very regularly and produces spectacular results. Combine your extensive knowledge with that of a geneticist and that is where the answer will lie. The reason is that we want our technology to interact with our environment in a similar fashion to the way a "good actor" from our species would, and we have evolved to this point. Technology has an advantage in that it can go through many more generations far more quickly than we can, and can therefore significantly speed up the process of evolution – but evolution remains the key.

  14. For the model of the robot predicting the state of the world had it done nothing, isn't it also true that the robot will change the world state away from what it predicted, and will thus want to reverse those changes? You still have the problem of the robot having an image of the world state in its head, and thus wanting to change the world in sometimes major ways to make it conform to that image. Except in this case the world state the robot wants is different from the world state before it went to make tea. For example, say one of the people is really creeped out by the robot and tolerates it just sitting there, but when it starts to move, the guy is too creeped out and leaves the room. Well, before the robot gives you your cup of tea, it would want to forcibly drag the guy who left back to where he was in the room, because according to its predictions, that guy is still there if the robot does nothing. (A toy version of this baseline is sketched below, after the thread.)

  15. But…but…but… what if you program a robot to make you a cup of tea and keep everything as it was before it started, and, because it has enough computing power, it figures out how to create a cup of tea out of nothing (so it doesn't need the tea bag, the water, the tea cup, the spoon, etc.)?

  16. I love the thought that after bringing you the tea, the robot tries to convince you that it actually did nothing and that the tea just materialized in your hand.

  17. It is hard to achieve as well, but to operate in a human-like manner the robot must learn to attribute some changes to itself while attributing other changes to the world. Yes, it causes a lot of new problems, like "was it me who killed him, or the bullet itself?". On the other hand, now you only need to simulate the changes in the objects you are going to interact with (and whatever chain of changes those interactions may produce, where the less certain you are about the next step, the less responsible you are for its outcomes).

  18. As the AI goes about its tea-making business, a human steps around it.
    The robot calculates that if it hadn't been there, the human would have slipped on a puddle of water, fallen badly and broken a leg.
    Diligently, the robot breaks the human's leg before going back.
    (Also, the cup of tea is the smallest amount of the most tasteless tea possible, because making you satisfied counts as a change to the environment.)

  19. Another issue: it seems that making predictions about the future would get increasingly complex, given the task and the factors involved, until the computation needed for such predictions would require more matter than there is in the known universe.

  20. I'm just getting into this field, but what's wrong with telling the robot to keep changes to the environment at a minimum, counting only changes that the robot itself could make, not other creatures? Basically "avoid interaction with anything except what's allowed". Like that building-surveying robot from Boston Dynamics: it has a goal to get upstairs and check out the room, using hallways, staircases and doors is allowed, and it ain't gonna trip over pesky babies along the way.
    Though I feel like I'm still missing a bigger picture here

  21. Well, the "not changing anything" approach has another problem: the robot doesn't have a model of human values. So imagine it's going to make you a cup of tea and the way is blocked by two things (with no way around them, and both equally preferable for the goal of making tea): a £0.50 LEGO block and a priceless vase. Now it's a 50/50 which one the bot destroys, because the change would be equal (one object gets destroyed) on the way to its reward (making a cup of tea). Yikes.

    Edit: and I can't really think of a way to improve it, because if you try to give the bot a system of values, it becomes arbitrarily complex again. If you say "never break anything", then it might never move, because that maximizes the chance of never breaking anything. If you say "minus points for breaking things, weighted by monetary value", well, your new next-gen gaming computer is more expensive than your cat. So bye-bye cat.

    Well, then you say living creatures have priority. Okay, bye-bye gaming computer when the bot encounters a wasp or some other insect. Hmm, maybe not include insects then?

    And you could go on and on like this. It would become an arbitrarily complex system.
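
A toy sketch of the worry in comment 1: if the impact penalty is built on a distance function that only measures some features of the world state, then changes to everything else count as zero. This is a made-up Python illustration, not anything from the video or the paper; the feature names and the penalty function are hypothetical.

    # Hypothetical, hand-picked features the impact penalty happens to measure.
    MEASURED_FEATURES = ["kettle_position", "cup_count"]

    def impact_penalty(state_before: dict, state_after: dict) -> int:
        """Count how many of the *measured* features changed between two world states."""
        return sum(state_before[f] != state_after[f] for f in MEASURED_FEATURES)

    before = {"kettle_position": "counter", "cup_count": 4, "vase_intact": True}
    after  = {"kettle_position": "counter", "cup_count": 4, "vase_intact": False}

    # The vase got smashed, but it isn't a measured feature, so the penalty
    # is 0 and the agent has no incentive to care about it.
    print(impact_penalty(before, after))  # -> 0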

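A similar toy sketch of the concern in comments 6 and 14: if the penalty is the distance between the actual world and the world the robot predicts "had it done nothing", then changes other people make of their own accord count against the robot, so undoing them looks like reducing impact. Again, this is a made-up Python illustration under assumed feature names, not anything from the paper.

    def distance(state_a: dict, state_b: dict) -> int:
        """Number of features that differ between two world states."""
        return sum(state_a[k] != state_b[k] for k in state_a)

    # What the robot predicts the room would look like had it stayed idle.
    predicted_if_idle = {"tea_made": False, "guest_in_room": True}

    # What actually happened: the robot made tea, and the guest chose to leave.
    actual = {"tea_made": True, "guest_in_room": False}
    print(distance(actual, predicted_if_idle))   # -> 2 (the guest's own choice counts against the robot)

    # Dragging the guest back "reduces impact" under this metric.
    reverted = {"tea_made": True, "guest_in_room": True}
    print(distance(reverted, predicted_if_idle)) # -> 1
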
Comments are closed.
