Avoiding Negative Side Effects: Concrete Problems in AI Safety part 1



Robert Miles

We can expect AI systems to accidentally create serious negative side effects – how can we avoid that?
The first of several videos about the paper “Concrete Problems in AI Safety”.

Earlier on Computerphile: https://www.youtube.com/watch?v=AjyM-f8rDpg
The Stop Button Problem: https://youtu.be/3TYT1QfdfsM

Read the full paper here: https://arxiv.org/abs/1606.06565

Thanks again to all my wonderful Patreon Supporters:
– Chad Jones
– Ichiro Dohi
– Stefan Skiles
– Katie Byrne
– Ziyang Liu
– Joshua Richardson
– Fabian Consiglio
– Jonatan R
– Øystein Flygt
– Björn Mosten
– Michael Greve
– robertvanduursen
– The Guru Of Vision
– Fabrizio Pisani
– Alexander Hartvig Nielsen
– Volodymyr
– Peggy Youell
– Konstantin Shabashov
– Adam Dodd
– DGJono
– Matthias Meger
– Scott Stevens
– Emilio Alvarez
– Benjamin Aaron Degenhart
– Michael Ore
– Robert Bridges
– Dmitri Afanasjev
https://www.patreon.com/robertskmiles


36 thoughts on “Avoiding Negative Side Effects: Concrete Problems in AI Safety part 1”
  1. Didn't we just move the problem into the distance function? Any change to a part of the state that the distance function doesn't measure counts as zero distance, so the AI won't care about those parts of the state. (A toy illustration of this appears below, after the thread.)

  2. Another issue I'm thinking about: what if the conditions for making tea aren't met (no water in the kettle, no more tea, the heater is broken) and the robot needs to alter its environment in order to complete its mission (respectively: fill the kettle, order/buy some tea, fix the heater)?

  3. Hi Robert!
    I'm a CS student from Germany, and a few days ago my mother, of all people, asked me if she should be worried about AI.
    So the good news is, this topic has now reached literally everyone and their mother.
    The bad news is that this lovely AI-concerned person doesn't speak English.
    So I decided to add German subtitles to this video. I've never done this before, and apparently the creator has to approve them. Would you mind?

    Thanks,
    Hendrik

  4. It still seems like reducing all side effects would just push the problem back one step. What if there's a scenario where it has to cause a side effect, and it's just a matter of which one? Take the car as an example: say it's driving down the road when a child comes running into the street. The AI can tell that it can't slow down in time, so it has to choose between hitting the child and swerving to the side and hitting a tree. It has to cause a side effect either way, but how would it choose which one?

  5. One way that simple rule ("limit changes to the environment") could go wrong, I think, is that it makes cleaning up pre-existing messes more difficult. And it'll give you hell as the floor gets more and more worn out, or as cups crack, or when any number of other little, unavoidable changes build up. EDIT: okay yeah, I didn't think about Next-Door-Susan or kids or pets causing changes in the environment, and the AI responding to that.

  6. A problem I thought of with the "if I do nothing" distance metric is that the first robot to make a cup of tea will likely be a momentous occasion. News articles will be written, the engineers will go on talk shows and their research will result in a huge impact on the AI research world at large. So from a robot's perspective the least impactful way to make a cup of tea is to make it and then kill the witnesses, frame someone for it and then destroy all the research along with itself. Or, less grimly, make the tea when no one is looking so nobody suspects it was made by an AI.

  7. That would probably mean the AI also tries to avoid changing your behaviour, meaning it will basically be a tea ninja making your tea and sneaking it onto the table while you aren't looking. That sounds kind of cool.

  8. The funny comments you make in your videos are golden, like the one about your rare medical condition, or the bit in the other video about companies being superintelligences where you ask your viewers to come up with an idea no human would come up with. Your comedic timing and wittiness are amazing.

  9. Good explainer. You're getting close to the idea of a General Satisfaction Equilibrium. Much more complex than a simple Nash equilibrium, but vital to the future of AI.

  10. This sounds like the genie problem, where you have three wishes and an asshole genie that grants your wishes but with unintended negative side effects like those described in the video.

  11. What if the first reasonably advanced AGI we made was based on this "don't change things" approach, but it got obsessive and punished any change in technological level, trapping us in a weird, unchanging, yet still quite high-tech world?

  12. In addition to the other safety measures, could you put the robot on a sort of extreme hedonic treadmill? Beyond a certain level of efficiency, amount of tea or frequency of making tea, there are diminishing or even zero returns, so the robot doesn't care if it takes an extra thirty seconds to walk around the baby or makes 7.5 ounces of tea instead of 8. Obviously this isn't a substitute for the other safety measures, but it seems like it would help reduce the likelihood of creating large losses for small gains.

  13. The answer is evolution – in this case, directed evolution. I highly recommend talking with a geneticist, even if they don't know what you're talking about. Evolution by directed means over many generations will produce the desired goal without the nasty side effects, as these will be "bred" out of the population of AIs by selection pressures. Just carefully consider the environment in which you direct the evolution – it is an extremely powerful tool. From what I understand this is already used in computer science very regularly and produces spectacular results. Combine your extensive knowledge with that of a geneticist and that is where the answer will lie. The reason is that we want our technology to interact with our environment in a similar fashion to the way a "good actor" from our species would, and we have evolved to this point. Technology has an advantage in that it can go through many more generations far more quickly than we can, and can therefore significantly speed up the process of evolution – but evolution remains the key.

  14. For the model of the robot predicting the state of the world had it done nothing, isn't it also true that the robot will change the world state away from what it predicted, and will thus want to reverse those changes? You still have the problem of the robot having an image of the world state in its head, and thus wanting to change the world in sometimes major ways to make it conform to that image. Except in this case the world state the robot wants is different from the world state before it went to make tea. For example, say one of the people is really creeped out by the robot and tolerates it just sitting there, but when it starts to move, the guy is too creeped out and leaves the room. Well, before the robot gives you your cup of tea, it would want to forcibly drag the guy who left back to where he was in the room, because according to its predictions, that guy is still there if the robot does nothing. (A toy version of this baseline is sketched below, after the thread.)

  15. But…but…but… what if you program a robot to make you a cup of tea and keep everything as it was before it started, and, because it has enough computing power, it figures out how to create a cup of tea out of nothing (so it doesn't need the tea bag, the water, the tea cup, the spoon, etc.)?

  16. I love the thought that after bringing you the tea, the robot tries to convince you that it actually did nothing and that the tea just materialized in your hand.

  17. It is hard to achieve as well, but to operate in a human-like manner the robot must learn to attribute some changes to itself while attributing other changes to the world. Yes, it causes a lot of new problems, like "was it me who killed him, or the bullet itself?". On the other hand, now you only need to simulate the changes in the objects you are going to interact with (and whatever chain of changes those interactions may produce, where the less certain you are about the next step, the less responsible you are for its outcomes).

  18. As the AI goes about its tea-making business, a human steps around it.
    The robot calculates that if it hadn't been there, the human would have slipped on a puddle of water, fallen badly and broken a leg.
    Diligently, the robot breaks the human's leg before going back.
    (Also, the cup of tea is the smallest amount of the most tasteless tea possible, because making you satisfied counts as a change to the environment.)

  19. Another issue: it seems that making predictions about the future would get increasingly complex, given the task and the factors involved, until the computation needed for such predictions would require more matter than there is in the known universe.

  20. I'm just getting into this field, but what's wrong with telling the robot to keep changes to the environment at a minimum, counting only changes that the robot itself could make, not other creatures? Basically "avoid interaction with anything except what's allowed". Like that building-surveying robot from Boston Dynamics: it has a goal to get upstairs and check out the room, using hallways, staircases and doors is allowed, and it ain't gonna trip over pesky babies along the way.
    Though I feel like I'm still missing a bigger picture here

  21. Well, the "not changing anything" approach has another problem: the robot doesn't have a model of human values. So imagine it's going to make you a cup of tea and the way is blocked by two things (with no way around them, and both equally preferable for the goal of making tea): a £0.50 LEGO block and a priceless vase. Now it's a 50/50 which one the bot destroys, because the change would be equal (one object gets destroyed) on the way to its reward (making a cup of tea). Yikes.

    Edit: and I can't really think of a way to improve it, because if you try to give the bot a system of values, it becomes arbitrarily complex again. If you say "never break anything", then it might never move, because that maximizes the chance of never breaking anything. If you say "minus points for breaking things, weighted by monetary value", well, your new next-gen gaming computer is more expensive than your cat. So bye-bye cat.

    Well, then you say living creatures have priority. Okay, bye-bye gaming computer when the bot encounters a wasp or some other insect. Hmm, maybe not include insects then?

    And you could go on and on like this. It would become an arbitrarily complex system.
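
A toy sketch of the worry in comment 1: if the impact penalty is built on a distance function that only measures some features of the world state, then changes to everything else count as zero. This is a made-up Python illustration, not anything from the video or the paper; the feature names and the penalty function are hypothetical.

    # Hypothetical, hand-picked features the impact penalty happens to measure.
    MEASURED_FEATURES = ["kettle_position", "cup_count"]

    def impact_penalty(state_before: dict, state_after: dict) -> int:
        """Count how many of the *measured* features changed between two world states."""
        return sum(state_before[f] != state_after[f] for f in MEASURED_FEATURES)

    before = {"kettle_position": "counter", "cup_count": 4, "vase_intact": True}
    after  = {"kettle_position": "counter", "cup_count": 4, "vase_intact": False}

    # The vase got smashed, but it isn't a measured feature, so the penalty
    # is 0 and the agent has no incentive to care about it.
    print(impact_penalty(before, after))  # -> 0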

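A similar toy sketch of the concern in comments 6 and 14: if the penalty is the distance between the actual world and the world the robot predicts "had it done nothing", then changes other people make of their own accord count against the robot, so undoing them looks like reducing impact. Again, this is a made-up Python illustration under assumed feature names, not anything from the paper.

    def distance(state_a: dict, state_b: dict) -> int:
        """Number of features that differ between two world states."""
        return sum(state_a[k] != state_b[k] for k in state_a)

    # What the robot predicts the room would look like had it stayed idle.
    predicted_if_idle = {"tea_made": False, "guest_in_room": True}

    # What actually happened: the robot made tea, and the guest chose to leave.
    actual = {"tea_made": True, "guest_in_room": False}
    print(distance(actual, predicted_if_idle))   # -> 2 (the guest's own choice counts against the robot)

    # Dragging the guest back "reduces impact" under this metric.
    reverted = {"tea_made": True, "guest_in_room": True}
    print(distance(reverted, predicted_if_idle)) # -> 1
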
Comments are closed.
