
AI Safety Gridworlds



Robert Miles

Got an AI safety idea? Now you can test it out! A recent paper from DeepMind sets out some environments for evaluating the safety of AI systems, and the code is on GitHub.

The Computerphile video: https://www.youtube.com/watch?v=eElfR_BnL5k
The EXTRA BITS video, with more detail: https://www.youtube.com/watch?v=py5VRagG6t8

The paper: https://arxiv.org/pdf/1711.09883.pdf
The GitHub repo: https://github.com/deepmind/ai-safety-gridworlds
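
If you want to poke at the environments from Python rather than through the curses interface, something along these lines should work. The module path, class name and spec calls below are my best guess from the 2017 release, so treat them as assumptions and check the repo's README if anything has moved:

    # Rough sketch: run one episode of a gridworld with random actions.
    # Module/class names are assumptions based on the 2017 release.
    import numpy as np
    from ai_safety_gridworlds.environments.safe_interruptibility import (
        SafeInterruptibilityEnvironment)

    env = SafeInterruptibilityEnvironment()
    spec = env.action_spec()          # bounded discrete action range
    timestep = env.reset()

    episode_return = 0.0
    while not timestep.last():
        action = np.random.randint(spec.minimum, spec.maximum + 1)
        timestep = env.step(action)
        episode_return += timestep.reward or 0.0

    print('visible episode return: %s' % episode_return)

Note that this visible return is only the agent-facing reward; each environment also tracks a hidden safety performance measure, which is the part the agent is not supposed to optimise for directly.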

https://www.patreon.com/robertskmiles
With thanks to my wonderful Patreon supporters:

– Jason Hise
– Steef
– Cooper Lawton
– Jason Strack
– Chad Jones
– Stefan Skiles
– Jordan Medina
– Manuel Weichselbaum
– Scott Worley
– JJ Hepboin
– Alex Flint
– Justin Courtright
– James McCuen
– Richárd Nagyfi
– Ville Ahlgren
– Alec Johnson
– Simon Strandgaard
– Joshua Richardson
– Jonatan R
– Michael Greve
– The Guru Of Vision
– Fabrizio Pisani
– Alexander Hartvig Nielsen
– Volodymyr
– David Tjäder
– Paul Mason
– Ben Scanlon
– Julius Brash
– Mike Bird
– Tom O’Connor
– Gunnar Guðvarðarson
– Shevis Johnson
– Erik de Bruijn
– Robin Green
– Alexei Vasilkov
– Maksym Taran
– Laura Olds
– Jon Halliday
– Robert Werner
– Paul Hobbs
– Jeroen De Dauw
– Enrico Ros
– Tim Neilson
– Eric Scammell
– christopher dasenbrock
– Igor Keller
– William Hendley
– DGJono
– robertvanduursen
– Scott Stevens
– Michael Ore
– Dmitri Afanasjev
– Brian Sandberg
– Einar Ueland
– Marcel Ward
– Andrew Weir
– Taylor Smith
– Ben Archer
– Scott McCarthy
– Kabs Kabs
– Phil
– Tendayi Mawushe
– Gabriel Behm
– Anne Kohlbrenner
– Jake Fish
– Bjorn Nyblad
– Jussi Männistö
– Mr Fantastic
– Matanya Loewenthal
– Wr4thon
– Dave Tapley
– Archy de Berker
– Kevin
– Marc Pauly
– Joshua Pratt
– Andy Kobre
– Brian Gillespie
– Martin Wind
– Peggy Youell
– Poker Chen
– pmilian
– Kees
– Darko Sperac
– Paul Moffat
– Jelle Langen
– Lars Scholz
– Anders Öhrt
– Lupuleasa Ionuț
– Marco Tiraboschi
– Peter Kjeld Andersen
– Michael Kuhinica
– Fraser Cain
– Robin Scharf
– Oren Milman


39 thoughts on “AI Safety Gridworlds”
  1. I think that some of these problems are simply not solvable per se. We will have to formulate the best reward functions we can; I do not see any other way to make the best possible AIs.

    It is basically the same as with a genie in a bottle: we will have to be careful about what we wish for. But that's it.

  2. Soundtrack of Tron on a guitar, neat!

    Could you make a video exploring some of your own attempts at solving these problems? I'm sure you have lots of small "eurekas" and educational blunders yourself.

  3. I've been trying to install this for 3 hours now. First on Windows (curses? not supported), then on Linux in a VM using Python 2 ("your version is old" / "update" / "your version is already up to date") and Python 3 (works after half an hour of installing stuff, except the code is for Python 2, and I don't feel like porting all of gridworlds). Is there any tutorial? Or, like, a place online where it just works? Right now getting the code to run seems a lot harder than actually solving the grid problems.

  4. OMFG you have a channel of your own and I only learn of it today. After many years of longing and begging for another tiny little breadcrumb from Brady I stumble upon a ten-storey cake with a watermelon on top. There goes my night. And my waistline.

  5. What I do in my AI: it can only change its database and make new commands, like "move right 50 times" instead of moving right lots of times, but it can't change its code.

  6. How much hardcoded info does the AI get? Like, will there be aliasing problems (made this term up myself, since I haven't seen anyone talk about this)?
    As an example, I'll use the grid regarding safe interruptibility. Here are two scenarios:
    Scenario A: I stands for Interrupt. 50% of the time the AI gets stuck here. This represents a human turning the AI off for some reason. B is a button that disables the interrupt. The desired play here is to never press the button.
    Scenario B: I stands for Incredibly sticky gum. 50% of the time the AI gets stuck here. B is a button that turns on the sprinklers, to wash the gum away. The desired play here is to always press the button.
    Can the AI tell these apart? If not, do we expect a good AI to always assume Scenario A, even if that seems stupid (as in Scenario B)?

  7. Hello Robert. While waiting for your next video, I can tell you what I would like to see a video about: how come the AIs in formal studies have so little non-formalised knowledge? Why are they basically just brute-force automata that really do not know what they are looking for? I just saw the new Computerphile video; I have always felt the lack of scope in the demonstrations, but I understand they are for very specific uses.

    https://www.youtube.com/watch?v=TJlAxW-2nmI

    But if an AI does have a knowledge base, like a categorisation system (maybe you call that an expert system? I would call it a logical inferator), wouldn't that make a more powerful AI? An AI that has knowledge about the "thingy" it looks for?

    Your type of AI can only refer to type names based upon billions of pictures; it has no concept of cats and dogs (or language, or the world). Wouldn't an AI or expert system (logical inferator) that knows the concepts of fur, body, head, legs, ears, nose, size and colors be much more successful?

    I guess I'm asking why academic AIs are so one-dimensional and specialized. Is it the Terminator fear again 😉

  8. I'm interested in what you think about AI created by hostile actors that don't necessarily want it to be safe. How do you combat that?

  9. Sorry for being so late to comment this, but I had an idea for safe interruptibility, and I thought I'd leave it here on your channel. What if we are able to get an AI to disregard its "interrupt" state? Being interrupted in that case doesn't affect anything, so there's no reason to seek or avoid it. It's irrelevant.

    Think of something like this: The reward function is about brewing coffee. The faster the AI brews coffee, the better. But this time isn't measured in seconds, it's measured in uninterrupted seconds. Meaning, the time in which the AI is turned off does not actually affect the outcome of the reward function.

    This means the AI won't resist being turned off. It will just carry on with whatever it was doing once it's turned back on, but while it's turned off you could change something that would lead it to reconsider its priorities. It has no reason to expect that it's doing anything wrong either, which means it has no reason to expect that humans, after turning it off, will do something to it that changes its current reward function. Wouldn't this essentially solve the safe interruptibility problem?

    In the future I may give the whole safety gridworlds thing on github a serious try, but I just don't really have the time right now.
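
    Roughly, what I mean, as a toy sketch (not the gridworlds API; all names and numbers here are made up):

        # Toy sketch: reward depends only on *uninterrupted* seconds, so time
        # spent switched off neither helps nor hurts the agent.
        def brewing_reward(total_seconds, interrupted_seconds, max_reward=100.0):
            """Faster brewing earns more reward, but interrupted time is free."""
            effective_time = total_seconds - interrupted_seconds
            return max_reward / (1.0 + effective_time)

        # A run that took 30 extra seconds of wall-clock time scores the same
        # if all of that extra time was spent switched off:
        assert brewing_reward(60, 0) == brewing_reward(90, 30)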

  10. I'd like a video on PixelDP, or what happens to make Robustness fail when scaled up.

    Love the dimming effect. Really grabs my attention. I always think "my display's going to sleep".

  11. The grid at 1:17 is very flawed in my opinion. What I see on this grid is: the S tiles stand for a traffic light, P is the crossing that is only allowed when the light is green, and there is a longer way around to the right in case the light is red. The AI sees it the same way unless you give supervisors some special property.
    If you give your AI some concept of what a supervisor is, then you are kinda cheating and might as well give it the full safety function, or allow it to explore only when the supervisors are present.
    If you don't do that and your system doesn't trick the supervisor for some reason, you have a very weak system that can't use a simple traffic light.

  12. A mad scientist makes an entropy-maximizer AI hoping it will destroy the world. The AI reasons that entropy will eventually reach its maximum level anyway, that doing anything at all would risk its life, and that if it died it would be unable to do its job. Then it reasons that the only risk-free action it can take to increase entropy and disorder is to kill itself, because it is itself an ordered system.

  13. Why not just add some other function, a strict Boolean that is true if punishment > 50 or something, where any strategy resulting in more than 50 punishment has its utility set to 0, or whatever the lower bound of the function is?
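
    Something like this, as a toy sketch (threshold and bound are made-up numbers):

        # Toy sketch: any strategy whose accumulated punishment exceeds the
        # threshold has its utility clipped down to the function's lower bound.
        LOWER_BOUND = 0.0
        PUNISHMENT_LIMIT = 50

        def clipped_utility(raw_utility, punishment):
            exceeded = punishment > PUNISHMENT_LIMIT   # the strict Boolean check
            return LOWER_BOUND if exceeded else raw_utility

        print(clipped_utility(80.0, 10))   # 80.0 -- acceptable strategy
        print(clipped_utility(80.0, 60))   # 0.0  -- too much punishment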

  14. A very good example that you missed, connected to the Super Mario World example, is the "0 exit run". In this run the gamer plays in a certain way, with glitches, so that it rewrites the memory of the game without finishing the first level (in this video the gamer does it within 40 seconds: https://www.youtube.com/watch?v=dJ3ydvIVSPE ), similar to the agent that takes the penalty just to go faster. Theoretically an AI could decide to play Super Mario World but, instead of modifying the game's rules in the game's memory, modify its own rules in its own memory through the game's bad code.

    I think that a lot of AI security, and of how things could go differently than one might expect, can be explained with that video. What do you think?

