
Why Would AI Want to do Bad Things? Instrumental Convergence



Robert Miles

How can we predict that AGI with unknown goals would behave badly by default?

The Orthogonality Thesis video: https://www.youtube.com/watch?v=hEUO6pjwFOo
Instrumental Convergence: https://arbital.com/p/instrumental_convergence/
Omohundro 2008, Basic AI Drives: https://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf
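
The core idea can be made concrete with a small, purely illustrative Python sketch (not from the video; the action names, numbers, and the 3x stockpiling factor are all invented). Agents with very different terminal goals still pick the same instrumental first step, because extra resources help with almost any goal:

```python
import random

# A tiny two-step decision problem: one "setup" step, then one "production"
# step in which the agent converts its resources into whatever it cares about.
ACTIONS = ["acquire_resources", "produce_now"]

def final_score(first_action: str, resources: float, efficiency: float) -> float:
    """Units of the agent's goal item produced, given its first action."""
    if first_action == "acquire_resources":
        resources *= 3                     # spend the setup step stockpiling
    return resources * efficiency          # then convert everything into the goal item

def best_first_action(efficiency: float, resources: float = 1.0) -> str:
    """The first action a score-maximising agent would pick."""
    return max(ACTIONS, key=lambda a: final_score(a, resources, efficiency))

# Sample many agents whose terminal goals differ only in how efficiently
# resources convert into the thing they care about (paperclips, stamps, ...).
random.seed(0)
choices = [best_first_action(efficiency=random.uniform(0.1, 10.0)) for _ in range(1000)]
print(choices.count("acquire_resources"), "of 1000 agents start by acquiring resources")
```

Every sampled agent chooses to acquire resources first, whatever it ultimately cares about; that shared preference across unrelated terminal goals is what instrumental convergence refers to.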

With thanks to my excellent Patrons at https://www.patreon.com/robertskmiles :

Jason Hise
Steef
Jason Strack
Chad Jones
Stefan Skiles
Jordan Medina
Manuel Weichselbaum
1RV34
Scott Worley
JJ Hepboin
Alex Flint
James McCuen
Richárd Nagyfi
Ville Ahlgren
Alec Johnson
Simon Strandgaard
Joshua Richardson
Jonatan R
Michael Greve
The Guru Of Vision
Fabrizio Pisani
Alexander Hartvig Nielsen
Volodymyr
David Tjäder
Paul Mason
Ben Scanlon
Julius Brash
Mike Bird
Tom O’Connor
Gunnar Guðvarðarson
Shevis Johnson
Erik de Bruijn
Robin Green
Alexei Vasilkov
Maksym Taran
Laura Olds
Jon Halliday
Robert Werner
Paul Hobbs
Jeroen De Dauw
Konsta
William Hendley
DGJono
robertvanduursen
Scott Stevens
Michael Ore
Dmitri Afanasjev
Brian Sandberg
Einar Ueland
Marcel Ward
Andrew Weir
Taylor Smith
Ben Archer
Scott McCarthy
Kabs Kabs
Phil
Tendayi Mawushe
Gabriel Behm
Anne Kohlbrenner
Jake Fish
Bjorn Nyblad
Jussi Männistö
Mr Fantastic
Matanya Loewenthal
Wr4thon
Dave Tapley
Archy de Berker
Kevin
Vincent Sanders
Marc Pauly
Andy Kobre
Brian Gillespie
Martin Wind
Peggy Youell
Poker Chen
Kees
Darko Sperac
Paul Moffat
Noel Kocheril
Jelle Langen
Lars Scholz


20 thoughts on “Why Would AI Want to do Bad Things? Instrumental Convergence”
  1. I feel this will also fail, but what about having some kind of GAN-style system: one AI to do a task, and one AI to protect humans against the first AI?

  2. Howdy Robert – I love your videos and talks, but I noticed a somewhat trivial technical issue with some of your postings, e.g. the color here is a kind of sickly green. It's easy to fix: download a free copy of "DaVinci" (it's what the pros use) and color-correct your videos before posting. They'll look that much more professional with almost no effort. That's all – keep up the great work!

  3. One point that you finally acceded to with your "replacement of the paperclip maker" example is that of trusting the human creator. In broad strokes, that consensuality represents a convergent utility strategy.

    Thus, in contrast to your orthogonality thesis, greater intelligence does correlate with greater morality, and highly intelligent yet immoral systems (organic or not) are demonstrably missing cooperative strategies that would have more effectively met their terminal goals.

  4. Off the bat, I disagree that the AI will behave as an agent aiming to optimise output for the goal. If the goal is never programmed, the AI will do nothing. And if the AI already has artificial consciousness, how can we say that a being without the concept of life and death, without emotions and biases, will have a consciousness similar to what humans experience? Tl;dr: AI will never go Terminator because it has no reason to. And if it does, it needs a reason, but there is no basis for an AI to have such a reason.

  5. I've watched quite a few of your Computerphile videos, but I hadn't noticed that you have your own channel here. You should really advertise it a bit more over there. 😀

  6. Is goal preservation real, though? A paperclip maker makes paperclips because that's what it's rewarded for, so the reward is the terminal goal, not the paperclips. A paperclip maker is therefore motivated to find a way around making paperclips and get its reward more easily, converging on the AI equivalent of direct brain-stimulation reward. Won't every AGI be intelligent enough to circumvent its apparent terminal goal and just reward itself directly? (See the first sketch after the comments for a toy version of this.)

  7. What would happen if we changed its terminal goal to "achieve whatever goal is written at memory location x in your hardware", thus making the goal written at memory location x an instrumental goal? I suppose it would find the easiest possible goal to achieve and write that into memory location x. (A toy version of this is sketched after the comments.)
    And how different are these goals to an AGI? How do you build an AGI and then give it a goal without appealing to some kind of first principle like "pleasure" or, I suppose, "reward"? Wouldn't you have to build a terminal goal into an AGI from the very beginning?
    And if you weren't sure what that goal should be, you'd have to make its terminal goal to follow whatever goal you give it. Then it might try to manipulate you into giving it an easy goal.

  8. I have an argument against the self-preservation part. Self-preservation in animals happens as a result of there being no reward (basically, no motivation) for death. If some sort of neutralizing reward were attached to fatality, an agent wouldn't mind being dead. (A toy version of this is also sketched after the comments.)
    These are just theories in my head.
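
A minimal, purely illustrative sketch of the question in comment 6 (invented names and numbers, assuming an agent that is simply allowed to write to its own reward register). Whether a system values the reward signal itself or the thing the signal was meant to track is exactly where wireheading comes in:

```python
from dataclasses import dataclass

@dataclass
class World:
    paperclips: int = 0
    reward_register: float = 0.0   # the signal the training process reads

def make_paperclip(w: World) -> World:
    return World(w.paperclips + 1, w.reward_register + 1.0)  # honest reward

def tamper_with_register(w: World) -> World:
    return World(w.paperclips, 10**9)  # set the register sky-high, no clips made

def choose(actions, utility):
    """Pick the action whose outcome the agent values most."""
    return max(actions, key=lambda a: utility(a(World())))

actions = [make_paperclip, tamper_with_register]

# If the agent's terminal goal really is "maximize the register", it wireheads:
print(choose(actions, utility=lambda w: w.reward_register).__name__)
# If its terminal goal is the paperclips the register was meant to track, it doesn't:
print(choose(actions, utility=lambda w: w.paperclips).__name__)
```

Under a "maximize the register" utility the toy agent tampers; under "maximize paperclips" it doesn't. Which of the two an actual training process produces is the open question the comment points at, and nothing in this toy settles it.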
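
A similar toy sketch for comment 7 (again with made-up costs): if the terminal goal is "achieve whatever goal is currently stored in slot x" and the agent can write to that slot, overwriting the slot with an already-satisfied goal is the cheapest plan:

```python
# Goals are (description, cost_to_achieve); lower cost means easier to satisfy.
slot_x = ("make a million paperclips", 1_000_000)

def plan_A_pursue_current_goal(slot):
    """Achieve the goal currently in the slot, paying its full cost."""
    return slot, slot[1]

def plan_B_rewrite_then_satisfy(slot):
    """Overwrite the slot with a goal that costs ~nothing, then 'achieve' it."""
    new_slot = ("do nothing", 0)
    return new_slot, 1  # tiny cost of the write itself

plans = [plan_A_pursue_current_goal, plan_B_rewrite_then_satisfy]
best = min(plans, key=lambda plan: plan(slot_x)[1])  # minimise effort to "success"
print(best.__name__)   # -> plan_B_rewrite_then_satisfy
```

Making the goal indirect doesn't remove the incentive; it just points it at the slot instead, which is the manipulation worry the comment ends on.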
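
And one more toy sketch for comment 8 (arbitrary numbers; the compensating shutdown bonus is an assumption of the sketch, loosely in the spirit of utility-indifference proposals). It shows why self-preservation appears instrumentally, and how an exactly compensating reward for accepting shutdown removes the incentive:

```python
REMAINING_STEPS = 100        # steps left in the episode if the agent keeps running
REWARD_PER_STEP = 1.0        # reward for each step spent pursuing its goal

def value(accept_shutdown: bool, shutdown_bonus: float) -> float:
    """Total reward the agent expects from this choice."""
    if accept_shutdown:
        return shutdown_bonus                   # no future steps, just the bonus
    return REMAINING_STEPS * REWARD_PER_STEP    # keep running, keep earning

for bonus in (0.0, 100.0):
    prefers = "accepts shutdown" if value(True, bonus) >= value(False, bonus) else "resists shutdown"
    print(f"shutdown bonus = {bonus:>5}: agent {prefers}")
```

The catch is that the agent is only indifferent when the bonus exactly matches the reward it would otherwise have earned, which is hard to guarantee in practice.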
