Lex Fridman
First lecture of MIT course 6.S091: Deep Reinforcement Learning, introducing the fascinating field of Deep RL. For more lecture videos on deep learning, reinforcement learning (RL), artificial intelligence (AI & AGI), and podcast conversations, visit our website or follow TensorFlow code tutorials on our GitHub repo.
INFO:
Website: https://deeplearning.mit.edu
GitHub: https://github.com/lexfridman/mit-deep-learning
Slides: http://bit.ly/2HtcoHV
Playlist: http://bit.ly/deep-learning-playlist
OUTLINE:
0:00 – Introduction
2:14 – Types of learning
6:35 – Reinforcement learning in humans
8:22 – What can be learned from data?
12:15 – Reinforcement learning framework
14:06 – Challenge for RL in real-world applications
15:40 – Components of an RL agent
17:42 – Example: robot in a room
23:05 – AI safety and unintended consequences
26:21 – Examples of RL systems
29:52 – Takeaways for real-world impact
31:25 – 3 types of RL: model-based, value-based, policy-based
35:28 – Q-learning
38:40 – Deep Q-Networks (DQN)
48:00 – Policy Gradient (PG)
50:36 – Advantage Actor-Critic (A2C & A3C)
52:52 – Deep Deterministic Policy Gradient (DDPG)
54:12 – Policy Optimization (TRPO and PPO)
56:03 – AlphaZero
1:00:50 – Deep RL in real-world applications
1:03:09 – Closing the RL simulation gap
1:04:44 – Next step in Deep RL
CONNECT:
– If you enjoyed this video, please subscribe to this channel.
– Twitter: https://twitter.com/lexfridman
– LinkedIn: https://www.linkedin.com/in/lexfridman
– Facebook: https://www.facebook.com/lexfridman
– Instagram: https://www.instagram.com/lexfridman
Another detail I have noticed in many presentations: those agents are not trying to model the environment, which is semantically impossible. What they are trying to do instead, I believe, is to model AN INSTANCE OF A DUAL SPACE associated with the environment's space. It is very common to use linear regression, for instance.
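A minimal sketch of what "linear regression" can look like inside an RL agent, assuming the comment means linear value-function approximation: the agent fits a linear estimate V(s) ≈ w·φ(s) over state features rather than a model of the environment itself. All names and numbers below are illustrative, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear value-function approximation: V(s) ~ w . phi(s).
# The agent never models the environment itself; it fits a
# linear function of state features.
n_features = 4
w = np.zeros(n_features)
alpha, gamma = 0.1, 0.99  # step size and discount factor

def td0_update(phi_s, reward, phi_s_next, w):
    """One TD(0) update step on the linear value estimate."""
    td_error = reward + gamma * phi_s_next @ w - phi_s @ w
    return w + alpha * td_error * phi_s

# One hypothetical transition with made-up features and reward.
phi_s, phi_s_next = rng.random(n_features), rng.random(n_features)
w = td0_update(phi_s, reward=1.0, phi_s_next=phi_s_next, w=w)
print(w)
```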
Wonderful lecture.
Remove the human factor. Have the traffic be free of human crossings.
@50:06 DQN can't learn stochastic policies. DQN has a softmax output on actions… isn't that a stochastic policy in itself?
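For context, vanilla DQN (Mnih et al., 2015) does not have a softmax head: the network outputs raw Q-values, the learned policy is the deterministic argmax over them, and stochasticity enters only through epsilon-greedy exploration. Applying a softmax to Q-values (Boltzmann exploration) is a separate heuristic, not a learned stochastic policy. A minimal sketch of the standard action selection, with illustrative names and values:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_action(q_values, epsilon=0.1):
    """Vanilla DQN action selection: the network outputs raw Q-values,
    the policy is the deterministic argmax, and randomness comes only
    from epsilon-greedy exploration (not from a softmax head)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: uniform random
    return int(np.argmax(q_values))              # exploit: greedy, deterministic

# Hypothetical Q-values for a 4-action environment.
q = np.array([0.2, 1.5, -0.3, 0.9])
print(select_action(q))  # almost always 1, the argmax
```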
Seriously the best Deep RL lecture out there to date.
Thank you very much!
How/why can you even upload this for free? Doesn't university cost loads in the US?
Great stuff though!
Much better than the Stanford University lecture, where the lady basically only reads the equations without giving any real intuition for what's going on.
Hmm, is it easy to summarise or direct me to what you find valuable about Nietzsche's writings? My impression is that, while I might agree with him about some things, he doesn't contribute much that is both novel and helpful. And quotes like the following make him seem kind of... psychopathic?:
> I abhor the man’s vulgarity when he says “What is right for one man is right for another”; “Do not to others that which you would not that they should do unto you.”. . . . The hypothesis here is ignoble to the last degree: it is taken for granted that there is some sort of equivalence in value between my actions and thine.
> I do not point to the evil and pain of existence with the finger of reproach, but rather entertain the hope that life may one day become more evil and more full of suffering than it has ever been.
> Man shall be trained for war and woman for the recreation of the warrior. All else is folly
But I haven't read any of his books, only heard summaries and quotes, so perhaps I'm missing something or misunderstanding him somewhat.
Below are some examples of texts that I personally would recommend:
* https://nickbostrom.com/utopia.html
* https://reducing-suffering.org/on-the-seriousness-of-suffering/
* https://wiki.lesswrong.com/wiki/Coherent_Extrapolated_Volition
Which Nietzsche book is he recommending at 4:12?
Trump 2020
Brilliant!!
Lex is honestly a character from a Wes Anderson film.
THANK YOU MIT
Lex Fridman, I just love your videos. I am a great fan of yours, sir. Carry on.
Hi Lex, thanks for this great lecture! Which books of Nietzsche did you have in mind around 4:33?
The funniest part is where he tries to explain the abilities of the human brain by evolution at 6:33! And he literally says, "it is somehow being encoded", which contradicts the reward concept he is introducing!
Son, the most logical reason for having a predefined encoding scheme that has never been trained is the existence of a creator!
I have tried to study and understand Deep RL using several books and lectures over the last few years, but I only felt that I truly understood RL after listening to this lecture. Thanks, Lex. I am grateful to you for posting this lecture on YouTube. Thank you!
You are my idol, Lex.
I've seen a lot of these videos & read some of the books in ML; Lex has a clarity that's rare.
Super
Professor Lex, can we get the entirety of 6.S091 on MIT OCW? This is an incredibly interesting topic that I've been working on (Evolutionary Computing), and I am currently enrolled in a project with thorough knowledge of Deep RL as a prerequisite. This research field has very few online resources besides Stanford's CS 234 and Berkeley's CS 285.
Your explanations are immensely helpful and intuitive. Humanity will present its gratitude if this whole course is made available! AGI and AI safety issues need more attention before they become the greatest immediate existential risk; your courses can help raise general AI awareness and advance our civilization to higher dimensions. Loved the fact that you grinned while just casually mentioning the Simulation Hypothesis.
1:04:40 Best part: that grin after he just casually dropped that line in an MIT lecture... all of the infinite universes being simulations.