
MIT 6.S094: Deep Reinforcement Learning for Motion Planning



Lex Fridman

This is lecture 2 of course 6.S094: Deep Learning for Self-Driving Cars, taught in Winter 2017. This lecture introduces types of machine learning, the neuron as a computational building block for neural nets, Q-learning, deep reinforcement learning, and the DeepTraffic simulation that utilizes deep reinforcement learning for the motion planning task.

INFO:
Slides: http://bit.ly/2H8Fs7g
Website: https://deeplearning.mit.edu
GitHub: https://github.com/lexfridman/mit-deep-learning
Playlist: https://goo.gl/SLCb1y

Links to individual lecture videos for the course:

Lecture 1: Introduction to Deep Learning and Self-Driving Cars
https://youtu.be/1L0TKZQcUtA

Lecture 2: Deep Reinforcement Learning for Motion Planning
https://youtu.be/QDzM8r3WgBw

Lecture 3: Convolutional Neural Networks for End-to-End Learning of the Driving Task
https://youtu.be/U1toUkZw6VI

Lecture 4: Recurrent Neural Networks for Steering through Time
https://youtu.be/nFTQ7kHQWtc

Lecture 5: Deep Learning for Human-Centered Semi-Autonomous Vehicles
https://youtu.be/ByZF8_-OJNI

CONNECT:
– If you enjoyed this video, please subscribe to this channel.
– AI Podcast: https://lexfridman.com/ai/
– LinkedIn: https://www.linkedin.com/in/lexfridman
– Twitter: https://twitter.com/lexfridman
– Facebook: https://www.facebook.com/lexfridman
– Instagram: https://www.instagram.com/lexfridman
– Slack: https://deep-mit-slack.herokuapp.com



43 thoughts on “MIT 6.S094: Deep Reinforcement Learning for Motion Planning”
  1. Very poor and hand-wavy explanation of some of the key concepts. It's not at all clear what the deep Q-learning loss function is, why it is chosen that way, or how it is evaluated. The instructor seems to assume that you already have a pretty good basic understanding of the content covered in the lecture.

  2. Am I the only one who finds the explanations quite cumbersome and not easily digestible? I'm having a hard time following some things: I have to pause, go back, rewatch segments, speculate on a lot of things and extrapolate on those speculations, then rewatch hoping to match my speculations against stated facts to confirm my understanding is correct. I'm not an expert in teaching, nor am I a genius, but when a lesson leaves so many loose ends and raises more questions than it answers, it might not be properly optimized for teaching. I do appreciate the effort, though, and acknowledge that it's a difficult subject. I'm a visual learner, and it's a pain in the ass to find material on this subject that suits me.

  3. Can someone please explain what the input to the algorithm is? Is it just one snapshot of the game, or multiple snapshots taken while humans are playing it, or a video of a human playing it?

  4. 54:31 So basically you try to predict the result R' of performing A' on a past state S on which you did A and got result R, and then readjust your weights to make your prediction and the actual R' you got closer?
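The update described in comment 4 can be sketched as a one-step temporal-difference adjustment: predict the value of (S, A), observe the reward and next state, and nudge the estimate toward the observed target. This is a minimal tabular illustration, not the lecture's actual code; the learning-rate and discount values are placeholder assumptions.

```python
# Minimal tabular Q-learning step: after taking action a in state s and
# observing reward r and next state s', nudge Q(s, a) toward
# r + gamma * max_a' Q(s', a').  alpha and gamma are illustrative assumptions.

alpha = 0.1   # learning rate (assumed)
gamma = 0.9   # discount factor (assumed)

Q = {}  # maps (state, action) -> estimated value, default 0.0

def q_update(s, a, r, s_next, actions):
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    td_target = r + gamma * best_next            # what we "should" have predicted
    td_error = td_target - Q.get((s, a), 0.0)    # gap between target and prediction
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error

# One transition in a toy two-action environment:
q_update(s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
```

In a deep Q-network the table lookup is replaced by a neural net, and the same gap (target minus prediction) becomes the training loss.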

  5. At 36:56 it seems like you can reduce the reward to Q(t+1) – Q(t), or just the simple increase in the "value" of the state in time period t+1 over period t. Then the discount rate (y) can be applied to that gain to discount it back to time t. The learning rate (a) then becomes a "growth of future state" valuations. Then the most important thing is that y * a > 1, or your learning never overcomes the burden of the discount rate.
    This is really similar to the dividend growth model of stock valuation:

    D/(k-g)
    D=dividend at time 0, k=discount rate, g=growth rate.

    The strange similarity is that when the "Learning rate" (feels like this should be "Applied Learning Rate") is greater than the discount rate, there is "growth" in future states, otherwise there is contraction (think The Dark Ages). In the dividend discount model, whenever the growth rate is extrapolated into infinity as higher than the discount rate, the denominator goes to zero and below, and the valuation goes to infinity.

    Yeah, I like this guy's analogies, translating the bedrock of machine learning to fundamental life lessons.
    Never stop learning… and then doing!

  6. Thanks for sharing such a great lecture. But I'm stuck at 45:13, where in the Atari game we have 4 images to decide the Q parameters, the dimension of each image is H * W, and each pixel takes one of 256 levels (grayscale), so the total size would be 256 * H * W * 4. But how are there 256^(H * W * 4) rows in the Q table?
    Can anyone please explain?
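The count in comment 6 can be checked on a tiny example: a state is one complete assignment of a gray level to every one of the H * W * 4 pixels, so the number of distinct states (and hence Q-table rows) is 256^(H * W * 4), while 256 * H * W * 4 only describes the storage of a single state. A toy sketch with 2 gray levels and a 2×2 single frame:

```python
from itertools import product

# Toy stand-in for the Atari state count: each pixel independently takes one
# of `levels` values, and a state is a full assignment over all pixels, so
# the number of distinct states is levels ** n_pixels, not levels * n_pixels.
levels = 2              # tiny stand-in for 256 gray levels
h, w, frames = 2, 2, 1  # tiny stand-in for H, W, and the 4 stacked frames
n_pixels = h * w * frames

states = list(product(range(levels), repeat=n_pixels))
print(len(states))  # 2**4 = 16 distinct states, while 2 * 4 = 8
```

Scaling this to 256 levels and real image sizes is exactly why a table is infeasible and a function approximator (the deep network) is used instead.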

  7. This is so refreshing! To break down the human psyche into mathematical terms! Mind blown 🀯!! You nailed it!! When science and psychology come together so beautifully like this, it is an inspiring sight! You got my attention 😊

  8. First, you are a good human and a fantastic teacher,
    because you share knowledge with people who have not had the opportunity to study at a university.
    Thanks for that, and God bless you.

  9. Can anyone give me a step-by-step map for learning machine learning? I am a beginner: I have just completed Python programming and have done some small projects. Please help me; I don't know where to start.

  10. At 7:31 your slide shows a threshold activation function in the equation, but the animation shows a sigmoid activation. That might confuse some MIT folks.

  11. I like Lex a lot, but I think the objective function for Q is wrong (32:49). Optimal Q-values are intended to maximize the cumulative future reward, not just the reward at the next time step. One could easily imagine that the best action to take in one's current state delivers a loss at the next step, but in the long term achieves the greatest net gain in reward.
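The point in comment 11 can be seen numerically: with the discounted return G = r_0 + γ·r_1 + γ²·r_2 + …, an action with a worse immediate reward can still be the better choice. A tiny sketch (the reward sequences and γ value are made-up illustrations):

```python
gamma = 0.9  # discount factor (assumed)

def discounted_return(rewards, gamma):
    # G = r_0 + gamma * r_1 + gamma**2 * r_2 + ...
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Action A: small immediate gain, nothing afterwards.
# Action B: immediate loss, but a large payoff two steps later.
g_a = discounted_return([1.0, 0.0, 0.0], gamma)
g_b = discounted_return([-1.0, 0.0, 10.0], gamma)

print(g_b > g_a)  # True: the short-term loss wins in cumulative reward
```

This is why the Q-learning target uses the bootstrapped term γ·max Q(s', a') rather than the next-step reward alone: that term folds the whole discounted future into the one-step update.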

  12. It's amazing how technology allows us to access such high-quality educational content from anywhere in the world. Huge thanks to Lex for sharing these insightful and inspiring videos with us!

