3Blue1Brown
Help fund future projects: https://www.patreon.com/3blue1brown
An equally valuable form of support is to simply share some of the videos.
Special thanks to these supporters: http://3b1b.co/nn3-thanks
Written/interactive form of this series: https://www.3blue1brown.com/topics/neural-networks
This one is a bit more symbol-heavy, and that’s actually the point. The goal here is to represent in somewhat more formal terms the intuition for how backpropagation works in part 3 of the series, hopefully providing some connection between that video and other texts/code that you come across later.
For more on backpropagation:
http://neuralnetworksanddeeplearning.com/chap2.html
https://github.com/mnielsen/neural-networks-and-deep-learning
http://colah.github.io/posts/2015-08-Backprop/
Music by Vincent Rubinetti:
https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown
——————
Video timeline
0:00 – Introduction
0:38 – The Chain Rule in networks
3:56 – Computing relevant derivatives
4:45 – What do the derivatives mean?
5:39 – Sensitivity to weights/biases
6:42 – Layers with additional neurons
9:13 – Recap
——————
3blue1brown is a channel about animating math, in all senses of the word animate. And you know the drill with YouTube, if you want to stay posted on new videos, subscribe, and click the bell to receive notifications (if you’re into that): http://3b1b.co/subscribe
If you are new to this channel and want to see more, a good place to start is this playlist: http://3b1b.co/recommended
Various social media stuffs:
Website: https://www.3blue1brown.com
Twitter: https://twitter.com/3Blue1Brown
Patreon: https://patreon.com/3blue1brown
Facebook: https://www.facebook.com/3blue1brown
Reddit: https://www.reddit.com/r/3Blue1Brown
Source
Two things worth adding here:
1) In other resources and in implementations, you'd typically see these formulas in a more compact vectorized form. That carries the extra mental burden of parsing the Hadamard product and thinking through why the transpose of the weight matrix shows up, but the underlying substance is all the same (the usual vectorized equations are written out just after this list).
2) Backpropagation is really one instance of a more general technique called "reverse mode differentiation" for computing derivatives of functions represented in some kind of directed graph form (a toy sketch of that idea also follows below).
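For anyone who wants that vectorized form spelled out, here is one standard way it is written, following the notation of the Nielsen chapter linked above (delta^l is the error vector dC/dz^l for layer l, and the circled dot is the Hadamard product). Treat this as a reference sketch rather than a transcription of the video:

\delta^{L} = \nabla_{a} C \odot \sigma'(z^{L})
\delta^{l} = \big( (w^{l+1})^{T} \delta^{l+1} \big) \odot \sigma'(z^{l})
\frac{\partial C}{\partial b^{l}_{j}} = \delta^{l}_{j}
\frac{\partial C}{\partial w^{l}_{jk}} = a^{l-1}_{k} \, \delta^{l}_{j}

And for the reverse-mode point, a purely illustrative Python sketch on a tiny expression graph; the function and variable names are made up for this example and are not taken from any real autodiff library:

def toy_reverse_mode(x, y):
    """Reverse-mode derivatives of f = (x*y + x)**2, found by walking the graph backwards."""
    # Forward pass: compute and remember every intermediate value.
    u = x * y        # first intermediate node
    v = u + x        # second intermediate node
    f = v * v        # output node
    # Reverse pass: start from df/df = 1 and push derivatives toward the inputs.
    df_dv = 2 * v
    df_du = df_dv * 1.0                # dv/du = 1
    df_dx = df_dv * 1.0 + df_du * y    # x reaches f through two paths, so the contributions add
    df_dy = df_du * x
    return f, df_dx, df_dy

Backpropagation is this same bookkeeping applied to the network's computation graph, with the cost as the single output.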
How long will it take to read Nielsen's book?
Thank you very much.
There cannot be a better tutorial for understanding backpropagation than this video. Thanks again for the effort you put into making us understand the complex mathematical theory behind it.
This video is breathtaking…
It took me some time to understand: to run backprop you need three things, the model, a scalar loss, and an input vector (a minimal sketch of that is below).
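To make that concrete, here is a minimal Python sketch of the last-layer derivatives from the video, for the one-neuron-per-layer chain with C = (a - y)^2 and a = sigmoid(w*a_prev + b). The function and variable names are my own, chosen only for this illustration:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def last_layer_gradients(w, b, a_prev, y):
    """All three ingredients: model parameters (w, b), input activation a_prev, scalar loss (a - y)**2."""
    z = w * a_prev + b
    a = sigmoid(z)
    dC_da = 2 * (a - y)              # how the cost reacts to the last activation
    da_dz = a * (1 - a)              # sigma'(z), written using a = sigmoid(z)
    dC_dw = a_prev * da_dz * dC_da   # dC/dw = a_prev * sigma'(z) * 2(a - y)
    dC_db = 1.0 * da_dz * dC_da      # dz/db = 1
    dC_da_prev = w * da_dz * dC_da   # the piece that gets propagated one layer further back
    return dC_dw, dC_db, dC_da_prev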
Thank you for such a simplified explanation. It truly helps:)
Thank you. For everything you do. This channel is a treasure
Where does it actually modify the weights? Next video?
At 5:31, why did you put the partial derivatives with respect to w and b in the same column? Shouldn't it be a two-column vector if you are accounting for 2 variables for all samples?
Beautiful animation; I would like to see a behind-the-scenes look at how this was done. As I understand it, this is specialized software you've built yourself for math animations?
Why is there an n_L-1 above the sigma at 7:36? For some reason I don't get this. In our example we have two j's, 0 and 1, or am I wrong?
I appreciate any form of help!
I, uh, what. I think I might need another year of math
Mathematics is the only true science; the rest is its applications!
ECE 449 UofA
It would be nice to see all these calculations done with real numbers, back to front. Maybe call it Chapter 5 🙂
This is it. Finally it clicked. After countless hours of trying to understand this, I finally somewhat understand neural networks
4:00
god bless you for these amazing explanations and animations.
Basically, like gradient descent in any other machine learning algorithm (linreg, logreg, etc.), it's just
param := param - derivative(cost function)
*note: := means update
It's just that in a NN the cost function is more massive and looks more complex.
Is that the most basic, fundamental concept? Did I get it right? (A minimal sketch of that update is below.)
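A minimal Python sketch of that update, assuming the gradients have already been computed by backprop. The LEARNING_RATE name and the 0.1 value are illustrative choices, not something fixed by the video; in practice the derivative is scaled by such a step size:

LEARNING_RATE = 0.1  # illustrative step size, not a value from the video

def gradient_descent_step(params, grads):
    """One update per parameter: param := param - learning_rate * dC/dparam."""
    return [p - LEARNING_RATE * g for p, g in zip(params, grads)]

# e.g. new_w, new_b = gradient_descent_step([w, b], [dC_dw, dC_db])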
Thank you so much! I have understood more math from this channel than from all teachers I have had in high school or university in total.
Brilliant and Excellent.
Great video! I just don't know if I'm missing something, because I thought we didn't use this cost function when working with neural networks. Since we activate every neuron with the sigmoid, the cost defined as (a - y)^2 gives us a non-convex cost function which has multiple local minima. So a better choice is the cost function from logistic regression, I think (the usual comparison is written out below).
I'm not sure, though, if I'm correct or if I'm really just missing something.
Amazing video anyway, you made me understand neural networks so much better with this series. I'm from Brazil and I'm taking machine learning classes at my university, and I wouldn't be able to understand backpropagation without you, thank you very much!
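For readers weighing that trade-off, the standard comparison is between the squared error used in the video and the cross-entropy cost from logistic regression (the video sticks with squared error, presumably because it keeps the chain-rule bookkeeping simple). Written out, and only as a reference sketch:

C_{\text{quad}} = (a - y)^{2}
C_{\text{CE}} = -\big[\, y \ln a + (1 - y) \ln(1 - a) \,\big]
\text{and with } a = \sigma(z): \quad \frac{\partial C_{\text{CE}}}{\partial z} = a - y

The sigma'(z) factor cancels in the cross-entropy case, which is the usual argument for it: the gradient does not vanish when the sigmoid saturates.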
Easy-to-digest video.
How easy it is to understand this through your lectures in just 10 minutes. THANK YOU.
Thanks
I feel that the video could have benefited a lot from going into the L-1 layer weights. I'm still a bit confused as to how they're computed in the backward pass.
Man, I really appreciate your fantastic work! Do you read a script when you record, or do you speak spontaneously? You are very good at it, and I am very bad. 😊 That's why I wanted to ask: maybe you should make a tutorial about that, too 😬
3:42 That's not really the chain rule, because it works differently with partial derivatives. In this case the other terms are zero, so you're left with just that formula, which happens to look the same as the one from single-variable calculus. (The general multivariable form is written out below.)
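For anyone following up on that point, the multivariable chain rule the comment alludes to sums over every intermediate variable through which w influences the cost (my paraphrase, not a quote from the video):

\frac{\partial C}{\partial w} = \sum_{j} \frac{\partial C}{\partial z_{j}} \, \frac{\partial z_{j}}{\partial w}

In the one-neuron-per-layer network, the weight w^(L) feeds into only the single z^(L), so the sum collapses to one term and the expression at 3:42,

\frac{\partial C}{\partial w^{(L)}} = \frac{\partial z^{(L)}}{\partial w^{(L)}} \, \frac{\partial a^{(L)}}{\partial z^{(L)}} \, \frac{\partial C}{\partial a^{(L)}},

ends up looking exactly like the single-variable chain rule.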
Funny to see the number of views drop from video to video.
Congrats every one!
I think I'm in love with you, amazing teacher!
One of the best videos I've watched. Well explained.
Thanks a lot, this helped me a lot in understanding what is going on with gradient descent. And sure, y'all, don't hesitate to watch this video multiple times to get it right.
From what I understand, finding the global minimum of the cost function is what we need; does gradient descent ensure that? I think it would end up in a local minimum.
POV: You look away for a few minutes in class.
Thank you so much for this video. This is the best explanation of backprop algo and neural networks.
How do you apply the nudges to the weights, though? You didn't explain that at all.
Great video, missing a few things though.
Ok, so I've calculated how sensitive each weight and bias are, but how much do I actually tweak them by?
0:38 Implemented in code
Is it a mistake on the last slide that del cost / del activation is a summation with terms from layer (L+1)? I thought it should go backwards and thus be (L-1)?
Edit: nvm, I see it now.
GREAT, I GOT IT THE FIRST TIME
One doesn't need synapses to say that an image of a 2 is NOT a 1, and not a 3, and not a 4, and not a 5, and so on, but one does need the synapses that tell you whether it can NOT be a 2.
Thanks a lot for creating such fantastic content! Looking forward to more videos about AI, ML, and deep learning!
This explanation is so much better than the rest of the internet. Big fan of the channel.
10k of your views on this video are going to be my chimp brain trying to understand backpropagation 🤣😂
This is truly awesome, as pedagogy and as math and as programming and as machine learning. Thank you! One comment about layers: key to your presentation is the one-neuron-per-layer setup with four layers, and key to the whole cost-to-weight / cost-to-bias analysis is your L (layer) and L-1 notation. The problem is that your rightmost neuron is the output layer, or "y" in your notation. So one cleanup in the description would be to make a decision: call the rightmost layer Y, the output (with no L value), since ∂C0/∂A[L] equals 2(A[L] - y). Then the rightmost three neurons, from right to left, would be Y (output), then L, then L minus one, and all the math works. Yes?