
What is backpropagation really doing? | Deep learning, chapter 3



3Blue1Brown

What’s actually happening to a neural network as it learns?
Next video: https://youtu.be/tIeHLnjs5U8
Brought to you by you: http://3b1b.co/nn3-thanks
And by CrowdFlower: http://3b1b.co/crowdflower
Home page: https://www.3blue1brown.com/

The following video is sort of an appendix to this one. The main goal of the follow-on video is to show the connection between the visual walkthrough here and the representation of these “nudges” in terms of partial derivatives that you will find when reading about backpropagation in other resources, like Michael Nielsen’s book or Chris Olah’s blog.


47 thoughts on “What is backpropagation really doing? | Deep learning, chapter 3”
  1. This is the best video on machine learning I have seen so far. Thank you for making this complex topic as simple as possible.

  2. So backpropagation only works when you have a very clearly defined desired output for a given trial, right? Since that is required to come up with the cost function that is at the basis of it all. But you don't always have this clear definition of what you want from a neural network, I think, or at least not in a way that is easy to give to the network as feedback at each time step. For example, I coded a very simple feedforward neural network with a single hidden layer to make decisions for simulated creatures in a simulated ecosystem. There's no clear 'right' move for any creature at any point, but they behave, and based on the rules of the simulation they compete for resources, reproduce (with a mutation to the weights and biases) if they succeed and die if they fail.

    This led me to wonder which sorts of problems would have a clear cost function associated with them that would allow for backpropagation, and which ones would require another approach, such as the evolutionary approach that I described above. (Or yet other approaches, perhaps? And then what might those be?) If anyone has any thoughts on this, I would be happy to read them. [A short sketch contrasting the two kinds of training signal appears after the comments.]

  3. But when you flatten (or squish?) these "rows" into a single line, you are losing information. If a one, for example, is moved sideways, our best hope is that the NNet just remembers the most common translations. It is sort of taking the pixels out of context. [A small example of this appears after the comments.]

  4. I don't get this. I'm stumbling around and can't visualize it. I thought I was the smart one; I'm usually the one my classmates go to for help. My world is crumbling around me; I'm a fraud.

  5. Really nicely done. I would love to see a video on the application of backprop to the weights, and a step-by-step walkthrough with a small real example. This series has been very helpful!!!

  7. 1. The (activation) value of a neuron should be between 0 and 1, right? ReLU has a leaky minimum around 0; shouldn't ReLU also have a (leaky) maximum around 1?

    2. Is there one best activation function, one that delivers the best neural network with the least effort, e.g., the fewest tests needed and the least computing power?

    3. Should weights and biases be between 0 and 1 or between -1 and 1? Or any different values?

    4. Against vanishing and exploding gradients: could these be prevented with a (leaky) correction minimum and maximum for the weights and biases? There would then be some symmetry with the activation function suggested in point 1. [A sketch of the activation functions mentioned here appears after the comments.]

  8. I am a neuroscientist, and all your comparisons of actual neural networks to artificial neural networks seem pretty spot on to me. This is a great video series.

  9. I might misunderstand, but maybe not. At 7:38, we want to reduce the activation of the neuron responsible for 3, right? So we should decrease the activation of the neurons that have a positive weight to 3, no? In the video, it's actually the opposite. For example, the first neuron has a positive weight to the neuron responsible for 3, so we should decrease its activation, right?
    Can someone help me with that, please?

  10. Suppose that we have a neural network with one input layer, one output layer, and one hidden layer. Let's refer to the weights from input to hidden as w and the weights from hidden to output as v. Suppose that we have calculated v via backpropagation. When finding the weights for w, do we keep the weights v constant while updating w, or do we allow v to update along with w? [A two-layer sketch addressing this appears after the comments.]

  11. 5:09 How do you decide how to balance your three different avenues? Do you put the emphasis on the biases, then the weights of the current layer, and give a low priority to the activations of the previous layer? That would mean that the weights of the first layer hardly change. Do you put the highest priority on the weights of the current layer? Do you emphasize the activations of the previous layer? How do you decide?
    Also, you talk about changing the weights and activations, but you don't talk about how to find the changes to the biases. [The two-layer sketch after the comments also touches on this.]

  12. Thanks, this is super helpful. I studied back prop in grad school and after rewinding a few times, I think I understand now. I'm ready to tackle the math now. Really great explanation.

  13. Sorry for the deep and detailed analysis, but shouldn't the colors of the arrows (related to the amount of desired change) appearing at 7:37 be the opposite color (for the digit outputs 0, 1, 3, …, 9, i.e., the last layer)?

  14. Help, I have a question. Is it true that the gradient gets computed for a single training example and that the average of the cost function is only for measurement? [A short sketch on per-example versus averaged gradients appears after the comments.]

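A few of the questions above lend themselves to small code sketches. These are editorial illustrations rather than anything from the video, and every function name and number in them is invented for the example.

For comment 2: backpropagation needs a per-example target to define the cost it descends, whereas the evolutionary setup described there only needs a scalar fitness score. A minimal sketch of that contrast, assuming a single sigmoid "neuron" and a made-up fitness function:

```python
# Editorial sketch (hypothetical): a supervised/backprop-style update needs a
# known target y for each example, while an evolutionary-style update only
# needs a black-box fitness score. Names and numbers are made up.
import numpy as np

rng = np.random.default_rng(0)

def forward(w, x):
    # One linear "neuron" with a sigmoid squish, to keep the sketch tiny.
    return 1.0 / (1.0 + np.exp(-(w @ x)))

def supervised_step(w, x, y, lr=0.1):
    # Gradient of the squared-error cost (a - y)^2 with respect to w.
    a = forward(w, x)
    grad = 2.0 * (a - y) * a * (1.0 - a) * x
    return w - lr * grad

def evolutionary_step(w, fitness, sigma=0.05, n_offspring=20):
    # Mutate the weights and keep the mutant with the best fitness score;
    # no per-step "correct answer" is ever needed.
    candidates = [w + sigma * rng.standard_normal(w.shape) for _ in range(n_offspring)]
    return max(candidates, key=fitness)

x = rng.standard_normal(4)
w = rng.standard_normal(4)
w_sup = supervised_step(w, x, y=1.0)                   # needs the label y
w_evo = evolutionary_step(w, lambda v: forward(v, x))  # only needs a score
print(w_sup, w_evo)
```
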
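For comment 3: flattening a 2-D image into one long vector keeps every pixel value but discards which pixels were neighbours, so a slightly shifted digit produces an almost unrelated input vector. A toy illustration with an 8×8 image (the stroke and sizes are made up):

```python
# Editorial sketch (hypothetical): flattening keeps the pixel values but not
# the spatial layout, so a shifted stroke barely overlaps the original.
import numpy as np

img = np.zeros((8, 8))
img[1:7, 3] = 1.0                  # a toy vertical stroke ("1") in column 3
shifted = np.roll(img, 2, axis=1)  # the same stroke moved two pixels right

v1 = img.reshape(-1)               # flattened 64-dimensional input vector
v2 = shifted.reshape(-1)

# The two vectors share no "on" pixels even though the shape is identical,
# which is why a plain fully connected net has to learn each translation.
print(np.dot(v1, v2))              # 0.0
```
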
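For comment 7: the sigmoid used in the video squashes activations into (0, 1), while ReLU and leaky ReLU are unbounded above, so ReLU activations are not forced to stay below 1. Weights and biases are normally not restricted to any fixed range at all, only initialised small. A quick sketch of those functions (the initialisation recipe shown is one common choice, not necessarily the video's):

```python
# Editorial sketch (hypothetical): output ranges of common activation
# functions, plus one common (not canonical) initialisation for weights/biases.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # always strictly between 0 and 1

def relu(z):
    return np.maximum(0.0, z)             # 0 below zero, unbounded above

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small slope below 0, still unbounded above

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(sigmoid(z))      # roughly [0.05 0.38 0.5  0.62 0.95]
print(relu(z))         # [0.  0.  0.  0.5 3. ]
print(leaky_relu(z))   # [-0.03  -0.005  0.  0.5  3. ]

# Weights and biases are usually not clamped to [0, 1] or [-1, 1]; a typical
# recipe just draws small random weights (scaled by 1/sqrt(fan_in)) and
# starts the biases at zero.
fan_in = 784
w = np.random.randn(16, fan_in) / np.sqrt(fan_in)
b = np.zeros(16)
```
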
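For comments 10 and 11: a single backward pass produces partial derivatives for all parameters at once (the earlier weights, the later weights, and both bias vectors), and one gradient-descent step updates them all simultaneously; nothing is held constant, and the relative "balance" between them is simply whatever the partial derivatives dictate. The activations of the previous layer are never changed directly, only indirectly through the earlier weights and biases. A minimal two-layer sketch with toy sizes, sigmoids, and a squared-error cost, all invented for illustration:

```python
# Editorial sketch (hypothetical sizes and data): one backward pass yields
# gradients for every parameter, and one step updates them all together.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
w, b1 = 0.5 * rng.standard_normal((5, 3)), np.zeros(5)   # input -> hidden
v, b2 = 0.5 * rng.standard_normal((2, 5)), np.zeros(2)   # hidden -> output

x = rng.standard_normal(3)       # one training example
y = np.array([1.0, 0.0])         # its desired output

# Forward pass
h = sigmoid(w @ x + b1)
out = sigmoid(v @ h + b2)

# Backward pass: chain rule applied layer by layer, cost = sum((out - y)^2)
d_out = 2.0 * (out - y) * out * (1.0 - out)   # dC/d(output pre-activation)
grad_v, grad_b2 = np.outer(d_out, h), d_out
d_h = (v.T @ d_out) * h * (1.0 - h)           # dC/d(hidden pre-activation)
grad_w, grad_b1 = np.outer(d_h, x), d_h

# One gradient-descent step updates v, w, and both biases at the same time;
# the previous layer's activations are only influenced through w and b1.
lr = 0.5
v -= lr * grad_v
b2 -= lr * grad_b2
w -= lr * grad_w
b1 -= lr * grad_b1
```
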
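For comment 14: in practice a gradient is computed per training example (or per mini-batch), and the gradient of the average cost over a batch is exactly the average of the per-example gradients, so descending the average cost and averaging per-example gradients describe the same step. A tiny check of that identity on a made-up linear model:

```python
# Editorial sketch (hypothetical): the gradient of the average cost over a
# batch equals the average of the per-example gradients.
import numpy as np

rng = np.random.default_rng(2)
w = rng.standard_normal(3)
X = rng.standard_normal((8, 3))   # a mini-batch of 8 made-up examples
y = rng.standard_normal(8)

def per_example_grad(w, x, t):
    # Gradient of the single-example cost (w . x - t)^2 with respect to w.
    return 2.0 * (w @ x - t) * x

# Average the per-example gradients ...
g_avg = np.mean([per_example_grad(w, x, t) for x, t in zip(X, y)], axis=0)

# ... and compare with the gradient of the average cost (1/N) * sum_i (w . x_i - t_i)^2.
g_batch = 2.0 * X.T @ (X @ w - y) / len(X)
print(np.allclose(g_avg, g_batch))   # True
```
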
