
What is backpropagation really doing? | Deep learning, chapter 3



3Blue1Brown

What’s actually happening to a neural network as it learns?
Next video: https://youtu.be/tIeHLnjs5U8
Brought to you by you: http://3b1b.co/nn3-thanks
And by CrowdFlower: http://3b1b.co/crowdflower
Home page: https://www.3blue1brown.com/

The following video is sort of an appendix to this one. The main goal of the follow-on video is to show the connection between the visual walkthrough here and the representation of these “nudges” in terms of partial derivatives that you will find when reading about backpropagation in other resources, like Michael Nielsen’s book or Chris Olah’s blog.


47 thoughts on “What is backpropagation really doing? | Deep learning, chapter 3”
  1. This is the best video on machine learning I have seen so far. Thank you for making this complex topic as simple as possible.

  2. So backpropagation only works when you have a very clearly defined desired output for a given trial, right? Since that is required to come up with the cost function that is at the basis of it all. But you don't always have this clear definition of what you want from a neural network, I think, or at least not in a way that is easy to give to the network as feedback at each time step. For example, I coded a very simple feedforward neural network with a single hidden layer to make decisions for simulated creatures in a simulated ecosystem. There's no clear 'right' move for any creature at any point, but they behave, and based on the rules of the simulation they compete for resources, reproduce (with a mutation to the weights and biases) if they succeed and die if they fail.

    This led me to wonder which sorts of problems would have a clear cost function associated with them that would allow for backpropagation, and which ones would require another approach, such as the evolutionary approach that I described above. (Or yet other approaches, perhaps? And then what might those be?) If anyone has any thoughts on this, I would be happy to read them. [A short sketch contrasting the two kinds of training signal appears after the comments.]

  3. But when you flatten (or squish?) these "rows" into a single line, you are losing information. If a one, for example, is moved sideways, our best hope is that the NNet just remembers the most common translations. It is sort of taking the pixels out of context. [A small example of this appears after the comments.]

  4. I don't get this. I'm stumbling around and can't visualize it. I thought I was the smart one; I'm usually the one my classmates go to for help. My world is crumbling around me; I'm a fraud.

  5. Really nicely done. I would love to see a video on the application of backprop to the weights, and a step-by-step walkthrough with a small real example. This series has been very helpful!!!

  7. 1. The (activation) value of a neuron should be between 0 and 1, right? ReLU has a leaky minimum around 0; shouldn't ReLU also have a (leaky) maximum around 1?

    2. Is there one best activation function, one that delivers the best neural network with the least effort, e.g., the fewest tests needed and the least computing power?

    3. Should weights and biases be between 0 and 1 or between -1 and 1? Or any different values?

    4. Against vanishing and exploding gradients: could these be prevented with a (leaky) correction minimum and maximum for the weights and biases? There would then be some symmetry with the activation function suggested in point 1. [A sketch of the activation functions mentioned here appears after the comments.]

  8. I am a neuroscientist, and all your comparisons of actual neural networks to artificial neural networks seem pretty spot on to me. This is a great video series.

  9. I might misunderstand, but maybe not. At 7:38, we want to reduce the activation of the neuron responsible for 3, right? So we should decrease the activation of the neurons that have a positive weight to 3, no? In the video, it's actually the opposite. For example, the first neuron has a positive weight to the neuron responsible for 3, so we should decrease its activation, right?
    Can someone help me with that, please?

  10. Suppose that we have a neural network with one input layer, one output layer, and one hidden layer. Let's refer to the weights from input to hidden as w and the weights from hidden to output as v. Suppose that we have calculated v via backpropagation. When finding the weights for w, do we keep the weights v constant while updating w, or do we allow v to update along with w? [A two-layer sketch addressing this appears after the comments.]

  11. 5:09 How do you decide how to balance your three different avenues? Do you put the emphasis on the biases, then the weights of the current layer, and give a low priority to the activations of the previous layer? That would mean that the weights of the first layer hardly change. Do you put the highest priority on the weights of the current layer? Do you emphasize the activations of the previous layer? How do you decide?
    Also, you talk about changing the weights and activations, but you don't talk about how to find the changes to the biases. [The two-layer sketch after the comments also touches on this.]

  12. Thanks, this is super helpful. I studied back prop in grad school and after rewinding a few times, I think I understand now. I'm ready to tackle the math now. Really great explanation.

  13. Sorry for the deep and detailed analysis, but shouldn't the colors of the arrows (related to the amount of desired change) appearing at 7:37 be the opposite color (for the digit outputs 0, 1, 3, …, 9, i.e., the last layer)?

  14. Help, I have a question. Is it true that the gradient gets computed for a single training example and that the average of the cost function is only for measurement? [A short sketch on per-example versus averaged gradients appears after the comments.]

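A few of the questions above lend themselves to small code sketches. These are editorial illustrations rather than anything from the video, and every function name and number in them is invented for the example.

For comment 2: backpropagation needs a per-example target to define the cost it descends, whereas the evolutionary setup described there only needs a scalar fitness score. A minimal sketch of that contrast, assuming a single sigmoid "neuron" and a made-up fitness function:

```python
# Editorial sketch (hypothetical): a supervised/backprop-style update needs a
# known target y for each example, while an evolutionary-style update only
# needs a black-box fitness score. Names and numbers are made up.
import numpy as np

rng = np.random.default_rng(0)

def forward(w, x):
    # One linear "neuron" with a sigmoid squish, to keep the sketch tiny.
    return 1.0 / (1.0 + np.exp(-(w @ x)))

def supervised_step(w, x, y, lr=0.1):
    # Gradient of the squared-error cost (a - y)^2 with respect to w.
    a = forward(w, x)
    grad = 2.0 * (a - y) * a * (1.0 - a) * x
    return w - lr * grad

def evolutionary_step(w, fitness, sigma=0.05, n_offspring=20):
    # Mutate the weights and keep the mutant with the best fitness score;
    # no per-step "correct answer" is ever needed.
    candidates = [w + sigma * rng.standard_normal(w.shape) for _ in range(n_offspring)]
    return max(candidates, key=fitness)

x = rng.standard_normal(4)
w = rng.standard_normal(4)
w_sup = supervised_step(w, x, y=1.0)                   # needs the label y
w_evo = evolutionary_step(w, lambda v: forward(v, x))  # only needs a score
print(w_sup, w_evo)
```
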
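For comment 3: flattening a 2-D image into one long vector keeps every pixel value but discards which pixels were neighbours, so a slightly shifted digit produces an almost unrelated input vector. A toy illustration with an 8×8 image (the stroke and sizes are made up):

```python
# Editorial sketch (hypothetical): flattening keeps the pixel values but not
# the spatial layout, so a shifted stroke barely overlaps the original.
import numpy as np

img = np.zeros((8, 8))
img[1:7, 3] = 1.0                  # a toy vertical stroke ("1") in column 3
shifted = np.roll(img, 2, axis=1)  # the same stroke moved two pixels right

v1 = img.reshape(-1)               # flattened 64-dimensional input vector
v2 = shifted.reshape(-1)

# The two vectors share no "on" pixels even though the shape is identical,
# which is why a plain fully connected net has to learn each translation.
print(np.dot(v1, v2))              # 0.0
```
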
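For comment 7: the sigmoid used in the video squashes activations into (0, 1), while ReLU and leaky ReLU are unbounded above, so ReLU activations are not forced to stay below 1. Weights and biases are normally not restricted to any fixed range at all, only initialised small. A quick sketch of those functions (the initialisation recipe shown is one common choice, not necessarily the video's):

```python
# Editorial sketch (hypothetical): output ranges of common activation
# functions, plus one common (not canonical) initialisation for weights/biases.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # always strictly between 0 and 1

def relu(z):
    return np.maximum(0.0, z)             # 0 below zero, unbounded above

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small slope below 0, still unbounded above

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(sigmoid(z))      # roughly [0.05 0.38 0.5  0.62 0.95]
print(relu(z))         # [0.  0.  0.  0.5 3. ]
print(leaky_relu(z))   # [-0.03  -0.005  0.  0.5  3. ]

# Weights and biases are usually not clamped to [0, 1] or [-1, 1]; a typical
# recipe just draws small random weights (scaled by 1/sqrt(fan_in)) and
# starts the biases at zero.
fan_in = 784
w = np.random.randn(16, fan_in) / np.sqrt(fan_in)
b = np.zeros(16)
```
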
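For comments 10 and 11: a single backward pass produces partial derivatives for all parameters at once (the earlier weights, the later weights, and both bias vectors), and one gradient-descent step updates them all simultaneously; nothing is held constant, and the relative "balance" between them is simply whatever the partial derivatives dictate. The activations of the previous layer are never changed directly, only indirectly through the earlier weights and biases. A minimal two-layer sketch with toy sizes, sigmoids, and a squared-error cost, all invented for illustration:

```python
# Editorial sketch (hypothetical sizes and data): one backward pass yields
# gradients for every parameter, and one step updates them all together.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
w, b1 = 0.5 * rng.standard_normal((5, 3)), np.zeros(5)   # input -> hidden
v, b2 = 0.5 * rng.standard_normal((2, 5)), np.zeros(2)   # hidden -> output

x = rng.standard_normal(3)       # one training example
y = np.array([1.0, 0.0])         # its desired output

# Forward pass
h = sigmoid(w @ x + b1)
out = sigmoid(v @ h + b2)

# Backward pass: chain rule applied layer by layer, cost = sum((out - y)^2)
d_out = 2.0 * (out - y) * out * (1.0 - out)   # dC/d(output pre-activation)
grad_v, grad_b2 = np.outer(d_out, h), d_out
d_h = (v.T @ d_out) * h * (1.0 - h)           # dC/d(hidden pre-activation)
grad_w, grad_b1 = np.outer(d_h, x), d_h

# One gradient-descent step updates v, w, and both biases at the same time;
# the previous layer's activations are only influenced through w and b1.
lr = 0.5
v -= lr * grad_v
b2 -= lr * grad_b2
w -= lr * grad_w
b1 -= lr * grad_b1
```
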
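For comment 14: in practice a gradient is computed per training example (or per mini-batch), and the gradient of the average cost over a batch is exactly the average of the per-example gradients, so descending the average cost and averaging per-example gradients describe the same step. A tiny check of that identity on a made-up linear model:

```python
# Editorial sketch (hypothetical): the gradient of the average cost over a
# batch equals the average of the per-example gradients.
import numpy as np

rng = np.random.default_rng(2)
w = rng.standard_normal(3)
X = rng.standard_normal((8, 3))   # a mini-batch of 8 made-up examples
y = rng.standard_normal(8)

def per_example_grad(w, x, t):
    # Gradient of the single-example cost (w . x - t)^2 with respect to w.
    return 2.0 * (w @ x - t) * x

# Average the per-example gradients ...
g_avg = np.mean([per_example_grad(w, x, t) for x, t in zip(X, y)], axis=0)

# ... and compare with the gradient of the average cost (1/N) * sum_i (w . x_i - t_i)^2.
g_batch = 2.0 * X.T @ (X @ w - y) / len(X)
print(np.allclose(g_avg, g_batch))   # True
```
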
