
Reinforcement Learning Course – Full Machine Learning Tutorial



freeCodeCamp.org

Reinforcement learning is an area of machine learning concerned with taking the right actions to maximize reward in a particular situation. In this full tutorial course, you will get a solid foundation in the core topics of reinforcement learning.

The course covers Q learning, SARSA, double Q learning, deep Q learning, and policy gradient methods. These algorithms are applied in a number of environments from the OpenAI Gym, including Space Invaders, Breakout, and others. The deep learning portion uses TensorFlow and PyTorch.

The course begins with more modern algorithms, such as deep Q learning and policy gradient methods, to demonstrate the power of reinforcement learning.

The course then teaches the fundamental concepts that power all reinforcement learning algorithms. These are illustrated by coding up algorithms that predate deep learning but are still foundational to the cutting edge, studied in some of the more traditional environments from the OpenAI Gym, like the cart pole problem.
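
All of these algorithms sit on top of the same agent-environment loop that Gym exposes. As a rough sketch (assuming the classic Gym API used at the time of the course, where step returns four values, and with a random policy standing in for whatever the learning algorithm would choose):

    import gym

    # Classic Gym interaction loop; the random action is a stand-in for the
    # policy that the course's algorithms learn.
    env = gym.make('CartPole-v1')
    for episode in range(5):
        observation = env.reset()
        done, score = False, 0.0
        while not done:
            action = env.action_space.sample()                    # agent picks an action
            observation, reward, done, info = env.step(action)    # environment responds
            score += reward
        print('episode', episode, 'score', score)
    env.close()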

💻Code: https://github.com/philtabor/Youtube-Code-Repository/tree/master/ReinforcementLearning

⭐️ Course Contents ⭐️
⌨️ (00:00:00) Intro
⌨️ (00:01:30) Intro to Deep Q Learning
⌨️ (00:08:56) How to Code Deep Q Learning in Tensorflow
⌨️ (00:52:03) Deep Q Learning with Pytorch Part 1: The Q Network
⌨️ (01:06:21) Deep Q Learning with Pytorch part 2: Coding the Agent
⌨️ (01:28:54) Deep Q Learning with Pytorch part 3: Coding the main loop
⌨️ (01:46:39) Intro to Policy Gradients
⌨️ (01:55:01) How to Beat Lunar Lander with Policy Gradients
⌨️ (02:21:32) How to Beat Space Invaders with Policy Gradients
⌨️ (02:34:41) How to Create Your Own Reinforcement Learning Environment Part 1
⌨️ (02:55:39) How to Create Your Own Reinforcement Learning Environment Part 2
⌨️ (03:08:20) Fundamentals of Reinforcement Learning
⌨️ (03:17:09) Markov Decision Processes
⌨️ (03:23:02) The Explore Exploit Dilemma
⌨️ (03:29:19) Reinforcement Learning in the Open AI Gym: SARSA
⌨️ (03:39:56) Reinforcement Learning in the Open AI Gym: Double Q Learning
⌨️ (03:54:07) Conclusion

Course from Machine Learning with Phil. Check out his YouTube channel: https://www.youtube.com/channel/UC58v9cLitc8VaCjrcKyAbrw

Learn to code for free and get a developer job: https://www.freecodecamp.org

Read hundreds of articles on programming: https://medium.freecodecamp.org

And subscribe for new videos on technology every day: https://youtube.com/subscription_center?add_user=freecodecamp



20 thoughts on “Reinforcement Learning Course – Full Machine Learning Tutorial”
  1. Here are some time stamps folks!

    Intro 00:00:00
    Intro to Deep Q Learning 00:01:30
    How to Code Deep Q Learning in Tensorflow 00:08:56
    Deep Q Learning with Pytorch Part 1: The Q Network 00:52:03
    Deep Q Learning with Pytorch part 2: Coding the Agent 01:06:21
    Deep Q Learning with Pytorch part 3: Coding the main loop 01:28:54
    Intro to Policy Gradients 01:46:39
    How to Beat Lunar Lander with Policy Gradients 01:55:01
    How to Beat Space Invaders with Policy Gradients 02:21:32
    How to Create Your Own Reinforcement Learning Environment Part 1 02:34:41
    How to Create Your Own Reinforcement Learning Environment Part 2 02:55:39
    Fundamentals of Reinforcement Learning 03:08:20
    Markov Decision Processes 03:17:09
    The Explore Exploit Dilemma 03:23:02
    Reinforcement Learning in the Open AI Gym: SARSA 03:29:19
    Reinforcement Learning in the Open AI Gym: Double Q Learning 03:39:56
    Conclusion 03:54:07

  2. self.q = tf.reduce_sum(tf.multiply(self.Q_values, self.actions))
    Why are you doing this? I fail to understand the meaning of this line?? Thank you in advance 🙂
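
    If self.actions holds one-hot encoded actions, that multiply-and-sum is just a way of picking out Q(s, a) for the action that was actually taken, so the loss only penalizes the estimate for the chosen action. A tiny NumPy illustration of the same masking idea (the batch values are made up):

    import numpy as np

    # Hypothetical batch: 2 states, 3 actions
    q_values = np.array([[1.0, 2.0, 3.0],
                         [4.0, 5.0, 6.0]])
    actions  = np.array([[0., 1., 0.],   # action 1 taken in the first state
                         [1., 0., 0.]])  # action 0 taken in the second state

    # The element-wise product zeroes every Q-value except the taken action's;
    # summing over the action axis then leaves Q(s, a_taken) for each sample.
    q_taken = np.sum(q_values * actions, axis=1)
    print(q_taken)   # [2. 4.]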

  3. I am currently creating an agent-based model that will generate x number of agents. Each agent has a step function. I would LOVE to incorporate this reinforcement learning method into the model. How would you adjust it from taking a visual frame, like from a game, to using only the global environment variables? Is it as simple as swapping out one for the other?
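
    In general the switch is mostly about the observation: instead of a Box space of pixels fed into convolutional layers, you expose the global variables as a flat Box vector and feed it to dense layers. A minimal sketch of such an environment, assuming the classic Gym API (the class name, variable count, dynamics, and reward below are all made-up placeholders):

    import numpy as np
    import gym
    from gym import spaces

    class GlobalVarsEnv(gym.Env):
        # Hypothetical environment whose observation is a vector of global
        # model variables rather than an image frame.
        def __init__(self, n_vars=4, n_actions=3):
            super().__init__()
            self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                                shape=(n_vars,), dtype=np.float32)
            self.action_space = spaces.Discrete(n_actions)
            self.state = np.zeros(n_vars, dtype=np.float32)

        def reset(self):
            self.state = np.zeros_like(self.state)
            return self.state

        def step(self, action):
            # Placeholder dynamics: advance the agent-based model one step
            # and derive a reward from the resulting global variables.
            self.state = (self.state + 0.1 * np.random.randn(self.state.size)).astype(np.float32)
            reward = float(-np.abs(self.state).sum())
            done = False
            return self.state, reward, done, {}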

  4. Hello Phil, I think there is another mistake in the code: in the learn function it should be reward_batch + gamma*np.max(Q_next, axis=1)*(1 - terminal_batch) instead of just terminal_batch, since we are passing int(done) as a stored observation. So for done=False, int(done)=0 and vice versa, and if the episode does not end (done is False) we need to add the next Q value; otherwise we only add the reward. What do you think? Am I correct?
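
    That reading matches the usual convention: if the buffer stores int(done), the bootstrap term has to be multiplied by (1 - terminal) so that terminal transitions train toward the bare reward. A small NumPy sketch of that target computation, with made-up batch values:

    import numpy as np

    gamma = 0.99
    reward_batch   = np.array([ 1.0, 0.0, -1.0])
    Q_next         = np.array([[0.5, 2.0],
                               [1.0, 0.2],
                               [3.0, 0.1]])
    terminal_batch = np.array([0, 0, 1])   # int(done); 1 means the episode ended

    # (1 - terminal) zeroes the bootstrapped value for terminal states,
    # so the target there is just the final reward.
    q_target = reward_batch + gamma * np.max(Q_next, axis=1) * (1 - terminal_batch)
    print(q_target)   # [ 2.98  0.99 -1.  ]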

  5. The length of the flattened output layer can actually be calculated by tracing the data through the network starting from the first conv layer. Just use the formula:

    ((dimension length - kernel size for that dimension + 2*padding)/stride) + 1 = output length for that dimension

    Do this for each dimension of each conv layer and multiply by the number of output channels at the end to find the length of the flat dimension, as such:

    1st conv layer: ((185 - 8 + 2*1)/4) + 1 = 45 (actually 45.75, but you always round down, since there are no 0.75 pixels)
    ((95 - 8 + 2*1)/4) + 1 = 23 (rounded down from 23.25)

    2nd conv: ((45 - 4 + 2*0)/2) + 1 = 21 (rounded down from 21.5)
    ((23 - 4 + 2*0)/2) + 1 = 10 (rounded down from 10.5)

    3rd conv: ((21 - 3 + 2*0)/1) + 1 = 19
    ((10 - 3 + 2*0)/1) + 1 = 8

    This means the 3rd layer outputs 128 feature maps, each of dimensions 19*8, so flattening them gives a single dimension of 128*19*8 elements.
    Just a neat little trick for those who want it.
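
    The same arithmetic as a small helper, so you can plug in your own layer parameters (the 185x95 input and the kernel/stride/padding values below are taken from the calculation above):

    def conv_out(length, kernel, stride, padding):
        # Standard conv output size: round the division down, then add one.
        return (length - kernel + 2 * padding) // stride + 1

    h, w = 185, 95
    h, w = conv_out(h, 8, 4, 1), conv_out(w, 8, 4, 1)   # 1st conv -> 45 x 23
    h, w = conv_out(h, 4, 2, 0), conv_out(w, 4, 2, 0)   # 2nd conv -> 21 x 10
    h, w = conv_out(h, 3, 1, 0), conv_out(w, 3, 1, 0)   # 3rd conv -> 19 x 8
    print(128 * h * w)   # flattened length: 19456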

  6. This is a great video if you already understand the topic and the code, and just want a guy reading out what he's typing, kinda explaining bits and pieces here and there.

