
Train Q-learning Agent with Python – Reinforcement Learning Code Project



deeplizard

Welcome back to this series on reinforcement learning! As promised, in this video, we’re going to write the code to implement our first reinforcement learning algorithm. Specifically, we’ll use Python to implement the Q-learning algorithm to train an agent to play OpenAI Gym’s Frozen Lake game that we introduced in the previous video. Let’s get to it!
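
For reference while following along, here is a minimal sketch of the kind of training loop the video builds, assuming the classic gym API in which env.step() returns a four-value tuple. The hyperparameter names and values below are illustrative rather than the video’s exact settings:

    import random
    import numpy as np
    import gym

    env = gym.make("FrozenLake-v0")
    q_table = np.zeros((env.observation_space.n, env.action_space.n))

    num_episodes = 10000
    max_steps_per_episode = 100
    learning_rate = 0.1
    discount_rate = 0.99
    exploration_rate = 1.0
    min_exploration_rate = 0.01
    exploration_decay_rate = 0.001

    for episode in range(num_episodes):
        state = env.reset()
        for step in range(max_steps_per_episode):
            # Exploration-exploitation trade-off: exploit the table
            # greedily, or sample a random action to explore.
            if random.uniform(0, 1) > exploration_rate:
                action = np.argmax(q_table[state, :])
            else:
                action = env.action_space.sample()

            new_state, reward, done, info = env.step(action)

            # Q-learning update: blend the old estimate with the
            # one-step bootstrapped target r + gamma * max_a' Q(s', a').
            q_table[state, action] = (1 - learning_rate) * q_table[state, action] + \
                learning_rate * (reward + discount_rate * np.max(q_table[new_state, :]))

            state = new_state
            if done:
                break

        # Decay the exploration rate exponentially toward its floor.
        exploration_rate = min_exploration_rate + \
            (1.0 - min_exploration_rate) * np.exp(-exploration_decay_rate * episode)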

OpenAI Gym:
https://gym.openai.com/docs/

TED Talk: https://youtu.be/uawLjkSI7Mo

💥🦎 DEEPLIZARD COMMUNITY RESOURCES 🦎💥

👋 Hey, we’re Chris and Mandy, the creators of deeplizard!
👀 CHECK OUT OUR VLOG:
🔗 https://www.youtube.com/channel/UC9cBIteC3u7Ee6bzeOcl_Og

👉 Check out the blog post and other resources for this video:
🔗 https://deeplizard.com/learn/video/HGeI30uATws

💻 DOWNLOAD ACCESS TO CODE FILES
🤖 Available for members of the deeplizard hivemind:
🔗 https://www.patreon.com/posts/27743395

🧠 Support collective intelligence, join the deeplizard hivemind:
🔗 https://deeplizard.com/hivemind

🤜 Support collective intelligence, create a quiz question for this video:
🔗 https://deeplizard.com/create-quiz-question

🚀 Boost collective intelligence by sharing this video on social media!

❤️🦎 Special thanks to the following polymaths of the deeplizard hivemind:
Prash

👀 Follow deeplizard:
Our vlog: https://www.youtube.com/channel/UC9cBIteC3u7Ee6bzeOcl_Og
Twitter: https://twitter.com/deeplizard
Facebook: https://www.facebook.com/Deeplizard-145413762948316
Patreon: https://www.patreon.com/deeplizard
YouTube: https://www.youtube.com/deeplizard
Instagram: https://www.instagram.com/deeplizard/

🎓 Deep Learning with deeplizard:
Fundamental Concepts – https://deeplizard.com/learn/video/gZmobeGL0Yg
Beginner Code – https://deeplizard.com/learn/video/RznKVRTFkBY
Advanced Code – https://deeplizard.com/learn/video/v5cngxo4mIg
Advanced Deep RL – https://deeplizard.com/learn/video/nyjbcRQ-uQ8

🎓 Other Courses:
Data Science – https://deeplizard.com/learn/video/d11chG7Z-xk
Trading – https://deeplizard.com/learn/video/ZpfCK_uHL9Y

🛒 Check out products deeplizard recommends on Amazon:
🔗 https://www.amazon.com/shop/deeplizard

📕 Get a FREE 30-day Audible trial and 2 FREE audio books using deeplizard’s link:
🔗 https://amzn.to/2yoqWRn

🎵 deeplizard uses music by Kevin MacLeod
🔗 https://www.youtube.com/channel/UCSZXFhRIx6b0dFX3xS8L1yQ
🔗 http://incompetech.com/

❤️ Please use the knowledge gained from deeplizard content for good, not evil.


30 thoughts on “Train Q-learning Agent with Python – Reinforcement Learning Code Project”
  1. What I don't get is how you can have an expected reward for anything unless you're at the second-to-last state before game over. Nothing else gives a reward! So how could most of those rows for each state have anything in them at all?

  2. This tutorial is incredibly clear and is the best tutorial on RL that I have found on the internet. I have learned a lot. Thanks a lot for the effort in creating this and sharing the knowledge.

  3. Why are there large Q-values for the left and up actions for the starting state in the first row of q_table, when we know the agent can only move right or down?

    Also, why the large value for left in the 5th row, [F]HFH?

    By the way, you are the best teacher EVER!!!! Really, thank you for these lectures.

  4. Thanks for the amazing work. I have a question: for a state like "S", the initial state for the agent, there are only 2 action choices, moving right or moving down. The same also applies to states found on the edges (3 action choices) and corners (2 action choices). Should the Q-values of those state-action pairs be zero? I just cannot locate them in the final Q-table. I know there are 23 zeros in the final Q-table. Are those for state-action pairs that lead to "H", and for the cases mentioned above when only 2 or 3 action choices are available?

  5. So I read about a term called "return from the reward". I have two questions:

    1) Is it the same thing as the Q-function?
    2) To calculate the return, we need the rewards of FUTURE actions. How do we get those?

  6. Dear deeplizard-team,
    first of all, I want to congratulate you for having put together an amazing Reinforcement Learning playlist. Your theoretical explanations are very precise, you walk us very smoothly through the source code, and your videos are both entertaining and rich in great content! Watching your videos, you can tell that you are very passionate about AI and about sharing your knowledge with fellow tech enthusiasts. Thank you for your hard work and dedication.

    I have one question though: where exactly do we specify the positive or negative rewards in the Python code? I followed your explanation and understood that the agent bases its decision for each state-action pair on the Q-table values, depending on the exploration-exploitation trade-off. But where exactly in the source code do we actually tell the agent that by stepping onto a field with the letter H it will receive minus 1 points, and that landing on F-letter fields is "safe"? Is this information specified in the env.step function and thus already imported from OpenAI Gym's environment? I look forward to your reply.

    Thanks!

  7. Beautiful video, really clear and well made!
    Only a quick question: shouldn't we consider the discount when we compute the value in "rewards_current_episode" in order to get the return? Maybe with 0.99 and at most 100 steps one could barely notice the difference, but I think in general the formula for this implementation should be something like rewards_current_episode += (DISCOUNT_RATE ** (step + 1)) * reward. The +1 comes from the fact that the value of step on the "first" step is actually 0 due to how range(...) works.
    Hope this is a meaningful consideration, please correct me if I'm wrong 🙂
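
Several of the questions above (comments 1, 3, and 4) come down to the same mechanic: the Q-learning update never waits for a reward to fill in a value. Each update bootstraps on the current estimate of the next state, reward + discount_rate * max(Q(new_state)), so the single +1 reward at the goal leaks backward one transition per update, and after many episodes even states far from the goal carry nonzero values. Note also that in Frozen Lake every state has all four actions: moving into a wall simply leaves the agent in place, and the default environment is slippery, so an intended move can slide the agent sideways. That is why left and up from the start state are legal actions that accumulate value too. A toy illustration of the backward propagation, with all numbers hypothetical:

    import numpy as np

    # Toy 3-state chain s0 -> s1 -> s2 with one action per state.
    # Only the final transition into s2 pays reward 1.0; s2 is terminal.
    q = np.zeros(2)          # one value per non-terminal state
    lr, gamma = 0.5, 0.9

    for _ in range(20):
        q[1] += lr * (1.0 + gamma * 0.0 - q[1])   # s1 -> s2: real reward
        q[0] += lr * (0.0 + gamma * q[1] - q[0])  # s0 -> s1: zero reward, bootstraps on q[1]

    print(q)  # approaches [0.9, 1.0]: q[0] is nonzero despite never seeing a reward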
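
On the return asked about in comment 5: the return G_t is the discounted sum of the rewards that follow time t,

    G_t = R_{t+1} + γ·R_{t+2} + γ²·R_{t+3} + … = R_{t+1} + γ·G_{t+1}

and Q(s, a) is the expected return after taking action a in state s, so the two are related but not identical: the Q-function is an expectation of the return. The recursive form answers question 2): we never need the future rewards up front, because each one-step sample reward + γ·max Q(new_state) is an estimate of the return that repeated updates improve. As for comment 7, the undiscounted running sum that a script like this keeps per episode is a progress metric rather than the return itself; adding the discount factor as suggested would turn it into a sampled return.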
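
On comment 6's question about where the rewards live: nowhere in the agent's code. The reward function is part of the environment, and env.step() hands the reward back along with the new state, so it is indeed already baked into the OpenAI Gym environment. One detail worth noting: Frozen Lake's built-in rewards are +1 for reaching the goal G and 0 for everything else, so falling into a hole H ends the episode with reward 0 rather than minus 1. A quick check, again assuming the classic gym API with a four-value env.step() return:

    import gym

    env = gym.make("FrozenLake-v0")
    state = env.reset()

    # The environment, not the agent, decides the reward:
    # 1.0 only when the new state is the goal G, otherwise 0.0
    # (a hole H just sets done=True with reward 0.0).
    new_state, reward, done, info = env.step(env.action_space.sample())
    print(new_state, reward, done)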
