
Train Q-learning Agent with Python – Reinforcement Learning Code Project



deeplizard

Welcome back to this series on reinforcement learning! As promised, in this video, we’re going to write the code to implement our first reinforcement learning algorithm. Specifically, we’ll use Python to implement the Q-learning algorithm to train an agent to play OpenAI Gym’s Frozen Lake game that we introduced in the previous video. Let’s get to it!
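For reference, below is a minimal sketch of the kind of Q-learning training loop the video builds. It assumes the classic Gym API ("FrozenLake-v0", where env.step returns a 4-tuple; newer Gym/Gymnasium releases use "FrozenLake-v1" and a 5-tuple), and the hyperparameter values are illustrative, so check the video and blog post for the exact code.

import numpy as np
import gym

# Build the Frozen Lake environment (classic Gym environment ID).
env = gym.make("FrozenLake-v0")

# One row per state, one column per action; all Q-values start at zero.
q_table = np.zeros((env.observation_space.n, env.action_space.n))

num_episodes = 10000           # illustrative hyperparameters
max_steps_per_episode = 100
learning_rate = 0.1
discount_rate = 0.99

exploration_rate = 1.0         # epsilon-greedy exploration, decayed per episode
max_exploration_rate = 1.0
min_exploration_rate = 0.01
exploration_decay_rate = 0.001

rewards_all_episodes = []

for episode in range(num_episodes):
    state = env.reset()
    rewards_current_episode = 0

    for step in range(max_steps_per_episode):
        # Exploit the Q-table with probability 1 - epsilon, otherwise explore.
        if np.random.uniform(0, 1) > exploration_rate:
            action = np.argmax(q_table[state, :])
        else:
            action = env.action_space.sample()

        new_state, reward, done, info = env.step(action)

        # Q-learning update: blend the old estimate with the bootstrapped
        # target r + gamma * max_a' Q(s', a'). This is how the +1 goal reward
        # propagates backward through the table, one state at a time.
        q_table[state, action] = q_table[state, action] * (1 - learning_rate) + \
            learning_rate * (reward + discount_rate * np.max(q_table[new_state, :]))

        state = new_state
        rewards_current_episode += reward
        if done:
            break

    # Decay epsilon so the agent gradually shifts from exploring to exploiting.
    exploration_rate = min_exploration_rate + \
        (max_exploration_rate - min_exploration_rate) * \
        np.exp(-exploration_decay_rate * episode)

    rewards_all_episodes.append(rewards_current_episode)

print("Average reward over the last 1000 episodes:",
      np.mean(rewards_all_episodes[-1000:]))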

OpenAI Gym:
https://gym.openai.com/docs/

TED Talk: https://youtu.be/uawLjkSI7Mo

💥🦎 DEEPLIZARD COMMUNITY RESOURCES 🦎💥

👋 Hey, we’re Chris and Mandy, the creators of deeplizard!
👀 CHECK OUT OUR VLOG:
🔗 https://www.youtube.com/channel/UC9cBIteC3u7Ee6bzeOcl_Og

👉 Check out the blog post and other resources for this video:
🔗 https://deeplizard.com/learn/video/HGeI30uATws

💻 DOWNLOAD ACCESS TO CODE FILES
🤖 Available for members of the deeplizard hivemind:
🔗 https://www.patreon.com/posts/27743395

🧠 Support collective intelligence, join the deeplizard hivemind:
🔗 https://deeplizard.com/hivemind

🤜 Support collective intelligence, create a quiz question for this video:
🔗 https://deeplizard.com/create-quiz-question

🚀 Boost collective intelligence by sharing this video on social media!

❤️🦎 Special thanks to the following polymaths of the deeplizard hivemind:
Prash

👀 Follow deeplizard:
Our vlog: https://www.youtube.com/channel/UC9cBIteC3u7Ee6bzeOcl_Og
Twitter: https://twitter.com/deeplizard
Facebook: https://www.facebook.com/Deeplizard-145413762948316
Patreon: https://www.patreon.com/deeplizard
YouTube: https://www.youtube.com/deeplizard
Instagram: https://www.instagram.com/deeplizard/

🎓 Deep Learning with deeplizard:
Fundamental Concepts – https://deeplizard.com/learn/video/gZmobeGL0Yg
Beginner Code – https://deeplizard.com/learn/video/RznKVRTFkBY
Advanced Code – https://deeplizard.com/learn/video/v5cngxo4mIg
Advanced Deep RL – https://deeplizard.com/learn/video/nyjbcRQ-uQ8

🎓 Other Courses:
Data Science – https://deeplizard.com/learn/video/d11chG7Z-xk
Trading – https://deeplizard.com/learn/video/ZpfCK_uHL9Y

🛒 Check out products deeplizard recommends on Amazon:
🔗 https://www.amazon.com/shop/deeplizard

📕 Get a FREE 30-day Audible trial and 2 FREE audio books using deeplizard’s link:
🔗 https://amzn.to/2yoqWRn

🎵 deeplizard uses music by Kevin MacLeod
🔗 https://www.youtube.com/channel/UCSZXFhRIx6b0dFX3xS8L1yQ
🔗 http://incompetech.com/

❤️ Please use the knowledge gained from deeplizard content for good, not evil.



30 thoughts on “Train Q-learning Agent with Python – Reinforcement Learning Code Project”
  1. What I don't get is how you can have an expected reward for anything unless you're at the second-to-last state before game over. Nothing else gives a reward! So how could most of those rows for each state have anything in them at all?

  2. This tutorial is incredibly clear and is the best tutorial on RL that I have found on the internet. I have learned a lot. Thanks a lot for the effort in creating this and sharing the knowledge.

  3. Why are there large Q-values for the left and up actions for the starting state in the first row of the q_table, when we know the agent can only move right or down?

    Also, why the large value for left in the 5th row, for [F]HFH?

    By the way, you are the best teacher EVER!!!! Really, thank you for these lectures.

  4. Thanks for the amazing work. I have a question: for a state like "S", the initial state for the agent, there are only 2 action choices, moving right or moving down. The same also applies to states on the edges (3 action choices) and in the corners (2 action choices). Should the Q-values of those state-action pairs be zero? I just cannot locate them in the final Q-table. I know there are 23 zeros in the final Q-table. Are those for state-action pairs that lead to "H", and for the cases mentioned above where only 2 or 3 action choices are available?

  5. So I read about a term called "return from the reward". I have two questions:

    1) Is it the same thing as the Q-function?
    2) To calculate the return, we need the rewards of FUTURE actions. How do we get those?

  6. Dear deeplizard team,
    First of all, I want to congratulate you on putting together an amazing reinforcement learning playlist. Your theoretical explanations are very precise, you walk us very smoothly through the source code, and your videos are both entertaining and rich in great content. Watching your videos, you can tell that you are very passionate about AI and about sharing your knowledge with fellow tech enthusiasts. Thank you for your hard work and dedication.

    I have one question, though: where exactly do we specify the positive or negative rewards in the Python code? I followed your explanation and understood that the agent bases its decision for each state-action pair on the Q-table values, depending on the exploration-exploitation trade-off. But where exactly in the source code do we actually tell the agent that by stepping onto the fields with the letter H it will receive minus 1 points, and that landing on F-letter fields is "safe"? Is this information specified in the env.step function and thus already imported from OpenAI Gym's environment? I look forward to your reply.

    Thanks!

  7. Beautiful video, really clear and well made!
    Only a quick question: shouldn't we consider the discount when we compute the value in rewards_current_episode, in order to get the return? With 0.99 and at most 100 steps one could barely notice the difference, but I think in general the formula for this implementation should be something like rewards_current_episode += (DISCOUNT_RATE ** (step + 1)) * reward. The +1 comes from the fact that the value corresponding to the "first" step is actually 0 due to how range(..) works.
    Hope this is a meaningful consideration; please correct me if I'm wrong 🙂 (see the sketch just after these comments).
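On the discounting point in the last comment, here is a minimal sketch of the difference between the two bookkeeping choices. The names rewards_current_episode, step, and reward are taken from the loop quoted in the comment; the discount_rate value and the toy episode are illustrative. Note that by the usual definition of the return, the first reward is scaled by gamma to the power 0, so the exponent would be step rather than step + 1.

# Toy 4-step episode: reward only on the final transition, as in Frozen Lake.
rewards = [0, 0, 0, 1]
discount_rate = 0.99   # illustrative value

# (a) Undiscounted sum, as in the video's loop: for Frozen Lake this simply
# records whether the episode reached the goal (1) or not (0).
rewards_current_episode = 0
for step, reward in enumerate(rewards):
    rewards_current_episode += reward
print(rewards_current_episode)   # 1

# (b) Discounted return G = sum over t of gamma**t * reward(t+1); by the
# usual convention the first reward carries gamma**0, so the exponent is
# `step`, not `step + 1`.
discounted_return = 0
for step, reward in enumerate(rewards):
    discounted_return += (discount_rate ** step) * reward
print(discounted_return)         # 0.99**3, approximately 0.9703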

