
Replay Memory Explained – Experience for Deep Q-Network Training



deeplizard

Welcome back to this series on reinforcement learning! In this video, we’ll continue our discussion of deep Q-networks. Before we can move on to discussing exactly how a DQN is trained, we’re first going to explain the concepts of experience replay and replay memory, which are utilized during the training process. So, let’s get to it!
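Before the training video, it may help to see a rough picture of what a capacity-bounded replay memory can look like in code. The sketch below is a minimal Python version under assumed names (ReplayMemory, push, sample); it is not the exact implementation used later in the series. Experiences are stored at each time step, the oldest entries are overwritten once the capacity N is reached, and training batches are drawn by sampling at random.

# A minimal sketch of a replay memory in Python. The class and method names
# below (ReplayMemory, push, sample) are illustrative assumptions, not the
# exact code from this series.
import random
from collections import namedtuple

# One experience records the transition observed at a single time step.
Experience = namedtuple('Experience', ('state', 'action', 'reward', 'next_state'))

class ReplayMemory:
    def __init__(self, capacity):
        self.capacity = capacity  # maximum number of experiences to keep (N)
        self.memory = []          # the stored experiences
        self.push_count = 0       # total number of experiences pushed so far

    def push(self, experience):
        # Append until the memory is full, then overwrite the oldest entry,
        # so the memory always holds the most recent `capacity` experiences.
        if len(self.memory) < self.capacity:
            self.memory.append(experience)
        else:
            self.memory[self.push_count % self.capacity] = experience
        self.push_count += 1

    def sample(self, batch_size):
        # Draw a random batch; the sampled experiences are generally not
        # consecutive, which breaks the correlation between successive steps.
        return random.sample(self.memory, batch_size)

    def can_provide_sample(self, batch_size):
        # Only sample once enough experiences have been collected.
        return len(self.memory) >= batch_size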

Jürgen Schmidhuber interview: https://youtu.be/zK_x3Ba2l5Q

💥🦎 DEEPLIZARD COMMUNITY RESOURCES 🦎💥

👋 Hey, we’re Chris and Mandy, the creators of deeplizard!
👀 CHECK OUT OUR VLOG:
🔗 https://www.youtube.com/channel/UC9cBIteC3u7Ee6bzeOcl_Og

👉 Check out the blog post and other resources for this video:
🔗 https://deeplizard.com/learn/video/Bcuj2fTH4_4

💻 DOWNLOAD ACCESS TO CODE FILES
🤖 Available for members of the deeplizard hivemind:
🔗 https://www.patreon.com/posts/27743395

🧠 Support collective intelligence, join the deeplizard hivemind:
🔗 https://deeplizard.com/hivemind

🤜 Support collective intelligence, create a quiz question for this video:
🔗 https://deeplizard.com/create-quiz-question

🚀 Boost collective intelligence by sharing this video on social media!

❤️🦎 Special thanks to the following polymaths of the deeplizard hivemind:
Prash

👀 Follow deeplizard:
Our vlog: https://www.youtube.com/channel/UC9cBIteC3u7Ee6bzeOcl_Og
Twitter: https://twitter.com/deeplizard
Facebook: https://www.facebook.com/Deeplizard-145413762948316
Patreon: https://www.patreon.com/deeplizard
YouTube: https://www.youtube.com/deeplizard
Instagram: https://www.instagram.com/deeplizard/

🎓 Deep Learning with deeplizard:
Fundamental Concepts – https://deeplizard.com/learn/video/gZmobeGL0Yg
Beginner Code – https://deeplizard.com/learn/video/RznKVRTFkBY
Advanced Code – https://deeplizard.com/learn/video/v5cngxo4mIg
Advanced Deep RL – https://deeplizard.com/learn/video/nyjbcRQ-uQ8

🎓 Other Courses:
Data Science – https://deeplizard.com/learn/video/d11chG7Z-xk
Trading – https://deeplizard.com/learn/video/ZpfCK_uHL9Y

🛒 Check out products deeplizard recommends on Amazon:
🔗 https://www.amazon.com/shop/deeplizard

📕 Get a FREE 30-day Audible trial and 2 FREE audio books using deeplizard’s link:
🔗 https://amzn.to/2yoqWRn

🎵 deeplizard uses music by Kevin MacLeod
🔗 https://www.youtube.com/channel/UCSZXFhRIx6b0dFX3xS8L1yQ
🔗 http://incompetech.com/

❤️ Please use the knowledge gained from deeplizard content for good, not evil.



17 thoughts on “Replay Memory Explained – Experience for Deep Q-Network Training”
  1. A question about replay memory: we start by picking a size N for the memory capacity (in your video, N was chosen to be 6). It wasn't fully explained, but is it correct to assume that at each time step we store the data from that step, and then, once capacity is full, we make room for the data from the next step by releasing the memory holding the oldest data? In other words, will the replay memory always hold the data from the last N steps? Thanks!

  2. Hey there, deeplizard. I'm back with another question. (I'm having to watch this one multiple times; there's a lot to unpack.) OK, so the experience at time t is defined as the state at time t, plus the action at time t, plus the reward at t+1, and the state at t+1. I'm not sure I understand why the reward term in this definition is the reward at the next time step rather than the reward (if any) at the current time step. Thanks in advance for any help!

  3. Your videos show a lot of gameplay at the end. I have a hard time understanding this in the context of Markov decision processes. In many games the environment is adversarial, so you have to avoid being predictable. How does the agent learn this? For example, what would happen if you tried to train an agent to play "rock-paper-scissors"?

  4. What I learned:
    1. Replay memory: we store the agent's experiences at each time step, each including s(t), a(t), r(t+1), and s(t+1), and we keep up to N of them. (This will become clearer when we see the implementation; see also the sketch after these comments.)
    2. Experience replay: gaining experience and sampling from the replay memory.
    3. Why: to break the correlation between consecutive samples.
    4. Getting the big picture of how it all fits together. I'm still a little confused, but I can't wait to try it in code. I might try training directly from the sequence to see what happens.

  5. I am confused. Shouldn't the experience tuple e(t) be defined as (s(t), a(t), r(t), s(t+1))? I thought we would be storing the reward corresponding to the current state-action pair and not the next one.

  6. What I really don't get is how we can say 'this is the reward you will get from doing action A.' I've been following the series, and I just don't get how you can move along on the frozen ice, get 0 reward for that action, but somehow still learn something. I mean, if you are at the second-to-last action, sure, I get it: you can win 100 reward. But isn't that because the agent knows which state comes next if it moves, say, left?

    For Breakout, do we somehow program the agent to understand what the next state is likely to be when it presses left or right? I get all the follow-up material, but I just can't get past this initial roadblock in my understanding. I've been following the blog and everything, and I still don't get it.

  7. If an experience tuple does not contain a Q-value, and random samples are taken from the replay memory, is exploration vs. exploitation really necessary? Can't we just explore randomly?

  8. I have a question. Why do we take random samples from the replay memory to train the neural network? What I read elsewhere is that we NEED the sequence, because we want the network to learn which sequences of actions will fail and which will succeed. If we break up the sequence, the network won't be able to learn that.

    Can you help me with this?
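
On the capacity question in comment 1 and the storage summary in comment 4: below is a minimal sketch, assuming a deque-based buffer (the capacity of 6 mirrors the N from the video; everything else is illustrative). Once the buffer is full, appending a new experience discards the oldest one, so the memory always holds the last N experiences, and random sampling then draws a batch of generally non-consecutive transitions from whatever is currently stored.

# Illustrative only: a deque with maxlen behaves like a replay memory
# that always keeps the most recent N experiences.
import random
from collections import deque

N = 6                     # replay memory capacity, as in the video
memory = deque(maxlen=N)  # once full, appending discards the oldest entry

# Push 10 dummy experiences of the form (state, action, reward, next_state).
for t in range(10):
    memory.append((f"s{t}", f"a{t}", 0.0, f"s{t+1}"))

print(len(memory))  # 6  -> only the last N experiences remain
print(memory[0])    # ('s4', 'a4', 0.0, 's5') -> oldest surviving experience

# A training batch is drawn at random, so consecutive (correlated)
# transitions are unlikely to end up in the same batch.
batch = random.sample(list(memory), 3)
print(batch)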

