freeCodeCamp.org
In this intermediate deep learning tutorial, you will learn how to go from reading a paper on deep deterministic policy gradients to implementing the concepts in Tensorflow. This process can be applied to any deep learning paper, not just deep reinforcement learning.
In the second part, you will learn how to code a deep deterministic policy gradient (DDPG) agent using Python and PyTorch, to beat the continuous lunar lander environment (a classic machine learning problem).
DDPG combines the best of Deep Q Learning and Actor Critic Methods into an algorithm that can solve environments with continuous action spaces. We will have an actor network that learns the (deterministic) policy, coupled with a critic network to learn the action-value functions. We will make use of a replay buffer to maximize sample efficiency, as well as target networks to assist in algorithm convergence and stability.
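As a rough illustration of the target-network idea mentioned above, here is a minimal sketch of the "soft" target update in PyTorch. This is not the code from the video; the layer sizes and the tau value are placeholders chosen for the example.

import torch.nn as nn

def soft_update(target: nn.Module, online: nn.Module, tau: float = 0.001):
    # theta_target <- tau * theta_online + (1 - tau) * theta_target
    for t_param, o_param in zip(target.parameters(), online.parameters()):
        t_param.data.copy_(tau * o_param.data + (1.0 - tau) * t_param.data)

# Hypothetical usage with toy stand-ins for the actor and its target copy:
actor = nn.Linear(8, 2)
target_actor = nn.Linear(8, 2)
target_actor.load_state_dict(actor.state_dict())  # start the copies identical
soft_update(target_actor, actor, tau=0.001)

Slowly tracking the online weights this way keeps the bootstrapped targets from chasing a moving network, which is where the convergence and stability benefit comes from.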
🎥 Course created by Phil Tabor. Check out his YouTube channel: https://www.youtube.com/channel/UC58v9cLitc8VaCjrcKyAbrw
⭐️ Course Contents ⭐️
⌨️ (0:00:00) Introduction
⌨️ (0:04:58) How to Implement Deep Learning Papers
⌨️ (1:59:00) Deep Deterministic Policy Gradients are Easy in Pytorch
—
Learn to code for free and get a developer job: https://www.freecodecamp.org
Read hundreds of articles on programming: https://www.freecodecamp.org/news
Bunch of thanks 🙏
Thank u for this video 😀
Are you a brother of Bucky Roberts (thenewboston)?
Reminded of Sherlock's assistant.
Thank you! This is an incredible Reinforcement Learning tutorial!
It's very advanced for me, I guess (still, I watched for 20 minutes)… Hope to get some advice from Phil for beginners, to really reach the level of implementing papers… Any advice on a learning road map would be helpful. I have subscribed to your channel as well. Thanks Phil. 🙂
Awesome! Looking to learn more and post on my channel.
Phil, you're a fucking legend
Thanks, please make more like this about reading scientific papers.
I really want to learn Python, and I have a question: what is this video about? Because I didn't get anything.
Please make more videos on implementing research papers on your channel😃
Hi, I have run the code, but it did not converge at all, so I want to know your hyperparameter settings. Thanks a lot =-=
2:45:50 Michael Jackson still alive guys
This is actually a video tutorial with real academic quality.
I am really amazed by this video and by the ability to implement a paper at this pace.
Please keep up the good work.
Thanks, bro.
Where is the code of this video?
At 43:01, you say: "i is each element of that minibatch transitions", which is wrong. i is just the index into the replay memory, i.e. state i+1 follows after state i.
And thanks for your great explanation; it helped me a lot.
Where did you get your brain?
Why not use an IDE to see typos before running it?
Hello, I need help with a paper, "A New Deep-Q-Learning-Based Transmission Scheduling Mechanism for the Cognitive Internet of Things". How can I contact you, by inbox?
41:00 I don't think you need to wait a million steps. If your minibatch size is 64, then you just need to wait 64 steps, right?
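For what it's worth, the usual pattern in replay-buffer agents is exactly that: skip the learning step until the buffer holds at least one minibatch, rather than waiting for it to fill. A minimal sketch of that guard, with assumed names such as mem_cntr and a states-only buffer to keep it short (not the code from the video):

import numpy as np

class ReplayBuffer:
    # Minimal fixed-size buffer; only what is needed to show the guard.
    def __init__(self, max_size: int, state_dim: int):
        self.mem_size = max_size
        self.mem_cntr = 0  # total transitions stored so far
        self.states = np.zeros((max_size, state_dim), dtype=np.float32)

    def store(self, state):
        self.states[self.mem_cntr % self.mem_size] = state
        self.mem_cntr += 1

def maybe_learn(buffer: ReplayBuffer, batch_size: int = 64):
    # Learning can begin once one minibatch worth of transitions exists;
    # a full buffer is not required.
    if buffer.mem_cntr < batch_size:
        return None
    max_mem = min(buffer.mem_cntr, buffer.mem_size)
    idx = np.random.choice(max_mem, batch_size, replace=False)
    return buffer.states[idx]  # sampled minibatch (states only in this sketch)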
This dude sucks
Perfect
Where can I find this paper, and what is it called?
Thank you so much.
I have one question: when implementing D3QN in a dynamic environment where obstacles are continuously moving, how can one implement it on hardware? And which one is better, DDPG or D3QN, in the scenario stated above?
Thank you, this is an amazing tutorial, but I want to ask you about the travelling thief problem and about that environment. If I want to solve it with deep reinforcement learning… can you give me some advice about this approach?
thanks
Hello, I'm trying to decompose the whole problem, and I have a question about the point where you use OUActionNoise, based on the Ornstein-Uhlenbeck process:
x = (
    self.x_prev
    + self.theta * (self.mu - self.x_prev) * self.dt
    + self.sigma * np.sqrt(self.dt) * np.random.normal(size=self.mu.shape)
)
I checked the equations of the OU process, but I don't know how this "np.sqrt(self.dt)" is a valid implementation of the differential.
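In case it helps: that line is the standard Euler-Maruyama discretization of the SDE dx_t = theta * (mu - x_t) dt + sigma * dW_t. The Wiener increment dW_t over a step of length dt is normally distributed with variance dt, so it is simulated as sqrt(dt) times a unit-variance normal sample, which is where np.sqrt(self.dt) comes from. A self-contained sketch of the whole class under that reading (the default parameter values here are assumptions, not necessarily the ones used in the video):

import numpy as np

class OUActionNoise:
    # Ornstein-Uhlenbeck noise via Euler-Maruyama:
    # x_{t+dt} = x_t + theta*(mu - x_t)*dt + sigma*sqrt(dt)*N(0, 1)
    def __init__(self, mu, sigma=0.15, theta=0.2, dt=1e-2, x0=None):
        self.mu = mu
        self.sigma = sigma
        self.theta = theta
        self.dt = dt
        self.x0 = x0
        self.reset()

    def __call__(self):
        x = (self.x_prev
             + self.theta * (self.mu - self.x_prev) * self.dt
             + self.sigma * np.sqrt(self.dt) * np.random.normal(size=self.mu.shape))
        self.x_prev = x  # keep the sample so successive calls are correlated
        return x

    def reset(self):
        self.x_prev = self.x0 if self.x0 is not None else np.zeros_like(self.mu)

noise = OUActionNoise(mu=np.zeros(2))
sample = noise()  # one temporally correlated noise value per action dimension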
woke up to this lol
Priceless stuff
Amazing!
You don't spy on other human beings or their broadcasted thoughts. Especially a human that created the universe that you reside in.
why did i wake up to this
Thank you for this video. I just downloaded the software and I was so, so lost. I couldn't even figure out how to make anything. Your video…
Yeah man… I'm in some super deep rabbit hole of programming videos while eating candy. Idk how I got here.
Using the code piece state = state[np.newaxis,:] is giving an error: ValueError: Cannot feed value of shape (3,) for Tensor TargetActor_9/inputs:0, which has shape (None, 3).
Can anyone help me with this?
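That error usually means the raw (3,)-shaped observation reached the (None, 3) placeholder, i.e. the batch dimension added by np.newaxis never made it into the feed. A tiny sketch of what the reshape should produce (the array contents here are made up):

import numpy as np

state = np.array([0.1, -0.2, 0.3], dtype=np.float32)  # shape (3,), as in the error
batched = state[np.newaxis, :]                         # shape (1, 3)
print(state.shape, batched.shape)                      # (3,) (1, 3)
# A placeholder declared as (None, 3) accepts (1, 3) but not (3,), so make sure
# the reshaped array (not the original) is what actually gets passed to the network.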
Hi, I don't know if you're the right person, and I'm ignorant about machine learning. That being said, I would like an answer to just one simple question to decide whether to jump into the world of ML. The problem is the development of a question-answering system. Think of a project spanning many disciplines with 200-300 people, where the information is dynamically spread across many documents and wikis, and the data changes over time. Is it possible to have a natural-language question-answering system that can understand the progression of time? Two pieces of information had a relationship in the past but are no longer related, and on a current question the system should refrain from mixing the past information with the new. The system can show how the answer changed over time, but it should not infer relationships between past events and current ones.
😀😀
Hello, thank you for your tutorial. I have only one issue: I tried to replicate your code, but I get the error "cannot import name 'plotLearning' from 'utils'". Do you have any idea how I can fix that?
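plotLearning lives in a separate utils.py in the author's repository, so the import fails if only the main script was copied. One workaround is to drop in your own plotting helper under the same name. The following is a plausible stand-in, not the original utils.py: it plots per-episode scores with a trailing running average and saves the figure.

import numpy as np
import matplotlib.pyplot as plt

def plotLearning(scores, filename, window=100):
    # Plot raw per-episode scores together with a trailing running average.
    scores = np.asarray(scores, dtype=np.float64)
    running_avg = np.array([scores[max(0, i - window + 1):i + 1].mean()
                            for i in range(len(scores))])
    plt.figure()
    plt.plot(scores, alpha=0.4, label='score')
    plt.plot(running_avg, label='%d-episode running average' % window)
    plt.xlabel('episode')
    plt.ylabel('score')
    plt.legend()
    plt.savefig(filename)
    plt.close()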
This guy is a master! Do check out his own YouTube channel. Thanks to him, my undergraduate final project went really well.
Nv
Why is this in my recommendations after 3 years? I wish it had been recommended earlier.