freeCodeCamp.org
In this intermediate deep learning tutorial, you will learn how to go from reading a paper on deep deterministic policy gradients to implementing the concepts in Tensorflow. This process can be applied to any deep learning paper, not just deep reinforcement learning.
In the second part, you will learn how to code a deep deterministic policy gradient (DDPG) agent using Python and PyTorch, to beat the continuous lunar lander environment (a classic machine learning problem).
DDPG combines the best of Deep Q Learning and Actor Critic Methods into an algorithm that can solve environments with continuous action spaces. We will have an actor network that learns the (deterministic) policy, coupled with a critic network to learn the action-value functions. We will make use of a replay buffer to maximize sample efficiency, as well as target networks to assist in algorithm convergence and stability.
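As a rough illustration of the target-network idea mentioned above, here is a minimal sketch of the "soft" target update in PyTorch. This is not the code from the video; the layer sizes and the tau value are placeholders chosen for the example.

import torch.nn as nn

def soft_update(target: nn.Module, online: nn.Module, tau: float = 0.001):
    # theta_target <- tau * theta_online + (1 - tau) * theta_target
    for t_param, o_param in zip(target.parameters(), online.parameters()):
        t_param.data.copy_(tau * o_param.data + (1.0 - tau) * t_param.data)

# Hypothetical usage with toy stand-ins for the actor and its target copy:
actor = nn.Linear(8, 2)
target_actor = nn.Linear(8, 2)
target_actor.load_state_dict(actor.state_dict())  # start the copies identical
soft_update(target_actor, actor, tau=0.001)

Slowly tracking the online weights this way keeps the bootstrapped targets from chasing a moving network, which is where the convergence and stability benefit comes from.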
🎥 Course created by Phil Tabor. Check out his YouTube channel: https://www.youtube.com/channel/UC58v9cLitc8VaCjrcKyAbrw
⭐️ Course Contents ⭐️
⌨️ (0:00:00) Introduction
⌨️ (0:04:58) How to Implement Deep Learning Papers
⌨️ (1:59:00) Deep Deterministic Policy Gradients are Easy in Pytorch
—
Learn to code for free and get a developer job: https://www.freecodecamp.org
Read hundreds of articles on programming: https://www.freecodecamp.org/news
Bunch of thanks 🙏
Thank u for this video 😀
Are you a brother of Bucky Roberts (thenewboston)?
Reminded of Sherlock's assistant.
Thank you! This is an incredible Reinforcement Learning tutorial!
It's very advanced for me, I guess (still, I watched for 20 minutes)… Hope to get some advice from Phil for beginners, to really reach the level of implementing papers… Any advice on a learning road map would be helpful. I have subscribed to your channel as well. Thanks Phil. 🙂
Awesome! Looking to learn more and post on my channel.
Phil, you're a fucking legend
Thanks, please make more like this about reading scientific papers.
I really want to learn Python, and I have a question: what is this video about? Because I didn't get anything.
Please make more videos on implementing research papers on your channel😃
Hi, I have run the code, but it did not converge at all, so I want to know your hyperparameter settings. Thanks a lot =-=
2:45:50 Michael Jackson still alive guys
This is actually a video tutorial with real academic quality.
I am really amazed by this video and by the ability to implement a paper at this pace.
Please keep up the good work.
Thanks, bro.
Where is the code of this video?
At 43:01, you say: "i is each element of that minibatch transitions", which is wrong. i is just the index into the replay memory, i.e. state i+1 follows after state i.
And thanks for your great explanation; it helped me a lot.
Where did you get your brain?
Why not use an IDE to see typos before running it?
Hello, I need help with a paper, "A New Deep-Q-Learning-Based Transmission Scheduling Mechanism for the Cognitive Internet of Things". How can I contact you, by inbox?
41:00 I don't think you need to wait a million steps. If your minibatch size is 64, then you just need to wait 64 steps, right?
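For what it's worth, the usual pattern in replay-buffer agents is exactly that: skip the learning step until the buffer holds at least one minibatch, rather than waiting for it to fill. A minimal sketch of that guard, with assumed names such as mem_cntr and a states-only buffer to keep it short (not the code from the video):

import numpy as np

class ReplayBuffer:
    # Minimal fixed-size buffer; only what is needed to show the guard.
    def __init__(self, max_size: int, state_dim: int):
        self.mem_size = max_size
        self.mem_cntr = 0  # total transitions stored so far
        self.states = np.zeros((max_size, state_dim), dtype=np.float32)

    def store(self, state):
        self.states[self.mem_cntr % self.mem_size] = state
        self.mem_cntr += 1

def maybe_learn(buffer: ReplayBuffer, batch_size: int = 64):
    # Learning can begin once one minibatch worth of transitions exists;
    # a full buffer is not required.
    if buffer.mem_cntr < batch_size:
        return None
    max_mem = min(buffer.mem_cntr, buffer.mem_size)
    idx = np.random.choice(max_mem, batch_size, replace=False)
    return buffer.states[idx]  # sampled minibatch (states only in this sketch)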
This dude sucks
Perfect
Where can I find this paper, and what is it called?
Thank you so much.
I have one question: when implementing D3QN in a dynamic environment where obstacles are continuously moving, how can one implement it on hardware? And which one is better, DDPG or D3QN, in the scenario stated above?
Thank you, this is an amazing tutorial, but I want to ask you about the travelling thief problem and about that environment. If I want to solve it with deep reinforcement learning… can you give me some advice about this approach?
thanks
Hello, I'm trying to decompose the whole problem, and I have a question about the point where you use OUActionNoise, based on the Ornstein-Uhlenbeck process:
x = (
    self.x_prev
    + self.theta * (self.mu - self.x_prev) * self.dt
    + self.sigma * np.sqrt(self.dt) * np.random.normal(size=self.mu.shape)
)
I checked the equations of the OU process, but I don't know how this "np.sqrt(self.dt)" is a valid implementation of the differential.
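In case it helps: that line is the standard Euler-Maruyama discretization of the SDE dx_t = theta * (mu - x_t) dt + sigma * dW_t. The Wiener increment dW_t over a step of length dt is normally distributed with variance dt, so it is simulated as sqrt(dt) times a unit-variance normal sample, which is where np.sqrt(self.dt) comes from. A self-contained sketch of the whole class under that reading (the default parameter values here are assumptions, not necessarily the ones used in the video):

import numpy as np

class OUActionNoise:
    # Ornstein-Uhlenbeck noise via Euler-Maruyama:
    # x_{t+dt} = x_t + theta*(mu - x_t)*dt + sigma*sqrt(dt)*N(0, 1)
    def __init__(self, mu, sigma=0.15, theta=0.2, dt=1e-2, x0=None):
        self.mu = mu
        self.sigma = sigma
        self.theta = theta
        self.dt = dt
        self.x0 = x0
        self.reset()

    def __call__(self):
        x = (self.x_prev
             + self.theta * (self.mu - self.x_prev) * self.dt
             + self.sigma * np.sqrt(self.dt) * np.random.normal(size=self.mu.shape))
        self.x_prev = x  # keep the sample so successive calls are correlated
        return x

    def reset(self):
        self.x_prev = self.x0 if self.x0 is not None else np.zeros_like(self.mu)

noise = OUActionNoise(mu=np.zeros(2))
sample = noise()  # one temporally correlated noise value per action dimension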
woke up to this lol
Priceless stuff
Amazing!
You don't spy on other human beings or their broadcasted thoughts. Especially a human that created the universe that you reside in.
why did i wake up to this
Thank you for this video. I just downloaded the software and I was so, so lost. I couldn't even figure out how to make anything. Your video…
Yeah man… I'm in some super deep rabbit hole of programming videos while eating candy. Idk how I got here.
Using the code piece state = state[np.newaxis,:] is giving an error: ValueError: Cannot feed value of shape (3,) for Tensor TargetActor_9/inputs:0, which has shape (None, 3).
Can anyone help me with this?
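That error usually means the raw (3,)-shaped observation reached the (None, 3) placeholder, i.e. the batch dimension added by np.newaxis never made it into the feed. A tiny sketch of what the reshape should produce (the array contents here are made up):

import numpy as np

state = np.array([0.1, -0.2, 0.3], dtype=np.float32)  # shape (3,), as in the error
batched = state[np.newaxis, :]                         # shape (1, 3)
print(state.shape, batched.shape)                      # (3,) (1, 3)
# A placeholder declared as (None, 3) accepts (1, 3) but not (3,), so make sure
# the reshaped array (not the original) is what actually gets passed to the network.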
Hi, I don't know if you're the right person, and I'm ignorant about machine learning. That being said, I would like an answer to just one simple question to decide whether to jump into the world of ML. The problem is the development of a question-answering system. Think of a project spanning many disciplines with 200-300 people, where the information is dynamically spread across many documents and wikis, and the data changes over time. Is it possible to have a natural-language question-answering system that can understand the progression of time? Two pieces of information had a relationship in the past but are no longer related, and on a current question the system should refrain from mixing the past information with the new. The system can show how the answer changed over time, but it should not infer relationships between past events and current ones.
😀😀
Hello, thank you for your tutorial. I have only one issue: I tried to replicate your code, but I get the error "cannot import name 'plotLearning' from 'utils'". Do you have any idea how I can fix that?
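plotLearning lives in a separate utils.py in the author's repository, so the import fails if only the main script was copied. One workaround is to drop in your own plotting helper under the same name. The following is a plausible stand-in, not the original utils.py: it plots per-episode scores with a trailing running average and saves the figure.

import numpy as np
import matplotlib.pyplot as plt

def plotLearning(scores, filename, window=100):
    # Plot raw per-episode scores together with a trailing running average.
    scores = np.asarray(scores, dtype=np.float64)
    running_avg = np.array([scores[max(0, i - window + 1):i + 1].mean()
                            for i in range(len(scores))])
    plt.figure()
    plt.plot(scores, alpha=0.4, label='score')
    plt.plot(running_avg, label='%d-episode running average' % window)
    plt.xlabel('episode')
    plt.ylabel('score')
    plt.legend()
    plt.savefig(filename)
    plt.close()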
This guy is a master! Do check out his own YouTube channel. Thanks to him, my undergraduate final project went really well.
Nv
Why is this in my recommendations after 3 years? I wish it had been recommended earlier.