TensorFlow
At the forefront of deep learning research is a technique called reinforcement learning, which bridges the gap between academic deep learning problems and the way learning occurs in nature in weakly supervised environments. This technique is heavily used in research on learning how to walk, chase prey, navigate complex environments, and even play Go. This session will teach a neural network to play the video game Pong from just the pixels on the screen. No rules, no strategy coaching, and no PhD required.
Rate this session by signing-in on the I/O website here → https://goo.gl/mh5Wi8
Watch more TensorFlow sessions from I/O ’18 here → https://goo.gl/GaAnBR
See all the sessions from Google I/O ’18 here → https://goo.gl/q1Tr8x
Subscribe to the TensorFlow channel → https://goo.gl/ht3WGe
#io18
Is the example code published?
You can find the slides and the code for this talk, as well as all the other talks in the "Tensorflow without a PhD" series at this URL: https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd
HIS ARMS WERE WARM, BUT HIS NECK WAS COLD.
Amazing talk. Just what I was looking for.
Here we go… "without a PhD". I love your sessions!!
Thank you for doing this ….
How do we decide between softmax and sigmoid as the function for the final layer?
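A rough rule of thumb (my own sketch, not from the talk): use softmax when exactly one of N mutually exclusive actions is chosen, as with Pong's UP/DOWN/STILL moves, and sigmoid when each output is an independent yes/no decision. A minimal illustration, assuming TensorFlow:

import tensorflow as tf

logits = tf.constant([[2.0, 0.5, -1.0]])   # scores for three candidate actions
exclusive = tf.nn.softmax(logits)          # probabilities that sum to 1 across the actions
independent = tf.nn.sigmoid(logits)        # each entry is its own 0..1 probability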
Nice, will check the code.
Of course you do not need a PhD to understand simple linear algebra.
great talk.
Great explanation by Martin and Han. Great work.
I like this double-act thing they have going on
Where is the code for this? Where is the game environment? Does anyone know where I can find it? Thank you.
From the code, refer to the following line:
loss = tf.reduce_sum(processed_rewards * cross_entropies + move_cost)
Could I know why processed_rewards is passed in as-is instead of being negated? To my understanding, even after normalization, a negative or small reward indicates a lost point or a bad action, and it should be discouraged. Since the optimization minimizes the loss, doesn't this encourage bad actions?
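For what it's worth, here is a small numerical check (my own sketch, assuming TF 2.x, not the talk's code) of what minimizing reward * cross_entropy does: a positive reward pushes the sampled action's probability up, a negative reward pushes it down, so no extra negation is needed.

import tensorflow as tf

logits = tf.Variable([[1.0, 0.5, -0.5]])   # scores for UP / DOWN / STILL
action = tf.constant([0])                  # suppose the sampled move was UP

for reward in (+1.0, -1.0):
    with tf.GradientTape() as tape:
        ce = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=action, logits=logits)
        loss = tf.reduce_sum(reward * ce)
    grad = tape.gradient(loss, logits)
    # reward = +1: the gradient on the UP logit is negative, so gradient descent raises it
    # reward = -1: the sign flips and UP is discouraged
    print(reward, grad.numpy())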
Where is the code for this session?
Can anybody explain why there is no dtype e.g. in
observations = tf.placeholder(shape=[None, 80*80])
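As far as I know, tf.placeholder takes dtype as its first argument, so the slide most likely just omitted it for brevity. A minimal version with the dtype written out (assuming float32 and TF 1.x, as in the talk):

import tensorflow as tf  # TF 1.x graph mode

# dtype comes first in tf.placeholder; float32 is the usual choice for pixel inputs
observations = tf.placeholder(dtype=tf.float32, shape=[None, 80 * 80])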
This is adapted from Karpathy's Blog. The original post with the code and everything is here : http://karpathy.github.io/2016/05/31/rl/
BRASILLL
Highly recommended! I liked the robot arm that learned how to flip pancakes.
Inspiring!!
Does somebody know whether this is actually the REINFORCE algorithm? (Williams 1992)
Great explanation. Finally understood all the math behind DRL.
Still need a PhD
Great stuff!
Typo:
tf.losses.softmax_cross_entropy(one_hot_labels,
should be:
tf.losses.softmax_cross_entropy(onehot_labels,
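For anyone copying it, a sketch of the corrected call (assuming TF 1.x; the names actions and logits are mine, and reduction=NONE is my guess so that per-step values survive for the reward weighting):

import tensorflow as tf  # TF 1.x

actions = tf.placeholder(dtype=tf.int32, shape=[None])      # sampled moves (hypothetical name)
logits = tf.placeholder(dtype=tf.float32, shape=[None, 3])  # network output (hypothetical name)

cross_entropies = tf.losses.softmax_cross_entropy(
    onehot_labels=tf.one_hot(actions, 3),                   # UP / DOWN / STILL
    logits=logits,
    reduction=tf.losses.Reduction.NONE)                     # keep one value per time step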
How do you choose the number of neurons, 200 or 20?
The code is here.
https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd/tree/master/tensorflow-rl-pong
But when they take the reward and multiply it by the cross_entropy, won't a negative reward (a loss) turn the cross entropy negative? And by minimizing this, won't they actually encourage the algorithm to lose? I notice in the slides that they do loss = – R( … ), but I can't see this reflected in the code.
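A possible resolution (my own reading, not an official answer): the minus sign on the slide is already hidden inside the cross entropy. For the sampled move a,

cross_entropy = -log pi(a | s), so R * cross_entropy = -R * log pi(a | s),

which is exactly the loss = – R( … ) from the slides. Minimizing it raises log pi(a | s) when R is positive and lowers it when R is negative, so the minus sign is in the code after all, just folded into the cross entropy.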
One day to converge? That's a lot! Imagine a more complex problem.
This is incredible!! Thank you for the great talk!!
Please introduce TensorFlow for 32-bit PCs.
Thank you for the insight. I was able to successfully apply your approach to problems in OpenAI Gym.
Policy Gradient
I can't quite get the loss function to work with TF2.0,
loss = tf.reduce_sum(R * cross_entropies)
model.compile(optimizer="Adam", loss=loss, metrics=['accuracy'])
TypeError: Using a `tf.Tensor` as a Python `bool` is not allowed. Use `if t is not None:` instead of `if t:` to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.
Anyone got some advice? Thanks!
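In case it helps: Keras' compile() expects a loss function (a callable), not an already-built tensor, which is what the TypeError is complaining about. For a reward-weighted policy-gradient loss, a custom training step is usually simpler. A rough sketch, assuming TF 2.x and the 80x80 input / three-action setup from the talk (train_step and the variable names are mine):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(200, activation="relu", input_shape=(80 * 80,)),
    tf.keras.layers.Dense(3),   # logits for UP / DOWN / STILL
])
optimizer = tf.keras.optimizers.Adam()

def train_step(observations, actions, rewards):
    # observations: [batch, 6400] floats, actions: [batch] ints, rewards: [batch] processed rewards
    with tf.GradientTape() as tape:
        logits = model(observations, training=True)
        cross_entropies = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=actions, logits=logits)
        loss = tf.reduce_sum(rewards * cross_entropies)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss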
I must say something about the math. There are two ways of teaching ML:
1. Require the underlying math from your audience as a prerequisite.
2. Explain the needed math during the lecture.
Ignoring the math, or showing it without explaining it carefully and in detail, is not helpful.
I'm a 3rd-year college student and this is not a clear lecture for me.
All I got is that TensorFlow can do reinforcement learning with a NN, and that we use softmax in the last layer.
What I'm missing is the full understanding of the pipeline/graph and the derivation part.
My parents' scarf store in San Francisco was just about to go bankrupt because nobody buys scarves in SF. Then Martin Gorner moved there, and the business is thriving again!