TensorFlow
At the forefront of deep learning research is a technique called reinforcement learning, which bridges the gap between academic deep learning problems and the way learning occurs in nature in weakly supervised environments. This technique is heavily used in research on learning how to walk, chase prey, navigate complex environments, and even play Go. This session will teach a neural network to play the video game Pong from just the pixels on the screen. No rules, no strategy coaching, and no PhD required.
Rate this session by signing-in on the I/O website here → https://goo.gl/mh5Wi8
Watch more TensorFlow sessions from I/O ’18 here → https://goo.gl/GaAnBR
See all the sessions from Google I/O ’18 here → https://goo.gl/q1Tr8x
Subscribe to the TensorFlow channel → https://goo.gl/ht3WGe
#io18
Is the example code published?
You can find the slides and the code for this talk, as well as all the other talks in the "Tensorflow without a PhD" series at this URL: https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd
HIS ARMS WERE WARM, BUT HIS NECK WAS COLD.
Amazing talk. Just what I was looking for.
Here we go… "without a PhD". I love your sessions!!
Thank you for doing this ….
How do we decide between softmax and sigmoid as the function for the final layer?
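A rough rule of thumb (my own sketch, not from the talk): use softmax when exactly one of N mutually exclusive actions is chosen, as with Pong's UP/DOWN/STILL moves, and sigmoid when each output is an independent yes/no decision. A minimal illustration, assuming TensorFlow:

import tensorflow as tf

logits = tf.constant([[2.0, 0.5, -1.0]])   # scores for three candidate actions
exclusive = tf.nn.softmax(logits)          # probabilities that sum to 1 across the actions
independent = tf.nn.sigmoid(logits)        # each entry is its own 0..1 probability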
Nice, will check the code.
Of course you do not need a PhD to understand simple linear algebra.
great talk.
Great explanation by Martin and Han. Great work.
I like this double-act thing they have going on
Where is the code for this? Where is the game environment? Does anyone know where I can find it? Thank you.
From the code, refer to the following line:
loss = tf.reduce_sum(processed_rewards * cross_entropies + move_cost)
Could I know why processed_rewards is passed in as-is instead of being negated? To my understanding, even after normalization, a negative or small reward indicates a lost point or a bad action, and it should be discouraged. Since the optimization minimizes the loss, doesn't this encourage bad actions?
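For what it's worth, here is a small numerical check (my own sketch, assuming TF 2.x, not the talk's code) of what minimizing reward * cross_entropy does: a positive reward pushes the sampled action's probability up, a negative reward pushes it down, so no extra negation is needed.

import tensorflow as tf

logits = tf.Variable([[1.0, 0.5, -0.5]])   # scores for UP / DOWN / STILL
action = tf.constant([0])                  # suppose the sampled move was UP

for reward in (+1.0, -1.0):
    with tf.GradientTape() as tape:
        ce = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=action, logits=logits)
        loss = tf.reduce_sum(reward * ce)
    grad = tape.gradient(loss, logits)
    # reward = +1: the gradient on the UP logit is negative, so gradient descent raises it
    # reward = -1: the sign flips and UP is discouraged
    print(reward, grad.numpy())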
Where is the code for this session?
Can anybody explain why there is no dtype e.g. in
observations = tf.placeholder(shape=[None, 80*80])
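As far as I know, tf.placeholder takes dtype as its first argument, so the slide most likely just omitted it for brevity. A minimal version with the dtype written out (assuming float32 and TF 1.x, as in the talk):

import tensorflow as tf  # TF 1.x graph mode

# dtype comes first in tf.placeholder; float32 is the usual choice for pixel inputs
observations = tf.placeholder(dtype=tf.float32, shape=[None, 80 * 80])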
This is adapted from Karpathy's Blog. The original post with the code and everything is here : http://karpathy.github.io/2016/05/31/rl/
BRASILLL
Highly recommended! I liked the robot arm that learned how to flip pancakes.
Inspiring!!
Does somebody know whether this is actually the REINFORCE algorithm? (Williams 1992)
Great explanation. Finally understood all the math behind DRL.
Still need a PhD
Great stuff!
Typo:
tf.losses.softmax_cross_entropy(one_hot_labels,
should be:
tf.losses.softmax_cross_entropy(onehot_labels,
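For anyone copying it, a sketch of the corrected call (assuming TF 1.x; the names actions and logits are mine, and reduction=NONE is my guess so that per-step values survive for the reward weighting):

import tensorflow as tf  # TF 1.x

actions = tf.placeholder(dtype=tf.int32, shape=[None])      # sampled moves (hypothetical name)
logits = tf.placeholder(dtype=tf.float32, shape=[None, 3])  # network output (hypothetical name)

cross_entropies = tf.losses.softmax_cross_entropy(
    onehot_labels=tf.one_hot(actions, 3),                   # UP / DOWN / STILL
    logits=logits,
    reduction=tf.losses.Reduction.NONE)                     # keep one value per time step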
How do you choose the number of neurons, 200 or 20?
The code is here.
https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd/tree/master/tensorflow-rl-pong
But when they take the reward and multiply it by the cross_entropy, won't a negative reward (a loss) turn the cross entropy negative? And by minimizing this, won't they actually encourage the algorithm to lose? I notice in the slides that they do loss = – R( … ), but I can't see this reflected in the code.
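A possible resolution (my own reading, not an official answer): the minus sign on the slide is already hidden inside the cross entropy. For the sampled move a,

cross_entropy = -log pi(a | s), so R * cross_entropy = -R * log pi(a | s),

which is exactly the loss = – R( … ) from the slides. Minimizing it raises log pi(a | s) when R is positive and lowers it when R is negative, so the minus sign is in the code after all, just folded into the cross entropy.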
One day to converge? That's a lot! Imagine a more complex problem.
This is incredible!! Thank you for the great talk!!
Please introduce TensorFlow for 32-bit PCs.
Thank you for the insight. I was able to successfully apply your approach to problems in OpenAI Gym.
Policy Gradient
I can't quite get the loss function to work with TF2.0,
loss = tf.reduce_sum(R * cross_entropies)
model.compile(optimizer="Adam", loss=loss, metrics=['accuracy'])
TypeError: Using a `tf.Tensor` as a Python `bool` is not allowed. Use `if t is not None:` instead of `if t:` to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.
Anyone got some advice? Thanks!
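In case it helps: Keras' compile() expects a loss function (a callable), not an already-built tensor, which is what the TypeError is complaining about. For a reward-weighted policy-gradient loss, a custom training step is usually simpler. A rough sketch, assuming TF 2.x and the 80x80 input / three-action setup from the talk (train_step and the variable names are mine):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(200, activation="relu", input_shape=(80 * 80,)),
    tf.keras.layers.Dense(3),   # logits for UP / DOWN / STILL
])
optimizer = tf.keras.optimizers.Adam()

def train_step(observations, actions, rewards):
    # observations: [batch, 6400] floats, actions: [batch] ints, rewards: [batch] processed rewards
    with tf.GradientTape() as tape:
        logits = model(observations, training=True)
        cross_entropies = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=actions, logits=logits)
        loss = tf.reduce_sum(rewards * cross_entropies)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss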
I must say something about the math. There are two ways of teaching ML:
1. Require the underlying math from your audience as a prerequisite.
2. Explain the needed math during the lecture.
Ignoring the math, or showing it without explaining it carefully and in detail, is not helpful.
I'm a 3rd-year college student and this is not a clear lecture for me.
All I got is that TensorFlow can do reinforcement learning with a NN, and that we use softmax in the last layer.
What I'm missing is the full understanding of the pipeline/graph and the derivation part.
My parents' scarf store in San Francisco was just about to go bankrupt because nobody buys scarves in SF. Then Martin Gorner moved there, and the business is thriving again!