Reinforcement Learning: Crash Course AI#9

October 11, 2019
Artis Modus

CrashCourse

Reinforcement learning is particularly useful in situations where we want to train AIs to have certain skills we don’t fully understand ourselves. Unlike some of the techniques we’ve discussed so far, reinforcement learning generally only looks at how an AI performs a task AFTER it has completed it. Once an AI completes that task, figuring out when and how to reward it, a problem called credit assignment, is one of the hardest parts of reinforcement learning. So today, we’re going to explore these ideas and introduce a ton of new terms like value, policy, agent, environment, actions, and states. We’ll also show you how we can use strategies like exploration and exploitation to train John Green Bot to find things more efficiently next time.
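The terms in this description (agent, environment, states, actions, a reward that only arrives once the task is done, credit assignment, value, policy, and the exploration/exploitation trade-off) can be illustrated with a small tabular Q-learning sketch. This is a minimal sketch under stated assumptions, not the method shown in the video: the grid size, the start and battery positions, and the parameters `alpha`, `gamma`, and `epsilon` are hypothetical choices for illustration. Q-learning with an epsilon-greedy policy is a standard technique swapped in here; it stores one value per (square, direction) pair, whereas the video appears to keep a single value per square.

```python
import random

# Hypothetical grid world: John Green Bot starts in one corner and the
# battery sits in the opposite corner. Sizes and positions are illustrative.
ROWS, COLS = 3, 4
START, GOAL = (0, 0), (2, 3)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Environment: move one square; bumping into a wall leaves the agent in place."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), ROWS - 1)
    c = min(max(state[1] + dc, 0), COLS - 1)
    next_state = (r, c)
    # The reward arrives only when the task is complete (battery found).
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def greedy(q, state, rng):
    """Exploit: pick the highest-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    rng = random.Random(seed)
    # One value per (square, direction) pair, initialised to zero.
    q = {((r, c), a): 0.0 for r in range(ROWS) for c in range(COLS) for a in ACTIONS}
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            # Explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))
            else:
                action = greedy(q, state, rng)
            next_state, reward, done = step(state, action)
            # Credit assignment: the discounted value of the next square flows
            # back to this move, so earlier moves share credit for the reward.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the learned policy (pure exploitation) from START."""
    rng = random.Random(1)
    state, path = START, [START]
    for _ in range(max_steps):
        if state == GOAL:
            break
        state, _, _ = step(state, greedy(q, state, rng))
        path.append(state)
    return path


q_table = train()
path = greedy_path(q_table)
print(path)
```

With these settings the learned greedy path is expected to take a five-move shortest route from corner to corner. Raising `epsilon` makes the bot explore more during training; lowering it makes it exploit what it already knows sooner, at the risk of settling on a worse route.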

Crash Course AI is produced in association with PBS Digital Studios:
https://www.youtube.com/user/pbsdigitalstudios/videos

Crash Course is on Patreon! You can support us directly by signing up at http://www.patreon.com/crashcourse

Thanks to the following patrons for their generous monthly contributions that help keep Crash Course free for everyone forever:

Eric Prestemon, Sam Buck, Mark Brouwer, Indika Siriwardena, Avi Yashchin, Timothy J Kwist, Brian Thomas Gossett, Haixiang N/A Liu, Jonathan Zbikowski, Siobhan Sabino, Zach Van Stanley, Jennifer Killen, Nathan Catchings, Brandon Westmoreland, dorsey, Kenneth F Penttinen, Trevin Beattie, Erika & Alexa Saur, Justin Zingsheim, Jessica Wode, Tom Trval, Jason Saslow, Nathan Taylor, Khaled El Shalakany, SR Foxley, Sam Ferguson, Yasenia Cruz, Eric Koslow, Caleb Weeks, Tim Curwick, David Noe, Shawn Arnold, William McGraw, Andrei Krishkevich, Rachel Bright, Jirat, Ian Dundore
—

Want to find Crash Course elsewhere on the internet?
Facebook – http://www.facebook.com/YouTubeCrashCourse
Twitter – http://www.twitter.com/TheCrashCourse
Tumblr – http://thecrashcourse.tumblr.com
Support Crash Course on Patreon: http://patreon.com/crashcourse

CC Kids: http://www.youtube.com/crashcoursekids

#CrashCourse #ArtificialIntelligence #MachineLearning

34 thoughts on “Reinforcement Learning: Crash Course AI#9”

Hassler Castro says:

October 11, 2019 at 6:22 pm

Nice,
The GenXican says:

October 11, 2019 at 8:12 pm

Communist Indoctrination!!!!!
Alexi Xeno says:

October 11, 2019 at 10:37 pm

I feel there are people in place of power/rich who need to watch this video… >.>
Antti Björklund says:

October 12, 2019 at 12:58 am

Why would JohnGreenBot in that battery example only go in straight lines? Would it not be better to go in a diagonal path?
Sarujan Rupan says:

October 12, 2019 at 1:29 am

It's Jabril!!!
Ab9 Fat Reza says:

October 12, 2019 at 5:41 am

Thanks for your awesome course, I got interested in machine learning and I am planning to study it for my M.A.
gitoshri sen says:

October 12, 2019 at 7:11 am

This reminds me of Pavlov from my psychology class.
Pranjal Jaiswal says:

October 12, 2019 at 7:25 am

"A trade-off between exploration and exploitation" – That's life
Kurt Oehlberg says:

October 12, 2019 at 8:10 am

I don't agree with the bagel/donut choice example. Why choose the option of two bagels or donuts vs. the greater risk of more donuts (6) or a guaranteed single donut?
Pedro Martins says:

October 13, 2019 at 1:50 am

I'm really liking this series of videos
Keep up the good work 🙂
Saadia Hayat says:

October 13, 2019 at 4:36 am

It's a bit off-topic, but can you plz give me a written text of all the Crash Course US History videos… if possible. PLEASE
Raj J. says:

October 13, 2019 at 7:55 am

Are you going to use OpenAI for RL and Keras when we come to deep reinforcement learning?
When will this playlist be finished?
Jany JJ says:

October 13, 2019 at 11:39 am

I WOULD LOVEEEE IT IF CRASH COURSE HAD AN ACCOUNTING COURSE!!❤️️.
Matt Wyman says:

October 14, 2019 at 9:38 am

Is there a better reason than reducing the total amount of stored data for why we only store a single value per square? Why not store 4 values per square, one for each direction you could go from the current spot? That way you could find/exploit the shortcut near the black hole that the current algorithm is too scared to find.
Dan of Xymox says:

October 14, 2019 at 7:34 pm

Jabril? Jabril? Laughing too much to type. Hey, I'm Jabril. Unreal, these people.
Reginald Robust says:

October 15, 2019 at 3:31 am

Like if AI beats slavery
Nøah Hale says:

October 15, 2019 at 10:43 pm

Can we get crash course music theory?
Shashank Sam says:

October 16, 2019 at 4:14 am

yay, this was actually better than most of the explanatory videos i have seen. thanks for always providing us with informative content, crash course. looking forward to more of these videos <3
Jessica S says:

October 16, 2019 at 11:41 am

BTW this channel sucks
Ian Buck says:

October 21, 2019 at 2:04 pm

Not sure the kitchen metaphor works for me. Why is the bag more likely to contain donuts than the box? It sure looked like the kind of box that donuts come in to me.
bill Niko says:

October 23, 2019 at 5:32 am

Can we get a computation theory lesson, CC?
bill Niko says:

October 23, 2019 at 5:35 am

Love you love you love you love ❤️
Jimmy Bangus says:

October 26, 2019 at 3:51 am

That was a robot playing Don't Wake Daddy.
Piu John says:

October 27, 2019 at 3:09 am

Markov decision process and Q learning, fcking tedious
John Opalko says:

November 11, 2019 at 10:49 am

This fellow and his donut obsession. I don't know… 😊
Recoded Zaphod says:

November 16, 2019 at 1:37 am

Now, after watching multiple episodes in a row, I really want donuts 😛 also really enjoying this series 🙂
COOPSTOP says:

November 25, 2019 at 10:48 pm

Agent? like….. Agent Smith???!!!
Mr Professor says:

December 12, 2019 at 8:33 am

Who drives a car looking sideways?
Void Skeleton says:

December 28, 2019 at 4:31 am

5:11 I'll just take all three items
Trent NaPa says:

March 4, 2020 at 2:10 pm

In the John Green Bot example, is the objective to find the shortest path or to get the most points? What would getting more points even do? I feel like in that case exploration is best so that you can find the shortest path, exploiting only when racing another bot.
Cornell Waters says:

April 15, 2020 at 5:39 pm

🏃 Thank You!
Argon Air says:

May 9, 2020 at 12:01 am

Open Ai and Alpha Go
chanel de la rosa says:

May 19, 2020 at 8:34 am

Thank you!
Amit Ramnarain says:

June 13, 2020 at 11:58 am

Is this related to Dijkstra's or A*?