CrashCourse
Reinforcement learning is particularly useful in situations where we want to train AIs to have certain skills we don’t fully understand ourselves. Unlike some of the techniques we’ve discussed so far, reinforcement learning generally only looks at how an AI performs a task AFTER it has completed it. Figuring out when and how to reward the AI once it completes that task, a problem called credit assignment, is one of the hardest parts of reinforcement learning. So today, we’re going to explore these ideas and introduce a ton of new terms like value, policy, agent, environment, actions, and states, and we’ll show you how we can use strategies like exploration and exploitation to train John Green Bot to find things more efficiently next time.
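To make those terms concrete, here is a minimal sketch of an agent learning in a tiny environment. It is not the code from the video: the five-square corridor, the reward of 1 at the goal, the use of tabular Q-learning, and every name and parameter below (`step`, `choose_action`, `train`, `alpha`, `gamma`, `epsilon`) are assumptions picked for illustration. The agent explores with probability `epsilon` and otherwise exploits the best value it has learned so far.

```python
import random

N_STATES = 5          # hypothetical corridor: squares 0..4, goal at square 4
ACTIONS = (-1, +1)    # actions: step left or step right
GOAL = N_STATES - 1

def step(state, action):
    """Environment: apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy policy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)                       # exploration
    best = max(q[(state, a)] for a in ACTIONS)
    # exploitation, breaking ties between equally good actions at random
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: learn a value for every (state, action) pair."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(200):                             # cap on episode length
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # credit assignment: pass discounted future value back one step
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
            if done:
                break
    return q

def greedy_policy(q):
    """Best learned action for each non-goal square."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]

if __name__ == "__main__":
    print(greedy_policy(train()))
```

The reward only arrives at the goal, so the discount factor `gamma` is what spreads credit back to earlier squares: moves closer to the goal end up with higher values than moves farther away.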
Crash Course AI is produced in association with PBS Digital Studios:
https://www.youtube.com/user/pbsdigitalstudios/videos
Crash Course is on Patreon! You can support us directly by signing up at http://www.patreon.com/crashcourse
Thanks to the following patrons for their generous monthly contributions that help keep Crash Course free for everyone forever:
Eric Prestemon, Sam Buck, Mark Brouwer, Indika Siriwardena, Avi Yashchin, Timothy J Kwist, Brian Thomas Gossett, Haixiang N/A Liu, Jonathan Zbikowski, Siobhan Sabino, Zach Van Stanley, Jennifer Killen, Nathan Catchings, Brandon Westmoreland, dorsey, Kenneth F Penttinen, Trevin Beattie, Erika & Alexa Saur, Justin Zingsheim, Jessica Wode, Tom Trval, Jason Saslow, Nathan Taylor, Khaled El Shalakany, SR Foxley, Sam Ferguson, Yasenia Cruz, Eric Koslow, Caleb Weeks, Tim Curwick, David Noe, Shawn Arnold, William McGraw, Andrei Krishkevich, Rachel Bright, Jirat, Ian Dundore
#CrashCourse #ArtificialIntelligence #MachineLearning
Nice.
Communist Indoctrination!!!!!
I feel there are people in place of power/rich who need to watch this video… >.>
Why would JohnGreenBot in that battery example only go in straight lines? Would it not be better to go in a diagonal path?
It's Jabril!!!
Thanks for your awesome course. I got interested in machine learning and I am planning to study it for my M.A.
This reminds me of Pavlov from my psychology class.
"A trade-off between exploration and exploitation" – That's life
I don't agree with the bagel/donut choice example. Why choose the option of two bagels or donuts vs. the greater risk of more donuts (6) or a guaranteed single donut?
I'm really liking this series of videos
Keep up the good work 🙂
It's a bit off topic, but can you please give me a written transcript of all the Crash Course US History videos, if possible? PLEASE
Are you going to use OpenAI for RL and Keras when we come to deep reinforcement learning?
When will this playlist be finished?
I WOULD LOVEEEE IT IF CRASH COURSE HAD AN ACCOUNTING COURSE!!❤️️.
Is there a better reason for storing only a single value per square than reducing the total amount of stored data? Why not store 4 values per square, one for each direction you could go from the current spot? That way you could find/exploit the near-black-hole shortcut that the current algorithm is too scared to find.
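Storing one value per direction for each square, as this comment suggests, is essentially what reinforcement learning calls an action-value table or Q-table. A minimal sketch of the difference, using made-up numbers for a hypothetical square that sits next to the black hole (none of these values or names come from the video):

```python
# Hypothetical learned values for ONE square beside the black hole.
# With four values per square, the risky direction is marked on its own.
q_values = {
    "up": 0.7,
    "down": 0.2,
    "left": -1.0,   # assumed: stepping left falls into the black hole
    "right": 0.9,
}

def state_value(q_for_square):
    """Collapse the per-direction values into a single value per square."""
    return max(q_for_square.values())

def best_action(q_for_square):
    """Direction with the highest learned value from this square."""
    return max(q_for_square, key=q_for_square.get)

if __name__ == "__main__":
    print(state_value(q_values), best_action(q_values))
```

With per-direction values, a square next to a hazard can still rate highly as long as at least one safe direction leads somewhere good. The cost is a table four times as large that takes more experience to fill in.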
Jabril? Jabril? Laughing too much to type. Hey, I'm Jabril. Unreal, these people.
Like if AI beats slavery
Can we get crash course music theory?
Yay, this was actually better than most of the explanatory videos I have seen. Thanks for always providing us with informative content, Crash Course. Looking forward to more of these videos <3
BTW this channel sucks
Not sure the kitchen metaphor works for me. Why is the bag more likely to contain donuts than the box? It sure looked like the kind of box that donuts come in to me.
Can we get a computation theory lesson, CC?
Love you love you love you love ❤️
That was a robot playing Don't Wake Daddy.
Markov decision process and Q learning, fcking tedious
This fellow and his donut obsession. I don't know… 😊
Now, after watching multiple episodes in a row, I really want donuts 😛 Also really enjoying this series 🙂
Agent? like….. Agent Smith???!!!
Who drives a car looking sideways?
5:11 I'll just take all three items
In the John Green Bot example, is the objective to find the shortest path or to get the most points? What would getting more points even do? I feel like in that case exploration is best so that you can find the shortest path, exploiting only when racing another bot.
🏃 Thank You!
OpenAI and AlphaGo
Thank you!
Is this related to Dijkstra's or A*?