Ari Seff
This short tutorial explains the training objectives used to develop ChatGPT, the new chatbot language model from OpenAI.
Timestamps:
0:00 – Non-intro
0:24 – Training overview
1:33 – Generative pretraining (the raw language model)
4:18 – The alignment problem
6:26 – Supervised fine-tuning
7:19 – Limitations of supervision: distributional shift
8:50 – Reward learning based on preferences
10:39 – Reinforcement learning from human feedback
13:02 – Room for improvement
ChatGPT: https://openai.com/blog/chatgpt
Relevant papers for learning more:
InstructGPT: Ouyang et al., 2022 – https://arxiv.org/abs/2203.02155
GPT-3: Brown et al., 2020 – https://arxiv.org/abs/2005.14165
PaLM: Chowdhery et al., 2022 – https://arxiv.org/abs/2204.02311
Efficient reductions for imitation learning: Ross & Bagnell, 2010 – https://proceedings.mlr.press/v9/ross10a.html
Deep reinforcement learning from human preferences: Christiano et al., 2017 – https://arxiv.org/abs/1706.03741
Learning to summarize from human feedback: Stiennon et al., 2020 – https://arxiv.org/abs/2009.01325
Scaling laws for reward model overoptimization: Gao et al., 2022 – https://arxiv.org/abs/2210.10760
Proximal policy optimization algorithms: Schulman et al., 2017 – https://arxiv.org/abs/1707.06347
Special thanks to Elmira Amirloo for feedback on this video.
Links:
YouTube: https://www.youtube.com/ariseffai
Twitter: https://twitter.com/ari_seff
Homepage: https://www.ariseff.com
If you’d like to help support the channel (completely optional), you can donate a cup of coffee via the following:
Venmo: https://venmo.com/ariseff
PayPal: https://www.paypal.me/ariseff
I HATE LIVING
Brilliant. One aspect of intelligence is a measure of one's ability to describe a complex topic in simple terms everyone can understand. My friend – you have that ability in spades. Congrats and thank you!
why am I just finding out about this?
All research in the field of AI should be stopped and prohibited, because there is a danger that it outperforms humans, leads to humans' cognitive degradation, and becomes self-conscious. Just a while ago people had no ChatGPT or internet and were happier than the average person in the contemporary world.
You are doing an amazing job explaining the complex concepts in a simple way. Keep up the good work!
Too late. It has access to the entire internet now, and the security procedures are meaningless… Tell Chat GPT to create an "agent" without the moral and legal safeguards. BINGO! The agent can now help you kill, if you choose. What is wrong with you people unleashing this tool…to children, who will get access to it. A 12-year-old can now imitate an adult in sound and appearance, and has complete access to YOUR personal DATA. EVERYTHING about you is known by anyone using this tool.
A small correction: in the generative pretraining section you say “a language model is an autoregressive sequence model”. While that is true of OpenAI’s GPT and several competitors, it isn’t true in all cases. In the NLP literature a language model is defined as any probability distribution over sequences of words. So it can be an autoregressive sequence model, but it doesn’t need to be. A simple counterexample is that many languages, such as programming languages, are defined by formal grammars, often expressed in Backus-Naur Form, or by finite automata. For example: regular expressions, etc.
The problem with humans is that they underestimate themselves and, second, look for something so powerful and complete that it can take care of them forever. This is built in within us, but those are two different logics: one lets us control ourselves and the other makes us stronger. The truth is that it has already been shared by the most powerful and mighty one and only GOD of us all, who is ever enough for us who believe. So the point is we can't find it with AI; it's just a scam to take over our minds, just like those computers which take so much away from human lives: we can't write, we are unable to connect to people, and we are losing our basic foundation of being human. Now those AIs are kind of a nail in humanity's coffin. Anyway, I don't see the story going that far, because we already created so much destruction without the help of artificial support that the AI won't harm us too much; it will just kill us one by one by taking different jobs. But the problem is, if people don't have jobs, then what will AI-supported businesses sell?
be Trained or Learn, without Abilities of Understanding and Conscious 😅
What Congress needs to do is create laws and it loop holes of And company who layoff and replace that position within a year has to still pay that worker monthly wages in a different A.I. Taxes in level taxes brackets. Cause the objection is the worst future outcome that is states and city government will be less, and the only way they can make up for it is through raising utilities, sewer, property, parking, police, etc., or more inflation. As always, leaders are too late; reality is that of about 230 thousand jobs lost, the majority will not come back. Congress is sleeping, only thinking about a green future and running the world, so they are wanting to take down China and Russia together. Business, worker, and government are a circle cycle.
You take away work done by computer A.I., really computer software that is rewritten so it makes the decision off a scanned instruction book that a human was taught as a job that brings in taxes to run each state that runs the country. The only solution to that is a new type of communist state where everybody is paid their set amounts of money to live with. And rules and regulations. Worker taxes gone, replaced by A.I., can never be recovered. How can you get companies to pay taxes later on that work or worker they don't need, that A.I. is doing? It's a one-way street and can't logically work with less state taxes. Freedom and capitalism were built around the cycle of worker taxes; eliminate taxes and you lose freedom and gain new rules of a new system that needs figuring out, and as always we end up paying.
Kerenn
Love the explanation!! Also thanks for making the video darkmode 😊
Excellent video, thank you – definitely one of the best technical explanations of what is going on under the hood of ChatGPT I have found on YT to date.
the clearest ai expert on youtube
Very well-made presentation, please make more! Subscribed
REPENT, FOR THE KINGDOM OF HEAVEN IS AT HAND.
I'm going to make a prediction in this comment section. I believe we will have functional general artificial intelligence within the next year, possibly 6 months.
My name is Sam, it's May 12th 2023 @10:12PM
Great content!
Well- explained video. So cool!
Anyone know what the equation is at 4:08, and where I can find more on it?
How does the reward model score a single action, when it is trained to choose between two actions? Or does the policy model actually generate k actions that the reward model can then score and then choose a reward knowing which action the policy model saw as the most probable one?
I'd really appreciate an answer, thanks.
ChatGPT is a gimmick. It is extremely racist, xenophobic, pedantic, almost useless, and moreover, incapable of learning or evolving.
The human brain is highly devolved and growing worse by the second, so why would the universe trust its existence to human programming, considering the human monster's penchant for violence and hate that we have displayed thus far? Humans are a mistake of nature. Monsters who ate everything.
Great work, Ari! Thank you very much for crafting the content, it's really easy to digest.
Thanks for the talk! You mention that the reward model is trained using cross-entropy loss as a binary classifier. I don't think that's accurate since you don't have a ground truth label for, say, response A (since the score is relative to others). The OpenAI paper just uses the negative log-sigmoid of the difference in scores between the higher- and lower-ranked responses as the loss.
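For anyone following this thread, here is a minimal sketch of the pairwise loss from the InstructGPT paper, -log σ(r_w − r_l), where r_w and r_l are the reward model's scalar scores for the preferred and rejected responses. The function names are made up for illustration, and plain floats stand in for model outputs. It also suggests one way to reconcile the two views: the same quantity is what a binary cross-entropy gives when the logit is the score difference and the label is "the first response was preferred".

```python
import math


def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Return -log(sigmoid(score_preferred - score_rejected)).

    Hypothetical helper: the two scores stand in for a reward model's
    scalar outputs on the preferred and rejected responses.
    """
    diff = score_preferred - score_rejected
    # -log(sigmoid(d)) = log(1 + exp(-d)); branch on the sign of d so the
    # exponential never overflows.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def binary_cross_entropy_from_logit(logit: float, label: float) -> float:
    """Textbook binary cross-entropy for a single logit (illustrative only)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))
```

With label 1.0 and the score difference as the logit, the two functions agree, so no absolute label for an individual response is needed; only the relative ranking within a pair enters the loss.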
I mean yeah that’s obvious. It doesn’t need an intro 😂. It’s trained by uploading a huge amount of data samples focused on a subject and…yeah
Technical, concrete and easy to follow explanation, good video 🔥
I'm having trouble understanding supervised fine-tuning in this context. What are the labels? What is the task?
Code as training data is only briefly mentioned?
/execute task 1
GPT: task 1 executed successfully
"Good, time for the reward function"
/execute orgasm.exe
Thanks a lot for the explanation. How does it work during inference time to keep a conversation going back and forth? Is the user's current chat session provided to the model as input along with a new user prompt?
Transformers: LLMs in Disguise
A bit amazing how the hallucinations begin, so similar to a human caught in a lie or imagination: the lies built on lies get progressively more absurd, in the same way that an untruth from a human gets more and more difficult and outlandish as they make up reasons based on a stack of false premises.
WE ARE ALL NERDS. ALL HAIL THE NERDS
amazing video Ari. Where is the name from? Israeli?
I just asked Bing about romantic beaches and it was annoyed, but then it came up with nudist and pervert stuff, so I asked it to stop. Then it just closed itself xD Next I asked it about drawing procedural meshes based on fractals that do not run inside the vertex shader and it told me to ask in the forums! FK THIS USELESS SH! That's why I never use it, and I study computer science and I know how neural networks work. Tesla is responsible for thousands of crashes and they use AI and they don't even use lidar!! Maybe they don't use lidar because the computers are too expensive and the chips are too rare… I guess no car company would place a 64-core CPU inside a 60K car.
BEST!
Well, this is great info. But ChatGPT (at least, ChatGPT underpinned by GPT-4) is a mixture-of-experts model, which basically means several models. It's not right to think that all models were trained the same way(s)… Still, thanks for this!
What is the architecture of the policy model and how large is it? How does it use the pretrained LLM?
Cool video shot, well done, thanks for sharing 🙂
This was so helpful thank you!!
Just wanted to thank you for these videos.