Lucidate
Fine-tuning GPT-3.
In this video we demonstrate how to use APIs and audio to create prompts and completions that can be used to fine-tune transformers such as GPT-3. We show how to use the News API with Python to extract news articles and create a dataset for training and fine-tuning models.
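As a rough sketch of the article-extraction step (assuming a News API key from newsapi.org stored in a NEWS_API_KEY environment variable; the query term is just an example), fetching articles with the requests library looks something like this:

import os
import requests

API_KEY = os.environ["NEWS_API_KEY"]  # assumes your key lives in an environment variable
url = "https://newsapi.org/v2/everything"
params = {
    "q": "monetary policy",  # example search term - swap in your own topic
    "language": "en",
    "pageSize": 20,
    "apiKey": API_KEY,
}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
articles = response.json()["articles"]

for article in articles:
    print(article["title"])
    print(article["description"])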
We explain what prompt and completion datasets are and the role they play in fine-tuning transformers for natural language processing. We then discuss using video and audio streams to build data pipelines for fine-tuning, transcribing speech to text with OpenAI’s Whisper.
With over 500 hours of video uploaded to YouTube every minute, video is a rich source of detailed information with which to train AI.
The video first includes a step-by-step demonstration of using the News API to create a prompt-and-completion dataset, then shows how to build a pipeline that transcribes speech to text using Whisper.
Foundation models like GPT-3 can be ‘fine-tuned’. Fine-tuning allows them to learn the vocabulary and semantics of particular disciplines. If you want your AI solution to be familiar with options pricing algorithms, then you can show it lots of options pricing papers. If you’d like your AI to be able to converse and write about monetary policy, then you can expose it to policy papers, books on economic theory, central bank meeting minutes and news conferences.
You train transformer AI models in a particular way. You don’t simply feed in a file containing the text of the book or article that you are using to train the AI. Instead, you break the text up into a series of “prompts” and “completions”. Prompts are the beginning of a passage of text and completions are what follows a prompt.
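As a simple illustration (not necessarily the exact split used in the video), here is one way to turn passages into prompt/completion pairs in the JSONL format OpenAI’s fine-tuning tooling expects; the 70/30 split point and the sample passage are placeholders:

import json

def to_prompt_completion(passage: str, split_ratio: float = 0.7) -> dict:
    # Split one passage: the beginning becomes the prompt, the rest the completion.
    words = passage.split()
    cut = max(1, int(len(words) * split_ratio))
    return {
        "prompt": " ".join(words[:cut]),
        "completion": " " + " ".join(words[cut:]),  # leading space, per OpenAI's guidance
    }

passages = [
    "The Federal Open Market Committee decided to raise the target range "
    "for the federal funds rate by 25 basis points.",
]

with open("training_data.jsonl", "w") as f:
    for passage in passages:
        f.write(json.dumps(to_prompt_completion(passage)) + "\n")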
In this post we leverage the breadth and depth of content on YouTube. We propose a strategy that uses selected videos to train our AI language models, chosen because they are rich in expertise in the area we want to train our language model in. The example used in this video is to train our AI in monetary policy by studying content generated by the Federal Reserve. We devise a strategy for speech-to-text transcription of YouTube videos, using OpenAI’s ‘Whisper’ to perform the transcription. We then break this text up into ‘prompts’ and ‘completions’ to train our model.
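The full Transcriber class is linked below; as a rough independent sketch of the same idea (using yt-dlp to grab the audio, which may differ from the approach taken in the video), the pipeline looks something like this:

import yt_dlp     # pip install yt-dlp
import whisper    # pip install openai-whisper (also requires ffmpeg)

URL = "https://www.youtube.com/watch?v=3Iv4aCN0OOo"  # FOMC press conference from the links below

# Download the best-quality audio stream and convert it to mp3 via ffmpeg.
ydl_opts = {
    "format": "bestaudio/best",
    "outtmpl": "fomc_audio.%(ext)s",
    "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}],
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download([URL])

# Transcribe the audio. 'base' is fast; 'medium' or 'large' are more accurate.
model = whisper.load_model("base")
result = model.transcribe("fomc_audio.mp3")
print(result["text"])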
Following this approach, we can easily train our NLP models to have expertise in any field we choose.
#NLP #Python #FineTuning #AI #MachineLearning #NewsAPI #Whisper #Lucidate #chatgpt #gpt3 #openai
Links:
Attention Mechanism in Transformers: https://youtu.be/sznZ78HquPc
Transformer playlist: https://youtube.com/playlist?list=PLaJCKi8Nk1hwaMUYxJMiM3jTB2o58A6WY
Federal Reserve Description: https://youtu.be/lJTmCv2kKKs
FOMC Feb 2023 Press Conference: https://www.youtube.com/watch?v=3Iv4aCN0OOo&t=2s
Python code for Transcriber class: https://github.com/mrspiggot/prompts_and_completions/blob/master/transcriber.py
=========================================================================
Link to introductory series on Neural networks:
Lucidate website: https://www.lucidate.co.uk/blog/categ…
YouTube: https://www.youtube.com/playlist?list…
Link to intro video on ‘Backpropagation’:
Lucidate website: https://www.lucidate.co.uk/post/intro…
YouTube: https://youtu.be/8UZgTNxuKzY
‘Attention Is All You Need’ paper – https://arxiv.org/pdf/1706.03762.pdf
=========================================================================
Transformers are a type of artificial intelligence (AI) used for natural language processing (NLP) tasks, such as translation and summarisation. They were introduced in 2017 by Google researchers, who sought to address the limitations of recurrent neural networks (RNNs), which had traditionally been used for NLP tasks. RNNs are difficult to parallelise and tend to suffer from the vanishing/exploding gradient problem, making them hard to train on long input sequences.
Transformers address these limitations by using self-attention, a mechanism which allows the model to selectively choose which parts of the input to pay attention to. This makes the model much easier to parallelise and avoids the vanishing/exploding gradient problem that plagues long recurrent chains.
Self-attention works by weighting the importance of different parts of the input, allowing the AI to focus on the most relevant information and better handle input sequences of varying lengths. This is accomplished through three matrices: Query (Q), Key (K) and Value (V). The Query can be interpreted as the word for which attention is being calculated, while the Key can be interpreted as the word to which attention is paid. The scaled dot product of the Query and Key matrices, passed through a softmax, gives the attention scores, which are then used to form a weighted sum of the Values.
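A minimal NumPy sketch of the scaled dot-product attention described above (the matrix sizes are arbitrary placeholders standing in for projected word embeddings):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # attention-weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 tokens, 8-dimensional queries
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)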
=========================================================================
#ai #deeplearning #chatgpt #gpt3 #neuralnetworks #attention #attentionisallyouneed
This technique looks great; however, I don’t understand the use case 100%. Ultimately, what can we do with a GPT-3 model that was fine-tuned on 500 hours of monetary content? If you fine-tune a model, you are building on one of OpenAI’s base models such as davinci, ada, etc., not on ChatGPT. In other words, you are adding more up-to-date content and new vocabulary, but once trained, the model can no longer work as a chat agent similar to ChatGPT. Could you provide insight into what you can do with the model you fine-tuned in your video, so we can understand the use case? Thank you!
Hi Richard, your videos are amazing. I can't wait to connect with you and tell you about my small pet project that I have recently started. I have a full-time job so I usually work on it on weekends and holidays.
😊
awesome video, thank you so much
Awesome video! I want to know: if I don’t provide a prompt and just give many completions, how does it work?
Wouldn’t a vector database of the data, using Ada embeddings for semantic search to retrieve context to add to the prompt, be cheaper for a researcher, for example?
I would guess you could get data into the system much faster, and you’d have far more control over the formatting of the data you want answers on.
I’m still very confused about all of this, so please take what I say with a teaspoon of salt 😂
I used the requests library to query the CoinStats API to get real-time crypto prices, then dumped that into a conversation history that can be re-fed into a prompt to give historical context.
Very good video, despite the annoying music 🙂 My question is: you are just randomly splitting the text into prompt and completion. Is that logical? Shouldn’t we take more care with it? The completion should be a response to the prompt; if we just split randomly between prompt and completion, might it miss the point?
Amazing video, thanks a lot!
I cannot emphasize enough how much your video has helped me, both in my studies and my extracurricular stuff, thank you!
Can you share your presentation?
Great video, thanks!
This is a truly remarkable piece of content, filled with insights that rival those of top-tier media outlets like National Geographic and Discovery Channel.