Jay Alammar
AI/ML has seen a rapid acceleration in model improvement over the last few years. The majority of state-of-the-art models in the field are based on the Transformer architecture. Examples include BERT (which, when applied to Google Search, resulted in what Google calls “one of the biggest leaps forward in the history of Search”) and OpenAI’s GPT-2 and GPT-3 (which are able to generate coherent text and essays).
This video, by the author of the popular “The Illustrated Transformer” guide, introduces the Transformer architecture and its various applications. It is a visual presentation accessible to people with various levels of ML experience.
Intro (0:00)
The Architecture of the Transformer (4:18)
Model Training (7:11)
Transformer LM Component 1: FFNN (10:01)
Transformer LM Component 2: Self-Attention (12:27)
Tokenization: Words to Token IDs (14:59)
Embedding: Breathe meaning into tokens (19:42)
Projecting the Output: Turning Computation into Language (24:11)
Final Note: Visualizing Probabilities (25:51)
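As a rough companion to the chapters above, here is a minimal sketch of the pipeline they cover (tokenization into token IDs, embedding, the self-attention/FFNN blocks, projecting the output back onto the vocabulary, and visualizing the resulting probabilities). It assumes the Hugging Face transformers library and the public “gpt2” checkpoint; it is a sketch for orientation, not the linked notebook itself.

# Minimal sketch (assumes: pip install torch transformers)
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Tokenization: words -> token IDs (14:59)
inputs = tokenizer("The Shawshank", return_tensors="pt")

# Embedding, the self-attention + FFNN blocks, and projecting the output
# back onto the vocabulary (19:42 / 24:11) all happen inside the model.
with torch.no_grad():
    logits = model(**inputs).logits        # shape: [1, seq_len, vocab_size]

# Visualizing probabilities (25:51): softmax over the last position's logits
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item()):>15}  {p.item():.3f}")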
The Illustrated Transformer:
https://jalammar.github.io/illustrated-transformer/
Simple transformer language model notebook:
https://github.com/jalammar/jalammar.github.io/blob/master/notebooks/Simple_Transformer_Language_Model.ipynb
Philosophers On GPT-3 (updated with replies by GPT-3):
https://dailynous.com/2020/07/30/philosophers-gpt-3/
-----
Twitter: https://twitter.com/JayAlammar
Blog: https://jalammar.github.io/
Mailing List: http://eepurl.com/gl0BHL
More videos by Jay:
Jay’s Visual Intro to AI
https://www.youtube.com/watch?v=mSTCzNgDJy4
How GPT-3 Works – Easily Explained with Animations
https://www.youtube.com/watch?v=MQnJZuBGmSQ
Making Money from AI by Predicting Sales – Jay’s Intro to AI Part 2
https://www.youtube.com/watch?v=V4-lX…
🔥❤️
❤️ That library!!!!
Finally, you're here!
Waiting for The Illustrated Transformer to be updated with these lovely new visualizations.
This is a noob question I was curious about while watching the video: how is it unsupervised pre-training when you are actually providing the correct output (label) at the end?
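(For context, not an official reply: the “label” in this kind of pre-training comes from the raw text itself rather than from human annotation, which is why it is often described as self-supervised; each next word is the target for the words that precede it. A tiny illustrative sketch in plain Python, with a made-up sentence:)

# Illustrative only: how (input, target) pairs fall out of raw text
# in next-word-prediction pre-training; no human labeling involved.
tokens = ["The", "Shawshank", "Redemption", "is", "a", "great", "film"]

pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs:
    print(" ".join(context), "->", target)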
Amazing video. Have to admit that every time I heard the wrong pronunciation of "Shawshank" it did feel a bit like nails on a blackboard but easily forgivable. Jay, your resources and videos are phenomenal 🙂 Thank you for putting in the work to help us all out.
Wow, that's a really good video on Transformers. How did you get that cool output display in the Jupyter notebook?
Nice collection of albums, man! Miles Davis, Radiohead, John Coltrane, very classy! 👏👏👏
Watching it now, thanks so much! It's really helpful to go through these kinds of things with clear examples and explanations.
My only preference would've been to reduce the volume of the background music in the intro. So many podcasts do this and it's an annoying trend!
👏 👏 👏
Hey Jay, first of all, thank you so much for your blog post and the video; they are awesome. I still have a question about going from the last step of the transformer to the logits. In fact, I am mostly interested in the same step during BERT pre-training, in the masked language model case. Concretely, the case you are discussing is clear: you take an embedding of the sentence (however it is produced out of the embeddings of the words) and feed it into a linear layer that blows it up into a softmax over 50,000 tokens. In the case of the BERT masked language model, you have something like "The robot must [MASK] hurt the human" and you hope that it will predict "not". Is it also interpreted as a sentence (using the CLS token embedding or whatever) before it gets fed into the linear layer? I couldn't find a detailed description of this step anywhere. Thanks!
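(A sketch for context rather than an authoritative answer: in the masked-language-model head, the projection onto the vocabulary is applied per token position, not to a single pooled sentence embedding; the final hidden vector at the [MASK] position itself goes through the linear layer and softmax. The real BERT head also adds a dense transform and layer norm and ties the decoder weights to the input embeddings; the sketch below skips that and uses made-up sizes.)

import torch
import torch.nn as nn

vocab_size, hidden_size, seq_len = 50_000, 768, 7   # made-up example sizes

# Final hidden states out of the last transformer block: one vector per
# token position (batch of 1, random numbers standing in for real output).
hidden_states = torch.randn(1, seq_len, hidden_size)

# The LM head projects EACH position's vector onto the vocabulary;
# for masked-LM prediction you read off the logits at the [MASK] position.
lm_head = nn.Linear(hidden_size, vocab_size)
logits = lm_head(hidden_states)                      # [1, seq_len, vocab_size]

mask_position = 3   # where [MASK] sits in "The robot must [MASK] hurt the human"
probs = torch.softmax(logits[0, mask_position], dim=-1)
print(probs.argmax().item(), probs.max().item())     # most likely token id and its probability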
Great video!
Thank you so much for all the tireless work you do for us visual learners out there! I’m looking forward to videos where you get into your excellent visualizations of the underlying matrix operations. Your visual abstractions both at the flow chart level and matrix/vector level have really shaped my mental model for what I think about when I’m engineering models. I’m so grateful and so excited to see what you come out with next (this library you hint at looks wonderful!)
Great video! Best regards from Brazil!
Your blog post The Illustrated Transformer was my intro to deep learning for NLP. Thanks for the amazing contributions to the community.
Please speak at a consistent volume; some words are hard to hear. You have great content.