Peltarion
In this video, we give a step-by-step walkthrough of self-attention, the mechanism powering the deep learning model BERT, and other state-of-the-art transformer models for natural language processing (NLP). More on attention and BERT: https://bit.ly/38vpOyW
How to solve a text classification problem with BERT with this tutorial: https://bit.ly/2Ij6tGa
0:00 Introduction to NLP
0:39 Text tokenization
1:07 Text embedding
2:06 Context and attention
2:25 Self-attention mechanism
5:57 Key, Query, and Value projections
7:25 Multi-head attention
8:12 Building a full NLP network
9:00 Example
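If you want to follow along in code, here is a minimal NumPy sketch of the single-head scaled dot-product self-attention covered at 2:25 and 5:57. The embedding size, projection matrices, and token vectors below are made-up illustrative values, not the ones used in the video.

```python
import numpy as np

def softmax(scores, axis=-1):
    """Numerically stable softmax."""
    scores = scores - scores.max(axis=axis, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings
    W_q, W_k, W_v: (d_model, d_k) query/key/value projections
    Returns (seq_len, d_k) context vectors and (seq_len, seq_len) weights.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity of each query to every key
    weights = softmax(scores, axis=-1)        # one distribution per token (row)
    return weights @ V, weights

# Toy example: 3 tokens, embedding size 4, projection size 4 (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))
context, weights = self_attention(X, W_q, W_k, W_v)
print(weights.round(2))   # attention weights, one row per token
```

Running several such heads with their own projections and concatenating their outputs gives the multi-head attention discussed at 7:25.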
Find Peltarion here:
Website: https://bit.ly/3k2MCIC
Twitter: https://bit.ly/2RJZpnB
Linkedin: https://bit.ly/2FGWkSS
#peltarion #textsimilarity #nlp
This is by far the best video I've ever seen! Awesome content, thanks.
Really amazing video. The visualizations are stellar and the examples really ground the whole thing. Thank you!
As with the normalization, I thought the softmax should be applied row by row, not column by column.
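For what it's worth, in the standard formulation (where row i holds the scores of query i against every key), the softmax is indeed taken over each row, so each token's attention weights sum to 1. A quick check with made-up numbers:

```python
import numpy as np

# Toy (queries x keys) score matrix, values chosen only for illustration
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 3.0, 0.2],
                   [1.0, 1.0, 1.0]])

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

row_wise = softmax(scores, axis=1)   # one distribution per query
print(row_wise.sum(axis=1))          # -> [1. 1. 1.], each row sums to 1
```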
Really good stuff! Visualisations help a lot
Nice video. Much appreciated.
very good illustration
Does anyone have an implementation where I can give it a sentence and the code returns the attention weights for each word?
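One possible starting point, assuming the Hugging Face transformers library (not affiliated with this video): passing output_attentions=True makes the model return the attention weights for every layer and head, which you can then inspect per token. The model name and sentence below are just placeholders.

```python
# Sketch: extract per-token attention weights from a pretrained BERT
# (assumes `pip install transformers torch`; model name is only an example)
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

sentence = "The animal didn't cross the street because it was too tired."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
# outputs.attentions: one tensor per layer, shape (batch, heads, seq_len, seq_len)
last_layer = outputs.attentions[-1][0]   # (heads, seq_len, seq_len)
avg = last_layer.mean(dim=0)             # average over heads

for tok, weights in zip(tokens, avg):
    top = weights.argmax().item()
    print(f"{tok:>12} attends most to {tokens[top]}")
```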
well-explained, thanks so much!
Brilliant explanation!
This is better than 3blue1brown. I went from basically not knowing what self-attention was to understanding how it works in the span of a car ride.
Very well structured explanation and great animations, thanks!
Thank you for the valuable content
Nice high level explanation aided by thoughtful visualization. Thank you!
I love you, man, you saved my life <3
The best explanation I have seen. Much better than how they teach in the top-tier universities.
This was beautifully explained. Thanks 🙂
Subscribed and notifications are on! Really great video and animations, hats off! I know I am asking for more, but having animations of the scalar product being computed would be amazing!
very well explained! the visualizations are on point. Thanks!
Explained with excellent visuals, thank you very much
Amazing visualization, thank you for your work!
7:00 Something I don't quite get.
It makes sense to dot product vectors from the same embedding to determine their similarity, because the components pair up with one another.
But here we are taking the dot product of vectors from two different projected subspaces, so there's no longer any meaningful pairing between the components. I'm not sure what the dot product is doing here?
The explanation kind of suggests that it's just the magnitude of the projected vectors that matters. So if the projected vectors are large in the "preposition" projection and large in the "location" projection, we want them to result in a large value. But a more appropriate operation for this would simply be to calculate the (e.g. Euclidean) norm of the two vectors and multiply them together. No?
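One way to see what the dot product is doing: both projections land in the same d_k-dimensional space, so q_i · k_j = x_i^T W_Q^T W_K x_j is a learned bilinear similarity between the original embeddings, and the alignment of the projected directions matters, not just their magnitudes. A small NumPy sketch with made-up matrices (not from the video) shows two keys with identical norms getting very different scores:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_k = 6, 3
W_q = rng.normal(size=(d_model, d_k))   # illustrative, untrained projections
W_k = rng.normal(size=(d_model, d_k))

x_query, x_a, x_b = (rng.normal(size=d_model) for _ in range(3))

q = x_query @ W_q
k_a = x_a @ W_k
k_b = x_b @ W_k
k_b = k_b * (np.linalg.norm(k_a) / np.linalg.norm(k_b))  # force equal norms

print(np.linalg.norm(k_a), np.linalg.norm(k_b))  # equal norms
print(q @ k_a, q @ k_b)                          # generally very different scores
```

Equal norms, different scores: the dot product rewards keys whose projected direction lines up with the query's, which multiplying the two norms alone would miss.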
Great visualization, thank you for this video!