The A.I. Hacker – Michael Phi
Transformers are all the rage nowadays, but how do they work? This video demystifies the novel neural network architecture with a step-by-step explanation and illustrations of how transformers work.
CORRECTIONS:
The sine and cosine functions actually alternate across the embedding dimensions, not across the time steps; the time step only determines the position fed into each function!
⭐ Play and Experiment With the Latest AI Technologies at https://grandline.ai ⭐
Hugging Face Write with Transformers
https://transformer.huggingface.co/
Source
https://www.youtube.com/watch?v=zWNrjZXKOtU
bro you conciiiiiiiiise! Thank you for this!!!
The Sal Khan of Deep Learning! Thank you
Is there a small mistake in the graphical explanation at 4:45? Could you clarify this?
The graph seems to have an inconsistency in how the positional encodings are shown for the different time steps. In a correctly implemented positional encoding for a transformer, the sine and cosine values alternate across successive dimensions of the encoding vector, not across successive time steps. The encoding formula is the same for every element in the sequence; what changes from one time step to the next is only the position (pos) fed into the formula, while the sine/cosine alternation depends on the dimension index.
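To make the alternation concrete, here is a minimal NumPy sketch of the sinusoidal positional encoding from the paper (my own illustration, not code from the video): sine is used on even dimension indices and cosine on odd ones, and the position only changes the angle fed into each function.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal positional encoding as in "Attention Is All You Need".
    pos = np.arange(seq_len)[:, np.newaxis]            # (seq_len, 1)
    i = np.arange(d_model)[np.newaxis, :]              # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (i // 2)) / d_model)
    angles = pos * angle_rates                          # (seq_len, d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])               # even dimensions -> sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])               # odd dimensions  -> cosine
    return pe

print(positional_encoding(seq_len=4, d_model=8).round(3))
```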
At 10:50 you jump into the outputs, but you don't explain where they come from. The way you explain it, it sounds like they just appear out of thin air… so…
Better than StatQuest. He overcomplicated it.
This is more of a description than an explanation. Simply describing a diagram and naming the blocks is not necessarily helpful. To anyone else here who is completely confused: don't dismay. I'm a software engineer with experience in some other machine learning algorithms, and I couldn't make much sense of any of this.
Thanks, great explanation.
5:26
OMG, you saved my seminar 😇😇😇 It is a really great explanation!!!!!!
When there's LayerNorm(… + …), is the + inside the LayerNorm argument a pointwise addition or a concatenation?
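As far as the original paper goes, that + is the residual connection, so it is an element-wise (pointwise) addition of two tensors with the same shape, not a concatenation. A rough NumPy sketch of the "Add & Norm" step, with LayerNorm's learnable gain and bias omitted for brevity (function names are illustrative, not from the video):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each position's vector across the feature dimension
    # (learnable gain/bias omitted for brevity).
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def add_and_norm(x, sublayer_output):
    # Pointwise residual addition, then LayerNorm; both inputs must have
    # the same shape, so no concatenation is involved.
    return layer_norm(x + sublayer_output)

x = np.ones((4, 8))                       # (seq_len, d_model)
print(add_and_norm(x, 2 * x).shape)       # (4, 8) – shape is unchanged
```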
A complex process – I need to listen to this multiple times to fully understand it.
Can you do it again, but slower next time?
What software was used to create the presentation?
I can't thank you enough for this great video!!
Amazing. I still don't really understand how the Q, K and V values are calculated, but I learnt a lot more about this seminal paper than other resources provided. Thank you! 🙏
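In case it helps: Q, K and V are not computed by any special rule; each one is just the layer's input multiplied by its own learned weight matrix. A minimal sketch with made-up shapes (the names W_q, W_k, W_v are illustrative, not from the video):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8

x = rng.normal(size=(seq_len, d_model))   # token embeddings + positional encoding

# Three separate weight matrices; in a real model these are learned by training.
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q = x @ W_q   # queries, (seq_len, d_k)
K = x @ W_k   # keys,    (seq_len, d_k)
V = x @ W_v   # values,  (seq_len, d_k)
```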
GPT stands for Generative Pre-training Transformer, not Pre-Training.
This was helpful! Thank you!
At 7:27, on the right, the attention weights form a 4×4 matrix while the value matrix is 3×4; a 4×3 matrix for the values would be more appropriate.
THE OPTIMUM PRIDE?????!!!!!! OOO EEAHH OOEEAHAH
Shouldn't it be a 3×3 matrix (i.e., corresponding to the length of the vector)?
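To make the shapes in this thread concrete, here is a minimal scaled dot-product attention sketch in NumPy (my own code, not the video's): with 4 tokens the attention weights are 4×4, so the value matrix needs 4 rows, i.e. 4×d_v, for the product to be defined.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (seq_len, d_k); V: (seq_len, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # (seq_len, d_v)

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 3))
print(scaled_dot_product_attention(Q, K, V).shape)       # (4, 3)
```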
great content
Amazing explanation. Honestly, this is the first time I've understood how Transformers work.
Best explanation I've seen – thanks!
Amazing vid
Watching this video after 3 years, I would say it is still the best to date.
Hi, could you please explain more about what you said at 13:07 (about step 6)? You said 'the process matches the encoder input to the decoder input, allowing the decoder to decide which encoder input is relevant to put focus on'. Do you mean that there is not just one encoder, but several encoders whose outputs are fed in parallel to the decoders?
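As I read the paper, there is a single encoder stack; its final output is reused by the encoder-decoder attention in every decoder layer, where the queries come from the decoder and the keys and values come from that encoder output. A rough NumPy sketch under that assumption (names and shapes are illustrative):

```python
import numpy as np

def encoder_decoder_attention(decoder_hidden, encoder_output, W_q, W_k, W_v):
    # Queries come from the decoder; keys and values come from the encoder's
    # final output, which is shared by every decoder layer.
    Q = decoder_hidden @ W_q                      # (tgt_len, d_k)
    K = encoder_output @ W_k                      # (src_len, d_k)
    V = encoder_output @ W_v                      # (src_len, d_v)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])       # (tgt_len, src_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (tgt_len, d_v)
```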
Do they detect audio deepfakes?
This seems to be one of the best videos on Transformers
great explanation
Need more math to properly understand what's going on
This explanation is INCREDIBLE!!!
You have such a sweet and pleasant voice. Thank you, mate, for the good explanation. 😊
Why does the decoder select the token with the maximum probability instead of randomly selecting a token based on the probability distribution?
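Both options exist: greedy decoding (taking the argmax) is deterministic and simple, while sampling from the distribution, often with a temperature, trades that determinism for more diverse outputs. A small sketch of the two strategies (function and parameter names are illustrative):

```python
import numpy as np

def choose_next_token(probs, strategy="greedy", temperature=1.0, rng=None):
    # probs: probability distribution over the vocabulary for the next token.
    if strategy == "greedy":
        return int(np.argmax(probs))              # always pick the most likely token
    # Sampling: draw a token from the (temperature-adjusted) distribution.
    rng = rng or np.random.default_rng()
    logits = np.log(probs + 1e-12) / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

probs = np.array([0.1, 0.6, 0.3])
print(choose_next_token(probs))                   # 1 (greedy)
print(choose_next_token(probs, strategy="sample"))
```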