One of the key components of transformers is the positional embedding. You may ask: why? Because the self-attention mechanism in transformers is permutation-invariant; it computes how much `attention` each token in the input receives from the other tokens in the sequence, but it does not take the order of the tokens into account. In fact, the attention mechanism treats the sequence as a bag of tokens. For this reason, we need another component, called a positional embedding, which accounts for the order of tokens and influences the token embeddings. But what are the different types of positional embeddings, and how are they implemented?
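To make the permutation-invariance point concrete, here is a minimal sketch (PyTorch, with made-up dimensions and random weights, not taken from any particular model) of a bare single-head attention layer with no positional information. Shuffling the input tokens simply shuffles the outputs in the same way, so each token's output does not depend on where it sits in the sequence:

```python
import torch

torch.manual_seed(0)

# Toy single-head self-attention with no positional signal added.
d = 8                       # embedding dimension (illustrative)
x = torch.randn(5, d)       # 5 token embeddings, order not encoded anywhere
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

def self_attention(tokens):
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = torch.softmax(q @ k.T / d**0.5, dim=-1)
    return scores @ v

perm = torch.randperm(5)                 # shuffle the token order
out_original = self_attention(x)
out_permuted = self_attention(x[perm])

# Permuting the inputs only permutes the outputs: each token gets exactly
# the same representation regardless of its position in the sequence.
print(torch.allclose(out_original[perm], out_permuted, atol=1e-6))  # True
```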
In this post, we look at three major types of positional embeddings and dive deep into their implementation.
Here is the table of contents for this post:
1. Context and Background
2. Absolute Positional Embedding
- 2.1 Learned Approach
- 2.2 Fixed Approach (Sinusoidal)
- 2.3 Code Example: RoBERTa Implementation