## Positional encoding

From *Attention Is All You Need* (via a reprint with comments):

$$PE_{(pos,2i)} = \sin\left(pos / 10000^{2i/d_{model}}\right)$$
$$PE_{(pos,2i+1)} = \cos\left(pos / 10000^{2i/d_{model}}\right)$$

where $pos$ is the position and $i$ is the dimension. That is, each dimension of the positional encoding corresponds to a sinusoid. The wavelengths form a geometric progression from $2\pi$ to $10000 \cdot 2\pi$. We chose this function because we hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset $k$, $PE_{pos+k}$ can be represented as a linear function of $PE_{pos}$. We chose the sinusoidal version because it may allow the model to extrapolate to sequence lengths longer than the ones encountered during training.
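To make the encoding and the linear-function claim concrete, here is a minimal sketch in plain Python (standard library only). It assumes the paper's definition: angle $pos/10000^{2i/d_{model}}$, sine on even dimension $2i$, cosine on odd dimension $2i+1$, and an even $d_{model}$. The helper names `positional_encoding` and `shift` are hypothetical, chosen for illustration. `shift` applies a fixed block-diagonal rotation that depends only on the offset $k$, which is one way to realize the claimed linear map from $PE_{pos}$ to $PE_{pos+k}$.

```python
import math


def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding table: row `pos`, column `j`.

    Even columns 2i hold sin(pos / 10000**(2i/d_model)),
    odd columns 2i+1 hold cos of the same angle.
    Assumes d_model is even.
    """
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(d_model // 2):
            # One frequency per sin/cos pair; wavelengths grow geometrically with i.
            angle = pos / 10000 ** (2 * i / d_model)
            pe[pos][2 * i] = math.sin(angle)
            pe[pos][2 * i + 1] = math.cos(angle)
    return pe


def shift(vec, k, d_model):
    """Map PE[pos] to PE[pos + k] with a fixed linear map (block-diagonal rotation).

    Each pair (sin a, cos a), with a = pos * w_i, is rotated by k * w_i.
    The coefficients depend only on k, never on pos.
    """
    out = [0.0] * d_model
    for i in range(d_model // 2):
        w = 1.0 / 10000 ** (2 * i / d_model)
        c, s = math.cos(k * w), math.sin(k * w)
        sin_a, cos_a = vec[2 * i], vec[2 * i + 1]
        # sin(a + b) = sin a cos b + cos a sin b
        out[2 * i] = sin_a * c + cos_a * s
        # cos(a + b) = cos a cos b - sin a sin b
        out[2 * i + 1] = cos_a * c - sin_a * s
    return out


if __name__ == "__main__":
    d_model = 8
    pe = positional_encoding(50, d_model)
    print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
    pred = shift(pe[10], 5, d_model)
    # Largest gap between rotated PE[10] and directly computed PE[15]: float noise only
    print(max(abs(a - b) for a, b in zip(pred, pe[15])))
```

The nested loops favor readability over speed; a vectorized version would compute the same table.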

There is also a Jupyter notebook illustrating it.

### What?

Maybe it's just me, but I don't understand how sin/cos functions can encode positional relations. I can understand how convolutional operations transform data based on position.

Why $\sin$ for even indices and $\cos$ for odd ones?
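One way to read the paper's "linear function" claim, worked out for a single sin/cos pair (a derivation sketch using the angle-addition identities, not taken from the paper itself). Writing $\omega_i = 1/10000^{2i/d_{model}}$ for the frequency of pair $i$:

$$
\begin{pmatrix} PE_{(pos+k,\,2i)} \\ PE_{(pos+k,\,2i+1)} \end{pmatrix}
= \begin{pmatrix} \sin(\omega_i (pos+k)) \\ \cos(\omega_i (pos+k)) \end{pmatrix}
= \begin{pmatrix} \cos(\omega_i k) & \sin(\omega_i k) \\ -\sin(\omega_i k) & \cos(\omega_i k) \end{pmatrix}
\begin{pmatrix} PE_{(pos,\,2i)} \\ PE_{(pos,\,2i+1)} \end{pmatrix}
$$

Stacking these $2 \times 2$ rotation blocks for every $i$ gives a block-diagonal matrix $M_k$ with $PE_{pos+k} = M_k \, PE_{pos}$, and $M_k$ depends only on the offset $k$. With sines alone this would fail: at $pos = 0$ every sine is zero, so no linear map could produce the generally nonzero $\sin(\omega_i k)$. Each frequency therefore needs both a sine and a cosine component.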