Inside GPT-3

  • Behohippy
    49 months ago

    I’ve got a background in deep learning and I still struggle to understand the attention mechanism. I know it’s a key/value store but I’m not sure what it’s doing to the tensor when it passes through different layers.
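
    A minimal sketch of what one self-attention layer does to the tensor may help here (illustrative NumPy, single head, no masking; the projection matrices are stand-ins, not GPT-3’s actual weights): each position’s vector is projected into a query, a key and a value, the queries are scored against every key, and each output row is a softmax-weighted mix of the value rows. The tensor keeps its (seq_len, d_model) shape, but every position now carries information gathered from the positions it attended to.

        import numpy as np

        def self_attention(x, w_q, w_k, w_v):
            """x: (seq_len, d_model) activations entering the layer."""
            d_k = w_k.shape[1]
            q = x @ w_q                                     # queries: what each position looks for
            k = x @ w_k                                     # keys: what each position offers
            v = x @ w_v                                     # values: the content that gets mixed
            scores = q @ k.T / np.sqrt(d_k)                 # (seq_len, seq_len) similarities
            scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
            weights = np.exp(scores)
            weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
            return weights @ v                              # weighted mix of values, same shape as x

        rng = np.random.default_rng(0)
        x = rng.normal(size=(4, 8))                         # 4 tokens, model width 8
        w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
        out = self_attention(x, w_q, w_k, w_v)              # (4, 8), ready for the next layer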

    • Varun
      19 months ago

      @behohippy @saint Instead of modeling the sequence timestep by timestep, attention lets us feed the whole sequence through the network in parallel, much like a fully connected layer. The positional encoding is what tells the model where each token sits in the sequence, and keys with low attention weights can effectively be dropped…
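
      To make the parallel-versus-timestep point concrete, here is a rough sketch of the positional-encoding idea (sinusoidal variant from the original Transformer paper; GPT-3 itself uses learned position embeddings): the whole sequence is handed to the attention layers as one matrix, and the added encoding is what preserves word order, since attention alone has no notion of position.

          import numpy as np

          def positional_encoding(seq_len, d_model):
              pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
              i = np.arange(d_model // 2)[None, :]           # (1, d_model/2)
              angles = pos / (10000 ** (2 * i / d_model))
              pe = np.zeros((seq_len, d_model))
              pe[:, 0::2] = np.sin(angles)                   # even dimensions
              pe[:, 1::2] = np.cos(angles)                   # odd dimensions
              return pe

          seq_len, d_model = 4, 8
          tokens = np.random.default_rng(1).normal(size=(seq_len, d_model))
          x = tokens + positional_encoding(seq_len, d_model)
          # x goes through attention as a single (seq_len, d_model) matrix,
          # with no per-timestep loop as in an RNN; the encoding carries the order.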