Inside GPT-3

  • Behohippy@lemmy.world · 1 year ago

    I’ve got a background in deep learning and I still struggle to understand the attention mechanism. I know it’s a key/value store but I’m not sure what it’s doing to the tensor when it passes through different layers.

    • Varun@mastodon.social · 1 year ago

      @behohippy @saint Instead of modelling the sequence timestep by timestep, attention lets us feed the whole sequence through the network in parallel, much like a fully connected layer. The positional encoding is what tells the model where each token sits in the sequence, and keys that receive low attention weights can effectively be dropped, since they contribute little to the output…
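
      A rough sketch of what a single attention head computes (plain NumPy; the names and shapes are just illustrative, not GPT-3's actual code):

        import numpy as np

        def softmax(x, axis=-1):
            x = x - x.max(axis=axis, keepdims=True)
            e = np.exp(x)
            return e / e.sum(axis=axis, keepdims=True)

        def attention(Q, K, V):
            # Q, K, V: (seq_len, d) arrays; every position attends to every other in parallel
            scores = Q @ K.T / np.sqrt(K.shape[-1])  # similarity of each query with each key
            weights = softmax(scores, axis=-1)       # rows sum to 1; low-scoring keys get near-zero weight
            return weights @ V, weights              # output is a weighted mix of the values

        # toy example: 4 tokens, 8-dim embeddings (already position-encoded)
        rng = np.random.default_rng(0)
        x = rng.normal(size=(4, 8))
        Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
        out, weights = attention(x @ Wq, x @ Wk, x @ Wv)
        print(weights.round(2))  # which tokens each position "looks at"

      This is also why it gets described as a key/value store: each query pulls back a weighted blend of the values, indexed by how well it matches each key, and that blend is recomputed with fresh Q/K/V projections at every layer.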