• lledrtx@lemmy.world
    5 months ago

    Agreed, this is better, but neither of them actually seems “provable”?

    • Corbin@programming.dev
      5 months ago

      I’ll be informal to boost your intuition. You know how a parser can reject invalid inputs? Parsers can be generated from grammars, so we can think of the grammars themselves as rejecting invalid inputs too. When we use a grammar for generation, every generated output will be a valid input when parsed, because the grammar can’t build any invalid sentences (by definition!)
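To make that concrete, here is a minimal sketch, assuming a toy grammar for a small subset of JSON (objects, short strings, small integers). The grammar and the helper names (`gen_value`, `gen_string`, `gen_object`) are made up for illustration, and Python's standard `json.loads` stands in for "the parser". Because the generator can only follow the grammar's rules, every string it emits should parse:

```python
import json
import random

# Hypothetical toy grammar (a subset of JSON), written as generator functions:
#   value  -> number | string | object
#   string -> '"' letter* '"'
#   object -> '{' '}' | '{' pair (',' pair)* '}'
#   pair   -> string ':' value

def gen_string(rng):
    letters = "".join(rng.choice("abc") for _ in range(rng.randint(0, 4)))
    return '"' + letters + '"'

def gen_value(rng, depth):
    # Past a fixed depth, only emit numbers so that generation terminates.
    kind = rng.choice(["number", "string", "object"]) if depth < 3 else "number"
    if kind == "number":
        return str(rng.randint(0, 99))
    if kind == "string":
        return gen_string(rng)
    return gen_object(rng, depth)

def gen_object(rng, depth=0):
    pairs = [gen_string(rng) + ":" + gen_value(rng, depth + 1)
             for _ in range(rng.randint(0, 3))]
    return "{" + ",".join(pairs) + "}"

rng = random.Random(0)
samples = [gen_object(rng) for _ in range(1000)]

# json.loads raises an exception on invalid input, so reaching the end of
# this loop means every generated sample was accepted by the parser.
for s in samples:
    json.loads(s)
```

This is only a spot check on 1000 samples, not a proof; the "by definition" argument above is what actually guarantees it.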

      For example, suppose we want to generate a JSON object. The grammar for JSON objects starts with an opening curly brace “{”. This means that every parser which accepts JSON objects (and rejects everything else) must start by accepting “{”. So, our generator must start by emitting a “{” as well. Since our language-modeling generators work over probability distributions, this can be accomplished by setting the probability of every token which doesn’t start with “{” to zero and renormalizing what remains, so that only grammar-approved tokens can be sampled.
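Here is a rough sketch of that masking step for the first token only. The vocabulary and its probabilities are invented for illustration, and `mask_to_prefix` is a hypothetical helper, not any library's API; real constrained decoders track the grammar state across the whole output, usually working on logits rather than probabilities.

```python
import random

def mask_to_prefix(probs, prefix):
    """Zero out every token that does not start with `prefix`, then renormalize."""
    masked = {tok: (p if tok.startswith(prefix) else 0.0)
              for tok, p in probs.items()}
    total = sum(masked.values())
    if total == 0.0:
        raise ValueError("no token in the vocabulary is allowed by the grammar")
    return {tok: p / total for tok, p in masked.items()}

# Made-up next-token distribution from an imaginary language model.
next_token_probs = {"{": 0.1, '{"': 0.2, "Sure": 0.5, "[": 0.1, "}": 0.1}

# A JSON object must begin with "{", so only such tokens keep any mass.
constrained = mask_to_prefix(next_token_probs, "{")

# Sample the first token from the constrained distribution.
rng = random.Random(0)
tokens = list(constrained)
weights = [constrained[t] for t in tokens]
first_tokens = rng.choices(tokens, weights=weights, k=100)
```

Even though the unconstrained model preferred "Sure", that token now has probability zero, so the sampled first token always begins with “{”.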