@fullofredgoo

fullofredgoo@lemmy.world · 8 months ago

Found the other paper I was thinking of: ‘Discovering the Hidden Vocabulary of DALLE-2’ https://arxiv.org/abs/2206.00169

fullofredgoo@lemmy.world · 8 months ago

This sounds like it might be something similar to a ‘noken’, or maybe just a regular token that represents a word fragment. I picked up both concepts from this article: https://www.lesswrong.com/posts/c6uTNm5erRrmyJvvD/mapping-the-semantic-void-strange-goings-on-in-gpt-embedding

“TL;DR: GPT-J token embeddings inhabit a zone in their 4096-dimensional embedding space formed by the intersection of two hyperspherical shells. This is described, and then the remaining expanse of the embedding space is explored by using simple prompts to elicit definitions for non-token custom embedding vectors (so-called “nokens”). The embedding space is found to naturally stratify into hyperspherical shells around the mean token embedding (centroid), with noken definitions depending on distance-from-centroid and at various distance ranges involving a relatively small number of seemingly arbitrary topics (holes, small flat yellowish-white things, people who aren’t Jews or members of the British royal family, …) in a way which suggests a crude, and rather bizarre, ontology. Evidence that this phenomenon extends to GPT-3 embedding space is presented. No explanation for it is provided, instead suggestions are invited.”
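The "distance from centroid" idea in that summary can be made concrete with a minimal Python sketch. It assumes you already have a token embedding matrix as a NumPy array, with one row per token. The random matrix below is a hypothetical stand-in, not GPT-J's actual embeddings, and `centroid_distances` and `make_noken` are illustrative names of my own. The sketch computes the mean token embedding, measures each token's distance from it, and builds a custom non-token vector (a ‘noken’) at a chosen radius in a random direction.

```python
import numpy as np


def centroid_distances(embeddings: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the mean embedding (centroid) and each row's Euclidean distance from it."""
    centroid = embeddings.mean(axis=0)
    distances = np.linalg.norm(embeddings - centroid, axis=1)
    return centroid, distances


def make_noken(centroid: np.ndarray, radius: float, rng: np.random.Generator) -> np.ndarray:
    """Build a custom vector exactly `radius` away from the centroid,
    pointing in a uniformly random direction."""
    direction = rng.standard_normal(centroid.shape[0])
    direction /= np.linalg.norm(direction)  # unit vector
    return centroid + radius * direction


rng = np.random.default_rng(0)
# Hypothetical stand-in for a real embedding matrix: 1000 random vectors in 4096 dims.
fake_embeddings = rng.standard_normal((1000, 4096))
centroid, dists = centroid_distances(fake_embeddings)
print(f"distance from centroid: mean={dists.mean():.2f}, std={dists.std():.2f}")

# A noken sitting on a shell of radius 1.0 around the centroid.
noken = make_noken(centroid, radius=1.0, rng=rng)
print(np.linalg.norm(noken - centroid))
```

With random Gaussian stand-ins, the distances cluster tightly around √4096 = 64. Swap in a real embedding matrix before reading anything into the numbers. Getting the model to define a noken, as the article does with simple prompts, also requires injecting the vector into the model's input embeddings. That step depends on the library you use and isn't shown here.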

In particular, I was reminded of the list of tokens near the beginning of the article, and how it contains not just words but also word fragments, prefixes, and things like that. I'm also reminded of another article (which I can't find right now) about people bypassing word filters by using nonsense words that the LLM has mistakenly associated with some meaning. From what others have said in this thread, ‘araffe’ sounds like it might be something like that.
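On the word-fragment point: a subword tokenizer doesn't need to know a word to handle it. It breaks an unfamiliar string into whatever pieces its vocabulary has, and each piece has its own embedding that the model can attach meaning to. The toy Python illustration below uses greedy longest-match splitting over a made-up vocabulary, and `TOY_VOCAB` and `greedy_tokenize` are hypothetical names. Real tokenizers work differently (GPT-style models use byte-level BPE with learned merges), so the actual split for a word like ‘araffe’ will differ.

```python
def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split `word` left to right into the longest pieces found in `vocab`.
    Falls back to a single character when no longer piece matches."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking until one matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


# Hypothetical toy vocabulary: a whole word plus some fragments and prefixes.
TOY_VOCAB = {"giraffe", "ar", "aff", "e", "un", "believ", "able"}

for w in ["giraffe", "araffe", "unbelievable"]:
    print(w, "->", greedy_tokenize(w, TOY_VOCAB))
# giraffe -> ['giraffe']
# araffe -> ['ar', 'aff', 'e']
# unbelievable -> ['un', 'believ', 'able']
```

The nonsense word never produces an "unknown" result. It becomes a short sequence of ordinary fragments, which is one way a made-up string can end up carrying meaning the model has learned for those fragments.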