Authors using a new tool to search a list of 183,000 books used to train AI are furious to find their works on the list.

  • El Barto@lemmy.world
    English
    arrow-up
    30
    arrow-down
    10
    ·
    1 year ago

    These are machines, though, not human beings.

    I guess I’d have to be an author to find out how I’d feel about it, to be fair.

      • FaceDeer@kbin.social
        arrow-up
        6
        arrow-down
        2
        ·
        1 year ago

        If an AI “reproduces” a work it was trained on, that is a failure of the AI. Why would anyone want to spend millions of dollars and devote oodles of computing power to build something that just does what a simple copy/paste operation can accomplish?

        When an AI spits out something that’s too close to an item in its original training set, that’s called “overfitting” and it is considered an error to be corrected. Most overfitting that’s been detected has been a result of duplication in the training set - when you hammer an AI image generator in training with thousands of copies of the Mona Lisa it eventually goes “alright, I get it already, when you say ‘Mona Lisa’ you want that exact pattern!” and will try its best to replicate that pattern when you ask it to later. That’s why training sets need to be de-duplicated.
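
The de-duplication step mentioned above can be sketched roughly as follows. This is a minimal illustration, not how any particular model's training pipeline works: it only catches exact duplicates after simple text normalization, and the names `normalize` and `deduplicate` are hypothetical helpers made up for this example. Real pipelines typically also need near-duplicate detection, which this sketch does not attempt.

```python
import hashlib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def deduplicate(samples):
    """Return samples with exact (post-normalization) duplicates removed.

    The first occurrence of each distinct sample is kept, in original order.
    """
    seen = set()
    unique = []
    for sample in samples:
        # Hash the normalized text so the 'seen' set stays small
        # even when individual samples are very long.
        digest = hashlib.sha256(normalize(sample).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique
```

Calling `deduplicate` on a list where the same caption appears many times would keep just one copy, so the repeated item no longer dominates training.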

        AIs are meant to produce new things.

    • kromem@lemmy.world
      English
      arrow-up
      3
      arrow-down
      1
      ·
      1 year ago

      Did you write a comment on Reddit before 2015? If so, your copyrighted content was used without your permission to train today’s LLMs, so you absolutely get to feel one way or another about it.

      The idea that these authors were somehow the backbone of the models, when any individual contribution was like spitting in the ocean and the model weights would have treated 100 pages of Twilight fan fiction as equivalent to 100 pages from Twilight, is honestly one of the negative impacts of the extensive coverage these suits are getting.

      Pretty much everyone who has ever written anything indexed online is a tiny part of today’s LLMs.

      • El Barto@lemmy.world
        English
        arrow-up
        2
        ·
        edit-2
        1 year ago

        Thank you for your reply.

        On a completely separate note, it’s funny to think that there exists Twilight fan fiction when Twilight itself started as fan fiction work.

        Edit: I dun goofed.

        • kromem@lemmy.world
          English
          arrow-up
          2
          ·
          1 year ago

          Pretty sure it’s the other way around.

          Fifty Shades of Grey started out as Twilight fanfiction before becoming its own thing.

          AFAIK Twilight was always just its own pulp fiction.

    • sab@lemmy.world
      English
      arrow-up
      4
      arrow-down
      2
      ·
      1 year ago

      I don’t think anyone is faulting the machines for this, just the people who instruct the machines to do it.

    • Shurimal@kbin.social
      arrow-up
      15
      arrow-down
      17
      ·
      1 year ago

      > These are machines, though, not human beings.

      What’s the difference? On the most fundamental level it’s all the same.

      • brygphilomena@lemmy.world
        English
        arrow-up
        15
        arrow-down
        4
        ·
        1 year ago

        A human, regardless of how many books they read, will have personal experiences that are undeniably unique to themselves. They will interpret the works they read differently from each other based on their worldly experiences. Their writing, no matter how many books they read and are inspired by, will always be influenced by their own personal lives. They can experience love, hate, heartbreak, empathy, sadness, and happiness.

        This is something an LLM does not have, and in my opinion, is a massive distinguishing factor. So on a “fundamental” level, it is not the same. It is nowhere near the same.

        • lloram239@feddit.de
          English
          arrow-up
          3
          arrow-down
          2
          ·
          1 year ago

          > A human, regardless of how many books they read, will have personal experiences that are undeniably unique to themselves.

          So will every AI. ChatGPT will give you different answers than Bard or WizardLM, since they are all trained on different books. And every Stable Diffusion model creates different images, different styles, different topics, etc. It’s all in the data they “experienced”.

        • originalucifer@moist.catsweat.com
          arrow-up
          2
          arrow-down
          4
          ·
          1 year ago

          do you really think we are that far off… from giving foundational memory and motivation layers to these LLMs, that could mimic… or even… generate the generic thoughts you’re indicating?

          i don’t think so. you seem to imply its impossibility, i expect its inevitability. the human brain will not be a black box forever… it still exists in a world of physics we can emulate, even if rudimentarily.

      • AnonStoleMyPants@sopuli.xyz
        English
        arrow-up
        17
        arrow-down
        7
        ·
        1 year ago

        The same thing as with tooooooons of things: scale.

        Nobody cares if one dude steals office supplies at work. Now, if everyone starts doing it, or if the single guy steals everything, then action is taken.

        Nobody cares if a random person draws in the same style and with the same characters as you, but if they start to sell them, or god forbid, out-sell you, then there is a problem.

        Nobody cares (except the police, I guess) if a random driver drives double the speed limit and annoys people living next to the road on the weekends, but when tons of people do it, you get speed bumps.

        Nobody cares if a few people pirate movies, but when it goes mainstream and companies notice that there might be money being lost, then you get whatever we have now.

        Nobody cares if the mudhill behind your house erodes a bit and you get mud on your shoes. Have a bunch of that erode and you realise the danger…

        If you have been fine-tuning your own writing style for a decade and a random schmuck starts to write similarly, you probably don’t care. No harm done. Now, get an AI to write 10 000 books in a weekend and someone starts to sell them… well now you have a completely different problem.

        On a fundamental level the exact same thing is happening, yet action is only taken after a certain threshold is crossed.

      • Wander@kbin.social
        arrow-up
        15
        arrow-down
        5
        ·
        1 year ago

        Unless you think there’s no difference between killing a person and closing a program, I think we can agree they should be treated differently in the eyes of the law.

        And so there’s a difference between a person reading a book and being inspired by it, and someone writing a program that automatically transforms the book into data that can create new books.