‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says::Pressure grows on artificial intelligence firms over the content used to train their products

  • S410@lemmy.ml
    link
    fedilink
    English
    arrow-up
    1
    arrow-down
    1
    ·
    edit-2
    6 months ago

    Machine learning doesn’t retain an exact copy either. Just how on earth do you think can a model trained on terabytes of data be only a few gigabytes in side, yet contain “exact copies” of everything? If “AI” could function as a compression algorithm, it’d definitely be used as one. But it can’t, so it isn’t.

    Machine learning can definitely re-create certain things really closely, but to do it well, it generally requires a lot of repeats in the training set. Which, granted, is a big problem that exists right now, and which people are trying to solve. But even right now, if you want an “exact” re-creation of something, cherry picking is almost always necessary, since (unsurprisingly) ML systems have a tendency to create things that have not been seen before.

    Here’s an image from an article claiming that machine learning image generators plagiarize things.

    However, if you take a second to look at the image, you’ll see that the prompters literally ask for screencaps of specific movies with specific actors, etc. and even then the resulting images aren’t one-to-one copies. It doesn’t take long to spot differences, like different lighting, slightly different poses, different backgrounds, etc.

    If you got ahold of a human artist specializing in photoreal drawings and asked them to re-create a specific part of a movie they’ve seen a couple dozen or hundred times, they’d most likely produce something remarkably similar in accuracy. Very similar to what machine learning images generators are capable of at the moment.