• Nina@lemmy.ml
    2 years ago

    they are just watching and learning. Why is it treated so differently?

    Because it isn’t human. It isn’t watching and learning; it is being fed my creative content as data, which I have neither allowed nor been compensated for, and which is then turned around and sold as a service. My work is being consumed for commercial use by a non-human that does not have fair use education rights, with the sole intent of creating a profitable product, and I’m getting nothing. I have legal rights, no matter where I post my work, to retain my copyrights, and I have the right to not consent to improper uses of my works that do not align with the licenses I have chosen to give them. Websites ask for a license in their ToS to be able to even just display and share my artwork when I upload it. When I create an image, I am given ownership of its copyright to control the use, distribution, and right to create derivatives. This isn’t a fuzzy area, it’s very clear. If an artist did not consent to their artwork being used as training data for a non-fair use reason, it is stealing their works.

    And no, it’s not fair use under education. Copyright exists for human protection and uses. It isn’t being used for ‘learning’; it’s used as data to be repackaged and sold. Google’s use of it in search results is to link back to posts that contain my work, retain my copyright, and are not derivatives. If you mean captchas, yeah, captchas are pretty bullshit.

    And circling back to my original post: so? AI companies aren’t paying for their image training data, so why would they pay for reddit’s API?

    • aianarchist@lemmygrad.ml
      2 years ago

      I think the biggest problem is that the license we have all chosen to give our artwork doesn’t explicitly cover this. Your work isn’t being copied by AI; it’s training AI and sometimes being emulated by AI. But there are literally zero laws about reading copyrighted work unless you break down a barrier, like a paywall, to do so, and there are no laws on derivative work, otherwise we wouldn’t have Pokémon knock-offs and such. Lastly, artists post a lot of content online freely to entities who do in turn claim control over its distribution.

      Ultimately, I think reddit, as the owner of the distribution of our data (yuk), did the right thing by making API access paid. But it was stupid of them to do it at normal human scales and not just at bot-scraping scales, and then use their ToS to give themselves the ability to sue if they aren’t paid for training data.

    • slacktoid@lemmy.ml
      2 years ago

      I feel the bigger problem with these AIs is that they are being used solely to improve profits and productivity, which only benefits the capital owners. None of that is going to improve things for the laborer (i.e., the artist, the coder, the writer, the people who create value from capital). This is only going to get worse. We are being conditioned to accept automation and AI through things like self-checkout.

      Also, about Reddit training data, I think they are too late to the party. The weights that data was needed for are already made. I do not think they are the exclusive source of specialized information, and (I hope) they are going to find that out. They are just going to further show how silly the free market and the stock market are. The people who require the data will probably have other ways of getting it. r/datahoarders and people like that come to mind. Reddit is only making new data hard to access, and they are not (and hopefully never will be) an exclusive source of it.

      • Nina@lemmy.ml
        2 years ago

        Yeah, AI can totally exist and be useful, but currently it’s in the hands of tech dudes and admins who have a terrible track record of developing things responsibly, and a habit of overhyping and masking flaws. It’s used to make a profit to the colossal detriment of humans. It’s currently used to hurt us, not help at all.

        I think whoever gathered training data from reddit probably only used the API because it was easier and free. And now that it’s no longer free, there’s nothing pointing to them actually paying for it. It’s not like reddit is the only source of data; they very likely already have web scrapers for other uses that they can just tune for reddit.