These are machines, though, not human beings.
I guess I’d have to be an author to find out how I’d feel about it, to be fair.
Machines that aren’t reproducing or distributing works
If an AI “reproduces” a work it was trained on, that is a failure of the AI. Why would anyone want to spend millions of dollars and devote oodles of computing power to build something that just does what a simple copy/paste operation can accomplish?
When an AI spits out something that’s too close to one of the items in its original training set, that’s called “overfitting” and it is considered an error to be corrected. Most overfitting that’s been detected has been a result of duplication in the training set - when you hammer an AI image generator in training with thousands of copies of the Mona Lisa it eventually goes “alright, I get it already, when you say ‘Mona Lisa’ you want that exact pattern!” and will try its best to replicate that pattern when you ask it to later. That’s why training sets need to be de-duplicated.
AIs are meant to produce new things.
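To make the de-duplication point concrete, here is a minimal sketch of what dropping repeated samples from a training set could look like. It only catches exact copies after trivial normalization; real pipelines presumably use fuzzier near-duplicate detection, especially for images. The names `normalize`, `deduplicate`, and `training_set` are made up for illustration and don't come from any particular tool.

```python
import hashlib


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies match.
    return " ".join(text.lower().split())


def deduplicate(samples: list[str]) -> list[str]:
    # Keep the first occurrence of each sample, drop later exact repeats.
    seen: set[str] = set()
    unique: list[str] = []
    for sample in samples:
        digest = hashlib.sha256(normalize(sample).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique


# Hypothetical toy "training set" of captions with one repeated entry.
training_set = [
    "Mona Lisa, oil on poplar panel",
    "mona lisa,  oil on poplar panel",
    "The Starry Night, oil on canvas",
]
cleaned = deduplicate(training_set)
print(len(training_set), "->", len(cleaned))
```

With the repeats removed, no single work gets hammered into the model thousands of times, which is the situation described above as the main source of detected overfitting.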
But terminator said neural networks
Damn.
Did you write a comment on Reddit before 2015? If so, your copyrighted content was used without your permission to train today’s LLMs, so you absolutely get to feel one way or another about it.
The idea that these authors were somehow the backbone of the models, when any individual contribution was like spitting in the ocean and the model weights would have treated 100 pages of Twilight fan fiction as equivalent to 100 pages from Twilight, is honestly one of the negative impacts of the extensive coverage these suits are getting.
Pretty much everyone who has ever written anything indexed online is a tiny part of today’s LLMs.
Thank you for your reply.
On a completely separate note, it’s funny to think that there exists Twilight fan fiction when Twilight itself started as fan fiction work. Edit: I dun goofed.
Pretty sure it’s the other way around.
Fifty Shades of Grey started out as Twilight fanfiction before becoming its own thing.
AFAIK Twilight was always just its own pulp fiction.
Oh true! My memory was fuzzy on the details. Thanks for the correction.
I don’t think anyone is faulting the machines for this, just the people who instruct the machines to do it.
What’s the difference? On the most fundamental level it’s all the same.
A human, regardless of how many books they read, will have personal experiences that are undeniably unique to themselves. They will interpret the works they read differently from each other based on their worldly experiences. Their writing, no matter how many books they read and are inspired by, will always be influenced by their own personal lives. They can experience love, hate, heartbreak, empathy, sadness, and happiness.
This is something an LLM does not have, and in my opinion, is a massive distinguishing factor. So on a “fundamental” level, it is not the same. It is nowhere near the same.
So will every AI. ChatGPT will give you different answers than Bard or WizardLM, since they are all trained on different books. And every StableDiffusion model creates different images, different styles, different topics, etc. It’s all in the data they “experienced”.
do you really think we are that far off… from giving a foundational memory and motivation layers to these LLMs, that could mimic… or even… generate the generic thoughts you’re indicating?
i don’t think so. you seem to imply its impossibility, i expect its inevitability. the human brain will not be a black box forever… it still exists in a world of physics we can emulate, even if rudimentarily.
The same thing as with tooooooons of things: scale.
Nobody cares if one dude steals office supplies at work. Now, if everyone starts doing it, or if the single guy steals everything, then action is taken.
Nobody cares if a random person draws in the same style and with the same characters as you, but if they start to sell them, or god forbid, out-sell you, then there is a problem.
Nobody cares (except police I guess) if a random driver drives double the speed limit and annoys people living next to the road on the weekends, but when tons of people do it, you get speed bumps.
Nobody cares if a few people pirate movies, but when it goes mainstream and companies notice that there might be money being lost, then you get whatever we have now.
Nobody cares if the mudhill behind your house erodes a bit and you get mud on your shoes. Have a bunch of that erode and you realise the danger…
You have been fine-tuning your own writing style for a decade and a random schmuck starts to write similarly, you probably don’t care. No harm done. Now, get an AI to write 10 000 books in a weekend and someone starts to sell them… well now you have a completely different problem.
On a fundamental level the exact same thing is happening, yet action is only taken after a certain threshold is crossed.
Bingo.
Unless you think there’s no difference between killing a person and closing a program, I think we can agree they should be treated differently in the eyes of the law.
And so there’s a difference between a person reading a book and being inspired by it, and someone writing a program that automatically transforms the book into data that can create new books.
Wait. Are human beings machines?
Biological machines, yes.