Server admins could add to their policy that any AI scraping requires the prior permission of the copyright holders of the content (i.e., the users) when the scraping is done to exploit the data out of greed. Also, robots.txt could be used to forbid AI scraping of the HTML.
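To make the robots.txt idea concrete, here is a rough sketch of how a well-behaved crawler would read such a file, using Python's standard `urllib.robotparser`. The robots.txt content, the `example.social` URL and the `may_fetch` helper are my own assumptions for illustration; `GPTBot` and `CCBot` are just two crawler tokens I know are publicly documented, not a complete list. Keep in mind that robots.txt is advisory: it only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt an admin might publish (illustrative only).
# It blocks two known AI crawler tokens and allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.social/users/alice"  # hypothetical URL
    print(may_fetch("GPTBot", url))       # AI crawler token
    print(may_fetch("Mozilla/5.0", url))  # ordinary browser
```

With this file, the AI crawler tokens are refused and an ordinary browser user agent is allowed, which is about as far as robots.txt can go without the admin's policy backing it up.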
I don’t think restrictions should be added at the protocol level, but maybe some declarative tags would be fine:
{
"rich": "eat",
"about-meta": "fck-genocidal-and-youth-suicidal-promoter-zuckenberg",
"ai": "not-for-greed"
}
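Just to show how a consumer could honor a tag like that, here is a minimal sketch. Everything in it is an assumption on my part: the `ai` key and the `not-for-greed` value come from the snippet above and are not part of any existing standard, the `ai_use_allowed` helper is hypothetical, and treating a missing tag as "no declared restriction" is only one possible reading.

```python
import json


def ai_use_allowed(metadata_json: str, commercial: bool) -> bool:
    """Check a hypothetical declarative "ai" tag before using the data.

    commercial: True if the data would be exploited for profit.
    """
    meta = json.loads(metadata_json)
    policy = meta.get("ai")
    if policy == "not-for-greed":
        # The tag only objects to for-profit exploitation.
        return not commercial
    # Assumption: no tag (or an unknown value) declares no restriction.
    return True


if __name__ == "__main__":
    tags = '{"ai": "not-for-greed"}'
    print(ai_use_allowed(tags, commercial=True))
    print(ai_use_allowed(tags, commercial=False))
```

Being purely declarative, the tag forbids nothing by itself; it only states the user's wishes so that the server policy (or a court) has something to point at.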
Racism and sexism, two of the values of the EU.