"Did you realize that we live in a reality where SciHub is illegal, and OpenAI is not?"

Star@sopuli.xyz · 10 months ago

"Did you realize that we live in a reality where SciHub is illegal, and OpenAI is not?"

Star@sopuli.xyz · edit-2 10 months ago

It’s so ridiculous: when corporations steal everyone’s work for their own profit, no one bats an eye, but when a group of individuals does the same to make education and knowledge free for everyone, it’s somehow illegal, unethical, immoral and whatnot.

Grimy@lemmy.world · 10 months ago

Using publicly available data to train isn’t stealing.

Daily reminder that the ones pushing this narrative are literally corporations like OpenAI. If you can’t use copyrighted materials freely to train on, it brings up the cost in such a way that only a handful of companies can afford the data.

They want to kill the open-source scene and are manipulating you to do so. Don’t build their moat for them.

givesomefucks@lemmy.world · edit-2 10 months ago

And using publicly available data to train gets you a shitty chatbot…

Hell, even using copyrighted data to train isn’t that great.

Like, what do you even think they’re doing here for your conspiracy?

You think OpenAI is saying they should pay for the data? They’re trying to use it for free.

Was this a meta joke and you had a chatbot write your comment?

tourist@lemmy.world · 10 months ago

Was this a meta joke and you had a chatbot write your comment?

if someone said this to me I’d cry

webghost0101@sopuli.xyz · edit-2 10 months ago

The point that was being made was that publicly available data includes a whole lot of copyrighted data to begin with, and it’s pretty much impossible to filter it out. Grand example: the Eiffel Tower in Paris is not copyright protected, but the lights on it are, so you can only use pictures of the Eiffel Tower taken during the day, and only if the picture itself isn’t copyright protected by the original photographer. Copyright law has all these complex caveats and exceptions that make it impossible to tell at a glance whether or not something is protected.

This in turn means that if AI cannot legally train on copyrighted materials it finds online without paying huge sums of money, then effectively only mega corporations that can pay copyright fines as a cost of business will be able to afford training decent AI.

The only other option to produce any AI of this type is a very narrow, curated set of known materials with a public use license, but that is not going to get you anything competent on its own.

EDIT: In case it isn’t clear, I am clarifying what I understood from Grimy@lemmy.world’s comment, not adding to it.

RainfallSonata@lemmy.world · 10 months ago

I didn’t want any of this shit. IDGAF if we don’t have AI. I’m still not sure the internet actually improved anything, let alone what the benefits of AI are supposed to be.

RememberTheApollo@lemmy.world · 10 months ago

It doesn’t matter what you want. What matters is if corporations can extract $ from you, gain an efficiency, or cut their workforce using it.

That’s what the drive for AI is all about.

RainfallSonata@lemmy.world · 10 months ago

No doubt.

Alien Nathan Edward@lemm.ee · 10 months ago

this is because the technocrats are allowed to steal from you, but when you steal from them what they’ve stolen from actual researchers that’s a problem

LWD@lemm.ee · edit-2 10 months ago

AFAIK the individual researchers who get their work pirated and put on Sci-Hub don’t seem to particularly mind.

Check out this blog post critical of Sci-Hub and how it appeals to academic faculty:

By freeing published scholarship from the chains of toll access and copyright protection and making them freely available to all, it can feel like you are helping a Robin Hood figure rob from the rich and give to the poor.

It goes on to explain potential security issues, but it doesn’t even try to attack the concept of freely providing academic papers to begin with.

I’m starting to think the term “piracy” is morally neutral. The act can be either positive or negative depending on the context. Unfortunately, the law does not seem to flow from morality, or even the consent of the supposed victims of this piracy.

brsrklf@jlai.lu · 10 months ago

I follow a few researchers with interesting youtube channels, and they often mention that if you ask them or their colleagues for a publication of theirs, chances are they’ll be glad to send it to you.

A lot of them love sharing their work, and don’t care at all for science journal paywalls.

breakfastmtn@lemmy.ca · 10 months ago

Academics don’t care because they don’t get paid for them anyway. A lot of the time you have to pay to have your paper published. Then companies like Elsevier just sit back and make money.

Imgonnatrythis@sh.itjust.works · 10 months ago

Don’t mind? Hell, we want people to read that shit. We don’t profit at all if it’s paywalled; it hurts us and hurts science in general. This is 100% the wishes of for-profit scientific journals.

I Cast Fist@programming.dev · 10 months ago

What really breaks the suspension of disbelief in this reality of ours is that fucking advertising is the most privacy invasive activity in the world. Seriously, even George Orwell would call bullshit on that.

hottari@lemmy.ml · 10 months ago

This is different. AI as a transformative tech is going to usher the US economy into the next boom of prosperity. The AI revolution will change the world and allow people to decide if they want to work for money or not (read UBI). In case you haven’t caught on, I am being sarcastic.

All this despite ChatGPT being a total complete joke.

Joe Cool@lemmy.ml · 10 months ago

So, I feel taking an .epub and putting it in a .zip is pretty transformative.

Also you can make ChatGPT (or Copilot) print out quotes with a bit of effort, now that it has Internet.

Jknaraa@lemmy.ml · 10 months ago

And people wonder why there’s so much pushback against everything corps/gov do these days. They do not act in a manner which encourages trust.

TWeaK@lemm.ee · 10 months ago

OpenAI isn’t really proven as legal. They claim it is, and it’s very difficult to mount a challenge, but there definitely is an argument that they have no fair use protection - their “research” is in fact development of a commercial product.

Bananigans@lemmings.world · 10 months ago

If this ends with LLMs getting shut down to some degree, I wonder if it’s going to result in something like a Pirate Bai.

skeptomatic@lemmy.ca · 10 months ago

Not to be confused with, “Pirate Bae”, the pirate dating site for those endowed with abundant doubloons.

LibreFish@lemmy.world · 10 months ago

Yes, because 1:1 duplication of copyrighted works violates copyright, but summarizing those works and relaying facts stated in them is perfectly legal (by an AI or not).

unexpectedteapot@lemmy.ml · edit-2 10 months ago

If you mean by “perfectly legal” a fair use claim, then could you please explain how a commercial for-profit company using the works, sometimes echoing verbatim results, is infringing on the copyrights in a fair use manner?

LibreFish@lemmy.world · 10 months ago

I do not mean a fair use claim. To quote the Copyright Office: “Copyright does not protect facts, ideas, systems, or methods of operation, although it may protect the way these things are expressed” (source)

Facts and ideas cannot be copyrighted, so what I was specifically referring to is that if I or an AI read a paper about jellyfish being ocean creatures, then later talk about jellyfish being ocean creatures, there are no restrictions on that whatsoever as long as we don’t reproduce the paper word for word.

Now, most of the time AI summarizes things or collects facts, and since those themselves cannot be protected by copyright, it’s perfectly legal. On the occasion when AI spits out copyrighted work, that’s a gray area, and liability, if any, will probably be decided in the courts.

Uriel238 [all pronouns]@lemmy.blahaj.zone · 10 months ago

The IP system, which goes to great lengths to block things like open-access scientific publications, is borked borked borked borked borked.

If OpenAI and other generative AI projects are the means by which we finally break it so we can have culture and a public domain again, well, we had to nail Capone with tax evasion.

Yes, industrialists want to use AI [exactly the way they want to use every other idea – plausible or not] to automate more of their industries so they can pay fewer people less money for more productivity. And this is a problem in which generative AI figures centrally, but it’s not really all that new, and eventually we’re going to have to force our society to recognize that it works for the public and not money. I don’t think AI is going to break the system and lead us to communist revolution ( The owning class will tremble…! ) But eventually it will be 1789 all over again. Or we’ll crush the fash and realize the only way we can get the fash to not come back is by restoring and extending FDR’s New Deal.

I am skeptical the latter can happen without piles of elite heads and rivers of politician blood.

CrayonRosary@lemmy.world · 10 months ago

Lemmy users: Copyright law is broken and stupid.

Also Lemmy users: A.I. violates copyright law!

UnderpantsWeevil@lemmy.world · edit-2 10 months ago

A.I. doesn’t violate copyright laws. It is the data-mining done to train A.I. and the regurgitation of said data in the responses that ultimately violate these laws. A model trained on privately owned, properly licensed, or exclusively public works wouldn’t be a problem.

Even then, I would argue that lack of attribution is a bigger problem than merely violating copyright. A big part of the LLM mystique is in how it can spit out a few lines of Shakespeare without accreditation and convince its users that it’s some kind of master poet.

Copyright law is stupid and broken. But plagiarism is a problem in its own right, as it seeks to effectively sell people their own creative commons at an absurd markup.

trafficnab@lemmy.ca · edit-2 10 months ago

A model trained on privately owned, properly licensed, or exclusively public works wouldn’t be a problem.

This is how we end up with only corpo owned AIs being allowed to exist imo, places like stock photo sites are the only ones with large enough repositories of images to train AI that they have all the legal rights to

The way I see it, either generative AI is legal, free for everyone to run locally, and the created works are public domain, OR, everyone pays $20/mo to massive faceless corpos for the rest of their lives to have the privilege of access to it because they’re the only ones who own all (or have enough money to license) the IP needed to train them

UnderpantsWeevil@lemmy.world · 10 months ago

This is how we end up with only corpo owned AIs being allowed to exist imo

It’s how you end up with sixteen different streaming services that only vend a sliver of the total available content, sure. But the underlying technology of AI grows independent of what it’s trained on.

The way I see it, either generative AI is legal, free for everyone to run locally, and the created works are public domain, OR, everyone pays $20/mo to massive faceless corpos for the rest of their lives to have the privilege of access to it

There are other alternatives. These sites can be restricted to data within the public domain. And we can increase our investment in public media. The problem of NYT articles being digested and regurgitated as ChatGPT info-vomit isn’t a problem if the NYT is a publicly owned and operated enterprise. Then it’s not struggling to profit off journalism, but treating this information as a loss-leading public service open to all, with ChatGPT simply operating as a tool to store, process, and present the data.

Similarly, if you limit generative AI to the old Mickey Mouse and Winnie-the-Pooh films from the 1930s, you leave plenty of room for original artists to create new works without fear that their livelihoods get chewed up and fed back into the system. If you invest in public art exhibitions, then these artists can get paid to pursue their craft, the art becomes public domain immediately, and digital tools that want to riff on the original are free to do so without undermining the artists themselves.

UnderpantsWeevil@lemmy.world · 10 months ago

Consider who sits on OpenAI’s board and owns all their equity.

SciHub’s big mistake was to fail to get someone like Sundar Pichai or Jamie Iannone with a billion-dollar stake in the company.

"Did you realize that we live in a reality where SciHub is illegal, and OpenAI is not?"

Yani Bellini Saibene (@yabellini@fosstodon.org)