Report: Potential NYT lawsuit could force OpenAI to wipe ChatGPT and start over

Powderhorn@beehaw.org · 2 years ago

sapient [they/them]@infosec.pub · 2 years ago

I hope not. Not a big fan of proprietary AI (local AI all the way, and I hope people leak all these models, both code and weights), but fuck copyright and fuck capitalism, which makes automation seem like a bad thing when it shouldn’t be ;p nya

wim@lemmy.sdf.org · 2 years ago

Yes, because AI and automation will definitely not be on the side of big capital, right? Right?

Be real. The cost of building means they’re always going to favour the wealthy. At best, right now we’re running public copies of the older and smaller models. Local AI will always be running behind the state-of-the-art big proprietary models, which will always be in the hands of the richest moguls and companies in the world.

hascat@programming.dev · 2 years ago

No leaks necessary; there are a number of open-source LLMs available:

https://github.com/Hannibal046/Awesome-LLM#open-llm

The key differentiator between these and proprietary offerings will always be the training data. Large amounts of high-quality data will be more difficult for an individual or a small team to source. If lawsuits like this one block ingestion of otherwise publicly-available data, we could have a future where copyright holders charge AI builders for access to their data. If that happens, “knowledge” could become exclusive to various AI platforms much the same way popular shows or movies are exclusive to streaming platforms.

pax@rblind.com · 2 years ago

The open-source models are so bad that they give you out-of-context, completely random responses.

Franzia@lemmy.blahaj.zone · 2 years ago

While my gut reaction is “yeah, make them pay for this art and these articles they’re stealing to train the model” - I don’t think copyright is going to actually win the creators any money for their work this time.

I’d rather it remain a wild west and copyright lose.

Gaywallet (they/it)@beehaw.org · 2 years ago

Not a strong case for NYT, but I’ve long believed that AI is vulnerable to copyright law and that copyright is likely the only thing that can stop or slow its progression. Given the major issues with all AI, how inequitable and bigoted it is, and its increasing use, I’m hoping this helps start conversations about limiting the scope of AI or its application.

FIash Mob #5678@beehaw.org · 2 years ago

It’s pretty apparent that AI developers are training their applications using stolen images and data.

This was always going to end up in the courts.

teawrecks@sopuli.xyz · 2 years ago

A human brain is just the summation of all the content it’s ever witnessed, though, both paid and unpaid. There’s no such thing as artwork that is completely 100% original, everything is inspired by something else we’re already familiar with. Otherwise viewers of the art would just interpret it as random noise. There has to be some amount of familiarity for a viewer to identify with it.

So if someone builds an atom-perfect artificial brain from scratch, sticks it in a body, and shows it around the world, should we expect the creator to pay licensing fees to the owners of everything it looks at?

davehtaylor@beehaw.org · edit-2 11 months ago

deleted by creator

teawrecks@sopuli.xyz · 2 years ago

No offense, but I get the sense that you don’t actually know how ML works and you’re just familiar with pop science descriptions of it. Am I wrong?

It’s an incredibly bold claim to say that a human brain is doing something an AI could never do. That is a very antiquated notion, to the point that I would say it’s 100% devoid of any critical thinking.

Now if you’re arguing that there is a supernatural plane of some kind that cannot be measured in any way and is fully responsible for our consciousness, then that’s a different story; there’s nothing I can say to change your mind.

There is so so so so so much more to human experience, life experience, and just being alive than simply absorbing “content.”

That’s the thing though, it’s all the same “content” to a living brain. Your brain doesn’t distinguish between your lived experiences and watching cat videos, the experience of watching those videos is also a lived experience.

I know it’s tempting to say humans (or living creatures) are special and unique in their ability to experience emotions and consciousness etc, but the reality is, you’re a biological machine. You take inputs via various senses, chemical reactions happen throughout your body, and the illusion of memory and experience is created. Now either prove to me that this phenomenon is not replicable in a lab or virtual setting, or get off your high horse and join the actual discussion that needs to happen.

davehtaylor@beehaw.org · edit-2 11 months ago

deleted by creator

teawrecks@sopuli.xyz · 2 years ago

So you acknowledge there is a valuable discussion to be had here. Thank you. I would like to have that discussion, would you? Or would you like to stick with the dismissive and arrogant schtick?

davehtaylor@beehaw.org · edit-2 11 months ago

deleted by creator

Barry Zuckerkorn@beehaw.org · 2 years ago

A human brain is just the summation of all the content it’s ever witnessed, though, both paid and unpaid.

But copyright is entirely artificial. The deal is that the law says you have to pay when you copy a bunch of copyrighted text and reprint it into new pages of a newly bound book. The law also says you don’t have to pay when you are giving commentary on a copyrighted work, or parodying a copyrighted work, or drawing inspiration from a copyrighted work to create something new but still influenced by that copyrighted work. The question for these lawsuits is whether using copyrighted works to train these models and generate new text (or art or music) is infringement of those artificial, human-made, legal rights.

As an example, sound recording copyrights only protect the literal copying of a sound recording. Someone who mimics that copyrighted recording, no matter how perfectly, doesn’t actually infringe on the recording copyright (even if they might infringe on the composition copyright, a separate and distinct copyright). But a literal duplication process of some kind would be infringement.

We can have a debate about whether the law draws the line in the correct places, or whether the copyright regime could be improved, and other normative discussions about what the rules should be in the modern world, especially about whether the rules in one area (e.g., the human brain) are consistent with the rules in another area (e.g., a generative AI model). But that’s a separate discussion from what the rules currently are. Under current law, the human brain is allowed to perform some types of copying, processing, and remixing that some computer programs are not.

Rakn@discuss.tchncs.de · edit-2 2 years ago

This comparison doesn’t make sense to me. If the person then makes money off it: yes.

Otherwise the question would be whether copyright law should be abolished entirely. E.g., if I create a new news portal with content copied from other sources, would that be okay then?

You are comparing a computer program to a human. Which… is weird.

dolphone@beehaw.org · 2 years ago

Just because it’s weird to you doesn’t make it any less valid.

As a species we sit at the threshold of artificial life, created by us. Seems silly to think that such a monumental jump would not be accompanied by substantial changes in our made up rules of engagement.

Rakn@discuss.tchncs.de · edit-2 2 years ago

Might be a fundamental difference in opinion. I don’t see us anywhere near anything related to artificial life.

What they’ve built there is a product, a computer program, and they used other folks’ data to build it without getting their permission. I also cannot just go and copy and paste source code from all over the internet to build my program. There are licenses attached to it that determine what you can or can’t do with it.

I feel like just because the term “learning” is involved, people no longer view it as simply building or programming a system. Which it is.

teawrecks@sopuli.xyz · 2 years ago

If the person then makes money off it: yes.

Every idea you’ve ever profited from was inspired by something you saw in the past. That’s my point. There are no ideas that exist entirely within a vacuum; they all stem from something else. We just draw a line arbitrarily and say “this idea is too much like that other idea”. But if you combine 3 other ideas into something that is sufficiently non-obvious (which is entirely relative), then we call it “novel” and “original”.

I think the line should probably be, either it’s a tool and you need to license any work it references, OR it’s conscious, has rights, gets paid, and is a person. I think most tech companies would much rather stay in the former camp, not having to answer any ethical dilemmas if they don’t have to. But on the other hand, the first company to make something that people consider actually “conscious” will make history.

You are comparing a computer program to a human. Which… is weird.

Sounds like you have about 100 years of philosophical discussion, AI research, and scifi to catch up on 😄.

Rakn@discuss.tchncs.de · 2 years ago

It feels like you are making a computer program out to be more than it actually is right now. At the same time this all isn’t about what that program is doing. It’s about how it was built.

couragethebravedog@lemmy.ml · 2 years ago

Everyone wants their piece of the pie. I just want AI to evolve to the point where we can use it to create real innovation. But we’ll never get there with all these greedy removed.

thestarfraction@beehaw.org · 2 years ago

So you’d rather some of the world’s biggest corporations get to monopolise AI profits (meanwhile pushing out some very dodgy ‘creations’, including b******* text masquerading as truth) while the people whose actual creative labour it is built on get nothing? Who are the real greedy ones here? Seems to me it’s the likes of Google and OpenAI.

thestarfraction@beehaw.org · 2 years ago

When little people torrent, they get prosecuted; when Google steals all the text they can get their hands on, it gets legalised…

BiNonBi@lemmy.blahaj.zone · 2 years ago

NPR reported that a “top concern” is that ChatGPT could use The Times’ content to become a “competitor” by “creating text that answers questions based on the original reporting and writing of the paper’s staff.”

That’s something that can currently be done by a human and is generally considered fair use. All a language model really does is drive the cost of doing that from tens or hundreds of dollars down to pennies.

To defend its AI training models, OpenAI would likely have to claim “fair use” of all the web content the company sucked up to train tools like ChatGPT. In the potential New York Times case, that would mean proving that copying the Times’ content to craft ChatGPT responses would not compete with the Times.

A fair use defense does not have to include noncompetition. That’s just one factor in a fair use defense, and the other factors may be enough on their own.

I think it’ll come down to how “the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes” and “the amount and substantiality of the portion used in relation to the copyrighted work as a whole;” are interpreted by the courts. Do we judge a language model by the model itself or by its output? Can a model itself be non-infringing yet still be able to produce infringing content?

fuzzywolf23@beehaw.org · 2 years ago

The model is intended for commercial use, uses the entire work and creates derivative works based on it which are in direct competition.