OpenAI’s Mounting Legal & Ethical Problems
UK, EU, and US considering investigations of the OpenAI/Microsoft partnership, copyright lawsuits, many private companies blocking the GPTBot, OpenAI's journalism blog post + EU AI Act Leak
Yesterday, the final text of the EU AI Act was leaked. A well-formatted and downloadable version can be found here.
According to Euractiv editor Luca Bertuzzi, who was behind the leak, member states will still have to provide their technical feedback before the final text is adopted. How much the finally adopted text will diverge from the leaked document, I don’t know. But if you are a nerd like myself, it can still serve as some nice bedtime reading.
I am considering whether I should write an early post about the AI Act in its leaked form. On the other hand, when it comes to law, it’s rarely an advantage to be first; it’s much better to be thorough and draw conclusions once all the relevant evidence is collected.
Today’s post is about two of my favourite topics, OpenAI and the intersection of copyright law and generative AI.
OpenAI’s Mounting Legal & Ethical Problems
Generative AI is here to stay. Legislation will have to adapt. OpenAI will have to adapt to legislation. That is the anticipated chain of causes and effects. Unless, however, OpenAI is able to sell its lofty ideals about AGI and the importance of AI development to lawmakers. In that case, legislation will ultimately have to bend in favor of OpenAI’s business practices. No one can gaze into the crystal ball and predict how the future will turn out, but let’s, for the purpose of this post, take a closer look at OpenAI’s situation today.
What OpenAI Really Wants
OpenAI is by any measure an atypical company. It is radical in its mission, heavily financed, fast-moving, and acts boldly, while deeply concerned about the safety of its own products. The company is a curious mixture of a for-profit company sponsored by Microsoft, a non-profit organization specializing in responsible and safe AI, and a quasi-cult devoted to developing AGI for the benefit of all humanity. Steven Levy from Wired wrote a longread on the company, “What OpenAI Really Wants” in September last year that I recommend if you want to go deeper.
Bloomberg News has reported that OpenAI is in talks to raise a fresh round of funding at a valuation at or above $100 billion. Not only is OpenAI’s market valuation astounding, so is its pace of consistent product roll-outs and updates, most recently with its GPT Store for paying customers. We can conclude that OpenAI is not only an atypical company but also a wildly successful one.
Now for the bad news. For the same reason that OpenAI is successful - its ambition, boldness, first-mover advantage, etc. - the company is also facing mounting legal and ethical problems. It remains to be seen whether these problems will make or break the company.
The OpenAI-Microsoft partnership
Under the OpenAI-Microsoft partnership agreement, OpenAI is reportedly paying 75% of its revenue to Microsoft, until its $13 billion investment is recouped. At that point, Microsoft will have a 49% ownership share, while OpenAI’s nonprofit parent company is getting 2% (OpenAI has a complicated company structure where a for-profit subsidiary is fully controlled by the OpenAI nonprofit.)
The OpenAI-Microsoft partnership is currently under scrutiny from competition regulators in both the EU and the UK. In California, the consumer advocacy group Public Citizen has petitioned the California Attorney General to investigate the nonprofit status of OpenAI Inc. According to Gary Marcus, one possible outcome if the State of California chooses to investigate could be that OpenAI would be forced to dissolve its nonprofit parent company and make a distribution, conceivably measured in billions of dollars, to an external charity focused in whole or in part on AI.
As if that weren’t enough, Politico reported on Friday that the Department of Justice (DOJ) and the Federal Trade Commission (FTC) in the US are deep in discussions over who has jurisdiction to review the OpenAI-Microsoft partnership on antitrust grounds.
The scrutiny from regulators in the UK, EU, and US is still at an early, probing stage. We don’t know if the petition to the California Attorney General will lead to an investigation either. What we do know is that Microsoft’s cloud infrastructure and funding are indispensable for training and maintaining GPT-3.5 and GPT-4, and that OpenAI couldn’t make any products without them.
Microsoft does not formally exercise any control over OpenAI since, as mentioned earlier, the company’s for-profit arm is controlled by the OpenAI nonprofit. However, notably after last year’s firing and re-hiring of Sam Altman as CEO, regulators have started to pay much more attention to OpenAI’s peculiar company structure and close ties to Microsoft. The regulatory threats to OpenAI’s partnership with Microsoft are not yet acute, but they are very real and looming on the horizon.
The Copyright Issue
OpenAI as well as Microsoft are defendants in multiple class-action lawsuits from various authors, journalists, coders, and recently The New York Times. More lawsuits will likely follow this year from both small and big players in the creative industry.
Many organizations have opted out of OpenAI’s AI training by disallowing its web crawler, GPTBot, from accessing their websites (Originality.ai). Facebook, Pinterest, Vimeo, Amazon, Tumblr, Disney, CNN, Reuters, BBC, CNBC, NPR, The Guardian, ABC, The New York Times, Bloomberg, Scribd, The Atlantic, Lonely Planet, and Washington Post are among the many names who have opted out.
In fact, as of today, 36% of the top 100 websites and 30.6% of the top 1000 websites have blocked GPTBot. It shows that it is not only indie artists, book authors, and The New York Times who are unwilling to contribute to OpenAI’s training data; many other established companies are too.
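For reference, blocking the crawler is a one-line change: a site disallows GPTBot’s documented user-agent string in its robots.txt file. This minimal example blocks GPTBot from the entire site:

```
User-agent: GPTBot
Disallow: /
```

A site can also block only parts of its content by replacing `/` with a specific directory path.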
In a submission to the House of Lords communications and digital select committee, OpenAI subtly asks for a copyright exemption to AI training:
“Because copyright today covers virtually every sort of human expression– including blog posts, photographs, forum posts, scraps of software code, and government documents–it would be impossible to train today’s leading AI models without using copyrighted materials. Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”
The statement is factually correct, but OpenAI does not address whether digital creators should be able to explicitly opt in or be compensated for AI training, and why not. Why would a licensing scheme be unrealistic?
For example, Substack is a platform rich with high-quality content that is ripe for AI training. How difficult would it be to pay Substack a yearly licensing fee that would be distributed among all writers and creators on the platform who have opted in to AI training? Perhaps it could amount to a couple of thousand dollars a year for those creators who have a minimum number of active readers and have contributed at least 100,000 words. Why is that out of the question? Sure, it would be expensive for foundation model providers, but it would benefit human creators, and after all, the AI companies depend on these human creations to build their models.
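As a back-of-the-envelope sketch of what such a scheme could look like (all figures, thresholds, and field names here are hypothetical illustrations, not anything Substack or OpenAI has proposed):

```python
# Hypothetical sketch of a platform-level AI-training licensing scheme.
# All numbers (fee size, reader and word-count thresholds) are made up.

def eligible(creator: dict) -> bool:
    """A creator qualifies if they opted in, have a minimum readership,
    and have contributed at least 100,000 words."""
    return (creator["opted_in"]
            and creator["active_readers"] >= 500
            and creator["words_contributed"] >= 100_000)

def distribute_license_fee(total_fee: float, creators: list[dict]) -> dict:
    """Split the platform's yearly licensing fee equally among eligible
    creators; returns a mapping of creator name -> payout."""
    pool = [c for c in creators if eligible(c)]
    if not pool:
        return {}
    share = total_fee / len(pool)
    return {c["name"]: share for c in pool}

creators = [
    {"name": "alice", "opted_in": True,  "active_readers": 1200, "words_contributed": 250_000},
    {"name": "bob",   "opted_in": True,  "active_readers": 300,  "words_contributed": 500_000},
    {"name": "carol", "opted_in": False, "active_readers": 9000, "words_contributed": 800_000},
    {"name": "dave",  "opted_in": True,  "active_readers": 600,  "words_contributed": 150_000},
]

# bob lacks readers and carol never opted in, so a $10M fee is
# split between alice and dave: $5M each.
payouts = distribute_license_fee(10_000_000, creators)
```

An equal split is only the simplest rule; a real scheme would more likely weight payouts by word count or readership.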