Discussion about this post

Alex Tolley

When you say non-public works were detected in the likely training set, does this imply that the paywall was broken to reach the material, perhaps by accessing the O'Reilly library through a subscription and extracting the content?

Dave Hansen

One of the takeaways at the end of the report was about sustainability: "If left unaddressed, the current disregard for IP rights could ultimately harm AI developers themselves, even if its use is ruled legally permissible. Sustainable ecosystems need to be designed so that both creators and developers can benefit from generative AI. Otherwise, model developers are likely to rapidly plateau in their progress, especially as newer content becomes produced less and less by humans."

I wonder if you have produced, or would be willing to produce, an analysis of the effect the use of these 34 titles has had on expected sales over time. Or, given that payment comes via licensing deals long enough that these changes wouldn't affect revenue just yet, have you seen any changes in behind-the-paywall usage? For example, is there a drop-off in users accessing content because they can get it (or enough of it) outside the paywall via an AI system to satisfy their needs? There is scant data available on users' willingness to substitute AI tools for paid access, and this data could be valuable.
