Startups and Data with AI

Training AI on public web data isn't theft; it's fair use. But we need laws that say so explicitly. The real legal fight shouldn't be about what LLMs train on; it should be about what they output.

Turning billions of words into numeric model weights is closer to indexing than copying. Forcing companies to license everything retroactively just locks in today's media giants and locks out tomorrow's startups. Ironically, that makes AI less competitive and less diverse, not more fair.

Instead of patching 1970s copyright law with duct tape, Congress should:

- Create a safe harbor for lawful training
- Clarify guardrails around harmful or verbatim outputs
- Focus policy on deepfake prevention and transparency

Trying to retrofit copyright and antitrust laws for this moment will only create confusion and hand the future to the incumbents.
