AI and the Corporate Capture of Knowledge
More than a decade after Aaron Swartz’s death, the United States is still living inside the contradiction that destroyed him.
Swartz believed that knowledge, especially publicly funded knowledge, should be freely accessible. Acting on that belief, he downloaded millions of academic articles from the JSTOR archive with the intention of making them publicly available. For this, the federal government charged him with multiple felonies and threatened decades in prison. After two years of prosecutorial pressure, Swartz died by suicide on Jan. 11, 2013.
The still-unresolved questions raised by his case have resurfaced in today’s debates over artificial intelligence, copyright and the ultimate control of knowledge.
At the time of Swartz’s prosecution, vast amounts of research were funded by taxpayers, conducted at public institutions and intended to advance public understanding. But access to that research was, and still is, locked behind expensive paywalls. People are unable to read work they helped fund without paying private journals and research websites.
Swartz considered this hoarding of knowledge to be neither accidental nor inevitable. It was the result of legal, economic and political choices. His actions challenged those choices directly. And for that, the government treated him as a criminal.
Today’s AI arms race involves a far more expansive, profit-driven form of information appropriation. The tech giants ingest vast amounts of copyrighted material: books, journalism, academic papers, art, music and personal writing. This data is scraped at industrial scale, often without consent, compensation or transparency, and then used to train large AI models.
AI companies then sell their proprietary systems, built on public and private knowledge, back to the very people who funded that knowledge. But this time, the government’s response has been markedly different. There are no criminal prosecutions, no threats of decades-long prison sentences. Lawsuits proceed slowly, enforcement remains uncertain and policymakers signal caution, given AI’s perceived economic and strategic importance. Copyright infringement is reframed as an unfortunate but necessary step toward “innovation.”
Recent developments underscore this imbalance. In 2025, Anthropic reached a settlement with authors over allegations that its AI systems were trained on copyrighted books without authorization. The agreement reportedly valued the infringement at roughly $3,000 per book across an estimated 500,000 works, for a total of more than $1.5 billion. Copyright disputes between artists and accused infringers routinely settle for hundreds of thousands, or even millions, of dollars when prominent works are involved. Scholars estimate that Anthropic avoided over $1 trillion in potential liability. For well-capitalized AI firms, such settlements are likely being factored in as a predictable cost of doing business.
As AI becomes a larger part of America’s economy, one can see the writing on the wall. Judges will twist themselves into knots to justify an innovative technology premised on literally stealing the works of artists, poets, musicians, all of academia and the internet, and vast expanses of literature. But if Swartz’s actions were criminal, it is worth asking: What standard are we now applying to AI companies?
The question is not simply whether copyright law applies to AI. It is why the law appears to operate so differently depending on who is doing the extracting and for what purpose.
The stakes extend beyond copyright law or past injustices. They concern who controls the infrastructure of knowledge going forward and what that control means for democratic participation, accountability and public trust.
Systems trained on vast bodies of publicly funded research are increasingly becoming the primary way people learn about science, law, medicine and public policy. As search, synthesis and explanation are mediated through AI models, control over training data and infrastructure translates into control over what questions can be asked, what answers are surfaced, and whose expertise is treated as authoritative. If public knowledge is absorbed into proprietary systems that the public cannot inspect, audit or meaningfully challenge, then access to information is no longer governed by democratic norms but by corporate priorities.
Like the early internet, AI is often described as a democratizing force. But also like the internet, AI’s current trajectory suggests something closer to consolidation. Control over data, models and computational infrastructure is concentrated in the hands of a small number of powerful tech companies. They will decide who gets access to knowledge, under what conditions and at what price.
Swartz’s fight was not simply about access, but about whether knowledge should be governed by openness or corporate capture, and who that knowledge is ultimately for. He understood that access to knowledge is a prerequisite for democracy. A society cannot meaningfully debate policy, science or justice if information is locked away behind paywalls or controlled by proprietary algorithms. If we allow AI companies to profit from mass appropriation while claiming immunity, we are choosing a future in which access to knowledge is governed by corporate power rather than democratic values.
How we treat knowledge—who may access it, who may profit from it and who is punished for sharing it—has become a test of our democratic commitments. We should be honest about what those choices say about us.
This essay was written with J. B. Branch, and originally appeared in the San Francisco Chronicle.
You Know Who You Gubmint Leeches • January 16, 2026 11:02 AM
Professor Schneier,
Thank you very much for keeping this injustice alive.
The US Government hates it when people rub in their huge failings.
What really happens, each time a citizen is wrongfully charged, prosecuted and convicted, is always the fault of a couple-three sick bastards, ego-maniacs in the government, trying to prove something they will never be able to accomplish. Rather than giving up and admitting their wrongs, they double down and do everything they can to hide their dark secrets, sparing no expense, because they can afford it: it’s not their own personal funds they are wasting, it’s the taxpayer’s money. And the best part is, no matter what, they’ll never have to answer for their dirty deeds, because “I WAS ONLY DOING MY JOB.”
Regarding Mr. Swartz’s fight, he was totally right.
Why should a private corporation get taxpayers’ money to fund a project, which they then patent and call their own “Proprietary” Intellectual Property?
It is wrong. But as in most cases, FOLLOW THE MONEY.
Those private corps are usually the ones backing certain candidates, senators, governors and mayors with huge donations, and once those candidates are elected, there’s a big payback in the form of those “GRANTS” or, financially speaking, ROI.
AI is going to contribute to the Balkanization of the Internet, because, please do tell me, who’s going to invest tons of money into building, training and maintaining AI/ML/LLMs and then make them available to everyone, even the competition? That applies, for the most part, to large corporations and even government institutions that have already been operating on LANs (secured, autonomous intranets) you cannot Google, surf or connect to, because they are private networks. Those organizations are now essentially cataloguing their vast databases and consolidating their information using AI tools to make themselves more effective, productive, faster, etc.
I do not wish to hijack this thread, but if one or more individuals in the government have the INTENT to destroy a citizen, it will happen no matter what.
It will happen regardless of how absurd or wrong it looks. Even if it’s much too obvious, they do not care, just as in the case of this guy from Idaho, who frequently chooses this very comment section to desperately plead for the help and justice he never got.
https://drive.google.com/drive/folders/16GB5NiUu4Zb07RD6B3ai08qerHbHxJhI