AI and the Corporate Capture of Knowledge

More than a decade after Aaron Swartz’s death, the United States is still living inside the contradiction that destroyed him.

Swartz believed that knowledge, especially publicly funded knowledge, should be freely accessible. Acting on that belief, he downloaded millions of academic articles from the JSTOR archive with the intention of making them publicly available. For this, the federal government charged him with multiple felonies and threatened decades in prison. After two years of prosecutorial pressure, Swartz died by suicide on Jan. 11, 2013.

The still-unresolved questions raised by his case have resurfaced in today’s debates over artificial intelligence, copyright and the ultimate control of knowledge.

At the time of Swartz’s prosecution, vast amounts of research were funded by taxpayers, conducted at public institutions and intended to advance public understanding. But access to that research was, and still is, locked behind expensive paywalls. People are unable to read work they helped fund without paying private journals and research websites.

Swartz considered this hoarding of knowledge to be neither accidental nor inevitable. It was the result of legal, economic and political choices. His actions challenged those choices directly. And for that, the government treated him as a criminal.

Today’s AI arms race involves a far more expansive, profit-driven form of information appropriation. The tech giants ingest vast amounts of copyrighted material: books, journalism, academic papers, art, music and personal writing. This data is scraped at industrial scale, often without consent, compensation or transparency, and then used to train large AI models.

AI companies then sell their proprietary systems, built on public and private knowledge, back to the people who funded it. But this time, the government’s response has been markedly different. There are no criminal prosecutions, no threats of decades-long prison sentences. Lawsuits proceed slowly, enforcement remains uncertain and policymakers signal caution, given AI’s perceived economic and strategic importance. Copyright infringement is reframed as an unfortunate but necessary step toward “innovation.”

Recent developments underscore this imbalance. In 2025, Anthropic reached a settlement with publishers over allegations that its AI systems were trained on copyrighted books without authorization. The agreement reportedly valued infringement at roughly $3,000 per book across an estimated 500,000 works, coming to a total of over $1.5 billion. Plagiarism disputes between artists and accused infringers routinely settle for hundreds of thousands, or even millions, of dollars when prominent works are involved. Scholars estimate Anthropic avoided over $1 trillion in liability costs. For well-capitalized AI firms, such settlements are likely being factored in as a predictable cost of doing business.

As AI becomes a larger part of America’s economy, one can see the writing on the wall. Judges will twist themselves into knots to justify an innovative technology premised on literally stealing the works of artists, poets, musicians, all of academia and the internet, and vast expanses of literature. But if Swartz’s actions were criminal, it is worth asking: What standard are we now applying to AI companies?

The question is not simply whether copyright law applies to AI. It is why the law appears to operate so differently depending on who is doing the extracting and for what purpose.

The stakes extend beyond copyright law or past injustices. They concern who controls the infrastructure of knowledge going forward and what that control means for democratic participation, accountability and public trust.

Systems trained on vast bodies of publicly funded research are increasingly becoming the primary way people learn about science, law, medicine and public policy. As search, synthesis and explanation are mediated through AI models, control over training data and infrastructure translates into control over what questions can be asked, what answers are surfaced, and whose expertise is treated as authoritative. If public knowledge is absorbed into proprietary systems that the public cannot inspect, audit or meaningfully challenge, then access to information is no longer governed by democratic norms but by corporate priorities.

Like the early internet, AI is often described as a democratizing force. But also like the internet, AI’s current trajectory suggests something closer to consolidation. Control over data, models and computational infrastructure is concentrated in the hands of a small number of powerful tech companies. They will decide who gets access to knowledge, under what conditions and at what price.

Swartz’s fight was not simply about access, but about whether knowledge should be governed by openness or corporate capture, and who that knowledge is ultimately for. He understood that access to knowledge is a prerequisite for democracy. A society cannot meaningfully debate policy, science or justice if information is locked away behind paywalls or controlled by proprietary algorithms. If we allow AI companies to profit from mass appropriation while claiming immunity, we are choosing a future in which access to knowledge is governed by corporate power rather than democratic values.

How we treat knowledge—who may access it, who may profit from it and who is punished for sharing it—has become a test of our democratic commitments. We should be honest about what those choices say about us.

This essay was written with J. B. Branch, and originally appeared in the San Francisco Chronicle.

Posted on January 16, 2026 at 9:44 AM

Comments

You Know Who You Gubmint Leeches January 16, 2026 11:02 AM

Professor Schneier,
Thank you very much for keeping this injustice alive.
The US Government hates it when people rub in their huge failings.
What really happens, each time a citizen is wrongfully charged, prosecuted, and convicted, is that a couple-three sick bastards, ego-maniacs in the government, are trying to prove something they will never be able to accomplish. Rather than giving up and admitting their wrongs, they double down and do everything they can to hide their dark secrets, sparing no expense, because they can afford it: it’s not their own personal funds they are wasting, it’s the taxpayers’ money. And the best part is, no matter what, they’ll never have to answer for their dirty deeds, because “I WAS ONLY DOING MY JOB.”

Regarding Mr. Swartz’s fight: he was totally right.
Why should a private corporation get taxpayer money to fund a project which it then patents and calls its own “Proprietary” Intellectual Property?
It is wrong. But as in most cases, FOLLOW THE MONEY.
Those private corps are usually the ones backing certain candidates, senators, governors, and mayors with huge donations, and once those candidates are elected, there is big payback in the form of those “GRANTS,” or, financially speaking, ROI.

AI is going to contribute to the Balkanization of the Internet because, please do tell me, who is going to invest tons of money into building, training, and maintaining AI/ML/LLM systems and then make them available to everyone, even the competition? That applies, for the most part, to large corporations and government institutions that already operate on LANs (secured, autonomous intranets) you cannot Google, surf, or connect to because they are private networks. Those organizations are now essentially cataloguing their vast databases and consolidating their information using AI tools to make themselves more effective, productive, faster, etc.

I do not wish to hijack this thread, but if one or more individuals in the government have the INTENT to destroy a citizen, it will happen no matter what.
It will happen regardless of how absurd or wrong it looks. Even if it is much too obvious, they do not care, just like in the case of this guy from Idaho, who frequently chooses this very comment section to plead desperately for the help and justice he never got.
https://drive.google.com/drive/folders/16GB5NiUu4Zb07RD6B3ai08qerHbHxJhI

Rontea January 16, 2026 12:35 PM

Democracy, at its core, is an information system—a collective agreement that the wisdom of the many shapes the governance of the few. Yet somewhere along the way, we gave the keys to that knowledge to gatekeepers. We locked up the raw materials of our shared understanding—academic papers, public research, the very data of our civic life—behind paywalls and proprietary systems. And then, with a kind of quiet irony, we handed that same knowledge to machines and the corporations that own them, granting them the power to harvest and interpret it without returning it to the public.

The result is an inversion of democracy itself: instead of knowledge flowing freely to the people to inform debate, decision, and dissent, it flows upward, into black boxes controlled by a few companies. These new custodians decide what is knowable, what is profitable, and increasingly, what is true. In a system where access to knowledge determines power, the question is not just about copyright or innovation—it’s about whether democracy can survive when the people are denied the tools to understand their own world.

A democracy that defers its knowledge to private algorithms is one that risks becoming a spectator to its own governance. If knowledge is a public good, it should belong to the public—not as a privilege, but as a right.

RIP Aaron Swartz, who reminded us that the fight for open knowledge is the fight for a living democracy.

Clive Robinson January 16, 2026 3:28 PM

@ Bruce,

Some of us remember the actual details of what happened at the time…

Aaron was probably not guilty of any crime other than “theft of electricity,” which would have been a misdemeanor.

The prosecutor who forced her way in was acting under political pressure from the executive to “make an example of him”: in effect, a “witch hunt for a public burning at the stake” as a “warning to all.”

Because the prosecutor knew there was little hope of gaining a conviction in a fair court, she was going for a life-wrecking plea-deal solution.

So rather than go into court, she quite deliberately delayed and delayed and ratcheted things up by repeatedly putting more threats on the table to pressure Aaron.

In the process of “rights stripping” she was also trying to bankrupt Aaron.

We know this from still available records.

Unfortunately, Aaron was given bad advice by others who had positioned themselves as activists for their own advantage…

So Aaron was caught between a rock and a hard place and ended up in a state that should have been a cause of great concern for those around him. Apparently they claim not to have noticed… So Aaron did not receive the medical help he should have.

The outcome you mention.

People need to remember that the US has the worst figures for the number of people in jail relative to population size. Because running jails can be very profitable, we know judges have taken bribes to push up not just the number of people jailed but also the length of their sentences. As for conditions, the violence levels are some of the highest, and while it’s difficult to get figures on where America stands with regard to deaths in prison, we know the number is high.

The US Justice system is clearly “two tier” for those that get pulled into it.

However, there is a third, upper tier that few think about: those who “buy the legislators” so that their crimes face no effective legislation or regulation, if any at all. Throw in politics as well, and that is why,

‘But this time, the government’s response has been markedly different. There are no criminal prosecutions, no threats of decades-long prison sentences. Lawsuits proceed slowly, enforcement remains uncertain and policymakers signal caution, given AI’s perceived economic and strategic importance. Copyright infringement is reframed as an unfortunate but necessary step toward “innovation.”’

The simple fact is that general AI via current LLM and ML systems is not going to happen, regardless of scale or how much money is thrown at them. We now also know they cannot be made either “safe or secure,” while their potential for harm is almost unlimited in comparison. Further, no matter what is currently claimed, the required if not essential “world view” is not going to happen either.

This also means that AI agents, apart from really trivial tasks, are not going to be “safe or secure”: a very real security nightmare that will prove endless and unfixable, with an arms race developing in which the attackers, not the defenders, have the upper hand.

So there really won’t be the ‘step toward “innovation”’ that investors, politicians, and the AI corporates are desperate for, currently or in the foreseeable future.

This, as I’ve noted on a number of occasions, is unfortunate, because the US economy is at best stagnant due to political stupidity and is already in an underlying recession. The only thing keeping that from being clearly visible is the “AI froth” that is already turning into scum.

Do the current LLM and ML systems have useful “innovative” functions? The answer is actually yes, but only in very limited and specific types of applications. There are really very few of these, thus the all-important ROI investors want is not there.

I cannot predict when or how the AI hype bubble will end (explode, implode, or deflate). But it’s now abundantly clear there really are no deliverables of worth to support it in the general-usage sense, thus at some point it’s going to be over.

What I can say is that the “lost opportunity cost” will be immense, and the “Tech Sector” as we know it will be forever changed, and probably, as history shows, “not for the weavers,” even if it is code rather than cloth this time.

It’s almost certain the US corporates will make the same mistake they’ve made since the ’60s, in that they will let experience and expertise in the US “age out.” Then, when things get dire, they will try to “go abroad for it” if they can… which they probably won’t be able to do, for various “short-sighted” reasons most here can probably work out for themselves.

The question for other Western nations is: are they going to just

“Rearrange the deckchairs on what is a ship in distress, or man the lifeboats and cast off, so as to get a good safe distance away whilst they still can and not get pulled down?”

Keep an eye on European defence spending: if they buy US systems, then they are probably not heading for the lifeboats… Likewise, watch what they do to resolve the energy crisis that is slowly building.

Clive Robertson January 17, 2026 10:38 AM

Aaron is not the first to be destroyed by the U.S. Government.
Here’s another example of what happens when the employees of our government lie and cover up for each other, their friends, and their wh0res.

Winter January 17, 2026 11:40 AM

@Clive

“But there are really very few of these, thus the all-important ROI investors want is not there.”

Not in its current form. But we already know how to design “better” systems.

A bee is the archetypical autonomous drone that flies very successfully in a very complex environment. It flies literally on a few drops of honey.

We know how a bee’s neural system works. We even know how to put it into silicon, more or less. It will not be as efficient as the bee, but it will still be orders of magnitude better than the wasteful systems used now.

The problem is feedback loops, and loops in general. An AI will only be intelligent if it can evaluate its own output. Current LLMs do this very, very clumsily and very inefficiently with error backpropagation; a toy sketch of that loop follows the list below. The reason is that no one has found a way to tame loops.

  1. Models should fit a task. AlphaFold is great because it does one thing, and does it well.
  2. Small is beautiful. The smaller the model, the better.
  3. Heat is a sign of failure.
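
To make the loop point concrete, here is a minimal toy sketch of that evaluate-and-feed-back cycle. It is only my own illustration, a one-parameter model with made-up numbers, not how any production system is built:

    # Toy feedback loop: a one-parameter model y = w * x,
    # adjusted by gradient descent on its own error.
    def train(samples, lr=0.1, steps=50):
        w = 0.0  # initial guess for the weight
        for _ in range(steps):
            for x, target in samples:
                y = w * x             # forward pass: produce an output
                error = y - target    # evaluate the output against the target
                grad = 2 * error * x  # gradient of the squared error w.r.t. w
                w -= lr * grad        # feed the error back into the weight
        return w

    # The underlying relationship here is y = 3x, so w should end up near 3.
    print(train([(1.0, 3.0), (2.0, 6.0)]))

Scaled up to billions of weights, that same evaluate-and-adjust cycle is the expensive part, which is exactly the contrast with the bee.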

Drive By Idealogue January 18, 2026 2:45 PM

I skimmed part of this, then searched for the string ‘wikipedia’. I think their recent ‘deals’ to exchange money for access to their hoard of Creative Commons content are worth mentioning in this context.

r January 18, 2026 5:20 PM

@Clive,

If I recall correctly, although I am on his side: there was a closet of hardware rotating MAC addresses, specifically set up to avoid rate limiting by the MIT systems.

Access fraud may have played a role in the scare tactics?

bisento January 21, 2026 5:04 AM

and now big tech giants are doing it wholesale:
https://torrentfreak.com/nvidia-contacted-annas-archive-to-secure-access-to-millions-of-pirated-books/

NVIDIA leeched torrents of books from Anna’s Archive, was informed that this would be illegal, and proceeded nonetheless.

Let’s put the board of NVIDIA behind bars and pressure them the same way Aaron was pressured…
If this finding results in a small fine, it will just be the cost of doing business; nothing will change, and rich corporations will be able to break the law with impunity.
(yeah, I know)

lurker January 21, 2026 1:16 PM

@bisento

Good luck to NVIDIA with Anna’s Archive. There are more than a few books there that are already samizdat, riddled with hideous OCR errors.
