Tech Companies’ New Favorite Solution for the AI Content Crisis Isn’t Enough

Generative artificial intelligence tools can now produce text, images and other media that convincingly resemble those created by humans. One big result is an online content crisis, an enormous and growing glut of unchecked, machine-made material riddled with potentially dangerous errors, misinformation and criminal scams. This situation leaves security specialists, regulators and everyday people scrambling for a way to tell AI-generated products apart from human work. Current AI-detection tools are deeply unreliable. Even OpenAI, the company behind ChatGPT, recently took its AI text identifier offline because the tool was so inaccurate.


Now, another potential defense is gaining traction: digital watermarking, or the insertion of an indelible, covert digital signature into every piece of AI-produced content so the source is traceable. Late last month the Biden administration announced that seven U.S. AI companies had voluntarily signed a list of eight risk management commitments, including a pledge to develop “robust technical mechanisms to ensure that users know when content is AI generated, such as a watermarking system.” Recently passed European Union regulations require tech companies to make efforts to differentiate their AI output from human work. Watermarking aims to rein in the Wild West of the ongoing machine learning boom. But it is only a first step, and a small one at that, against the full scale of generative AI’s risks.


Muddling human creation with machine generation carries a lot of consequences. “Fake news” has been a problem online for decades, but AI now enables content mills to publish tidal waves of misleading images and articles in minutes, clogging search engines and social media feeds. Scam messages, posts and even calls or voice mails can be cranked out more quickly than ever. Students, unscrupulous scientists and job applicants can generate assignments, data or applications and pass them off as their own work. Meanwhile unreliable, biased filters for detecting AI-generated content can dupe teachers, academic reviewers and hiring managers, leading them to make false accusations of dishonesty.


And public figures can now lean on the mere possibility of deepfakes (videos in which AI is used to make someone appear to say or do something they never actually said or did) to try to dodge responsibility for things they really say and do. In a recent filing for a lawsuit over the death of a driver, lawyers for electric car company Tesla attempted to claim that a real 2016 recording in which its CEO Elon Musk made unfounded claims about the safety of self-driving cars could have been a deepfake. Generative AI can even “poison” itself as the Internet’s massive data trove, which AI relies on for its training, gets increasingly contaminated with shoddy content. For all these reasons and more, it is becoming ever more crucial to separate the robot from the real.


Existing AI detectors aren’t much help. “Yeah, they don’t work,” says Debora Weber-Wulff, a computer scientist and plagiarism researcher at the University of Applied Sciences for Engineering and Economics in Berlin. For a preprint study released in June, Weber-Wulff and her co-authors assessed 12 publicly available tools meant to detect AI-generated text. They found that, even under the most generous set of assumptions, the best detectors were less than 80 percent accurate at identifying text composed by robots, and many were only about as good as flipping a coin. All had a high rate of false positives, and all became much less capable when the AI-written content had been lightly edited by a human. Similar inconsistencies have been noted among fake-image detectors.


Watermarking “is pretty much one of the few technical alternatives that we have available,” says Florian Kerschbaum, a computer scientist specializing in data security at the University of Waterloo in Ontario. “On the other hand, the outcome of this technology is not as certain as one might believe. We cannot really predict what level of reliability we’ll be able to achieve.” There are serious, unresolved technical challenges to creating a watermarking system, and experts agree that such a system alone won’t meet the monumental tasks of managing misinformation, preventing fraud and restoring people’s trust.


Adding a digital watermark to an AI-produced item isn’t as simple as, say, overlaying visible copyright information on a photograph. To digitally mark images and videos, small clusters of pixels can be slightly color adjusted at random to embed a sort of barcode, one that is detectable by a machine but effectively invisible to most people. For audio material, similar trace signals can be embedded in sound wavelengths.
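As a rough illustration of that pixel-tweaking idea, the Python sketch below is a simplified, hypothetical example rather than any company’s production scheme; the key, the strength value and the function names are invented here. A secret key seeds a pattern of tiny brightness nudges that is added to an image, and the same key later detects the pattern by correlation.

import numpy as np

def embed_watermark(image: np.ndarray, key: int, strength: float = 2.0) -> np.ndarray:
    """Add a keyed, low-amplitude plus/minus pattern to the pixel values."""
    rng = np.random.default_rng(key)                  # the secret key seeds the pattern
    pattern = rng.choice([-1.0, 1.0], size=image.shape)
    marked = image.astype(float) + strength * pattern
    return np.clip(marked, 0, 255).astype(np.uint8)

def detect_watermark(image: np.ndarray, key: int) -> float:
    """Correlate the image with the keyed pattern; a clearly positive score
    suggests the watermark is present, while a score near zero suggests it is not."""
    rng = np.random.default_rng(key)
    pattern = rng.choice([-1.0, 1.0], size=image.shape)
    residual = image.astype(float) - image.astype(float).mean()
    return float((residual * pattern).mean())

if __name__ == "__main__":
    original = np.random.default_rng(0).integers(0, 256, size=(256, 256), dtype=np.uint8)
    marked = embed_watermark(original, key=42)
    print("marked image score:  ", detect_watermark(marked, key=42))    # roughly the embed strength
    print("unmarked image score:", detect_watermark(original, key=42))  # close to zero

Without the key, the added pattern looks like imperceptible noise; with the key, the detection score cleanly separates marked from unmarked images. That asymmetry is the basic property a watermarking system relies on.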


Text poses the biggest challenge because it’s the least data-dense form of generated content, according to Hany Farid, a computer scientist specializing in digital forensics at the University of California, Berkeley. Even text can be watermarked, however. One proposed protocol, outlined in a study published earlier this year in Proceedings of Machine Learning Research, takes all the vocabulary available to a text-generating large language model and sorts it into two boxes at random. Under the method described in the study, developers program their AI generator to slightly favor one set of words and syllables over the other. The resulting watermarked text contains notably more vocabulary from one box, so that sentences and paragraphs can be scanned and identified.
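As a toy illustration of that two-box approach, the Python snippet below is loosely inspired by the study’s method rather than taken from it; the vocabulary size, the bias level and the function names are all assumptions made for this example. It assigns every token to a box with a keyed hash, lets a stand-in “generator” favor one box, and then flags text whose share of favored tokens is far above the 50 percent expected by chance.

import hashlib
import math
import random

def in_favored_box(token: str, key: str) -> bool:
    """Use a keyed hash to sort each token into one of two boxes."""
    digest = hashlib.sha256((key + token).encode()).digest()
    return digest[0] % 2 == 0          # roughly half of all tokens land in each box

def watermark_z_score(tokens: list, key: str) -> float:
    """How far the share of favored-box tokens deviates from the 50 percent
    expected by chance; large values are strong evidence of watermarking."""
    hits = sum(in_favored_box(t, key) for t in tokens)
    n = len(tokens)
    return (hits - n / 2) / (math.sqrt(n) / 2)

def toy_generator(n_tokens: int, key: str = None, bias: float = 0.8) -> list:
    """Stand-in for a language model: it emits random tokens and, when given
    the secret key, mostly skips tokens from the disfavored box."""
    rng = random.Random(0)
    tokens = []
    while len(tokens) < n_tokens:
        candidate = "token%d" % rng.randrange(10_000)   # pretend 10,000-token vocabulary
        if key is None or in_favored_box(candidate, key) or rng.random() > bias:
            tokens.append(candidate)
    return tokens

if __name__ == "__main__":
    key = "secret-partition-key"
    print("plain text z-score:      ", round(watermark_z_score(toy_generator(500), key), 1))
    print("watermarked text z-score:", round(watermark_z_score(toy_generator(500, key=key), key), 1))

A real system biases the model’s next-token probabilities rather than re-drawing whole tokens, but the detection logic is the same: count how lopsided the vocabulary usage is relative to the secret split.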


In each of these techniques, the watermark’s exact nature must be kept secret from users. Users can’t know which pixels or sound waves have been adjusted, or how that has been done. And the vocabulary favored by the AI generator has to be hidden. Effective AI watermarks must be imperceptible to humans in order to avoid being easily removed, says Farid, who was not involved with the study.


There are other difficulties, too. “It becomes a humongous engineering challenge,” Kerschbaum says. Watermarks must be robust enough to withstand general editing, as well as adversarial attacks, but they can’t be so disruptive that they noticeably degrade the quality of the generated content. Tools built to detect watermarks also need to be kept relatively secure so that bad actors can’t use them to reverse-engineer the watermarking protocol. At the same time, the tools need to be accessible enough that people can use them.


Ideally, all the widely used generators (such as those from OpenAI and Google) would share a watermarking protocol. That way one AI tool can’t be easily used to undo another’s signature, Kerschbaum notes. Getting every company to join in coordinating this would be a struggle, however. And it’s inevitable that any watermarking program will require constant monitoring and updates as people learn how to evade it. Entrusting all this to the tech behemoths responsible for rushing the AI rollout in the first place is a fraught prospect.


Other challenges face open-source AI systems, such as the image generator Stable Diffusion or Meta’s language model LLaMa, which anyone can modify. In theory, any watermark encoded into an open-source model’s parameters could be easily removed, so a different tactic would be needed. Farid suggests building watermarks into an open-source AI through the training data instead of the changeable parameters. “But the problem with this idea is it’s sort of too late,” he says. Open-source models, trained without watermarks, are already out there, generating content, and retraining them wouldn’t eliminate the older versions.


Ultimately building an infallible watermarking system seems impossible—and every expert Scientific American interviewed on the topic says watermarking alone isn’t enough. When it comes to misinformation and other AI abuse, watermarking “is not an elimination strategy,” Farid says. “It’s a mitigation strategy.” He compares watermarking to locking the front door of a house. Yes, a burglar could bludgeon down the door, but the lock still adds a layer of protection.


Other layers are also in the works. Farid points to the Coalition for Content Provenance and Authenticity (C2PA), which has created a technical standard that’s being adopted by many large tech companies, including Microsoft and Adobe. Although C2PA guidelines do recommend watermarking, they also call for a ledger system that keeps tabs on every piece of AI-generated content and that uses metadata to verify the origins of both AI-made and human-made work. Metadata would be particularly helpful for identifying human-produced content: imagine a phone camera that adds a certification stamp to the hidden data of every photograph and video the user takes to prove it’s real footage. Another layer of protection could come from improving post hoc detectors that look for inadvertent artifacts of AI generation. Social media sites and search engines will also likely face increased pressure to bolster their moderation tactics and filter out the worst of the misleading AI material.
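The snippet below is a bare-bones sketch of that metadata idea, a hypothetical illustration only: the real C2PA standard uses standardized assertions and certificate-based signatures, not the shared secret key assumed here. It binds a hash of a file’s contents to “who, what and when” metadata and signs the bundle, so that a later edit to the content, or to the metadata, breaks verification.

import hashlib
import hmac
import json
import time

SIGNING_KEY = b"device-or-vendor-secret"   # stand-in for a real signing credential

def make_manifest(content: bytes, creator: str, tool: str) -> dict:
    """Bundle a content hash with origin metadata and a signature over both."""
    record = {
        "sha256": hashlib.sha256(content).hexdigest(),
        "creator": creator,
        "tool": tool,                      # e.g. a camera model or an AI generator
        "created_at": int(time.time()),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return record

def verify_manifest(content: bytes, manifest: dict) -> bool:
    """Check that the content matches the hash and the metadata was not altered."""
    claimed = dict(manifest)
    signature = claimed.pop("signature")
    payload = json.dumps(claimed, sort_keys=True).encode()
    untampered = hmac.compare_digest(
        signature, hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    )
    return untampered and claimed["sha256"] == hashlib.sha256(content).hexdigest()

if __name__ == "__main__":
    photo = b"raw image bytes straight off the sensor"
    manifest = make_manifest(photo, creator="camera-owner", tool="phone-camera")
    print(verify_manifest(photo, manifest))                 # True
    print(verify_manifest(photo + b" edited", manifest))    # False: content changed

In a fuller provenance system, each editing tool would append its own signed entry, building up the kind of tamper-evident record the C2PA guidelines describe.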


Still, these technological fixes don’t address the root causes of distrust, disinformation and manipulation online—which all existed long before the current generation of generative AI. Prior to the arrival of AI-powered deepfakes, someone skilled at Photoshop could manipulate a photograph to show almost anything they wanted, says James Zou, a Stanford University computer scientist who studies machine learning. TV and film studios have routinely used special effects to convincingly modify video. Even a photorealistic painter can create a trick image by hand. Generative AI has simply upped the scale of what’s possible.


People will ultimately have to change the way they approach information, Weber-Wulff says. Teaching information literacy and research skills has never been more important, because people need to be able to critically assess the context and sources of what they see, online and off. “That is a social issue,” she says. “We can’t solve social issues with technology, full stop.”

ABOUT THE AUTHOR(S)

    Lauren Leffer is a tech reporting fellow at Scientific American. Previously, she has covered environmental issues, science and health. Follow her on Twitter @lauren_leffer
