topicnews · October 23, 2024

Google releases technology to watermark AI-generated text

Google is making SynthID Text, its technology that allows developers to watermark and recognize text generated by generative AI models, generally available.

SynthID Text can be downloaded from the AI platform Hugging Face and from Google’s updated Responsible GenAI Toolkit.

“Today we are making our SynthID Text watermarking tool open source,” the company wrote in a post on X. “It is available free to developers and companies to help them identify their AI-generated content.”

So how does it work?

Given a prompt like “What is your favorite fruit?”, text-generating models predict which “token” is most likely to follow the previous ones, one token at a time. Tokens are the building blocks that a generative model uses to process information. A token can be a single character, a word, or part of a phrase.

The model assigns each possible token a score: the probability, expressed as a percentage, that the token will be included in the output text. According to Google, SynthID Text inserts additional data into this token distribution by “modulating the probability of token generation.”

“The final scoring pattern for both word choices from the model, combined with the adjusted probability scores, is considered a watermark,” the company wrote in a blog post. “This scoring pattern is compared to the expected scoring pattern for text with and without watermarks, helping SynthID identify whether the text was generated by an AI tool or whether it may have come from other sources.”
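The idea of nudging token probabilities and later checking for the resulting pattern can be illustrated with a toy example. The sketch below is not Google's implementation and does not use the SynthID Text API; it is a generic keyed-bias scheme written for illustration. The vocabulary, the key, the uniform "model", and the function names (`g_score`, `watermarked_probs`, `detect_score`) are all assumptions made for this example.

```python
import hashlib
import math
import random

# Toy vocabulary and secret key; both are hypothetical, for illustration only.
VOCAB = [f"tok{i}" for i in range(50)]
KEY = "demo-key"


def g_score(prev_token: str, token: str, key: str = KEY) -> int:
    """Keyed pseudorandom score (0 or 1) for a candidate token in its context."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] & 1


def model_probs(prev_token: str) -> dict[str, float]:
    """Stand-in for a language model: every token is equally likely."""
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}


def watermarked_probs(prev_token: str, bias: float = 2.0) -> dict[str, float]:
    """Boost tokens whose keyed score is 1, then renormalize."""
    weights = {
        tok: p * math.exp(bias * g_score(prev_token, tok))
        for tok, p in model_probs(prev_token).items()
    }
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def generate(n: int, watermark: bool, seed: int = 0) -> list[str]:
    """Sample n tokens one at a time, with or without the watermark bias."""
    rng = random.Random(seed)
    prev, out = "<start>", []
    for _ in range(n):
        dist = watermarked_probs(prev) if watermark else model_probs(prev)
        prev = rng.choices(list(dist), weights=list(dist.values()))[0]
        out.append(prev)
    return out


def detect_score(tokens: list[str]) -> float:
    """Mean keyed score of a text: near 0.5 without the watermark, higher with it."""
    prev, total = "<start>", 0
    for tok in tokens:
        total += g_score(prev, tok)
        prev = tok
    return total / len(tokens)


if __name__ == "__main__":
    print("watermarked:  ", detect_score(generate(300, watermark=True)))
    print("unwatermarked:", detect_score(generate(300, watermark=False)))
```

In this sketch, only someone who holds the key can recompute the scores, so the watermark is invisible in the text itself but shows up as a statistical skew when the scores are averaged. A real system would work on an actual model's token distribution and use a more careful statistical test.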

Google claims that SynthID Text, which has been integrated into its Gemini models since this spring, does not compromise the quality, accuracy, or speed of text generation and works even on text that has been trimmed, paraphrased, or altered.

However, the company also admits that its watermarking technology has limitations.

For example, SynthID Text does not perform well on short texts, on text that has been rewritten or translated from another language, or on answers to factual questions. “When responding to factual prompts, there are fewer opportunities to adjust token distribution without compromising factual accuracy,” the company explains. “This also includes questions like ‘What is the capital of France?’ or requests where little or no variety is expected, such as ‘Recite a poem by William Wordsworth.’”
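The limitation on factual prompts can be made concrete with a small calculation. The sketch below is an illustration under assumed numbers, not Google's method: it applies the same multiplicative bias to a flat distribution (an open-ended prompt) and to a sharply peaked one (a factual prompt with one obvious answer) and measures how much detectable signal each gains. The function name `expected_signal`, the bias value, and the example distributions are hypothetical.

```python
import math


def expected_signal(probs: list[float], g: list[int], bias: float = 2.0) -> float:
    """Gain in expected watermark score after biasing tokens whose score g is 1."""
    weights = [p * math.exp(bias * gi) for p, gi in zip(probs, g)]
    total = sum(weights)
    biased = [w / total for w in weights]
    before = sum(p * gi for p, gi in zip(probs, g))
    after = sum(p * gi for p, gi in zip(biased, g))
    return after - before


# Hypothetical keyed scores for four candidate tokens.
g = [0, 1, 0, 1]

# Open-ended prompt: several tokens are about equally plausible.
flat = [0.25, 0.25, 0.25, 0.25]

# Factual prompt: one token (e.g. "Paris") is almost certain.
peaked = [0.997, 0.001, 0.001, 0.001]

if __name__ == "__main__":
    print("flat distribution signal:  ", expected_signal(flat, g))
    print("peaked distribution signal:", expected_signal(peaked, g))
```

With these assumed numbers the flat distribution gains roughly 0.38 in expected score per token, while the peaked one gains only about 0.01. Short texts compound the problem, because there are fewer tokens over which to accumulate the signal.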

Google isn’t the only company working on AI technology for text watermarking. OpenAI has been researching watermarking methods for years, but delayed their release due to technical and commercial concerns.

If watermarking techniques become widespread, they could help turn the tide against inaccurate (but increasingly popular) “AI detectors” that falsely flag generically written essays as AI-generated.