This seems like such a no-brainer. What are the cons (other than "it's not enough")?
There are valid use cases for using AI to help with writing, e.g. for non-native speakers. Detection could also produce false positives on texts written by real people. In the end this could cause all kinds of harm to a small number of people essentially at random; imagine, for example, that your university application is wrongly flagged as AI-generated and you are rejected because of it.
Great article.
It appears that the guiding political calculus for the next 4 (+?) years is out in the open for all to see:
1. Does a proposal cause any headache for anyone who financially supports the Administration (i.e., the big tech barons in the front row at the inauguration)? If so, it's dead in the water.
2. Does a proposal limit the Administration's (or its funders') ability to spread disinformation (including about elections)? Ditto.
Your excellent proposal and arguments might, maybe, pass #1 (if big tech is convinced), but would likely fail on #2.
I suppose one could argue that corruption in plain sight is better than corruption in hiding, but if that's where we are now... ouch.
This sounds good, but I lack the technical competence to assess it.
I hope you will bring this to the attention of people who could make it actionable.
>> there is a way, one implementable immediately, with high upside and zero downside: ordering that AI outputs be robustly “watermarked” such that they’re always detectable.
This is an exaggeration. Mandated AI watermarking, like any regulatory system, is a matter of costs and benefits. Even if it's net positive, it will involve trade-offs.
- Establishing, tracking, and enforcing watermarking regulations will require layers of costly administrative bureaucracy for the government, corporations, and individuals.
- We cannot be 100% certain that imposing an arbitrary constraint on token production has zero impact on current and future model capabilities. As with everything about this technology, we should be somewhat modest in our claims to know exactly how it works and how it will evolve.
- While semantics-based watermarking might be harder to spoof than pure token-based schemes (a toy sketch of a token-based scheme follows this list), it seems plausible that any form of watermarking will kick off an enforcement/evasion arms race, which could lead to additional unforeseen costs.
- Finally, while watermarking might intuitively help with the AI slop problem, it's not at all obvious that it will. The dynamics that lead to slop are extraordinarily complex: they involve a mix of demand- and supply-side issues, and we shouldn't make super-confident claims that we can accurately predict how any one particular mechanism will affect them.
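To make the "constraint on token production" concrete, here is a minimal, purely illustrative sketch of a token-based ("green-list") watermark in the spirit of published schemes such as Kirchenbauer et al. (2023). The tiny vocabulary, the uniform-logit stand-in for a model, and the bias and scoring choices are all hypothetical; this is not any vendor's actual implementation, nor the semantic approach the article proposes.

```python
# Toy green-list watermark: bias sampling toward a hash-derived "green" subset
# of the vocabulary, then detect the watermark by counting green tokens.
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # hypothetical vocabulary
GAMMA = 0.5                                # fraction of vocab marked "green"
DELTA = 4.0                                # logit bias added to green tokens

def green_list(prev_token: str) -> set:
    """Deterministically partition the vocab using a hash of the previous token."""
    seed = int.from_bytes(hashlib.sha256(prev_token.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(GAMMA * len(VOCAB))])

def sample_next(prev_token: str, logits: dict) -> str:
    """Sample from softmax(logits + DELTA on green tokens) -- the 'constraint'."""
    greens = green_list(prev_token)
    biased = {t: l + (DELTA if t in greens else 0.0) for t, l in logits.items()}
    m = max(biased.values())
    weights = [math.exp(v - m) for v in biased.values()]
    return random.choices(list(biased.keys()), weights=weights, k=1)[0]

def detect(tokens: list) -> float:
    """Z-score: how far the green-token count exceeds what chance predicts."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:]) if tok in green_list(prev))
    n = len(tokens) - 1
    expected, var = GAMMA * n, GAMMA * (1 - GAMMA) * n
    return (hits - expected) / math.sqrt(var)

# Demo: a "model" with uniform logits, so any excess of green tokens is the watermark.
uniform = {t: 0.0 for t in VOCAB}
text = ["tok0"]
for _ in range(200):
    text.append(sample_next(text[-1], uniform))
print(f"watermarked z-score:   {detect(text):.1f}")      # large positive
unmarked = ["tok0"] + random.choices(VOCAB, k=200)
print(f"unwatermarked z-score: {detect(unmarked):.1f}")   # near zero
```

The demo also shows why the trade-offs above are real: the bias that makes detection possible is exactly a distortion of the model's output distribution, and anything that rewrites the tokens (paraphrasing, translation) erodes the signal.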
While I'm largely sympathetic to your overall goals here, I think a more nuanced approach that was less pure advocacy would be far more persuasive.
Erik, I think you make great points regarding the effectiveness and necessity of AI watermarks. However, what is your definition of politically neutral? Technology and science are inherently political, where political is defined as relating to power and authority. You claim that the watermarks are politically neutral, but if they will help minimize election interference, what does that mean to a President who spread blatantly false election-fraud claims and incited his supporters to try to overturn the certification of an election? What does that mean to Elon Musk, who lets political bots that spread misinformation about the election run rampant on X? Even though you claim the technology is politically neutral, it still has inherent political values: democratizing access (open source) and keeping power in check. Watermarks are similar to copyrights, which are heavily political. I think you should reconsider your use of "politically neutral" in favor of acknowledging that all tech is political.
By your definition, all ideas and sentiments are political because any person can have an ideological position on anything. "Spinach is gross" is a political sentiment when said by the Spinach is Gross Party.
But to be politically neutral means that a sentiment is not inflected to support or detract from a political entity's interests. Otherwise, we must say that objective truth itself is political. Reading a thermometer aloud would be political.
We can't help it if emergent reality aligns with the understanding and beliefs of one political entity more than another, but we can adopt a position of insisting on truth and consistency irrespective of who it benefits. That's political neutrality.
I'm not saying all ideas and sentiments are political; I specifically mentioned technology, which is a human-constructed artifact, as discussed in this article by Langdon Winner (https://faculty.cc.gatech.edu/~beki/cs4001/Winner.pdf). It's a good introduction to the idea that technology is not created in a vacuum; it is shaped by societal processes and by humans who have their own political beliefs.
Let me take both of your examples. Spinach itself is not constructed; it just exists, so it is not inherently political. It becomes political in the context of spinach exports, or spinach production, or, as you mentioned, a hypothetical Spinach is Gross Party, which may lobby powerful institutions and people to advocate its position.
On the other hand, the thermometer is a technological artifact that is highly political. Thermometers used to be made with mercury; once mercury was discovered to be toxic, however, mercury thermometers stopped being used in medical settings. In its basic construction, the thermometer had the effect of directly harming individuals - a political matter.
Furthermore, thermometers today are used in meteorology to understand how our planet is warming. As you claim, thermometers provide a basis of truth that is not political. But is it political when those thermometer readings are read aloud by climate scientists to say that our planet is warming? That is certainly a political statement to climate deniers. It is certainly political in response to a President who has taken the US out of the Paris Climate Accords, an agreement meant to bring down global average temperatures.
You mention "emergent reality" - what is it? There is reality, and there is how humans shape and perceive reality. Understanding the political construction of science and technology brings us closer to the truth; ignoring how artifacts are shaped and assuming objectivity masks the actual truth. Our goal shouldn't be objectivity; it's not possible when you create technology and unleash it into the real world. Our goals should be transparency and designing technology with specific values in mind.
I agree with you on the technical merits. BUT!
Where exactly would the economic motivation come from? As you correctly point out, US politics runs on donations, and the big companies behind generative models don't want mandatory watermarking, so why would Republicans (or Democrats, for that matter) make it mandatory?
The last decade has shown us that no one of note in US politics cared enough about election interference to interfere with the profit motive, and the same will play out w.r.t. generative model use.
You can't count on the faker being the authenticator. Just digitally sign all media, so that orgs and individuals can claim "this is real". People will blacklist liars, and a hierarchy of trustworthiness will develop over time.
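The signing half of this is already straightforward with standard cryptography; the hard parts are distributing public keys and deciding whom to trust, which is where the reputational hierarchy would come in. A minimal sketch, assuming the third-party Python `cryptography` package; the placeholder bytes and the publisher/consumer roles are illustrative, not any specific provenance standard's format.

```python
# Minimal publisher-signs, consumer-verifies sketch using Ed25519 signatures.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Publisher side: generate a keypair once, sign each published asset.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()        # published, e.g. on the org's website

media_bytes = b"...raw bytes of the photo or article..."  # placeholder content
signature = private_key.sign(media_bytes)    # distributed alongside the media

# Consumer side: verification fails loudly if even one byte was altered
# or the signature came from a different key.
try:
    public_key.verify(signature, media_bytes)
    print("authentic: signed by the claimed publisher")
except InvalidSignature:
    print("rejected: bytes do not match the signature")
```

Note that this proves only "this publisher vouches for these exact bytes", not that the content is true or human-made; that is exactly why the trust hierarchy matters.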
With paraphrasing-resistant watermarking, attackers have to run the output through whatever model they plan to paraphrase with, which I think will always be a weaker one (otherwise, what is all that money for compute being spent on?). I also feel it should be the AIs that get labeled, not us.
I can't help but feel that any real solution will be outdated within 3-5 years, or sooner - like the running joke that is online security. I think we are already in over our heads and treading water.
The power of technology has far outstripped the wisdom (and possibly the know-how) to regulate it wisely for the benefit of all. Then again, I'm guessing "the benefit of all" is not Big Tech's intention.
Watermarks are technically infeasible to apply consistently and don't really provide enough detail or precision. The Content Authenticity Initiative already has broad industry support and is a much more robust solution for data provenance.
There have been many quiet research advances in robust watermarking. Given the state of the field, it seems very likely that, e.g., OpenAI could implement a good one, especially for longer outputs. And whatever you think about watermarking, it is obviously more effective than metadata.
I bet my money on the Content Authenticity solution. It's a reputational solution to the problem that will work much better than the technical rat race of watermarking and removal.
My main concern here would be that, assuming people continue to use AI, we would be imposing new patterns of thought developed by a select few onto wide swaths of the population, especially with the semantic approach you mentioned. Granted, perhaps AI as it currently stands does this already. However, my (limited) understanding of AI is that its output emerges fairly organically rather than being explicitly directed to mimic a specific speech pattern. The watermarking you're talking about sounds like it would be more of an explicit direction, although maybe I just don't understand its implementation well enough.
Regardless, if ~89% of students continue to use AI, then we should expect that usage to impact how they think and use language. That may be problematic enough as it is right now. But if we add an extra layer of having AI use concepts in a specific, unusual, and non-human way (and I imagine it would have to be so, or else it wouldn't be a watermark), then we would be shifting our own sense of meaning along with it, as we naturally adapt to the inputs we're presented with. And if specific people or groups get a more direct say in how those watermarks work, i.e., how those concepts should be made unnatural, then we'd potentially be allowing a few people to significantly shape the way we think, on a very fundamental, structural level, for years to come.
What I'm understanding from all this AI debate is that the more sophisticated the technology, the more sophisticated the regulatory mechanisms need to be. As technologies advance, regulation always comes after the horse has bolted - especially given the speed of AI development and politicians being beholden to Big Tech donors. One solution is mass disengagement; I don't have a smartphone, and I try to live a simple, land-based life.
Yes, my earlier point exactly. Glad to know my husband and I are not the only ones who don't have a smartphone.
This is a great idea!
Proof of Humanity and AI watermarks are the way to go. Great article.
Knowing nothing of the subject except that it all depends on enormous amounts of electrical power: should everyone, and every company or government, pay a small surcharge routed through their internet provider? Or will the energy come from the unrealized concepts of Mr. Tesla and other discredited scientists like Kristian Birkeland, or from the coffers of the IRS and ERS?
I didn't really buy into watermarking for a long time, but you have won me over! I didn't realize how many innovations have happened in the area recently. I can think of many reasons why watermarking would not completely solve the problem of AI misinformation (foreign actors, fine-tuned open-source models, better paraphrasing, etc.), but I am a firm believer in not letting the perfect get in the way of the good when it comes to policy.
If we have a watermarking policy, the important part will be making sure there is a cultural awareness that even if watermarked means AI, non-watermarked still doesn't necessarily mean human, especially if an actor has sufficient incentive to spend the time and money to get around the watermarking.