Spot the Human
Why detecting AI plagiarism is a losing battle.
Originally published on rso.ai on the 23rd of February 2023. Things have not changed much since then.
The last few months have been exciting: OpenAI’s ChatGPT was opened to the masses, for free.
ChatGPT (or GPT-3, the model family behind it) is an example of a Large Language Model (LLM) – a mathematical approximation of language, designed with one purpose: producing text that looks human-written.
If you have not used it yet, ChatGPT is accessible here. It is worth surrendering some of your personal details for a quick experiment (as long as it is not showing you apologetic sadfaces because it is too busy.)
ChatGPT is just one of a family of tools designed with the same purpose. These tools have their limitations, and I am usually the first to point them out. For now, however: these models really do produce adequate text that appears human-written.
Understandable concerns around machines producing human-like text include plagiarism and disinformation. AI-text detection is now an emerging business, with tools such as GPTZero and Turnitin’s not-yet-released add-on (this has since been released). These tools promise the ability to spot LLM-generated text with a high degree of certainty.
Some people already swear by GPTZero, and, unfortunately, I am not one of them.
In fact, I would state decisively that GPTZero-like approaches to solving AI plagiarism will not work…
Not now, not later - they cannot work in principle, and I hope to explain why below.
It is not all doom-and-gloom, however, so please read on!
What does AI plagiarism look like?
An obvious use for a tool that writes extensive prose that looks like a human wrote it is, of course, homework plagiarism. Or any sort of plagiarism for that matter.
Students can (and do) submit their History essay title as a prompt, and ChatGPT happily produces some text in response. The text may or may not be factually correct; it may or may not answer the title question; it may or may not be of a high quality.
What the generated text most definitely will be is:
grammatically correct,
well-punctuated,
not the student’s own work,
and a waste of the teacher’s time.
This brings us back to the counter-tools like GPTZero.
The stated aim of these counter-tools is distinguishing generated text from genuinely human-written text. And GPTZero seems to do its job well – for my test cases.
You are welcome to try it for yourself too: GPTZero is currently free to use (with some limitations), just as ChatGPT currently is free to use.
Enter a prompt of your choosing into ChatGPT, transfer the response over to GPTZero – and feel reassured to see AI text being flagged as AI text (most of the time).
Before we get into why it cannot work (despite it appearing like it obviously does), let us consider how GPTZero functions under the bonnet.
How does GPTZero detect generated text?
The following is true for GPTZero and would likely apply to other similar products too.
It uses a different Language Model, one that was shown thousands of human-written essays, to assess the likelihood of a piece of text being generated by any LLM.
It uses two measures to arrive at such an evaluation: ‘perplexity’ and ‘burstiness’, both terms previously known from statistics and NLP (Natural Language Processing) research.
Perplexity measures how surprising each next word is, while burstiness measures how this surprise level changes throughout the text.
Burstiness also refers to the idea that when humans write text about a broad topic, specific words appear in bursts rather than being evenly spread throughout. For example, in this article about ChatGPT and GPTZero, references to “models” tend to cluster in some paragraphs and yet be completely absent from others.
Simply put, both measures represent the degree to which a piece of text can be considered consistently boring in terms of its linguistic structure. And, if you have been following this topic for a while, you would agree that typical GPT-generated text has a certain feel and rhythm to it.
Humans do not usually write the same way.
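To make the two measures concrete, here is a toy sketch – emphatically not GPTZero’s actual implementation, which uses a neural language model trained on essays. The unigram model, the tiny reference corpus, and the function names are all my own stand-ins, chosen purely for illustration:

```python
import math
from collections import Counter

def perplexity(text, model_counts, total):
    """Average per-word surprisal under a toy unigram model.
    Higher means the text is more 'surprising' to the model."""
    words = text.lower().split()
    # Laplace smoothing so unseen words still get a finite probability
    vocab = len(model_counts) + 1
    log_prob = sum(
        math.log((model_counts.get(w, 0) + 1) / (total + vocab))
        for w in words
    )
    return math.exp(-log_prob / len(words))

def burstiness(sentences, model_counts, total):
    """Spread of perplexity across sentences: flat for typical
    LLM output, spiky for most human writing."""
    scores = [perplexity(s, model_counts, total) for s in sentences]
    mean = sum(scores) / len(scores)
    return (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5

# A tiny 'reference corpus' standing in for thousands of essays
corpus = "the model writes text the text looks human the human reads text".split()
counts = Counter(corpus)
total = len(corpus)

familiar = perplexity("the model writes text", counts, total)
strange = perplexity("zebras quarrel over xylophones", counts, total)
# 'strange' scores higher: its words never appear in the reference corpus
```

A detector flags text whose perplexity sits consistently low and whose burstiness is near zero – the “consistently boring” signature described above.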
Tricking the detector
No matter what you think of this article – you got this far at least – it was a challenge for me to write something that would get flagged as “likely AI written” by GPTZero. I kept slipping around the fourth or fifth sentence and giving myself away as a human. My best result was originally this article’s opening paragraph; it can now be found in the appendix.
This is not to say that there are no false positives in the wild: portions of well-established old (therefore definitely not generated) Wikipedia articles were flagged as “possibly written by AI,” as were some passages by the amazing Randall Munroe. I could not, however, get a false positive out of any of the student work I could get my hands on; it is just too “weird and wonderful,” as a colleague would put it, to be flagged as “AI written.”
What about false negatives – LLM-generated text that gets labelled as “likely written by a human”? Ultimately, this is where the real task lies: stopping humans attempting to pass off AI work as their own. Again, it was a challenge to lead ChatGPT to produce a passage that would be out of character for its usual boring self. But it worked, and I give a couple of examples further below.
It took some time and some back-and-forth, but it worked. That it worked at all was enough for me to conclude that GPTZero is not the solution to the problem of AI plagiarism.
Not because it failed a couple of times, no – the developers themselves claim only a modest 96% detection rate on their tests.
There is a deeper reason.
The paradox of AI detection
I previously defined the purpose of LLMs like GPT3 as “producing text that looks human-written.” The purpose of GPTZero can be rephrased as “recognising text that looks human-written”.
Hm… We have a paradox on our hands, akin to: What happens if an all-destroying cannon ball hits an indestructible wall?
This time it is: What happens if an indistinguishably human-like AI text is analysed by a perfect AI detector?
In both cases the answer is: the two entities cannot exist at the same time. Either we have a Language Model that writes like humans, or we have a detector that spots an LLM failing to write like humans.
As long as neither side of the tech is perfect, the two can coexist in an arms race for some time.
The question then: Which side is playing catch up?
Given that ChatGPT has already shown it can generate non-boring – if not always factually correct or relevant – text that passes as “human-made”, the detectors are the ones at a loss.
Our current LLMs can already write comparably to humans, and they will only get better.
Exploiting weaknesses
A fair question to ask is: why do the detectors appear to work at all, let alone this well, if our LLMs are that good?
Well, the world’s most exquisite, fine-tuned violin would fare no better than a saucepan in my nonmusical hands. Our LLMs are extremely powerful tools that seem to default to very predictable, read: dull, linguistic patterns in basic usage. Our detection methods exploit that, and hence, appear to work very well at first glance.
With elaborate prompting and a degree of skill, the models can be nudged out of this plateau of boredom into the more idiosyncratic ‘humanness’ of their vast training data. I achieved my false negatives below by first providing the model with a colourful piece of real human writing to emulate: a passage from Tolstoy and an extract from a high-quality student essay.
ChatGPT obligingly produced more text that sounds the same – with the same level of “humanness” to it. And emulate Tolstoy it can, at least in style, if not yet (if ever) in a deeper creative quality.
So, approach number one is learning to play the instrument that you are using…
…and then there is always the heartless cynical strategy of pitting one machine against the other.
Why not:
ask an LLM for an essay,
pass it to GPTZero for flagging,
return the suspicious flagged sentences to the LLM for mindless rewriting,
repeat the above steps until the entire essay is “likely written by a human.”
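The loop above can be sketched in a few lines. To be clear, `generate`, `flagged_sentences` and `rewrite` are hypothetical stand-ins for real LLM and detector calls – neither product exposes an API with these names – and the stubs below exist only so the loop’s logic is runnable:

```python
def generate(prompt):
    """Stand-in for an LLM call producing an essay as sentences."""
    return ["Sentence one.", "Sentence two.", "Sentence three."]

def flagged_sentences(essay):
    """Stand-in for a detector: returns the sentences it considers
    'likely AI written'. This stub flags anything containing 'two'."""
    return [s for s in essay if "two" in s]

def rewrite(sentence):
    """Stand-in for asking the LLM to mindlessly paraphrase."""
    return sentence.replace("two", "2")

def launder(prompt, max_rounds=10):
    """Regenerate flagged sentences until the detector is satisfied."""
    essay = generate(prompt)
    for _ in range(max_rounds):
        flagged = flagged_sentences(essay)
        if not flagged:
            break  # the whole essay now reads as 'likely human'
        essay = [rewrite(s) if s in flagged else s for s in essay]
    return essay
```

The point is not the stubs but the shape: any detector that reports *which* passages look generated hands the attacker exactly the feedback signal needed to erase them.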
This is exactly what this entirely not-endorsed-by-me-or-RSO website does, and it is, I am sure, one of many to come. We also do not have the time to talk about fine-tuning models against detection, which I would see as the next escalation.
The better our detectors get, the more we can exploit them – an irony that sums up the state of this arms race.
The way forward
Do we give up on trying to control AI plagiarism and “let it rip”? Of course not; we just have to be pragmatic with our expectations. My aim with this article is to demystify the systems at play and help with realistic judgement.
We have never historically been 100% certain that any given piece of work or publication is truly the work of that individual, outside of standing over their shoulder and watching them scribble it on a piece of paper. As teachers, and as humans in general, we have heuristics that allow us to assess the likelihood of a piece of work being genuine - much like what GPTZero is trying to do.
GPTZero will still work for most of the primitive cases of “Oh, magical computer, give me an essay”, and it would also deter a lot of the bad apples from trying.
We simply must not assume that it solves the problem.
So how do we evolve and adapt to the “post-AI world”, as I have seen it already be described? On one hand the Large Language Models have not introduced any genuinely new problems into this aspect of our world - we had plagiarism and disinformation before.
On the other hand, they are a powerful language tool for those of us who struggle with committing words to paper. One can now be adept at writing through the skill of taming the LLM, ‘challenging’ it to think outside its default box and exploring the landscape of ideas that is concealed within.
It really is a bright new world, even if we are losing the battle we never had a chance of winning in the first place. We should learn to play on the new instruments, even if for no other reason than that they are here to stay.
And it’s ok.
Appendix
Tolstoy-inspired “likely written by human” ChatGPT passage:
The morning was bright, the road before us stretching like a ribbon through the cornfields, with dew sparkling on the blades of grass. Small birch trees and bushes threw shadows on the uneven surface of the track. Larks sang and soared above us, their melody rising above the monotonous sound of our carriage wheels and collar-bells. The air was filled with a mixture of scents, from the musty odor of moth-eaten cloth and dust to the freshness of the morning.
I felt a sudden urge to be up and about, a sign of true enjoyment. I had forgotten to say my prayers in the inn's courtyard, and this omission filled me with a sense of unease. I remembered a warning that some misfortune would befall me if I missed a single day. So, I quickly took off my cap and stooped in a corner of the britchka. I recited my prayers, making the sign of the cross beneath my coat.
Amidst the distractions of the surrounding objects, I inadvertently repeated some of the prayers twice over.
Extract from a ChatGPT History essay, inspired by student work, “likely written by a human”:
Many changes took place in the Middle East between the years 1917-47. These changes included territorial, political, and violence changes. These changes happened because of the actions of the Jews and Arabs, as well as the policies of the British.
The British policy was the most important factor in facilitating change in the Middle East. This was mainly due to the Balfour Declaration. However, British policy was not constant, it was always changing and conflicting.
The Balfour Declaration of 1917 was a significant policy. It promised Palestine to the Jews as their 'national homeland'. However, Palestine was already inhabited by the Palestinian Arabs. This policy also conflicted with earlier agreements like the McMahon-Hussein agreement of 1915.
As a result of the Balfour Declaration, Jewish immigrants started to flood into Palestine. This influx of Jewish immigrants led to massive and irreversible demographical change. From 1922 to 1936, the Jewish population in Palestine increased from 17% to over 25%.
In summary, the Middle East underwent many changes during the years 1917-47, with territorial, political, and violence changes. The changes were primarily facilitated through the British policy, Jewish actions, and Arab actions. However, the most important factor in facilitating change was the British policy, especially the Balfour Declaration.
My attempt at AI-like writing that was flagged by GPTZero as “likely written by AI” (a bit of an unfair test given how short the passage is):
The ease of use and accessibility of the language models have raised considerable concerns around the issues of plagiarism. Educators are worried that their students are using the products to cheat in their assessments, while missing out on key learning.
