AIToday

Authors Guild test shows AI detectors vary wildly—some perfectly identify human writing, others flag all texts as AI-generated.

THE DECODER · 16h ago · 5 min read

Key takeaway

The Authors Guild tested several AI detectors on ten published articles and found stark differences in reliability. Pangram and Grammarly correctly identified all texts as human-written, while Sidekicker flagged every article as mostly AI-generated. The Authors Guild warns that these tools should never be the sole basis for deciding whether a text is human or AI: professionally written language and AI output share similar statistical patterns, which makes it hard to distinguish skilled human writers from machines trained on their work.

3 Key Points

  • What happened

    In an Authors Guild test using ten articles published between 2020 and 2022, Pangram and Grammarly correctly identified every human-written text as human. Originality.ai also performed well. By contrast, Sidekicker flagged every single article as mostly AI-generated, with two scoring 100 percent, and ZeroGPT reported AI percentages, some of them high, for all the human-written texts.

  • Why it matters

    False positives from unreliable detectors can cost authors their contracts and reputations. The Authors Guild warns that even the best-performing tools should never be the sole basis for any decision, and that publishers should disclose their methods and give authors a chance to defend themselves. The core problem: professional writers and language models produce statistically similar text because the models were trained on human-written material, which makes detection nearly impossible even for the best-performing tools.

  • What to watch

    The results mainly measure how well the tools recognize human writing (Pangram and Originality.ai are tuned to minimize false positives) and do not necessarily show how well the same tools catch AI-generated texts. The Authors Guild notes that detection tools change constantly and their accuracy cannot be taken for granted.

FAQ

Which AI detectors performed best in the Authors Guild test?
Pangram and Grammarly correctly identified every human-written text as human. Originality.ai also performed well.
Which detector performed worst?
Sidekicker delivered the worst results, flagging every single article as mostly AI-generated, with two scoring 100 percent. ZeroGPT was also unreliable, reporting AI percentages, some of them high, for all the human-written texts.
Why do AI detectors struggle to identify human writing?
Language models were trained on professional human writing, so professionally written texts share many of the same statistical patterns as AI output. Detection tools cannot distinguish between a human writer who has mastered the craft and a machine that has learned to imitate it, because at the level these tools operate, there may be little difference to find.
