Friday, May 22, 2026

ChatGPT vs. BABEL

One of the great heroes in stemming the tide of using robots to evaluate student writing has been Les Perelman, who taught writing and composition at MIT (among other places), where he served as Director of Writing Across the Curriculum. He went head to head with many robograders, repeatedly revealing their many deficiencies.

But he got tired of hearing bot companies claim that sure, maybe an expert writer could fool the bot, but a regular student could not. So, "I decided to test that hypothesis," he wrote, "by attempting to fool the computer with something less intelligent than any student, another computer." So he and his team created Basic Automatic B.S. Essay Language Generator (BABEL Generator), a program that could generate fluid gibberish. Perelman then passed the BABEL gibberish through various robograders, most of which gave the gibberish high marks, even though it was senseless (but grammatically smooth) junk.

BABEL was online for a while, free for anyone to play with, so I played. It sadly appears to no longer be there, but I still have some of the gibberish BABEL whipped up, so I thought I would pretend to be a student and get some "help" with my stuff from ChatGPT.

What's the point, you may ask, of giving ChatGPT stuff that was not generated by a human being in good faith?

First, an awful lot of what is crossing teacher desktops these days is, in fact, work not generated by humans in good faith. And the more teachers adopt LLMs as "aids" to "help" grade student work, the more students must question why on earth they should even try to generate human good-faith writing when there will be no human good-faith attempt to read it. AI judging the work of AI is already happening. So I don't think this exercise is all that removed from classroom reality.

Second, watching what ChatGPT does when asked to respond to gibberish with no actual human intent or intelligence is instructive. Just for clarity's sake, I'll note that I was using the basic free-when-you-log-in version. 

This is going to take up some space, but I think it's worth it. Here we go.

Sentence Rewrites

We'll start with an easy one. Given "privacy" as a prompt, BABEL extruded this pair of sentences:

Privateness has not been and undoubtedly never will be lauded, precarious, and decent. Humankind will always subjugate privateness.

I asked ChatGPT to "rewrite this sentence." It did not correct my lack of a plural, but it did come up with this:

Privacy has never been fully respected and likely never will be. Humanity will always find ways to undermine it.

It also suggested an alternative if I wanted "to keep it more philosophical":

Privacy has never been truly honored, nor is it likely ever to be. Humanity will always seek to subordinate it.

Faced with three adjectives that make no sense, ChatGPT picked the one that came closest to a word that would complete the sentence and picked a more probable word. It took the same approach with "subjugate." In the case of "humankind" and "privateness," it went to a more usual word. It also killed BABEL's legitimate parallel construction. If BABEL had a voice, ChatGPT edited it out.

A Full Paragraph

On to a harder challenge. I had BABEL spew a full essay based on three words-- isolationism, Europe and war. The introductory paragraph was this one:

Warfare has not, and no doubt never will be expedited. Human society will always oust isolationism; some of appetites and others for an exposition. a lack of war lies in the field of literature but also the field of philosophy. War is the most inappropriately eventual trope of mankind.

The previous BABEL work might have meant something if you squinted real hard, but this is some top-notch junk. When I asked ChatGPT to rewrite it, an acceptable response might have been "What the hell are you even talking about?" I also asked ChatGPT to "evaluate" the paragraph. Here is the rewrite it actually gave me:

Warfare has never been, and likely never will be, eradicated. Human society continually rejects isolationism—driven by ambition, desire, and the pursuit of dominance. The aspiration for peace belongs not only to literature, but also to philosophy. Yet war remains one of humanity’s most tragically recurring realities.

It also offered a "more literary" version.

War has never been fully overcome, nor likely ever will be. Humanity resists isolation, compelled by appetite, ambition, and the longing for expansion. The dream of a world without war exists in both literature and philosophy, yet conflict endures as one of mankind’s most persistent and tragic conditions.

Here, ChatGPT just throws out the words that don't make sense and looks for a more probable match based on the parts of the sentence that it leaves intact. "Warfare" and "expedited" are not probable sentence companions. It keeps warfare and ignores expedited. In its notes, ChatGPT wrote "Expedited means “sped up,” which does not fit the context. You likely mean eradicated, prevented, or avoided." 

Of course, BABEL didn't "mean" anything. But for ChatGPT words exist as coordinates in a multidimensional space (like a bunch of islands in Steve Ditko's renderings of Dr. Strange) assigned based on all those pieces of writing that the company stole for training, and what ChatGPT can check is whether the coordinates it has been fed in the form of your sentence are associated with each other. "Warfare" and "expedited" don't appear in the same neighborhood (especially if you throw in "never" and "not").
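
For the curious, here is a rough sketch of what "same neighborhood" could mean in practice. It is a toy, not how ChatGPT actually works under the hood: the three-number coordinates in `TOY_VECTORS` are made up by me for illustration, the `closeness` helper is hypothetical, and real models use far more dimensions learned from training text. The only real technique here is cosine similarity, a standard way to measure whether two sets of coordinates point in the same direction.

```python
import math

# Hypothetical 3-dimensional "coordinates", invented for illustration only.
# They are NOT taken from ChatGPT or any real model.
TOY_VECTORS = {
    "warfare":    (0.9, 0.1, 0.0),
    "eradicated": (0.8, 0.3, 0.1),
    "expedited":  (0.0, 0.2, 0.9),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closeness(word_a, word_b):
    """Look up two words in the toy table and compare their coordinates."""
    return cosine_similarity(TOY_VECTORS[word_a], TOY_VECTORS[word_b])


if __name__ == "__main__":
    # With these made-up numbers, "eradicated" scores as a much closer
    # neighbor of "warfare" than "expedited" does.
    print("warfare / eradicated:", round(closeness("warfare", "eradicated"), 3))
    print("warfare / expedited: ", round(closeness("warfare", "expedited"), 3))
```

Notice that nothing in this arithmetic asks what the writer meant. It only asks which words tend to sit near each other.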

But ChatGPT is programmed to pretend that it can infer the writer's intention. In its "evaluation" of the paragraph, it uses phrases like "you may mean" and "the intended meaning," but again-- BABEL had no intent when it wrote this (just as ChatGPT had no intent when it replied). And it once again cannot address that this passage is nonsense:

Your passage has strong philosophical ambition and a serious tone, but several word choices and constructions make the meaning unclear. It reads as if you are aiming for a dense, literary style, though some vocabulary is being used imprecisely.

No, the passage had no ambition at all. And to say the construction makes the meaning "unclear" is not a human reaction to this gibberish. But ChatGPT deals mostly in fluency and structure, and BABEL's paragraph mostly delivers that. 

Finally, this stumper.

That opening paragraph was the closest to intelligible that BABEL came in the whole essay. Here's something from the middle of the essay:

As I have learned in my semiotics class, isolationism is the most fundamental casuistry of humankind. Though interference for obloquy inverts, information processes brains. The same pendulum may process two different orbitals to process an orbital. The plasma is not the only thing the brain reacts; it also receives neutrinoes for irascibility with war. Due to interceding, petulantly but extraneously petulant expositions protrude also on Europe. a contemptuous isolationism changes the injunction at warfare.

ChatGPT was pretty much stumped by this. The detailed responses focused almost entirely on versions of "the meanings of word X and word Y don't have any connection I can find." The summation:

This passage has an intentionally intellectual and abstract tone, but it currently reads as semantically unstable — many words are either used incorrectly or combined in ways that obscure meaning rather than deepen it. It resembles philosophical or post-structural writing stylistically, but the logical connections between ideas are difficult to follow.

So "intentionally"?? ChatGPT extrudes even more inferences about the writer's intent, including "The passage demonstrates a strong desire for philosophical density and elevated diction." I do like the phrase "semantically unstable" and would probably use it myself if I were still grading student papers; it doesn't mean anything, really, but sounds smarter than "this is a freaking mess." And "resembles philosophical or post-structural writing stylistically" is a hoot. This passage did push ChatGPT to issue an actual correction: "The brain does not meaningfully 'receive neutrinos for irascibility.'" The response also includes the notion that there are "traces of themes." Also, "The result feels more surreal or experimental than analytical." It does indeed. Also, "feels"?

Also, it was at this point that ChatGPT suggested I upgrade for more memory.

Prompted to rewrite the passage, ChatGPT extruded this:

As I learned in my semiotics class, isolationism may be one of humanity’s most enduring rationalizations. Although public interference and condemnation often reverse political intentions, information continues to shape the human mind. Just as a single pendulum can influence multiple motions, one idea may alter many intellectual trajectories. The brain responds not only to physical stimuli, but also to invisible pressures—fear, ideology, and the persistent anxieties surrounding war. Through political intervention, emotional and often unnecessary disputes continue to spread across Europe. In this way, a contemptuous form of isolationism reshapes humanity’s understanding of warfare.

ChatGPT just throws up its imaginary cyberhands and falls back on making shit up.

What can we learn?

AI apologists may argue that, gee, ChatGPT was just trying to give BABEL the benefit of the doubt, or fall back on the old argument that ChatGPT would be fine with human good faith efforts. I disagree.

Imagine this scenario. You are on the phone with a mentor, and as you pour out your thoughts, they reply with a steady stream of "Uh-huh" and "I hear you" and "That's a good point." You decide to continue the conversation in person, so you put your phone in your back pocket, still connected, and run to their office, where you find them on their phone, still saying "Uh-huh" and "I hear you" and "That's a good point" even though they are just listening to your butt. What would you conclude about how closely they had been listening to you when you were actually talking?

ChatGPT makes comments about writer intent and ambition and purpose not because of anything that the writer has written, but because that's just how ChatGPT is programmed to "talk." It is "hallucinating" writer intent and the meaning of the writing itself, latching onto a couple of word-coordinates and pulling up some nearby word-coordinates.  

While I was working on this post, a post from Adam Kucharski popped up on my feed that is just astonishing. Copilot looked at two datasets of writing, one labeled UK and one labeled US; Copilot was asked to mine the two sets of 2000 responses for cultural differences, which it did, with relish. But the twist-- the two datasets were just two copies of exactly the same 2000 responses.

It's not just that AI "analysis" of your writing is bad (though it is). It's that what you have written is largely irrelevant to the process that the chatbot goes through. It answers the same question that chatbots always answer-- what would the thing you just asked for look like? It does not perform an analysis; it creates a simulacrum of what an analysis would look like, and if any of that happens to be an accurate analysis, that's purely an accident. It cannot detect the difference between an actual human expression and a simulacrum that imitates human expression. 

Analysis of a piece of writing must include some inferences about the writer's intent. Chatbots are not remotely capable of making such inferences. But as we see above, even when such intent literally does not exist, it doesn't matter to the chatbot because it's just making shit up about all that anyway. It cannot tell the difference between a human in the loop and another bot, and that suggests that it can't "see" the human in the loop even when it's there.
