Thursday, March 7, 2024

A Truly Terrible Use For ChatGPT And Its Ilk

"Teachers are embracing ChatGPT-powered grading," says the headline at Axios, and with all my heart I hope that's not true, because what a terrible idea. What a supremely terrible awful bad idea.

The good (-ish) news is that the article's source for this reported embrace is one of the companies pushing it. The bad news is that the company is Houghton Mifflin Harcourt, the 800-pound gorilla of school instructional materials.

It's not that tech has no place in the world of writing (obviously). There have been some good pieces of software, for instance, that grew out of the idea of providing a quicker, simpler way of attaching those comments that you find yourself using on student work over and over and over again (though in my experience, as with lots of software timesavers, there's a huge investment of time up front to get the time savings further down the line).

But to use a bot to assess writing? Crappy idea.

This latest version (Writable) tries to soften the blow by calling for a "hybrid" system with "a human in the loop," which seems to mean that the bot assesses the writing and the human looks over its work, just in case. But why bother? To really check the bot's work would take as much time as just assessing the writing yourself. No, a human in the loop is just a wink wink nudge nudge moment, a way to help folks pretend that things haven't gone too far yet.

But what a lousy idea. Let me count the ways.

The software just isn't very good at it.

We have been over and over and over and over and over and over and over this. Computer software does not "know" or "understand" in any conventional sense of the words. Once you get past the very technical explanations (and here are three good ones of varying complexity), what AI language-generating models do is decide, based on all the examples fed into them, what a very probable sentence might be. Give it a topic and a specific sort of prompt (which basically allows it to narrow its sample base of examples), and it will give you a high-probability string of words. As an essay grader, what it can do is turn that around and decide if the submitted material falls within the probability parameters established by the examples it has "learned" from.

What it can't decide is whether or not the student has written something stupid. It may spot whether or not the student has included a specific example for support, but it can't judge how good an example it is. And given generative AI's propensity for just making shit up, it's not clear how good it would be at catching students doing the same. 
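
To make the probability point concrete, here is a toy sketch in Python. It is emphatically not how ChatGPT or Writable works (those systems are far larger and more sophisticated); it is a simple word-pair counter, and every name in it (`BigramScorer`, `score`, the three example sentences) is made up for illustration. The only assumption is that the scorer rates a sentence by how closely its word pairs match the examples it was fed.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class BigramScorer:
    """Hypothetical toy scorer: rates text by how familiar its word pairs are."""

    def __init__(self, examples):
        self.context_counts = Counter()  # how often each word appears as a "previous word"
        self.pair_counts = Counter()     # how often each (previous, next) pair appears
        self.vocab = set()
        for sentence in examples:
            tokens = ["<s>"] + tokenize(sentence)  # <s> marks the sentence start
            self.vocab.update(tokens)
            for prev, word in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.pair_counts[(prev, word)] += 1

    def score(self, sentence):
        """Average log-probability per word pair, with add-one smoothing."""
        tokens = ["<s>"] + tokenize(sentence)
        pairs = list(zip(tokens, tokens[1:]))
        if not pairs:
            return float("-inf")
        v = len(self.vocab) + 1  # +1 leaves room for words never seen in the examples
        total = 0.0
        for prev, word in pairs:
            p = (self.pair_counts[(prev, word)] + 1) / (self.context_counts[prev] + v)
            total += math.log(p)
        return total / len(pairs)


# Made-up "training" examples, all of them true statements.
examples = [
    "the moon orbits the earth",
    "the earth orbits the sun",
    "the sun is a star",
]
scorer = BigramScorer(examples)

fluent_but_false = "the sun orbits the moon"        # wrong, but sounds like the examples
true_but_odd = "around the sun goes the earth"      # right, but phrased unusually

print(scorer.score(fluent_but_false))
print(scorer.score(true_but_odd))
```

In this toy setup the false sentence outscores the true one, simply because its word pairs look more like the examples. Nothing in the score reflects whether the claim is right.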

Misplaced trust in authority.

Your computer cannot think, does not understand, is not smart in a conventional human sense of the word. It's an object whose virtue is an absolutely tireless ability to follow instructions at the speed of light.

But since they first poked their heads into pop culture, computers have been portrayed as possessing some sort of objective superhuman wisdom and knowledge. And human beings continue to defer to computers as having some higher level of authority.

However, computers are machines. They do exactly what their human programmers tell them to do. Even when they employ machine learning to "teach" themselves, they do so according to the instructions of human programmers. In short, computers do not implement and express the computed wisdom of some higher power; they simply implement the ideas of whatever humans programmed them. 

When it comes to insights that might take a human a lifetime to work out, like complicated computations, computers get us knowledge that we can trust and which would have been hard to find otherwise. But an essay is not a computation, and a computer has nothing to offer that improves on human judgment. Software assessment of writing should just be viewed as humans using the programming to make a judgment about writing, not as some sort of objective wisdom over and above what humans could provide. Yet, I'm afraid that some folks will view it as exactly that, and instead of treating the software assessment as they would one more human voice in the room (whose judgment might be suspect), they'll treat it as some digital Word Of God.

Distorting the entire process.

Writing is the work of communicating thoughts, ideas, emotions, and other human stuff to other human beings. Stringing words together in order to satisfy the algorithm is not any sort of meaningful writing (and that is true even if the algorithm is being applied by humans). This is conditioning young humans to string words together in a manner completely unrelated to anything they want to say or express.

Lord knows we don't need computers to promote this bad kind of word spitting. I've seen too many students who figured out that trying to focus on what they actually think or believe just gets in the way of satisfying the assessment algorithm that gives them their grade. And the Big Standardized Test only enshrined that sort of anti-writing as an important goal.

What do you suppose it does to a student's approach to writing when they start with the understanding that they are writing not for a human audience, but a computerized one? Not to communicate, but to perform word spitting for a digital audience? 

Writable and its brethren are pitched as tools to save labor and time, but they save that labor and time by changing the very nature of the task and distorting the learning goals for students. 

It could be worse, I suppose. The software could be wired to a dispenser that fed students a piece of candy every time they spit out an especially probable string of words. Or it could aim even more directly at the current internet cyber-hell, where AI spits out articles designed to be pleasing to the AI that pushes those articles on search engines-- "Ten Weird Tricks I Used To Enjoy My Summer Vacation (You won't believe number eight)"

I sure hope teachers don't embrace this attempt to train human children to become word spitting widgets. We can do better. 

1 comment:

  1. I have a sophomore in college. He texted me last semester raving mad that the college was allowed to use ChatGPT grading, but that if students were caught using the same kinds of applications to write papers, those assignments would be given a "0". I told him that I thought the use of these types of applications was wrong on both sides! All the $$$ that parents are paying for college educations, and computers are getting to run the system... and yes, there are still MANY asynchronous and online classes for full-time students.
