Alternate title: Reason #451,632 that computer software, no matter how many times its vendors call it AI, should not be allowed to assess student writing. Though you can also file this under "reasons that content knowledge is the foundation of literacy."
Our ability to use language is astonishing and magical. Now that the Board of Directors are 4.5 years old, I've again lived through the absolutely amazing spectacle of human language development. There are so many things we do without thinking--or rather, we do them with thinking that is barely conscious. And this is where software is still trying to catch up.
Meet the Winograd Schema Challenge. It's a collection of sentences that humans have little trouble understanding, but which confuse computers.
Frank felt crushed when his longtime rival Bill revealed that he was the winner of the competition. Who was the winner?
The drain is clogged with hair. It has to be cleaned. What has to be cleaned?
It's true that if a student wrote these in an essay, we might suggest they go back and punch the sentences up to reduce ambiguity. But for English-language users who understand rivals and winners and competition and hair and drains and clogging, it's not hard to understand what these sentences mean.
Well, not hard for humans. For computers, on the other hand.
It's always important to remember that computers don't "understand" anything (as my professor told us in 1978, computers are as dumb as rocks). What a computer can do is suss out patterns. Software that imitates language use does so basically (warning: gross oversimplification ahead) by just looking at giant heaps of examples and working out the pattern. When you read that GPT-3 is better than GPT-2, mostly what that means is that its makers figured out a way to feed it even more examples to break down. When engineers say that the software is "learning," what they mean is that the software has broken down a few thousand more examples of how and when the word "hair" is used, not that the software has learned what hair is and how it works.
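If you want to see how thin that kind of "learning" is, here's a toy sketch (in Python, with a made-up three-sentence "corpus"; real systems use vastly more data and fancier math, but the spirit is the same): "learning" amounts to counting which symbols tend to follow which.

```python
from collections import Counter, defaultdict

# A made-up, tiny "corpus" standing in for the giant heaps of examples.
corpus = [
    "the drain is clogged with hair",
    "the sink is clogged with grease",
    "her hair is long and brown",
]

# "Learning" here is nothing more than counting which word follows which.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        follows[current_word][next_word] += 1

# The "model" now knows that "clogged" is usually followed by "with",
# but it has no idea what a clog, a drain, or hair actually is.
print(follows["clogged"].most_common())   # [('with', 2)]
print(follows["is"].most_common())        # [('clogged', 2), ('long', 1)]
```

Even in this toy version, the limit is obvious: the counts know that "clogged" goes with "with," but nothing about plumbing.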
This type of learning is how AI often wanders far astray, learning racist language or failing to recognize Black faces--the algorithm (really, a better name for these things than AI) can only "learn" from the samples it encounters.
So AI cannot read. It can only look at a string of symbols and check to see if the use of those symbols fits generally within the patterns established by however many examples it has "seen." And it cannot tell whether or not your student has written a coherent, clever, or even accurate essay--it can only tell if your student has used symbols in ways that fall within the parameters of the ways those symbols have been used in the examples it has broken down.
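To make the "assessment" half of that concrete, here's the same toy sketch extended (again Python, same made-up corpus, nothing like what any vendor actually ships): "grading" a string of symbols just means checking how much of it matches patterns already seen, which is why an accurate sentence and a silly one can score exactly the same.

```python
from collections import Counter, defaultdict

# Rebuild the same toy bigram counts as in the sketch above.
corpus = [
    "the drain is clogged with hair",
    "the sink is clogged with grease",
    "her hair is long and brown",
]
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for a, b in zip(words, words[1:]):
        follows[a][b] += 1

def pattern_score(sentence):
    """Fraction of word pairs the 'model' has seen before.
    This measures conformity to past examples, not truth or coherence."""
    words = sentence.split()
    pairs = list(zip(words, words[1:]))
    return sum(1 for a, b in pairs if follows[a][b] > 0) / len(pairs)

# An accurate sentence and a silly one score identically,
# because only the surface pattern is being checked.
print(pattern_score("the drain is clogged with hair"))  # 1.0
print(pattern_score("the sink is long and brown"))      # 1.0
```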
Essay assessment software has no business assessing student essays.
As a bonus, here's a good little video on the topic from Tom Scott, whose usual thing is unusual places, but who also dips into language stuff.
A variation on how computers don't really do induction with anything like the mind-eluding speed (if occasional inaccuracy) of people. Kathryn Schulz discusses this in her brilliant "Being Wrong" book: Give a person the sentence "The giraffe had a very long _______," and pretty much everyone over 5 who's played with animal-oriented toys knows the answer. But a computer has to evaluate the probabilities, in turn, of "neck," "criminal record," "nap after his customary three-martini lunch," "train ride home," etc.
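To put the commenter's point in rough code form (a toy sketch in Python with made-up example phrases, not how any real model is built): the machine ranks candidate completions by how often they showed up in its examples, nothing more.

```python
from collections import Counter

# A made-up handful of phrases standing in for a training corpus.
examples = [
    "a very long neck", "a very long neck", "a very long neck",
    "a very long wait", "a very long train ride home",
]

# Count which word tends to follow "a very long" in the examples.
completions = Counter(
    phrase.split("a very long ")[1].split()[0] for phrase in examples
)

# The machine ranks candidates by relative frequency; a five-year-old just knows.
for word, count in completions.most_common():
    print(word, count / len(examples))
# neck 0.6
# wait 0.2
# train 0.2
```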
ChatGPT can answer all these questions.