The story comes to us from Francesca Paris at NPR's Here and Now, and it can serve as our sixty-gazillionth reminder that computer algorithms-- even ones that are marketed as Artificial Intelligence-- cannot grade student work to save their cybernetic lives.
A student in LAUSD's virtual school was dismayed when his first history assignment came back an F. The assignment consisted of short written responses, and the score came back instantly, so his mother figured out that it had been graded by the software and not a human being. But as we just saw here a short time ago, computer algorithms can't really read, and they don't understand content--which would seem to be a real drawback when scoring history assignments.
This particular virtual school product is from Edgenuity, one of the more widely used school-in-a-can computer products. Its CEO won recognition at the EdTech Awards last year, and the company is in something like 20,000 schools. But it gets plenty of criticism for being standardized to death. It doesn't appear to have a great deal of bench strength when it comes to questions; users talk about how easy it is to just google answers while taking assessments, and Slate discovered that students can do well by just repeatedly taking tests, which turn out to ask mostly the same questions every time the student tries again.
But the student in this story didn't have to work that hard. Some quick trial and error yielded perfect results.
The question? "How did the location of Constantinople help it grow wealthy and prosperous?"
Their answer:
It was between the Aegean and the Black seas, which made it a hub for boats of traders and passengers. It was also right between Europe and Asia Minor, which made it a huge hub for trade, and it was on many trade routes of the time. Profit Diversity Spain Gaul China India Africa
Yes, that's a non-sentence string of words at the end. Like other assessment algorithms, Edgenuity's appears to be looking for a few key terms, maybe signs that there's more than one sentence. But it has no idea what it's "looking" at.
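To make the problem concrete, here is a minimal sketch of what a keyword-matching scorer might look like. To be clear, this is not Edgenuity's actual code, which is not public; the keyword list and the `keyword_score` function are hypothetical, made up purely to illustrate how a program that only checks for terms can be satisfied by word salad.

```python
import re

# Hypothetical keyword list for illustration only; the real product's
# terms and weighting are not public.
KEYWORDS = frozenset(
    {"profit", "diversity", "spain", "gaul", "china", "india", "africa"}
)


def keyword_score(answer: str, keywords: frozenset = KEYWORDS) -> float:
    """Return the percentage of keywords that appear anywhere in the answer.

    Nothing here checks grammar, coherence, or whether the answer
    addresses the question at all.
    """
    words = set(re.findall(r"[a-z]+", answer.lower()))
    hits = keywords & words
    return 100.0 * len(hits) / len(keywords)


word_salad = "Profit Diversity Spain Gaul China India Africa"
thoughtful = (
    "It sat on the strait linking two seas, so merchants from "
    "Europe and Asia had to pass through it."
)

print(keyword_score(word_salad))   # every keyword present
print(keyword_score(thoughtful))   # no keywords present
```

Under these assumptions, the string of disconnected words earns full marks, while a coherent sentence that happens to miss the expected terms earns nothing.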
This story, besides providing one more example of how assessment algorithms fail at anything but the simplest tasks, is also a demonstration of the danger of these stupid things. In just twenty-four hours, this program taught this student to answer questions with little attention to coherence or content meaning. Just string together the words the computer wants.
Advocates of computer assessment often point at tales like this and argue, "But the writer wasn't making a good faith effort to answer the question." Well, no-- that's kind of the point. Since the scoring program didn't--and can't--make a good faith effort to read the work, that quickly teaches students that making a good faith effort to write an answer is not the task at hand, that such a task is for suckers, that it is, literally, pointless. Advocates will also point to studies that show robo-graders awarding scores that match scores from human graders; this is accomplished by giving the barely-trained human scorers instructions that require them to score the work with the same stunted dopiness that the algorithm uses.
Edgenuity gave Paris a statement in which it argued that the algorithms are not meant to "supplant" teacher grading, but "only to provide scoring guidance to teachers." And teachers can override the robo-grader, but then, what's the point of using the robo-grader in the first place, particularly if its "scoring guidance" is junk?
Robograding for anything beyond simple objective questions continues to be junk. It provides the wrong analysis and teaches the wrong lessons, while training students to placate the algorithm rather than grasp the content. No doubt the shiny over-promising marketing makes this kind of thing appealing to the people in school systems who do the purchasing, but robo-graders are one more ed tech product that over-promises and under-delivers. They are junk, but they're profitable junk. Their continued presence in schools is infuriating.
Go read Paris's entire piece. Then look up some pictures of puppies or something more soothing.
[For more from this blog on the business of robograding, see here, here, here, here, or even here.]
I would encourage kids and parents to keep copies of the stuff they send in to these robo-scorer algorithms, and to share the howlers as well as the good stuff (if there is any).
"It was between the Aegean and the Black seas, which made it a hub for boats of traders and passengers. It was also right between Europe and Asia Minor, which made it a huge hub for trade, and it was on many trade routes of the time. Profit Diversity Spain Gaul China India Africa."
It, it, it, it, it . . . ? ? ? ? ?
Was the Edgenuity pronoun detector malfunctioning?