Education Week has just run an article by Caralee J. Adams announcing (again) the rise of essay-grading software. There are so many things wrong with this that I literally do not know where to begin, so I will use the device of subheadings to create the illusion of order and organization even though I promise none. But before I begin, I just want to mention the image of a plethora of peripatetic penguins using flamethrowers to attack an army of iron-clad gerbils. It's a striking image using big words that I may want later. Also, look at what nice long sentences I worked into this paragraph.
Look! Here's My First Subheading!
Speaking for the software will be Mr. Jeff Pence, who apparently teaches middle school English to 140 students. God bless you, Mr. Pence. He says that grading a set of essays may take him two weeks, and while that seems only a hair slow to me, I would certainly agree that nobody is taking 140 7th grade essays home to read overnight.
But Mr. Pence is fortunate to have the use of Pearson WriteToLearn, a product with the catchy slogan "Grade less. Teach more. Improve scores." Which is certainly a finely tuned set of catchy non sequiturs. Pearson's ad copy further says, "WriteToLearn—our web-based literacy tool—aligns with the Common Core State Standards by placing strong emphasis on the comprehension and analysis of information texts while building reading and writing skills across genres." So you know this is good stuff.
Pearson White Papers Are Cool!
Pearson actually released a white paper, "Pearson's Automated Scoring of Writing, Speaking, and Mathematics," back in May of 2011 (authors were Lynn Streeter, Jared Bernstein, Peter Foltz, and Donald DeLand-- all PhDs except DeLand).
The paper wears its CCSS love on its sleeve, leading with an assertion that the CCSS "advocate that students be taught 21st century skills, using authentic tasks and assessments." Because what is more authentic than writing for an automated audience? The paper deals with everything from writing samples to constructed response answers (I skipped the math parts) and in all cases finds the computer better, faster, and cheaper than the humans.
The Pearson website also includes a link to a webinar about formative assessment which heavily emphasizes the role of timely, specific feedback, followed by targeted instruction, in improving student writing. Then we move on to why automated assessment is good for all these things (in this portion we get to hear about the work of Peter Foltz and Jeff Pence, who is apparently Pearson's go-to guy for pitching this stuff). This leads to a demo week in Pence's class to show how this works, and much of this looks usable. Look-- the 6+1 traits are assessed. Specific feedback. Helps.
And we know it works because the students who have used the Pearson software get better scores on the Pearson assessment of writing!! Magical!! Awesome!! We have successfully taught the lab rats how to push down the lever and serve themselves pellets.
Wait! What? Not Miraculous??
"Critics," Adams notes drily, "contend the software doesn't do much more than count words and therefore can't replace human readers." They contend a great deal more, and you can read about their contending at the website humanreaders.org, and God bless the internet that is a real thing.
"Let's face the realities of automated essay scoring," says the site. "Computers cannot 'read'." They have plenty of research findings and literature to back them up, but they also have a snappy list of one-word reasons that automated assessors are inadequate. Computerized essay grading is:
Unlike Pearson, the folks at this website do not have snappy ad copy and slick production values to back them up. They are forced to resort to research and facts and stuff, but their conclusion is pretty clear. Computer grading is indefensible.
Adams gets into the history. I'm going to summarize.
Computer grading has been around for about forty years, and yet somehow it never quite catches on.
Why do you suppose that is?
That Was A Rhetorical Question
Computer grading of essays is the very enshrinement of Bad Writing Instruction. Like most standardized writing assessment in which humans score the essays based on rubrics so basic and mindless that a computer really could do the same job, this form of assessment teaches students to do an activity that looks like writing, but is not.
Just as reading without comprehension or purpose becomes simply word calling, writing without purpose becomes simply making word marks on a piece of paper or a screen.
Authentic writing is about the writer communicating something that he has to say to an audience. It's about sharing something she wants to say with people she wants to say it to. Authentic writing is not writing created for the purpose of being assessed.
If I've told my students once, I've told them a hundred times--good writing starts with the right question. The right question is not "What can I write to satisfy this assignment?" The right question is "What do I want to say about this?"
Computer-assessed writing has no more place in the world of humans than computer-assessed kissing or computer-assessed singing or computer-assessed joke delivery. These are all performance tasks, and they all have one other thing in common-- if you need a computer to help you assess them, you have no business assessing them at all.
And There's The Sucking Thing
Adams wraps up with some quotes from Les Perelman, former director of the MIT Writing Across the Curriculum program. He wrote an awesome must-read take-down of standardized writing for Slate, in which, among other things, he characterized standardized test writing as a test of "the ability to bullshit on demand." He was also an outspoken critic of the SAT essay portion when it first appeared, noting that length, big wordiness, and a disregard for factual accuracy were the only requirements. And if you have any illusions about the world of human test essay scoring, reread this classic peek inside the industry.
His point about computer-assessed writing is simple. "My main concern is that it doesn't work." Perelman is the guy who coached two students to submit an absolutely execrable essay to the SAT. The essay included gem sentences such as:
American president Franklin Delenor Roosevelt advocated for civil unity despite the communist threat of success by quoting, "the only thing we need to fear is itself," which disdained competition as an alternative to cooperation for success.
That essay scored a five. So when Pearson et al tell you they've come up with a computer program that assesses essays just as well as a human, what they mean is "just as well as a human who is using a crappy set of standardized test essay assessment tools." In that regard, I believe they are probably correct.
Computer-assessed grading remains a faster, cheaper way to enshrine the same hallmarks of bad writing that standardized tests were already promoting. Just, you know, faster and cheaper, ergo better. The good news is that the system is easy to game. Recycle the prompt. Write lots and lots of words. Make some of them big. And use a variety of sentence lengths and patterns, although you should err on the side of really long sentences because those will convince the program that you have expressed a really complicated thought and not just I pledge allegiance to the flag of the United States of Estonia; therefore, a bicycle, because a vest has no plethora of sleeves. And now I will conclude by bringing up the peripatetic penguins with flamethrowers again, to tie everything up. Am I a great writer, or what?