Les Perelman is one of my heroes. For years he has poked holes in the junk science that is computer-graded writing, bringing some sanity and clarity to a field clogged with silly puffery.
We are all indebted to Fred Klonsky for publishing an exchange between Perelman (retired Director of Writing Across the Curriculum at MIT) and Jack Smith, the Maryland State Superintendent of Schools. Maryland is one of the places where the PARCC test now uses computer grading for a portion of the test results. This is a bad idea, although Smith has no idea why. I'm going to touch on some highlights here in hopes of enticing you to head on over and read the whole thing.
The exchange begins with a letter from Smith responding to Perelman's concerns. It seems entirely possible that Smith created the letter by cutting and pasting from PARCC PR materials.
In response to a question about how many tests will be computer scored, Smith notes that "PARCC was built to be a digital assessment, integrating the latest technology in order to drive better, smarter feedback for teachers, parents and students. Automated scoring drives effective and efficient scoring" which means faster and cheaper. Also, more consistent. No word on whether the feedback will actually be good or useful (spoiler alert: no), but at least it will be fast and cheap.
In responding to other points, Smith repeats the marketing claim that computer scoring has proven to be as accurate as human scoring, that there's a whole report of "proof of concept," and that report includes three whole pages of end notes, so there's your research basis.
Perelman's restraint in responding to all this baloney is, as always, admirable. Here are the points of his response to Smith, filtered through my own personal lack of restraint.
1) Maryland's long use of computers to grade short answers is not germane. Short answer scoring just means scanning for key words, while scoring an entire essay requires reading for qualities that computers are as yet incapable of identifying. Read about Perelman's great work with his gibberish-generating program BABEL.
2) Studies have shown that computers grade as well as humans-- as long as you are comparing the computer scoring to the work of humans who have been trained to grade essays like a machine. Perelman observes that real research would compare the computer's work to the work of expert readers, not the $12/hour temps that Pearson and PARCC use.
3) The research is bunk. The three pages of references are not outside references, but mostly the product of the same vendor that is trying to sell the computer grading system.
4) Perelman argues that no major test is using computers to grade writing. I'm not really sure how much longer that argument is going to hold up.
5) The software can be gamed. BABEL (a software program that creates gibberish essays that receive high scores from scoring software) is a fine example, but students need not be that ambitious. Write many words, use long sentences, and pack in as many big words as you can think of (there's a toy sketch of this right after the list). I can report that this also works on the $12/hour scorers who are trained to score like machines. For years, my department achieved mid-nineties proficiency on the state writing test, and we did it by teaching students to fill up the page, write neatly, and use big words (we liked "plethora" a lot). We also taught them that this was lousy writing, but that it would make the state happy. Computer scoring works just as well and can be just as easily gamed. If one of your school's goals is to teach students about going through ridiculous motions in order to satisfy clueless bureaucrats, then I guess this is worthwhile. If you want to teach them to write well, it's all a huge waste of time.
6) There's evidence that the software has built-in cultural bias, which is unsurprising because all software reflects the biases of its writers.
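Nobody outside the vendors gets to see the actual scoring code, but here's a minimal sketch (in Python) of the kind of surface-feature counting the critiques describe. Every weight and threshold in it is invented for the demo; the point is simply that nothing in it reads for meaning, so a page of word salad stuffed with "plethora"-grade vocabulary outscores an honest short answer.

```python
# A toy "surface features" scorer, purely for illustration. Real scoring
# engines are more elaborate, but critiques like Perelman's point out that
# proxies such as essay length, sentence length, and fancy vocabulary carry
# much of the weight. All weights and cutoffs below are made up.

import re

def score_essay(text: str) -> float:
    """Score an 'essay' from 0 to 6 using only surface features."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0

    word_count = len(words)
    avg_sentence_len = word_count / len(sentences)
    big_word_ratio = sum(1 for w in words if len(w) >= 8) / word_count

    # More words, longer sentences, fancier vocabulary all push the score
    # up. Nothing here checks meaning, logic, or truth.
    score = (
        3.0 * min(word_count / 300, 1.0)         # fill the page
        + 1.5 * min(avg_sentence_len / 25, 1.0)  # long sentences
        + 1.5 * min(big_word_ratio / 0.2, 1.0)   # "plethora"-style words
    )
    return round(score, 1)

honest = "My dog is small. He likes to play. I love him."
padded = ("A veritable plethora of multifaceted considerations necessitates "
          "comprehensive deliberation regarding the aforementioned "
          "circumstances, notwithstanding innumerable counterproductive "
          "ramifications. ") * 20

print("Honest short answer:", score_essay(honest))   # scores near the bottom
print("Gibberish word salad:", score_essay(padded))  # scores near the top
```

Run it and the short, honest answer lands near the bottom of the scale while the gibberish block lands near the top-- which is exactly the behavior BABEL exploits at scale.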
It remains to be seen whether any of these arguments penetrate State Superintendent Smith's consciousness. I suppose the good news is that it's relatively easy to teach students to game the system. The bad news, of course, is that the system is built on a foundation of baloney and junk science.
It angers me because teaching writing is a personal passion, and this sort of junk undermines it tremendously. It pretends that good writing can be reduced to a simple algorithm, and it does it for the basest of reasons. After all, we already know how to properly assess writing-- you hire a bunch of professionals. Pennsylvania used to do that, but then they sub-contracted the whole business out to a company that wanted a cheaper process.
And that's the thing. The use of computer assessment for writing is not about better writing or better feedback-- the software is incapable of providing anything but the most superficial feedback. The use of computer assessment for writing is about getting the job done cheaply and without the problems that come with hiring meat widgets. This is education reform at its absolute worst-- let's do a lousy job so that we can save a buck.
I live in the once great state of MD-- Howard County specifically, which is home of high property values, high test scores, and the best gaming of the BS test. I am here to tell you that the students won't be able to game the system by using big words and writing big sentences using the big words. NO MORE SPELLING OR VOCABULARY is in our curriculum, so no more learning big words or how/when to use them. 4th grade was the last time that my middle schooler had any kind of vocabulary/spelling. It's a shame!
And here we thought we might be getting a superintendent upgrade this time around in my district, where Smith is poised to assume that mantle before next school year. *sigh*
https://www.washingtonpost.com/local/education/new-superintendent-hopes-to-improve-high-performing-school-system/2016/06/27/e20e0020-3865-11e6-9ccd-d6005beac8b3_story.html