Thursday, February 4, 2021

More Absurd Learning Loss Data

Pandemic education has featured a great deal of chicken littling about "learning loss" that is largely ridiculous. Test manufacturers and folks in test-adjacent edu-biz endeavors are selling a picture of students as buckets and education as water, with the education now leaking out at a rate so alarming that it's kind of amazing that people who have been out of school for a decade haven't all turned into drooling couch potatoes.

NWEA's report from last April is oft-cited, along with its sexily simple assertion that students will lose days and months of learning. That's baloney, and what it really means is that the folks at NWEA are guessing that test scores will probably go down X points. And if that isn't enough evidence this was a nothingburger, note that in November NWEA announced that their guesstimate had turned out to be overstated.

But the learning loss drama continues, with Hechinger Report this week emailing links to some research from Amplify (motto: "Trying to make a buck off education since 2000"), launched by Rupert Murdoch, later sold to Joel Klein after the company failed to make a killing with their flawed school tablet program. Those guys.

Anyway, they want you to know that covid has damaged the DIBELS scores.

DIBELS (it should come with a little registered trademark sign, but I don't have that capability) is a special test in which the littles are asked to read a bunch of nonsense. Seriously--they are given small clumps of letters that are meaningless gibberish and asked to "sound them out" and pronounce them out loud. This is supposed to measure their skill with phonics and not, say, their ability to comply with apparently senseless requests from adults.

It is exactly the kind of performance task that would only ever be practiced, measured, or expected in a school setting, so the fact that scores dipped a bit between Fall 2019 and Fall 2020 is no surprise. Parents were probably not practicing nonsense syllable pronunciation often enough at home.

At the end of the report, in large, bold font, we're told that Amplify's mCLASS "can provide you with valid, reliable data on your students." Never let a crisis go to waste. Ka-ching.


Wednesday, February 3, 2021

Third Grade Reading Retention: Still A Bad Idea

A former colleague of mine, a math teacher, used to say, "Don't even teach primary students math. Just teach them to read. By the time they get to me, if they can read well, I can teach them all the math you want."

That was many years ago, long before the rise of third grade reading retention laws. But those laws, while (usually) well-intentioned, are a terrible, rotten, no good, really bad idea.

Sixteen states require third graders to pass a standardized reading test in order to be promoted to fourth grade. Most of them allow for some exceptions, but not all, and not all the time. It's long been conventional wisdom that fourth grade represents a new style of learning--that primary students learn to read, but once they hit fourth grade, they read to learn. But why turn that into a testing barrier?

Matt Barnum, the Chalkbeat reporter who occasionally decides to track down sources of conventional wisdom, traces the big push to some research from 2011. Authored by Donald J. Hernandez (Hunter College and the Graduate Center, City University of New York, and Foundation for Child Development) on behalf of the Annie E. Casey Foundation, "Double Jeopardy" ties third grade reading proficiency (more or less as defined by NAEP) to high school graduation as well as tying both to poverty. 

Without getting into too much detail, the report finds that students who are not reading proficiently in third grade are more likely not to graduate, students who are poor for at least a year are less likely to graduate, and students who are both are even less likely to end up with a diploma. Black and Hispanic students who lagged in third grade reading skills were also less likely to graduate.

Hernandez notes that research had been suggesting a connection for quite a while (he offers an endnote to research from 1978). But his report (combined in some cases with NAEP results from 2015) seems to be the one that goosed many states along.

There are nits to be picked. Barnum noticed that the report says "3rd grade reading scores" when it is really lumping second, third and fourth grade scores together. When NAEP says "proficient," they mean "roughly A level" and not "on grade level." And let's not even get into the actual validity of these tests right now.

Mostly, the big fat log-sized nit here is that Hernandez has identified correlations, not causations. Research might well show that third grade shoe size is a good predictor of adult height, but it does not follow that making third graders wear bigger shoes, or making them stay in third grade until their feet are big enough, will lead individual students to grow taller, nor raise the average height of adults. 
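If you want to see the fallacy in action, here's a toy simulation--mine, not Hernandez's, with every number invented for illustration (it needs Python 3.10+ for statistics.correlation). A hidden factor drives both shoe size and height, so the correlation comes out strong; forcing bigger shoes on the simulated kids changes nothing.

```python
# Toy model of the shoe-size fallacy. All numbers are made up;
# "growth" is the hidden common cause of both measurements.
import random
import statistics

def kid(bigger_shoes=False):
    growth = random.gauss(1.0, 0.15)                 # hidden common cause
    shoe = 18 + 8 * growth + random.gauss(0, 0.5)    # third grade shoe size
    if bigger_shoes:
        shoe += 4                                    # the "intervention"
    height = 120 + 50 * growth + random.gauss(0, 4)  # adult height, cm
    return shoe, height

shoes, heights = zip(*(kid() for _ in range(10_000)))
print(f"correlation: {statistics.correlation(shoes, heights):.2f}")  # ~0.8

_, heights_forced = zip(*(kid(bigger_shoes=True) for _ in range(10_000)))
print(f"mean height, normal shoes: {statistics.mean(heights):.1f} cm")
print(f"mean height, forced big shoes: {statistics.mean(heights_forced):.1f} cm")
# Same average height either way; the shoes never caused anything.
```

Retention laws are the forced-big-shoes arm of this experiment.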

And while it is traditional to shift fourth graders into "read to learn" mode, there does not seem to be any body of research that suggests that some developmental door slams shut when students are eight years old. 

Nor do the various discussions of third grade reading retention mention one other reason states might find erecting this artificial barrier useful: third grade marks the start of years of Big Standardized Testing, and letting bad readers rise through the grades is bad for the numbers. On the other hand, if the poor-testing third graders aren't allowed to take the fourth grade test, the state is going to look like it is accomplishing miracles with the teaching of reading.

There are so many things wrong with these laws. Note that Hernandez identifies three factors, but nobody has suggested that students be retained in third grade until they are less poor or less Black. Of course not--because that would be absurd and abusive.

But these laws, like so much of the modern ed reform canon, are predicated on the theory that students and teachers are holding out on us, and only threats will shake loose those teaching and reading abilities that the little jerks are withholding. It is, once again, all stick and no carrot. Imagine if, instead, states had passed laws that said students who failed the BS Reading Test will be passed on to Grade 4+Reading Enrichment, and the state will pay to hire one additional teacher for every six students who have to enter that program. But that would cost money, and it wouldn't put blame anywhere.

Meanwhile, there is a mountain of research showing that retaining students is at best unhelpful and at worst, closely correlated with bad outcomes for students, including research from one of the most hard-core retention states--Florida. There's no real evidence that it actually works.

And there are side effects. See, again, Florida, where the law became a victim of Florida's demands for testing compliance and third graders who had demonstrated reading competence in other ways, but who refused to take the BS Test, were held back. I know of no research showing a connection between high school graduation and knuckling under to state testocratic demands.

These tests are going to be yet another battleground in this pandemic year, because a law is a law, even if it's a bad law and the year is a terrible year for standardized testing. If your state has one of these, take a moment to let your elected representatives know that it's a bad law and that in this year of all years, eight-year-olds don't need to be pummeled with anything else. Let them just learn to read.

Update: How Pandemic School Is Going In One Rural Area

I've been reporting periodically on how pandemic school is going in my own rural/small town county. Since nobody is doing any kind of systematic large-scale tracking of which schools are doing what and how it's going, I'm just throwing one more batch of data into the general big-city-dominated noise (here's the most recent post on the subject, from the beginning of November).

Numbers have continued to climb here, though deaths are still relatively low. The four local districts moved to save the winter sports season, so basketball and even wrestling have been going on scholastically. In Pennsylvania we experienced a tightening of the rules around the holidays, but now they've been loosened and restaurants are operating at limited capacity again.

A couple of the local districts decided to go back to full face-to-face at the beginning of the second semester last week, while the others remained hybrid, anticipating going full face-to-face in a week or two. That plan lasted a couple of days, until it turned out that a cafeteria worker at my old high school had tested positive for covid. That triggered a two-day shutdown, but while that shutdown was happening, more positive cases turned up, and now the school will be remote for a couple of weeks, then hybrid, then maybe full on again in March. That seems to have given some other districts pause about going back full on.

The in and out, on and off is taxing; we have reached the point where a change in plan no longer makes it into the newspaper. It also raises the question of how many students are being sent to school with some suspicion of illness (it's safe to assume that's a non-zero number). 

Preparation and protocols are an ongoing issue. When heading back to full face-to-face, parents at one district had questions about ventilation. "Windows may be cracked open on the bus and in classrooms," was the response. If a student is out for testing for covid, it's unclear what the chain of notification actually is. Who should be told, and who will do the telling? Nobody really seems to know. 

Monday tested snow day response; it wasn't great. One district was already in distance learning mode. One district simply did an old fashioned snow day and cancelled everything. Another switched from hybrid to distance--they announced they were cancelling transportation--but required teachers to do their distance learning from the school building, no exceptions (triggering a big last-minute scramble for child care). School leaders are giving the impression that they are not so much planning as waiting to see what happens and then coming up with a reaction; that may be a result of poor planning or poor communication, but either way, after 11 months of this, you would think folks in leadership positions would be a bit further along the learning curve.

Local teacher unions have not been particularly vocal one way or another, but then, local unions are as divided as the general population about what's best to do (at least one area teacher attended the insurrection rally in DC on January 6). As with all issues, ever, how well or easily the administration works with the staff during this crisis has lots to do with the reserves of trust that the administration has either built or squandered. Our school boards are mostly filled with regular folks and not politicians, trying to do the best they can with the same bunch of hard-to-parse information that everyone else is looking at.

There's a general sense in the community that the end is in sight. People are finally able to sign up for vaccination (if they are old or unhealthy--teachers still aren't to the front of the line). Organizations are planning to conduct events this year that were cancelled last year. It's unclear how much of this is just wishful thinking. There's still not a great deal of urgency around the pandemic here, though people are mostly following social distancing and masking guidelines. Folks are mostly trying to do the right thing, but there's a lot of disagreement about what that is. 

It's hard to overstate how much an area like this has been hurt by the deliberate lack of state and national leadership over the last year. And the bad thing is that even if leadership emerges now, it will not be effective because the previous leadership vacuum was filled with all manner of stuff that will not be easily driven out. 

But overall, we're mostly doing kind of okay here in that most of what we've got is chaos and disorder rather than a lot of disease and death. I'll let you know if that stays true.

Monday, February 1, 2021

Donors Choose Monday: Give This Book

Ms. Evans-Klopp is at Andrew T. Morrow in Central Islip, NY, in a high-poverty neighborhood, working through an open-yet-restricted building situation, and she'd like a class set of Give This Book a Title by Jarrett Lerner to give her third graders some excitement and a little creative break.

You can help with even a small donation. And as always, I encourage you to lend a hand to someone, somewhere. I do these posts to make it easy, but if you know of a local need, now is the time to step up and help. Every little bit helps.

Sunday, January 31, 2021

ICYMI: Worse Week Than I Thought Edition (1/31)

You know, I thought last week was pretty okay until I looked at the pieces I had collected. So maybe you don't want to read every single item on the list this week. But do stick around for the palate cleanser at the end.

Jeff Bezos wants to go to the moon. Then, public education.

From Dominik Dresel at EdSurge, a piece that will not warm your heart or lift your spirits. 

2nd Grader expelled for telling another girl she had a crush on her

While we're not lifting your spirits--from CNN. Just in case you need one more example of how that nice Christian private school doesn't have to take--or keep--any kid they don't want to.

Unions just got a rare bit of good news.

If you thought the Janus case, which outlawed fair share payments and allowed teachers to be free riders on their unions' work--well, if you thought that was the end of it, you underestimated how much some people hate unions. The next wave of suits is asking the court to make unions pay back all the fair share money they ever collected. SCOTUS announced this week that it will not hear at least the first block of such cases. Fully explained at Vox.

LA Virtual School's Whopper Course Sizes, with a Side of Edgenuity

Let's start a quick tour of some states by starting down south with the indispensable Mercedes Schneider, who reports on how virtual school is working out in Louisiana.

Norfolk remains deeply segregated

The Virginian-Pilot begins its long look at the city that was the site of the first federally funded public housing, the first to be released from federally mandated bussing. They have some issues, and this series, produced with support from the Education Writers Association Reporting Fellowship program, looks to be a long haul.

ASD Light

Against all sense, somebody in Tennessee thinks that maybe a do-over on the failed Achievement School District concept might work. Andy Spears has the story at Tennessee Ed Report.

All the World's A Stage

TC Weber has a variety of news items from TN, including an item that suggests TNTP is getting ready to teach everyone literacy stuff.

Ohio: Funding Doesn't Matter

The state auditor has decided that funding schools doesn't really do anything. Jan Resseger begs to differ, and brings some receipts.

Will North Carolina continue to whitewash history for its students?

North Carolina was on a path toward acknowledging some systemic problems. Then they elected a new state superintendent.

Will SB48 make educating your child more difficult than finding a covid vaccine?

Florida is set to take one more giant bite out of its public education system. I wrote about this bill, but Accountabaloney is on the scene and has a clear picture of what's going on. And everyone needs to pay attention, because Florida is using the same playbook that other states crib from.

The school choice movement reckons with its conservative ties

The splintering of choice's right and left wings has been a story for a while, but when the Philly PBS station notices, you know something's going on. Avi Wolfman-Arent reports for WHYY.

Teacher Comments on Being Tech Skeptics

Larry Cuban has collected some real comments from real teachers about the value of ed tech.

Is there really a science of reading?

At the Answer Sheet, David Reinking, Victoria J. Risko, and George G. Hruby stop by to explain in calm, measured tones why the whole "science of reading" thing is not the cure-all it's promoted to be.

More states seek federal waivers

Also at the Answer Sheet, Valerie Strauss reports that more and more states are asking for what is so obviously the right thing to do--scrap the 2021 Big Standardized Test.

Marketplace mentality toward schools hurts society

The Baptist Standard, of all places, has an interview with Jack Schneider and Jennifer Berkshire about Wolf at the Schoolhouse Door (do you have your copy yet? get it today!) and how the market approach to education is bad for everyone.

Trump conspiracists in the classroom

Buzzfeed, of all places, takes a look at the problem of teachers who have fallen down the Trump/Qanon hole. Politics in the classroom are one thing; lies and debunked conspiracies are another order of trouble.

Meet the Vermont Teacher behind Bernie's Mittens

Just in case you haven't met her already. I've got to leave you with something encouraging.




Saturday, January 30, 2021

Nationwide Initiative Underway To Dismantle Public Education


Well, happy School Choice Week. What better way to celebrate than getting a bunch of states to ram through bills to gut (or in some cases further gut) public education. 

The preferred method seems to be ESA-style vouchers (ESA usually stands for "education savings account," but sometimes "education scholarship account"). In an ESA/Tax Credit Scholarship program, rich benefactors give money to a "scholarship" organization, which in turn hands the money over to criterion-meeting parents who then hand it over to a private edu-vendor. Meanwhile, the state reimburses the benefactor in the form of tax credits. Ed disruptors generally prefer that you not call these "vouchers," and they have half a point, since school vouchers have generally been used strictly as tuition to a private school. ESAs, on the other hand, are meant to be more versatile, allowing parents to buy any sort of educational service from a variety of vendors. Usually these programs are capped, because remember--the amount of money that goes into tax credit scholarship programs is the same amount of money that is cut from state budget revenues.
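To make the flow concrete, here's the whole arrangement reduced to a few lines of arithmetic--a sketch with invented dollar amounts, assuming the dollar-for-dollar credit described above (some states credit less than 100%):

```python
# The tax credit scholarship flow as described above, with made-up
# numbers and a dollar-for-dollar credit (some states credit less).
donation = 1_000_000              # benefactor -> "scholarship" organization
tax_credit = donation             # state reimburses the benefactor in credits
state_revenue_lost = tax_credit   # the hole in the state budget
to_private_vendors = donation     # org -> parents -> private edu-vendors

assert state_revenue_lost == to_private_vendors
print(f"redirected to private vendors: ${to_private_vendors:,}")
print(f"state budget revenue lost:     ${state_revenue_lost:,}")
# The benefactor is out nothing; the state budget is out everything.
```

The benefactor's generosity costs the benefactor nothing, which is why "scholarship" gets the scare quotes.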

The right likes this for a couple of reasons. Rich folks escape taxes. Public schools get cut back--in fact, the whole idea of "school" gets undercut, which is nice for vendors because now if you want to score some of those sweet taxpayer dollars, you don't have to set up an entire school--just tutoring or maybe just a math class or who knows (because what most of these laws have in common is that nobody is really providing much oversight on how the money is spent). ESAs open up a nice clear path for state funding of private religious schools (even as Espinoza v. Montana Department of Revenue also clears the road for more taxpayer-funded Jesus). And if gutting public schools also weakens those nasty teachers unions, that would be a bonus, too.

The dream is a free market bazaar, where parents go searching for the bits and pieces of the program they want to put together for their children. Of course, they'll be given no guarantees that the vendors have to accept their children, and they will have to wade through marketing noise to find those bits and pieces, but the state will be able to say "We gave you a voucher. It's your problem, now." Also, schools that wanted to provide religious indoctrination or teach that the world is flat or that slaves were really quite happy would not have to deal with silly gummint busybodies telling them they can't have public tax dollars to do that. The wealthy and elite will still get what they want for their children, but they won't have to worry about so many of their tax dollars being spent on Those People's Children.

So let's check in across the nation.

Florida

We've already looked at Florida just a week or two ago, where the end game is in sight, courtesy of a bill that would combine all their voucher programs, reduce the practically nonexistent oversight already exerted on voucher schools, and turn them into tax scholarship-fed ESAs. Florida, as always, is the dream for privatizers and anti-public school forces.

Arizona

SB 1041 would quadruple the cap on tax credit scholarships over the next three years (from $5 mill to $20 mill), meaning that much more money would be cut from the state budget as the wealthy make their contributions to the neo-voucher program. The bill also includes provisions for regular increases in the cap after that.

Georgia

HB 60 would create ESAs. It follows the usual pattern of using students with special needs or from low-income families as the foot in the door, but adds a cool new wrinkle--anybody whose school building isn't open for 100% in person instruction can also have one of these vouchers. The bill starts out with a modest 8,432 student cap, then just keeps adding another 8,000 or so students every year thereafter, eventually approaching almost half a billion dollars diverted from public schools to private ones.

Georgia already has voucher programs, and they have been rife with fraud and financial shenanigans, along with minimal-to-nonexistent oversight of the schools involved.

Indiana

HB 1005 establishes ESAs, as well as expanding the existing voucher program. This is the standard approach--open a choice program by creating something to "rescue" poor and/or disabled students, then once you've created the program, expand all the limits on it. Indiana is right on track. For instance, Republicans would like to raise the income limits on eligibility so that a family of four with a six-figure income will still be eligible to send their children to private schools at public expense. In Indiana, that means a religious school; about 97% of Indiana voucher schools are Christian religious schools.

We should also note that Indiana wants to increase tax support for virtual schools so that they get the same money per student as bricks and mortar schools--even though virtual schools have neither bricks nor mortars to maintain. As a resident of Pennsylvania, where cyber-schools get that full payment (and generally perform super-poorly), I can tell you that the financial impact on public schools is brutal. This is a dumb idea, Indiana.

Iowa

Iowa just fast-tracked Senate Study Bill 1065, courtesy of Governor Kim Reynolds, to create the "Students First Scholarship Program." I could sum it up, but let's hear what the editorial board at the Des Moines Register has to say:

Enter more proposals for “school choice.”

That phrase, as everyone knows by now, is an attempt to put lipstick on the pig of siphoning taxpayer money from public schools to funnel to private schools. And, largely, to Christian schools. Also, largely, to help families that already have the resources to send their kids to private schools.

The Iowa State Auditor said that "parents should be alarmed" that the program comes with no requirements for any audit whatsoever, and therefore no oversight. That's a feature, not a bug, of course; the idea is not just to give private schools public dollars, but to make sure they are free to use them as they choose.

One should note, however, that Iowa has also generated noteworthy push-back, in the form of spirited defense of the bill by the Iowa Satanic School:

The Iowa Satanic School recognizes the considerable efforts of the Iowa GOP to move forward with School Choice for our state. This will give Iowa families the choice to seek educational opportunities outside of public schools, using their taxpayer-funded student first scholarship to make Iowa’s FIRST Satanic School a reality.

Kansas

Speaking of taxpayer-funded discrimination, Kansas this week was scheduled to hold hearings on two bills set to expand the tax credit scholarship program in the state. The Kansas School Boards Association came out in opposition to HB 2068 and SB 61, which are apparently aimed at expanding Kansas vouchers to include private schools without any corresponding accountability or oversight. So private schools could go ahead and discriminate based on religion, gender, or sexual orientation and still get taxpayer funding (just like Florida--the dream). 

Kansas is also looking at HB 2119, which will create the "student empowerment act" aka ESAs for students who are academically at risk. Which, when you think about it, is a novel way to frame them. "Son, Mommy and Daddy would like to get a voucher to send you to Aryan Academy High School, but to do it, we're going to need you to start doing really crappy school work right now."

Kentucky

HB 149 would create "Education Opportunity Accounts," an ESA tax credit program that would allow people to contribute to scholarships (that can be spent on a wide variety of education-flavored items and services) while creating a like-sized hole in state budget revenue, eventually draining billions of dollars from the state coffers. The pass-through organization managing the money might be audited if the department determines there have been shenanigans, but how one spots the shenanigans without an audit is not clear. Is this all starting to sound repetitious?

Minnesota 

SF 260 (and HF 308) would set up a tax credit scholarship program. This one's called "Equity and Opportunity in Education." They're going with the low-income (twice the free and reduced lunch cutoff seems to be the standard) and disability foot-in-the-door approach.

Missouri

SB 55 is a whole bundle of anti-public ed joy. This one started out as a simple bill about allowing homeschooled students to play school sports. But it has attracted a number of education barnacles. Among the bills that are now part of SB 55 is one that allows charters to open in any city or county with a population over 30,000 (that's another 61 school districts), and a school board recall rule--if 25% of the voters who voted in the election sign a petition, there can be a recall election, which--wait! what? So after any election that a school board member doesn't win with at least 75% of the vote, there could be a recall election. That's nuts.

Oh, and also--surprise--a provision to create the Missouri Empowerment Scholarship Accounts Program, a program that would allow tax credit scholarships to fund an ESA program. 

New Hampshire

HB 20 would establish the Richard "Dick" Hinch education freedom account programs. Richard Hinch was the staunchly conservative speaker of the house who died of covid on January 1, 2021.

And, yes, here we go again. This time they're called Education Freedom Accounts. Tax credit scholarships. Used for any number of education-flavored stuff. There is a legislative oversight committee, but that appears to be about doing checks on how well the program is working and not on how the money is actually being spent.

This one is unusual in that it does not specify which students are eligible nor offer any caps on how much the tax credit program can take (that's important, remember, because the size of the tax credit cap is the size of the hole it's going to blow in your state budget). 

And that's all the states I'm aware of at the moment, all busily and rapidly ramming through creation or expansion of ESA-style neo-voucher programs that are pretty much the state-level version of the Education Freedom Scholarships that Betsy DeVos was pitching throughout her tenure in DC. Plenty of states are working on the issue and in various stages of chipping away at public education, but this honor roll notes those states that have made it a priority for the start of 2021.


Thursday, January 28, 2021

Selling Roboscoring: How's That Going, Anyway?

The quest continues--how best to market the notion of having student essays scored by software instead of actual humans. It's a big, bold dream, a dream of a world in which test manufacturers don't have to hire pesky, expensive meat widgets and the fuzzy unscientific world of writing can be reduced to hard numbers--numbers that we know are objective and true because, hey, they came from a computer. The problem, as I have noted many times elsewhere, is that after all these years, the software doesn't actually work.

But the dream doesn't die. Here's a paper from Mark D. Shermis (University of Houston--Clear Lake) and Susan Lottridge (AIR) presented at a National Council on Measurement in Education meeting in Toronto, courtesy of AIR Assessment (one more company that deals in robo-scoring). The paper is two years old, but it's worth a look because it shows the reasoning (or lack thereof) used by the folks who just can't let go of the robograding dream.

"Communicating to the Public About Machine Scoring: What Works, What Doesn't" is all about managing the PR when you implement roboscoring. Let's take a look.

Warming Up

First, let's lay out the objections that people raise, categorized by a 2003 paper as humanistic, defensive and construct.

The humanistic objection stipulates that writing is a unique human skill and cannot be evaluated by machine scoring algorithms. Defensive objections deal with concerns about “bad faith” or off-topic essays and scoring algorithm vulnerabilities to them. The construct argument suggests that what the human rater is evaluating is substantially different than what machine scoring algorithms used to predict scores for the text.

Well, sort of. The defensive objection is mis-stated; it's not just that robograding is "vulnerable" to bad-faith or off-topic essays, but that those vulnerabilities show that the software is bad at its job. 

The paper looks at six case studies, three in which the implementation went well and three in which the implementation "was blocked or substantially hindered." Which is a neat rhetorical trick-- note that the three failed cases are not "cases where implementers screwed it up" but cases where those nasty obstructionists got in our way. Long-time ed reform watchers will recognize this--it's always the implementation or the opponents or "entrenched bureaucracies" but never, ever, "we did a lousy job."

Let's look at our six (or so) cases.

West Virginia

West Virginia has been at this a while, having first implemented roboscoring in 2005. They took a brief break from 2015-2017, then signed up with AIR. Their success is attributed to continuing professional development and connecting "formative writing assignments to summative writing assignments" via a program called Writing Roadmap that combined an electronic portfolio and the same robograder used for the Big Test.

In other words, they developed a test prep system linked directly to the Big Test, and they trained teachers to be part of the teach-to-the-test system. This arrangement allows for a nice, circular piece of pedagogical tautology. Research shows that students who go through the test prep system get better scores on the test. It looks like something is happening, but imagine that the final test score was based on how dependably students put the adjective "angry" in front of the noun "badger." You could spend years teaching them to always write "angry badger" and then watch their "angry badger" test scores go up, then crow about how effective your prep program is and how well students are now doing--but what have you actually accomplished, and what have they actually learned? You can make your test more complicated than the angry badger test, but it will always be something on that order because--and I cannot stress this enough--computer software cannot "read" in any meaningful sense of the word, so it must always look for patterns and superficial elements of the writing.

When AIR came to town in 2018, they did some teacher PD (1.5 whole days!) which was apparently aimed at increasing teacher confidence in the roboscorers by doing the old "match the computer score to human scores" show, a popular roboscore sales trick that rests on a couple of bits. One is training humans to score with the same algorithm the computer uses (instead of vice versa, which is impossible). The other is just the statistical oddity of holistic scoring. If an essay is going to get a score from 1-5, the chances that a second score will be equal or adjacent are huge, particularly since essays that are sucky (1) or awesome (5) will be rarer outliers.
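You can check that last claim from your couch. Here's a quick back-of-the-envelope simulation (the middle-heavy score distribution is my own invention, per the reasoning above): two raters assigning scores completely at random still match or land adjacent about three-quarters of the time.

```python
# Two "raters" assigning independent random 1-5 holistic scores.
# The weights are invented: middle-heavy, with 1s and 5s as outliers.
import random

scores  = [1, 2, 3, 4, 5]
weights = [5, 25, 40, 25, 5]   # percent of essays at each score
trials = 100_000

pairs = [(random.choices(scores, weights)[0],
          random.choices(scores, weights)[0]) for _ in range(trials)]
exact    = sum(a == b for a, b in pairs) / trials
adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / trials
print(f"exact agreement by pure chance:   {exact:.0%}")     # ~29%
print(f"equal-or-adjacent by pure chance: {adjacent:.0%}")  # ~74%
```

So when the sales pitch says the computer "agreed with" the human scorers, remember that two dart boards agree with each other pretty well, too.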

Even so, in the 1.5 day training, most teachers didn't meet the 70% exact agreement rate for scores, "suggesting that the training did not result in industry-standard performance." Industry, indeed.

Louisiana

The LEAP implemented robo-scoring as a second reader for the on-line version of the test. The advantages listed include "flagging and recalibrating readers" in case their training didn't quite stick. Also, it can help with rater drift and rater bias, and I have my doubts about its usefulness here, but I will agree that these are real things when you are ploughing through a large pile of essays.

Since 2016 that has been handled by DRC, whose own proprietary robo-scorer is now the primary scorer, with some humans doing "monitoring reads" for a portion of the essays. For a while yet another "engine" was scoring open-ended items on the tests because "increasing student use of the program translated into significant hand-scoring costs." The paper doesn't really get into why Louisiana's program was "successful" or how communication, if any, with any part of the public was accomplished.

Utah

Utah has been robo-scoring since 2008, both "summative and formative," a phrase to watch for, since it usually indicates an embedded test prep program--robo-scoring works better if you train students to comply with the software's algorithm. Utah went robotic for the usual three reasons--save money, consistent scores, faster return of scores.

Utah's transition had its bumps. In particular, the paper notes that students began gaming the system and writing bad-faith essays (one student submitted an entire page of "b"s and got a good score). Students also learned that they could write one good paragraph, then write it four or five times. The solution was to keep "teaching" the program to spot issues, and to implement a confidence rating which allowed the software to say, "You might want to have a human look at this one." There were also huge flaps over robo-scorers finding (or not) large chunks of copied text, which led to changes in filters and more PD for teachers.
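To see why the repeat-a-paragraph trick works, here's a toy scorer of my own devising--emphatically not Utah's actual engine, which is proprietary, but the same genus of surface-feature counting:

```python
# A toy surface-feature "scorer"--my own strawman, not any vendor's
# actual engine. It counts; it never reads.
import re

def toy_score(essay: str) -> float:
    words = re.findall(r"[A-Za-z']+", essay)
    if not words:
        return 1.0
    length = len(words)                           # rewards "write a lot"
    big_words = sum(len(w) >= 7 for w in words)   # rewards "big words"
    sentences = max(1, len(re.findall(r"[.!?]", essay)))
    raw = 1 + 0.02 * length + 0.05 * big_words + 0.1 * sentences
    return min(5.0, raw)                          # 1-5 holistic scale

paragraph = ("The plethora of educational evidence demonstrates that "
             "authentic writing assessment requires an authentic reader. ")

print(toy_score(paragraph))      # one decent paragraph: ~1.8
print(toy_score(paragraph * 5))  # same paragraph five times: a perfect 5.0
```

Feed it a page of "b b b b..." and the word counter happily maxes it out, too. You can patch those particular exploits, but students will find the next one, because the underlying problem--counting instead of reading--never goes away.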

Utah has had a history of difficulty with test manufacturers for the Big Standardized Test--since this paper was issued, yet another company crashed and burned and ended up getting fired over their inability to get the job done, resulting in a lawsuit.

Ohio

The authors call Ohio an "interesting" case, and I suppose it is if you are the kind of person who goes to the race track to watch car crashes. Ohio had a "modestly successful" pilot, but didn't brief the State School Board on much of anything before the first year of robo-scoring flunked a way-huger-than-previous number of students--including third graders, which is a problem since Ohio has one of those stupid third grade reading retention rules. Turns out the robo-scorer rejected the time-honored tradition of starting the essay response with a restatement of the prompt. Oopsies. It's almost as if the robo-scoring folks didn't even try to consult with or talk to actual teachers. Ohio has been trying to PR its way out of this.

Alberta

Yes, the Canadian province. They started out in 2014 with LightSide, the only "non-commercial product" in the crowded field, though its "main drawback is that it employs a variety of empirical predictors that do not necessarily parallel traditional writing traits." That tasty little phrase leads to this observation:

This makes it difficult to explain to lay individuals how writing models work and what exactly differentiates one model from another. Most commercial vendors employ NLP routines to tease out characteristics that lay audiences can relate to (e.g., grammar errors), though this information does not necessarily correspond to significantly better prediction models (Shermis, 2018).

So, it doesn't use qualities related to actual writing traits? And these robo-scorers are tweaked to kick out things like grammar errors, not because they're using them to score, but because it's something the civilians can relate to? That... does not make me feel better about robo-scoring.

However, the Alberta Teachers' Federation called on Dr. Les Perelman, my personal hero and a friend of the institute, who came on up and made some pointed observations (they also insist on spelling his name incorrectly). The authors do not like those observations. They say his comments "reflect a lack of understanding of how large-scale empirical research is conducted," which must have made his years as a professor at MIT pretty tough. They also claim that he fell into the classic correlation-causation fallacy when he pointed out the correlation between essay length and score. The authors reply "The correlation is not with word count...but rather the underlying trait of fluency" and right here I'm going to call bullshit, because no it isn't. They launch into an explanation of how important fluency is, which is a nice piece of misdirection because, yes, fluency is important in writing but no, there's no way for a hunk of software to measure it effectively.

The authors also take issue with Perelman's point that the system can be gamed. Their response is basically, yeah, but. "Gaming is something that could theoretically happen in the same way that your car could theoretically blow up." As always, robo-score fans miss the point. If the lock on my front door is easy to pick, yeah, I might go months at a time, maybe years, and never have my house broken into. But if the lock is easy to pick, it's a bad lock. Furthermore, the game-ability is a symptom of the fact that robo-scoring fundamentally changes the task, from writing as a means of communicating an idea to other human beings into, instead, performing for an automated audience that will neither comprehend nor understand what you are doing. (Also, the car analogy is dumb, because it's theoretically almost impossible that my car will blow up.)

Robo-scoring is bad faith reading; why should students feel any moral or ethical need to make a good faith effort at writing? Why? Because a bunch of adults want them to perform so they can be judged? 

At any rate, Alberta has decided maybe not on the robo-grading front.

Australia

Oh, Australia. "Early work on machine scoring was successful in Australia, but the testing agency was outmaneuvered by a Teacher's Federation with an agenda to oppose machine scoring." Specifically, the teachers called in the dastardly Dr. Perelman, whose conclusions were pretty rough, including "It would be extremely foolish and possibly damaging to student learning to institute machine grading of the NAPLAN essay, including dual grading by a machine and a human marker." Perelman also noted that even with the robo-scoring taken out of it, the NAPLAN writing assessment is "by far the most absurd and the least valid of any test that I've seen."

The teachers named Perelman a Champion of Public Education. The writers of this paper have, I suspect, entirely different names for him. But in railing against him, they reveal how much they simply don't get.

For example, he suggested that because the essays were scored by machine algorithms students would not have a legitimate audience for which to write, as if students couldn’t imagine a particular audience in a persuasive essay task. There was no empirical evidence that this was a problem. 

[Sound of my hand slapping my forehead.] Is an imaginary audience supposed to be a legitimate one? Is the point here that we just have to get the little buggers to write so we can go ahead and score it? Perhaps, because...

 He rightly suggested that computers could not assess creativity, poetry, or irony, or the artistic use of writing. But again, if he had actually looked at the writing tasks given students on the ACARA prompts (or any standardized writing prompt), they do not ask for these aspects of writing—most are simply communication tasks.

Great jumping horny toads! Mother of God! Has any testocrat ever explained so plainly that what the Test wants is just bad-to-mediocre writing? Yeah, all that artsy fartsy figurative language and voice and, you know, "artistic" stuff--who cares? Just slap down some basic facts to "communicate," because lord knows communication is just simple artless fact-spewing. These are people who should not be allowed within a hundred feet of any writing assessment, robo-scored or otherwise.

And in attempting to once again defend against the "bad faith" critique, the paper cites actual research which, again, misses the point.

Shermis, Burstein, Elliot, Miel, & Foltz (2015) examined the literature on so-called “bad faith” essays and concluded that it is possible for a good writer to create a bad essay that gets a good score, but a bad writer cannot produce such an artifact. That is, an MIT technical writing professor can write a bad essay that gets a good score, but a typical 9th grader does not. The extensiveness of bad faith essays is like voter fraud—there are some people that are convinced it exists in great numbers, but there is little evidence to show for it.

First, I'm not sure how that research finding, even if accurate, absolves robo-grading of its badness (I find the finding hard to believe, actually, but I'm not looking at the actual paper, so I can't judge the research done). The idea that the good scores only go to bad essays by good writers is more than a little weird, as if bad writers can't possibly learn how to game the system (pretty sure it wouldn't take a good writer to write the page of "b"s in the earlier example). Is the argument here that false positives only go to students who deserved positives anyway? 

And again, the notion that it doesn't happen often so it doesn't matter is just dumb. First of all, how do you know it happens rarely? Second, it makes bad faith writing the backbone of the system, and beating it the backbone of bad writing instruction. The point of the writing becomes satisfying the algorithm, rather than expressing a thought. We figured out how to beat PA's system years ago-- write neatly, write a lot (even if it's redundant drivel), throw in some big words (I always like "plethora"). Being able to satisfy an algorithm is not not not NOT the same thing as writing well. 

In the end, the writers dismiss Perelman's critique as a "hack job" that slowed down the advance of bad assessment in the land down under.

Florida 

Florida was using humans and software, but the humans had greater weight. If they disagreed, another human would adjudicate. One might ask why have the robo-scorer at all, but this is Florida, and one assumes that the goal is to phase the humans out.

Recommendations

The paper ends with some recommendations about how to implement a robo-scoring plan (it does not, nor does the rest of the paper, offer any arguments about why it's a good idea to do so). In general they suggest starting with humans and phasing in computers. Teacher perception also matters. They offer some phases.

Phase 1. Start with a research study endorsed by a technical advisory committee. So, some practice work, showing how the robo-scorer does with validity (training) papers, as well as seeing how it does with responses that repeat text, copy the pledge, copy themselves, gibberish, off-topic essays, etc. The state "could involve teachers as appropriate" and that phrase should be the end of it all, because if you are developing a program to assess writing and teachers aren't involved from Day One and every day thereafter as a major voice in the program development, then your program should not see the light of day. Robo-scoring underlines how badly much ed tech completely silences teachers and replaces them with software designers and tech folks. If an IT guy from your local tech shop stopped in and said, "I would like to take over the teaching of writing in your class," you wouldn't seriously consider the offer for even a second. And yet, that is exactly what these robo-score companies propose.

Phase 2. Design initial scoring plan. Use the research to plan stuff. Because it wouldn't be ed tech if we weren't designing the program based on what the tech can do, and not on what actually needs to be done. Nowhere in the robo-scorer PR world will you find an actual discussion of what constitutes good writing--a really tough topic that professional English teacher humans have a tough time with, but which robo-scorers seem to assume is settled, and settled in ways that can be measured by a computer, even though software doesn't--and I still can't stress this enough-- actually read in any meaningful sense of the word. 

Phase 3. Design a communication plan. And only now, in phase three, will we develop a plan for convincing administrators and teachers that this is all a great idea. Use "rationale" and "evidence." There are six steps to this, some of which involve non-existent items like "a description of how essay scoring maps to achievement," but at number six, in the last item on the last list in the third phase, we arrive at "An opportunity and method for teachers to ask questions." Nothing here about what to do if the questions are on the order of, "Would you like to bite me?"

Phase 4. Propose a pilot. Get some school or district to be a guinea pig.

Phase 5. Implementation. If it works in that pilot, deploy it on the state level. With communication, and lots of PD, centered on how to use the scoring algorithm and "improve learning" aka "raise test scores." 

Phase 6. Review and revise. Debrief and get feedback. 

Blerg

Robo-scoring has all the bad qualities of ed tech combined with all the qualities of bad writing instruction, courtesy of the usual problem of turning edu-amateurs loose to--well, I was going to say "solve a problem" but robo-scoring is really a solution in search of a problem. Or maybe the problem is that some folks can't get comfortable with how squishy writing instruction is by nature. Creating robo-scoring software is as hopeless an endeavor as creating a program to rank all the great writers, or using a computer to set the canon for English literature. The best you can hope for is software that quickly and efficiently applies the biases of the programmer who wrote it, filtered through the tiny little lens of what a computer can actually do. 

And while two years ago they were working on the general PR problem of robo-graders, now ed techies are salivating at the golden opportunities they see in pandemic-fueled distance learning, meaning that these things are running around loose more than ever. And it is, unfortunately, everywhere--useful only for basic proofreading and word counting and basically any kind of writing check that doesn't involve actual reading. Here's hoping this weed isn't in your own garden of education.