CURMUDGUCATION

Tuesday, March 11, 2014

Essay-Grading Software & Peripatetic Penguins

Education Week has just run an article by Caralee J. Adams announcing (again) the rise of essay-grading software. There are so many things wrong with this that I literally do not know where to begin, so I will use the device of subheadings to create the illusion of order and organization even though I promise none. But before I begin, I just want to mention the image of a plethora of peripatetic penguins using flamethrowers to attack an army of iron-clad gerbils. It's a striking image using big words that I may want later. Also, look at what nice long sentences I worked into this paragraph.

Look! Here's My First Subheading!

Speaking for the software will be Mr. Jeff Pence, who apparently teaches middle school English to 140 students. God bless you, Mr. Pence. He says that grading a set of essays may take him two weeks, and while that seems only a hair slow to me, I would certainly agree that nobody is taking 140 7th grade essays home to read overnight.

But Mr. Pence is fortunate to have the use of Pearson WriteToLearn, a product with the catchy slogan "Grade less. Teach more. Improve scores." Which is certainly a finely tuned set of catchy non sequiturs. Pearson's ad copy further says, "WriteToLearn--our web-based literacy tool--aligns with the Common Core State Standards by placing strong emphasis on the comprehension and analysis of information texts while building reading and writing skills across genres." So you know this is good stuff.

Pearson White Papers Are Cool!

Pearson actually released a white paper "Pearson's Automated Scoring of Writing, Speaking, and Mathematics" back in May of 2011 (authors were Lynn Streeter, Jared Bernstein, Peter Foltz, and Donald DeLand-- all PhDs except DeLand).

The paper wears its CCSS love on its sleeve, leading with an assertion that the CCSS "advocate that students be taught 21st century skills, using authentic tasks and assessments." Because what is more authentic than writing for an automated audience? The paper deals with everything from writing samples to constructed response answers (I skipped the math parts) and in all cases finds the computer better, faster, and cheaper than the humans.

Also, Webinar!

The Pearson website also includes a link to a webinar about formative assessment which heavily emphasizes the role of timely, specific feedback, followed by targeted instruction, in improving student writing. Then we move on to why automated assessment is good for all these things (in this portion we get to hear about the work of Peter Foltz and Jeff Pence, who is apparently Pearson's go-to guy for pitching this stuff). This leads to a demo week in Pence's class to show how this works, and much of this looks usable. Look-- the 6+1 traits are assessed. Specific feedback. Helps.

And we know it works because the students who have used the Pearson software get better scores on the Pearson assessment of writing!! Magical!! Awesome!! We have successfully taught the lab rats how to push down the lever and serve themselves pellets.

Wait! What? Not Miraculous??

"Critics," Adams notes drily, "contend the software doesn't do much more than count words and therefore can't replace human readers." They contend a great deal more, and you can read about their contending at the website humanreaders.org, and God bless the internet, that is a real thing.

"Let's face the realities of automated essay scoring," says the site. "Computers cannot 'read'." They have plenty of research findings and literature to back them up, but they also have a snappy list of one-word reasons that automated assessors are inadequate. Computerized essay grading is:
trivial
reductive
inaccurate
undiagnostic
unfair
secretive

Unlike Pearson, the folks at this website do not have snappy ad copy and slick production values to back them up. They are forced to resort to research and facts and stuff, but their conclusion is pretty clear. Computer grading is indefensible.

There's History

Adams gets into the history. I'm going to summarize.

Computer grading has been around for about forty years, and yet somehow it never quite catches on.

Why do you suppose that is?

That Was A Rhetorical Question

Computer grading of essays is the very enshrinement of Bad Writing Instruction. Like most standardized writing assessment in which humans score the essays based on rubrics so basic and mindless that a computer really could do the same job, this form of assessment teaches students to do an activity that looks like writing, but is not.

Just as reading without comprehension or purpose becomes simply word calling, writing without purpose becomes simply making word marks on a piece of paper or a screen.

Authentic writing is about the writer communicating something that he has to say with an audience. It's about sharing something she wants to say with people she wants to say it to. Authentic writing is not writing created for the purpose of being assessed.

If I've told my students once, I've told them a hundred times--good writing starts with the right question. The right question is not "What can I write to satisfy this assignment?" The right question is "What do I want to say about this?"

Computer-assessed writing has no more place in the world of humans than computer-assessed kissing or computer-assessed singing or computer-assessed joke delivery. These are all performance tasks, and they all have one other thing in common-- if you need a computer to help you assess them, you have no business assessing them at all.

And There's The Sucking Thing

Adams wraps up with some quotes from Les Perelman, former director of the MIT Writing Across the Curriculum program. He wrote an awesome must-read take-down of standardized writing for Slate, in which, among other things, he characterized standardized test writing as a test of "the ability to bullshit on demand." He was also an outspoken critic of the SAT essay portion when it first appeared, noting that length, big wordiness, and a disregard for factual accuracy were the only requirements. And if you have any illusions about the world of human test essay scoring, reread this classic peek inside the industry.

His point about computer-assessed writing is simple. "My main concern is that it doesn't work." Perelman is the guy who coached two students to submit an absolutely execrable essay to the SAT. The essay included gem sentences such as:

American president Franklin Delenor Roosevelt advocated for civil unity despite the communist threat of success by quoting, "the only thing we need to fear is itself," which disdained competition as an alternative to cooperation for success.

That essay scored a five. So when Pearson et al tell you they've come up with a computer program that assesses essays just as well as a human, what they mean is "just as well as a human who is using a crappy set of standardized test essay assessment tools." In that regard, I believe they are probably correct.

To Conclude

Computer-assessed grading remains a faster, cheaper way to enshrine the same hallmarks of bad writing that standardized tests were already promoting. Just, you know, faster and cheaper, ergo better. The good news is that the system is easy to game. Recycle the prompt. Write lots and lots of words. Make some of them big. And use a variety of sentence lengths and patterns, although you should err on the side of really long sentences because those will convince the program that you have expressed a really complicated thought and not just I pledge allegiance to the flag of the United States of Estonia; therefore, a bicycle, because a vest has no plethora of sleeves. And now I will conclude by bringing up the peripatetic penguins with flamethrowers again, to tie everything up. Am I a great writer, or what?

When Is Your Last Teaching Day of School?

Years ago, the Tax Foundation hit upon a great tool for illustrating how large our individual tax load is-- Tax Freedom Day. Starting from January 1, how many days would Americans (or residents of your state, if you break it down that way) have to work just to pay off taxes?

In a small piece of PR serendipity, Tax Freedom Day falls in April in most states. By some mid-April day, all Americans have earned enough money to pay off the income tax debt.

It makes me wonder, as we enter testing season, when our Last Teaching Day of School would be.

If all standardized testing came at the very end of the year, what would be our last day of school?

How would the public react if we handed out final report cards in April (or in some heavily-besieged elementary schools, March) and told parents, "Okay, the teaching year is over. But your child needs to come to school for the next 4/6/8/10 weeks just to take all their tests"?

How quickly would the remaining public support for our massive testing status quo evaporate if the tests were no longer hidden and camouflaged among the teaching days of the real school year, but had to stand on their own? I'm guessing pretty quickly. Wherever you are, maybe it would help crystallize things for folks if you could tell them what the Last Teaching Day of the year would be.

Bad Threats

It's a lesson from Teacher Basics 101. Don't make a threat if you can't live with the consequences.

Do not tell your students that if they don't hand in the homework, you'll fail them for the year. Don't tell your students that everybody runs a four-minute mile or everybody's off the team. And never, ever tell an unruly class that if you hear "one more peep," everybody gets a detention.

The Masters of Reforming Our Nation's Schools have no teaching experience to their collective names, and so they've been breaking the Bad Threat rule with abandon, and children are paying the price.

There are two parts to a bad threat.

Part I is the expectation.

We've heard plenty about the "soft bigotry of low expectations." And Michael Gerson wasn't entirely wrong-- we have a history of all too often writing off students because of poverty or race or chaotic home life or not-so-brightness. Too often we really have held our most challenged students to no expectation at all.

But for the soft bigotry of low expectations, we have substituted the hard tyranny of ridiculous expectations. We have, for instance, substituted the expectation that every third grader will read at grade level no matter what. In some states (I'm looking at you, NY) we raised the standard for proficiency arbitrarily. And we have just generally pushed the idea that all students should be at grade level (as determined by anything from data averages to a politician's whim) all the time.

That seems like a swell expectation. It's not. It's stupid. Let's just apply that reasoning some more. Let's compute the average height for an eight-year-old and declare that all third graders must be that height. Let's require all children to be walking by their tenth month and potty trained by month thirteen. Let's require all seventeen-year-old males to be able to grow facial hair and all fifteen-year-old females to fill a B cup. And let's tell all young men and women that they must be engaged by age twenty-two.

Let's take every single human developmental milestone and set a point by which every human being must have achieved it. Because that is totally how human beings develop and learn and grow-- on exactly the same path, at exactly the same speed, at exactly the same time.

Then we get to Part II of a Bad Threat-- the "or else."

A bad "or else" creates a punishment that neither the Person In Charge nor the person being punished can easily live with. If I give everybody in my class a detention because one kid said "peep" (and everyone laughed, because it is kind of funny, but now I look like a fool, so detentions for all of you rotten kids!!), my students have an hour of their lives wasted, and I have to sit in the principal's office and explain what the hell I was thinking.

But the "or else" under the current standards testing regime is far worse. We've said to students, "You are going to reach this level of development, or we'll flunk you, even if it's a ton of you." As noted by Carol Burris and Alan A. Aja, "ton" is an understatement. We've sold many communities on the idea that being subjected to the hard tyranny of ridiculous expectations is a good thing, and now the cost is becoming clear.

The achievement gap is widening. Students are falling below basic in staggering numbers (50% of third grade black students below basic on ELA tests, 84% of ELL students below basic on ELA tests, and the list goes on).

The "promise of the common core" turns out to be nothing more than threatening students "You're going to pass this high stakes test or we're going to label you a failure, punish your teachers, and keep you from graduating." That's not the soft bigotry of low expectations, but the rather harsh bigotry of "Those damn lazy kids just aren't motivated enough. Threaten them." They don't need help, support, resources, economic relief, or anything else-- just threats.

The cost of this bad threat is more than the students should have to bear and certainly of no benefit to us as a society. And the test results recall one more lesson from Teacher Basics 101. If you have given a test to your class and a huge percentage of the students have failed it, it's a bad test.

Monday, March 10, 2014

Reclaiming Public Education 101

Today, I'm opening a branch blog office.

I've come to believe there's an unmet need in the edublogoverse (the unmet need is not the one for new made-up words). Most of us who frequent these spots have spent months or years sorting through the giant convoluted multi-threaded novel series that is reformy stuff, and what we write, while perfectly sensible to each other, may leave many other folks scratching their heads and feeling that they've stumbled into a private party where everyone speaks some odd form of Greek.

In here I could write, "Cami and King can just VAM-cram their reformy stuff into a slow boat to Estonia and let Arne steer the whole way while Rhee sings lullabies through her duct-taped lips," and most of you would actually understand what the heck I was talking about. The average human on the planet would not.

I am in a place where much of the new reformy stuff hasn't attracted many people's attention yet, but is poking at the edges of consciousness where it is perceived as This Year's Slightly Dumber Than Usual Mandates from the State. I would like to help them understand.

I occasionally write pieces aimed at that audience, but those quickly disappear into the pile of witty takedowns of bureaucratic nonsense written by people they don't know about things they haven't heard of (it's possible that I write too much and too fast).

Several months ago I created a tumblr of ed links just so I wouldn't lose stuff as I found it (it's right here) but I just got my tumblr 1000-post merit badge and the site looks like the online version of my grandfather's attic.

So I'm going to try this. It's called Reclaiming Public Education 101, and I believe in it enough to actually fork over the nominal wordpress domain fee so that it's easy to remember/find.

If you are reading my blog, RPE101 is probably not for you. It's for your friend who says things like, "So why does the Common Core make you spit and growl, exactly?" Or your other friend who says, "That can't be how that Value Added thing is really supposed to work." Or your coworker who says, "I tried to read that link you sent me, but it made my brain hurt hard."

It's set up to be quick and simple. Just an excerpt, an abstract, and a link for each post. A kind of gateway drug for edublog consumption. I will gladly take suggestions and ecstatically take referrals (I've already included Anthony Cody's classic 10 CCSS mistakes and Erin Osborne's invaluable new Gates $$ chart). I'll even take requests. I'll keep adding things as I find them or write them (and I'll post the ones I write here first, because I come from a long line of mild OCD sufferers).

I respect and admire so many people in this fight. The people who do the hard core scholarship, the people who get out in the streets and fight and holler, the people who work the halls of power. For a variety of reasons, those are not things I can really do. But I can write and tag and collect. I can pound a keyboard like a sumbitch. So this is my next contribution. Let me know if it's useful and how it can be more useful, and I'll see if we can make a helpful tool out of it.

Grit-- Not Just For Students!

New leaps forward have been made in grittology, the study of that elusive quality, the lack of which gives reformy leaders cause to castigate schoolchildren across the country.

Holly Yettick reports at EdWeek that University of Pennsylvania researchers Claire Robertson-Kraft and Angela Duckworth have published a study of grit as it applies to teachers and the hiring process. The study (the pdf of which is titled "truegrit.pdf," so kudos for the academic humor) opens with this statement of background/context:

Surprisingly little progress has been made in linking teacher effectiveness and retention to factors observable at the time of hire. The rigors of teaching, particularly in low income school districts, suggest the importance of personal qualities that have so far been difficult to measure objectively.

Was it possible, they wondered, to hire teachers who were actually going to be tough enough to stick it out on the job? In short, could we spot the teachers with grit?

Duckworth is the scientist for the job, having coined the term grit back in 2007 (presumably as it applies to education and not sandpaper). As a founding mother of grittology, Duckworth worked on a 2009 study that linked grit to effectiveness in novice teachers, but that study, says Yettick, was limited because the subjects self-reported for grittiness (doesn't everybody want to think of themselves as gritty, and can we count on gritty people to be fully self-aware? Being a scientist is hard).

So this time the intrepid grittologists looked to see if they could find a way to measure grit objectively. They looked at novice teachers' college activities and gave the teachers scores of 0 through 6 for aspects like years of participation or rising through the ranks of the groups to honored positions. In short, did they make commitments and stick with them?

This was correlated to retention (did the teacher stick around without quitting partway through the year) and to effectiveness (did the --uh-oh. hold on a second). Yeah, we were not quite so sure we could come up with a serious effectiveness measure other than some testing data. So there's that. Conclusion? "Grittier teachers outperformed their less gritty colleagues and were less likely to leave their classrooms mid-year."

There's not a lot of research on how to engrittify teachers, so researcher Matthew Kraft thinks it's better to hire teachers that come with their own grit and then help them if things get tough. "School contexts can support teachers to maximize their potential or undercut their efforts."

So what have we learned? If you hire people whose application shows that they joined things and stuck with them in college, they are more likely to stick with the job when you hire them. And if you need to make a common sense observation, turn it into a number, and then turn it back into an observation, then maybe you shouldn't be involved in hiring people.

I'm just imagining a young man coming back from a date. His friends ask how it went. He tells them to wait, sits down, gets out a calculator and iPad and creates a spreadsheet of the quality and length of the kisses that he and his date shared, converts the observations to a digital data set, plugs the numbers into a formula, collates the data, and looks at the final numbers. He turns to his friends and says, "Well, according to this kissological data rating, our date went well. And this score indicates there's a high probability that we may kiss again soon."

I get that many schools' hiring practices are somewhere between "phone a friend" and "darts tossed blindly at wall." But if you need a data set to tell you how to take an impression of other carbon-based life forms, education is probably not the field for you.

But then, there are probably a few other people who need to get that message beyond those doing the hiring.

Sunday, March 9, 2014

What We Don't Know About Normal

I only just heard about Joe Henrich and his groundbreaking work, but I plan to educate myself further. Henrich's work has implications for anyone who works with human beings and their brains. So, teachers.

Let me try to simplify Henrich's work and then I'll try to explain why we should care.

Normal Isn't

Henrich was doing anthropology grad school field work in Peru in 1995 when he decided to rerun a well-worn economics study game. In the Ultimatum Game, one person receives a chunk of money and then offers a chunk to a second person. If person #2 refuses the offer, both people lose all the money. Both players know the rules.

This is a classic in its field, one of many experiments that are used to argue for certain cognitive consistencies across all human beings, how it is normal to enforce certain standards of fair play. And when Henrich tried it with the residents of the remote village, it didn't turn out the way it was supposed to, at all.

Repeated attempts with various distant groups revealed that this experiment which had long been used to illustrate something basic and normal about all human beings in fact did no such thing. He got a MacArthur Grant and some Presidential recognition, but he couldn't get hired in anthropology. So he went to work at the University of British Columbia, where they split him between economics and psychology.

If you want to read more in detail about his career, here's a great article for that. In short, he and his colleagues have been unraveling a boatload of allegedly normal cognitive issues. For instance-- this optical illusion?

Yeah, it's not normal to see an optical illusion. Turns out that lots of folks have no trouble at all seeing two lines of equal length. Turns out that whether you grow up in houses or in fields affects how your brain manages visual information. This "illusion" is not universal-- it's cultural.

Henrich, working with Steven Heine and Ara Norenzayan, was realizing that spatial reasoning, categorization, moral reasoning, inferences about other people, boundaries between self and others-- these and other areas of cognition are hugely shaped by culture and experience. Our experience and culture have a huge impact on how our brains work.

On top of that, as they charted the range of human perception, they found one group of people consistently off in their own little corner. For that group they coined the term WEIRD-- Western, Educated, Industrialized, Rich, and Democratic.

What That Means To Teachers

One of the implications of Henrich's work is simple but profound-- an awful lot of previous research is crap.

This is not entirely a new insight. When scientists work with subjects of convenience, we get meaningless results. The education field is filled with studies that are useless (unless you really need some insights into how the minds of college sophomores at one particular university work). But Henrich's work forces us to look at just how unjustified many of our ideas about normal are.

We think science has been studying human beings, but it has really been studying human beings from our own culture.

So if anybody starts telling you that X is a cognitive quality normal to all human beings, your bovine fecal matter detection software should be going off.

What That Means To Education Reform

To review: people who have grown up in different cultures end up with differently-wired brains.

Now. Do you know where you find people who have grown up in completely different cultures living side by side? Yes, that would be America. Folks have been saying, loudly, for a while now, that people living in poverty do not see the world the same way as the rest of us. Now we have research from a completely different angle confirming the same thing.

Chasing Finland and Estonia is a pointless activity because they are different cultures. They do not perceive the world the same way we do.

My position on standardization is pretty simple-- I think it's bunk. Standardization is a simple process by which we declare that X is Normal, and we then measure everything by that idea of Normal. Our educational status quo is a one-size-fits-all approach to both teaching and the measuring of that teaching. Everyone should learn the same thing the same way and then demonstrate their understanding in the same way. Because all of those same things are normal, right?

Wrong. There is no normal. And a nation founded on the notion of combining many cultures and backgrounds and worldviews should be the absolute last place that we insist everybody be Normal. Particularly when the research suggests that we are the least Normal peoples in the world.

Saturday, March 8, 2014

How Did CCSS Happen?

My wife is a smart person, and a great and committed teacher. But she rarely reads this blog because mostly she says that it's over her head. She's a reminder to me that for every one of us who has been wading in this stuff for what feels like ages, there are many other concerned professionals who feel like they just walked in on the second season finale of Game of Thrones and aren't sure how to figure out what the hell is going on.

So I'm making a commitment to create and curate material for those folks. These occasional columns will be different from my usual stuff. Straightforward titles, clear explanations, basic materials for smart, interested people who are just coming in on the middle of this Nightmare on School Reform Street movie festival.

To keep myself honest, I'm going to imagine this first entry as a conversation between me and my wife. To all readers who are actually married to me, let me just say that this will be done with nothing but love and respect.

So where did Common Core come from anyway?

Well, back in 1983 with A Nation at Risk--

You promised this would not be like six seasons of How I Met Your Mother.

Right. No Child Left Behind put school districts under a lot of pressure. We had to get a certain percentage of students to get above average scores on a standardized test every year. The above-average percentage ramped a little every year until 2009, when it ramped up like a sumbitch.

Same year George Bush was out of office, right?

Exactly. We were supposed to hit 100% this year, which meant that everybody was either going to be failing or lying. Schools were feeling highly motivated to do something else. It turned out that something else was already waiting in the wings. In 2009, the National Governors Association and the Council of Chief State School Officers formed a committee to write standards. This whole process was pretty murky because A) it was done in secret and B) it involved people and groups that had already been working on this stuff for years. If you want to get into more detail, you can find it here and here. Most of the shadowy previous work was connected to a group named Achieve.

There were two groups that did the writing. Those 25 people included folks from the College Board, the ACT, and Achieve. There was also a feedback group of 35 people; 34 college profs and 1 classroom teacher. Some of those people quit in protest during the process.

They say these are so all students will be college and career ready. How do they know that?

At this point, nobody has seen a shred of the research or data that supports that. The Gates Foundation has paid the bills for most of the support for CCSS that you see, and Bill called this a "best guess" and said that we would have to wait ten years to find out if it was right.

So why do they keep saying teachers worked on the Common Core?

As near as anyone can tell, some teachers were allowed to see drafts and provide comments. There's no shred of evidence to suggest that anybody paid any attention to what the teachers said. By the time they saw it, the work was already done.

And the states?

Yeah, they didn't have a leadership role here, either. You'll notice people don't make the state-led, teacher-involved claim quite so much any more. Everybody who follows this stuff knows that it was federally pushed without the benefit of research or teacher input.

So if the states didn't really develop the Core, why did they adopt it?

You remember they were in a tight place, NCLB-wise. The Obama administration offered them a way out. Two, actually.

First, they could compete for free federal money by joining the Race to the Top. We didn't hear much about that in PA because it required a whole lot of people to sign off on the application, and in PA, they wouldn't. They wouldn't because there was a whole lot of mystery language in it. But if you wanted to compete you had to agree to do a couple of things:

1) You had to agree to collect a boatload of data.

2) You had to agree to being measured by beaucoups testing.

3) You had to agree to evaluate teachers by using testing data at least a little.

4) You had to adopt some college and career ready standards and pretend that you were helping develop them. (You can read about that whole business here.) This also meant in some cases that you were agreeing to them sight unseen.

Wasn't it only a few states that won Race to the Top? What about the rest?

It was only a few. But that No Child Left Behind kept squeezing, and it became obvious that nobody in Congress was going to rewrite the law. Even though everyone could see we were headed for a cliff, nobody wanted to touch the stupid thing. But the administration said states could get a waiver and be excused from NCLB's 100% above-average requirements.

I bet the list of requirements to get a waiver sounds familiar.

Good bet.

So can't states just rewrite it to suit themselves?

The Common Core State Standards are actually copyrighted. States aren't allowed to change a thing, and can only add 15%. Now, whether that would just void the warranty or invoke fines or lose federal money or put a sheriff on the statehouse steps I don't think anybody knows. But you can't mess with them.

Can anybody ask them to change the standards?

Nope. There's no toll-free service number, no appeal process, no feedback system, no nothing. I don't know about the math, but the guy who wrote the English standards has a completely different job at this point.

So if this really wasn't the states bringing a bunch of teachers together to develop standards that would make sense for everybody, then why did this happen? Why would anybody do this?

The short answer is money and power and who knows. The long answer is a piece of writing for another day.
